How to Do Internal Tooling Efficiently
The way I see our role in developer productivity is to abstract and remove a lot of the "Netflix-isms," or things unique to Netflix's developer ecosystem, so that developers can get up to speed in a very performant, effective, and secure way. Everything we create is called a Paved Road. However, because Netflix culture leans very heavily into freedom and responsibility, our customers (internal developers) are not mandated to use these things; we trust their judgment. If we don't cover the 85% use case that our customers need, they can choose to use something else.
What Kind of Internal Tooling Should You Develop?
If Productivity is a central organization (as it is at Netflix), it just doesn’t scale to build and/or operate every internal tool developers need. After all, Productivity already has a heavy load of products to maintain and keep the lights on (KTLO), in addition to feature work and tech debt.
Therefore, we have more of a federated model where we support our developers to leverage our platforms, tools, and infrastructure.
At Netflix, the Develop pillar in Productivity builds, owns, and operates tools, frameworks, platforms, and services that are either very high leverage or used more broadly across the organization, while the Delivery Engineering and Observability Engineering pillars provide tooling to enable developers to better deploy and operate any tools they built themselves.
In this way, we are more of an enabler in that we make it easier for people to build, test, own, and operate their software.
Who Maintains Internal Tooling?
One of our guiding principles is that you operate what you build, but we provide the tools and the infrastructure to make that easier. This is where Delivery Engineering and Observability Engineering come into play.
The Delivery Engineering team provides tools to help set up the right profiles and CI/CD.
Then the Observability Engineering team provides tools to monitor the overall health of the system, so it's easy to identify and troubleshoot issues.
Build vs. Buy for Internal Tooling
The main factor in the build vs. buy debate is how anomalous the things you are doing are. The companies that need to build tooling are usually pushing the envelope in areas that conventional third-party tools are not designed for, e.g., scale or latency. For example, at a streaming company like Netflix, we have bespoke tooling around reducing latency, as we need to be able to test the network at any given time. Other companies that build their own tools are very large, well resourced, or have been around for a long time. Companies that started a while ago had to build their tools from scratch because nothing on the market met their needs at the time. However, just because your company has always used a bespoke tool doesn’t mean it always has to.
When to Buy: Re-assessing the Need for Bespoke Tools
It is often more sustainable to buy tooling if there is something on the market that can fulfill the need. It is also important to keep evaluating whether the cost of owning and maintaining your own tooling is worth it. Sometimes it's better to go with an open-source or enterprise alternative, especially if the current tool is holding your team back or your heart lies elsewhere. This is particularly true if the industry/market has surpassed the capabilities of a tool you once developed.
Why You Should Speak To Vendors About Your Use Case
Another thing to do is to work with vendors to adapt their tools to fit your needs. Many vendors, like Google Cloud or AWS, go out of their way to tailor their offerings to the needs of their bigger customers. Depending on the size of your user base or market cap, these companies will want your input on their roadmap. You can partner very closely with many of them and say, ‘We need this,’ and they'll say, ‘Sounds good. We'll get it in there.’
Try to Buy So You Can Learn What You Need to Build
It is always important to test and learn. To bootstrap an effort, you can Frankenstein something together by gluing third-party tools to one another. Then you can analyze the friction points and decide whether, with some effort, that setup can work. If it can't, you can go your own way with confidence, truly understanding the tradeoffs.
How To Provide Tooling That Supports Multiple Languages
Tooling can get complex when you support a workforce that codes in multiple languages. At Netflix, it's about ‘finding the right language for the right job’ and ‘context, not control.’ This means that although a lot of our historical productivity support has been for backend services, i.e., Java, our developers code in a variety of languages.
The Productivity team at Netflix tries to pave the road for our primary languages or build out as many generalities as possible, but we cannot serve everyone. However, we have people passionate about particular languages across Netflix who have banded together to provide a community-supported model. So in some cases, we create tooling that can be extended or built out for specific app types or languages. This is again federated. In some cases, we rely on community teams to provide more bespoke or ad hoc language support.
We regularly discuss which tech stacks we should make part of the Paved Road versus more of a ‘remote landing strip on an island’ (to extend the metaphor). One tool we could use to decide this is a tech radar, a radar-style chart that helps lay out the cost-benefit analysis of adopting a technology versus not adopting it.
One language we are evaluating right now is Python, which currently has very robust community support. There is a lot of demand for Python when it comes to machine learning, personalization, etc. So it would be great if we could expand our tooling to include Python at some point.
How to Measure How Your Tooling Improves Developer Productivity
You need to be judicious about what you instrument. After all, it would be a Herculean effort to instrument every tool you have. Focus your attention on tools that are highly leveraged (for example, a Java tool that many people use), and even within that tool, be thoughtful and measure only the things that matter most to you, e.g., “We're going after this, so let's instrument this and make it the theme of the next couple of semesters.”
There are a bunch of different ways to measure things, and much of that comes through developer satisfaction rather than Net Promoter Score alone. NPS is helpful, but it's not the only thing.
NPS (Net Promoter Score)
Although NPS is helpful when assessing satisfaction in general, it might not be the best method for assessing the satisfaction of internal teams. NPS excels with external customers because the number of people you're evaluating gives you statistical significance. But I have learned from experience that the Net Promoter Score for central teams that service internal customers is traditionally much lower when you compare apples to apples. The more technically educated the workforce you are surveying, the higher their expectations are, and the lower the score!
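To make the sample-size point concrete, here is a minimal sketch of the standard NPS calculation: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). The function name and the sample responses are hypothetical and are not taken from any Netflix tooling.

```python
def net_promoter_score(scores):
    """Return the NPS (from -100 to 100) for a list of 0-10 survey answers."""
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)   # answered 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answered 0 through 6
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical answers from a small internal survey.
responses = [10, 9, 8, 7, 6, 3, 9, 5]
print(net_promoter_score(responses))
```

With only eight respondents, each person accounts for 12.5 points of the score, so one developer changing their answer can swing the result noticeably. That is one reason the number is harder to trust for a small internal audience.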
I like to use the SPACE framework for evaluating developer productivity. SPACE stands for Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. We can measure satisfaction by asking:
- “Would I recommend this as a place to work to other developers in my network?”
- “When I leave, would I get emotional about the tools I'm going to leave behind?”
We can also assess performance by asking questions like:
- How quickly can somebody go from idea to pushing that idea out the door?
- How long does it take you to generate an application?
- What is the startup time performance?
“We have a lot of app gen code. So how do you get [it] up and running? How long does it take you to do that? There are a lot of dependencies - a lot of things that get brought down. And are you sitting there? Is it enough for you to go get coffee or do you have to go to lunch?” - Kathryn Koehler, Director of Developer Productivity Engineering at Netflix
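Questions like these can be answered with very light instrumentation. The following is a minimal sketch, assuming you can wrap the step you care about in Python; `timed` and `generate_app` are hypothetical names, and `generate_app` is only a stand-in for a real app generation step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step, results):
    """Record the wall-clock duration of the wrapped block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the wrapped block raises.
        results[step] = time.perf_counter() - start


def generate_app():
    # Hypothetical stand-in for app generation and dependency download.
    time.sleep(0.01)


timings = {}
with timed("app_generation", timings):
    generate_app()

print(f"app_generation took {timings['app_generation']:.3f}s")
```

In practice you would send the recorded durations to whatever metrics system you already use rather than printing them, and you would only wrap the few steps that matter most.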
Customer Interviews and Insight from Customer Support
We have a bunch of feelers we use to get crisper on the SPACE metrics, such as customer interviews and anecdotes from Support. We might interview a developer and ask questions like:
- Just walk me through your day
- Let me see what you're doing.
- Do you know how to do this?
Anecdotal evidence from support might be:
- What are the pain points of those who reach out to support? How bad is it - is it death by a thousand cuts?
- What are blockers for people?
You can also use surveys that ask pointed questions like:
- How easy is this to use?
- Is it helpful?
- Can you discover the things that we own and operate?
- How good is our documentation?
- How good is our support?
Surveys are very subjective, though. Every person will interpret the questions differently, and answers differ depending on when in the day they are answering.
“[Surveys] are very subjective and it depends on ‘has the person had coffee or not that morning.’” - Kathryn Koehler, Director of Developer Productivity Engineering at Netflix
DORA (DevOps Research and Assessment) is great as it gives hardened operational metrics like deployment frequency (DF), lead time for changes (LT), mean time to recovery (MTTR), and change failure rate (CFR). Make sure you actually instrument for these metrics up front, though - many people say they will do it later, and it doesn’t get done.
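As a rough illustration of what instrumenting for these four metrics involves, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` fields and the `dora_metrics` function are hypothetical. The sketch also assumes recovery time is measured from the failed deployment to the moment service was restored; a real setup would more likely use incident start and end times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, window_days):
    """Compute the four DORA metrics over a window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: recovery time runs from the failed deploy to restoration.
    recoveries = [d.restored_at - d.deployed_at
                  for d in failures if d.restored_at is not None]
    mttr_hours = None
    if recoveries:
        total_seconds = sum(r.total_seconds() for r in recoveries)
        mttr_hours = total_seconds / len(recoveries) / 3600
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time_hours":
            median(lt.total_seconds() for lt in lead_times) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mttr_hours,
    }


# Hypothetical week of deployments.
t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deployments = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               restored_at=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 4 * hour),
]
metrics = dora_metrics(deployments, window_days=7)
print(metrics)
```

The calculation itself is simple. The hard part is capturing the commit, deploy, failure, and restore timestamps reliably, which is why it pays to put that instrumentation in place up front.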