OKAY | Engineering productivity can be measured

We both started out as entry-level engineers at Box, became first-time managers overseeing five-person teams, then directors overseeing 30-50 people, and ultimately VPs managing hundreds. Along the way, we've experienced software engineering from every angle.

At every step of the way, we asked ourselves: "How do we know if we are bringing value as engineering leaders?" Effective leadership uniquely blends human qualities - influence, empathy, courage, and results. This last quality always raises the question of productivity - how effective are you at producing results through enabling others?

Since the advent of the software industry, most engineering teams have seen productivity as a black box. Only recently have people even begun to build internal tools that optimize performance. Unfortunately, most of these tools measure the wrong metrics and are shockingly similar across companies. We even built some of these tools and made mistakes. Now, we'd like to share the way forward.

Engineering Effectiveness is Behind the Times

Engineering teams are both the most expensive and most fundamental part of tech companies. As more companies become tech-enabled, the importance of engineering will only increase. Yet today, a full three decades after the advent of the internet, most engineering departments still rely exclusively on qualitative signals of performance.

The evolution of engineering effectiveness parallels the recent transformation of sales. In the early 2000s, sales was considered an art, so sales leaders could skate by on charisma alone. Today, with tools like Salesforce, Clari, and People.ai, sales has completed the transition to scientific, metric-based leadership for analyzing and improving performance.

In the coming years, engineering will adopt a similar, data-driven mode of management. In making this transition, however, most engineering teams are making these mistakes:

Mistake #1: Measuring Approximations of Output

In their haste to become more data-driven, many engineering leaders are measuring their team's performance based on metrics intended to approximate output. These metrics fail because they encourage engineers to game the system.

If you measure a fixed metric like lines of code or number of tickets closed, your engineers will begin splitting code into more lines or breaking bug fixes into multiple tickets. Even sprint points, which attempt to convert engineering work into a standard unit, suffer from this pitfall: Some engineers will slow down after reaching their sprint points for the week while others will strategically inflate their tasks to be awarded additional points.

Experienced engineers will recognize that this type of measurement is fake. While they may be less inclined to "play the game," their morale will plummet and they will self-select out. By rewarding approximated metrics of output, you're encouraging engineers to inflate those metrics regardless of how well they correlate with software development success.

No matter which metrics you choose, abstract output approximations will distract engineers from their actual jobs, ultimately decreasing both your team's effectiveness and morale.

Mistake #2: Not Measuring Anything

On the other end of the spectrum sit engineering leaders who avoid measurement entirely. Many of these leaders have heard about the dangers of measuring the wrong metrics and therefore ricochet to the other extreme. They may emphasize the artisanal and social dimensions of engineering, claiming "software engineering is too complex to measure."

Non-measurement can even be self-reinforcing, because it places the leader into the position of being "the good guy." Instead of being a metrics-obsessed big brother, the leader can be the friendly older brother exclusively focused on keeping their team happy.

Non-measurement fails because it prioritizes politics over productivity. If you don't measure any metrics, your engineering leaders will simply justify failures, telling stories like "The customer didn't give us the right requirements" or "We were surprised by unexpected vacations." Software development is complex enough that something always goes wrong; if you don't measure any data, you're at the mercy of individual stories.

Non-measurement unfairly rewards people with charisma while productive but less-persuasive engineers wallow in frustration.

At some point, your top performers will see through these political machinations and quit because your culture lacks accountability.

In the short term, non-measurement can have a positive effect on team morale, but it destroys morale in the long term, especially among high performers.

The Solution: Measure Blockers at the Team Level

Instead of measuring some approximation of engineering output, software teams should measure actual, observable metrics that directly correlate to effectiveness.

Productivity is a relationship between inputs and output. In software development, the inputs are a blend of factors--technical, individual, human, etc.--while the output should be functional software that creates value for customers. Productivity in engineering therefore naturally increases when you remove the blockers getting in the way of your team.

Why You Should Measure Blockers

Even at the beginning of the software revolution, there existed the notion that engineers should be nurtured. Starting with Microsoft in the '80s, tech companies gave engineers free resources (like food and gyms) that would remove blockers to their work. Empirical management practices over the last forty years have reinforced the importance of "servant leadership" and "unblocking your team," while recent research emphasizes the importance of optimized inputs and best practices.

For engineers, these inputs include:

Quality of developer tools
Frequency and quality of internal activities (like meetings or code reviews)
Focused maker time (free from disruptive meetings)
Easy access to documentation
Psychological safety on the team
Work-life balance
Presence of other high-performers
A fair system of rewards

The blockers to these inputs already exist and can be quantified, such as:

How much free, uninterrupted time does an engineer have to code?
How long is an engineer waiting on a response from another engineer's review?
How often do dev tools get in the way instead of helping accelerate work?
How often are engineers required to context switch, preventing deep work?
How often do engineers receive pages outside of business hours, interrupting their sleep or family life?
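
As a rough illustration of how the first question in this list could be quantified, the sketch below counts an engineer's uninterrupted blocks in a single workday from a list of meeting times. Everything in it is an assumption made for the example: the function names, the 9-to-5 workday, the two-hour minimum for a block to count as focus time, and the sample meetings. A real version would read events from whatever calendar system the team uses.

```python
from datetime import datetime, timedelta


def focus_blocks(day_start, day_end, meetings, min_block=timedelta(hours=2)):
    """Return the free (start, end) gaps of at least min_block in one workday.

    meetings is a list of (start, end) datetime pairs; the two-hour default
    threshold is an assumption, not a recommendation from the article.
    """
    blocks = []
    cursor = day_start
    for start, end in sorted(meetings):
        # Clip each meeting to the workday so out-of-hours events are ignored.
        start, end = max(start, day_start), min(end, day_end)
        if start >= end:
            continue
        if start - cursor >= min_block:
            blocks.append((cursor, start))
        # Overlapping meetings never move the cursor backwards.
        cursor = max(cursor, end)
    if day_end - cursor >= min_block:
        blocks.append((cursor, day_end))
    return blocks


def focus_hours(blocks):
    """Total length of the given blocks, in hours."""
    total = sum((end - start for start, end in blocks), timedelta())
    return total.total_seconds() / 3600


# Hypothetical day: three short meetings scattered across a 9:00-17:00 workday.
day = datetime(2021, 3, 1)
meetings = [
    (day.replace(hour=10), day.replace(hour=10, minute=30)),
    (day.replace(hour=13), day.replace(hour=14)),
    (day.replace(hour=15), day.replace(hour=15, minute=30)),
]
blocks = focus_blocks(day.replace(hour=9), day.replace(hour=17), meetings)
print(focus_hours(blocks))
```

In this made-up day, only two hours of meetings leave a single qualifying block (10:30 to 13:00), which is the kind of fragmentation a raw meeting count would hide. Averaging the result across the team, rather than reporting it per person, keeps the measurement at the team level.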

An engineering leader exists to enable their team to achieve their goals. Together, these quantified blockers allow engineering leaders to answer key questions like:

What is preventing the engineers from building faster?
What issues are arising in real time?
What technology or process investments would increase team engagement?

Each engineering team is unique, so its blockers will be specific. It's not as simple as "more maker time is better." If your engineering team is new or temporarily misaligned on key goals, more meeting time might be the answer. What never changes is the need for measurement and well-considered, deliberate decisions.

Over the last year, COVID-19 has helped demonstrate the value of measuring blockers. For many leaders running newly remote teams, new blockers have arisen that would never have been noticed if managers were only focusing on the desired outcome.

If your team is full of competent, driven engineers, removing their blockers is the fastest way to enable forward movement.

Why "Team" is the Right Level to Improve

Since software development requires complex interaction between team members, it would be inappropriate to assign individuals their own metrics: Some engineers are effective individual contributors while others enable their teammates to perform. Engineers also hate being micromanaged, so tracking individual activity can make them feel untrusted.

Just as a sports team wins or loses together, so too should the engineering team be treated as the fundamental unit of success.

Approaching engineering at the team level also places the proper accountability on the manager. It raises helpful questions like "What behaviors, structures, and work habits are preventing us from succeeding?"

Looking at the team level also enables managers to catch blockers as they evolve. Small issues, for instance, may not be apparent when the company is young, but can evolve into 10,000 papercuts only apparent at the team level. If code reviews take days instead of hours, at first one engineer complains, then two, then three... and if you don't pay attention, years later the engineering culture is shot. By compiling these small quantifications and observing their trends, a manager can understand whether a report is one individual's experience or truly relevant to the overall performance of the team.
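
One possible way to compile these small quantifications is sketched below: review wait times are grouped by week, and each week reports both the team's median wait and how many distinct engineers waited more than a day. The record layout, the week labels, the 24-hour threshold, and the sample numbers are all assumptions for illustration; the source of the data (for example, a code review tool's export) is left open.

```python
from collections import defaultdict
from statistics import median


def weekly_review_wait(reviews, slow_threshold_hours=24):
    """Summarize (week, author, wait_hours) records per week for the team.

    The 24-hour threshold for a "slow" review is an assumed default.
    """
    by_week = defaultdict(list)
    for week, author, wait_hours in reviews:
        by_week[week].append((author, wait_hours))

    summary = {}
    for week in sorted(by_week):
        rows = by_week[week]
        # Distinct authors, so one person with many slow reviews counts once.
        slow_authors = {a for a, w in rows if w > slow_threshold_hours}
        summary[week] = {
            "median_wait_hours": median(w for _, w in rows),
            "authors_waiting_over_threshold": len(slow_authors),
        }
    return summary


# Hypothetical records: (ISO week label, review author, hours waited).
reviews = [
    ("2021-W01", "ana", 3), ("2021-W01", "ben", 5), ("2021-W01", "chen", 30),
    ("2021-W02", "ana", 20), ("2021-W02", "ben", 28), ("2021-W02", "chen", 40),
]
for week, stats in weekly_review_wait(reviews).items():
    print(week, stats)
```

In the sample data, the first week shows one engineer with a slow review while the team median stays low, which looks like an individual experience. In the second week both the median and the number of affected engineers rise, which suggests a team-level blocker.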

Perhaps most important, if blockers follow a consistent pattern of evolution (e.g. one person is often the canary in the coal mine), an engineering leader can map how new blockers are likely to evolve and prioritize which should be solved first.

Just as an airplane pilot must monitor dozens of different metrics to keep the plane flying, so too would an engineering leader benefit from viewing their team's metrics to understand overall performance.

In an optimal engineering dashboard, a leader would be able to assess the blockers that prevent the ultimate success of their team.

Toward Engineering Effectiveness

"Productivity" is an appropriate measure for someone making widgets at a factory: "How many products did you produce in an hour?"

Engineering should instead be about effectiveness: "How able is this engineer to effect positive impact?"

Looking forward, engineering effectiveness will have three parts:

Measuring the experience of engineering teams in their most frequent activities. (Think distributed tracing, but for human activities.)
Using the results of those measurements to quantitatively improve the developer experience.
Letting the final result take care of itself. (As Jeff Bezos or legendary football coach Bill Walsh would agree, when you make better use of inputs, the output improves.)

If you're an engineering leader, here are a few tips to get started right away:

Look at your team's calendars:
- Do people have enough blocks of uninterrupted time to execute complex and intellectually rewarding tasks?
- How much of the meeting load was created by you, the engineering leader?
- Are there recurring meetings that could be done asynchronously?
- Are the same people always asked to perform all the interviews?
Check your team's Git logs:
- Are most code reviews taking less than a day or can you find very long or even stale code reviews?
- Are remote parts of the team getting the same review treatment or do they have to wait longer than their local counterparts?
- Are people adjusting to long review cycles by opening many tracks of work at once (which is bad for focus)?
Survey your team:
- Are they happy with the tooling or is it getting in their way?
- How do they feel about pages and alerting? Is it affecting their personal lives?
Finally, turn all these questions into metrics:
- Pick goals and commit to improving the most urgent blockers.
- Share the dashboard of these metrics with the team.
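
To make the last step concrete, here is a minimal sketch of how team-level blocker metrics might be compared against goals and ranked by urgency. The `BlockerMetric` class, the relative-gap formula, and every number below are hypothetical; each team would choose its own metrics, goals, and definition of "most urgent."

```python
from dataclasses import dataclass


@dataclass
class BlockerMetric:
    name: str
    current: float
    goal: float  # assumed to be non-zero
    lower_is_better: bool = True

    def gap(self) -> float:
        """Relative distance from the goal; 0.0 means the goal is met."""
        if self.lower_is_better:
            miss = self.current - self.goal
        else:
            miss = self.goal - self.current
        return max(miss, 0.0) / self.goal


def most_urgent(metrics):
    """Order metrics from furthest-from-goal to closest."""
    return sorted(metrics, key=lambda m: m.gap(), reverse=True)


# Invented team-level values, for illustration only.
metrics = [
    BlockerMetric("after-hours pages per week", current=3, goal=2),
    BlockerMetric("median review wait (hours)", current=30, goal=8),
    BlockerMetric("focus hours per engineer per day", current=2.5, goal=4,
                  lower_is_better=False),
]
for m in most_urgent(metrics):
    print(f"{m.name}: {m.gap():.0%} away from goal")
```

A relative gap is only one way to rank; it treats every metric as equally important, so a manager may still want to weight the list by what the team says hurts most.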

In all these cases, the right solution will be up to the manager. For example, if meetings are the issue, perhaps they should be later in the day, more clumped together, or eliminated altogether. Once you've found the blocker, it's the manager's job to resolve it.

By quantifying blockers and focusing on the team, engineering leaders can transform their department into a data-driven practice that enables everyone's success.

Engineers will become more focused and engaged, managers will become more effective and empathetic, and companies will build faster with higher quality. Engineering will rise to a whole new level.