Discover more from Software Design: Tidy First?
Measuring developer productivity? A response to McKinsey 2
The consultancy giant has devised a methodology it claims measures software developer productivity. In fact, it measures only activity, not productivity from a business perspective. And measuring activity comes with costs & risks the methodology does not address. Here’s how we think about measurement. Part 2. (Gergely’s version is here.)
In the first part of this two-part article, we covered:
A mental model of the software engineering cycle
Where does the need for measuring productivity come from?
How do sales and recruitment measure productivity so accurately?
Measurement tradeoffs in software engineering
We now wrap up this topic with:
The danger of measuring only outcomes and impact
Team vs individual performance
Why does engineering cost so much?
How do you decide how much to invest in engineering?
How do you measure developers?
The danger of only measuring outcomes and impact
So far, sales and recruitment have been on something of an accountability pedestal, as both capture team performance and individual performance with indisputable metrics. However, we have seen the dark side of only focusing on measuring – and rewarding – outcomes and impact: that people game the system for their own benefit in ways that defeat the purpose of measurement, and ultimately disadvantage the business by generating junk data.
Below, Kent shares what he’s seen happen when the only thing that matters is sales quotas:
“Individual goals discourage people from working together, as everyone is chasing their own quota. And this can lead to missed opportunities.
For example, take the opportunity to close a mega-sale that spans several regions. For this, multiple sales representatives must work together, and the sale would be credited to the representative who discovered the opportunity. But other sales folks are incentivized not to help. The company loses an attractive prospect without knowing it. Individual incentives can work against the long-term profitability goals of the company.
I saw firsthand how a ‘star’ salesperson operated. They always hit their quota, and collected a hefty bonus. How did they do it? Well, they knew that sales could fall through any time, so they always had several prospects ‘in the pocket’ who they could convert anytime, and so put them off until the end of each quarter. They only converted them if they needed to hit their quota. In order to maximize personal gains, they did not maximize the company’s gain.”
In the area of recruitment, Gergely experienced how rewarding recruiters by candidates closed can backfire later:
“I once worked with a recruiter whom other hiring managers raved about. This recruiter had a 100% close rate. Closing means that when a candidate gets an offer, we get them to sign. Back then, my part of the organization was so stretched that recruiters did most of the closing conversations and took care of details. Most recruiters closed 70-80% at most. I was told this recruiter is a ‘rockstar’ among their peers.
I discovered in a painful way how they did it. About six months after this recruiter left, performance review and bonus time came around. Several engineers in our group complained about their bonus amounts, saying they’d been “guaranteed” 10x the amount they actually got. After some digging, all signs pointed to the rockstar recruiter; they’d made verbal promises to engineers in private settings which were outrageously untrue.
This recruiter focused on outcomes at the expense of unwritten – as well as written – rules. It took us managers months to sort out the mess, which left engineers feeling tricked, with declining faith in the company.”
Measuring outcomes and impact is important, but there must be checks and balances which ensure outcomes are reached the right way. In the end, this is exactly what a healthy company culture is about. In contrast, in a “cutthroat” or toxic culture only easily-measurable outcomes and impact matter – and ends always justify means. A healthier culture takes outcomes and impact into account, and will curtail the rewards of outcomes achieved in unprofessional ways, or ways that don’t consider collaboration or the bigger picture.
Team vs individual performance
What’s more important, team performance or individual performance? Sport provides a pointer, as an industry where individual performance can be measured quite accurately.
Take soccer as an example. There are many cases that prove team performance trumps individual performance: a team with objectively worse players can beat an opponent with more talented players by playing as a team. This was resoundingly proved when Greece won the Euro 2004 international soccer tournament with a squad ranked the 15th most likely to triumph of the 16 national teams taking part. The secret behind this success? The documentary King Otto reveals it came down to teamwork, playing to players’ strengths, and an outstanding German coach, Otto Rehhagel.
It’s common for teams filled with star players to struggle for success, despite possessing individuals with objectively superior skills. The Spanish club team Real Madrid proved this with its “Galácticos” recruitment policy in the early-mid 2000s, where superstar players were signed but the team regularly failed to win trophies.
We see similar dynamics in software engineering: teams punching well above their skill and experience level thanks to working together, high morale, and a manager with the right intuition. I’ve also seen teams full of senior-or-above engineers struggle to deliver expected outcomes, suffering from low morale, confused direction, and poor management and leadership.
Let’s look at another sport, ice hockey. It uses an interesting statistic called “plus-minus,” which measures a player’s goal differential: how many more goals the team scores than it concedes while that player is on the ice. It’s a sort of “contribution to team success” indicator, and is useful for identifying players who make a team much more efficient.
Could we find a kind of “plus-minus” indicator for software engineers? If such an indicator existed, it could be worth measuring. However, a five-on-five hockey game and a software engineering project with 5-10 engineers, designers, testers, and product specialists are very different. Hockey teams play games weekly, there are strict time limits, and the terms of victory are very clear: score more. In contrast, software projects tend to last much longer, may have no time limit, and there’s no simple scoring system.
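For concreteness, plus-minus itself is simple to compute. Below is a minimal sketch in Python; the goal-event data structure is hypothetical, invented here for illustration:

```python
from collections import defaultdict

def plus_minus(goal_events):
    """Compute plus-minus per player.

    goal_events: list of (scoring_team, on_ice) pairs, where on_ice maps
    each team name to the players on the ice when the goal was scored.
    Everyone on the scoring team's side gets +1; everyone on the
    conceding side gets -1.
    """
    scores = defaultdict(int)
    for scoring_team, on_ice in goal_events:
        for team, players in on_ice.items():
            delta = 1 if team == scoring_team else -1
            for player in players:
                scores[player] += delta
    return dict(scores)
```

Note that the statistic credits everyone on the ice, not the goal scorer — which is exactly why it captures contribution to team outcomes rather than individual output.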
Individual performance does not directly predict team performance. And if it’s not possible to deduce team performance from individual performance in a domain that’s as easy to measure as sports, then we can expect even less success in software engineering.
Team performance is easier to measure than individual performance. Engineering teams track performance by projects shipped, business impact, and other indicators, similarly to how sports teams track performance via numbers of wins, losses, and other stats.
Why does engineering cost so much?
The question “why does engineering cost so much?” is a surprisingly frequent one. Here’s a suggestion for how to tackle it:
Imagine a world where the company spends 0% of its budget on engineering. I know: it’s absurd. But try it. What would this mean for the company? What would customers experience? How would the business trend?
Now imagine the company spends 100% on engineering, and 0% on everything else. What would happen?
Now that we know the two extremes: what percentage of the overall budget does the company actually spend on engineering? With this number in hand, the question becomes what would happen if we moved it down by a few percentage points, or up by a few. Which direction would benefit the business more, and why?
This exercise turns the question from “why does engineering cost so much?” into a comparison exercise, where the decision is whether to reduce or increase engineering spend by $X, versus making that investment – or reduction – in another area.
How do you decide how much to invest in engineering?
Another common reason C-level executives want to measure the productivity of engineering is that they want to get a sense of how much it’s worth investing further in engineering – versus allocating the planned investment to, say, sales or marketing.
The real question an executive is asking, however, is not about productivity. The question is: “How much should we invest into engineering?”
To answer this question, consider that software engineering results are unpredictable, especially when measured on a small scale. Sure: there are industries where you know exactly what you want to build, and engineering is merely an execution game. But at companies where engineering innovates, the decision on what to invest in is more akin to how oil companies decide where and how much to invest.
Oil companies don’t know for sure how much profit an oil drilling operation will yield, so they make smart investments. It’s impossible to tell whether any single exploratory drill will uncover a new, profitable oil field, so they fund several at once, expecting that some will eventually bring promising results. Armed with more data, bigger investment decisions are then made.
It is pragmatic for engineering leaders – and executives – to treat investing in software engineering, as a research and development activity, in a similar way: place many small, inexpensive bets, and double down on the ones that show tangible promise.
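This portfolio logic can be sketched numerically. The model below is a toy with made-up numbers – not a real investment model: each exploratory bet has a small cost, only some succeed, and only the successes receive the larger follow-on investment:

```python
import random

def portfolio_value(n_bets, bet_cost, p_success,
                    followup_cost, followup_payoff, seed=0):
    """Toy model of 'many small bets, double down on winners'.

    All parameters are hypothetical: each of n_bets explorations costs
    bet_cost; with probability p_success it shows promise, in which case
    we invest followup_cost more and collect followup_payoff.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bets):
        total -= bet_cost                 # every exploration costs money
        if rng.random() < p_success:      # this one shows tangible promise
            total -= followup_cost        # double down on it...
            total += followup_payoff      # ...and collect the larger payoff
    return total
```

Even when most bets fail, a handful of large payoffs can make the portfolio positive overall – the same reasoning that justifies exploratory drilling.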
How do you answer the question?
Be clear about why you are asking & what your power relationship is with the person or people being measured. When the person with power is measuring the person without, you’re going to get distortions.
Self-measurement for self-improvement is great! My test-driven development book includes an analysis of the delay between test runs as I developed the sample code for the book. I found that experience enlightening.
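That kind of self-measurement is easy to start. Here is a minimal sketch, assuming you log a timestamp each time you run your tests; the log format is invented for illustration:

```python
from datetime import datetime

def gaps_in_seconds(run_timestamps):
    """Given ISO-8601 timestamps of successive test runs, in order,
    return the delay in seconds between each consecutive pair."""
    times = [datetime.fromisoformat(t) for t in run_timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Long gaps become prompts for reflection: was I stuck, distracted, or taking steps that were too big?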
Avoid perverse incentives by analyzing data at the same level as you collect the data. I can analyze my own data. My team can analyze its own aggregated data.
If you choose to create incentives (broadly defined—money, status, promotion, autonomy) around measures, know that you will never again receive accurate data.
Red Team any incentives. What would a clever (you hired them, of course they are clever), motivated person do to make the numbers look better? How much damage could they cause in the process?
Get comfortable with your own judgement. Executives relying unnecessarily on “data” seem to be running from responsibility. Grasp that nettle. Ask for explanations that make sense to you. Then own your own conclusions.
Measure developer productivity? Not possible. There are too many confounding factors, network effects, variance, observation effects, and feedback loops to get sensible answers to questions like, “How many times more profit did Joan create than Harry?” or, “How much better (or worse) is Harry doing this quarter than last?” You can measure activity, but not without directing that activity away from the ends you care about. And your customers. And your investors.
Be suspicious of anyone claiming to measure developer productivity. Ask who is asking & why. Ask them what unit they are measuring & how those units are connected to profit.
I am 100% pro-accountability. Weekly delivery of customer-appreciated value is the best accountability, the most aligned, the least distorting.