Recruiters and salespeople also game the system. If a recruiter is measured by lead time and the number of positions closed, we end up hiring below-average people just to fill the slots. And the system dynamics compound from there.
In fact, they can game the system more easily. You can close deals that can't be delivered, you can hire people who are not qualified; all of the consequences are delayed. User/client outcomes can't be easily gamed (you either address people's jobs to be done (JTBDs) satisfactorily or you don't). The challenge is that establishing the link between user/client outcomes and business impacts is not as straightforward as simply counting the $ coming in.
Filling seats != better impact
They also need to be good hires who gel with the team to really make an impact. And we all know that sometimes adding headcount reduces the ability to deliver value.
Agreed. Taking a systems perspective (both holism and pluralism) is key here, but that is unlikely to happen given some of the reasons laid out above.
This is great! Measuring output is an artifact from the pre-digital industrial age, when output was a decent predictor of outcome. Operational risk was paramount. (If we can produce a black Model T efficiently enough, the middle class will buy.)
Entire companies are structured around optimizing the efficiency of output, which makes measuring the efficiency of outcomes difficult. Engineering teams may very efficiently build features customers don’t care about. That means you need other disciplines involved, and company structure makes that difficult.
I think a common issue with these kinds of setups is that there's a split between product, which determines what to build, and "engineering", which is supposed to build it.
One then wants to measure how effective the engineers are at producing the product as dictated,
instead of forming one department that works together and is measured on how well the product performs.
"If you consider why a startup moves so fast and executes so well, it’s because they have to do so out of necessity, even if they do not measure this." - Given that 90% of startups fail (couldn't get specific numbers for software startups), I am not sure that most startups actually end up delivering positive customer impacts even though they might deliver a customer facing thing per week.
At the end of the day, Goodhart's law will always prevail.
We have a correlation/causation question here, multiplied by survivorship bias.
I would like to agree with others who pointed out that measuring and rewarding people's work can lead to undesirable results in all fields, not just in software engineering. The book "The Tyranny of Metrics" by Jerry Z. Muller has great examples (I like the one where surgeons wouldn't operate on patients with even a small risk so they wouldn't get bad grades).
Most often, when people try to measure dev productivity, they use other fields and industries for comparison, like engineering, sales, and support. I think this is what leads to the overarching confusion. I would love to see someone do a study on how to measure productivity in, say, a research and development field like medical research, or even a field that is more artistic, like writing or painting. In my opinion, those fields are much closer in nature to software development than the ones people typically use. It would be interesting to see, say, how a company measures the "performance" of a mystery novel writer or someone who writes screenplays, or how to encourage "better performance" from a research scientist who is trying to develop a new vaccine or a cure for some currently incurable illness. If one of those fields has ways of measuring its employees' performance, then perhaps we would have a reasonable baseline of comparison for our industry.
Overall, great article by the way. I'm looking forward to reading part 2!
My thoughts exactly.
Measuring the performance of teachers in the US public school system is a fraught topic and I think has many parallels. There are definite differences. But the main point is that teachers have to deal with an enormous amount of factors beyond their control. There are fantastic (and awful) teachers, but what makes them so good (or so awful) consistently over a longer period is very, very complex.
Thank you for the article. Could you please explain why you put DORA metrics under outcomes? The fact that we deployed something could have no impact on customer behaviour.
It depends on where you draw the boundaries of “the system”. DORA metrics aren’t directly relevant to the business. I prefer to draw the largest boundary I have any hope of influencing.
Agree that DORA is not directly relevant to the business. Shouldn't we place them under 'Outputs' in the framework, and for Outcomes choose something more related to the business? Does that make sense?
That’s a reasonable boundary for the system. Where you draw the boundary depends on what you can influence. Doesn’t make sense to analyze the whole galaxy if you can’t affect it. Doesn’t make sense to focus on your own work if the constraints are all external to you.
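To make that placement concrete, here is a minimal sketch, purely illustrative and not from the article, of how an outputs/outcomes/impacts mapping might look with the DORA metrics filed under outputs; the non-DORA metric names are made-up examples, not a prescribed list.

```python
# Illustrative sketch only: one possible filing of metrics into the
# outputs/outcomes/impacts model discussed in this thread, with DORA
# metrics placed under "outputs". The other metric names are assumptions.
MEASUREMENT_MODEL = {
    "outputs": [                      # things the team produces / delivery behaviour
        "deployment frequency",       # DORA
        "lead time for changes",      # DORA
        "change failure rate",        # DORA
        "time to restore service",    # DORA
    ],
    "outcomes": [                     # observable changes in customer behaviour
        "weekly active users of the new feature",
        "support tickets about the old workflow",
    ],
    "impacts": [                      # effects on the business
        "customer retention",
        "revenue per customer",
    ],
}

for level, metrics in MEASUREMENT_MODEL.items():
    print(f"{level}: {', '.join(metrics)}")
```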
@Kent Beck
If developer productivity can't be measured, then look at the implications:
- Promotions, PIPs for low performers, and layoffs are done arbitrarily.
- Organizations won't accept this publicly because they are scared of lawsuits.
Layoffs are at least an understandable use case for measuring "productivity". I don't have to like it, but at least I can see the "money out/money in" justification.
This is one example from a reputable organization, but no one from that company will publicly admit it.
https://x.com/petergyang/status/1576985038511448064?s=20
You should do a similar post on Interviews.
Completely agree, or as a friend would have said: "Measure business, not busyness".
I have worked with (outsourced) IT support who were measured on the number of tickets closed, and that was useless.
The incentive to close as many tickets as possible often resulted in no resolution, or a bad one. Or even a closed ticket with a message telling me to open another ticket with another department.
Separate note: software engineers are often abstracted from outcomes and even more so from impact. Other disciplines are generally making decisions about what to build and how it should function.
Many engineers are simply asked to build the thing. I think at that point the only outcome ICs can be responsible for is the output of their work: the thing they were asked to build. Whether or not it has actual business impact is generally someone else's call to be held to account for. From an impact perspective, the developer's customer is the requestor of the feature - the PM or PO or what have you.
I'm not saying this is the way to do it, just kind of taking the temperature of this kind of system for making software. The engineering team in this type of system is NOT responsible for many business outcomes because they're not asked to deliver business outcomes. They're asked to deliver features.
Sadly, they're even less likely to be asked to create impact.
Thanks for sharing your point of view. Very insightful.
There are a few moments in this article that are confusing me though:
1. When talking about how HR metrics fit in the proposed mental model, you mention in the text that “number of heads filled” is categorized as impact but list it in the “outcome” box in the illustration. It feels more like outcome than impact so I am assuming there is an error in the text. Or am I missing something?
2. When you talk about the Space framework, you mention that the people behind it put a warning on “effort and outcome” metrics. Did you mean “effort and output” there? Some outcome metrics could need a warning as well depending on how people could game them though.
3. When you propose to measure the “please the customer once per team, per week”, you imply that this is an “output” metric, but it feels more to me like an “outcome” one. Can you enlighten me on why you’d consider this an output metric?
Thanks again!
Nice read. Thanks for outlining the issues with measurement in the software industry; it's a big problem. In any industry.
With highly productive and skilled engineers we can very productively deliver products nobody buys or uses.
The productivity problem is deeply ingrained in our Taylorist style of management. McKinsey et al. still fit into this style of management. We as management are incapable of learning to solve problems, so we hire consultants to tell us what to do.
If you want to measure (which is a huge investment for many), then measure (the right) outcomes in combination with team behaviors (e.g., collaboration). Put incentives only on (team) behaviors, never on outcomes. Lean OKRs is a good framework (but I'm biased).
There might be many issues with the McKinsey article, but they try to 'prescribe' a solution that can be operationalized. Unless the software/product industry comes up with a better prescription, CEOs will continue to listen to the consulting giants. Everyone understands it's important to measure outcomes, but how many orgs are able to do this at scale (it doesn't make sense to count startups only)?
I've been writing for decades about prescriptions that meet more needs.
I absolutely agree that measuring outcomes and connecting them to impacts is the way to go in the software business (Jeff Patton has been shouting that from the rooftops for years now). I also noticed that a lot of companies don't even know where to start, and articles like McKinsey's don't make it any easier. One thing I noticed about your article is the conspicuous absence of product management, which is directly focused on this outcome and impact correlation thing, including making sure that product strategy supports business strategy (impacts). I think that may be as much of a blind spot for many engineering leaders as the importance of focusing on teams rather than individuals in product development is for many non-technical senior executives (both are business leaders, and the very distinction between engineering and "the business" is at the root of the problem). I am not saying that engineers can't be great product managers, only that engineering and product management are two very big things that at any but the smallest scale call for a team of specialists working closely together, as everything else in product development.
> Who wants to measure productivity, and why?
I feel like you missed the two main reasons we care about developer productivity:
1. Developers want to be more productive. Part of the reason we got into the field is to make things better and more efficient, to solve problems faster using automation. That applies to our own jobs as well, so developers really don't like inefficient processes in their own workflows. Yet if we don't measure this, it's hard to know where to focus improvement efforts.
2. Teams want to be more productive and ship more valuable software. We want to add value to customers using our engineering skills, so if we can do that more quickly, that's great. If one team ships 2x as much as another, that could reflect some superior process the former team has that could be more widely adopted.
The problem only comes when you misuse these metrics. Thus it's the "becomes a target" part of Goodhart's law that is bad, not the act of measuring and trying to improve. As soon as we say "lines of code should go up 10%" or "your line count was the lowest on the team, so you're fired", we've created a problem, but that doesn't mean it's useless in general to measure lines of code. For example, if we notice aggregate LOC across the company is trending down, that could be a worrying sign that we've put in place processes that slow down development and bear looking into.
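As a minimal sketch of that last point (my illustration, not something from the article), here is one way aggregate LOC could be sampled over time from a single git repository so the trend can be watched without turning it into a per-person target; the monthly sampling interval and the assumption that `git` is on PATH and the script runs inside the repository are illustrative choices.

```python
# Minimal sketch, assuming a local git repository and `git` on PATH:
# sample total tracked lines of code at roughly monthly intervals so an
# aggregate trend can be eyeballed (not used as an individual target).
import subprocess
from datetime import date, timedelta

def loc_at(commit: str) -> int:
    """Total newline count across all files tracked at the given commit."""
    files = subprocess.run(
        ["git", "ls-tree", "-r", "--name-only", commit],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout.splitlines()
    total = 0
    for path in files:
        blob = subprocess.run(
            ["git", "show", f"{commit}:{path}"],
            capture_output=True, text=True, errors="replace",
        )
        if blob.returncode == 0:  # skip submodules / unreadable entries
            total += blob.stdout.count("\n")
    return total

def commit_before(day: date) -> str:
    """Most recent commit on or before the given date ('' if none)."""
    return subprocess.run(
        ["git", "rev-list", "-1", f"--before={day.isoformat()}", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

if __name__ == "__main__":
    today = date.today()
    for months_ago in range(12, -1, -1):  # last ~12 months, oldest first
        day = today - timedelta(days=30 * months_ago)
        commit = commit_before(day)
        if commit:
            print(day.isoformat(), loc_at(commit))
```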
In the original piece with @Gergely I also called out those motivations. The distorting uses of measurement, though, seem to come from a shadowy set of needs that no one seems to want to own.
Also, we can say "Goodhart's" until we're blue in the face but if people keep misusing measurement, it won't help. It's time to dig deeper.
Srikanth, you've raised a crucial point. The temptation to game the system exists in many domains, and the consequences can be significant. As we explore the complexities of measuring productivity, your insight underscores the need for thoughtful consideration to avoid unintended outcomes. It's a delicate balance between incentivizing performance and preserving the integrity of the outcomes. Looking forward to delving deeper into these dynamics in Part 2.
This is spot on. Great job of balancing unequivocal criticism of the McKinsey "article" (which is really a thinly veiled sales pitch) without getting petty.
Robert (Bob) Lewis has been writing about the perils of metrics for many years. Originally in InfoWorld and then his own online outlet. Here's a search of his content for "metrics" that yields dozens of articles: https://issurvivor.com/?s=metrics