Since the mid-to-late 1980s, the idea of measuring industrial processes in order to improve their efficiency and quality has been growing in popularity. “You can’t improve what you can’t measure” was adopted as the catchphrase of such movements as Six Sigma and Total Quality Management. This, in itself, is an unassailable assertion, and entire nations (notably Japan) have risen in their fortunes by adopting such notions.
But… when you’re writing software what should you measure?
In the manufacturing disciplines, this isn’t usually such a difficult question. You can easily measure quantities such as number of items assembled per unit time (cars rolling off the assembly line per day), physical tolerances in assemblies (3 microns variance across all LEGO bricks), injuries per day on the work-line, percentage of product returns, etc. When I say “easily” I mean three different things:

- The measurements themselves are not difficult to capture (just count items, run a stopwatch, use calipers, etc.).
- I also mean that driving the numbers in the right direction makes sense. There used to be 7 injuries per month. I institute a new policy or change the work-line. Now there are 2 injuries per month. Win!
- I also mean (and this is trickier) that the behavior driven by the attempt to improve the values under test turns out to be what you were actually looking for. More on this later.
So… let’s take a look at a few things we might want to measure in a world-class software development organization.
To decide what to measure, let’s start with a (hopefully) easier question: what do we want to improve?
Here are a few things most people wouldn’t argue with:
- Fewer defects introduced in new code
- More defects repaired per unit time
- More working code per unit time
- “Better” code
Given these, a well-intentioned engineering manager might decide to count (and reward) the number of bugs fixed per person per week.
We want to fix those bugs, and this sounds great… but here’s what happens:
- All of a sudden we’re seeing twice as many bugs fixed per week, but we’re also seeing twice as many bugs created per week.
- The unintended consequence was clear: if I’m being rewarded for fixing bugs, I’ll need more bugs to fix.
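To make the arithmetic concrete, here is a minimal sketch with made-up weekly counts (the function name and the numbers are mine, purely for illustration). It shows how the rewarded number can double while the number of open bugs changes by exactly as much as before:

```python
def weekly_summary(created, fixed):
    """Return the rewarded number (fixed) next to the net change in open bugs."""
    return {"fixed": fixed, "net_open_change": created - fixed}

# Hypothetical counts, chosen only to mirror the "twice as many" scenario.
before = weekly_summary(created=10, fixed=10)
after = weekly_summary(created=20, fixed=20)

print(before)  # {'fixed': 10, 'net_open_change': 0}
print(after)   # {'fixed': 20, 'net_open_change': 0}
```

The “fixed” column looks twice as good, yet the product is no better off.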
Be sure what you are measuring makes sense
OK, well, what about more working code per unit time? That’s what we’re being paid to build, right? So, if efficiency is making more of what you’re supposed to make per hour, then this sounds like a good idea. The obvious way to count it is lines of code. What would we expect to see if lines of code were encouraged and rewarded? We would see a large increase in the number of lines of code written, yes… but is that what we want? It turns out that software efficiency, maintainability, and reusability generally improve when the same work is done in fewer lines of code. Rewarding line count would actually encourage the exact opposite of what we’re trying to accomplish!
The bottom line here is not that we shouldn’t attempt to measure aspects of the software development process. Rather, it is that software development is not the same as other manufacturing processes. We should indeed measure things, but we need to choose carefully what we’re measuring.
Here are a few examples of better things to measure:
- Deviations from software best practices. Much of this can be automated and reduced via code reviews and required compliance.
- Cyclomatic complexity (e.g., require splitting of modules where CC > 10).
- Efficiency of important processes. It’s relatively easy to measure things like run-time of particular processes or number of messages handled per unit time.
- Number of error log messages generated per unit run time. (Note: be careful not to just reward the removal of useful log messages!)
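As an illustration of how the cyclomatic complexity check might be automated, here is a rough sketch in Python using the standard `ast` module. It is a simplified approximation, not a full implementation: it counts one path plus one per branch point it recognizes, and it lumps nested functions in with their parent. The function names and the exact set of node types are my own assumptions; the threshold of 10 comes from the list above.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approx_complexity(func_node):
    """Return 1 plus the number of decision points inside one function node."""
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two extra paths.
            score += len(node.values) - 1
    return score

def functions_over_threshold(source, threshold=10):
    """Map function name to approximate complexity for functions above threshold."""
    flagged = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = approx_complexity(node)
            if score > threshold:
                flagged[node.name] = score
    return flagged
```

A check like this could run as part of a build and list the functions that are candidates for splitting. In practice an established analysis tool would likely do a more careful job than this sketch.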
Software is (or at least should be) an engineering discipline like any other, but we need to be vigilant about how we try to improve it. It is not the same as assembling transmissions or harvesting potatoes. So, when considering metrics for software quality: be careful what you measure!



