Story Points Aren’t Dead–You’re Just Making the Wrong Optimisation!

I recently had a Eureka moment with Story Points that I want to distil and share. Understanding this subtle frameshift explains much of the dissent towards Story Points as implemented in many Scrum teams.

I have used Story Points (to varying degrees) for my entire software career and have been feeling lately that they have been the cause of some tension in my team. I knew the theory behind them quite well, but there was some slight misconception causing misalignment on “how to do the accounting without cooking the books”.

If you take one thing away from this article, it should be an understanding of the following:

Story Points (or “Complexity” Points) measure the degree to which the system’s components must be changed to achieve the smallest releasable increment; this is completely independent of the time taken to do so.
Or more simply: complexity != time.

This may seem straightforward, but before you stop reading, stick with me while I explore a question: if Complexity is not a measure of time, how should we assign Story Points?

Why People Subtly Estimate in Time

One of the great failings of “Corporate Agile” is that it took the language of Agile but nothing else. This creates a deceptive surface area where people–using the same vocabulary–believe they are having substantive discussions about the same thing.

Where I stand on the root cause of conflicting opinions on Agile

The discourse around Story Points (invented in XP and then included in Scrum) also succumbs to this phenomenon.

My Eureka moment was this:

Two irreconcilable systems are at play under the guise of a single system, driven by two intrinsically conflicting intents for Story Points. We have a single name for these different systems because the push for a common vocabulary (e.g. through Scrum Master qualifications) superseded the push for a common understanding.

To explore these two systems, let’s look at the intent–the result each persona wants to see by using the system–of each. The Story Points system services two main personas:

The Account/ Project Managers: the people who oversee the project at a high level. They are often responsible for managing timelines and budgets, and are more senior in their organisations than the second persona.

The Scrum Team/ Delivery Manager: the people responsible for delivering/ developing the actual features as quickly as possible.

The Story Point system's intent for Project Managers is primarily to estimate when features will be complete to plan releases and retrospectively track the cost to inform feature ROI analysis. It enables the Scrum team to report on capacity planning.

As the Scrum team members often hold less power in the organisation, they are incentivised by the Project Managers to think about Story Points in terms of time. When the Scrum team say “We have three features on the roadmap of 30, 50, and 70 Complexity Points”, the Project Manager equates this directly to time and may plan their releases accordingly. The Scrum team (Scrum Master) may also be responsible for producing reports showing projected timelines (e.g. Gantt charts).

While there is value in following a plan, the Agile Manifesto asserts that responding to change is the superior value. The Scrum team often appreciates the need for capacity planning but, ultimately, they are concerned with picking up the next highest priority, completing it as quickly as possible, and minimising iteration cycles. Tracking Complexity Points takes time that could otherwise be spent on development–actively detracting from delivering value.

I’ve experienced corporate project management in the past few years, and recently, I’ve seen how this requirement to follow a set plan trickles down to how the Developers think about Story Points–as a mechanism to report on progress and predict when we will be “done”. When we estimate tickets with the Epic Burndown Chart in mind, we want that indicator to give as accurate a picture of our progress as possible. Because the thing we constantly have in mind is meeting the arbitrary deadline we have been set, that progress is measured in time, and estimations become reflections of the time we will spend on items (or, at the very least, are slightly normalised to keep the team speed stable).

This undertone of estimating and tracking time does not mix well with the intent of Story Points, as I was originally taught–that Story Points are not days, hours, weeks, or minutes, nothing indicating time spent. The rest of the Scrum team know this too, but the focus on the deadline meant we had forgotten it without realising it.

A Different Measure: Value-Added Tasks

To figure out what was going wrong, in an exercise that was a long time coming, I decided to write down almost everything I know about Story Points and the adjacent systems that they feed into (see the standard I wrote below). In doing so, I managed to articulate an intent of the system that delivers value to the Scrum team:

💡

Story Points are primarily a tool to highlight delivery and tech problems so the team can invest in solving them in an informed way to increase delivery speed.

This is quite a different purpose than what the Project Manager sees, so how does this work?

It really matters what we measure. What we measure is what we optimise.

If we measure speed (Story Points/ day), Developers mistakenly attach self-worth to their ability to deliver fast and will try to increase speed however they can. This can result in some gaming of the system:

inflated estimations

post-estimating tickets that took longer

adding pointed tickets for meeting time

For Story Points to bring value to the Scrum team, we must identify the difference between “value-adding” tasks and “non-value-adding” tasks. A value-adding task is one that contributes to creating the final product’s value to the user. There are two kinds of non-value-adding tasks:

Required: the activities we accept we must do to enable the valuable transformations (or to spend more time on value-adding tasks). For example, our team writes a small technical strategy on each ticket outlining the changes we expect to make in the ticket. This is required, as having a set of instructions to follow while coding increases our productivity.

Non-required: things that don’t contribute at all to value for the end user.

The main value-adding developer task is writing the line of code that gets deployed to production and gets run by a user.

I assert that Complexity Points should only be a measure of value-adding tasks and Developers should defend this as a mechanism for making their work more fulfilling.

This way, when a drop in speed is shown on our burndown chart, it visually illustrates that the Developer encountered some problem that prevented them from delivering value. We can then react and prevent the problem from recurring; solving these small everyday problems and ensuring the team is constantly learning creates compounding positive effects on the environment in which the Developers deliver value.

In practice, this means that when making estimations, I think about the elements of the system that will need to change to do this ticket (i.e. roughly the git diff for the resulting PR) and assign Story Points in comparison to previous tickets of similar sizes (we call these reference tickets which are used to keep the sizing of estimations consistent over time). This trick, thinking about the git diff as a mental model for ticket estimations, keeps me from subconsciously estimating based on the time I think it will take me to complete.
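As a rough illustration of the reference-ticket comparison, here is a minimal Python sketch. The reference tickets, their titles, and the "components changed" counts are all hypothetical, and using a single number as a stand-in for the expected diff is an assumption made for the example; in practice the comparison is a judgement the team makes together rather than a computed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceTicket:
    """A past ticket the team agreed is a good example of a given size (hypothetical data)."""
    title: str
    components_changed: int  # rough size of the resulting diff, in components touched
    story_points: int


# Hypothetical reference tickets on a Fibonacci scale.
REFERENCE_TICKETS = [
    ReferenceTicket("Rename a label on the settings screen", 1, 1),
    ReferenceTicket("Add a field to an existing form", 3, 3),
    ReferenceTicket("Add a new screen backed by a new endpoint", 8, 8),
]


def estimate(expected_components_changed: int) -> int:
    """Return the Story Points of the reference ticket with the closest expected diff.

    Time is deliberately not an input: only the expected change to the system is.
    """
    closest = min(
        REFERENCE_TICKETS,
        key=lambda ref: abs(ref.components_changed - expected_components_changed),
    )
    return closest.story_points


if __name__ == "__main__":
    # A ticket expected to touch about four components sits closest to the 3-point reference.
    print(estimate(4))
```

The point of the sketch is what is missing from the function signature: there is no parameter for how long the work is expected to take.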

A Small Example of How to Estimate

There are lots of problems on my current project. Our burndown chart is a tool to highlight these problems so that we can see and solve them.

Here’s an example of a problem I see on our project and how it unfolds in both systems.

In order to change the behaviour of some of our app’s flows, we have a visual code editor (a low-code tool that we can use to visually represent a user flow), from which we export and deploy an XML representation which then controls the app logic. At some point, to extend the functionality of this low-code tool, someone wrote a script which takes the XML output and does some pre-processing before we deploy it. Over time, this script got more complicated, and we started using features of the low-code tool that meant our exports were no longer compatible with the pre-processing script.

Our process to change functionality in these flows is now:

go to the visual editor and make some changes

save and export the user flows

manually edit the exported output so it is compatible with our pre-processing script

run the pre-processing script against the output

add an SQL script (don’t ask) which deploys the new code on app start-up

All this process takes time, but produces a PR with fairly minimal diff.

This is a problem. The user doesn’t care about any of these steps, only the lines of deployed code that produce the desired behaviour–if I can reduce my time on these non-value-adding tasks, I can spend more of my time writing cool features for users.

Estimating by Time

If we’re playing to the system that the Project Managers want us to use, we would include some extra Story Points to account for the fact that more time will be spent doing these extra steps. In this case, we give a more accurate indication of when we expect the work to be complete (we give a bigger estimation because it will take more time). However, our burndown chart now hides the problems that we know exist.

Further, increasing an estimation for a task that is more “complex” or more uncertain is just a proxy for saying, “we expect this to take more time”. Some people believe the tradition of using Fibonacci for Story Point values exists because it accounts for uncertainty with larger items of work, but this shouldn’t be true. Using complexity as the unit for estimations explains the Fibonacci usage differently: our projection of the required changes (the diff) gets less precise the bigger the piece of work (I still think we use Fibonacci because it’s nerdy).

Estimating by Complexity

As the diff is small (i.e. the value-adding part of the task is small), we assign fewer Complexity Points to this ticket. When the Developer then takes longer than expected to complete the ticket, the burndown chart raises the problem so it can be investigated, triggering the Tech Lead to come along and fix the pre-processing script to help the devs move quicker.

It’s important to note that we shouldn’t plan to fail a Sprint. If we know that we have lots of these tickets coming up, we can use our intuition to reduce our capacity for the Sprint–projecting to do fewer tickets because those we are taking in we project will take longer. A typical case for this might be that we know we are working on an older section of the codebase and that we might run into some nasty, brittle code.

Any time that someone speaks out in a Sprint planning to reduce our capacity, we should also consider their reason for doing so to be a cause for problem-solving:

“Hey, this section of the codebase is a mess, I’m not sure I’ll be able to finish all these this week”.

“OK, let's take out some of these tickets, but we should consider refactoring that section of the codebase so we can continue to work on it quickly in the future”.

Our new burndown chart would look like this–we still plan to complete the Sprint by reducing the capacity, we still see the problem occurring, and we still succeed in the Sprint!

Speed as a Negotiation Tool for Increasing Productivity

With this system, we can do some cool analysis on speed which visualises wasted time due to problems and where we can make the biggest investments for increasing productivity.

If our speed looks like this:

We can investigate why the speed dropped (and it really did drop in real terms of value produced) for the group of tickets–we will probably discover some nasty code-coupling that can be fixed with a little work!
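To make this kind of analysis concrete, here is a small Python sketch that computes speed per ticket (Story Points per day) and flags the tickets where it dropped well below the team's overall speed. The ticket keys, the numbers, and the 50% threshold are hypothetical choices made for illustration, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletedTicket:
    key: str
    story_points: int  # complexity only, estimated before work started
    days_spent: float  # observed afterwards, never used as an input to estimation


def speed(ticket: CompletedTicket) -> float:
    """Story Points delivered per day for a single ticket."""
    return ticket.story_points / ticket.days_spent


def tickets_to_investigate(tickets, threshold=0.5):
    """Return the keys of tickets whose speed fell below `threshold` times the overall speed."""
    total_points = sum(t.story_points for t in tickets)
    total_days = sum(t.days_spent for t in tickets)
    overall_speed = total_points / total_days
    return [t.key for t in tickets if speed(t) < threshold * overall_speed]


# Hypothetical sprint: APP-3 is a small-diff ticket that needed a lot of manual steps.
TICKETS = [
    CompletedTicket("APP-1", story_points=3, days_spent=1.5),
    CompletedTicket("APP-2", story_points=5, days_spent=2.5),
    CompletedTicket("APP-3", story_points=2, days_spent=4.0),
    CompletedTicket("APP-4", story_points=8, days_spent=4.0),
]

if __name__ == "__main__":
    print(tickets_to_investigate(TICKETS))
```

A flagged ticket is not a verdict on the Developer; it is a prompt to ask what non-value-adding work got in the way.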

But Estimating Tickets by Time Gives Us a Roadmap!

It’s worth noting that this method of estimation I’m proposing does not give a 100% clear projection of when the features will be completed, only an indication. If we expect the speed to vary from feature to feature to visualise problems, then you cannot simply project a constant speed and calculate time to completion = estimated complexity / team speed.

It’s also worth noting that humans are notably bad at estimating the time required to complete a task–especially on the macro level. We can make a prediction, but this often leads someone to promise it will be done by X day, and when our prediction turns out to be wrong, all hell can break loose.

I think using the system I propose, which doesn’t promise to make projections, exposes this uncertainty and hopefully highlights that a timeline projection using Complexity Points is a best guess, to which people can add the contingency that they wish.
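One way to expose that uncertainty is to project a range rather than a single date. The sketch below divides the remaining complexity by the fastest and slowest speeds the team has actually observed. The function names and the sample numbers are hypothetical, and using the two extremes is just one simple choice among many.

```python
def projected_days(remaining_points: float, speed: float) -> float:
    """Days needed to complete `remaining_points` at `speed` Story Points per day."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return remaining_points / speed


def projection_range(remaining_points: float, observed_speeds: list[float]) -> tuple[float, float]:
    """Return (best case, worst case) in days, using the fastest and slowest observed speeds."""
    return (
        projected_days(remaining_points, max(observed_speeds)),
        projected_days(remaining_points, min(observed_speeds)),
    )


if __name__ == "__main__":
    # Hypothetical: a 30-point feature, with speeds seen on three earlier groups of tickets.
    best, worst = projection_range(30, [2.0, 1.5, 0.5])
    print(f"between {best:.0f} and {worst:.0f} days")
```

The width of the range says something a single projected date cannot: how much the unsolved problems on the project are costing in predictability.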

Should We Just Measure the Rate of Git Contributions?

Personally, I don’t promote this. I don’t think it captures all value-adding activities and it also opens the system up to other types of gaming–for example committing extra whitespace or preferring solutions with more code (rather than a simpler solution).

Using git contributions as a heuristic for estimations doesn’t tell us two important things:

If our code solves a real user problem: we could be producing code at a fast pace, but actually no one uses our app. It’s important to keep this in mind–I use the word “speed” and not “velocity” because this method doesn’t measure velocity (i.e. speed in a given direction). We could be rowing the boat very quickly in the wrong direction.

The quality of our solution: we only see how easy the system is to change (speed). It’s an extremely important thing to optimise, as it makes the business more agile, but it can’t tell us whether we’re over-engineering our solutions or only accepting the “essential complexity” required for a solution–we could write our solution in assembly code (probably a bad idea), writing thousands of lines instead of just using a few lines of React to create our GUI.

It’s important to keep these in mind.

I think measuring git contributions also creates a competitive/ toxic environment that I wouldn’t want to be a part of.

When to Break the System

Sometimes it’s more pragmatic to include some buffer for time in our estimations in edge cases. To know when this is OK, keep the following rule in mind:

Any time you treat Story Points as a proxy for time, you are saying, “There is a problem on my project that I accept we will not solve. To achieve greater delivery visibility, I am increasing the Story Points on this ticket to account for that”.

Conclusion

At the very least, I don’t think Story Points are leaving the world of software development anytime soon–whether they are a net positive or negative, they are here to stay. Hopefully understanding how they can be used to benefit those who have to use them brings a better implementation and less frustration.

Appendix: A Standard For Lean Story Points

📖 Definitions

📖

Story Points (or Complexity Points): a measure of the degree to which the system’s components must be changed to achieve the smallest releasable increment (a ticket).

This measure excludes the contributions of “non-value-added activities” associated with changing the system.

📖

💛 Intent

1️⃣

2️⃣

✅ Key Points