A 5-why root cause analysis retrospective

Sandy Mamoli

22 Apr

The idea

For quite a while I have been waiting for an opportunity to try a 5-why root cause analysis in a sprint retrospective.

The 5-why analysis has its origins within Toyota and lean manufacturing and is used to find the root cause of a problem through identifying a symptom and then repeating the question “Why?” five times. General wisdom and experience state that the nature of a problem and its solution usually become clear after 5 iterations of asking “Why?”.

Here’s an example from Wikipedia:

Problem: My car won’t start.

Why? - The battery is dead.
Why? - The alternator is not functioning.
Why? - The alternator belt has broken.
Why? - The alternator belt was well beyond its useful service life and has never been replaced.
Why? - I have not been maintaining my car according to the recommended service schedule.

Solution: I will start maintaining my car according to the recommended service schedule.

The plan

I have found 5-why root cause analysis very useful in the past but had never tried it with a group of people or a software development team.

I “conspired” with our very talented Scrum Master and the plan was to share data from previous sprints, analyse the data as a team and see if we could identify a problem. If so, we would suggest a 5-why analysis to see whether it would point us towards a root cause and a solution.

The execution

We started by presenting velocity data:

We charted our planned (blue) and achieved (red) velocity over the last 10 sprints.

It became obvious that, during the last 4 sprints, we had consistently bitten off way more than we could chew. While it is generally a good thing to strive for what is just out of reach and to improve through practice, we thought the gap between attempted goal and real achievement was too big. We certainly didn’t want to lose management’s trust by over-promising and under-delivering.

Therefore, we decided to focus on this problem for the remainder of the retrospective.

The problem

The problem was easily summed up as “We over-promise and under-deliver”.

The first why: Too many stories are almost done

We came up with two reasons for why we kept over-promising and under-delivering:

At the end of the sprint too many stories were “almost” but not entirely done (testing not finished, found a defect, etc)
Two of our team members had just gone from working 50% to 100% on the team, and we overestimated the immediate benefit in terms of velocity

Much like in a decision tree, we chose to pursue the path of being left with too many almost finished stories, as the other reason was probably a one-off we could safely put into the “shit happens - we have learned” category.

The second why: We are running mini-waterfall

To find out why so many stories were almost but not entirely finished by the end of the sprint, we had a look at last sprint’s burndown and cumulative flow diagrams (I love those Rally reports).

The cumulative flow showed us that by day 4 most of our stories were in progress (yellow) but not many were actually completed (blue and green). Only 2 days before the end of the sprint the majority of stories were completed (blue) and then accepted by the product owner on the very last day (green).

This looked suspiciously like a mini-waterfall process where we first went through a development phase and then a testing phase.

The problem with any waterfall approach is that it pushes feedback towards the end of the time box, when it is hardest to react to. This is true for feedback on quality (through testing) and for feedback on whether we have correctly understood the needs of our business and users.

The third why: We’re not doing tasks in parallel

We were then asking ourselves why we were running a mini-waterfall process inside a sprint.

Not only were we doing testing at the end but we also seemed to have a problem with co-ordinating GUI (Flex) and backend (web services) work. People found themselves waiting for someone else to finish a task before they were able to pick up the next task to finish the story. Often people decided to work on the next user story rather than wait for someone else to “unblock” them.

We decided we were simply not good enough at working on tasks in parallel to get one specific user story finished.

The fourth why: No TDD, test automation and stubbing

People came up with several reasons why we weren’t working on tasks in parallel to get one story “over the line”:

Lack of communication
Only partial TDD, test automation and stubbing
Lack of collective responsibility
Not cross-functional enough

As we now had four possible paths to follow we briefly discussed each of them and then did a dot vote on which path to follow. The team decided on number 2: No TDD, test automation and stubbing.

The fifth why: We have an attitude problem

We went into deep discussions about why we weren’t doing things that everyone knew were good and healthy. After another dot vote amongst several candidates, we decided the most relevant root cause was that we had an attitude problem.

We knew what needed to be done, we knew that we needed to make it happen but somehow we hadn’t managed to just get it done yet. Probably because we hadn’t been fully aware of the consequences. We agreed to start fixing the problem immediately.

The solution

More specifically we all agreed to:

Ask for help when stuck
Accept help gracefully
Automate our regression testing (Flexmonkey/RiaTest), do TDD when possible
Make more use of stubbing and mocking to be able to develop the front and backend part of a story simultaneously
Hand over to the next person after finishing a task, i.e. don’t expect people to see progress from the task wall only; inform the rest of the team when e.g. the web service is finished or the story is ready for testing
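As a rough illustration of the stubbing idea, here is a minimal sketch. It is written in Python rather than Flex, and every name in it (`StoryService`, `StubStoryService`, `render_story_list`) and the canned data are hypothetical, not taken from the team's actual code. The point is only that front-end logic depends on an agreed interface, so it can be developed and tested against a stub while the real web service is still being built.

```python
from typing import Protocol


class StoryService(Protocol):
    """Contract agreed between front end and back end (hypothetical)."""

    def fetch_stories(self, sprint: int) -> list[dict]:
        ...


class StubStoryService:
    """Stand-in for the unfinished web service; returns canned data."""

    def fetch_stories(self, sprint: int) -> list[dict]:
        # Made-up records, just enough for the front end to work with.
        return [
            {"id": 1, "title": "Log in", "state": "accepted"},
            {"id": 2, "title": "Search", "state": "in progress"},
        ]


def render_story_list(service: StoryService, sprint: int) -> list[str]:
    """Front-end logic under development: formats stories for display."""
    return [
        f"#{s['id']} {s['title']} ({s['state']})"
        for s in service.fetch_stories(sprint)
    ]


if __name__ == "__main__":
    # The front end runs today against the stub; the real service
    # can be swapped in later without changing render_story_list.
    for line in render_story_list(StubStoryService(), sprint=11):
        print(line)
```

Once the real web service is finished, it only has to offer the same `fetch_stories` method to replace the stub.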

My conclusion

Here’s the list of my personal takeaways and things I have learned:

A 5-why root cause analysis can work amazingly well for a targeted retrospective.
To avoid any danger of a 5-why retrospective turning into something like the Spanish Inquisition it needs to be very well facilitated (Hallelujah for good Scrum Masters :-)
A 5-why analysis with a group of people will produce a decision tree, as there will usually be more than one answer to each “Why?”.
Voting to decide on which reason is the most relevant one and should be discussed works well and speeds things up.
It made my day that I got the opportunity to try this ;-)
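
The decision tree mentioned above can be sketched as a small data structure. This is only an illustration in Python: the answers come from this retrospective, but the `Why` class, the `follow_votes` helper and all vote counts are hypothetical stand-ins for however the team picked a branch at each level.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    answer: str
    votes: int = 0  # dot votes; the numbers below are made up
    children: list["Why"] = field(default_factory=list)


def follow_votes(node: Why) -> list[str]:
    """Walk down from the problem, taking the most-voted answer each time."""
    path = [node.answer]
    while node.children:
        node = max(node.children, key=lambda child: child.votes)
        path.append(node.answer)
    return path


tree = Why("We over-promise and under-deliver", children=[
    Why("Too many stories are almost done", votes=5, children=[
        Why("We are running mini-waterfall", children=[
            Why("We're not doing tasks in parallel", children=[
                Why("Lack of communication", votes=2),
                Why("Only partial TDD, test automation and stubbing",
                    votes=4,
                    children=[Why("We have an attitude problem")]),
                Why("Lack of collective responsibility", votes=1),
                Why("Not cross-functional enough", votes=1),
            ]),
        ]),
    ]),
    Why("Overestimated the benefit of two members going to 100%", votes=1),
])

if __name__ == "__main__":
    # Print the chosen path, one "Why?" level per line.
    for depth, answer in enumerate(follow_votes(tree)):
        print("  " * depth + answer)
```

The branches that were not followed stay in the tree, so they can be picked up in a later retrospective.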

I wrote this to illustrate and exemplify a retrospective technique that worked extremely well for our team. I think this was one of the best retrospectives I have ever participated in and I hope other people will give it a shot and share their experiences.

