The inherent problem
Before trying to fix a problem, it is important to understand what is causing it. In most modern organizations there is a discontinuity between the way we manage work and the way work takes place.
Figure 1 represents a typical company hierarchy. People report to managers who are responsible for the work done by the people reporting to them. This continues down until it includes the teams actually doing the work.
Figure 2 illustrates the value stream, the flow of work across the organization (the up and down arrows represent management approvals). This, of course, is an extreme simplification. If we actually graphed how the work went across the organization we’d see a lot more green arrows going in all directions.
This sets up the challenge shown in Figure 3: we manage people in a top-down manner within management hierarchies, while our work flows across the organization.
Focusing on our people tends to have us take our eyes off the flow of work and focus on where the work is being done. But this takes our eyes off two important things: 1) how much the work is waiting, and 2) how the environment within which the work is being done is helping or hurting us.
Instead, we tend to focus on the productivity of the individuals in this workflow and implore them to work harder. We should be focusing on creating a better ecosystem within which they work. This shift is a central tenet of Lean and is incorporated into SAFe. Another issue is that no one is managing the value stream well. In most companies this responsibility is spread across a combination of people. It needs to be called out explicitly.
Think about the work in your organization. How much of the time is the work waiting for someone to be available? How much does making sure everyone is busy contribute to this? How well can you see the cause and effect of keeping people busy and the actual impact it has on the work going from concept to consumption?
This can be more readily seen if we map out our value stream, which is the subject of the next chapter.
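The waiting question above can be made concrete with a little arithmetic. The following is a minimal sketch, not a prescribed method: the step names and hours are invented for illustration, and `Step` and `flow_summary` are hypothetical names. It shows how the share of elapsed time that work spends waiting could be computed once the active and waiting time at each step are known.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # time someone is actively working on the item
    wait_hours: float  # time the item sits in a queue before this step


def flow_summary(steps):
    """Summarize how much of the elapsed time is spent waiting."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    total = work + wait
    return {
        "total_hours": total,
        "waiting_share": wait / total if total else 0.0,
        "longest_wait": max(steps, key=lambda s: s.wait_hours).name if steps else None,
    }


# Illustrative numbers only; real values come from observing your own workflow.
steps = [
    Step("analysis", 8, 40),
    Step("development", 24, 16),
    Step("testing", 8, 64),
    Step("deployment", 2, 38),
]
summary = flow_summary(steps)
print(summary)
```

With these made-up numbers, the item is being worked on for 42 of 200 elapsed hours, so most of the elapsed time is waiting, and the largest single queue sits in front of testing.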
Why Looking at the Value Stream Is So Important
One reason that looking at the value stream is so important is that it gives us a way to see the work being done in a better manner than just watching people. As Don Reinertsen states in The Principles of Product Development Flow: 2nd Generation Lean Product Development (and as SAFe mirrors) – “if you only measure one thing, measure cost of delay.” This reflects that we are trying to eliminate delays. Not just in value realization, but in anything that directly or indirectly causes delays in value realization.
These all not only delay value delivered but they literally increase the amount of work to be done. Lowering them is critical and can be done by:
In Mr. Reinertsen’s brilliant book, which is the basis for much of SAFe’s foundations, he devotes an entire chapter to managing queues. Some of this is referred to in SAFe’s Principle #6 – Visualize and limit WIP, reduce batch sizes, and manage queue lengths.
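To illustrate why queue lengths and cost of delay belong together, here is a rough sketch under stated assumptions. It uses Little's Law (average wait equals average queue length divided by average throughput) and a flat cost of delay per week. The function names and all of the numbers are hypothetical, chosen only to show the arithmetic.

```python
def queue_wait_weeks(items_in_queue, items_finished_per_week):
    """Little's Law: average wait = average queue length / average throughput."""
    return items_in_queue / items_finished_per_week


def delay_cost(cost_of_delay_per_week, weeks_delayed):
    """Assumes value is lost at a constant rate for every week of delay."""
    return cost_of_delay_per_week * weeks_delayed


# Hypothetical example: 12 items queued, 4 finished per week,
# and each week of delay on a feature costs 10,000 in lost value.
wait = queue_wait_weeks(12, 4)
cost = delay_cost(10_000, wait)
print(wait, cost)
```

In this example a new item waits about three weeks before anyone touches it, which costs 30,000 in delayed value before any work has started. Halving the queue would halve both figures without anyone working faster.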
Why Looking at Delays Is So Important
One of the significant differences between software development and the physical world is that software, while being developed, is essentially invisible. But we can track its progress by looking at where it is in development. We can also get a good sense of this by looking at the queues before each step.
All of our work should add value to the software programs and services we produce; however, much of the work done in software organizations is created because of problems and delays in workflow. We call this “induced work,” the work we make for ourselves beyond what would have otherwise been needed to accomplish our goals. It happens at all levels and scales of an organization.
If we can identify such delays and remove them, we can “stop creating waste” (at least some of it). This reduces the overall time needed to finish our creative work, so we become more productive. And this in turn creates a virtuous cycle where many other benefits follow, including higher quality, avoiding or fixing errors quickly, and gaining a better understanding of features so even less work is wasted. We find delays by looking at where time is spent in the process, so time is key.
Delays reveal loss… and opportunity
Waste: Hiding in plain sight
The picture is meant to be comical, but is, unfortunately, all too true. Yes, the ditch digger in the background is throwing dirt into the other person’s hole.
We sometimes hear the mantra “eliminate waste” and in this case that would mean stop throwing dirt from one hole into the other.
Unfortunately, as in this case, we often don’t realize we are creating the waste we need to eliminate. In this example, to the ditch digger in the foreground, there is just dirt in his hole that he has to remove. There is not the “useful” dirt that was there that he has to remove and the “waste” dirt that was thrown in by the other ditch digger. There is just dirt.
Note also that the other ditch digger isn’t aware of the extra work he is causing. In other words, if you told these folks to “eliminate waste” they’d probably just shrug their shoulders, think “what waste?” and get on with doing what they are doing. Waste often can only be seen when one looks from outside the problem. This is yet another example of why a holistic approach is required.
“Eliminate waste” or “Stop creating it”?
Rather than “eliminate waste,” I prefer to focus on “stop creating waste.” Because all too often, half of our work involves digging out dirt that has been put into our hole by another group. Lean suggests that the way forward is to focus on eliminating delays in the workflow rather than trying to do work faster.
I suggest that much of our time is spent working on what I call induced work. It is work that is literally created from delays in your process and is self-inflicted (even though unintentionally). It can result in a significant amount of additional work you have to do that you wouldn’t have to do if you managed your delays more effectively.
For example, consider the challenge of dealing with bugs in software. A developer writes a bug. Now imagine that he/she is told about it immediately. How long does it take to fix? Let’s say an hour. Now, imagine that they aren’t told about this for a couple of weeks and further imagine that nothing else has changed. How long does fixing take now? A lot longer, maybe even days longer. And it gets even worse if you have other work going on where the code has been changed by others or is using code modified by others since the original code was written.
What does “induced work” look like?
As described above, induced work is work that is literally created from delays in your process and is self-inflicted, even if unintentionally. In this chapter we’ll take a look at what it looks like and why it occurs.
The following lists show work we intended to do and extra work that can be said to be self-inflicted by making mistakes and not having quick feedback to identify them.
Our intended work
Our intended work makes progress on the mission of your organization. Induced work is work that was created by making a mistake or having a misunderstanding. I’m not suggesting mistakes and misunderstandings can be avoided. However, the amount of induced work greatly increases the longer the time from the error until it is detected. I suggest we can usually vastly reduce the cost of mistakes even when we can’t avoid making them. The common theme in doing this will be to minimize the time from making the mistake until detecting it. The notion that delays increase our waste can also be applied to most of the other items on the right. Let’s see.
Re-doing requirements or working from old requirements happens when there is a delay between getting the requirement and needing to use it. Building the wrong feature is usually due to a miscommunication between the customer (or their proxy) and the development team. The greater the delay between getting the initial requirement and actually building it, the greater the amount of work involved. Building unneeded features is so common in our industry that we think it unavoidable. However, if one builds features in stages, one can often learn that a feature isn’t needed by the time one gets ready to build it.
If we focus on building the most important features in small batches we can use what we learned to see if we actually need the pieces we deferred. This is another tenet of Lean – work on small batches. This accelerates value delivery while shortening delays to feedback. All of this contributes to reducing induced work.
Let’s look at the other items on the list of induced work. You may have noticed that the “fixing” in “fixing” bugs is in quotes. The reason is that developers don’t actually spend a lot of time fixing bugs, even though it feels to them as if they do. Let me explain.
Consider this: imagine the worst bug you’ve ever had in your experience, or the worst bug you’ve seen a developer have if you’ve never been one. Think of the time spent “fixing” it. Most likely, the first few hours went to investigating the problem, then trying something, then setting things back after that didn’t work. Notice that, up to this point, no fixing has been done. Investigating and relearning have taken place. The fix itself typically takes very little time.
Some people protest that this is just semantics. I disagree, but even if true it’d be important. There are two activities taking place here. The first is a discovery of what we have to do (finding) and the second is doing it (fixing).
Let’s take a look at this another way. Imagine a developer writes a bug. As a small aside, I’ve noticed that developers talk about bugs as if they don’t write them, but rather as if the bugs either show up or testers put them in. Notice how they often say “I found a bug!” or “testing found a bug!” as if they had nothing to do with it. BTW: I noticed this by observing myself, so I’m not deriding anyone. Anyway, now imagine that he/she is told about it immediately. How long does it take to fix? Let’s say an hour. Now, imagine that they aren’t told about this for a couple of weeks and further imagine that nothing else has changed. How long does fixing take now? A lot longer, maybe days longer. And it gets even worse if you have other work going on where the code has been changed by others or is using code modified by others since the original code was written.
The additional time required to find and fix in the second case compared with the first is not semantics, and it is of a different nature than fixing code. It is clearly additional re-learning and discovery time. The reality is that we spend much more time finding our problems than fixing them, and the greater the delay from creating the error until detecting it, the greater this additional time is. Also notice that this is not task-switching time, to which it is often attributed: one might start working on the bug fix and concentrate on it alone, and this phenomenon will still occur.
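The split between finding and fixing can be expressed as a toy model. This is only an illustration, not measured data: it assumes the fix itself always takes one hour (as in the example above) and that re-learning time grows linearly with the detection delay at a made-up rate of half an hour per day. The function name and its parameters are hypothetical.

```python
def time_to_resolve(delay_days, fix_hours=1.0, relearn_hours_per_day=0.5):
    """Toy model: fixing time is constant, finding time grows with the delay.

    The linear growth rate is an assumption for illustration only.
    """
    finding = delay_days * relearn_hours_per_day
    return {
        "finding_hours": finding,
        "fixing_hours": fix_hours,
        "total_hours": finding + fix_hours,
    }


immediate = time_to_resolve(0)    # told about the bug right away
two_weeks = time_to_resolve(14)   # told about it two weeks later
print(immediate)
print(two_weeks)
```

Under these assumptions the fix stays at one hour in both cases, while the two-week delay adds seven hours of finding. In practice the growth is unlikely to be linear, especially once other people have changed the surrounding code, but the shape of the argument is the same: the delay inflates the finding, not the fixing.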
Continuing down our list, I would suggest that ‘overbuilding frameworks’ and ‘essentially duplicating components’ are more due to a lack of technical skills that can be improved through the use of design patterns and emergent design. Duplication is also exacerbated by delays as sometimes people forget what has been done.
The last work type on the right is “integration” errors. Again, note the quotes. I mark them that way since integration errors are exceedingly rare. An integration error would be an error in integration. More than 99.9% of the things I’ve seen called integration errors are actually errors that occurred well before integration. That is, the teams needing to integrate did not stay in sync with their understanding or their code. The integrator integrated just fine. The error lay in the fact that the components he integrated properly just don’t work together properly. Calling an error that occurred upstream an “integration” error is equivalent to calling a bug found in testing a “testing” error. Again we can see that the greater the delay between the error occurring and its detection in integration, the more work it takes to fix. Note that this is just another reason why continuous integration is good. Continuous integration isn’t about avoiding integration errors; it is about detecting miscommunications between groups working together as they occur.
It is easier to save time by not creating induced work
It is important to notice that with the exception of automating testing, the work (on the left) we find valuable will likely be difficult to speed up. Yet the work on the right, which we don’t want to do, can be mostly eliminated by cutting out the delays in our workflow.
It is worth taking a few minutes to consider how much time your organization spends on the left side of the table compared with the right side of the table. Pause, take a minute.
In my classes I ask this question and the general consensus is that 30-70% of the time is spent on the left. I actually think this is a bit optimistic, but even so, it provides a lot of motivation to shorten delays and shrink the work we do that isn’t useful.
Much of the work we do is actually not making progress on our goals but is literally induced (created) by the delays in our workflow. Lean suggests that we look at the delays in our workflow in order to eliminate the waste created by these delays. While we should also be looking at how to improve our work, our biggest initial returns are likely going to come from attending to time.