Let it fail then learn to succeed

On the scrum development mailing list, Dave Nicollette recommended shortening sprint length until it failed, then backing up one step:

“Oh, my God! You’re going to let a sprint fail, just so you can determine the optimum length?” Yes.

In other words, failure lets you learn your limits. More importantly, as suggested here, failure can be deliberately introduced in order to improve the process, that is, to reduce your limitations. I would say that an organisation committed to process optimisation should shorten the sprint until it fails but, rather than just accepting that limit, learn why it fails and then experiment with optimising the process until it succeeds. Let me explain.

Courtesy of a tip from Doug South at the Brisbane XP user’s group, I’m nearly finished reading a book at the moment, Chasing the Rabbit by Steven J. Spear of MIT and formerly of Harvard. The book’s full title is Chasing The Rabbit: How market leaders outdistance the competition and how great companies can catch up and win. It’s a business book that describes the essential feature of successful companies such as Toyota and the Toyota Production System.

Before you roll your eyes and think I’m just another one of these lean software guys who’s about to rave over kanban, kaizen, and just-in-time, and apply manufacturing production-line process, possibly inappropriately, to the professional practice of software development, let me assure you I’m not. Neither is Spear. Basing his book on years of hands-on research into these high-velocity organisations (including time spent actually working on real production lines), he characterises kanban and friends as merely artifacts of a deeper philosophy of building a successful company. In some ways it’s breathtakingly simple, although hard to practise. The theme of the book is discovery.

The factory was not only a place to produce physical products, it was also a place to learn how to produce those products and – most important of all – it was a place to keep learning how to produce those products. In fact, this is exactly what so much of the early research about Japanese management had revealed – that learning and discovery were intrinsic to success. But that idea had gotten lost as people focused on the particular tools and artifacts used in the workplace at the expense of understanding the principles of how those systems were managed. (p 15).

These companies don’t succeed or fail. They succeed or learn to succeed.

Now it is of course slightly more complex than just that. There are two characteristics and four capabilities that high-velocity organisations exhibit. The four capabilities are:

  1. Specifying design to capture existing knowledge and building in tests to reveal problems.
  2. Swarming and solving problems to build new knowledge.
  3. Sharing new knowledge throughout the organisation.
  4. Leading by developing the first three capabilities.

Now these capabilities can be developed, in different manifestations, in any sort of process, not just production-line processes. Spear shows them in action in health-care, in nuclear engineering, in aluminium production, and of course in automotive production. I’ll perhaps go into further detail in a later post. Here I’m going to discuss a simple example illustrating not just capability 1, in which a process both captures knowledge and has built-in tests, but also capability 2, involving problem solving and process improvement, where the process is tested to failure in order to reveal what knowledge is missing about the process.

In the book he describes a mattress factory, Aisin, that, after a long capability improvement process, ends up with two production lines each capable of making (e.g.) 100 units a day to custom order. On some given day, imagine they have a requirement to make, say, 180 units. Rather than running one line at 100 and the other at 80, or both at 90 units, they will deliberately overload one of the lines (e.g. ramp it to 110 or 120 units) in order to discover the process’s failure modes (that is, its bottlenecks) so as to improve them. They already know how to make 100 units a day. They can learn more about their process if they try to make 120. The reason they do this on days of lighter loading is that they can use the unstressed production line as a backup if the test proves too stressful for the line under test.
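To make the arithmetic concrete, here is a minimal Python sketch of that allocation. The function name plan_stress_day and its parameters are my own illustrative assumptions, not anything taken from Spear’s book; the numbers are the ones from the example above.

```python
def plan_stress_day(demand, rated_capacity, stress_target):
    """Split one day's demand across two lines, deliberately overloading one.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    if stress_target <= rated_capacity:
        raise ValueError("stress target must exceed the rated capacity")
    if demand < stress_target:
        raise ValueError("not enough demand to load the stressed line")
    if demand >= 2 * rated_capacity:
        raise ValueError("no spare capacity today, so no safe experiment")
    stressed = stress_target
    backup = demand - stressed
    # Headroom left on the unstressed line to absorb any shortfall.
    headroom = rated_capacity - backup
    return {"stressed": stressed, "backup": backup, "headroom": headroom}


plan = plan_stress_day(demand=180, rated_capacity=100, stress_target=120)
print(plan)
```

With 180 units to make, the stressed line attempts 120 and the other line is asked for only 60, leaving it 40 units of headroom. Even if the experiment fails and the stressed line manages only its usual 100, the 20-unit shortfall fits comfortably within that headroom.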

Experienced software engineers might recognize this as the stress testing in their performance testing toolkit. Loading a software system to the point of failure is one quick way to learn where its bottlenecks are and what to do about eliminating them, which makes your software architecture more efficient and reveals new knowledge about how it performs and what needs to change.
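As a rough illustration of loading to the point of failure, here is a minimal Python sketch. The names ramp_to_failure and fake_system are hypothetical, and the threshold of 115 is invented purely so the example has something to find; a real test would drive an actual system and measure errors or latency instead.

```python
def ramp_to_failure(run_at_load, start, step, max_load):
    """Increase load step by step until run_at_load reports a failure.

    run_at_load(load) is a hypothetical callback returning True on success.
    Returns (last_good_load, first_failing_load); either may be None.
    """
    last_good = None
    load = start
    while load <= max_load:
        if not run_at_load(load):
            return last_good, load
        last_good = load
        load += step
    return last_good, None


# Stand-in for a real system: pretend it copes with up to 115 requests/second.
def fake_system(load):
    return load <= 115


limit = ramp_to_failure(fake_system, start=100, step=10, max_load=200)
print(limit)
```

The loop only tells you where the limit is. The interesting work starts afterwards, in finding out why the system failed at that load and what to change so that it no longer does.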

The point, of course, is that I’m not talking about stress testing software. I’m talking about stress testing your software development methodology so you can discover what you don’t already know about your process, and therefore improve it. I don’t mean just arbitrarily placing more demands on your software teams to produce more with less. That’s just the crudest type of management-by-objective.

Think about all the moving parts in your development process. What do you know about them? What do you assume? What do you just take for granted? How much do your team members know about the process? Can they identify problems and propose solutions? Do they need to learn that capability? Where are your failure modes? Do you just accept them and perform a workaround, or do you understand them as signals to elicit further discovery about the things you don’t understand about the process? Embrace failure. In fact, actively seek it out. Use it as an opportunity to discover unknown information about your business and to improve your team’s skills.

Chasing The Rabbit is a great book, and I highly recommend that all software development practitioners read it. Special thanks to Doug for bringing it to my attention.