First published in October 2010. I had just debuted a new talk called Software G Forces, about how development must change as deployment goes from yearly to quarterly to monthly to weekly to daily to hourly. Practices essential at one pace are fatal a couple of increments later.
The premise of my recent Software G Forces talk is that deployment cycles are shrinking, and that what constitutes effective software development at one cycle (say annual deployments) can be fatal at another (like daily deployment). Each transition—annual→quarterly→monthly→weekly→daily→hourly—requires a different approach to development. Everyone can find their current deployment cycle in the sequence above and everyone is under pressure to shrink the cycle.
Almost everyone. I gave a workshop based on the G Forces model in Israel recently, and one workgroup made it clear that their current deployment cycle was just fine. As a follow-up, someone else asked the fundamental question, “Why should we deploy more frequently?” My inspiration for the talk was my long-standing observation that cycles are shrinking, but I had never really thought about why, so I didn’t have a good answer. This post, then, gives me a chance to think about why to shrink deployment cycles. (I’ll be giving the talk in Hamburg on Thursday, November 4, 2010 if you’d like to see it live.)
Competition
The obvious reason to deploy more frequently is to get a jump on the competition. If you are in a head-to-head competition where features matter and you can bring them out faster, you should win. If the villain gets ahead, you can rapidly catch up. If they get behind, you can keep them from catching up. Analogies to the OODA loop come to mind.
When I tried to come up with examples of such competition, though, I had a hard time finding any recently. The days of word processors competing on feature lists are long gone, a race that produced mostly bloat and complexity. One recent example is Bing versus Google. Even there, the struggle is to learn about user behavior more quickly than the competition, not a strict feature battle. It would be an advantage if one of them could deploy weekly and the other only monthly, but the winner would still be the one who understood users best.
Scaling
A lesson I learned from my officemate at Oregon, David Meyer (now a director at Cisco), is that as systems grow in complexity, every element is potentially coupled to every other element. This suggests that systems be made as simple as possible to keep that N^2 from blowing up too far, and it suggests that changes be made one at a time. If any change can potentially affect any part of the system, then introducing two changes at once is much more complicated to debug than introducing one change. Was the problem change A? Change B? Some interaction of A and B? Or was it just a coincidence? Introducing one change at a time keeps the cost of maintenance in check.
At the same time, systems need to grow rapidly. You need many changes, but you can only introduce one change at a time. One way to reconcile the conflicting needs is to reduce the cycle time. If you can change a system in tiny, safe steps and make those steps extremely quickly, then from the outside it can look like you are making big changes.
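A back-of-the-envelope sketch of the arithmetic, assuming the everything-can-touch-everything coupling above: with one change per deployment there is exactly one candidate explanation for a failure, but with a batch of changes any non-empty subset of the batch could be the culprit, so the candidates grow as 2^k - 1.

    from itertools import combinations

    def candidate_culprits(changes):
        # Any non-empty subset of the batched changes could explain a failure,
        # because any change may interact with any other (the N^2 coupling above).
        subsets = []
        for size in range(1, len(changes) + 1):
            subsets.extend(combinations(changes, size))
        return subsets

    print(len(candidate_culprits(["A"])))                      # 1 candidate
    print(len(candidate_culprits(["A", "B"])))                 # 3: A, B, or the A+B interaction
    print(len(candidate_culprits(["A", "B", "C", "D", "E"])))  # 31 for a batch of five

Deploying one change at a time keeps that search space at one.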
Timothy Fitz, formerly of IMVU, told a story that brought this lesson home to me. The discipline they came to was that as soon as they said to themselves, “That change couldn’t possibly break anything,” they deployed immediately. If you weren’t at least a little worried, why would you even say that? By making the overhead of deployment vanishingly small, they could create value with every deployment. Either the deployment was fine, in which case the engineer regained confidence, or the deployment broke, in which case the engineer learned something about the sensitivities of the system.
Waste
In Toyota Production System, Taiichi Ohno makes an analogy between inventory and water in a river. By lowering the water level in the river (reducing inventory), you can uncover previously hidden rocks (identify bottlenecks). Undeployed software is inventory. By deploying in smaller batches, you can identify bottlenecks in your software process.
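A minimal sketch of making that inventory visible, assuming a git repository and a hypothetical last-deploy tag that the deploy script moves on every successful release: count the commits that have piled up on the main branch since the last deployment.

    import subprocess

    def undeployed_inventory(deployed_ref="last-deploy", branch="main"):
        # Count commits on `branch` that the deployed revision does not contain.
        # `last-deploy` is a hypothetical tag; substitute whatever marks your
        # most recently deployed revision.
        result = subprocess.run(
            ["git", "rev-list", "--count", f"{deployed_ref}..{branch}"],
            capture_output=True, text=True, check=True,
        )
        return int(result.stdout.strip())

    if __name__ == "__main__":
        print(f"Undeployed commits (inventory): {undeployed_inventory()}")

Watching that number climb between deployments is the software equivalent of watching the water level rise.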
Startups have a vital need to eliminate waste. Because many of the assumptions on which a startup is based are likely to prove false, most startups need to iterate to be successful. If the team can learn to learn from each iteration and can make those iterations short and cheap, then the whole enterprise has a greater chance of succeeding. Startups have the initial advantage of no code and no customers, so establishing a rapid deployment rhythm is relatively easy, although maintaining that rhythm through growth can be a challenge.
Fun
The final reason I thought of for accelerating the deployment cycle is the adventure. Especially if someone claims it is impossible, establishing a rapid rhythm is simply fun and satisfying. Don’t underestimate the role of fun in driving innovation.
Conclusion
Those are my reasons for accelerating deployment: responding to (or staying ahead of) competition, scaling safely, identifying waste, and fun. My next post will take a more abstract look at how accelerating deployment works, through its effects on latency, throughput, and variance.
Commercial plug: the switching cost between tools becomes more significant as the deployment cycle shrinks. That’s why JUnit Max runs tests automatically and displays test results right in the source code. [ed: JUnit Max is sadly long gone]
In the world of web-deployed, low-criticality software, faster is better for all of the reasons you mention. Another thing shorter deployment cycles get you is more use cases. No matter how hard you try, some user somewhere is using your software in a way you would never think of, so you don't check for it. Shorter cycles teach you about those things and let you course-correct.
But when you're dealing with things that aren't updatable (such as embedded systems) or with safety-critical systems that involve physical things moving in the real world, the cost of failure is higher, so deploying (shipping) to the public fast isn't possible or desirable.
That said, even in those cases, deploying to internal test systems should be as fast and simple as possible so you can do those real-world tests quickly.
The acceleration benefit I value the most is reduction of risk the way it's described in "Accelerate." I think some of that is wrapped up in your Scaling section. For pitching to the skeptical (and the suits), it might be worth describing it separately.
My random thoughts: "Accelerating to the point where deployments can only be automated is a forcing function to de-risk every deployment by adopting more resilient patterns. Failure rate is always non-zero. If you don't deploy enough to know what YOURS is and how to gracefully recover, your next deployment could be The One that breaks you bad."