The paradox of software development

I finished Tim O’Reilly’s WTF. For the most part, his discussion of the way that the evolution of technology affects the business environment is really insightful. This is particularly true around chapter 6, where he describes how companies try to manage the process of software development.

I like to say that computer programming is easy and software development is hard. One or two people can write a powerful set of programs. Getting a large group of people to collaborate on a complex system is a different and larger challenge.

It is like an economy. We know that the division of labor makes people more productive. We know that some of the division of labor comes from roundabout production, meaning producing a final output by using inputs that are themselves produced (also known as capital). Having more people involved in an economy increases the opportunities to take advantage of the division of labor and roundabout production. However, the more people are involved, the more challenging are the problems of coordination.

O’Reilly describes Amazon as being able to handle the coordination problem in software development by dividing a complex system into small teams. You might think, “Aha! That’s the solution, Duh!” But as he points out, dividing the work among different groups of programmers was the strategy used in building the original healthcare.gov, with famously disastrous results. You risk doing the equivalent of having one team start to build a bridge from the north bank of a river and another team start to build from the south bank, and because of a misunderstanding their structures fail to meet in the middle.

He suggests that Amazon avoids such pitfalls by using what I would call a “document first” strategy. The natural tendency in programming is to wait until the program is working to document it. You go back and insert comments in the code explaining why you did what you did. You give users tips and warnings.

With disciplined software development, you try to document things early in the process rather than late. Before you start coding, you undertake design. Before you design, you gather requirements. I’m oversimplifying, but you get the point.

As O’Reilly describes it, Amazon uses a super-disciplined process, which he calls the promise method. The final user documentation comes first. Each team’s user documentation represents a promise. I’ve sketched the idea in a couple of sentences, but O’Reilly goes into more detail and also references entire books on the promise method.
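
To make the idea concrete for myself, here is a minimal sketch in Python of what "the documentation is the promise" might look like. This is my own illustration, not something O’Reilly or Amazon describes. The names InventoryPromise, InMemoryInventory, and keeps_promises are hypothetical, and the inventory example is an assumption chosen only because it is easy to follow. The documentation (the docstring) is written first, the implementation comes later, and a small check holds any implementation to the documented promises.

```python
from typing import Dict, Protocol


class InventoryPromise(Protocol):
    """Hypothetical user documentation, written before any implementation.

    Promise 1: reserve(sku, qty) returns True and reduces available(sku)
               by qty when enough stock exists.
    Promise 2: otherwise reserve(sku, qty) returns False and changes nothing.
    """

    def available(self, sku: str) -> int: ...

    def reserve(self, sku: str, qty: int) -> bool: ...


class InMemoryInventory:
    """One team's later implementation of the promise above (illustrative)."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        # Refuse non-positive requests and requests that exceed stock.
        if qty <= 0 or qty > self.available(sku):
            return False
        self._stock[sku] -= qty
        return True


def keeps_promises(service: InventoryPromise) -> bool:
    """Check the documented promises against any implementation.

    Assumes the service starts with at least one unit of "widget".
    """
    before = service.available("widget")
    reserved = service.reserve("widget", 1)
    reduced = service.available("widget") == before - 1

    # Asking for more than is left must fail and leave stock untouched.
    remaining = service.available("widget")
    refused = not service.reserve("widget", remaining + 1)
    unchanged = service.available("widget") == remaining

    return reserved and reduced and refused and unchanged


if __name__ == "__main__":
    print(keeps_promises(InMemoryInventory({"widget": 3})))
```

The point of the sketch is that another team can code against InventoryPromise before InMemoryInventory exists, and the same check can be run against whatever implementation eventually ships.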

Why isn’t most software developed in a super-disciplined way? I think it is because software development reflects the organizational culture of a business, and most business cultures are just not that disciplined. They impose on their software developers a combination of unstable requirements and deadline pressure. In practice, the developers cannot solidify requirements early, because they cannot get users to articulate exactly what they want in the first place.

Also, requirements change based on what people experience, and it takes discipline to decide how to handle these discoveries. What must you implement before you release, and what can you put off for the next version?

Consider three methods of software development. All of these have something to be said for them.

1. Document first–specify exactly what each component of the system promises to do.
2. Rapid prototyping–keep coming up with new versions, and learn from each version.
3. Start simple–get a bare-bones system working, then move on to add in the more sophisticated features.

If you do (1) without (3), you end up with healthcare.gov. If you do (1) without (2), your process is not agile enough. You stay stuck with the first version that you designed, before you found out the real requirements. If you do (2) and (3) without (1), you get to a point where implementing a minor change requires assembling 50 people to meet regularly for six months in order to unravel the hidden dependencies across different components.

From O’Reilly, I get the sense that Amazon has figured out how to do all three together. That seems like a difficult trick, and it left me curious to know more about how it’s done.

12 thoughts on “The paradox of software development”

  1. Here’s a different theory on Amazon: Why Amazon is Eating the World. The author’s theory is that:

    Each piece of Amazon is being built with a service-oriented architecture, and Amazon is using that architecture to successively turn every single piece of the company into a separate platform — and thus opening each piece to outside competition. … But the revenue bonanza is a footnote compared to the overlooked organizational insight that Amazon discovered: By carving out an operational piece of the company as a platform, they could future-proof the company against inefficiency and technological stagnation.

    To get back to your point on software development, maybe it is about many small teams. Each team, however, is a separate product, so the requirements and complexity for that team alone are much lower. Allowing the product to be sold to the outside keeps the design and implementation of the product focused on what is actually important to customers.

  2. Amazon picked the hardest part of software management, and they made it into one of their most important products. They are the industry leader in managing cloud services. A lot of this stuff is done very badly because it involves highly technical activities that are done rarely. Writing and updating software gets done every day, but a lot of server management operations are a big deal for a couple of months and then no one touches anything for a few years. These are likely points of failure in many systems and Amazon made sure they were better at it than anyone else.

  3. These posts have made me wonder if Amazon might be further along than most in creating standardized, interchangeable components for software systems. Seems to me the industry might be ripe for the American system of manufacturing, which did so much for the Industrial Revolution.

    Sure there are libraries and such, but it seems an awful lot of time is spent on fitting by skilled programmers rather than assembly by semi-skilled. An aside is that a standardized component that sticks to its promise to users can then be marketed as a separate product to outside users.

  4. Steve Yegge is a Google engineer who previously worked at Amazon. His original post offers some high-level insights into the software development process at Amazon. It’s an interesting read: https://plus.google.com/+RipRowan/posts/eVeouesvaVX

    Notice how Jeff Bezos’s big mandate (in Steve’s words) sets up a good system for platform-level collaboration:

    “So one day Jeff Bezos issued a mandate. He’s doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion — back around 2002 I think, plus or minus a year — he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.

    His Big Mandate went something along these lines:

    1) All teams will henceforth expose their data and functionality through service interfaces.
    2) Teams must communicate with each other through these interfaces.
    3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
    4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.
    5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
    6) Anyone who doesn’t do this will be fired.
    7) Thank you; have a nice day!

    Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.”

  5. There is much talk in the software world about starting with something quick-and-dirty and using the money that (you hope) pours in to make up for the debt. This is a disfigured version of method (3).

    While I agree with this, people focus too much on the technical side. So the result is that salesmen go and over-promise, and then the deadline is met through sloppy engineering. In the real world this almost always fails, which is why the quick-and-dirty philosophy has to be invented to give it philosophical cover.

    The difficulty-reduction should be at all levels, starting with sales and management keeping expectations down to reasonable levels. That way the technical problem becomes one small enough that quick-and-dirty solutions are actually a good thing. For that you need a reduced feature list, something that even the customer will see as “bare-bones”. So one hard thing is how do marketing people get customers to pay up for a bare-bones thing?

  6. Is there any reason to think Amazon’s software is particularly well-designed and developed (as opposed to their business decisions and processes)? All the consumer-facing stuff seems to me to be competent and professional enough, but nothing more. Amazon Drive/Photos seems to be kind of ‘meh’ in comparison to competitors like Dropbox and Google Drive. All the storage is a great deal bundled in for Prime customers, but otherwise I would have no particular reason to use it. There’s nothing wrong with their video streaming service or user-interface, but it does not appear better in any way than Netflix or other offerings. Same goes for Fire TV — quite usable, but not as good as Roku. And, of course, their attempt at smart phones was an expensive disaster and they remain distant also-runs with their Fire tablets (whose main attraction seems to be low price — and whose major drawback, as with their smart phones, is their custom Android and resulting lack of Google apps and services).

    Speaking of Netflix and software development, I’m reminded of their famous culture document:

    http://postachio-files.s3-website-us-east-1.amazonaws.com/cc561bddfc40ff6b75d17968ed78c377/5725f8f1fd4174604928cc2b8e15c4da/b3167770ab29199c021f73ae28f71da4.pdf

    This impresses me more than anything I’ve seen with respect to software development at Amazon. But…Netflix’s continuing success depends vastly more on great original TV programs than on great original computer programming. Just as, I would say, Amazon’s success requires great business decisions and processes and ‘good enough’ software.

  7. Virtually all users are unable to specify what they want from scratch, but can say what they want more or less of once they have something usable – something that does some useful work.

    What needs to happen is the (Agile based) Minimum Viable Product, which is a (2) prototype delivered as a (3) MVP, in conjunction with usable but minimal (1) documentation. The (3) product is improved after users react to the initial MVP, and their reactions result in changed documentation.

    There is not yet a good standard process, but “Agile” and “Development Operations”, in small, dedicated, co-located cross-functional teams, seem to be one of the more popular ways to go.

    There continue to be changes in tools, where GitHub for code versions is fine, but now Jira is becoming more popular, perhaps to make better upward-looking reports for managers about the SW development process.

  8. “Why isn’t most software developed in a super-disciplined way?”

    As a software developer, if you correlate “Documentation” with “Design”, which is incorrect but WAY out of scope here, you’re dead on until this. Failure to do design-first isn’t a reflection of corporate culture, or at least not mainly. It’s because doing so is insanely difficult, to the point of being functionally impossible. Which doesn’t mean you don’t try, but it does mean you try with the acknowledgement that you’re going to have to have a very flexible design which will change often. Nature of the beast. On many software development projects, it’s considered easier to use an “Emergent Design”, that is letting the design be dictated by the work rather than the other way around.

    You simply can’t do “Document First” on a project of any decent size or scope, and you shouldn’t try. Have a big picture and a flexible design for the first steps.

    And Healthcare.gov had more failings than just a lack of design. I wrote about it a bit when the failures started coming to light in the press. If you’re interested, and as a blatant self-plug:

    https://mattosbun.blogspot.com/2013/10/health-care-exchange-project-pt-1.html

    https://mattosbun.blogspot.com/2013/10/health-care-exchange-project-pt-2.html

    https://mattosbun.blogspot.com/2013/10/health-care-exchange-project-pt-3.html

  9. I’m a bit reluctant to comment here since I work for a subsidiary of the big river in Brazil, but…

    The key to your misunderstanding of how Amazon does it is here:

    “It is like an economy. We know that the division of labor makes people more productive. We know that some of the division of labor comes from roundabout production, meaning producing a final output by using inputs that are themselves produced (also known as capital). Having more people involved in an economy increases the opportunities to take advantage of the division of labor and roundabout production. However, the more people are involved, the more challenging are the problems of coordination.”

    You missed what is most important about an economy. Each actor that is selling is INDIVIDUALLY trying to satisfy their customer, so they have strong incentives to find some way to coordinate the chaos of the larger system to deliver for their customer. To win, I have to solve “the problems of coordination” for my product and delight my customer.

    I can say without any irony that we live the “Leadership Principles” every day. Customer Obsession and Ownership are at the top. They’re on the website and they’re not a secret. Those little software dev teams you described (I manage one) each act like a little startup trying to delight customers. A company this size has a pretty big, hairy economy and it’s chaotic as hell, but I know what my mission is and I’m going to find a way.

    If this type of owner is the only software dev you let yourself hire, and if they are empowered (literally… “in control”), you get Amazon.
