Be Open from Day One, not Day N.

This might be the most important blog post I write all year (and it’s January now), so I’ll get straight to the point:

If you’re running a government software project and you plan to make it open source eventually, then just make it open source from the beginning. Waiting will only create more work.

It’s that simple. In later posts we’ll look at why to open source, and how, but first you have to know when.

The reason to be open source from day one is simply that the longer a project runs closed-source, the harder it will be to open source later.

(Note that being open source from the start doesn’t force you to immediately take on the extra responsibility of community management. People often think that “open source” means “strangers distracting my programmers with questions”, but that’s optional — it’s something you might do down the road, if and when it makes sense for your project. It’s under your control.)

At Civic Commons we’ve watched a number of projects make the transition from closed to open, including some reasonably large ones, and they all seem to follow the rule pretty convincingly: the longer they run closed-source, the more difficult they are to open up later. Other people we’ve talked to have noticed the pattern too. Why does it seem to hold so consistently?

I think there’s one underlying cause:

At each step in a project, programmers face a choice: do that step in a way that is compatible with future open-sourcing, or in a way that is not. Every time they choose the latter, the project gets just a little bit harder to open source.

The crucial thing is, they can’t help choosing the latter occasionally — all the pressures of development propel them that way. It’s very difficult to give a future event the same present-day consequences as, say, fixing the incoming bugs reported by the testers, or finishing that feature the customer just added to the spec. Also, programmers struggling to stay on budget will inevitably cut corners here and there (in Ward Cunningham’s phrase, they will incur “technical debt”), with the intention of cleaning it up later.

Thus when it’s time to open source, you’ll suddenly find there are:

  • Customer-specific configurations and passwords checked into the code repository;
  • Sample data constructed from live (and confidential) information;
  • Bug reports containing sensitive information that cannot be made public;
  • Comments in the code expressing perhaps overly-honest reactions to the customer’s latest urgent request;
  • Archives of correspondence among the developer team, in which useful technical information is interleaved with personal opinions not intended for strangers;
  • Licensing issues with dependency libraries whose conditions might have been fine for internal deployment (or not even that), but aren’t compatible with open source distribution;
  • Documentation written in the wrong format (e.g., that proprietary internal wiki your department uses), with no easy translation tool available to get it into formats appropriate for public distribution;
  • Non-portable build dependencies that only become apparent when you try to move the software out of your internal build environment;
  • Modularity violations that everyone knows need cleaning up, but that there just hasn’t been time to take care of yet;
  • Need I go on? Do some of these sound familiar?

The problem isn’t just the work of doing the cleanups; it’s the extra decision-making they sometimes require. For example, if sensitive material was checked into the code repository in the past, your team now faces a choice between cleaning it out of the historical revisions entirely, so you can open source the entire (sanitized) history, or just cleaning up the latest revision and open-sourcing from that (sometimes called a “top-skim”). Neither method is wrong or right — and that’s the problem: now you’ve got one more discussion to have and one more decision to make. In some projects, that decision gets made and reversed several times before the final release. The thrashing itself is part of the cost.
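To make the cleanup side of this concrete, here is a minimal sketch of the kind of check a team might run over the latest revision before a top-skim release. It is an illustration only: the pattern list, the function names `find_suspect_lines` and `scan_tree`, and the choice to skip the `.git` directory are all assumptions of mine, and it looks only at the current working tree, not at historical revisions.

```python
import re
from pathlib import Path

# Hypothetical patterns, chosen for illustration. A real audit would need a
# broader, project-specific list and a human review of every hit.
SECRET_PATTERNS = [
    re.compile(r"(password|passwd|secret|api_key)\w*\s*[:=]\s*\S+", re.IGNORECASE),
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]


def find_suspect_lines(text):
    """Return (line_number, stripped_line) pairs matching any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root):
    """Scan readable text files under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file; skip it
        hits = find_suspect_lines(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the top of a checkout returns a dictionary mapping file paths to the suspect lines found in them. A clean result from a script like this proves very little, which is rather the point of the post: the check is only needed at all because the material was allowed in to begin with.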

Waiting Just Creates an Exposure Event

The other problem with opening up a developed code base is that it creates a needlessly large exposure event. Whatever issues there may be in the code (modularity corner-cutting, security vulnerabilities, etc.), they are all exposed to public scrutiny at once. The open-sourcing event becomes an opportunity for the technical blogosphere to pounce on the code and see what they can find.

Contrast that with the scenario where development was done in the open from the beginning: code changes come in one at a time, so problems are handled as they come up (and are often caught sooner, since there are more eyeballs on the code). Because changes reach the public at a low, continuous rate of exposure, no one blames your development team for the occasional corner-cutting or flawed code checkin. Everyone’s been there, after all; these tradeoffs are inevitable in real-world development. As long as the technical debt is properly recorded in FIXME comments and bug reports, and any security issues are addressed promptly, it’s fine. Yet if those same issues were to appear suddenly all at once, some unsympathetic elements in the blogosphere would jump on the aggregate exposure in a way they never would have if the issues had come up piecemeal in the normal course of development.

Avoiding a needless exposure event is especially important for government code, much more so than for private-sector code. Elected officials and those who work for them are understandably sensitive to negative public comments. Even for the most conscientious team, a worrying cloud of uncertainty will surround everything by the time you’re ready to open up closed code. How can you ever know you’ve got it all cleaned up? You do your best, but you can never be totally sure some hawk-eyed hacker out there won’t spot something embarrassing after the release. The team worries, and worry is an energy drain: it causes them to spend time chasing down ghosts, yet at the same time can cause them to unconsciously avoid steps that might risk revealing real problems.

The Good News

The good news is that these are all unforced errors. A project incurs little or no extra cost by avoiding them in the simplest way possible: by running the project in the open from Day One.

“In the open” means the following things are publicly accessible, in standard formats, from the first day of the project: the code repository, bug tracker, design documents, user documentation, wiki, and developer discussion forums. It also means the code and documentation are placed under an open source license, of course. It also means your team’s day-to-day work takes place in the publicly visible area (except for sensitive configuration data and the like — that of course stays behind your firewall).
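One common way to keep sensitive configuration behind the firewall while the code itself is public is to read it from the deployment environment instead of from checked-in files. The following is a minimal sketch under that assumption; the variable names (`APP_DB_URL`, `APP_DB_PASSWORD`, `APP_DEBUG`) and the `load_settings` function are hypothetical, not something any particular project uses.

```python
import os

# Hypothetical names of the settings this example application requires.
REQUIRED_SETTINGS = ("APP_DB_URL", "APP_DB_PASSWORD")


def load_settings(env=None):
    """Build settings from environment variables, not from files in the repo.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if name not in env]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {
        "db_url": env["APP_DB_URL"],
        "db_password": env["APP_DB_PASSWORD"],
        # Optional flag; debug mode is off unless explicitly set to "1".
        "debug": env.get("APP_DEBUG", "0") == "1",
    }
```

With this arrangement the public repository holds only the names of the settings, and the values live wherever your operations team already keeps them.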

“In the open” does not have to mean any of the following, among other things:

  • Allowing strangers to check code into your repository (they’re free to copy it into their own repository, if they want, and work with it there);
  • Allowing anyone to file bug reports in your tracker (you’re free to choose your own QA process, and if allowing reports from strangers doesn’t help you, you don’t have to do it);
  • Reading and responding to every bug report filed, even if you do allow strangers to file;
  • Responding to every question people ask in the forums (even if you let their posts through moderation);
  • Reviewing every patch or suggestion posted, when doing so may cost valuable development time.

Think of it this way: you’re open sourcing the code, not your developers’ time. One of those resources is infinite, the other is not. You’ll have to determine whether engaging with outside users and developers makes sense for your project or not. In the long run it usually does, and later posts here will talk about how to do that. But the important thing is, it’s all under your control. Developing in the open does not change your degree of control over the project, it just ensures that everything you do is, by definition, done in a way that’s compatible with being open source. And you get that for free.

(For those who wonder why this only applies to government software: it doesn’t. But businesses sometimes have competitive reasons to stay closed until the first release, even if they intend for the project to be open source in the long run. The topic of for-profit businesses running open source projects is a whole other discussion.)

If you want your software to be on-time, feature-complete, on-budget, and open-source, then just develop it the way you normally would, but with everything open source from the start. You’ll be glad you did.

About Karl Fogel

See http://www.red-bean.com/kfogel for more information.
  • http://learn.theartofjoomla.com Andrew Eddie

    Great article Karl. I also try to practice putting commercial work built on a FOSS base (in my case Joomla!) “out there” from day one if the client is willing as well. Once the project is finished someone might just stumble upon it and find it useful, impossible if it’s just gathering dust on a backup DVD :)

    Really admire your work. Hopefully we’ll bump into each other at a conference one day and can share a beer.

  • http://civiccommons.org/ Karl Fogel

    Hey, Andrew — likewise, and thanks for the comment. It’s great that you manage to do this even with private sector work.

    Interestingly, I have sometimes found that the “someone” who stumbles across the work years later is either myself or the client :-) . We may forget, for a time, but the Internet cannot forget.
