Behlendorf on open source: a leading player in the open source movement discusses Apache, Java, Subversion, CollabNet, and the risks and rewards of open source development

Behlendorf on Open Source - Part 2

This is the second part of Tim Anderson's interview with Brian Behlendorf, founder and CTO of CollabNet, who is closely involved with Apache and is a leading member of the open source movement. Part one of the interview is available separately.

Open Source Copyright and Licensing

Tim: Some people feel that there are projects, and Linux may be an example, which from one perspective have been stolen by some highly commercial entities such as IBM. That raises the question whether the efforts of the open source community have really changed the culture of the industry. Put another way, as a developer why should I contribute my code to help out IBM?

Behlendorf: That's a valid question. I don't think Linux in particular has been co-opted by a particular company. You see active development from IBM, but you also see active contributions from Intel and from HP, and Linus Torvalds himself still acts as the air traffic controller for patches into the kernel, working for a non-profit.

In this open source risk model or maturity model, if you look at a project and find that over half of the IP [Intellectual Property] contributions are coming from one vendor, I would say that does increase the risk. The vendor may decide to exit the open source strategy around that project, and it could leave the rest of the developers without the momentum that it had before. At the same time the inherent right to fork that exists in any open source license guarantees that if one vendor exits, so long as there is still a critical mass of other developers and other companies with an interest, the project can continue, even if it has to continue somewhere else, under a different name even. The right to fork is an insurance policy for participants.
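The "over half from one vendor" rule of thumb above can be sketched as a simple calculation. This is only an illustration, not a formula given in the interview: the function names, the one-list-entry-per-contribution input, and the 0.5 default threshold are all assumptions.

```python
from collections import Counter


def vendor_shares(contributions):
    """Return each vendor's fraction of all contributions.

    `contributions` is a hypothetical input: one vendor name per
    accepted contribution (for example, per merged patch).
    """
    counts = Counter(contributions)
    total = sum(counts.values())
    return {vendor: n / total for vendor, n in counts.items()}


def concentration_risk(contributions, threshold=0.5):
    """Return (vendor, share) if one vendor exceeds the threshold, else None.

    The 0.5 default mirrors the "over half" rule of thumb; it is an
    assumed cut-off, not a precise measure of project risk.
    """
    if not contributions:
        return None
    shares = vendor_shares(contributions)
    vendor, share = max(shares.items(), key=lambda item: item[1])
    return (vendor, share) if share > threshold else None


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = ["VendorA", "VendorA", "VendorA", "VendorB"]
    print(concentration_risk(sample))
```

A result of None here means only that no single vendor dominates the count. As the answer above notes, the right to fork still matters when a dominant vendor leaves, provided a critical mass of other contributors remains.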

Tim: Open Office is an interesting case where you actually have to sign over your copyright to Sun in order to contribute your code. Is that something you feel developers should be happy to do?

Behlendorf: The silver lining to the SCO lawsuit is that it caused a lot of people to realise that maybe it's worth doing a bit more of the legal paperwork around code, if it helps to prevent frivolous lawsuits or if it helps us better argue about the provenance of code and why people can trust it. There have been a lot of implicit agreements among open source developers that, when they post a patch to a mailing list, they are making a submission of copyrighted code to the project, to redistribute as part of the code. That's always been an assumed contribution. When the Apache project became a non-profit organization we realised that to create a buffer against any potential legal challenges, there would need to be some paper trail that showed that the developers were actually contributing their code to the Apache project. So we've always had a contributor license agreement. It wasn't an assignment, it was a license. The contributor basically said, along with the code, "I'm giving Apache the right to distribute it under whatever license it wants to."

Sun, with Open Office, decided to go all the way to assignment, based on an understanding of copyright law that says that Sun could not defend the GPL on it unless they had copyright ownership. There are many copyright lawyers out there who'll say that unless you have the ownership of the code you can't actually initiate an enforcement action against somebody else. It's something that we're tossing around now in the Apache project. Should we actually be seeking assignment rather than just getting a license from developers? We're currently talking about that. Sun simply decided to go in that direction.

One thing Sun also did not decide to do was to create a non-profit organization around Open Office. That has led people to say, well, if I assign my copyright to Sun they could use my contribution in some other commercial way inside of some other unrelated project. I think the risk of that is low, because usually a contribution to something like Open Office wouldn't be terribly useful to something like a Java application server, but I can understand their trepidation. And there are people out there who've created plug-ins and other things for Open Office that they release themselves and they retain the copyright to under the GPL. There's even some translation work and maybe some documentation like that. And I think that's their right, they can decide whether they want to do that or not. They also realise that if they do that, they don't necessarily get the same community contributions they would get if their contribution was at Openoffice.org.

Tim: Let's say for example that a hostile company were to take over Sun, unlikely though it sounds, and then wanted to be obstructive in the ongoing development of Open Office. What impact would the copyright assignment have in that scenario?

Behlendorf: Well, one thing that they can't do is to rescind the grants that they've made. So in a worst case scenario, if a company takes over Sun, and cancels the Open Office project, that wouldn't prevent everyone out there who had downloaded Open Office from being able to exercise the rights granted to them under the GPL. They can continue the project located somewhere else.

When it comes to patents though, that is a concern. For example, it could be that Microsoft has patents that apply to Open Office. They've been privately spreading some fear, uncertainty and doubt that says that they might, so they could go after an organization that was using Open Office saying that "the product is covered by patents X Y and Z and we want some royalty." I think they would have to think carefully before doing anything like that, because if you do that, you incent a whole lot of people to find prior art or to do some other action that might invalidate the patent that you hold. As we're finding with SCO for example, if you try to sue the whole world, you motivate a lot of people to find contrary evidence to your claims. About half of patents don't actually stand up in court. I think that's why we haven't really seen to this date a major patent challenge yet, against major open source projects. The game theory here favours the open source community rather than the patent holder.

Tim: What are your views on Sun's licensing for Java?

Behlendorf: Sun has long had this perspective that the most important thing to accomplish with Java was "write once, run anywhere". There are many of us who have tried to explain to Sun that the best way to achieve that would be through an open source license. By publishing it out there under a license that the community could trust, with the understanding that the product wouldn't be taken away from them by anyone, by Sun or otherwise, they would incent a large number of people to fix the bugs that they find. The open source community treats variance from a protocol, or from a standard or a spec, as a bug to be fixed, whereas the commercial software industry has always looked very sceptically at standards and tried to compete on non-standard implementations, calling their non-standard stuff proprietary advantage. Sun simply had a different philosophy. They said that we have to enforce compliance with these different standards as a contractual requirement. It's simply because of where they came from.

I'm never the kind of guy who goes to companies and dictates the business model to them. So what was important to us at Apache was that rather than try to beat Sun over the head with this open source stick and tell them they needed to release their implementation of the standards under an open source license, we said "make it possible for us, make it possible for the open source community, to build open source implementations of your Java standard, of the standards that come out of the Java Community Process." We won that about two years ago. We have skirmishes from time to time where it looks like that slips back out of our control, but by and large most standards that come out of the Java Community Process can be implemented by somebody under an open source license. We have many of these kinds of mini projects running at Apache already.

There are a couple of open source projects out there that implement the Java runtime: GNU Classpath and Kaffe and things like that. We may see one of the other main providers of Java VMs at some point decide to open source their VM. That would almost be poetic justice, to show Sun that you can have it be open source and still have something that is high performance and highly conformant to the Java standards. I'm the kind of person who prefers to make my point through proofs rather than continuing to argue in theory. That's how I think it plays out.

Tim: So what exactly did Sun do, to enable Apache to provide implementations of those standards?

Behlendorf: There were a couple of changes necessary to the core constitution, if you will, of the Java Community Process. It used to be that there was a requirement that any standard that was published carry with it a license that required that any implementations of those standards pass a TCK (Technology Compatibility Kit) before they could be distributed at all. That doesn't really work with open source communities where you're constantly releasing intermediate versions of projects. So what we did was convince Sun that you could turn it from a term of copyright into a term of trademark. If you implement a certain standard, you can release intermediate versions but you cannot call it certified until you've actually passed the TCK and gotten that official certification. For example, with Geronimo - Geronimo's this J2EE project inside of Apache - we haven't gotten J2EE certification for it yet, so we can't say this is a J2EE conformant server until we get that certification.
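The difference between the two regimes described above can be pictured as two small rules: under the old copyright term the TCK gated distribution itself, while under the trademark term it gates only the "certified" label. The sketch below is a loose illustration of that idea, not the actual JCP licence text; the function names and label wording are hypothetical.

```python
def may_distribute(passed_tck, regime):
    """Return True if a build may be released at all.

    `regime` is a hypothetical switch: "copyright" models the old rule
    (no distribution before passing the TCK), "trademark" models the
    later rule (intermediate releases are allowed).
    """
    if regime == "copyright":
        return passed_tck
    if regime == "trademark":
        return True
    raise ValueError(f"unknown regime: {regime!r}")


def release_label(name, version, passed_tck):
    """Build a release label that claims certification only after the TCK.

    Under the trademark rule the release itself is allowed either way;
    only the certification claim depends on the TCK result.
    """
    base = f"{name} {version}"
    if passed_tck:
        return f"{base} (certified)"
    return f"{base} (not certified)"


if __name__ == "__main__":
    # A made-up intermediate build that has not passed the TCK.
    print(may_distribute(False, "copyright"))
    print(may_distribute(False, "trademark"))
    print(release_label("ExampleServer", "0.1", False))
```

This mirrors the Geronimo example in the answer: the project can ship intermediate versions, but it cannot describe the server as J2EE conformant until certification is obtained.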

Open source and software reusability

Tim: One of the things you've been saying is that open source libraries are a solution to the problem of software reusability. Can you explain the reasoning behind that?

Behlendorf: It's even broader than open source libraries. I think open source has actually figured out the necessary ingredients for true software reusability. It goes like this. There was a widespread belief that reusability was just a technology problem. If we could build these objects with stable APIs, and abstract away private functions and expose only certain public functions, then we'd have reusability. That didn't quite work. The whole theory around reusability hinged on the idea that you could build components that were like bricks, and you could stack these bricks one on top of another and build complex applications the same way you would build a house or a building or a bridge or something like that.

The problem is, software is not like a brick. Software is squishy. There's no such thing for example as software that doesn't have a defect. When you try to reuse a component in a new context, you're inevitably going to trigger some defect that the original developers didn't know existed. Furthermore, underlying infrastructures are under this constant state of churn. You have a new enterprise platform or a new operating system every couple of years, and you need to put those components in that new environment. So one missing ingredient was that you need to have the source code to a component to really be able to reuse it. You really need to be able to jump in and fix a bug or adapt it to some new environment, and you need the right to be able to do that as well.

Inside most enterprises, getting the source code to something written by someone else in the enterprise is not a big controversial thing. The next thing that you need, though, is access to the community. If all you have is the source code, but you don't have any of the knowledge around how it was built, where it came from, and why it was designed this way, you want to be able to tap into the developers who wrote it. You also need to be able to tap into the community of other people trying to reuse that component: somebody who picked it up and tried to put it into a different environment and ran into some problems, and found a bug that the original developer says is actually a feature. Plugging into that is high value.

The third thing the open source community realised, or just tripped over, was that you need access to the context of the development. When you have an idea about a better way to do something, you need to be able to look at the archive of the mailing list to understand whether that idea was considered in the past. Or, if you are seeing some bizarre behaviour, you want somebody to help you understand it. Rather than going and bugging the developers, annoying them with questions that they've answered ten thousand times before, you need to be able to see if those questions have been asked already. You also want to see whether you are using the most current version, and whether there is another bug-fix release coming out or a version 2.0 or whatever.

It's those three things, code, context and community, that I think are the holy grail of reusability. We tried to capture those ideas and those concepts in CollabNet's product. You could think of the project hierarchy inside a CollabNet deployment as very much like that. You've got these components that are spread out among these different teams, and you've got access to all of the history of a project, all of the development artefacts, so you can see not just the end result but how it got there. For every project there are mailing lists for the developers, mailing lists for the users of the project, and mailing lists for announcements related to the project, so that you can plug into whatever level of involvement you want.

Tim: One of the things that perhaps holds back open source software is documentation, especially when it comes to things like tutorials or explanations of the "why" as well as the "how" when it comes to using a particular library. How can this be addressed?

Behlendorf: There's no doubt that it's something that hinders more widespread use of open source software, where the software might actually be plenty mature for people to use but there's not enough supporting documentation to cross that chasm. The reason for it is fairly clear. Often the developers who work on an open source project are there to solve their own immediate problems. Some of the more enlightened ones realise that solving that problem involves bringing more people to the project, and that bringing more people to the project involves having these kinds of intermediate layers of supporting documentation, even ways to use the code in easier ways. I think it depends on the nature of the project itself and the developers, whether they value end-user documentation or not.

Companies like O'Reilly provide a financial incentive to developers to create high quality documentation, and that's a really good thing. That has helped. With Subversion we realised early on that we would need to do something, so when O'Reilly realised that this project had legs and said, let's do a book around it, CollabNet said, let's have some of our own developers work on that book. We gave them the time and the space to do that, even though we had some very pressing deadlines and technology feature needs. Fortunately O'Reilly was pretty comfortable with the idea that the book would itself be part of the open source project.

Any time somebody mentions a limitation about open source, I turn around and say, well that's a business opportunity. Everybody wonders how are people going to make money with open source, and some open source projects sell the documentation. Or the developers go off and write books and get paid for that involvement, or they start consulting companies, or service companies. So yes, it's a limitation, but it's an opportunity too.

Tim Anderson's ITWriting
