Posco

Some Difficulties with Open Source Software

In 1993 I was at Georgia Tech and first learned of a free version of Unix I could install on my PC. Linux, I had learned from a good friend, was free, if I could find a way to download it. I didn’t find a way, or the time, to install Linux until three years later in graduate school. For around 15 years after that point I used Free Software almost exclusively.

I have always been a bit of a utopian type when it comes to the Internet and software. In 2001 I protested at Dianne Feinstein’s office against the arrest of Dmitry Sklyarov for violating the DMCA, a law I felt then, and still feel, places unconstitutional limits on the freedoms of programmers, as it criminalizes actions regardless of whether they infringe copyright. I remember finding a local showing of the film Revolution OS. I released the software [1, 2] I wrote during my post-doc work as GPLv2.

When I left academia and entered industry, one of my main reasons for joining Twitter was that they were active in releasing code as open source software. Shortly after I joined, I worked on many projects that were successful inside Twitter and enjoyed some level of adoption outside of Twitter, e.g. Scalding, Summingbird, Algebird, Bijection, Chill, Storehaus.

I left Twitter in 2015 to join Stripe, but I have stayed active in OSS, including co-maintaining the Bazel Scala rules and writing a tool to interop Maven repositories with Bazel, a pretty printing library for Scala, and a DAG rewriting library. Additionally, for a time, I was an active contributor and maintainer of cats.

I think it is fair to say I have a lot of experience with free and open source software.

Recently, I have begun to feel fairly discontented in these efforts. It’s entirely possible I am simply burnt-out, but when I look at this body of work, I am disappointed that very few of the projects I worked on seemed to generate a sustaining community around themselves. I am overwhelmed by the burden of housekeeping for so many projects. I am exhausted by making cases on pull requests for why things should, or should not, be merged. I am saddened by discussions which turn into heated attacks. I think this is a reality faced by many OSS efforts.

Very often you will toil, the thanks will be few and far between, and the stings of the complaints, entitlement of users, fights with other authors, and the drudgery of maintaining and publishing usable software will wear you down. While there have been times of active collaboration, it too often feels like infighting more than achieving a shared goal as a team.

I got interested in OSS for the intellectual appeal: people learning and sharing what they had learned; for the promise of efficiency: the marginal cost of software is zero, so we can change the world with Free Software; for the camaraderie: working with other great engineers towards shared goals; and for the chance to advance the state of the art of software engineering. The reality hasn’t quite met my hopes.

I’m not the only person, by any means, to express concerns about OSS.

Chris Wensel is the author of Cascading, an early and influential library in the big data space (and the key enabler for the Scalding project, on which I’ve worked so much). Chris has many tweets expressing the challenges of sustainable OSS development, and how our current system has very skewed incentives.

Wes McKinney, well known for his hugely impactful work on the Pandas library for data analysis with Python, has also been thinking about how to sustain OSS Python development. His tweets express many frustrations with the funding and staffing of OSS development.

There are many pains associated with developing and maintaining open source software. The above tweets and links can give you a start in hearing first hand some of the challenges. A proper catalog of discontent would be its own essay, but a short summary is that the work involves a lot of drudgery and less help than is imagined, is prone to insider skirmishes, and is usually thankless and sometimes abusive.

A recent security issue with the Node.js event-stream module underscores an important point. An exhausted maintainer transferred a module to a malicious actor who used it to run code on perhaps millions of web browsers. Nothing prevents this from happening on a daily basis. I’m concerned that our current model is not sustainable except with a constant supply of idealists through which we must burn. Enough people are intrinsically motivated to write software that OSS can exist, but what is the shelf life of that contribution?

The Free Software movement which attracted me in the 90s has largely won. Not all software is Free/OSS, but key infrastructure is: virtually all the most commonly used programming languages, the core of the top web browsers, and key projects like Linux, Hadoop, Spark, Kubernetes, Envoy, and Kafka, to name a few relevant to my work. Even Microsoft is in on the game: in 2017 Microsoft was the top contributor on GitHub. And in 2018 Microsoft bought GitHub, which serves a vast library of mostly freely reusable code.

Yet we still struggle to maintain software, to fund key software infrastructure, and to find effective processes to work together. A famous example was the 2015 headline: “The World’s Email Encryption Software Relies on One Guy, Who is Going Broke”. After that article, the software’s author got some support for a few years. This is not a repeatable process. Even if we could publish many such articles, I imagine they would have a smaller and smaller effect.

Perhaps we have still not sorted out whether independent OSS is viable at a large scale, or whether it only survives in fits and starts with corporate support. Looking back at the top 2017 GitHub contributors, the top 5 were Microsoft, Google, Red Hat, IBM and Intel. Corporate OSS is moving forward, but in my experience it isn’t very close to the model of wide collaboration that we envision. Generally, the corporate model tends toward over-the-wall development, where a primary backer shares code, but outside contributors are second-class citizens and face much higher friction than those inside the sponsor company.

Some will argue there is no problem. People are going to be frustrated and quit, but that’s a natural part of the cycle. I disagree with that view. I think we would all be better off, more productive, and more prosperous if we could find more effective ways of working together.

It is easy to imagine that optimally collaborating on shared infrastructure would benefit everyone, but we face a prisoner’s-dilemma-type problem: withholding some or all of your development effort in the hope that others will do the work seems rational if defectors of this sort are not punished at all. I don’t think this defection from the shared effort is actually rational when long enough time-scales (say a year or more) are taken into account, but many businesses are run with a huge focus on quarterly performance.

As an example of this seeming irrationality or market failure, it was only in 2017 that NumPy received direct funding. NumPy is one of the most popular libraries for Python, which is one of the most popular programming languages. NumFOCUS has an excellent article on how NumPy was built on the personal sacrifices of a few, and on how we don’t currently have a sustainable model of OSS development.

Some are working towards new economic models for OSS. Tidelift is one of the most interesting in this space. Acknowledging the above and other challenges to OSS, they have built a subscription model. Companies that depend on OSS can easily fund the development of the software they use and receive security advisories for their dependencies. Tidelift uses the subscription fees to directly fund the projects. Others have taken this approach directly on Patreon (e.g. Li Haoyi) or using OpenCollective (e.g. Fody).

While it may be the burn-out talking, without some change, I am losing sight of the meaning in the struggle. Once stripped of the meaning, OSS is merely unpaid labor, or a never-ending set of homework problems. Yet perhaps my attachment to my idea of OSS is the actual source of my suffering. People, and these processes, are what they are. Looking for utopian results from a human process is perhaps foolish. I don’t have strong answers at the moment. I’ll be thinking about it more. I think everyone in the OSS community should think about what we should strive towards for the next 20 years. I do know, for me, it is time I reset my expectations and reexamine my motivations.