Nearly 2,000 years ago the Roman historian Tacitus observed that “Victory is claimed by all,” or, as some translations put it, “Success has many fathers.” Open source is similar.
For example, Amazon Web Services (AWS) recently worked with Grafana Labs to build and launch a managed Prometheus service—two companies joining together to deliver the open source monitoring software as a cloud service. Sounds simple, right?
To get to that Prometheus service a number of different companies and open source communities played a part: SoundCloud, which gave birth to Prometheus; Hyperic, the inspiration behind Cortex (by way of Scope); and SpringSource, which recognised the business value in monitoring. Oh, and at the center of everything, Weaveworks, the company perhaps best known for GitOps but which also created Cortex.
Buried in the history of Cortex is a lesson in apportioning open source credit. The tl;dr? It’s complicated. It’s also diffuse. And it’s exactly how open source is supposed to work.
Money and open source
In the case of Cortex, it really started with money. Or, rather, the need to make money.
As Weaveworks CEO Alexis Richardson put it in an interview, “I remember having a conversation with Rod [Johnson, SpringSource CEO], and he said his biggest business regret in terms of the path SpringSource had taken was not acquiring Hyperic immediately and letting it grow in value…. That remark burned itself in my head.”
As Richardson tells it, it became clear to him that monitoring and management were the key to monetising open source. That conviction eventually led him to leave Pivotal, where he ran the Spring and vFabric products (not to mention others, like RabbitMQ), and start Weaveworks in 2014.
Richardson figured the Docker container world would play out similarly to the Spring application market: monitoring and management would matter. This gave rise to Weave Scope, a way of visualising Docker hosts, containers, and services in real time. Scope is a cool open source project. But I’m getting ahead of myself, because Scope didn’t start at Weaveworks. Not really.
While Weaveworks had been thinking through ways to map and visualise containerised applications, they interviewed former SoundCloud engineer Peter Bourgon and discovered he’d been working on the side on some ideas that closely matched their own.
While at SoundCloud, Bourgon teamed up with his friend David Kaltschmidt (then running his own company, Type Type Type) on a monitoring and visualisation solution that caught Richardson’s eye. Both joined Weaveworks, and their early monitoring work became Scope.
Scope was great, but could be better with metrics. Bourgon, fresh from his time at SoundCloud, steeped in Prometheus, agitated for the company to embrace Prometheus as a way to deliver those metrics.
The problem with adding Prometheus to Scope, however, was that someone had to pay to spin up an Amazon EC2 instance every time they wanted to see metrics for their app. It could get pricey. Weaveworks’ Cortex journey started with the need to make money (Prometheus!), and now the company needed to figure out how to save money to make its service palatable (not Prometheus!).
The company needed to try something different. It needed a service that looked like Prometheus but allowed Weaveworks to offer Prometheus-like metrics without incurring the cost of spinning up Prometheus instances whenever someone wanted to look at those metrics.
A different way to spin Prometheus
That “something different” was Cortex, which used a radically different design. Cortex, says Richardson, brings in “a bunch of code from Prometheus, but it’s more like Apache Cassandra in its approach.” That is, it takes a shared-nothing, sharded approach: data is spread across different points in the system, and those points do not need to cooperate with one another to get the data stored. The Weaveworks service looked like Prometheus and could connect to it, using the same style of metrics, but was architected differently.
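To make the shared-nothing, sharded idea concrete, here is a minimal sketch in Python of how a series might be assigned to a node by hashing alone. It is an illustration under stated assumptions, not Cortex's actual code: the names `HashRing`, `series_key`, `stable_hash`, the node names, and the tenant and label values are all hypothetical. The point it shows is that any writer can work out where a series belongs without asking the other nodes.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    """Map a string to a 32-bit integer that is stable across runs."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def series_key(tenant: str, labels: dict) -> str:
    """Build a canonical key: tenant plus label pairs sorted by name."""
    pairs = ",".join(f"{name}={value}" for name, value in sorted(labels.items()))
    return f"{tenant}/{pairs}"


class HashRing:
    """Toy hash ring (hypothetical): each node owns several tokens on the ring."""

    def __init__(self, nodes, tokens_per_node=64):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the node owning the first token at or after the key's hash."""
        index = bisect.bisect_left(self._tokens, stable_hash(key))
        # Wrap around to the start of the ring if the hash is past the last token.
        return self._ring[index % len(self._ring)][1]


# Hypothetical nodes and series, for illustration only.
ring = HashRing(["node-a", "node-b", "node-c"])
samples = [
    ("tenant-1", {"__name__": "http_requests_total", "job": "api"}),
    ("tenant-1", {"__name__": "http_requests_total", "job": "web"}),
    ("tenant-2", {"__name__": "up", "job": "api"}),
]
placement = {
    series_key(tenant, labels): ring.owner(series_key(tenant, labels))
    for tenant, labels in samples
}

for key, node in placement.items():
    print(f"{key} -> {node}")
```

Because the owner is a pure function of the series key and the node list, two writers holding the same node list reach the same answer independently. A real system would also need replication and a way to handle nodes joining and leaving, which this sketch leaves out.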
Furthering this “Where does open source credit begin and end?” theme, the Weaveworks design team included design lead Tom Wilkie, who is now at Grafana Labs, where he continues his Cortex development (shades of Kelsey Hightower’s “same team; different companies”).
But he came to Weaveworks with significant Cassandra experience, which helped shape the Cortex design. That team also included Julius Volz, the co-founder of Prometheus while at SoundCloud, and a current Weaveworks advisor.
The company founders thought that Prometheus would be a success as Kubernetes took off, and Cortex would enable a SaaS monetisation path. However, the reality is that developing both the technology and the business took time. In conversation, Bryan Boreham, Weaveworks Distinguished Engineer, stressed just how much effort went into guiding Cortex to its current success:
“Cortex has taken years and gone through quite extensive rewrites and re-architectures. It’s like the ship of Theseus. The timbers have been replaced a few times. It’s been in continuous production for years at Weaveworks. The basic gear was sound, but in terms of the NoSQL side of it, we’re on schema version 11. We rethought and rethought and rethought and, in fact, have mostly replaced the NoSQL stuff with something patterned more after Prometheus’ own time-series database.”
As promising as Cortex was at its inception, it was also noticeably slower than Prometheus, despite running on a “vast, distributed system,” notes Boreham. Weaveworks put in significant work to speed it up (adding caching, Jaeger tracing to spot slowdown points, etc.), and Cortex has run faster than Prometheus for at least the last two years.
Richardson identifies four key elements that boosted the company’s efforts to turn Cortex’s promise into reality. The first is that the work has been tied to running a real service for the public.
“Running a Cortex service to provide Prometheus on demand definitely taught us a lot about [how to operate it in the real world],” says Richardson. The second? “It solves a real problem. Doing this in other ways is very difficult.”
Third, Weaveworks and the burgeoning Cortex community, which now included engineers actively participating from DigitalOcean and elsewhere, made a conscious decision to decouple Cortex from its original DynamoDB/Amazon S3 back end. “This made it more useful because it can be run in more places and doesn’t tie you completely to AWS.” And fourth? “The magic of open source.” Weaveworks got Cortex into the CNCF, giving it visibility and much higher traction.
Today that magic means that Weaveworks, still one of the most significant contributors to Cortex, no longer needs to pull all the weight. Grafana Labs is now the largest Cortex contributor (Weaveworks is second), but a host of others have stepped up too, including Red Hat, Robinhood, Splunk, and DigitalOcean. This is a sign of the remarkable “magic” of open source, as well as Weaveworks’ successful catalysing of a true open source community governance model, with no single vendor in control.
This is how open source is supposed to work. Projects like Loki (an open source logging project inspired by Prometheus) or Envoy originate at one company, get further developed at another, and get operationalised and enhanced at many others.
Loki was conceptualised by Tom Wilkie at Weaveworks but developed at Grafana Labs. Envoy was started by Matt Klein at Lyft, but Google has become the single largest contributor over the past year. We see the same pattern playing out for many other open source projects.
It’s not a bug. It’s a feature. It’s the clearest indication that you’re doing open source right.