SPONSORED - Cassandra, Kafka, Pulsar, Spark, Kubernetes: all open source projects that have lively communities, and they underpin enterprise data infrastructure at some of the world’s biggest organisations.
The distributed database Apache Cassandra helps Netflix deliver the latest hit film to millions of screens; Goldman Sachs describes event streaming platform Apache Kafka as the backbone in its architecture; distributed streaming platform Apache Pulsar helps China’s Tencent process tens of billions of financial transactions; unified analytics engine Apache Spark is used by Amazon and NASA to analyse colossal datasets. Want to build a powerful data infrastructure? The open source software (OSS) community has built the tools.
For companies looking at how to develop their approaches around data, open source projects can also be combined to meet broader business and technology goals than any one project can address on its own. They can, in effect, create a powerful stack for data processing. One of the companies involved in the OSS community, DataStax, has spent the past year developing its own strategy around many of these projects to help enterprises understand - and ultimately capitalise on - this new stack approach.
“We found the balance”
DataStax has been on its own journey around this too. In late 2019 the company reset both its enterprise strategy and its relationship with the open source Apache Cassandra community. A new CEO, Chet Kapoor, was brought in from Google, with outgoing leadership admitting the company needed a “more ambitious, bold approach.”
The change over the past 18 months has been dramatic. DataStax has doubled down on giving back to the open source community via a range of projects designed to democratise developer access to Cassandra – arguably the world’s most scalable and resilient distributed database – and focused on reducing the cost and complexity of deploying Cassandra in the enterprise. It has also made strategic acquisitions, e.g. of Kesque, with the aim of becoming the go-to partner for enterprises wanting to adopt an open, multi-cloud stack that has been purpose-built for modern data applications.
The strategy, in short: to provide an open, developer-ready data stack that is not just a database. This stack should be able to run on-premises or in the cloud without being locked in to either. It should run across multiple clouds and support Kubernetes. And it should be powered and informed by a lively, supported open source community.
Patrick McFadin, vice president of developer relations at DataStax, remembers that shift to focus anew on OSS. As he puts it: “In my first meeting with Chet, he said 'I want you to work on re-engaging the open source community. We need to figure out how to get this right.’ It was a case of ‘drop everything else.’ That was one of my best days; it felt like being given a blank cheque.”
Since then, McFadin and his team have supported the donation of code to the Apache Cassandra project as well as launching innovative projects that are available to the community. The company has developed and open sourced K8ssandra, a project created to help developers deploy Apache Cassandra on Kubernetes, the orchestrator that is now central to most teams deploying, scaling, and managing containerised applications. It has also open sourced Stargate, a data gateway that sits between applications and the underlying database so that developers can work through JSON, REST, and GraphQL APIs without having to learn the Cassandra Query Language (CQL) or advanced data modelling and query design.
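To make the gateway idea concrete, here is a minimal sketch of the kind of translation such a layer performs: a JSON-style filter goes in, a parameterised CQL statement comes out. It is an illustration under stated assumptions, not Stargate's actual API or implementation. The function name `json_filter_to_cql` and the `shop.orders` keyspace and table are hypothetical, and a real gateway would also need to handle schema lookups, partition keys, and authentication.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical helper for illustration only; this is not Stargate's API.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain keyspace, table, or column name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def json_filter_to_cql(
    keyspace: str, table: str, where: Dict[str, Any]
) -> Tuple[str, List[Any]]:
    """Turn a JSON-style equality filter into a parameterised CQL SELECT."""
    target = f"{_check_identifier(keyspace)}.{_check_identifier(table)}"
    columns = sorted(where)  # fixed order so the generated query is deterministic
    query = f"SELECT * FROM {target}"
    if columns:
        clauses = " AND ".join(f"{_check_identifier(c)} = ?" for c in columns)
        query += f" WHERE {clauses}"
    # Values travel separately as bind parameters, never inlined in the query.
    return query, [where[c] for c in columns]


if __name__ == "__main__":
    # Keyspace, table, and column names here are made up for the example.
    query, params = json_filter_to_cql(
        "shop", "orders", {"status": "shipped", "customer_id": 42}
    )
    print(query)
    print(params)
```

The application developer only ever supplies the dictionary. The CQL text and the bind parameters are produced on their behalf, which is the convenience the gateway approach is meant to offer.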
And it’s taken Cassandra serverless with Astra – not open sourced yet, but it will be this year, the company promises. (“We don’t want to just do a code-dump”, says McFadin.) Why that focus on building bridges and building up the fully open source offering?
It’s a question the company gets asked a lot, and Ed Anuff, DataStax’s Chief Product Officer, has a ready answer: “When you look at Cassandra, it is the best of the internet coming together. We like the aspect of working together with smart people based in other companies, and we can solve problems that could not be solved in any other way. We believe it delivers technology that is better suited to customers, and we are committed to building on top of the open source community, to contributing back, and making Cassandra easier. We are not going to hide that behind a paywall - we want more developers to choose Cassandra.”
Anuff says that the company has taken the same approach with its other open source projects as well: “We created Stargate as an open source project, so anyone can use it. It was important that this was available as open source, as otherwise it will fragment the community. We open sourced K8ssandra, which lets Cassandra operate on top of Kubernetes.”
But why this investment in the community and open source, at a time when other firms are stepping away from open source or adopting licenses that make it more difficult for other companies to adopt projects for cloud? Anuff counters that this is about self-interest as well as altruism: “Why do we make this commitment to open source? We believe in this as it provides more insight and expertise, but that is not the only reason. DataStax succeeds when Cassandra succeeds. If we don’t support that community, help it to thrive and be successful, then our company won’t succeed. We need that engaged and successful community.”
DataStax's enterprise strategy shifts to deeper, broader data infrastructure offering
However, a stronger community focus was not the only goal. The bigger-picture shift in enterprise strategy has been to reset DataStax as a more integrated company that looks at data differently from other companies. “Kubernetes + data,” says McFadin.
Kubernetes’ importance as the de facto standard for orchestrating cloud-native applications has been widely noted. Target CIO Mike McNamara has been among the digital leaders recently evangelising its centrality to avoiding vendor lock-in: his team has built an architecture that lets them fluidly run ecommerce and in-store workloads on three different platforms as needed, with the overall aim, as he put it, of abstracting infrastructure “… away from application developers. So instead of an application developer worrying about how many cores they need, how much memory they need, how much heap space they need, anything like that, I wanted them to focus on writing features and functions.”
DataStax saw this trend early, according to its team. “Cassandra always had a nice match with Kubernetes,” muses Anuff. “That whole idea of the underlying hardware being commodity; nodes can just die and rejoin the cluster; that was one of the key things that set Cassandra apart, and a similar philosophy underlies Kubernetes. Today, Kubernetes has been chosen as how we all scale - it is the operating system for the data centre. So we have to connect into this as well.”
However, it’s not just about a happy marriage with Kubernetes: “A lot of people think they know Cassandra, but they don’t. If you look at the open source ecosystem, what is taking place and where we are the most active, you have Cassandra 4.0 approaching general availability. If you look at how Java deals with memory, there are lots of developments there that Cassandra can take advantage of. We are looking at benchmarks that are 20% faster. Cassandra is heading in lots of great directions, and the development work the community has been doing has pushed the project on leaps and bounds.”
CEO Chet Kapoor describes the focus as one on delivering a comprehensive stack that supports cloud-native apps. “Because the amount of data we're generating is going to increase, the diversity of data that we're generating is going to increase; it's going to move to the edge, it's going to move more into our mobile devices, there's going to be applications we can't even dream of yet. And we want to be able to support them. And we are seeing a lot of our customers using Cassandra with something like Kafka or Pulsar together to build an application data infrastructure. We’ve made phenomenal progress, bringing together experts in APIs, in cloud, in data… [and] extending beyond the database layer.”
“What we have seen is the creation of all these things because people want to build new apps, powered by data,” continues Anuff. “They want to drive engagement at the point of engagement through mobile, websites and applications. These apps are more sticky for customers. Digital transformation means that all companies can have those rich, data-driven conversations with customers, connecting to all the data that is relevant.”
“When we talk about things in this way, it is high-level CIO speak - from a developer perspective, it means gathering and using data in your applications. There are a whole new set of technologies coming up for this as well. And if you look at eCommerce, mobile, the requirement today is scale. When developers are putting these apps together, they aren’t thinking about a database in particular, they are thinking about how to achieve what they want around data. For us, this is an opportunity to make it very easy for enterprises and users to build more sophisticated data infrastructures with the open source technologies that are emerging, and there is no database that can support that better than Cassandra,” he adds.
Kapoor sums up this approach as providing the full stack that companies will need in future: one that keeps up with growing data while still helping enterprises avoid lock-in. “DataStax is delivering the open, multi-cloud stack purpose-built for modern data apps. It’s serverless, separating storage and compute; it has simple and powerful APIs, a Kubernetes-based unified control plane, and enterprise security and governance; and all built on an open source core.”
“And we’re just getting started.”