Big data is no longer esoteric; it’s key to much of the modern enterprise. Along with the ubiquity of S3, open source projects like Iceberg simplify storing and accessing data at scale, whether for analytics or agentic AI. But working with that data also requires new tools for querying at scale, like Trino.

That’s where Starburst comes in. Built by the core development team behind the Trino query engine (and one of the founders of Tabular), it offers an end-to-end data platform that spans massive lake houses built on Iceberg, running on-premises or in multiple clouds, as well as data outside your lake house, so you can run fast queries no matter where your data is stored.

Connectors allow you to link tables from multiple locations – including across clouds. That’s unique, claims CEO Justin Borgman.

“We can reach any data source”

As he tells The Stack: “We can reach any data source. AI is really only as good as the data it has access to and the metadata it has access to, and we have access to everything within the enterprise, everything that you connect to, but also we're able to deploy on premises.”

That’s useful in any industry, but many of Starburst’s customers are in specific verticals: financial services, healthcare and the public sector. What they have in common is the need to bring together various data sources at scale. Starburst’s ability to do this also allows financial services firms to build hybrid applications and data products inside their regulatory boundaries that can work with both cloud and on-premises data.

Up to 30% faster 

Because so many of the Trino founders and contributors are at Starburst, the company has the expertise to optimize the open source offering and to help customers who have begun to experiment with open source data platforms but want much more support. Borgman says enhancements like improved caching and indexing for repeat queries in the Enterprise release of Starburst make it 15 to 30% faster than the open-source release.

New features recently added to Starburst support massive AI applications. These include vector search: creating a vector index and storing it in Iceberg, so you can ground AI applications via retrieval-augmented generation (RAG) without leaving your lake house.
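
To make the idea of vector search concrete, here is a minimal sketch of the retrieval step behind RAG: rank stored embeddings by cosine similarity to a query embedding and return the closest matches. The document names, the three-dimensional vectors and the `top_k` helper are all invented for illustration; this is not Starburst's implementation, and a real index would live in Iceberg tables rather than a Python dictionary.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings; real ones would come from an embedding model.
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "privacy_notice": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

hits = top_k(query, index, k=2)
print(hits)
```

The documents returned by a search like this are what an AI application would be given as grounding context before it answers.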

“...that's paying dividends”

Trino’s heritage in the earlier Presto project (Trino was originally called PrestoSQL), used by Facebook, Airbnb, Twitter, and LinkedIn, is evident in its ability to scale. As Borgman points out, “Most technology companies run into the scalability needs much later in life: for us, it was the reverse and that's paying dividends today, because that performance is really proven out.”

The obvious competition is familiar names like Snowflake and Databricks, but Borgman suggests they can’t match the on-premises support and the ability to federate across multiple data sources.

Run a Trino cluster where you need it

Starburst doesn’t focus on storage, just querying with SQL. That lets you continue to use services like Amazon and Azure without incurring massive egress charges, because you can run a Trino cluster wherever you need to. If you want to move data, you can bring it together into an S3 bucket, but you can also do that virtually, letting Trino handle sending queries and getting data back, including moving data into Iceberg for you automatically if required.

Starburst uses Trino as a massively parallel processing engine, scaling in a cluster. It’s compute-optimized, handling much of its operation in-memory. The clusters scale from two or three machines to two or three hundred, and on up to thousands. At that scale, it’s more efficient to split clusters up, with Starburst tools routing queries to the appropriate cluster as needed.
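
As a rough illustration of what routing queries across several clusters involves, the sketch below picks a cluster from a simple rule. The cluster names, worker counts, the 50 GB threshold and the `route_query` function are all assumptions made up for this example; the article does not describe how Starburst's routing tools actually decide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    name: str
    workers: int

# Hypothetical clusters: a small one for quick lookups, a large one for heavy scans.
CLUSTERS = {
    "interactive": Cluster("interactive", workers=3),
    "batch": Cluster("batch", workers=300),
}

def route_query(source: str, estimated_scan_gb: float) -> Cluster:
    """Choose a cluster using an invented rule: small dashboard queries go to
    the interactive cluster, everything else goes to the batch cluster."""
    if source == "dashboard" and estimated_scan_gb < 50:
        return CLUSTERS["interactive"]
    return CLUSTERS["batch"]

print(route_query("dashboard", 2).name)
print(route_query("etl", 2).name)
```

Real routing would likely weigh more than two inputs, such as user groups or current cluster load, but the shape is the same: inspect the query, then hand it to the cluster best suited to it.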

Borgman compares the platform to a high-performance car. “Starburst adds the turbocharger, the rocketfuel, the elements that make the engine go even faster – that’s our caching technology and some of the other optimizations we've done from a performance perspective. The rest of the car: that's our ingest technology, our role-based access controls, our attribute-based access controls.”

Strong data control is important to Starburst’s customers, especially in Europe, where data sovereignty and privacy are key. The platform’s controls are fine-grained, offering data masking at the row and column level. “We can ensure the right data is going to the right person or the right application. That's a very popular reason why people choose our platform: the sophistication around these types of access controls.”
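
Row-level and column-level controls are easier to picture with a small example. The sketch below filters out rows a role may not see and masks a sensitive column in the rows that remain. The `Policy` class, the role name and the sample records are hypothetical and exist only to show the concept; they do not reflect Starburst's policy syntax or API.

```python
from dataclasses import dataclass

# Hypothetical policy model for illustration only.
@dataclass(frozen=True)
class Policy:
    role: str
    allowed_regions: frozenset  # row-level rule: regions this role may see
    masked_columns: frozenset   # column-level rule: columns to redact

def apply_policy(rows, policy):
    """Drop rows outside the allowed regions, then redact masked columns."""
    visible = []
    for row in rows:
        if row["region"] not in policy.allowed_regions:
            continue  # row-level filtering
        visible.append({
            col: "****" if col in policy.masked_columns else val
            for col, val in row.items()
        })
    return visible

rows = [
    {"customer": "Ada", "region": "EU", "account": "12345678"},
    {"customer": "Bob", "region": "US", "account": "87654321"},
]
eu_analyst = Policy("eu_analyst", frozenset({"EU"}), frozenset({"account"}))

result = apply_policy(rows, eu_analyst)
print(result)
```

Here the analyst sees only the EU customer, and even then the account number is hidden, while the source records stay untouched.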

Building applications on top of Starburst is as simple as using Trino’s SQL queries. But that’s only one way to use the platform; it’s compatible with business analytics tools like Tableau or Power BI – or even Python for data science users. Developers get access to familiar data protocols, including the venerable ODBC and JDBC. Starburst is also working on building connectors for AI applications, including using the new MCP and Agent2Agent protocols. 

It’s already shipping its own natural language data agent, ready for AI that relies on agent-to-agent communication: “We end up playing this role of the enterprise data agent: perhaps a finance agent, or some other specialized agent, reaches out to our agent to get the data that they need to perform their function.”

Some sophisticated customers already use Starburst to build SaaS “data apps”, calling its APIs as a layer sitting underneath custom analytics tools they can sell to their own customers. Trino’s scale and fast responses allow interactive access to large amounts of data without having to write new queries, applying filters to slice and dice live data to deliver insights users are looking for.

While Starburst currently counts multiple billion-dollar companies as customers, it’s planning to bring the same advantages to a wider market: large enterprises in the $100 million space. Organizations that don’t want to build and manage infrastructure can buy the hardware for running a lake house from Dell, with Starburst as the bundled software solution. With many enterprises having long relationships with Dell for their data centers, it’s an effective strategy for Starburst.

If you prefer to buy direct, you can use Starburst’s Galaxy SaaS cloud lake house or take an on-premises Enterprise license.

“Snowflake has certainly proven there’s a very large market in that mid-market segment,” Borgman notes: “we'd love to have an opportunity to compete for that as well.”

Delivered in partnership with Starburst. Check out some case studies
