As CIO budgets tighten, Apache Superset looks like a Big Data winner 

Free data exploration and visualisation tool continues to gain traction

As CIOs look to make budgets go further, whilst coming under pressure to deliver data-led innovation, Apache Superset is becoming an increasingly important toolkit in many enterprise IT leaders' arsenals.

The free, open source data visualisation and data exploration platform was born at Airbnb, where it serves 600+ daily active users viewing 100K+ charts a day (a user base that would cost ~$300k in licenses for common SaaS alternatives like Tableau). Other blue chip users include American Express, Netflix, and Yahoo!

Apache Superset provides massively scalable, free data visualisation

What is Apache Superset? Apache Superset hosting, data viz

Apache Superset is an open source web application (built on Python) that lets enterprise users such as business analysts or data scientists connect to live data, then create charts and dashboards based on it.

It has a no-code front-end, i.e. no actual Python is needed to use it.

It is well maintained, intuitive to use and comes with out-of-the-box support for a wealth of databases, from Amazon Redshift to Exasol, MySQL, PostgreSQL, SQL Server, Teradata and beyond: “Pretty much any databases that have a SQLAlchemy integration should work perfectly fine”, as its community of committers says.

(To be clear, Superset itself doesn't have a storage layer to store your data. It pairs with your data store.)
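In practice, a SQLAlchemy integration means the database is addressed with a connection URI of the general form dialect+driver://user:password@host:port/database. The snippet below is a minimal sketch of how such a URI is put together, using only the Python standard library. The helper name build_sqlalchemy_uri, the hostnames and the credentials are all hypothetical, and the exact dialect and driver names depend on which database package is installed.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database, driver=None):
    """Assemble a SQLAlchemy-style URI: dialect[+driver]://user:password@host:port/database."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    # Percent-encode the credentials so characters such as "@" or "/" do not break the URI.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Hypothetical connection details, for illustration only.
postgres_uri = build_sqlalchemy_uri(
    "postgresql", "analyst", "p@ss/word", "db.example.internal", 5432, "sales",
    driver="psycopg2",
)
mysql_uri = build_sqlalchemy_uri(
    "mysql", "analyst", "secret", "mysql.example.internal", 3306, "sales"
)

print(postgres_uri)
print(mysql_uri)
```

A string of this shape is what an administrator would typically supply when registering a new database with a SQLAlchemy-based tool.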

Apache Superset supported databases
Pick any of these or more...

Why Apache Superset was created

Apache Superset was created at Airbnb by Maxime Beauchemin, who has written that his “main driver to start the project at the time was the fact that Tableau (our main data visualization solution at the time) couldn’t connect natively to Apache Druid and Trino / Presto, our data engines of choice... With Tableau’s ‘Live Mode’ misbehaving in intricate ways at the time (I won’t get into this!), we were steered towards using Tableau Extracts [which] crumbled under the data volumes we had at Airbnb, creating a whole lot of challenges.

“Secondarily”, he noted in 2021, “we had a limited number of licenses for Tableau, and generally had an order of magnitude more employees that wanted/needed access to our internal than our contract allowed…”

(Hell hath no creative fury like a clever developer running up against a budget for limited seats…)

What is Apache Superset and what can I do with it?

Apache Superset provides a no-code interface for building charts; an API for programmatic customisation; a web-based SQL editor for advanced querying; a lightweight semantic layer for defining custom metrics; a range of visual templates from bar charts to geospatial visualisations; a cloud-native architecture and highly extensible security and authentication options for wary administrators. With an Apache 2.0 licence, it’s free to use.

Apache Superset is the main data exploration tool at Dropbox, which consolidated 10 tools into Superset and has written in some detail about that process. It says it prioritised security first (given the access the tool has to critical data stores), then "a shallow learning curve, good documentation, and good support", all of which it found.

With Tableau Explorer costing from $42/user per month and other enterprise options also not cheap, companies looking to democratise data visualisation with large numbers of licenses can find it gets costly fast.
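To see how quickly per-seat pricing adds up, here is a rough back-of-the-envelope calculation using the $42/user per month figure and the 600 daily active users quoted for Airbnb earlier. It assumes every user needs a paid seat at list price with no volume discount, which is a simplification, and the function name is purely illustrative.

```python
def annual_licence_cost(users, price_per_user_per_month):
    """Annual cost in dollars for per-seat licensing billed monthly."""
    return users * price_per_user_per_month * 12


# 600 daily active users is the Airbnb figure quoted in this article.
airbnb_scale_users = 600
tableau_explorer_cost = annual_licence_cost(airbnb_scale_users, 42)

print(tableau_explorer_cost)  # 302400
```

That lands in the same region as the ~$300k estimate cited above, though whether that estimate was derived this way is an assumption.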

Yes, you will need to self-host if you want to keep Apache Superset free, but it is cloud-native, works well in containers and there are hosted versions available, including one from its creator, over at Preset, at $20/month per user (unlimited users) with multi-region support, role-based access controls, single sign-on and so on.

The project is “cloud-native”, Superset emphasises, in that you can choose the “web server (Gunicorn, Nginx, Apache); metadata database engine (MySQL, Postgres, MariaDB, etc); message queue (Redis, RabbitMQ, SQS, etc); results backend (S3, Redis, Memcached, etc); caching layer (Memcached, Redis, etc)”.
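Several of those choices are made in a Python configuration file, conventionally named superset_config.py. The following is a minimal sketch of what such a file might look like with Postgres as the metadata database and Redis for caching and the message queue. All hostnames, credentials and database names are hypothetical, and the settings shown should be checked against the Superset documentation for the version in use. The web server is chosen when the application is launched rather than in this file.

```python
# superset_config.py (sketch): hostnames, credentials and database names are hypothetical.

# Metadata database: where Superset stores its own dashboards, charts and users.
SQLALCHEMY_DATABASE_URI = (
    "postgresql://superset:superset@metadata-db.example.internal:5432/superset"
)

# Caching layer, expressed as Flask-Caching style settings.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://redis.example.internal:6379/0",
}


class CeleryConfig:
    # Message queue used to hand asynchronous queries to worker processes.
    broker_url = "redis://redis.example.internal:6379/1"
    # Where Celery records task state. Superset's separate RESULTS_BACKEND
    # setting for query results is omitted here because it takes a cache
    # object from a third-party library.
    result_backend = "redis://redis.example.internal:6379/2"


CELERY_CONFIG = CeleryConfig
```

Swapping Redis for RabbitMQ or Memcached would, under the same assumptions, mostly be a matter of changing these URLs and the cache type.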

For smaller enterprises happy with their Tableau or Power BI it may not be worth the shift, but for larger enterprises keen to give growing analyst or data science/engineer teams access to visualisation and data exploration tools, Apache Superset is a very compelling option indeed.
