Snowflake is launching what amounts to a managed, secure Kubernetes service that will allow customers to run containerised applications inside its data platform for the first time – massively expanding the scope of workloads and third-party tools that can be brought to customers’ data.
“Snowpark Container Services” is essentially a runtime option with GPU support that lets customers run container images – including ML models and LLMs – inside Snowflake, alongside their governed data. (Behind the scenes, it is a managed Kubernetes cluster.)
“You can take any container that you have, give it to Snowflake, and we'll run it,” as Snowflake Director of Product Jeff Hollan puts it simply. A host of companies supporting cutting-edge data analytics and artificial intelligence application-building workflows have signed up to participate.
The service abstracts away the platform and resource management complexity facing those building ML or AI deployments, and lets those workloads run on data hosted “inside” Snowflake. The move is a significant one: it opens up the potential for customers to bring more operational as well as analytical data to the company, which is rapidly moving beyond its original data analytics warehouse raison d'être.
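To make the "hand Snowflake a container" idea concrete, here is a minimal sketch of the kind of workload a customer might package up: a tiny HTTP inference endpoint that would normally be built into a container image and deployed behind the service. Everything here is illustrative – the stub model, port, and handler names are invented for the example and are not Snowflake's API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a real ML model: here, it just sums the features."""
    return {"score": sum(features)}

class InferenceHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST body like {"features": [1, 2, 3]} and
    returns the model's prediction as JSON."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

def serve(port=8080):
    """Blocks; in a container this would be the image's entrypoint."""
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

In practice the model, its dependencies, and a server like this would be baked into an image and pushed to a registry – the point of the new service being that Snowflake then runs the result next to the data, rather than the data travelling to the model.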
“Snowpark Container Services” is currently in private preview. It will give users the “flexibility to build in any programming language and deploy on broader infrastructure choices, including the NVIDIA AI platform for optimized acceleration,” the company said on June 27 – the latter capability the result of a new partnership with NVIDIA, announced on Monday.
That’s good news for those wanting to run AI workloads on Snowflake without exposing proprietary data, and a major step towards the company’s vision of being a truly multifaceted data platform: one that lets data science workloads and applications themselves read and write to its data warehouse, rather than a Wild West of applications piping data in and out of different services, governance, and compliance regimes.
Early adopters in the private preview include Alteryx, Astronomer, Dataiku, Hex and SAS. Others gearing up to deliver their products and services to customers via Snowpark Container Services include AI21 Labs, Amplitude, geospatial and location data specialist CARTO, H2O.ai, Kumo AI, vector database Pinecone, RelationalAI, Weights & Biases, and more, Snowflake said on June 27.
Snowflake and Kubernetes? "A very flexible way to..."
Snowflake co-founder Benoit Dageville told The Stack in a Q&A session at its annual Summit in Las Vegas: "This is a very flexible way to deploy containerised code inside Snowflake. You can create a compute pool and specify which type of compute you want to run these services... [Before we take it to public preview] we want to make it very seamless for our partners; security and governance have to play well... We [want to] guarantee that even if you deploy bad code you cannot exfiltrate data."
The company has been making a major, if somewhat belated, play for data science workloads. Snowpark – its framework for developers to write code in their preferred language and run it directly on Snowflake – gained Python support in 2022, for example, letting customers run Python where their data sits. The new Snowpark Container Services aims to put a rocket under that drive to attract developers and data scientists to the platform.
Why is Kubernetes good for AI/ML?
As Lee Hylton, CTO at Blue Sentry Cloud, has put it tidily: "When we look at ML deployments, there are a ton of different platform and resource considerations to manage, and CI/CD teams are often managing all of these resources across a variety of different microservices (i.e., Docker containers) – while simultaneously dumping code into Git and deploying regularly. It’s a nightmare.
"On the infrastructure side, things aren’t much better. You have all of these different environments, systems, and stages (e.g., data ingestion, analysis, transformation, splitting, modeling, training, serving, logging, etc.) to contend with. So, you build out all of these containers, leverage deep learning solutions like TensorFlow, and create these amazing microservices that allow you to embrace the principles of CI/CD. Now you have to manage your avalanche of microservices to enable those machine learning workflows you’ve always dreamed about.
"This is where Kubernetes comes into play. Not only does Kubernetes help you orchestrate microservices at scale (with must-haves like load balancing, autoscaling, and ingress management), but it provides superb failover protection to ensure that workloads run smoothly across all microservices. While this is immensely helpful for app development, it can also help you orchestrate complex and scale-ready machine learning workloads using tools like Kubeflow. In other words, Kubernetes abstracts some of the infrastructure layer, allows ML workloads to take advantage of containerized GPUs, and standardizes your data source ingestion."
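The failover behaviour Hylton describes boils down to a reconciliation loop: declare the workers you want, and a controller replaces any that die. The toy sketch below illustrates that idea in miniature – all names are invented for the example, and this is a conceptual stand-in for what Kubernetes does at scale, not Kubernetes itself.

```python
import random

class Worker:
    """A toy "microservice" that occasionally crashes."""
    def __init__(self, name):
        self.name = name
        self.alive = True

    def run_step(self):
        if random.random() < 0.3:  # simulate an occasional crash
            self.alive = False

class Supervisor:
    """A reconciliation loop in miniature: replace any dead worker,
    the way a Kubernetes controller replaces failed pods to match
    the declared desired state."""
    def __init__(self, names):
        self.workers = {n: Worker(n) for n in names}
        self.restarts = 0

    def reconcile(self):
        for name, worker in self.workers.items():
            if not worker.alive:
                self.workers[name] = Worker(name)  # restart it
                self.restarts += 1

def run_simulation(steps=10, seed=0):
    """Drive a small ML-style pipeline of workers through crashes
    and reconciliations; returns the supervisor for inspection."""
    random.seed(seed)
    sup = Supervisor(["ingest", "train", "serve"])
    for _ in range(steps):
        for worker in sup.workers.values():
            worker.run_step()
        sup.reconcile()
    return sup
```

However many workers fail along the way, the declared set is running again after each reconciliation pass – the declarative property that makes this model attractive for sprawling ML microservice estates.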
Or as DataStax's Patrick McFadin earlier put it to The Stack: “Kubernetes is the equalizer. It allows users to declare what they need from the parts supplied by cloud providers. It's getting us closer to infrastructure conforming to the application and not the other way around."
The partnership with NVIDIA means that NVIDIA AI Enterprise – which includes over 100 frameworks, pretrained models, and development tools like PyTorch for training and Triton Inference Server for production AI deployments – will also be available via Snowpark Container Services.
As Snowflake’s Jeff Hollan puts it to The Stack: “It’s ‘here's a container, Snowflake; you run it securely with my data.’ Then we do the heavy lifting. We're managing the cluster, we're doing the updates…it comes bundled with logging, a container registry that is securely hosted... Everyone's cluster is entirely run and governed in their Snowflake account.”
Taking a step back, he adds: “When you come to Snowflake and say ‘I want to run this data query; give me details on my top 100 customers’, what we're doing behind the scenes is – whatever cloud of your choice you’re on – going in serverlessly, dynamically spinning up compute resources that happen to be specialised for doing data queries.
“Conceptually, the container thing is running in the same way. The only difference is, instead of saying ‘okay, give me the top 100 customers in a data query’ – where we spin up compute resources to go execute that – you say, ‘oh, I need to go run this machine learning model, or this LLM’. It still goes through the same pipeline, still goes through the same control plane; all of those pieces are the same.
"But now when it grabs that compute, instead of grabbing the specialised query warehouse, it's just grabbing compute that is running Kubernetes.
“That is [often going to be in a] hybrid mode: ‘Okay, pull in the data processing here from the warehouse, stream that data into your container over here’ – and those actually work together as part of a single query; behind the scenes, we are leveraging two forms of underlying compute.”
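The hybrid mode Hollan describes can be sketched as one logical query whose first stage runs on warehouse-style compute and whose second stage runs in a user container, with rows streamed between them. The toy below illustrates only the shape of that plan – the function names, sample data, and "churn model" are invented for the example and have nothing to do with Snowflake's actual internals.

```python
def warehouse_stage(rows):
    """'Warehouse' compute: a SQL-like filter and projection."""
    for row in rows:
        if row["spend"] > 100:
            yield {"customer": row["customer"], "spend": row["spend"]}

def container_stage(rows, model):
    """'Container' compute: apply a user-supplied model to each row."""
    for row in rows:
        yield {**row, "churn_risk": model(row["spend"])}

def hybrid_query(rows, model):
    # One logical query chaining both forms of compute, with rows
    # streamed lazily from one stage to the next as generators.
    return list(container_stage(warehouse_stage(rows), model))
```

Because both stages are generators, rows flow through without materialising an intermediate dataset – a loose analogue of warehouse output being streamed into the container as part of a single query plan.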
The managed Snowflake Kubernetes service preview comes as the company also took its Native App Framework to public preview on AWS.
It is now available for developers to build and test applications natively inside Snowflake. The company says that “this native deployment and distribution model reimagines the traditional approach of copying data to apps, instead bringing the work to the data by enabling apps to run inside an end user’s existing Snowflake account” – with tools like Custom Event Billing (public preview) and on-platform monetization (GA) through Snowflake Marketplace already baked in, so that companies can distribute apps without having to set up cost-intensive billing systems.
Snowflake co-founder Benoit Dageville suggested that getting the Kubernetes/container service to public preview was dependent on really being ironclad on security: “Moving AI inside Snowflake is critical,” he said. "We want to make Snowflake the iPhone for data applications."
Stay tuned for more from Snowflake Summit.
n.b. It took The Stack quite a lot of questioning to get here. The company’s press release does not mention “Kubernetes” once – seemingly because it wants to put a big emphasis on its traditional strengths in user experience and abstracting away complexity. Data scientists already have enough complex tooling to work with; adding Kubernetes to the list may frighten the horses and give the impression that they will need to wrangle with an infrastructure layer, seems to be the thinking <shrug emoji/>