Google says its framework for local AI inference, LiteRT, is getting closer to being able to run agentic workflows on edge devices cheaply, using the most popular open-source models.

Google rebranded its on-device AI runtime, TensorFlow Lite, as LiteRT in 2024, saying at the time that the runtime was evolving beyond its TensorFlow origins.

With LiteRT, Google said it was not building a platform for a single AI model, but a simple way for users to deploy the model of their choice on the edge device of their choice, without the friction common in cross-platform workflows.
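For a rough sense of what that looks like in practice, here is a minimal sketch of running inference with LiteRT's Python package, `ai-edge-litert`. The model path and the zeroed input are placeholders for illustration; a real deployment would load an actual `.tflite` model and feed it real data.

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter

# Load a compiled .tflite model (the file name here is a placeholder).
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input matching the model's expected shape and dtype.
dummy_input = np.zeros(input_details[0]["shape"],
                       dtype=input_details[0]["dtype"])

# Run a single inference pass on-device.
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```

The API mirrors the old `tf.lite.Interpreter` interface, which is part of the cross-platform pitch: the same model file runs under the same call pattern whether the backend is CPU, GPU, or (in preview) NPU.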

Fast forward and – albeit with some features still in preview – the Alphabet company says it is just about there, complete with agentic capabilities and even some NPU acceleration.

Oh, and it beats the pants off Meta's Llama.
