Google names LiteRT its universal framework for on‑device AI
Company graduates advanced GPU and NPU acceleration into production, promises simpler tooling and better GenAI support.
Google has named LiteRT its universal framework for on-device artificial intelligence, marking a major evolution of the TensorFlow Lite lineage. In a post dated 28 January 2026, the company said advanced acceleration capabilities have graduated into the production stack and are available to all developers. The move aligns with Google’s broader push to run more AI locally on consumer devices, an agenda championed by CEO Sundar Pichai.
Context and continuity for developers
LiteRT began in 2024 as a refreshed runtime for deploying machine learning across phones, laptops and embedded systems. Google positions it as the default path for on-device inference while preserving continuity through the familiar .tflite model format and the interpreter API. For new projects that aim to tap modern accelerators, a CompiledModel API offers a more direct route to GPU and NPU performance without bespoke delegate wiring, according to the company.
Hardware acceleration and performance
The headline upgrades centre on cross-platform GPU support. Google states that LiteRT now spans Android, iOS, macOS, Windows, Linux and the web, with backends that include OpenCL, OpenGL, Metal and WebGPU. The company reports average gains of about 1.4 times over the legacy TensorFlow Lite GPU delegate, helped by asynchronous execution and zero-copy buffer interoperability that trims CPU overhead. Sample apps cited by Google show latency reductions for real-time use cases such as background segmentation and speech recognition.
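To put the 1.4 times figure in latency terms, a speedup factor can be converted into a per-inference time and a percentage reduction. The sketch below is only arithmetic: the helper names and the 35 ms baseline are hypothetical and do not come from Google's post, and it assumes the reported gain applies uniformly to end-to-end inference time.

```python
def latency_after_speedup(baseline_ms: float, speedup: float) -> float:
    """Return the latency implied by a speedup factor over a baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


def latency_reduction_pct(speedup: float) -> float:
    """Return the percentage latency reduction implied by a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


# Hypothetical baseline: 35 ms per frame on the legacy GPU delegate.
baseline_ms = 35.0
new_ms = latency_after_speedup(baseline_ms, 1.4)
print(f"{new_ms:.1f} ms per frame")                   # 25.0 ms per frame
print(f"{latency_reduction_pct(1.4):.1f}% lower latency")  # 28.6% lower latency
```

In other words, a 1.4 times average gain corresponds to roughly 29 per cent less time per inference, not 40 per cent.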
On NPUs, Google has introduced a unified workflow that abstracts vendor-specific SDKs and helps manage fragmentation across chip variants. LiteRT supports both ahead-of-time compilation and on-device compilation, giving teams a choice between instant start-up and broader portability. Initial production-ready integrations cover MediaTek and Qualcomm. For select workloads, Google’s figures indicate that NPU backends run up to 100 times faster than CPU and up to 10 times faster than GPU. On Android, distribution can be handled through Google Play for On-device AI, which manages delivery of models and runtime components to compatible devices.
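The trade-off between the two compilation modes can be illustrated with a small selection routine. This is a sketch under stated assumptions, not LiteRT's API or Google Play's delivery logic: it assumes ahead-of-time artefacts are built per chip, while a generic model compiled on the device trades start-up time for portability. The names `ModelChoice` and `pick_model`, the chip identifiers and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelChoice:
    path: str
    compile_mode: str  # "aot" or "on_device"


def pick_model(device_soc: str,
               aot_variants: Dict[str, str],
               generic_model: str) -> ModelChoice:
    """Prefer a precompiled artefact for this chip, else compile on device."""
    aot_path = aot_variants.get(device_soc)
    if aot_path is not None:
        # A precompiled artefact exists for this chip: no first-run compile.
        return ModelChoice(aot_path, "aot")
    # Unknown chip: ship the portable model and compile it on the device.
    return ModelChoice(generic_model, "on_device")


# Hypothetical chip identifiers and file names, for illustration only.
variants = {"soc_a": "model_soc_a.bin", "soc_b": "model_soc_b.bin"}
print(pick_model("soc_a", variants, "model.tflite"))
print(pick_model("soc_z", variants, "model.tflite"))
```

The first call resolves to the precompiled artefact for `soc_a`; the second falls back to the generic `.tflite` file because no variant was built for `soc_z`.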
How does LiteRT change on-device AI for developers?
Practically, LiteRT aims to reduce friction from research to production. PyTorch, TensorFlow and JAX models can be converted into the .tflite format and executed through the runtime, which handles accelerator selection and manages input and output buffers. For generative AI, Google highlights a stack that includes a Torch Generative API for PyTorch models, LiteRT-LM for orchestration, and the core converter and runtime. The company says LiteRT-LM already powers Gemini Nano deployment in products such as Chrome and Pixel Watch.
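Accelerator selection of the kind described above usually amounts to walking a preference list and taking the first backend the device offers. The following sketch shows that idea in isolation. It is not LiteRT's implementation: the function name, the backend labels and the NPU, GPU, CPU preference order are assumptions made for illustration.

```python
from typing import Iterable, Sequence

# Assumed preference order: fastest backend first, CPU as the last resort.
DEFAULT_PREFERENCE = ("npu", "gpu", "cpu")


def select_accelerator(available: Iterable[str],
                       preference: Sequence[str] = DEFAULT_PREFERENCE) -> str:
    """Return the most preferred backend that the device reports."""
    available_set = {name.lower() for name in available}
    for backend in preference:
        if backend in available_set:
            return backend
    raise RuntimeError("no supported backend available")


print(select_accelerator(["CPU", "GPU"]))         # gpu
print(select_accelerator(["cpu", "gpu", "npu"]))  # npu
print(select_accelerator(["cpu"]))                # cpu
```

Keeping the CPU at the end of the list means a model still runs on devices that lack a supported GPU or NPU, only more slowly.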
Google’s benchmarks emphasise deployment of popular open-weight models. The announcement lists the Gemma family alongside Qwen, Phi and FastVLM as supported options, with pre-converted artefacts available for quick experimentation. In one example, the company reports that on a Samsung Galaxy S25 Ultra, LiteRT outperformed a common community baseline on CPU and GPU, and gained further acceleration on NPU for prefill workloads. While performance varies by device and model size, the aim is consistent: lower latency and lower power for on-device GenAI.


