DeepSeek unveils ‘sparse attention’ model that cuts API costs by 50%
DeepSeek has introduced an experimental V3.2‑Exp model featuring “DeepSeek Sparse Attention” and says it has cut API pricing by more than 50%, aiming to make long‑context AI cheaper and more efficient for developers.
Chinese AI company DeepSeek has released an experimental large language model with a new “DeepSeek Sparse Attention” mechanism and says it has reduced its API pricing by “50%+”, a move aimed at improving long‑context efficiency while lowering developers’ bills.
The V3.2‑Exp model was announced on 29 September 2025 and described as an “intermediate step” towards the firm’s next‑generation architecture.
What is ‘sparse attention’?
Sparse attention techniques process only the most relevant tokens rather than every token in a sequence, easing the computational and memory load of standard Transformer attention, whose cost grows quadratically with context length.
Recent research co‑authored by DeepSeek and academic partners presented a “Native Sparse Attention” approach that is trainable end‑to‑end and aligned with modern hardware, reporting notable speed‑ups on long contexts while maintaining accuracy on general and long‑context benchmarks.
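DeepSeek has not published the inner workings of its new mechanism in this announcement, but the general idea behind sparse attention can be illustrated with a toy top‑k variant: each query token attends only to the handful of keys with the highest similarity scores, rather than to the whole context. The sketch below is a minimal NumPy illustration under that assumption; the function name and parameters are hypothetical and do not reflect DeepSeek's implementation.

```python
# Toy sketch of top-k sparse attention (not DeepSeek's actual implementation).
# Each query keeps only its k highest-scoring keys instead of attending to
# every token in the sequence.
import numpy as np

def topk_sparse_attention(Q, K, V, k=4):
    """Single-head attention where each query attends to only its top-k keys."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n_q, n_k) similarity scores
    kth = np.sort(scores, axis=-1)[:, -k][:, None]   # k-th largest score per query
    masked = np.where(scores >= kth, scores, -np.inf) # drop everything below top-k
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the surviving keys
    return weights @ V                               # (n_q, d_v) attended values

# Example: 8 query tokens over a 64-token context, each attending to only 4 keys.
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((8, 16)), rng.standard_normal((64, 16)), rng.standard_normal((64, 16))
print(topk_sparse_attention(Q, K, V, k=4).shape)  # (8, 16)
```

In production systems the savings come from never materialising the full attention matrix over a long context; this toy version still computes it densely and is only meant to show the masking idea.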
DeepSeek says the V3.2‑Exp release is more efficient to train and better at handling long sequences than previous iterations, with the new sparse attention mechanism intended to cut computing costs and improve certain types of performance.
The company has framed the release as an experimental waypoint en route to its next major architecture.
Pricing cut for developers
Alongside the model announcement, DeepSeek says it has lowered API prices by more than half. The company has previously experimented with pricing levers, including off‑peak discounts of up to 75% introduced in February 2025, signalling a continued push on cost competitiveness.
The firm has also released the model on developer platforms including Hugging Face.


