Anthropic launches Claude Sonnet 4.5 with agentic AI and coding upgrades
Anthropic has unveiled Claude Sonnet 4.5, citing 30‑hour autonomous runs, stronger computer‑use skills and new state‑of‑the‑art coding results. Early reports also note a jump on OS‑level tasks and bundled tools for building agents on Claude Code.
Anthropic has introduced Claude Sonnet 4.5, positioning its latest model as a step change for long‑running “agentic” work and software development.
In internal and early‑access trials, the model reportedly operated autonomously for around 30 hours on a single brief, up from seven hours on earlier Claude 4 models, and delivered leading results in computer‑use and coding evaluations, according to the company and early reporting.
Stronger at long, multi‑step work on real computers
Anthropic has said Sonnet 4.5 is its strongest model yet for using a computer end‑to‑end, navigating the web, clicking, typing and handling files to complete multi‑step tasks.
In one internal test described to reporters, the model ran for about 30 hours to build a chat app of roughly 11,000 lines of code without human intervention. The company has also claimed the model is now “best in the world” for real‑world agents, coding and computer use.
On an operating‑system dexterity benchmark, Anthropic has reported a score of about 60%, up from roughly 40% for prior Claude models on similar tasks, a gain of about 20 percentage points.
What’s new for developers
- Paired releases: Anthropic has bundled Sonnet 4.5 with updated building blocks (virtual machines, memory and multi‑agent support) so teams can assemble their own agents on top of the Claude Code tooling.
- Focus sectors: Early partners have highlighted uses in cybersecurity, finance and technical research; design platform Canva has praised the model's performance on complex, in‑product engineering work.
- Availability: Sonnet 4.5 is available to Claude users from Monday, with an enterprise push backed by investors Amazon and Alphabet.
Agentic and coding performance
Media briefings emphasised two advances: sustained autonomy and higher reliability when acting through a computer.
Beyond the 30‑hour run, Anthropic has pointed to new state‑of‑the‑art results on leading software engineering evaluations, including SWE‑bench, reinforcing its pitch that Claude is ready for production agent workflows.
Sonnet 4.5 reportedly outperformed previous Claude releases on real‑repo bug‑fixing and on simulated desktop tasks, while tripling last year’s computer‑navigation proficiency. It has also been demonstrated building complete applications under its own steam, suggesting more robust scaffolding for long jobs than prior models.
Why it matters
Agentic reliability, meaning the ability to keep a model on task for hours while it searches, edits code, manipulates files and validates results, has been a limiting factor for practical AI software agents.
By extending continuous run‑time and improving OS‑level control, Sonnet 4.5 has pushed the category forward for enterprise use, particularly in regulated fields where reproducibility and auditability are central.


