AWS scales AI stack with new chips, models, and agents
AWS's full-stack AI strategy integrates Trainium3 compute, the Nova 2 model family, autonomous frontier agents, AgentCore governance, and sovereign AI Factories.
Modern artificial intelligence (AI) is advancing along three fronts: the highly specialised hardware that powers training and inference, the sophistication of the frontier models themselves, and the software layers necessary for governing autonomous, enterprise-scale applications.
Speaking to a packed room of tech enthusiasts at AWS re:Invent 2025 in Las Vegas on December 2, Amazon Web Services (AWS) Chief Executive Officer Matt Garman unveiled a comprehensive set of AI innovations spanning all three domains during his keynote address.
Foundation of compute
The pursuit of efficient, massive-scale AI necessitates continuous advancements in custom silicon. AWS is heavily invested in its custom accelerator technology, exemplified by the latest Trainium3 UltraServers, which are now available. These servers form the backbone of AWS’s strategy to deliver strong performance for AI training and inference.
Trainium3 UltraServers are built around AWS’s first AI chip developed using 3nm technology. A single integrated system can incorporate up to 144 Trainium3 chips, providing up to 4.4 times more compute performance and 4 times greater energy efficiency than the previous-generation Trainium2 UltraServers.
These servers deliver up to 362 FP8 PFLOPs. Such significant gains directly address the escalating demands of model complexity, allowing organisations to cut lengthy model training times from months down to weeks.
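As a rough back-of-envelope check on these figures, the short Python sketch below divides the quoted system peak by the maximum chip count and applies the quoted 4.4 times gain to a training run. The 90-day baseline is hypothetical, and the sketch assumes training time shrinks in direct proportion to peak compute, which real workloads rarely achieve.

```python
# Figures quoted in the article.
ULTRASERVER_FP8_PFLOPS = 362   # peak FP8 PFLOPs per Trainium3 UltraServer ("up to")
CHIPS_PER_ULTRASERVER = 144    # maximum Trainium3 chips in one system
COMPUTE_GAIN = 4.4             # "up to" gain over Trainium2 UltraServers

# Assumption for illustration only: not a figure from AWS.
BASELINE_TRAINING_DAYS = 90


def per_chip_pflops(total_pflops: float, chips: int) -> float:
    """Peak PFLOPs attributable to one chip if the total is split evenly."""
    return total_pflops / chips


def ideal_training_days(baseline_days: float, speedup: float) -> float:
    """Training time under ideal, perfectly linear scaling (an assumption)."""
    return baseline_days / speedup


if __name__ == "__main__":
    chip = per_chip_pflops(ULTRASERVER_FP8_PFLOPS, CHIPS_PER_ULTRASERVER)
    days = ideal_training_days(BASELINE_TRAINING_DAYS, COMPUTE_GAIN)
    print(f"Peak FP8 per chip: {chip:.2f} PFLOPs")
    print(f"Ideal training time: {days:.1f} days ({days / 7:.1f} weeks)")
```

Under these assumptions, a run of roughly three months drops to about three weeks, which is the order of magnitude behind the "months down to weeks" claim.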
In performance testing using the open weight model GPT-OSS, Trn3 UltraServers demonstrated 3 times higher throughput per chip and 4 times faster response times than their predecessors.
The commitment to hardware advancement continues with the planned Trainium4. This next-generation chip is being designed to deliver substantial performance improvements, including at least 6 times the processing performance (FP4), 3 times the FP8 performance, and 4 times more memory bandwidth.
Trainium4 is also being designed to support the integration of NVIDIA NVLink Fusion high-speed chip interconnect technology. This future integration aims to allow Trainium4, Graviton, and Elastic Fabric Adapter (EFA) to operate seamlessly together within common MGX racks, establishing a flexible, cost-effective, rack-scale AI infrastructure supporting both GPU and Trainium servers.
In the broader context of the AI hardware market, AWS maintains a strategy of offering both proprietary and third-party solutions. AWS supports the latest NVIDIA Grace Blackwell and the next-generation NVIDIA Vera Rubin platforms. The partnership between AWS and NVIDIA dates back 15 years.
Frontier models
Beyond compute infrastructure, AWS has introduced the Nova 2 family of models, reinforcing its foundation model portfolio and offering competitive price performance across critical AI capabilities.
The model family includes four distinct offerings designed for different enterprise needs.
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads. It is multimodal, processing text, images, and videos to generate text. Users can adjust the depth of the model's step-by-step thinking to balance intelligence with speed and cost. According to AWS, Nova 2 Lite scores equal to or better than Claude Haiku 4.5 on 13 out of 15 benchmarks, GPT-5 Mini on 11 out of 17, and Gemini Flash 2.5 on 14 out of 18.
Nova 2 Pro is positioned as Amazon's most intelligent reasoning model. It is built for highly complex tasks such as agentic coding, long-range planning, and sophisticated problem-solving where maximum accuracy is essential. Nova 2 Pro is multimodal, processing text, images, video, and speech to generate text. It can also act as a “teacher” for knowledge distillation into smaller, domain-specific models. Its results are competitive: it scores equal to or better than Claude Sonnet 4.5 on 10 out of 16 benchmarks, GPT-5.1 on 8 out of 16, Gemini 2.5 Pro on 15 out of 19, and Gemini 3 Pro Preview on 8 out of 18.
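The benchmark counts quoted for Nova 2 Lite and Nova 2 Pro use different denominators, which makes them hard to compare at a glance. The sketch below simply converts each count into a percentage. The counts come from the article; the data layout and function name are illustrative. Note that "equal or better" includes ties, so these are not outright win rates.

```python
# (benchmarks where Nova is equal or better, benchmarks compared), as quoted.
RESULTS = {
    "Nova 2 Lite": {
        "Claude Haiku 4.5": (13, 15),
        "GPT-5 Mini": (11, 17),
        "Gemini Flash 2.5": (14, 18),
    },
    "Nova 2 Pro": {
        "Claude Sonnet 4.5": (10, 16),
        "GPT-5.1": (8, 16),
        "Gemini 2.5 Pro": (15, 19),
        "Gemini 3 Pro Preview": (8, 18),
    },
}


def equal_or_better_share(count: int, total: int) -> float:
    """Percentage of compared benchmarks where Nova ties or leads."""
    return 100 * count / total


if __name__ == "__main__":
    for nova_model, rivals in RESULTS.items():
        print(nova_model)
        for rival, (count, total) in rivals.items():
            share = equal_or_better_share(count, total)
            print(f"  vs {rival}: {count}/{total} = {share:.1f}%")
```

On these numbers, Nova 2 Pro ties or leads on exactly half of the benchmarks against GPT-5.1 and on fewer than half against Gemini 3 Pro Preview.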
Both Nova 2 Lite and Nova 2 Pro feature built-in web grounding and code execution to anchor responses in current facts.
Nova 2 Sonic is Amazon’s specialised speech-to-speech offering. It unifies text and speech understanding and generation for real-time, human-like conversational AI. Key features include high accuracy, expanded multilingual support with expressive voices, and a one-million-token context window for sustained interactions. It is optimised for integration with customer service applications such as Amazon Connect.
Nova 2 Omni is a unified multimodal reasoning and generation model that can process text, images, video, and speech inputs while simultaneously generating both text and images. Nova 2 Omni eliminates the complexity and cost of connecting multiple specialised models by handling massive inputs, including up to 750,000 words, hours of audio, long videos, and hundred-page documents.
To facilitate deep customisation, AWS introduced Nova Forge, which pioneers “open training”. This service allows organisations to create their own optimised model variants, called “Novellas”.
Nova Forge grants exclusive access to pre-trained, mid-trained, and post-trained Nova model checkpoints, enabling customers to blend their proprietary data with Amazon Nova-curated datasets at every training stage. It also offers tools for training AI using simulated scenarios and generating smaller models via synthetic data distillation.
Nova Act is a new service for building and deploying highly reliable AI agents that automate browser-based UI workflows. Powered by a custom Nova 2 Lite model trained through reinforcement learning, Nova Act achieves 90% reliability on early customer workflows, according to AWS.
Autonomous operations
Building on model capabilities, AWS unveiled frontier agents, a new class of sophisticated AI agents designed to function as an extension of a software development team. These agents are defined by being autonomous (figuring out how to achieve a goal), scalable (performing multiple simultaneous tasks), and independent (operating for hours or days without constant human intervention).
Three such agents were announced, focused on transforming the software development lifecycle.
Kiro autonomous agent acts as a virtual developer. It maintains persistent context across sessions and continuously learns from pull requests and feedback. It can handle tasks ranging from triaging bugs to improving code coverage, with changes potentially spanning multiple repositories. Kiro integrates with existing team tools like Jira, GitHub, and Slack to maintain context.
AWS Security Agent functions as a virtual security engineer. It embeds deep security expertise throughout development, proactively reviewing design documents and scanning pull requests against specific organisational security requirements. Crucially, it transforms penetration testing into an on-demand capability that matches development speed. The agent returns validated findings alongside remediation code to fix issues.
AWS DevOps Agent aims to deliver continuous operational improvement. It instantly responds to incidents, using its knowledge of the application and component relationships to find the root cause. It learns from diverse resources, including code repositories, CI/CD pipelines, and observability tools such as Amazon CloudWatch, Dynatrace, Datadog, New Relic, and Splunk.
Governing autonomy
The increasing autonomy of AI agents necessitates strong governance and control mechanisms. Amazon Bedrock AgentCore is the underlying platform designed to help developers securely build, deploy, and scale production-ready AI agents.
To ensure agents operate within defined corporate rules, AWS launched Policy in Amazon Bedrock AgentCore in preview. This feature allows development teams to establish clear boundaries for agent actions using natural language instead of complex policy code. Policy actively blocks unauthorised actions through real-time, deterministic controls that function outside of the agent’s code.
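The idea of a deterministic control that sits outside the agent's code can be illustrated with a small gate that every tool call must pass through. This is a minimal sketch, not the AgentCore Policy API: the `Rule` and `PolicyGate` classes, the `issue_refund` tool, and the refund limit are all hypothetical, and the natural-language rule is represented here by a hand-written Python predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a tool name, a readable description, and a check."""
    tool: str
    description: str
    allows: Callable[[dict[str, Any]], bool]


class PolicyGate:
    """Checks every tool call against fixed rules before it is executed."""

    def __init__(self, rules: list[Rule]) -> None:
        self._rules = {rule.tool: rule for rule in rules}

    def check(self, tool: str, args: dict[str, Any]) -> tuple[bool, str]:
        rule = self._rules.get(tool)
        if rule is None:
            # Default deny: tools without a rule are never executed.
            return False, f"no rule permits tool '{tool}'"
        if not rule.allows(args):
            return False, f"blocked by rule: {rule.description}"
        return True, "allowed"

    def call(self, tool: str, args: dict[str, Any], impl: Callable[..., Any]) -> Any:
        allowed, reason = self.check(tool, args)
        if not allowed:
            raise PermissionError(reason)
        return impl(**args)


def issue_refund(order_id: str, amount: float) -> str:
    """Stand-in for a real tool the agent might invoke."""
    return f"refunded {amount:.2f} on {order_id}"


if __name__ == "__main__":
    gate = PolicyGate([
        Rule("issue_refund", "refunds must not exceed 100",
             lambda args: args.get("amount", 0) <= 100),
    ])
    print(gate.call("issue_refund", {"order_id": "A1", "amount": 40.0}, issue_refund))
    print(gate.check("issue_refund", {"order_id": "A2", "amount": 500.0}))
```

Because the gate runs the same check for the same input regardless of what the model generates, the agent cannot talk its way past a rule; it can only request a call that is then allowed or refused.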
For quality and performance assurance, AgentCore Evaluations simplifies the monitoring of agent behaviour. It offers 13 pre-built evaluators for common quality dimensions, including correctness, helpfulness, safety, and tool selection accuracy. The service continuously samples live agent interactions to analyse performance against predefined criteria, triggering alerts when performance drops.
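The sample-score-alert loop described here can be sketched in a few lines. This is an illustration of the general pattern only, not the AgentCore Evaluations API: the `SampledEvaluator` class, the `tool_selection_score` scorer, the interaction fields, and the threshold values are all assumptions. Sampling is done on every n-th interaction so that the behaviour is easy to follow.

```python
from collections import deque
from statistics import mean
from typing import Any, Callable


def tool_selection_score(interaction: dict[str, Any]) -> float:
    """Hypothetical scorer: 1.0 if the agent picked the expected tool."""
    return 1.0 if interaction["tool_used"] == interaction["tool_expected"] else 0.0


class SampledEvaluator:
    """Scores every n-th interaction and alerts on a low rolling average."""

    def __init__(self, scorer: Callable[[dict[str, Any]], float],
                 sample_every: int = 10, window: int = 5,
                 threshold: float = 0.8) -> None:
        self._scorer = scorer
        self._sample_every = sample_every
        self._scores: deque[float] = deque(maxlen=window)
        self._threshold = threshold
        self._seen = 0

    def observe(self, interaction: dict[str, Any]) -> bool:
        """Record one live interaction; return True if an alert should fire."""
        self._seen += 1
        if self._seen % self._sample_every != 0:
            return False  # not sampled, so not scored
        self._scores.append(self._scorer(interaction))
        window_full = len(self._scores) == self._scores.maxlen
        return window_full and mean(self._scores) < self._threshold


if __name__ == "__main__":
    evaluator = SampledEvaluator(tool_selection_score, sample_every=2,
                                 window=2, threshold=0.75)
    good = {"tool_used": "search", "tool_expected": "search"}
    bad = {"tool_used": "email", "tool_expected": "search"}
    for interaction in [good, good, good, good, bad, bad]:
        if evaluator.observe(interaction):
            print("alert: tool selection accuracy dropped below threshold")
```

Waiting for a full window before alerting is a design choice that avoids firing on a single bad sample; a production system would likely use random sampling and several evaluators at once.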
Furthermore, AgentCore Memory has gained new episodic functionality. Agents capture structured episodes of context, reasoning, and actions, then apply the insights from those past experiences to future interactions, improving decision-making over time.
Localising AI infrastructure
Recognising that large enterprises and governments often face challenges regarding data sovereignty and regulatory compliance, AWS announced AWS AI Factories. This offering transforms existing customer data centres into high-performance, dedicated AI environments.
AWS AI Factories provide dedicated AWS AI infrastructure deployed directly within the customer's own data centres, operated exclusively for them. This architecture allows organisations to utilise their existing data centre space, network connectivity, and power capacity. AWS assumes the complexity of deploying and managing this integrated infrastructure.
The infrastructure combines the latest NVIDIA accelerated computing platforms, Trainium chips, high-speed networking, storage, databases, and security. It also includes AI services such as Amazon Bedrock and SageMaker AI.
Functionally, an AWS AI Factory operates like a private AWS Region, providing secure, low-latency access to services. This solution significantly accelerates deployment timelines and helps regulated industries meet requirements regarding where data is processed and stored.


