Claude 3.5 Sonnet: Anthropic's best AI surpasses GPT-4o

Did just Anthropic pull an uno reverse card? Meet Claude 3.5 Sonnet which is said to be their "most intelligent model yet"!

Monday June 24, 2024 , 4 min Read

Anthropic has made waves with the release of its best-ever AI model. After Anthropic unveiled its latest AI chatbot, there was immediate buzz online claiming that it surpassed the viral GPT-40. Although the company claims it is its "most intelligent AI model," the question remains: does it outperform competitors like OpenAI and Google? Let's explore its features in detail to determine how much better it is compared to other AI chatbots!

Meet Anthropic's Claude 3.5 Sonnet

Anthropic has debuted its latest chatbot, Claude 3.5 Sonnet, which is designed to outperform its previous top-tier model, Claude 3 Opus. Right now, this exciting new AI model is available for free on Claude.ai and the Claude iOS app. Apart from that, users can access Claude 3.5 Sonnet through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Key features of Claude 3.5 Sonnet

Now let's explore why is this chatbot gaining popularity and how it outsmarts OpenAI's GPT-4o!

1. High speed

Previously, Claude Opus held the crown for being the fastest AI LLM is now beaten. Anthropic's new version of Claude Sonnet 3.5 operates 2 times faster than Opus but that is not all. This boost in performance and cost efficiency has placed this new AI model to complete complex tasks such as context-sensitive customer support and producing multistep workflows according to the firm.

2. Reasoning and knowledge: Here's the part where GPT trails

AI models experimented with certain benchmarks to test their abilities. These include reasoning, coding, text and image generation, etc. Some of the common benchmarks are graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding evaluation (HumanEval).

Anthropic has stated that Claude 3.5 Sonnet has set a new record by outperforming major rival models like GPT-4o and Gemini 1.5 Pro. It scored 59.4% on GPQA, 88.7% on MMLU being on par with GPT-4o and 92% in Human Eval. Now, these benchmarks are industry-level and often taken as the ultimate proof for AI startups to say their AI models beat others.

However, such technical tests are highly specific with tasks, are not standardised and can even be biased. So, AI benchmarks need to improve but that is a debate for later. Coming back to Claude 3.5 Sonnet, the results of the tests show that this model surpasses GPT-4o in 8 benchmarks which include:

GPQA
HumanEval (code)
Multilingual Math (MGSM)
Reasoning over text (DROP test)
Visual Math Reasoning
Chart Q&A
Document Visual Q&A
Science Diagram

3. Better at sarcasm and humour

After Elon Musk launched Grok- an AI with humour, it started a trend that all other LLM models followed. While Claude Opus was already impressive in this space, Claude 3.5 Sonnet is quicker at understanding humour and sarcasm. These elements make AI models easy to chat with and get the work done which is why most LLMs are trying to become better at learning "the human jokes" and it is not a laughing matter (I hope you get it).

4. Visual reasoning strength boost

Anthropic has also revealed their new AI bot is better at visual reasoning. After making improvements, it can easily interpret charts and graphs. Moreover, it can transcribe text from imperfect images.

Important Note: Anthropic has clearly stated Claude 3.5 Sonnet is at AI Safety Level- 1 (ASL) and has fine-tuned the bot to reduce misuse and does not use customer data to train their AI model.

Anthropic unveils 'Artifacts'

Apart from the AI bot, Anthropic launched a new addition called Artifacts. With this new feature, you'll have a dedicated space for AI-generated content like code snippets, flowcharts and text documents. Now you can easily view, edit, and expand upon Claude's creations in real time. This will help to boost collaboration and streamline workflow. The startup even stated that UX and design teams can leverage this feature to iterate their prototypes. As of now, Artifacts is available in preview through the web version of Claude.

The bottom line

The AI race to be the best gets interesting day by day. With Anthropic's Claude Sonnet 3.5 benchmark achievements, competitors like ChatGPT and Gemini will need to step up their game.