Elon Musk's xAI launches Grok 4.1 with higher EQ and fewer hallucinations
xAI has rolled out Grok 4.1 across web, X, iOS and Android, reporting a 64.78% user preference over its predecessor, a top LMArena ranking, and notable drops in hallucinations, alongside new RL methods aimed at tone, personality and alignment.
Elon Musk's xAI has announced Grok 4.1, a major update to its chatbot that the company said is more emotionally perceptive, creatively capable and factually reliable than prior versions.
The model is available on Grok’s website, the X platform and the iOS and Android apps, with Auto mode enabling it by default.
According to xAI’s release, Grok 4.1 began rolling out immediately and can be explicitly selected from the model picker.
The company said it conducted a two‑week silent rollout from 1 to 14 November, exposing the model to progressively larger shares of live traffic on grok.com, X and the mobile apps before the broader release.
Benchmarks and user preference
xAI reported that, in blind pairwise tests on production traffic, Grok 4.1 is preferred 64.78% of the time versus the previous production model.
The company also claimed the new model topped LMArena’s Text Arena in its “Thinking” configuration (code name “quasarflux”) with a 1483 Elo score, while its non‑reasoning version (“tensor”) ranked #2 with 1465 Elo, surpassing other models’ full‑reasoning modes on the public leaderboard.
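For a sense of scale, the 18-point gap between the two reported scores can be converted into an expected head-to-head win rate. The sketch below assumes the standard Elo logistic formula with a 400-point scale; LMArena's actual rating methodology may differ, and the function name is illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: a 400-point scale, as in classic Elo. This is not taken
    from xAI's or LMArena's documentation.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores as reported in the article.
thinking_elo = 1483
non_reasoning_elo = 1465

p = elo_expected_score(thinking_elo, non_reasoning_elo)
print(f"Expected win rate of the Thinking configuration: {p:.3f}")
```

Under that assumption the Thinking configuration would be expected to win roughly 53% of direct comparisons against the non‑reasoning version, which suggests the two are close in rated strength.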
xAI stated it applied the same large‑scale reinforcement learning infrastructure used for Grok 4, then optimised Grok 4.1 for style, personality, helpfulness and alignment.
To handle non‑verifiable reward signals, the team said it used frontier “agentic reasoning” models as reward models to evaluate and iterate on responses autonomously at scale.
Reduced hallucinations
The company outlined a post‑training focus on lowering factual errors, particularly for information‑seeking prompts with web search.
xAI’s post presents the results as charts; reports indicated the underlying figures showed the internal hallucination rate dropping from 12.09% to 4.22% and the FActScore error rate falling from 9.89% to 2.97%.
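The reported before-and-after figures can be read either as absolute drops in percentage points or as relative reductions. The short sketch below works through the arithmetic using only the numbers quoted above; the helper name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    return (before - after) / before


# Percentages as reported in the article: (before, after).
figures = {
    "internal hallucination rate": (12.09, 4.22),
    "FActScore error rate": (9.89, 2.97),
}

for name, (before, after) in figures.items():
    absolute = before - after
    relative = relative_reduction(before, after)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.1%} relative")
```

On these figures, the internal hallucination rate falls by 7.87 percentage points (about 65% in relative terms) and the FActScore error rate by 6.92 points (about 70%).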
xAI also showcased improvements on EQ‑Bench3 and a Creative Writing v3 benchmark, alongside side‑by‑side examples of more nuanced, empathetic replies and stronger narrative voice. The firm said it is collaborating with benchmark authors to surface results on public leaderboards.


