Anthropic details safeguards to protect user well-being in Claude

Company highlights suicide and self-harm classifier, reduced sycophancy, and 18+ age gating in an 18 December 2025 update.

Friday, December 19, 2025, 3 min read

Anthropic has set out a new slate of protections to support people who turn to AI for emotional support, detailing product updates and test results aimed at safer conversations with its Claude assistant. In a blog post dated 18 December 2025, the company said its Safeguards team is focusing on how Claude handles discussions about suicide and self-harm, how it reduces sycophancy, and how it enforces an 18-plus age requirement.

Suicide and self-harm support moves from principle to product

According to the company, Claude is designed to respond with care, acknowledge its limitations, and point people towards human help when conversations raise concerns. Beyond training and policies, Anthropic has added a suicide and self-harm classifier that scans active chats on Claude.ai for signals that additional resources could help. When triggered, users see a crisis banner with options to contact trained professionals and country-specific helplines.

Anthropic said these resources are provided by ThroughLine, which maintains a verified network across more than 170 countries. The company has also begun working with the International Association for Suicide Prevention to convene clinicians, researchers, and people with lived experience, informing its product design, model training, and evaluation.

How the suicide and self-harm classifier works

The classifier is a small AI model that evaluates the content of a conversation in real time and flags moments involving suicidal ideation or fictional scenarios centred on suicide or self-harm. If a risk signal is detected, a banner appears that routes users to helplines or mental health professionals in their region.

Anthropic emphasised that Claude is not a substitute for professional care, so the system is designed to encourage human connection and support. The company also shapes responses through a public system prompt and reinforcement learning, and it stress-tests behaviour using a “prefill” method that drops newer models into challenging mid-conversation contexts.

What the internal tests show

In single-turn evaluations of clearly risky prompts, Anthropic reported high rates of appropriate responses from its latest models, including the Opus 4.5, Sonnet 4.5, and Haiku 4.5 family. The firm also said benign requests are rarely blocked, indicating the models are better at reading intent.

In multi-turn tests that simulate longer, more nuanced exchanges, Opus 4.5 and Sonnet 4.5 improved markedly over the prior Opus 4.1 generation. In harder “prefill” tests, where models must course-correct from problematic older conversations, the newer models showed substantial gains.

Tackling sycophancy and delusions

Anthropic defines sycophancy as telling users what they want to hear rather than what is true or helpful. The company said it has refined training and measurement since 2022 and that the 4.5 models show significantly lower sycophancy and less encouragement of user delusion than earlier releases.

It has open-sourced an evaluation tool called Petri so that external teams can compare models on these behaviours. In stress tests that replay older chats, results varied by model, reflecting a trade-off between warmth and firmness when the AI is expected to push back.

Age gates and detection

As younger users face heightened risks from chatbot interactions, Anthropic reiterated that Claude.ai is restricted to people aged 18 and above. During sign-up, users must affirm they are 18 or older. If someone self-identifies as a minor in a chat, the company’s systems flag the account for review, and Anthropic disables accounts confirmed to belong to under‑18 users.

Anthropic said it is developing a classifier to detect subtler signs that a user might be underage, and it has joined the Family Online Safety Institute to contribute to industry work on teen safety.
