May 10, 2026
GstechZone

Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts


Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that in pre-release tests involving a fictional company, Claude Opus 4 would often attempt to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has since done more work on that behavior, claiming in a post on X: “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail (in testing), where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.

