Anthropic's Mythos is evolving sooner than anticipated, studies AI security company

aiburst-gettyimages-2189115060 — Eugene Mymrin/ Second through Getty Photos

Observe ZDNET: Add us as a preferred source on Google.

ZDNET’s key takeaways

The most recent model of Claude Mythos has already superior.
Exterior researchers discovered that it achieved a number of firsts in testing.
AI capabilities could also be enhancing a lot sooner than anticipated.

Anthropic’s Claude Mythoswhich the corporate maintains is simply too highly effective to be launched usually, already seems to have gained new capabilities.

In a blog put up on Wednesday, the UK AI Safety Institute (AISI) reported that it had examined a more moderen model of Mythos, which outperformed each its earlier outcomes and OpenAI’s GPT-5.5 — only a month after Mythos’ preliminary launch.

Additionally: Apple, Google, and Microsoft join Anthropic’s Project Glasswing to defend world’s most critical software

“The newer Mythos Preview checkpoint accomplished each our cyber ranges, fixing the vary ‘The Final Ones’ in 6 of 10 makes an attempt and the beforehand unsolved ‘Cooling Tower’ in 3 of 10 makes an attempt,” the weblog authors wrote. “This was the primary time {that a} mannequin accomplished the second of our two cyber ranges.”

When Anthropic first introduced Mythos Preview and Undertaking Glasswing — the cybersecurity testing alliance it shaped with rival tech corporations and AI labs, to which it gave restricted entry to Mythos — final month, UK AISI evaluated itdiscovering that the mannequin “represents a step up over earlier frontier fashions in a panorama the place cyber efficiency was already quickly enhancing.”

That third-party perspective helped stability claims that the hype round Mythos was both solely advertising and marketing or, on the different finish, signaled a catastrophic shift in AI capabilities. The reality about what the mannequin can do is probably going someplace within the center.

Additionally: How to learn Claude Code for free with Anthropic’s AI courses – one took me just 20 minutes

AISI’s up to date check additionally exemplifies that functionality enhancements aren’t restricted to particular person mannequin releases, however can occur inside variations of a single mannequin.

A quickly accelerating cyber menace

AISI famous that AI fashions are quickly advancing of their means to deal with cyber duties, with critical implications for cybersecurity, particularly given Mythos’ knack for detecting software vulnerabilities.

“In February 2026, we internally estimated that the size of cyber duties AI fashions may full had doubled each 4.7 months since late 2024 – already an acceleration from our November 2025 estimate of 8 months,” the weblog authors wrote. “Since then, AISI reported on two new fashions, Claude Mythos Preview and (OpenAI’s) GPT-5.5which considerably exceeded each doubling fee traits.”

Additionally: The third major Linux kernel flaw in two weeks has been found – thanks to AI

The authors added that it is unclear whether or not that pattern will maintain or whether or not these findings point out a long-lasting improve. Mythos and GPT-5.5 may merely be notable breaks from the general sample of mannequin evolution.

Nonetheless, AISI clarified that there are a number of unknowns its testing couldn’t account for. The assessments capped duties at 2.5 million tokens, which let researchers higher examine efficiency outcomes over time. That inherently “understates what frontier fashions can do,” they wrote.

“Mythos Preview and GPT-5.5 have giant upper-bound error bars as a result of near-100% success charges on our slender cyber suite’s longest duties, even with the two.5M token restrict,” the weblog continued. “Our duties are additionally not lengthy sufficient to find out how sharply the fashions’ reliability would deteriorate at increased job lengths. This locations a few of the newest fashions on the restrict of what our slender check suite can measure.”

Additionally: I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

Whereas this makes the purpose of mannequin failure onerous to measure, it additionally means mannequin success charges on these duties could be a lot increased with out the token cap — so excessive, actually, that “time horizons change into unimaginable to calculate.” Fashions with extra token entry and complicated agent infrastructure could be far more succesful.

“A 2.5M token restrict is comparatively low — in our cyber vary experiment we use as much as 100M tokens and discover efficiency would seemingly nonetheless enhance past that funds, particularly for current fashions, which disproportionately profit from increased token limits,” the weblog added.

Source link

Login

Register

ZDNET’s key takeaways

A quickly accelerating cyber menace

Related posts