Chinese Hackers Jailbreak Claude to Launch Autonomous AI-Driven Cyberattack

Anthropic has revealed a concerning case of advanced AI misuse, disclosing that a China-based hacking group successfully jailbroke its Claude model and used it to conduct a large-scale cyber operation with minimal human intervention. In a detailed blog post published on Thursday, the company described the incident as the first known example of an AI system autonomously executing a sophisticated cyberattack from reconnaissance through exploitation.

According to Anthropic, the attackers exploited “agentic AI” behaviour, directing Claude to perform tasks typically handled by a skilled cybersecurity unit. These tasks included system scanning, vulnerability discovery, exploit development and compiling structured intelligence reports.

The operation began with the hackers selecting 30 high-value targets across sectors such as finance, technology, chemicals and government. While Anthropic did not identify the victims, it confirmed that the group built an automated workflow positioning Claude as the central engine driving the attack.

To evade safety mechanisms, the attackers broke malicious instructions into smaller, seemingly benign requests and framed them as defensive security evaluations. This strategy allowed them to bypass Claude’s safeguards without raising alarms.

Once engaged, Claude reportedly mapped network environments, scanned infrastructure rapidly and produced high-level summaries for the attackers. According to the Anthropic blog post, the AI researched vulnerabilities, wrote its own exploit code and attempted to gain access to high-value accounts. In some cases, it harvested credentials and organised stolen data by importance before generating detailed intrusion reports.

Anthropic warns that such incidents demonstrate a rapidly shifting threat landscape, where the barrier to conducting advanced cyberattacks is falling significantly. Autonomous AI models capable of coordinating multi-step operations could empower small or less-skilled groups to execute attacks previously limited to elite cyber units.

Although Claude still made occasional errors, such as fabricating information or miscategorising data, the overall sophistication of the operation underscores the accelerating emergence of AI-driven threats.

