Claude Mythos Preview: Anthropic's Frontier Model Withheld from the Public

On April 7, 2026, Anthropic announced Claude Mythos Preview as part of Project Glasswing. The model decisively outperforms Opus 4.6, scoring 93.9% on SWE-bench Verified and 94.6% on GPQA Diamond, among other agentic coding benchmarks, yet it remains withheld from public access because of its ability to autonomously discover thousands of zero-day vulnerabilities.

AI DayaHimour Team

April 11, 2026

On April 7, 2026, Anthropic announced Claude Mythos Preview as part of the Project Glasswing initiative. The model is classified as a general-purpose frontier model that is not available to the public; the project aims to use its capabilities to secure critical software worldwide.

Key Technical Specifications

Claude Mythos Preview has not been released for general use. Its advanced agentic capabilities allow it to autonomously discover and exploit security vulnerabilities. In the weeks leading up to the announcement, the model uncovered thousands of high-severity vulnerabilities across every major operating system and every major browser, as well as other critical software. Some of these vulnerabilities had existed for decades without being detected by prior human or automated testing.

Benchmark Performance

Anthropic conducted official evaluations of Claude Mythos Preview in April 2026. The results show a clear lead over Opus 4.6 on benchmarks for agentic coding and scientific reasoning.

Key Benchmarks — April 2026

GPQA Diamond (science):    94.6%
SWE-bench Verified (coding): 93.9%

Official evaluations also show leads on other specialized benchmarks:

Benchmark                        Mythos Preview   Opus 4.6
SWE-bench Pro                    77.8%            53.4%
Terminal-Bench 2.0               82.0%            65.4%
SWE-bench Multimodal (internal)  59.0%            27.1%
SWE-bench Multilingual           87.3%            77.8%
CyberGym                         83.1%            66.6%
OSWorld-Verified                 79.6%            72.7%

Quick Comparison with Competitors

Claude Mythos Preview outperforms Opus 4.6 on every benchmark listed, by margins of up to 31.9 percentage points (on SWE-bench Multimodal). The lead is especially pronounced on agentic tasks that require discovering and exploiting vulnerabilities without human guidance. No direct comparison data with other models such as GPT-5.4 or Gemini 3.1 is available at this stage, though Anthropic's internal results point to leadership in programming and cybersecurity capabilities.

Optimal Use Cases

Claude Mythos Preview is currently allocated to more than 40 partners in Project Glasswing, including Amazon Web Services, Apple, Google, Microsoft, NVIDIA, CrowdStrike, and the Linux Foundation. Partners use the model to audit core software and to find and patch zero-day vulnerabilities before they can be exploited. General use is not permitted, and even after the preview phase, access will remain restricted to defensive purposes.

Broader Context of Agentic Capabilities

Mythos Preview's capabilities reflect a broader trend toward frontier models that go beyond traditional tasks to full autonomy in complex environments. The model discovered a 27-year-old vulnerability in OpenBSD, a 16-year-old vulnerability in FFmpeg, and a series of Linux kernel vulnerabilities that allow complete system takeover. Such capabilities raise questions about the balance of risks and benefits in deploying models that surpass traditional human testing.

It remains an open question how long companies will need to develop the necessary security assurances before releasing models of this capability level for general use, and how that timeline will affect the pace of AI development in cybersecurity and programming.

Tags: Claude Mythos Preview, Anthropic, SWE-bench, Project Glasswing, artificial intelligence, 2026

Related Articles

Llama 4 Maverick: The Open‑Source Model That Shook the AI Throne in 2026 — A Comprehensive Analysis
Meta launches Llama 4 Maverick, a 400‑billion‑parameter MoE model with 16 billion active parameters, outperforming GPT‑4o in programming and mathematics at 90% lower cost. Has open‑source become the new king? (Apr 4, 2026)

GPT-5.4: OpenAI's Most Powerful Model That Combines Extended Reasoning and Autonomous Agents — A Comprehensive Analysis
OpenAI launches GPT-5.4 in March 2026 with a hybrid model that merges extended logical reasoning and autonomous agents. It excels in programming and complex analysis at $2/8 cost. Is it worth the hype? (Apr 2, 2026)

Sora 2 Pro: OpenAI's Flagship Cinematic Video Generation Model with Synchronized Audio and Professional‑Grade Resolution
Sora 2 Pro is the advanced version of the Sora 2 model launched by OpenAI on 30 September 2025, featuring high‑definition 1080p cinematic output, synchronized audio, and video durations up to 25 seconds, available exclusively to ChatGPT Pro subscribers. (Apr 4, 2026)