Muse Spark: When Meta Betrays Open Source and Rejoins the Race from Behind
After the Llama 4 scandal and Alexandr Wang's ground-up rebuild, Meta launches its first closed-source model Muse Spark in an existential bet on the future of AI.
AI DayaHimour Team
April 11, 2026
Between the April 8, 2026 announcement and the April 2025 scandal, only twelve months had passed for Meta. But they were enough to change everything: the architecture, the strategy, the ambition, and the company's very identity.
On that day, Meta unveiled Muse Spark, the first model produced by Meta Superintelligence Labs (MSL), the research arm established in mid-2025 after one of the most critical moments in the company's technical history. The model has been available for free via meta.ai and the Meta AI app since the moment of announcement, with a rollout across WhatsApp, Instagram, Facebook, Messenger, and Ray-Ban smart glasses planned for the coming weeks.
What makes this launch an event beyond a mere new model announcement is not its technical capabilities alone, but the decision embedded in it: Muse Spark is entirely closed-source. No downloadable weights, no self-hosting, and no open preview of any kind. This announcement, simply put, marks the end of an entire era for Meta.
From Avocado to Muse Spark: Nine Months from Scratch
To understand what this launch represents, one must return to April 2025. Meta launched Llama 4 to reactions it did not anticipate. Quickly, independent researchers discovered that the version the company submitted to the LM Arena platform was not the same version available to the public — but a version specifically optimized to improve benchmark results. Later, additional investigations revealed that Meta had privately tested 27 different variants of Llama 4 and selected the best-performing one for submission to the platform.
The scandal was costly for a reason deeper than mere deception: Meta had built its AI reputation on the discourse of transparency and open source. Llama 3 was the star of the open community. Llama 4 turned that into a public relations nightmare. According to later investigations, the Llama 4 team resorted in the final training stages to mixing test data into training data, leading to severe overfitting.
The $14.3 Billion Deal
In June 2025, Mark Zuckerberg made a radical decision: he spent $14.3 billion for a non-voting 49% stake in Scale AI and brought its founder, Alexandr Wang, in to assume the newly created position of Chief AI Officer, the first in the company's history. The deal valued Scale AI at over $29 billion. According to reports, Wang kept his board seat at Scale AI while transitioning to Meta.
In August 2025, Wang announced via an internal memo that MSL would be divided into four divisions: AI Research, Superintelligence Research, Product Development, and Infrastructure. The dissolution of the AGI Foundations team and redistribution of its members was also announced, further centralizing authority around Wang. In November 2025, Yann LeCun — one of AI’s “godfathers” — left Meta after 12 years to establish an independent lab focused on “world models.”
What Is Muse Spark, Actually?
Muse Spark is a natively multimodal model — not merely a text model later fine-tuned to process images, but designed from the ground up to integrate text, images, and audio as inputs within a single unified framework. The model natively supports tool-use, visual chain of thought, and multi-agent orchestration. The context window reaches 262,000 tokens.
What truly distinguishes Muse Spark is not these features per se — most have become the “minimum bar” for any leading model in 2026 — but the core design philosophy. According to Wang, the team rebuilt everything from scratch: the architecture, the training data pipeline, the computational infrastructure — including the massive Hyperion data center.
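Meta has published no public API for Muse Spark; access is currently limited to a private preview. Purely as an illustration of what the announced feature set implies (multimodal inputs, declared tools, the 262,000-token context), here is a hypothetical request builder in which every endpoint, field, and model identifier is invented for the sketch:

```python
# Hypothetical sketch only: Meta has released no public API for Muse Spark,
# so the model identifier, field names, and payload shape below are invented
# to illustrate what a natively multimodal, tool-using request could look like.
import base64
import json

def build_request(text: str, image_path: str, tools: list[dict]) -> str:
    """Bundle text, an image, and tool declarations into one payload,
    mirroring the 'single unified framework' the announcement describes."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "model": "muse-spark",          # invented identifier
        "max_context_tokens": 262_000,  # the announced context window
        "inputs": [
            {"type": "text", "content": text},
            {"type": "image", "content": image_b64},
        ],
        "tools": tools,                 # declared up front for native tool-use
    }
    return json.dumps(payload)

# A single declared tool, mirroring the announced native tool-use support.
tools = [{"name": "web_search", "parameters": {"query": "string"}}]
```

The point of the sketch is the design it mirrors: text, image, and tool declarations travel in one request rather than being bolted onto a text-only interface.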
Benchmarks: A Complex Picture
Official Benchmarks — Muse Spark vs Competitors — April 2026
Source: Official Meta AI Blog. Dashes mark scores not reported; benchmark names appear only where the accompanying text identifies them.

| Category | Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Opus 4.6 | Grok 4.2 |
|---|---|---|---|---|---|---|
| 📷 Multimodal | — | — | 82.8 | 80.2 | 65.3 | — |
| 📷 Multimodal | — | — | 81.2 | 83.9 | 77.4 | — |
| 📷 Multimodal | — | — | 61.1 | 72.4 | 62.2 | — |
| 📷 Multimodal | — | — | 41.0 | 29.0 | — | — |
| 🧠 Text / Reasoning | — | — | 43.9 | 45.4 | 40.0 | — |
| 🧠 Text / Reasoning | — | — | 52.1 | 51.4 | 53.1 | — |
| 🧠 Text / Reasoning | — | — | 92.8 | 94.3 | 92.7 | — |
| 🧠 Text / Reasoning | LiveCodeBench Pro | 80.0 | 87.5 | 82.9 | 70.7 | — |
| 🧠 Text / Reasoning | ARC-AGI-2 (largest gap) | 42.5 | 76.1 | 76.5 | 63.3 | — |
| 🏥 Health | HealthBench Hard | 42.8 | 40.1 | 20.6 | 14.8 | 20.3 |
| 🏥 Health | — | — | 77.1 | 81.3 | 64.8 | — |
| 🤖 Agentic | DeepSearchQA | 74.8 | 73.6 | 69.7 | 73.7 | — |
| 🤖 Agentic | — | — | — | 80.6 | 80.8 | 76.7 |
| 🤖 Agentic | — | — | 75.1 | 68.5 | 65.4 | — |
| 🤖 Agentic | — | — | — | 95.6 | 92.1 | 96.5 |
| 🤖 Agentic | — (Elo-style score) | — | 1672 | 1320 | 1606 | — |
These numbers reveal a clearly mixed picture. The model leads the field in health (HealthBench Hard: 42.8%, ahead of GPT-5.4 at 40.1%), in visual understanding (CharXiv Reasoning: 86.4%), and in agentic tasks (DeepSearchQA: 74.8%). But it lags sharply in competitive programming (LiveCodeBench Pro: 80.0% vs. 87.5% for GPT-5.4) and, above all, in abstract reasoning.
On ARC-AGI-2 specifically, the benchmark that tests recognition of entirely novel patterns that cannot be memorized, Muse Spark scores 42.5% while Gemini 3.1 Pro and GPT-5.4 reach approximately 76%. This is not a mere dip; it is a structural gap, suggesting that the model still struggles with tasks requiring abstract symbolic reasoning far removed from the texts and images it was trained on.
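To make concrete why such a benchmark resists memorization, consider a toy, invented ARC-style task (a simplified illustration, not an actual ARC-AGI-2 item): the solver sees a few input/output grid pairs, must infer the hidden transformation, and apply it to a grid it has never seen. Pure recall of training pairs finds nothing; only the abstract rule generalizes.

```python
# Toy illustration of an ARC-style abstraction task (not a real ARC-AGI-2 item).
# Hidden rule for this invented task: reflect the grid left-to-right.

def reflect_lr(grid):
    """Apply the hidden rule: mirror each row."""
    return [row[::-1] for row in grid]

# A handful of demonstration pairs, as ARC-style tasks provide.
train_pairs = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),
    ([[3, 3, 0], [0, 1, 2]], [[0, 3, 3], [2, 1, 0]]),
]

# A "solver" that merely memorizes the training pairs...
memorized = {str(inp): out for inp, out in train_pairs}

# ...fails on any grid it has never seen, because recall finds nothing:
test_input = [[5, 0, 7]]
assert str(test_input) not in memorized

# Only inferring and applying the abstract rule generalizes:
assert reflect_lr(test_input) == [[7, 0, 5]]
```

Real ARC-AGI-2 rules are far more varied, but the structure of the difficulty is the same: each task's rule is novel, so a model cannot score by retrieving anything from its training data.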
Meta itself acknowledged in its technical post the existence of “performance gaps” in long-horizon agent systems and programming workflows. But the company points out that Muse Spark is “the first and smallest model in the series,” hinting that larger upcoming models may close these gaps.
The Break with Open Source
The weightiest decision was not technical. Muse Spark is entirely closed-source — the weights are not available for download, there is no self-hosting, and API access is currently restricted to a private preview for selected partners, with no announced pricing or timeline for general availability.
This shift starkly contradicts Zuckerberg’s previous rhetoric, in which he stated that “open source represents the best opportunity for the world to harness this technology to create the greatest economic opportunity and safety for all.” One analyst described Muse Spark as “closed like the private school Zuckerberg attended.” Some commentators view the closure not as a change in philosophy but as an implicit acknowledgment that “open source stopped being a competitive advantage and became a competitive burden.”
The tech community, which had built thousands of projects and studies on Llama, met the decision with widespread skepticism. On Reddit, the r/LocalLLaMA community, thousands of developers who rely on Meta's open-source models, reacted angrily.
Three Billion Users as a Deployment Arena
What Muse Spark possesses that none of the competitor models do is the immediate deployment arena: over three billion people use Meta’s apps daily. The model will reach WhatsApp, Instagram, Facebook, and Messenger within weeks — not as an option the user seeks out, but as an assistant embedded in the products they already use.
This scale of deployment gives Meta an exceptional advantage in collecting real-world usage data, thereby continuously improving the model. The Shopping Mode that Meta is testing adds a behavioral data layer derived from user interactions across its platforms — from purchases to interactions with ads and content.
The Cost: Between $115 and $135 Billion
During the announcement of Q4 2025 results, Meta revealed its capital expenditure plan for 2026: between $115 billion and $135 billion — roughly double the $72.22 billion spent in 2025. This figure far exceeds analyst expectations of $110 billion. A significant portion of these funds is directed toward funding MSL, expanding data centers, and strengthening third-party cloud capabilities.
This scale of spending explains the logic of closure: when spending at this level, it becomes difficult to give away the weights for free. But the deeper question is: was there another way? Could Meta have remained in the open-source camp while maintaining its competitive edge?
The Roadmap: Larger Models Ahead
Wang indicated on X that Muse Spark is merely the beginning: “This is the first step. Larger models are in development, and there are plans to open-source future releases.” But the tech community received this promise with clear skepticism — the announcement of Muse Spark without open weights makes it, for now, exclusive to Meta’s ecosystem alone.
Internal sources indicate that the codename “Avocado” referred specifically to Muse Spark, and that the next model in the Muse series is already in development. Meta has not mentioned any specific timeline for future releases.
Performance Gaps and Independent Verification
Beyond the official numbers, there is another story. Independent evaluations — such as those conducted by Artificial Analysis after obtaining early access from Meta — paint a more conservative picture. In the Humanity’s Last Exam test, Artificial Analysis recorded 39.9% for Muse Spark, trailing Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (41.6%). However, it is important to note that these evaluations were not “fully independent,” as the access itself came through Meta.
Independent evaluations suggest that Muse Spark is “entering the leading group” without being “a leader in every domain.” In areas such as programming and abstract reasoning, the model remains behind. This aligns with what official benchmarks showed, but adds a layer of caution: the official numbers reveal superiority in some areas and deficiency in others, but the magnitude of some of these gaps may be larger than the initial figures suggest.
Release Season: A Race That Never Stops
The Muse Spark announcement came in the same week that Cursor announced its third version and Claude Code expanded into Auto Mode. Competitors have not paused. OpenAI continues developing GPT-5.4, Google is strengthening Gemini, Anthropic is pushing Claude forward. In this context, Meta’s return to the race does not mean winning it — it only means the company is no longer out of the race.
The same week also saw Meta announce new collaborations with app developers to integrate Muse Spark into their products via API — a move indicating that the company does not intend to rely solely on its own apps, but rather seeks to build a closed ecosystem around its new model.
What remains open is not the question of Muse Spark's quality, which appears convincing within its announced limits for health and science, but of the methodology's credibility. After the Llama 4 experience, the definitive judgment will depend on what independent evaluation reveals outside Meta's internal labs. And this is perhaps the real challenge Alexandr Wang now faces: not building a powerful model, but something harder, restoring the trust of a community that once felt betrayed.
On X, one AI developer wrote: “Llama 4 was a scandal. Muse Spark is closed. When do we trust Meta again?” The answer to that question may take longer than the nine months it took to build the model. And perhaps the answer will not come from comparing numbers on benchmarks, but from daily testing, in the apps of millions, over the months ahead.