> This is the markdown version of https://www.maniac.ai/blog/interactive-open-vs-closed-benchmark-frontier. Visit the full page for interactive content.


# Interactive open vs closed frontier across benchmarks

April 13, 2026

\[ Next step \]

## Turn model comparisons into production wins

Benchmarks are a starting point. Maniac helps you evaluate and route models on your real traffic, so you ship the best quality per dollar on the workloads that matter.

[Book a demo](/book-demo) · [Read the docs](https://docs.maniac.ai/agent-setup/agent-setup)

Headline leaderboards flatten the story into a single number. In practice, **open** and **closed** model families do not advance in lockstep on every task. Coding, math, knowledge, and agent benchmarks can tell different stories.

This interactive view plots the **frontier envelope** for each track: at each point in time, what was the best score among open-weight models, and separately among closed models, using a one-year window. Switch benchmarks to see how the gap and the pace of improvement change with the task.
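The envelope described above can be sketched as a running maximum per track. This is a minimal illustration, not the page's actual implementation: the `Row` type, the `frontier` function, the inclusive 365-day window, and the strict "new best" rule are all assumptions. The sample rows are the Knowledge benchmark milestones listed on this page, plus one clearly hypothetical row to show a non-frontier model being skipped.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Row:
    model: str
    released: date
    score: float        # benchmark score in percent
    open_weights: bool


def frontier(rows, open_weights, end, window_days=365):
    """Return the rows that set a new best score for one track, in date order.

    Assumption: rows dated outside [end - window_days, end] are dropped.
    """
    start = end - timedelta(days=window_days)
    track = sorted(
        (r for r in rows
         if r.open_weights == open_weights and start <= r.released <= end),
        key=lambda r: r.released,
    )
    best = float("-inf")
    envelope = []
    for r in track:
        if r.score > best:  # strictly better than every earlier row
            best = r.score
            envelope.append(r)
    return envelope


rows = [
    Row("o3", date(2025, 4, 16), 84.1, False),
    Row("GPT 5", date(2025, 8, 7), 85.6, False),
    Row("Gemini 3 Pro", date(2025, 11, 18), 91.7, False),
    Row("DeepSeek V3.2 (Thinking)", date(2025, 12, 1), 80.3, True),
    # Hypothetical row, not from the source: scores below the current best.
    Row("hypothetical-open-model", date(2026, 1, 5), 79.0, True),
    Row("Kimi K2.5", date(2026, 1, 27), 84.1, True),
    Row("Qwen 3.5 Plus", date(2026, 2, 16), 87.4, True),
]

open_envelope = frontier(rows, open_weights=True, end=date(2026, 4, 13))
for r in open_envelope:
    print(r.released, r.model, r.score)
```

Running each track separately keeps the two envelopes independent, so a strong closed release never hides an open milestone.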

Scores are sourced from current rows on [Vals AI](https://www.vals.ai/benchmarks) benchmark pages, which keep older model entries visible. That makes this useful for comparing the **shape** of progress by metric, not for reconstructing release-day press tables.

For a single third-party **chat-preference** view on one scale (Arena AI), see the static frontier chart in [Qwen 3.5 vs Gemma 4: the benchmark-by-size comparison](/blog/qwen-3-5-vs-gemma-4-benchmarks-by-size).

Interactive benchmark explorer (Vals AI sourced rows)

### Open vs closed frontier, benchmark by benchmark

The static chart linked above is the clean one-metric view. This explorer recomputes the frontier envelope over the same one-year window from Vals AI benchmark pages, each of which lists older and newer models together.

Benchmark: **Knowledge** (Vals AI academic reasoning benchmark)

| Metric | Value | Definition |
| --- | --- | --- |
| Open gain | +7.1 pts | Change from the first to current open frontier point. |
| Closed gain | +11.4 pts | Change from the first to current closed frontier point. |
| End gap | Closed +8.1 pts | Closed minus open at the current frontier edge. |

*Chart: best open-weights and closed-weights scores on the Knowledge benchmark, Apr 2025 to Apr 2026 (y-axis from 76.3% to 99.5%). The plotted points are the milestones listed below.*

Open frontier milestones

| Model | Date | Score |
| --- | --- | --- |
| DeepSeek V3.2 (Thinking) | Dec 1, 2025 | 80.3% |
| Kimi K2.5 | Jan 27, 2026 | 84.1% |
| Qwen 3.5 Plus | Feb 16, 2026 | 87.4% |

Closed frontier milestones

| Model | Date | Score |
| --- | --- | --- |
| o3 | Apr 16, 2025 | 84.1% |
| GPT 5 | Aug 7, 2025 | 85.6% |
| Claude Opus 4.5 (Thinking) | Nov 1, 2025 | 85.9% |
| GPT 5.1 | Nov 13, 2025 | 86.6% |
| Gemini 3 Pro | Nov 18, 2025 | 91.7% |
| Gemini 3.1 Pro Preview | Feb 19, 2026 | 95.5% |
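The three summary figures for the Knowledge benchmark (open gain, closed gain, end gap) follow directly from the milestones above. The snippet below is a small worked check using the definitions given in the explorer; the helper names `gain` and `end_gap` are illustrative, and rounding to one decimal place is an assumption about how the page displays values.

```python
# Frontier milestones for the Knowledge benchmark, as (model, score in %).
open_frontier = [
    ("DeepSeek V3.2 (Thinking)", 80.3),
    ("Kimi K2.5", 84.1),
    ("Qwen 3.5 Plus", 87.4),
]
closed_frontier = [
    ("o3", 84.1),
    ("GPT 5", 85.6),
    ("Claude Opus 4.5 (Thinking)", 85.9),
    ("GPT 5.1", 86.6),
    ("Gemini 3 Pro", 91.7),
    ("Gemini 3.1 Pro Preview", 95.5),
]


def gain(frontier):
    """Change from the first to the current frontier point, in points."""
    return round(frontier[-1][1] - frontier[0][1], 1)


def end_gap(closed, open_):
    """Closed minus open at the current frontier edge, in points."""
    return round(closed[-1][1] - open_[-1][1], 1)


print(gain(open_frontier))                      # 7.1
print(gain(closed_frontier))                    # 11.4
print(end_gap(closed_frontier, open_frontier))  # 8.1
```

Note that the gains depend on where each track's first point falls inside the window, so the open gain here starts from the Dec 1, 2025 milestone while the closed gain starts from Apr 16, 2025.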

The explorer uses current sourced rows from [Vals AI](https://www.vals.ai/benchmarks) benchmark pages and plots them by frontier date within the one-year window. If a model has no sourced row on a given benchmark, it is omitted and its milestone does not appear for that metric, so this is best read as the shape of progress, not a release-day historical record. Historical frontier points can still include superseded models when they were genuinely best on that benchmark at the time.

---

*Maniac, High throughput background agents. Opus-quality outputs at 1/50 of the cost. Learn more at [maniac.ai](https://www.maniac.ai).*