LLM Ranking

Capabilities of large language models (sample data).

Composite scores may decompose into reasoning, coding, multilingual, and safety dimensions. The methodology must cite benchmark versions and the tool-use policy.

As of:

Public ranking policy: rows are sorted by composite score (desc). Composite score is a weighted sum of normalized sub-metrics; ties are broken by higher recent activity.
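
The ranking policy above can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual implementation: the weights, the `Entry` type, the `recent_activity` field, the rounding to one decimal, and the sample models are all assumptions made for the example, since the page publishes neither the weights nor a definition of "recent activity".

```python
from dataclasses import dataclass

# Hypothetical weights (assumption): the page does not publish the real values.
WEIGHTS = {"reasoning": 0.4, "coding": 0.3, "multilingual": 0.2, "safety": 0.1}


@dataclass
class Entry:
    model: str
    sub_scores: dict[str, float]  # assumed already normalized to 0..100
    recent_activity: int          # hypothetical tie-break signal; higher wins


def composite(entry: Entry) -> float:
    """Weighted sum of normalized sub-metrics, rounded to one decimal."""
    total = sum(WEIGHTS[name] * entry.sub_scores[name] for name in WEIGHTS)
    return round(total, 1)


def rank(entries: list[Entry]) -> list[Entry]:
    """Sort by composite score (desc); break ties by higher recent activity."""
    return sorted(entries, key=lambda e: (-composite(e), -e.recent_activity))


# Made-up entries, unrelated to the rows in the table.
entries = [
    Entry("model-a", {"reasoning": 90, "coding": 80, "multilingual": 70, "safety": 60}, 5),
    Entry("model-b", {"reasoning": 80, "coding": 80, "multilingual": 80, "safety": 80}, 9),
    Entry("model-c", {"reasoning": 90, "coding": 90, "multilingual": 90, "safety": 90}, 1),
]

for position, entry in enumerate(rank(entries), start=1):
    print(position, entry.model, composite(entry))
```

Here `model-a` and `model-b` end up with the same composite score, so the more recently active `model-b` is placed ahead. Rounding before comparison is a design choice of this sketch: it makes scores that display identically count as ties.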

Rank  Model         Provider    Size       Score  Notes
1     Nova-Large-2  Nova AI     ~400B MoE  95     Reasoning mode
2     Summit-Pro    Summit      ~200B      93.4   Strong instruction following
3     DeepLine-R1   DeepLine    ~70B       91.9   Open weights
4     Cedar-32B     Cedar       32B        89.7   Balanced ZH/EN
5     Birch-Mini    Birch       8B         87.3   On-device deployment
6     Fjord-1.5     Fjord Labs  14B        86.1   Tool calling
7     Ridge-Code    Ridge       33B        85     Code-focused
8     Willow-Base   Willow      3B         82.4   Very low latency