Leaderboards for text (LLM), image, video, and multimodal models, alongside the combined Model board.
Model leaderboard (all)
Cross-task performance (multimodal / vision / language). Sample data; replace with production eval output.
Cross-task overview; the multimodal, vision, and language groupings may split into additional columns or child boards once the eval JSON is wired up.
Public ranking policy: rows are sorted by composite score in descending order. The composite score is a weighted sum of normalized sub-metrics; ties are broken in favor of higher recent activity.
| Rank | Model | Vendor / team | Type | Composite score | Notes |
|---|---|---|---|---|---|
| 1 | Demo-Vision-Pro | Demo Lab | Multimodal | 94.2 | Balanced image + text |
| 2 | NorthStar-MM | North AI | Multimodal | 92.8 | Strong long-context scenarios |
| 3 | Aurora-VL-7B | Aurora | Vision-language | 91.5 | Edge-friendly |
| 4 | Helix-3 | Helix Research | General | 90.1 | Stable tool calling |
| 5 | Kite-Small | Kite | Language | 88.6 | Strong price/performance |
| 6 | Lattice-R1 | Lattice | Reasoning | 87.9 | Strong math/code subscores |
| 7 | Pulse-Audio-2 | Pulse | Speech multimodal | 86.4 | ASR/TTS combined |
| 8 | Quark-Mini | Quark Systems | Language | 85.2 | Low latency |
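The ranking policy above can be sketched in code. This is a minimal illustration, not the production scorer: the sub-metric names, weights, and the min-max normalization scheme are all assumptions, since the source only states "weighted sum of normalized sub-metrics" with an activity tie-break.

```python
# Hypothetical sketch of the public ranking policy: composite score is a
# weighted sum of min-max normalized sub-metrics (higher is better for all),
# and ties are broken by higher recent activity. Metric names and weights
# below are placeholders, not the real eval configuration.
from dataclasses import dataclass

WEIGHTS = {"image": 0.5, "text": 0.3, "speed": 0.2}  # assumed weights

@dataclass
class Entry:
    model: str
    sub_metrics: dict      # raw sub-metric values, higher is better
    recent_activity: int   # e.g. evals run in the last 30 days

def normalize(entries, metric):
    """Min-max normalize one sub-metric across all entries to [0, 1]."""
    vals = [e.sub_metrics[metric] for e in entries]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return {e.model: (e.sub_metrics[metric] - lo) / span for e in entries}

def composite_scores(entries):
    """Weighted sum of the normalized sub-metrics, per model."""
    norms = {m: normalize(entries, m) for m in WEIGHTS}
    return {e.model: sum(w * norms[m][e.model] for m, w in WEIGHTS.items())
            for e in entries}

def rank(entries):
    """Sort by composite score desc; ties broken by higher recent activity."""
    scores = composite_scores(entries)
    return sorted(entries,
                  key=lambda e: (scores[e.model], e.recent_activity),
                  reverse=True)
```

Because `sorted` compares the key tuples lexicographically, two models with identical composite scores fall through to the `recent_activity` comparison, which implements the stated tie-break without a second sorting pass.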