Methodology

This page explains how to read the sample leaderboards today and how they map to production data; keep this narrative aligned when you ship real scores.

Public ranking algorithms

Every list on the site is rankable and auditable. The formulas below are publicly documented in both the UI and the source code.

  • Leaderboards (Model / Agent / LLM / Toolchain)

    Sort order: composite score descending.

    Composite score: normalize each sub-metric with min-max, then compute a weighted sum. Ties are broken by higher recent activity (e.g., commits in the last 30 days).

  • Trend groups (GitHub)

    Within-group formula: Score = 100 × [0.30·Stars + 0.15·Forks + 0.30·Commits30d + 0.15·Contributors + 0.05·(1-Issues) + 0.05·(1-PRs)].

    All terms are min-max normalized within the same group; Issues and PRs are inverse signals (lower is better), which is why they enter as (1 − x).
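The normalize-then-weight pipeline used by both the composite scores and the trend-group formula can be sketched as follows. This is a minimal illustration, not the site's actual code; the function and field names are assumptions.

```typescript
// Min-max normalize raw values to [0, 1] within one group.
// If all values are equal, map everything to 0 to avoid division by zero.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return values.map(v => (hi === lo ? 0 : (v - lo) / (hi - lo)));
}

interface RepoMetrics {
  stars: number;
  forks: number;
  commits30d: number;
  contributors: number;
  openIssues: number; // inverse signal: lower is better
  openPRs: number;    // inverse signal: lower is better
}

// Score = 100 * [0.30*Stars + 0.15*Forks + 0.30*Commits30d
//              + 0.15*Contributors + 0.05*(1 - Issues) + 0.05*(1 - PRs)]
function trendScores(repos: RepoMetrics[]): number[] {
  const stars = minMax(repos.map(r => r.stars));
  const forks = minMax(repos.map(r => r.forks));
  const commits = minMax(repos.map(r => r.commits30d));
  const contribs = minMax(repos.map(r => r.contributors));
  const issues = minMax(repos.map(r => r.openIssues));
  const prs = minMax(repos.map(r => r.openPRs));
  return repos.map((_, i) =>
    100 * (0.30 * stars[i] + 0.15 * forks[i] + 0.30 * commits[i]
         + 0.15 * contribs[i] + 0.05 * (1 - issues[i]) + 0.05 * (1 - prs[i]))
  );
}
```

Because normalization happens within the group, a repo that leads on every positive signal and trails on both inverse signals scores exactly 100.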

Layers & dimensions

Six boards map to Models, Agents, LLMs, Toolchains, Token providers, and Model aggregators; per-board columns can extend independently (vendor, domain, size, coverage, auth posture, aggregation breadth, etc.).

Composite scores come from configurable weights and normalization; multi-benchmark setups should declare benchmark versions, weights, and missing-value handling.
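One way to make that declaration explicit is a small board config that pins each benchmark's version, weight, and missing-value policy. The shape and the example entries below are illustrative assumptions, not the site's schema.

```typescript
// Hypothetical board configuration declaring versions, weights,
// and missing-value handling for a multi-benchmark composite score.
interface BenchmarkSpec {
  name: string;
  version: string;                          // pin the benchmark version scored
  weight: number;                           // weights should sum to 1
  missing: "drop" | "zero" | "impute-mean"; // policy for absent results
}

const modelBoard: BenchmarkSpec[] = [
  { name: "benchmark-a", version: "v1.0",    weight: 0.5, missing: "drop" },
  { name: "benchmark-b", version: "2023-07", weight: 0.3, missing: "zero" },
  { name: "benchmark-c", version: "v1.1",    weight: 0.2, missing: "impute-mean" },
];

// Sanity check at build time: weights sum to 1 (within FP tolerance).
const totalWeight = modelBoard.reduce((s, b) => s + b.weight, 0);
```

Checking the weight sum at build time catches a silently mis-scaled composite before it ships.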

Updates & release

Static builds: commit JSON (or fetched artifacts) to the repo, then render with the SSG.

Scheduled jobs: GitHub Actions can run evaluators or aggregators, write artifacts, and trigger builds; read-only D1/KV at the edge is possible but should be weighed against static-first goals.
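The artifact a scheduled aggregator step writes before the static build might look like the sketch below. The shape, field names, and output path are assumptions for illustration only.

```typescript
// Build the JSON artifact a scheduled job would commit for the next SSG run.
interface Artifact {
  generatedAt: string;                    // ISO timestamp of the fetch
  source: string;                         // where the raw data came from
  rows: { id: string; score: number }[];  // sorted for the leaderboard
}

function buildArtifact(
  rows: { id: string; score: number }[],
  source: string
): Artifact {
  return {
    generatedAt: new Date().toISOString(),
    source,
    // Sort order matches the boards: composite score descending.
    rows: [...rows].sort((a, b) => b.score - a.score),
  };
}

// In a GitHub Actions step this would then be serialized to disk, e.g.
// fs.writeFileSync("data/leaderboard.json", JSON.stringify(artifact, null, 2));
```

Writing a timestamped artifact keeps the build reproducible and gives the Sources page a fetch time to disclose.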

Verifiability

List primary sources, fetch times, and versions on the Sources page; disclose caching or sampling. Readers can cross-check Methodology ↔ Sources.
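A Sources page entry can be as small as the following record; the field names are illustrative, not the site's actual schema.

```typescript
// Minimal shape for one Sources page entry (field names illustrative).
interface SourceEntry {
  name: string;      // primary source, e.g. an API or benchmark repo
  url: string;
  fetchedAt: string; // ISO timestamp of the fetch
  version: string;   // version or commit of the data consumed
  caching?: string;  // disclosed caching/sampling policy, if any
}

const example: SourceEntry = {
  name: "GitHub REST API",
  url: "https://api.github.com",
  fetchedAt: "2024-01-01T00:00:00Z",
  version: "2022-11-28",
  caching: "daily snapshot",
};
```

Each leaderboard column should be traceable to one such entry, which is what makes the Methodology ↔ Sources cross-check possible.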