Methodology

This page defines how the sample leaderboards relate to production scoring. Published scores must remain consistent with the methodology documented here.

Public ranking algorithms

Every list on the site is ranked by a documented, auditable rule. The formulas below are published in both the UI and the source code.

Leaderboards (Model / Agent / LLM / Toolchain)

Sort order: composite score descending.
Composite score: normalize each sub-metric with min-max, then compute a weighted sum. Ties are broken by higher recent activity (e.g., commits in the last 30 days).
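
The sort order and tie-break rule above can be sketched as a two-part sort key. This is a minimal illustration, not the site's actual code: the `Entry` type, its field names, and the sample values are hypothetical, and `commits_30d` stands in for whatever recent-activity signal a board uses.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    composite: float   # weighted sum of min-max normalized sub-metrics
    commits_30d: int   # recent-activity signal, used only to break ties


def rank(entries: list[Entry]) -> list[Entry]:
    # Primary key: composite score, descending.
    # Secondary key: recent activity, descending (applies only on ties).
    return sorted(entries, key=lambda e: (-e.composite, -e.commits_30d))


# Hypothetical sample data: alpha and gamma tie on composite score.
board = [
    Entry("alpha", 0.82, 12),
    Entry("beta", 0.91, 3),
    Entry("gamma", 0.82, 40),
]
ranked_names = [e.name for e in rank(board)]
print(ranked_names)  # ['beta', 'gamma', 'alpha']
```

The sketch compares floats exactly, so the tie-break applies only when two composite scores are identical. Rounding scores to a fixed precision before sorting would be one way to treat near-equal scores as ties, but that choice is an assumption and is not stated in this methodology.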

Trend groups (GitHub)

Within-group formula: Score = 100 × [0.30·Stars + 0.15·Forks + 0.30·Commits30d + 0.15·Contributors + 0.05·(1-Issues) + 0.05·(1-PRs)].
Every term is min-max normalized within its group. Issues and PRs are inverse signals (lower is better), which is why they enter the formula as 1-Issues and 1-PRs.
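
The within-group formula can be worked through in a few lines of Python. The weights come from the formula above. The field names, the sample repositories, and the handling of a metric that is constant across the group (normalized to 0 here) are assumptions made for illustration.

```python
# Weights taken from the within-group formula; they sum to 1.0.
WEIGHTS = {
    "stars": 0.30,
    "forks": 0.15,
    "commits_30d": 0.30,
    "contributors": 0.15,
    "issues": 0.05,
    "prs": 0.05,
}
INVERSE = {"issues", "prs"}  # lower is better, so these use (1 - x)


def min_max(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a metric that is constant across the group
        # normalizes to 0 for every repository.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def trend_scores(group):
    """group maps repo name -> dict of raw metric values."""
    names = list(group)
    totals = {n: 0.0 for n in names}
    for metric, weight in WEIGHTS.items():
        normalized = min_max([group[n][metric] for n in names])
        for name, x in zip(names, normalized):
            term = (1.0 - x) if metric in INVERSE else x
            totals[name] += weight * term
    return {n: round(100.0 * t, 2) for n, t in totals.items()}


# Hypothetical two-repository group.
group = {
    "repo_a": {"stars": 1000, "forks": 100, "commits_30d": 50,
               "contributors": 20, "issues": 10, "prs": 5},
    "repo_b": {"stars": 500, "forks": 200, "commits_30d": 10,
               "contributors": 10, "issues": 30, "prs": 1},
}
scores = trend_scores(group)
print(scores)  # {'repo_a': 80.0, 'repo_b': 20.0}
```

With only two repositories, every normalized value is 0 or 1, so each score is simply 100 times the sum of the weights that repository wins: repo_a takes Stars, Commits30d, Contributors, and Issues (0.80), and repo_b takes Forks and PRs (0.20).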

Layers & dimensions

Six boards map to Models, Agents, LLMs, Toolchains, Token providers, and Model aggregators; columns may extend independently (vendor, domain, size, coverage, auth posture, aggregation breadth, etc.).

Composite scores use configurable weights and normalization; multi-benchmark setups must declare benchmark versions, weights, and missing-value handling.
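
One way to make the required declarations explicit is to keep benchmark versions, weights, and the missing-value policy together in one structure. The sketch below is illustrative only: the benchmark names, versions, weights, and the three policy names ("zero", "renormalize", "exclude") are hypothetical and not defined by this methodology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkSpec:
    name: str
    version: str   # declared benchmark version
    weight: float  # declared weight in the composite


# Hypothetical declaration for a three-benchmark board.
SPECS = [
    BenchmarkSpec("bench_a", "v1.0", 0.5),
    BenchmarkSpec("bench_b", "2024-01", 0.3),
    BenchmarkSpec("bench_c", "v2", 0.2),
]


def composite(normalized, specs, missing="renormalize") -> Optional[float]:
    """normalized maps benchmark name -> score in [0, 1], or None if missing."""
    present = [s for s in specs if normalized.get(s.name) is not None]
    weighted = sum(s.weight * normalized[s.name] for s in present)
    if missing == "zero":
        # A missing benchmark contributes nothing to the sum.
        return weighted
    if missing == "renormalize":
        # Rescale by the weights that are actually present.
        total = sum(s.weight for s in present)
        return weighted / total if total > 0 else None
    if missing == "exclude":
        # Do not score entries with any missing benchmark.
        return weighted if len(present) == len(specs) else None
    raise ValueError(f"unknown missing-value policy: {missing}")


sample = {"bench_a": 0.8, "bench_b": None, "bench_c": 0.5}
for policy in ("zero", "renormalize", "exclude"):
    print(policy, composite(sample, SPECS, policy))
```

The three policies give different results for the same entry (0.5, roughly 0.714, and no score), which is why the chosen policy needs to be declared alongside the weights.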

Updates & release

Static builds: commit JSON or fetched artifacts, then run SSG.

Scheduled jobs: GitHub Actions may run evaluators or aggregators, write artifacts, and trigger builds; read-only edge storage must be weighed against static-first goals.

Verifiability

The Sources page should list primary sources, fetch times, and versions; caching or sampling must be disclosed. Readers may cross-check Methodology against Sources.

Bias & mitigations

Common issues include benchmark leakage, overfitting to public eval sets, vendor-reported scores vs. independent reproduction, and composite scores masking weak tasks. Mitigations: pin task versions, publish seeds and scripts, disclose per-task scores, and review third-party leaderboard changelogs.

GitHub momentum is gameable: cross-check stars against commits, issues/PRs, and release cadence.
