FAQ

Common questions that come up when browsing the leaderboards and trends. Where a specific deployment differs from what is described here, the repository documentation takes precedence.

Are leaderboard scores production evaluations?

No. The default data is sample data for validating layout and builds; it does not assert any vendor’s true ranking.

To publish credible rankings, replace the JSON files under data/rankings and document the tasks, weights, dates, and reproducibility steps in Methodology.

Why static sites?

Static HTML favors SEO, time-to-first-byte, and global CDN caching. Leaderboards may refresh on a daily or weekly cadence via CI-triggered builds.

Where live queries are needed, read-only edge APIs can run alongside the static snapshots; the snapshots and their source citations keep the published results auditable.

Does language switching change the current page?

No. Only the locale prefix changes; the rest of the path is preserved (e.g., /zh/models/ ↔ /en/models/), so the same page can be read side by side in two languages.

Because localization is incremental, untranslated long-form sections may temporarily show the English text or another default language.
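The prefix swap described above can be sketched as a small path function. This is an illustration in Python rather than the site's actual routing code; the SUPPORTED_LOCALES list, the switch_locale name, and the handling of paths that have no locale prefix are all assumptions.

```python
from typing import Sequence

# Hypothetical locale list; the real site may support different codes.
SUPPORTED_LOCALES = ("en", "zh")


def switch_locale(path: str, target: str,
                  locales: Sequence[str] = SUPPORTED_LOCALES) -> str:
    """Return `path` with its leading locale segment replaced by `target`."""
    if target not in locales:
        raise ValueError(f"unsupported locale: {target}")
    # "/zh/models/" splits into ["", "zh", "models", ""]
    segments = path.split("/")
    if len(segments) > 1 and segments[1] in locales:
        segments[1] = target        # swap the existing prefix
    else:
        segments.insert(1, target)  # assumption: unprefixed paths gain a prefix
    return "/".join(segments)


print(switch_locale("/zh/models/", "en"))  # /en/models/
```

Everything after the first segment is left untouched, which is what keeps the reader on the same page after switching.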

How should composite scores be read?

Composites aggregate metrics after normalization and weighting. They are useful for an overview but inadequate on their own for diagnosing weak tasks. Production deployments should publish per-task scores or sub-ranks where applicable.

When mixing public benchmarks, declare the benchmark versions used and how missing cells are handled.
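As a rough illustration of normalization, weighting, and one possible missing-cell policy, the sketch below min-max normalizes each task across models and then takes a weighted mean over the tasks a model actually has. The policy (skip missing cells and renormalize the weights), the tie rule, and all model names, task names, scores, and weights are assumptions made up for the example; a real deployment should state its own choices in Methodology.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[float]]   # task -> score, None marks a missing cell
Scores = Dict[str, Row]            # model -> row


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale each task to [0, 1] across models; missing cells stay None."""
    tasks = sorted({task for row in scores.values() for task in row})
    out: Scores = {model: {} for model in scores}
    for task in tasks:
        present = [row[task] for row in scores.values()
                   if row.get(task) is not None]
        lo, hi = (min(present), max(present)) if present else (0.0, 0.0)
        for model, row in scores.items():
            raw = row.get(task)
            if raw is None:
                out[model][task] = None
            elif hi == lo:
                out[model][task] = 1.0  # assumption: a tie counts as full marks
            else:
                out[model][task] = (raw - lo) / (hi - lo)
    return out


def composite(row: Row, weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over present tasks, renormalizing weights; None if empty."""
    total, weight_sum = 0.0, 0.0
    for task, weight in weights.items():
        value = row.get(task)
        if value is None:
            continue  # assumed policy: skip missing cells
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None


# Hypothetical sample data, not real benchmark results.
raw = {
    "model-a": {"qa": 80.0, "code": 40.0},
    "model-b": {"qa": 60.0, "code": None},
    "model-c": {"qa": 70.0, "code": 60.0},
}
weights = {"qa": 0.5, "code": 0.5}

normalized = min_max_normalize(raw)
for model, row in normalized.items():
    print(model, composite(row, weights))
```

Here model-b's composite rests on a single task, which is exactly the kind of detail a composite hides and per-task scores reveal. Treating a missing cell as zero instead would rank the same models differently, so the chosen policy needs to be declared.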

How do GitHub trends relate to model leaderboards?

They are complementary: model and agent boards emphasize capability or task success, while GitHub trends emphasize open-source activity.

High star counts do not imply state-of-the-art capability; closed-source or off-GitHub work is excluded from trend statistics.

How can an internal eval pipeline be connected?

A typical flow: run the evaluators in CI, emit JSON under data/rankings or as build artifacts, trigger the Astro build, and deploy the static output.

Snapshots kept in object storage must be listed on the Sources page with their URLs and checksums.
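The "emit JSON" step and the checksum can be sketched together. The write_snapshot helper, the models.json file name, and the record fields below are hypothetical; the actual schema expected under data/rankings is defined by the repository, and SHA-256 is assumed here only as a common checksum choice.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Tuple


def write_snapshot(rows: Any, out_dir: Path, name: str) -> Tuple[Path, str]:
    """Write `rows` as deterministic JSON; return the path and SHA-256 digest."""
    out_path = Path(out_dir) / f"{name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the bytes stable between runs, so the checksum only
    # changes when the data changes.
    payload = json.dumps(rows, indent=2, sort_keys=True,
                         ensure_ascii=False).encode("utf-8")
    out_path.write_bytes(payload)
    return out_path, hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Hypothetical record shape, for illustration only.
    rows = [{"model": "model-a", "task": "qa", "score": 0.5}]
    path, digest = write_snapshot(rows, Path("data/rankings"), "models")
    print(path, digest)
```

A CI job could run a script like this after the evaluators finish and before the site build starts. The printed digest is the value to record next to the snapshot URL.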

Are outbound links safe?

External links open in a new tab with noopener/noreferrer. Assessing the trustworthiness and privacy policies of destination sites remains the visitor’s responsibility.

May these tables be embedded or republished?

Yes, subject to the upstream data and code licenses. When republishing, include links to Methodology and Sources along with the data date, and mark demo data as sample-only.