FAQ

Common questions about browsing the leaderboards and trends. Where a deployment differs from what is described here, the repository documentation takes precedence.

Are leaderboard scores production evaluations?

Published rankings are generated from pipeline outputs and documented methodology, not vendor marketing claims.

For credible releases, keep the snapshots in src/data/db/site.sqlite synchronized with tasks, weights, dates, and reproducibility notes in Methodology.

Why static sites?

Static HTML favors SEO, time-to-first-byte, and global CDN caching. Leaderboards may refresh on a daily or weekly cadence via CI-triggered builds.

Live queries may use read-only edge APIs alongside static snapshots and source citations for auditability.

Does language switching change the current page?

The path after the locale prefix is preserved (e.g., /zh/model/ maps to /en/model/) for side-by-side reading.

Untranslated long-form sections may temporarily mirror English or another default language while localization proceeds incrementally.
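The path-preserving switch can be illustrated with a small sketch. It is not the site's actual routing code: the function name `switch_locale` and the locale list are assumptions made for the example.

```python
# Hypothetical locale list; the real site may support other locales.
SUPPORTED_LOCALES = ("en", "zh")


def switch_locale(path: str, target: str) -> str:
    """Swap the leading locale segment of `path`, keeping the rest intact."""
    if target not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {target}")
    # "/zh/model/" splits into ["", "zh", "model", ""].
    parts = path.split("/")
    if len(parts) > 1 and parts[1] in SUPPORTED_LOCALES:
        parts[1] = target        # replace the existing locale prefix
    else:
        parts.insert(1, target)  # no prefix yet: add one
    return "/".join(parts)
```

With these assumptions, `switch_locale("/zh/model/", "en")` returns `/en/model/`, and a path without a prefix such as `/model/` gains one.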

How should composite scores be read?

Composites aggregate metrics after normalization and weighting. They are useful for an overview but inadequate on their own for weak-task analysis. Production deployments should publish per-task scores or sub-ranks where applicable.

When mixing public benchmarks, declare versions and missing-cell handling.
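As a rough illustration of normalization, weighting, and missing-cell handling, the sketch below uses min-max normalization per task and renormalizes weights over the tasks a model actually reported. Both choices are assumptions for the example; the methodology of a given deployment may differ and should be stated explicitly.

```python
from typing import Dict, Optional

# model -> task -> raw score (None marks a missing cell)
Scores = Dict[str, Dict[str, Optional[float]]]


def composite_scores(
    raw: Scores, weights: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Min-max normalize each task across models, then take a weighted mean.

    Missing cells are skipped and the remaining weights are renormalized.
    This is one possible policy (an assumption here), not the only one.
    """
    # Per-task (min, max) over the models that reported a score.
    bounds = {}
    for task in weights:
        vals = [s[task] for s in raw.values() if s.get(task) is not None]
        if vals:
            bounds[task] = (min(vals), max(vals))

    result: Dict[str, Optional[float]] = {}
    for model, scores in raw.items():
        total, weight_sum = 0.0, 0.0
        for task, w in weights.items():
            value = scores.get(task)
            if value is None or task not in bounds:
                continue  # missing cell: excluded from this model's composite
            lo, hi = bounds[task]
            # If every model ties on a task, treat the normalized score as 1.0.
            norm = 1.0 if hi == lo else (value - lo) / (hi - lo)
            total += w * norm
            weight_sum += w
        # A model with no usable scores gets no composite at all.
        result[model] = total / weight_sum if weight_sum > 0 else None
    return result
```

Under this policy a model that reports only its strongest task can reach a composite of 1.0, which is one reason the missing-cell rule needs to be declared next to the ranking.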

How do GitHub trends relate to model leaderboards?

Model and agent boards emphasize capability or task success, while GitHub trends emphasize OSS activity; the two are complementary.

High star counts do not imply state-of-the-art capability; closed-source or off-GitHub work is excluded from trend statistics.

How do I connect an internal eval pipeline?

Typical flow: run evaluators in CI, write boards and datasets into src/data/db/site.sqlite via the data pipeline, trigger an Astro build, and deploy the static output.
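The write step of that flow might look like the following sketch. The `scores` table, its columns, and the `write_scores` helper are hypothetical; the actual schema is defined by the repository's data pipeline, not by this example.

```python
import sqlite3
from typing import Iterable, Tuple

# (board, model, task, score, ISO evaluation date) -- illustrative shape only
Row = Tuple[str, str, str, float, str]


def write_scores(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Upsert evaluator results into a hypothetical `scores` table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS scores (
            board        TEXT NOT NULL,
            model        TEXT NOT NULL,
            task         TEXT NOT NULL,
            score        REAL NOT NULL,
            evaluated_on TEXT NOT NULL,
            PRIMARY KEY (board, model, task, evaluated_on)
        )
        """
    )
    batch = list(rows)
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO scores VALUES (?, ?, ?, ?, ?)", batch
        )
    return len(batch)


# In CI the connection would point at the snapshot file, for example:
#   conn = sqlite3.connect("src/data/db/site.sqlite")
```

Re-running the same evaluation replaces the earlier row instead of duplicating it, so a retried CI job leaves the snapshot consistent before the build step reads it.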

Snapshots kept in object storage must have their URLs and checksums listed on the Sources page.
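A checksum entry for the Sources page could be produced along these lines. SHA-256 is assumed here because the text does not name an algorithm, and the helper names and record shape are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_entry(url: str, path: Path) -> dict:
    """Pair a snapshot URL with the checksum of the local copy (assumed shape)."""
    return {"url": url, "sha256": sha256_of(path)}
```

Reading in chunks keeps memory use flat for large snapshots; readers can recompute the digest on their download and compare it with the published value.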

Are outbound links safe?

External links open in a new tab with noopener/noreferrer. Trustworthiness and privacy policies of destination sites are the visitor’s responsibility.
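For illustration, a link renderer that applies this rule might be sketched as follows. The helper names and the `site_host` parameter are assumptions; the site itself may set these attributes in its templates instead.

```python
from html import escape
from urllib.parse import urlparse


def is_external(href: str, site_host: str) -> bool:
    """Treat a link as external when it names a host other than our own."""
    host = urlparse(href).netloc
    return bool(host) and host != site_host


def render_link(href: str, text: str, site_host: str) -> str:
    """Render an anchor tag, hardening it only when the target is external."""
    attrs = f'href="{escape(href, quote=True)}"'
    if is_external(href, site_host):
        # New tab, no window.opener handle, and no Referer header sent.
        attrs += ' target="_blank" rel="noopener noreferrer"'
    return f"<a {attrs}>{escape(text)}</a>"
```

Relative links such as `/en/model/` have no host component, so they are rendered without the extra attributes.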

May these tables be embedded or republished?

Upstream data and code licenses apply; republish with Methodology and Sources links and the data date.