There are four boards, one each for Models, Agents, LLMs, and Toolchains. Each board can extend its own columns independently of the others (vendor, domain, size, coverage, etc.).
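One way to picture the per-board column extension is a small registry in which every board shares a base column set and carries its own extras. This is only a sketch: the `Board` class, the `BASE_COLUMNS` tuple, and the assignment of particular columns to particular boards are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical shared columns; the real base schema is not specified here.
BASE_COLUMNS = ("name", "score")


@dataclass
class Board:
    """A leaderboard with shared base columns plus board-specific extras."""
    name: str
    extra_columns: list[str] = field(default_factory=list)

    def columns(self) -> list[str]:
        # Base columns come first, followed by this board's own extensions.
        return [*BASE_COLUMNS, *self.extra_columns]


# Which extra column belongs to which board is an illustrative guess.
BOARDS = {
    "models": Board("Models", ["vendor", "size"]),
    "agents": Board("Agents", ["domain"]),
    "llms": Board("LLMs", ["vendor", "size"]),
    "toolchains": Board("Toolchains", ["coverage"]),
}

# Extending one board leaves the other three untouched.
BOARDS["agents"].extra_columns.append("coverage")
```

Because each `Board` owns its own list, adding a column to the Agents board does not change the columns reported by the Models, LLMs, or Toolchains boards.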
Composite scores are computed from configurable weights and normalization. A multi-benchmark setup should declare its benchmark versions, weights, and missing-value handling explicitly.
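A composite score along these lines could be sketched as a weighted mean of min-max normalized benchmark results. Everything named below is hypothetical: the `CONFIG` layout, the benchmark names `bench_a` and `bench_b`, their versions and ranges, and the two missing-value policies (`"renormalize"` and `"zero"`) are assumptions chosen to show what declaring versions, weights, and missing-value handling might look like, not the system's actual configuration format.

```python
from typing import Optional

# Hypothetical declaration: each benchmark pins a version, a weight,
# and the score range used for min-max normalization.
CONFIG = {
    "benchmarks": {
        "bench_a": {"version": "v1", "weight": 0.6, "min": 0.0, "max": 100.0},
        "bench_b": {"version": "v2", "weight": 0.4, "min": 0.0, "max": 1.0},
    },
    # "renormalize": drop missing benchmarks and rescale remaining weights.
    # "zero": count a missing benchmark as a normalized score of 0.
    "missing": "renormalize",
}


def composite_score(
    raw: dict[str, Optional[float]], config: dict = CONFIG
) -> Optional[float]:
    """Weighted mean of min-max normalized scores, or None if nothing counts."""
    total = 0.0
    weight_sum = 0.0
    for name, spec in config["benchmarks"].items():
        value = raw.get(name)
        if value is None:
            if config["missing"] == "zero":
                # The weight still counts, but contributes nothing to the total.
                weight_sum += spec["weight"]
            continue
        normalized = (value - spec["min"]) / (spec["max"] - spec["min"])
        total += spec["weight"] * normalized
        weight_sum += spec["weight"]
    return total / weight_sum if weight_sum > 0 else None
```

With this configuration, an entry scoring 80 on `bench_a` and 0.5 on `bench_b` gets a composite of about 0.68. If `bench_b` is missing, the `"renormalize"` policy yields about 0.8 while `"zero"` yields about 0.48, which is why the chosen policy needs to be stated alongside the weights.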