Glossary
The definitions below are working terms for this site; for authoritative wording, see academic or vendor documentation.
LLM
Large language model—typically an autoregressive or prefix LM trained at scale for chat, code, reasoning, and tool use; behavior depends on size and post-training.
Agent
A system that observes state, plans steps, and calls tools to reach goals—single- or multi-agent; judged on success rate, steps, and cost.
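The observe-plan-act loop above can be sketched as follows; the planner and tools here are toy stand-ins for an LLM-driven policy, and all names are hypothetical:

```python
# Minimal agent-loop sketch: observe state, plan a tool call, act, repeat.
def run_agent(goal, tools, plan, max_steps=10):
    """Run until the planner says 'done' or the step budget runs out."""
    state = {"goal": goal, "history": []}
    for step in range(max_steps):
        action, args = plan(state)                        # plan: pick next tool call
        if action == "done":
            return state, step + 1                        # success: report steps used
        result = tools[action](**args)                    # act: call the chosen tool
        state["history"].append((action, args, result))   # observe: record outcome
    return state, max_steps                               # budget exhausted

# Toy usage: one lookup tool; the planner stops after a single call.
tools = {"lookup": lambda query: f"result for {query}"}

def plan(state):
    if state["history"]:
        return "done", {}
    return "lookup", {"query": state["goal"]}

final_state, steps = run_agent("find docs", tools, plan)
```

The returned step count is what the "success rate, steps, and cost" judgment would be computed over in a real harness.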
Toolchain
End-to-end software and process covering data, training, evaluation, deployment, and observability; here the emphasis is repeatability and delivery.
Benchmark
A standardized task suite and metrics for comparing models or systems; version drift and contamination affect comparability.
SSG
Static site generation—compile templates and data to files at build time; no per-request server rendering required.
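A minimal sketch of the build-time idea, using Python's string.Template in place of a real template engine; page names and data are hypothetical:

```python
# SSG sketch: compile every (template, data) pair to static HTML at build time.
from string import Template

pages = {"index.html": Template("<h1>$title</h1><p>$body</p>")}
data = {"index.html": {"title": "Home", "body": "Welcome."}}

def build(pages, data):
    """Return a mapping of output path -> rendered static HTML."""
    return {path: tmpl.substitute(data[path]) for path, tmpl in pages.items()}

site = build(pages, data)
# A real generator would write site[...] to disk; no server runs per request.
```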
RAG
Retrieval-augmented generation—retrieve relevant documents or snippets before answering; quality depends on retrieval and chunking.
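A sketch of the retrieve-then-prompt flow, with toy word-overlap scoring standing in for a real embedder or search index; the document strings are made up:

```python
# RAG sketch: score docs against the query, take top-k, prepend as context.
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query; return the top-k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs, k=2):
    """Prepend retrieved context; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "quantization lowers precision",
    "RAG retrieves documents before answering",
    "static sites build ahead of time",
]
prompt = build_prompt("what does RAG do with documents", docs, k=1)
```

Chunking and retriever quality dominate end-to-end quality, which is why the definition flags them.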
Fine-tuning
Continue training a base model on a smaller dataset to adapt it to new tasks, styles, or safety requirements; watch for catastrophic forgetting and bias.
Alignment
Techniques for steering model behavior toward human intent and safety constraints—RLHF, DPO, output filters, etc.
Quantization
Lower numeric precision for weights or activations to speed inference and save memory; verify task-level trade-offs.
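A minimal sketch of symmetric int8 weight quantization with a single per-tensor scale; real kernels add per-channel scales, zero-points, and calibration:

```python
# Symmetric int8 quantization sketch: scale = max|w| / 127,
# q = round(w / scale), dequantized value = q * scale.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by scale / 2; this is the memory/accuracy trade-off.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Verifying the task-level trade-off means evaluating the quantized model, not just this per-weight error.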
Inference serving
Expose models as online APIs or batch jobs; optimize throughput, latency, batching, and hardware utilization.
Stars / Forks (GitHub)
Stars signal interest or endorsement; forks are copies made for modification. Both can be inflated by marketing, so pair them with commit and issue activity.
Composite score
A scalar combining weighted metrics—useful for overview; always read sub-scores and task definitions.
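A toy sketch of how such a scalar is assembled, assuming normalized sub-scores and weights that sum to 1; the metric names and values here are hypothetical:

```python
# Composite score sketch: weighted average of normalized sub-scores.
def composite(sub_scores, weights):
    """Weights must cover the same metric keys and sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(sub_scores[k] * weights[k] for k in weights)

sub = {"reasoning": 0.8, "code": 0.6, "safety": 0.9}
w = {"reasoning": 0.5, "code": 0.3, "safety": 0.2}
score = composite(sub, w)   # 0.4 + 0.18 + 0.18 = 0.76
```

The same sub-scores under different weights give a different ranking, which is why reading the sub-scores and task definitions matters.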