Glossary
The definitions below are working terms for this site; for authoritative wording, see academic or vendor documentation.
LLM
Large language model—typically an autoregressive or prefix LM trained at scale for chat, code, reasoning, and tool use; behavior depends on size and post-training.
Agent
A system that observes state, plans steps, and calls tools to reach goals—single- or multi-agent; judged on success rate, steps, and cost.
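The observe-plan-act loop above can be sketched as follows; the planner and tools here are toy stand-ins for an LLM-driven policy, and all names are hypothetical:

```python
# Minimal agent-loop sketch: observe state, plan a tool call, act, repeat.
def run_agent(goal, tools, plan, max_steps=10):
    """Run until the planner says 'done' or the step budget runs out."""
    state = {"goal": goal, "history": []}
    for step in range(max_steps):
        action, args = plan(state)                        # plan: pick next tool call
        if action == "done":
            return state, step + 1                        # success: report steps used
        result = tools[action](**args)                    # act: call the chosen tool
        state["history"].append((action, args, result))   # observe: record outcome
    return state, max_steps                               # budget exhausted

# Toy usage: one lookup tool; the planner stops after a single call.
tools = {"lookup": lambda query: f"result for {query}"}

def plan(state):
    if state["history"]:
        return "done", {}
    return "lookup", {"query": state["goal"]}

final_state, steps = run_agent("find docs", tools, plan)
```

The returned step count is what the "success rate, steps, and cost" judgment would be computed over in a real harness.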
Toolchain
End-to-end software and process covering data, training, evaluation, deployment, and observability; here the emphasis is repeatability and delivery.
Benchmark
A standardized task suite and metrics for comparing models or systems; version drift and contamination affect comparability.
SSG
Static site generation—compile templates and data to files at build time; no per-request server rendering required.
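A minimal sketch of the build-time idea, using Python's string.Template in place of a real template engine; page names and data are hypothetical:

```python
# SSG sketch: compile every (template, data) pair to static HTML at build time.
from string import Template

pages = {"index.html": Template("<h1>$title</h1><p>$body</p>")}
data = {"index.html": {"title": "Home", "body": "Welcome."}}

def build(pages, data):
    """Return a mapping of output path -> rendered static HTML."""
    return {path: tmpl.substitute(data[path]) for path, tmpl in pages.items()}

site = build(pages, data)
# A real generator would write site[...] to disk; no server runs per request.
```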
RAG
Retrieval-augmented generation—retrieve relevant documents or snippets before answering; quality depends on retrieval and chunking.
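A sketch of the retrieve-then-prompt flow, with toy word-overlap scoring standing in for a real embedder or search index; the document strings are made up:

```python
# RAG sketch: score docs against the query, take top-k, prepend as context.
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query; return the top-k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs, k=2):
    """Prepend retrieved context; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "quantization lowers precision",
    "RAG retrieves documents before answering",
    "static sites build ahead of time",
]
prompt = build_prompt("what does RAG do with documents", docs, k=1)
```

Chunking and retriever quality dominate end-to-end quality, which is why the definition flags them.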
Fine-tuning
Continue training a base model on a smaller dataset to adapt it to new tasks, styles, or safety requirements; watch for catastrophic forgetting and bias.
Alignment
Techniques for steering model behavior toward human intent and safety constraints—RLHF, DPO, output filters, etc.
Quantization
Lower numeric precision for weights or activations to speed inference and save memory; verify task-level trade-offs.
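A minimal sketch of symmetric int8 weight quantization with a single per-tensor scale; real kernels add per-channel scales, zero-points, and calibration:

```python
# Symmetric int8 quantization sketch: scale = max|w| / 127,
# q = round(w / scale), dequantized value = q * scale.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by scale / 2; this is the memory/accuracy trade-off.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Verifying the task-level trade-off means evaluating the quantized model, not just this per-weight error.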
Inference serving
Expose models as online APIs or batch jobs; optimize throughput, latency, batching, and hardware utilization.
Stars / Forks (GitHub)
Stars signal interest or endorsement; forks are copies made for modification. Both can be inflated by marketing, so pair them with commit and issue activity.
Composite score
A scalar combining weighted metrics—useful for overview; always read sub-scores and task definitions.
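A toy sketch of how such a scalar is assembled, assuming normalized sub-scores and weights that sum to 1; the metric names and values here are hypothetical:

```python
# Composite score sketch: weighted average of normalized sub-scores.
def composite(sub_scores, weights):
    """Weights must cover the same metric keys and sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(sub_scores[k] * weights[k] for k in weights)

sub = {"reasoning": 0.8, "code": 0.6, "safety": 0.9}
w = {"reasoning": 0.5, "code": 0.3, "safety": 0.2}
score = composite(sub, w)   # 0.4 + 0.18 + 0.18 = 0.76
```

The same sub-scores under different weights give a different ranking, which is why reading the sub-scores and task definitions matters.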