# Learning center
Curated learning paths for LLMs, agents, RAG, and model evaluation: foundations, hands-on cookbooks, inference APIs, and public benchmarks. External links open in a new tab; inclusion is informational, not endorsement.
## Foundations & courses
| Resource | Notes |
|---|---|
| Hugging Face Transformers | Pretrained models, tokenizers, fine-tuning, and the library’s core API. |
| Stanford CS224N | Deep learning for NLP—word vectors, attention, Transformers, and applications. |
| DeepLearning.AI | Short courses on LLMs, prompt engineering, agents, RAG, and tooling. |
| The Illustrated Transformer | Visual explanation of attention and encoder–decoder architectures. |
| Lil’Log — LLM topics | Surveys and notes on agents, prompting, RLHF, and related systems. |
| Papers with Code | Tasks, papers, code, and leaderboards across ML and vision domains. |
| PyTorch documentation | Tensors, autograd, modules, and distributed training—common for LLM research. |
| Stanford CS336 — Language Modeling from Scratch | End-to-end LM course: tokenization, training, data, scaling, alignment, and systems. |
| Hugging Face PEFT | Parameter-efficient fine-tuning—LoRA, adapters, and prompt tuning. |
| Hugging Face TRL | Train LMs with supervised fine-tuning, DPO, PPO, and reward modeling. |
| fast.ai | Practical deep learning top-down; useful foundation before LLM specialization. |
| Attention Is All You Need (paper) | Original Transformer architecture—baseline reference for modern LLMs. |
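Several foundations resources above (Hugging Face PEFT, TRL) center on parameter-efficient fine-tuning. The core LoRA idea can be sketched with plain NumPy: freeze the pretrained weight `W` and learn only a low-rank update `B @ A`. The shapes and rank below are illustrative assumptions, not values from any specific PEFT config.

```python
import numpy as np

# Toy sketch of the LoRA idea behind parameter-efficient fine-tuning.
# Shapes and rank are hypothetical; real PEFT configs set rank per target module.
d_in, d_out, rank = 1024, 1024, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-initialized, so W' = W at start

W_adapted = W + B @ A                         # effective weight after adaptation

full_params = W.size                          # trainable params for full fine-tuning
lora_params = A.size + B.size                 # trainable params for LoRA
print(f"full: {full_params}, lora: {lora_params} ({lora_params / full_params:.1%})")
```

With these toy shapes, LoRA trains about 1.6% of the parameters that full fine-tuning would, which is why it fits on modest hardware.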
## Cookbooks, agents & RAG
| Resource | Notes |
|---|---|
| Hugging Face course | Transformers, datasets, tokenization, and NLP/LLM pipelines. |
| OpenAI Cookbook | Patterns for embeddings, RAG, function calling, and evaluation. |
| Anthropic Cookbook | Claude SDK examples, prompts, tool use, and long-context workflows. |
| LangChain documentation | Chains, tools, agents, memory, and retrieval integrations. |
| LlamaIndex | Data connectors, indexing, RAG, and agent workflows over documents. |
| DSPy | Programs and optimizers for LM pipelines—prompts as modules. |
| AutoGen (Microsoft) | Multi-agent conversations, tool use, and orchestration. |
| vLLM | Fast LLM serving with PagedAttention and an OpenAI-compatible API. |
| Semantic Kernel (Microsoft) | Plugins, planners, and connectors for LLM apps in .NET and Python. |
| Haystack | Pipelines for RAG, retrieval, and document QA at scale. |
| Langfuse | Tracing, evals, and observability for LLM applications. |
| PyTorch tutorials | Official tutorials from basics to NLP and distributed training. |
| NVIDIA Triton Inference Server | Production model serving, dynamic batching, and multi-framework backends. |
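Many of the tools above (LlamaIndex, Haystack, the OpenAI Cookbook) revolve around the retrieval step of RAG: embed documents, embed the query, return the nearest documents. A minimal sketch of that step, using a bag-of-words vector and cosine similarity as stand-ins for a real embedding model and vector store (both are simplifying assumptions):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus; a real pipeline would chunk and index source documents.
docs = [
    "vLLM serves models with paged attention",
    "Haystack builds retrieval pipelines for document QA",
    "Langfuse traces and evaluates LLM applications",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

print(retrieve("how do I trace LLM applications"))
```

Real frameworks swap in learned embeddings and an approximate-nearest-neighbor index, but the retrieve-then-generate control flow is the same.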
## Model & inference APIs
| Resource | Notes |
|---|---|
| OpenAI API | Chat, embeddings, images, audio, batch, and assistants. |
| Anthropic API | Claude models and Messages API. |
| Google AI for Developers | Gemini and related Google AI APIs. |
| Mistral AI | Chat, embeddings, and fine-tuning. |
| Cohere | Command, embed, classify, and RAG-oriented APIs. |
| Hugging Face Inference | Hosted inference for models on the Hub. |
| Replicate | Run open models via API and webhooks. |
| Together AI | Open-weight LLM inference and fine-tuning APIs. |
| Ollama | Local LLM serving; model library and CLI. |
| Groq | Very fast inference API for supported open models (LPU-backed). |
| OpenRouter | Unified API routing across many model providers and open weights. |
| Azure OpenAI Service | Enterprise OpenAI models on Azure with regional deployment and policies. |
| Vertex AI (Google Cloud) | Gemini, tuning, and MLOps on GCP. |
| Fireworks AI | Fast inference and fine-tuning for open models. |
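Several providers above (vLLM, Ollama, Together AI, Groq, OpenRouter, Fireworks AI, Mistral) expose OpenAI-compatible chat-completion endpoints, so one request shape works across them by swapping the base URL and model name. A sketch of that request using only the standard library; the base URL, model id, and key are placeholders (assumptions), and the request is built but deliberately not sent:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # e.g. a local vLLM or Ollama server (assumption)
API_KEY = "sk-..."                     # provider-issued key, if the server requires one

payload = {
    "model": "example-model",          # hypothetical model id; varies by provider
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize RAG in one sentence."},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; left out so the sketch
# runs without a live server.
print(req.full_url)
```

Because the wire format is shared, switching providers is usually a matter of changing `BASE_URL`, the model name, and the key.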
## Benchmarks & leaderboards
| Resource | Notes |
|---|---|
| LMSYS Chatbot Arena | Crowd-ranked LLM comparisons via blind side-by-side votes. |
| Open LLM Leaderboard | Open-weight models on common benchmarks (Hugging Face). |
| HELM (Stanford) | Holistic evaluation of language models across scenarios and metrics. |
| lm-evaluation-harness | EleutherAI evaluation framework covering many standard benchmarks; widely used in papers and repos. |
| Papers with Code — SOTA | State-of-the-art tasks across NLP, vision, and more. |
| arXiv cs.CL | Recent computation and language papers—preprints. |
| LiveBench | Contamination-aware benchmark with frequently refreshed questions. |
| SWE-bench | Real GitHub issues—tests coding agents on repository-scale tasks. |
| Artificial Analysis | Independent comparisons of model quality, speed, and price. |
| MTEB Leaderboard | Massive Text Embedding Benchmark—retrieval and embedding model rankings. |
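Arena-style leaderboards such as LMSYS Chatbot Arena aggregate blind pairwise votes into ratings (Elo or Bradley-Terry style). A minimal Elo update over toy votes shows the mechanism; the model names, vote counts, and K-factor below are invented for illustration:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One Elo step: the winner gains what the loser forfeits."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Toy vote log: (winner, loser) pairs from blind side-by-side comparisons.
ratings = {"model-a": 1000.0, "model-b": 1000.0}
votes = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

print(sorted(ratings, key=ratings.get, reverse=True))
```

Production leaderboards refine this with Bradley-Terry fitting and confidence intervals, but the intuition is the same: rankings emerge from many noisy pairwise preferences rather than a fixed test set.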