# Learning center
Curated learning paths for LLMs, agents, RAG, and model evaluation: foundations, hands-on cookbooks, inference APIs, and public benchmarks. External links open in a new tab; inclusion is informational, not endorsement.
## Foundations & courses
| Resource | Notes |
|---|---|
| Hugging Face Transformers | Pretrained models, tokenizers, fine-tuning, and the library’s core API. |
| Stanford CS224N | Deep learning for NLP—word vectors, attention, Transformers, and applications. |
| DeepLearning.AI | Short courses on LLMs, prompt engineering, agents, RAG, and tooling. |
| The Illustrated Transformer | Visual explanation of attention and encoder–decoder architectures. |
| Lil’Log — LLM topics | Surveys and notes on agents, prompting, RLHF, and related systems. |
| Papers with Code | Tasks, papers, code, and leaderboards across ML and vision domains. |
| PyTorch documentation | Tensors, autograd, modules, and distributed training—common for LLM research. |
| Stanford CS336 — Language Modeling from Scratch | End-to-end LM course: tokenization, training, data, scaling, alignment, and systems. |
| Hugging Face PEFT | Parameter-efficient fine-tuning—LoRA, adapters, and prompt tuning. |
| Hugging Face TRL | Train LMs with supervised fine-tuning, DPO, PPO, and reward modeling. |
| fast.ai | Practical deep learning top-down; useful foundation before LLM specialization. |
| Attention Is All You Need (paper) | Original Transformer architecture—baseline reference for modern LLMs. |
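Several foundations resources above (Hugging Face PEFT, TRL) center on parameter-efficient fine-tuning. The core LoRA idea can be sketched with plain NumPy: freeze the pretrained weight `W` and learn only a low-rank update `B @ A`. The shapes and rank below are illustrative assumptions, not values from any specific PEFT config.

```python
import numpy as np

# Toy sketch of the LoRA idea behind parameter-efficient fine-tuning.
# Shapes and rank are hypothetical; real PEFT configs set rank per target module.
d_in, d_out, rank = 1024, 1024, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-initialized, so W' = W at start

W_adapted = W + B @ A                         # effective weight after adaptation

full_params = W.size                          # trainable params for full fine-tuning
lora_params = A.size + B.size                 # trainable params for LoRA
print(f"full: {full_params}, lora: {lora_params} ({lora_params / full_params:.1%})")
```

With these toy shapes, LoRA trains about 1.6% of the parameters that full fine-tuning would, which is why it fits on modest hardware.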
## Cookbooks, agents & RAG
| Resource | Notes |
|---|---|
| Hugging Face course | Transformers, datasets, tokenization, and NLP/LLM pipelines. |
| OpenAI Cookbook | Patterns for embeddings, RAG, function calling, and evaluation. |
| Anthropic Cookbook | Claude SDK examples, prompts, tool use, and long-context workflows. |
| LangChain documentation | Chains, tools, agents, memory, and retrieval integrations. |
| LlamaIndex | Data connectors, indexing, RAG, and agent workflows over documents. |
| DSPy | Programs and optimizers for LM pipelines—prompts as modules. |
| AutoGen (Microsoft) | Multi-agent conversations, tool use, and orchestration. |
| vLLM | Fast LLM serving with PagedAttention and an OpenAI-compatible API. |
| Semantic Kernel (Microsoft) | Plugins, planners, and connectors for LLM apps in .NET and Python. |
| Haystack | Pipelines for RAG, retrieval, and document QA at scale. |
| Langfuse | Tracing, evals, and observability for LLM applications. |
| PyTorch tutorials | Official tutorials from basics to NLP and distributed training. |
| NVIDIA Triton Inference Server | Production model serving, dynamic batching, and multi-framework backends. |
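Many of the tools above (LlamaIndex, Haystack, the OpenAI Cookbook) revolve around the retrieval step of RAG: embed documents, embed the query, return the nearest documents. A minimal sketch of that step, using a bag-of-words vector and cosine similarity as stand-ins for a real embedding model and vector store (both are simplifying assumptions):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus; a real pipeline would chunk and index source documents.
docs = [
    "vLLM serves models with paged attention",
    "Haystack builds retrieval pipelines for document QA",
    "Langfuse traces and evaluates LLM applications",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

print(retrieve("how do I trace LLM applications"))
```

Real frameworks swap in learned embeddings and an approximate-nearest-neighbor index, but the retrieve-then-generate control flow is the same.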
## Model & inference APIs
| Resource | Notes |
|---|---|
| OpenAI API | Chat, embeddings, images, audio, batch, and assistants. |
| Anthropic API | Claude models and Messages API. |
| Google AI for Developers | Gemini and related Google AI APIs. |
| Mistral AI | Chat, embeddings, and fine-tuning. |
| Cohere | Command, embed, classify, and RAG-oriented APIs. |
| Hugging Face Inference | Hosted inference for models on the Hub. |
| Replicate | Run open models via API and webhooks. |
| Together AI | Open-weight LLM inference and fine-tuning APIs. |
| Ollama | Local LLM serving; model library and CLI. |
| Groq | Very fast inference API for supported open models (LPU-backed). |
| OpenRouter | Unified API routing across many model providers and open weights. |
| Azure OpenAI Service | Enterprise OpenAI models on Azure with regional deployment and policies. |
| Vertex AI (Google Cloud) | Gemini, tuning, and MLOps on GCP. |
| Fireworks AI | Fast inference and fine-tuning for open models. |
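Several providers above (vLLM, Ollama, Together AI, Groq, OpenRouter, Fireworks AI, Mistral) expose OpenAI-compatible chat-completion endpoints, so one request shape works across them by swapping the base URL and model name. A sketch of that request using only the standard library; the base URL, model id, and key are placeholders (assumptions), and the request is built but deliberately not sent:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # e.g. a local vLLM or Ollama server (assumption)
API_KEY = "sk-..."                     # provider-issued key, if the server requires one

payload = {
    "model": "example-model",          # hypothetical model id; varies by provider
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize RAG in one sentence."},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; left out so the sketch
# runs without a live server.
print(req.full_url)
```

Because the wire format is shared, switching providers is usually a matter of changing `BASE_URL`, the model name, and the key.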
## Benchmarks & leaderboards
| Resource | Notes |
|---|---|
| LMSYS Chatbot Arena | Crowd-ranked LLM comparisons via blind side-by-side votes. |
| Open LLM Leaderboard | Open-weight models on common benchmarks (Hugging Face). |
| HELM (Stanford) | Holistic evaluation of language models across scenarios and metrics. |
| lm-evaluation-harness | EleutherAI evaluation framework covering many standard benchmarks; widely used in papers and repos. |
| Papers with Code — SOTA | State-of-the-art tasks across NLP, vision, and more. |
| arXiv cs.CL | Recent computation and language papers—preprints. |
| LiveBench | Contamination-aware benchmark with frequently refreshed questions. |
| SWE-bench | Real GitHub issues—tests coding agents on repository-scale tasks. |
| Artificial Analysis | Independent comparisons of model quality, speed, and price. |
| MTEB Leaderboard | Massive Text Embedding Benchmark—retrieval and embedding model rankings. |
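Arena-style leaderboards such as LMSYS Chatbot Arena aggregate blind pairwise votes into ratings (Elo or Bradley-Terry style). A minimal Elo update over toy votes shows the mechanism; the model names, vote counts, and K-factor below are invented for illustration:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One Elo step: the winner gains what the loser forfeits."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Toy vote log: (winner, loser) pairs from blind side-by-side comparisons.
ratings = {"model-a": 1000.0, "model-b": 1000.0}
votes = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

print(sorted(ratings, key=ratings.get, reverse=True))
```

Production leaderboards refine this with Bradley-Terry fitting and confidence intervals, but the intuition is the same: rankings emerge from many noisy pairwise preferences rather than a fixed test set.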