Local LLMs - Start Here

Run powerful language models completely locally: no API keys, no usage costs, full data privacy.

Why Run LLMs Locally?ΒΆ

  • Privacy: Data never leaves your machine

  • Cost: Zero per-token charges at inference time

  • Offline: Works without internet

  • Customization: Fine-tune for your specific use case

  • Control: No rate limits, no terms of service restrictions

Notebooks in This PhaseΒΆ

| Notebook | Topic |
|----------|-------|
| 01_ollama_quickstart.ipynb | Ollama: run Llama 3, Mistral, Gemma locally |
| 02_open_source_models_overview.ipynb | Model landscape: Llama 3, Mistral, Phi-3, Gemma |
| 03_local_rag_with_ollama.ipynb | Build a fully local RAG system |
| 04_llm_server_and_api.ipynb | Serve models via OpenAI-compatible API |
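The last notebook serves models behind an OpenAI-compatible API. As a preview of what that means: Ollama exposes such an endpoint at http://localhost:11434/v1 by default, so the standard chat-completions request shape works against a local model. A minimal standard-library sketch (the model name and prompt are placeholders; a running `ollama serve` is assumed for the actual call):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI-style chat-completions payload; Ollama's
    # /v1/chat/completions endpoint accepts the same shape.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running Ollama server with the model pulled:
# print(chat("http://localhost:11434", "llama3.3", "Say hello"))
```

Because the request shape matches OpenAI's, the official `openai` client also works by pointing `base_url` at the local server; the notebook covers that in detail.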

Tools You’ll UseΒΆ

| Tool | Purpose |
|------|---------|
| Ollama | Easy local model management and serving |
| llama.cpp | Low-level inference, GGUF format |
| LM Studio | GUI for local model management |
| vLLM | High-throughput serving for production |
| Transformers | HuggingFace inference for any model |
| AI Toolkit | VS Code extension: model catalog, playground, fine-tuning, evaluation |

Top Local Models (2026)ΒΆ

| Model | Size | Strength |
|-------|------|----------|
| Llama 3.3 70B | 70B | Best overall open source |
| Mistral 7B | 7B | Fast, great for most tasks |
| Phi-4 | 14B | Microsoft, strong reasoning |
| Gemma 2 9B | 9B | Google, efficient |
| DeepSeek-R1 | 7B-70B | Strong reasoning, open weights |
| Qwen 2.5 | 7B-72B | Multilingual, code |

PrerequisitesΒΆ

  • RAG Systems (Phase 08)

  • Install Ollama (https://ollama.ai), then run ollama pull llama3.3

Learning PathΒΆ

01_ollama_quickstart.ipynb       ← Install Ollama first
02_open_source_models_overview.ipynb
03_local_rag_with_ollama.ipynb
04_llm_server_and_api.ipynb
05_speculative_decoding.ipynb

06_ai_toolkit_vscode.md          ← VS Code AI Toolkit deep dive