Local LLMs - Start Here

Run powerful language models completely locally: no API keys, no usage costs, full data privacy.

Why Run LLMs Locally?ΒΆ

  • Privacy: Data never leaves your machine

  • Cost: Zero per-token charges at inference time

  • Offline: Works without internet

  • Customization: Fine-tune for your specific use case

  • Control: No rate limits, no terms of service restrictions

Notebooks in This PhaseΒΆ

| Notebook | Topic |
|----------|-------|
| 01_ollama_quickstart.ipynb | Ollama: run Llama 3, Mistral, Gemma locally |
| 02_open_source_models_overview.ipynb | Model landscape: Llama 3, Mistral, Phi-3, Gemma |
| 03_local_rag_with_ollama.ipynb | Build a fully local RAG system |
| 04_llm_server_and_api.ipynb | Serve models via OpenAI-compatible API |
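The last notebook serves models behind an OpenAI-compatible API. As a preview of what that means: Ollama exposes such an endpoint at http://localhost:11434/v1 by default, so the standard chat-completions request shape works against a local model. A minimal standard-library sketch (the model name and prompt are placeholders; a running `ollama serve` is assumed for the actual call):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI-style chat-completions payload; Ollama's
    # /v1/chat/completions endpoint accepts the same shape.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running Ollama server with the model pulled:
# print(chat("http://localhost:11434", "llama3.3", "Say hello"))
```

Because the request shape matches OpenAI's, the official `openai` client also works by pointing `base_url` at the local server; the notebook covers that in detail.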

Tools You’ll UseΒΆ

| Tool | Purpose |
|------|---------|
| Ollama | Easy local model management and serving |
| llama.cpp | Low-level inference, GGUF format |
| LM Studio | GUI for local model management |
| vLLM | High-throughput serving for production |
| Transformers | HuggingFace inference for any model |
| AI Toolkit | VS Code extension: model catalog, playground, fine-tuning, evaluation |

Top Local Models (2026)ΒΆ

| Model | Size | Strength |
|-------|------|----------|
| Llama 3.3 70B | 70B | Best overall open source |
| Mistral 7B | 7B | Fast, great for most tasks |
| Phi-4 | 14B | Microsoft, strong reasoning |
| Gemma 2 9B | 9B | Google, efficient |
| DeepSeek-R1 | 7B-70B | Strong reasoning, open weights |
| Qwen 2.5 | 7B-72B | Multilingual, code |

PrerequisitesΒΆ

  • RAG Systems (Phase 08)

  • Install Ollama (https://ollama.ai), then run ollama pull llama3.3

Learning PathΒΆ

01_ollama_quickstart.ipynb       ← Install Ollama first
02_open_source_models_overview.ipynb
03_local_rag_with_ollama.ipynb
04_llm_server_and_api.ipynb
05_speculative_decoding.ipynb

06_ai_toolkit_vscode.md          ← VS Code AI Toolkit deep dive