RAG
🎯 Overview
Combine your skills from previous phases to build production-grade RAG systems!
Prerequisites:
✅ Tokenization (Phase 4)
✅ Embeddings (Phase 5)
✅ Neural Networks (Phase 6)
✅ Vector Databases (Phase 7)
Time: 3-4 weeks | 60-80 hours
Outcome: Build AI applications that can query your knowledge base
📚 What You'll Learn
Core RAG Concepts
RAG architecture and pipeline
Document processing and chunking strategies
Retrieval methods (dense, sparse, hybrid)
Context management and prompt construction
Re-ranking and result filtering
LLM integration (OpenAI, Anthropic, local models)
Advanced RAG Techniques
Hybrid search (vector + keyword)
Query transformation and expansion
Multi-query retrieval
Parent-document retrieval
Self-query and metadata filtering
Conversation memory and context
HyDE and hypothetical-answer retrieval
Contextual compression and segment extraction
Cross-encoder reranking
Hierarchical retrieval, RAPTOR, and parent-child indexing
Corrective RAG (CRAG) and self-reflective retrieval loops
GraphRAG, multimodal RAG, and agentic retrieval
🏗️ Module Structure
```
08-rag/
├── 00_START_HERE.ipynb              # RAG overview and quick demo
├── 01_basic_rag.ipynb               # Simple RAG from scratch
├── 02_document_processing.ipynb     # Chunking strategies
├── 03_langchain_rag.ipynb           # Using LangChain framework
├── 04_llamaindex_rag.ipynb          # Using LlamaIndex framework
├── 05_advanced_retrieval.ipynb      # Hybrid search, re-ranking
├── 06_conversation_rag.ipynb        # Chat with memory
├── 07_evaluation.ipynb              # RAG evaluation metrics
├── 08_hyde_reranking.ipynb          # HyDE-style query expansion plus reranking
├── 08_rag_evaluation_playbook.md    # How to benchmark RAG improvements
├── 08_rag_technique_selection.md    # How to choose the right RAG upgrade
├── 09_advanced_retrieval.ipynb      # Parent-child retrieval, ensemble
├── 10_graphrag_visual_rag.ipynb     # GraphRAG and multimodal RAG
├── 11_corrective_rag.ipynb          # CRAG-style retrieval grading, retry, abstention
├── 12_parent_child_retrieval.ipynb  # Structured retrieval with chunk-to-parent expansion
├── 13_raptor_retrieval.ipynb        # RAPTOR-style hierarchical summary-tree retrieval
├── assignment.md                    # Phase assignment
├── challenges.md                    # Hands-on challenges
└── README.md                        # This file
```
🚀 Quick Start
1. Basic RAG Pipeline
```python
# The fundamental RAG flow:
# 1. Index documents → embeddings → vector DB
# 2. User query → embedding → similarity search
# 3. Retrieved docs + query → LLM → answer
from sentence_transformers import SentenceTransformer
from your_vector_db import VectorDB  # placeholder: Chroma, Qdrant, etc.
from openai import OpenAI

# 1. Index your documents
# Use any embedding model - see 05-embeddings/embedding_comparison.md for options
# API: Gemini Embedding (cheapest + best), Voyage 3.5, or OpenAI
# Local: Qwen3-Embedding, BGE-M3, or all-MiniLM-L6-v2
model = SentenceTransformer('all-MiniLM-L6-v2')  # local, fast
db = VectorDB()  # your Phase 7 vector store
docs = ["Your documents here..."]
embeddings = model.encode(docs)
db.add(documents=docs, embeddings=embeddings)

# 2. Retrieve relevant context
query = "What is RAG?"
query_embedding = model.encode(query)
results = db.search(query_embedding, top_k=3)

# 3. Generate answer with LLM (Claude, GPT, Gemini, or local)
context = "\n".join(results)
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": prompt}],
)
answer = response.choices[0].message.content
```
📖 Learning Path
Week 1: RAG Fundamentals
Complete 00_START_HERE.ipynb
Build basic RAG in 01_basic_rag.ipynb
Learn chunking strategies in 02_document_processing.ipynb
Project: Simple Q&A on your documents
Week 2: RAG Frameworks
Learn LangChain in 03_langchain_rag.ipynb
Explore LlamaIndex in 04_llamaindex_rag.ipynb
Compare frameworks and choose your favorite
Project: Build a research paper assistant
Week 3: Advanced Techniques
Implement hybrid search in 05_advanced_retrieval.ipynb
Add conversation memory in 06_conversation_rag.ipynb
Learn evaluation in 07_evaluation.ipynb
Project: Code search system for your repos
Week 4: Production Project
Build end-to-end RAG application
Add proper error handling
Implement caching and optimization
Deploy as API (preview of Phase 9)
Capstone: Personal knowledge assistant
Optional Week 5: Modern RAG Deep Dives
Explore HyDE / query rewriting / query decomposition patterns
Work through 08_hyde_reranking.ipynb to compare baseline retrieval vs HyDE + reranking
Compare reranking, contextual compression, and relevant-segment extraction
Work through 11_corrective_rag.ipynb to add retrieval grading, retry logic, and abstention
Work through 12_parent_child_retrieval.ipynb before moving to RAPTOR or GraphRAG
Work through 13_raptor_retrieval.ipynb to compare flat, parent-child, and tree-based retrieval
Study CRAG, Self-RAG, and retrieval-with-feedback loops
Review RAPTOR, GraphRAG, and multimodal RAG architectures
Build a small benchmark to compare at least 3 advanced techniques
🧭 Modern RAG Technique Map
The cloned RAG_Techniques repository is strong because it does not treat RAG as one pattern. It treats RAG as a family of retrieval control strategies. Use this map to understand which techniques matter and when.
| Problem | Techniques to Study | Why It Helps | When to Use |
|---|---|---|---|
| Queries are vague or underspecified | Query rewriting, query decomposition, multi-query retrieval, HyDE | Makes retrieval better aligned with user intent | User asks short, ambiguous, or multi-part questions |
| Chunks lose too much context | Semantic chunking, proposition chunking, contextual headers, window expansion | Preserves meaning while keeping retrieval precise | Long docs, technical manuals, research papers |
| Retriever finds partly-right docs | Hybrid retrieval, reranking, contextual compression, segment extraction | Improves top-k quality before the LLM sees context | Large corpora, noisy search results, enterprise docs |
| Questions require structure beyond flat chunks | Parent-child retrieval, hierarchical indices, RAPTOR, GraphRAG | Retrieves summaries, entities, relationships, and larger context blocks | Multi-hop reasoning, long reports, knowledge graphs |
| System hallucinates or retrieves weak evidence | Reliable RAG, CRAG, Self-RAG, feedback loops | Adds validation and correction before final answer | High-stakes workflows, compliance, research, support |
| Queries span text, tables, and images | Multimodal RAG, caption-based retrieval, visual RAG, ColPali-style retrieval | Brings non-text content into the retrieval loop | PDFs, dashboards, slide decks, diagrams |
| Workflow needs tools and planning | Agentic RAG, retrieval orchestration, tool selection | Lets the system choose retrieval tools dynamically | Complex research agents, enterprise copilots |
Suggested progression
Learn the baseline pipeline first: chunk, embed, retrieve, answer.
Improve retrieval quality next: hybrid search, reranking, metadata filters.
Improve query understanding after that: rewriting, multi-query, HyDE.
Add reliability controls next: compression, validation, CRAG or Self-RAG.
Only then move into GraphRAG, agentic RAG, and multimodal retrieval.
This ordering matters. Most weak RAG systems fail because teams jump to advanced architecture before fixing chunking, retrieval quality, and evaluation.
Companion guides
Use 08_rag_technique_selection.md if you want a compact decision guide for choosing between HyDE, reranking, compression, RAPTOR, CRAG, Self-RAG, and GraphRAG.
Use 08_rag_evaluation_playbook.md if you want a practical framework for benchmarking retrieval quality, answer quality, latency, and failure behavior.
🛠️ Technologies You'll Use
LLM Frameworks:
LangChain - Most popular, extensive ecosystem
LlamaIndex - Best for document indexing
Haystack - Production-focused
LLM Providers:
OpenAI (GPT-5.4, GPT-4.1, GPT-4.1-mini)
Anthropic (Claude Sonnet 4.6, Haiku 4.5)
Google (Gemini 3.1 Pro, Flash)
Local models (Qwen 3, Llama 4, DeepSeek R1 via Ollama)
Vector Databases:
Use what you learned in Phase 7!
Chroma, Qdrant, Weaviate, Milvus
Embeddings:
OpenAI embeddings (text-embedding-3-small/large)
Sentence Transformers (all-MiniLM-L6-v2, all-mpnet-base-v2)
Cohere embeddings
📝 Key Concepts Explained
1. RAG Pipeline
```mermaid
flowchart TD
    A[Documents] --> B[Split into Chunks]
    B --> C[Embed]
    C --> D[Store in Vector DB]
    E[User Query] --> F[Embed Query]
    F --> G[Similarity Search]
    D --> G
    G --> H[Retrieve Top-K]
    H --> I[Retrieved Docs + Query]
    I --> J[LLM Prompt]
    J --> K[Answer]
```
2. Chunking Strategies
Fixed-size chunks:

```python
chunk_size = 512  # tokens or characters
overlap = 50      # overlap between chunks
```
Semantic chunks:
Split by paragraphs, sentences
Preserve document structure
Maintain context boundaries
Recursive splitting:
Try different separators (\n\n, \n, ., space)
Preserve hierarchy
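The recursive strategy above can be sketched in a few lines. This is a minimal, dependency-free sketch (not LangChain's implementation): try the coarsest separator first, fall back to finer ones, and hard-cut only as a last resort.

```python
def recursive_split(text, chunk_size=512, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring coarse separators to preserve structure."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            chunks, current = [], ""
            for part in text.split(sep):
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > chunk_size:
                        # A single piece is still too big: recurse with finer separators
                        chunks.extend(recursive_split(part, chunk_size, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator found anywhere: hard-cut by size
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Production splitters also handle token counts and overlap, but the separator-priority idea is the core of the technique.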
3. Retrieval Methods
Dense (Vector Search):
Semantic similarity
Works for paraphrased queries
Requires embeddings
Sparse (Keyword Search):
BM25, TF-IDF
Exact keyword matching
Fast and interpretable
Hybrid:
Combine both approaches
Re-rank with RRF (Reciprocal Rank Fusion)
Best of both worlds
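Reciprocal Rank Fusion is simple enough to sketch directly: each ranked list contributes 1 / (k + rank) per document, and documents that rank well in both dense and sparse lists rise to the top. The doc IDs below are illustrative.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse multiple rankings of doc IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # vector-search ranking
sparse = ["doc_a", "doc_c", "doc_d"]  # BM25 ranking
fused = rrf_fuse([dense, sparse])     # doc_a wins: top-1 in both lists
```

The constant k (commonly 60) dampens the influence of top positions so one list cannot dominate the fusion.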
4. What Upgrades a Good RAG System into a Strong One
Query-side upgrades:
Rewrite vague questions into standalone queries
Generate multiple retrieval queries and merge the results
Use HyDE when the question is abstract and semantic similarity is weak
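The HyDE idea fits in a few lines. This sketch uses hypothetical callables (`llm_generate`, `embed`, `db_search`) standing in for your LLM client, embedding model, and vector store:

```python
def hyde_search(query, llm_generate, embed, db_search, top_k=5):
    """HyDE: retrieve with a hypothetical answer instead of the raw query."""
    # 1. Ask the LLM to write a plausible (possibly wrong) answer passage
    hypothetical = llm_generate(f"Write a short passage that answers: {query}")
    # 2. Embed the hypothetical passage; passage-to-passage similarity is
    #    often stronger than query-to-passage for abstract questions
    vector = embed(hypothetical)
    # 3. Retrieve real documents near the hypothetical one
    return db_search(vector, top_k=top_k)
```

The hypothetical answer need not be correct; it only needs to live in the same embedding neighborhood as the real evidence.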
Document-side upgrades:
Use semantic or proposition chunking when fixed windows lose meaning
Add headers, summaries, or parent references to each chunk
Use hierarchical retrieval for long documents and section-level reasoning
Ranking-side upgrades:
Retrieve broad candidate sets first
Re-rank with a cross-encoder or reranker model
Compress context so the generator only sees the best evidence
Control-loop upgrades:
Detect low-confidence retrieval before answering
Retry with transformed queries when the first pass is weak
Add answer verification or evidence grading for high-risk use cases
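The control-loop upgrades above combine into a CRAG-flavored retrieve–grade–retry loop. A minimal sketch, where `retrieve`, `grade`, and `rewrite` are hypothetical hooks you supply (e.g. a vector search, an LLM-based relevance grader, and a query rewriter):

```python
def retrieve_with_correction(query, retrieve, grade, rewrite,
                             threshold=0.5, max_retries=2):
    """Grade evidence, retry with a rewritten query, abstain if nothing passes."""
    for _ in range(max_retries + 1):
        docs = retrieve(query)
        good = [d for d in docs if grade(query, d) >= threshold]
        if good:
            return good        # confident evidence: answer from these docs
        query = rewrite(query)  # weak evidence: transform the query and retry
    return None                 # abstain rather than hallucinate
```

Returning None (abstention) is the key reliability feature: the generator is never handed evidence the grader rejected.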
🎯 Projects
Project 1: Personal Documentation Q&A
Build a chatbot that answers questions about your personal notes, docs, PDFs.
Features:
Upload PDFs, TXTs, Markdown files
Chunk and embed documents
Conversational interface
Source citation
Project 2: Code Search Engine
Semantic search across your GitHub repositories.
Features:
Index code files (Python, JavaScript, etc.)
Search by intent ("how to connect to a database?")
Show relevant code snippets
Explain code functionality
Project 3: Research Assistant
Query academic papers and scientific literature.
Features:
Process research papers (PDFs)
Extract citations and references
Summarize papers
Compare multiple papers
Project 4: Customer Support Bot
RAG-powered FAQ system.
Features:
Index support documentation
Handle common questions
Escalate to human when needed
Track conversation context
Project 5: Advanced Enterprise Search
Build a RAG system that combines multiple retrieval strategies and exposes evidence quality.
Features:
Query rewriting and multi-query retrieval
Hybrid retrieval plus reranking
Metadata-aware filtering
Confidence scoring and answer verification
Failure routing: answer, abstain, or ask follow-up
📊 Evaluation Metrics
Retrieval Quality
Precision@K: Relevant docs in top K results
Recall@K: % of relevant docs retrieved
MRR (Mean Reciprocal Rank): Position of first relevant result
NDCG: Normalized Discounted Cumulative Gain
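The first three metrics are easy to compute over lists of retrieved doc IDs and sets of known-relevant IDs. A minimal sketch:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(queries):
    """Mean Reciprocal Rank over (retrieved_list, relevant_set) pairs."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Run these over a fixed query set before and after each retrieval change; a technique that does not move these numbers is not worth its complexity.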
Generation Quality
Faithfulness: Answer grounded in context
Relevance: Answer addresses the question
Correctness: Factually accurate
Human evaluation: User satisfaction
System Metrics
Latency: Response time
Cost: API costs per query
Cache hit rate: Efficiency
💡 Best Practices
Document Processing
✅ Chunk size: 256-1024 tokens (experiment!)
✅ Overlap: 10-20% of chunk size
✅ Preserve metadata (source, date, author)
✅ Clean text (remove headers, footers)
Retrieval
✅ Retrieve 3-10 documents (balance context vs noise)
✅ Use hybrid search when possible
✅ Re-rank results for better quality
✅ Filter by metadata when relevant
Prompting
✅ Provide clear instructions
✅ Include relevant context only
✅ Ask the LLM to cite sources
✅ Handle "I don't know" cases
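These prompting practices can be rolled into one template. A sketch (the format and wording are illustrative, not a fixed standard):

```python
def build_rag_prompt(question, chunks):
    """Build a grounded prompt from (source_id, text) pairs:
    context-only answering, source citations, and an explicit escape hatch."""
    context = "\n\n".join(f"[{sid}] {text}" for sid, text in chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite sources by their [id]. If the context does not contain "
        "the answer, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Labeling each chunk with its source ID is what makes citation checking and faithfulness evaluation possible downstream.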
Production
✅ Cache embeddings and results
✅ Monitor LLM costs
✅ Implement rate limiting
✅ Add error handling and retries
✅ Benchmark retrieval variants before adding architectural complexity
✅ Track answer faithfulness separately from answer fluency
✅ Keep a failure set of hard questions and regressions
✅ Prefer simpler retrieval improvements before adding agents or graphs
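Caching embeddings (the first item above) is the cheapest production win, since the same documents and queries recur. A minimal content-addressed cache sketch; `embed_fn` is whatever embedding callable you already use:

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings keyed by a hash of the text; count hits/misses."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.hits = 0
        self.misses = 0

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.embed_fn(text)  # only compute on miss
        return self.store[key]
```

In production you would back `store` with Redis or disk, but the hit-rate counters are worth keeping either way; they feed the cache-hit-rate metric above.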
📚 Resources
Documentation
Papers
Courses
Tools
✅ Completion Checklist
Before moving to Phase 9 (MLOps), you should be able to:
Explain RAG architecture and benefits
Process and chunk documents effectively
Build basic RAG pipeline from scratch
Use LangChain or LlamaIndex
Implement hybrid search (dense + sparse)
Add conversation memory to chatbots
Evaluate RAG system quality
Explain when to use HyDE, reranking, contextual compression, or GraphRAG
Diagnose retrieval failures and choose the right fix
Deploy a working RAG application
Understand cost/latency tradeoffs
Handle edge cases and errors
🚀 What's Next?
Phase 9: MLOps & Production →
Deploy RAG as scalable API
Monitor performance and costs
CI/CD for ML systems
Cloud deployment (AWS, Azure, GCP)
Phase 10: Specializations →
Multimodal RAG (images + text)
Agent systems with RAG
Advanced prompt engineering
Ready to build your first RAG system? → Start with 00_START_HERE.ipynb
Questions? → Check assignment.md, challenges.md, 08_rag_technique_selection.md, and 08_rag_evaluation_playbook.md for practice, technique selection, and benchmarking
🎉 Let's build intelligent systems that can learn from your data!