# Practice Labs: SLP (Jurafsky/Martin)
**Source PDF:** ed3book_jan26.pdf
**Book:** *Speech and Language Processing* (3rd Edition draft) by Daniel Jurafsky & James H. Martin

The book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. The labs follow the book's chapter order.
## Labs

| Lab | Topic | Book Chapter(s) | Key Concepts |
|---|---|---|---|
| 01 | Words, Tokens & Text Processing | Ch 2: Words and Tokens | BPE tokenization, regex, edit distance |
| 02 | N-gram Language Models | Ch 3: N-gram Language Models | N-grams, perplexity, smoothing, text generation |
| 03 | Word Embeddings | Ch 5: Embeddings | Co-occurrence, TF-IDF, Word2Vec, cosine similarity |
| 04 | Neural Networks from Scratch | Ch 6: Neural Networks | XOR, feedforward nets, backprop, optimizers |
| 05 | Transformers & Attention | Ch 8: Transformers | Self-attention, multi-head, positional encoding |
| 06 | Large Language Models | Ch 7, 9, 10: LLMs, MLMs, Post-training | Sampling, prompting, BERT masking, RLHF |
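As a small taste of the material, here is a minimal sketch of one Lab 01 concept, edit distance. This is an illustration only, not the labs' implementation; it uses unit costs for insert, delete, and substitute (the book also discusses a variant with substitution cost 2).

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance with unit operation costs."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + sub,  # substitution (free if chars match)
            )
    return dp[m][n]

print(edit_distance("intention", "execution"))  # → 5 (the book's classic example pair)
```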
## How to Use

- Each lab is a Jupyter notebook with theory (markdown cells) and fully implemented code cells.
- Read the theory cells, study the implementations, and run each cell.
- Open a lab in Jupyter:

```
jupyter notebook lab_01_words_tokens.ipynb
```
## Prerequisites

- Python 3.8+
- NumPy
- Matplotlib
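A quick environment check before starting (a convenience sketch, not part of any lab — it only verifies the interpreter version and reports whether the required packages are importable):

```python
import importlib.util
import sys

# The labs target Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"

# Report whether each required package is installed, without importing it.
for pkg in ("numpy", "matplotlib"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING -- try: pip install ' + pkg}")
```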
## Suggested Order

Follow the labs in order (1 through 6), as they build upon each other:

1. Lab 01 - Text processing fundamentals
2. Lab 02 - Statistical language models
3. Lab 03 - Word representations
4. Lab 04 - Neural network foundations
5. Lab 05 - Transformer architecture
6. Lab 06 - Modern LLMs and applications