Practice Labs: SLP (Jurafsky/Martin)

Source PDF: ed3book_jan26.pdf
Book: Speech and Language Processing (3rd Edition draft) by Daniel Jurafsky & James H. Martin

This book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. Labs follow the book’s chapter order.

LabsΒΆ

| Lab | Topic | Book Chapter(s) | Key Concepts |
|-----|-------|-----------------|--------------|
| Lab 01 | Words, Tokens & Text Processing | Ch 2: Words and Tokens | BPE tokenization, regex, edit distance |
| Lab 02 | N-gram Language Models | Ch 3: N-gram Language Models | N-grams, perplexity, smoothing, text generation |
| Lab 03 | Word Embeddings | Ch 5: Embeddings | Co-occurrence, TF-IDF, Word2Vec, cosine similarity |
| Lab 04 | Neural Networks from Scratch | Ch 6: Neural Networks | XOR, feedforward nets, backprop, optimizers |
| Lab 05 | Transformers & Attention | Ch 8: Transformers | Self-attention, multi-head attention, positional encoding |
| Lab 06 | Large Language Models | Ch 7, 9, 10: LLMs, MLMs, Post-training | Sampling, prompting, BERT masking, RLHF |
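To give a flavor of what the labs implement, one of Lab 03's key concepts, cosine similarity, can be sketched in a few lines of plain NumPy. The vectors below are made-up toy "embeddings" for illustration, not data from the labs:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-dimensional "embeddings" (illustrative values only)
king = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.1])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # near 1: similar directions
print(cosine_similarity(king, apple))  # smaller: dissimilar directions
```

In the actual lab, the vectors come from co-occurrence counts or Word2Vec rather than being hard-coded.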

How to UseΒΆ

  1. Each lab is a Jupyter notebook that combines theory (markdown cells) with fully implemented code cells

  2. Open a lab in Jupyter: `jupyter notebook lab_01_words_tokens.ipynb`

  3. Read the theory cells, study the implementations, and run each cell in order

PrerequisitesΒΆ

  • Python 3.8+

  • NumPy

  • Matplotlib
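A quick sanity check for the prerequisites above (a minimal sketch; it only verifies the interpreter version and that the two packages import):

```python
import sys

# The labs assume Python 3.8 or newer
assert sys.version_info >= (3, 8), "Python 3.8+ required"

import numpy
import matplotlib
print("NumPy", numpy.__version__, "| Matplotlib", matplotlib.__version__)
```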

Suggested OrderΒΆ

Follow the labs in order (1 through 6); each builds on the previous one:

  1. Lab 01 - Text processing fundamentals

  2. Lab 02 - Statistical language models

  3. Lab 03 - Word representations

  4. Lab 04 - Neural network foundations

  5. Lab 05 - Transformer architecture

  6. Lab 06 - Modern LLMs and applications