# Practice Labs: SLP (Jurafsky/Martin)
**Source PDF:** ed3book_jan26.pdf
**Book:** *Speech and Language Processing* (3rd Edition draft) by Daniel Jurafsky & James H. Martin

The book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. The labs follow the book's chapter order.
## Labs

| Lab | Topic | Book Chapter(s) | Key Concepts |
|---|---|---|---|
| 01 | Words, Tokens & Text Processing | Ch 2: Words and Tokens | BPE tokenization, regex, edit distance |
| 02 | N-gram Language Models | Ch 3: N-gram Language Models | N-grams, perplexity, smoothing, text generation |
| 03 | Word Embeddings | Ch 5: Embeddings | Co-occurrence, TF-IDF, Word2Vec, cosine similarity |
| 04 | Neural Networks from Scratch | Ch 6: Neural Networks | XOR, feedforward nets, backprop, optimizers |
| 05 | Transformers & Attention | Ch 8: Transformers | Self-attention, multi-head, positional encoding |
| 06 | Large Language Models | Ch 7, 9, 10: LLMs, MLMs, Post-training | Sampling, prompting, BERT masking, RLHF |
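As a small taste of the material, here is a minimal sketch of one Lab 01 concept, edit distance. This is an illustration only, not the labs' implementation; it uses unit costs for insert, delete, and substitute (the book also discusses a variant with substitution cost 2).

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance with unit operation costs."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + sub,  # substitution (free if chars match)
            )
    return dp[m][n]

print(edit_distance("intention", "execution"))  # → 5 (the book's classic example pair)
```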
## How to Use

- Each lab is a Jupyter notebook with theory (markdown cells) and fully implemented code cells.
- Read the theory cells, study the implementations, and run each cell.
- Open a lab in Jupyter:

```
jupyter notebook lab_01_words_tokens.ipynb
```
## Prerequisites

- Python 3.8+
- NumPy
- Matplotlib
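A quick environment check before starting (a convenience sketch, not part of any lab — it only verifies the interpreter version and reports whether the required packages are importable):

```python
import importlib.util
import sys

# The labs target Python 3.8 or newer.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"

# Report whether each required package is installed, without importing it.
for pkg in ("numpy", "matplotlib"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING -- try: pip install ' + pkg}")
```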
## Suggested Order

Follow the labs in order (1 through 6), as they build upon each other:

1. Lab 01 - Text processing fundamentals
2. Lab 02 - Statistical language models
3. Lab 03 - Word representations
4. Lab 04 - Neural network foundations
5. Lab 05 - Transformer architecture
6. Lab 06 - Modern LLMs and applications