MiniMind Modular Teaching
Understand every LLM design choice through controlled experiments
📚 Module Navigator
🧱 Tier 1: Foundation (core components)
Core question: How do the basic building blocks of a Transformer work?
| Module | Core question | Time | Status |
|---|---|---|---|
| 01-normalization | Why normalization? Pre-LN vs Post-LN? | 1 hour | ✅ |
| 02-position-encoding | Why RoPE? How does extrapolation work? | 1.5 hours | ✅ |
| 03-attention | What is the intuition behind QKV? Why multi-head? | 2 hours | ✅ |
| 04-feedforward | What does FFN store? Why expansion? | 1 hour | ✅ |
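To preview module 01's central contrast, here is a minimal PyTorch sketch of Post-LN vs Pre-LN block ordering (illustrative only, not the repository's implementation):

```python
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Post-LN (original Transformer): normalize after the residual add."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)   # norm sits on the residual path

class PreLNBlock(nn.Module):
    """Pre-LN (GPT-style): normalize before the sublayer."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        return x + out              # identity path stays untouched
```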
Completion criteria:
- ✅ Understand the math behind each component
- ✅ Run controlled experiments and observe what breaks if removed
- ✅ Explain design choices in your own words
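Module 02's key idea also fits in a few lines: rotate query/key feature pairs by position-dependent angles. The sketch below shows one common formulation; the module's own implementation may differ:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to a (seq_len, dim) tensor."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per feature pair, decaying geometrically.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because relative rotation angles depend only on position differences, attention scores between rotated queries and keys encode relative position, which is what the extrapolation experiments probe.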
🏗️ Tier 2: Architecture (assembly)
Core question: How do we assemble components into a full Transformer?
| Module | Core question | Time | Status |
|---|---|---|---|
| 01-residual-connection | Why residuals? How do they stabilize gradients? | 1 hour | 📋 |
| 02-transformer-block | Why this assembly order? | 1.5 hours | 📋 |
Completion criteria:
- ✅ Understand residual connections
- ✅ Understand why Pre-Norm works better
- ✅ Implement a Transformer block from scratch
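The end state of this tier looks roughly like the sketch below: a Pre-Norm block with a residual connection around each sublayer (illustrative sizes, not the repository's code):

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-Norm assembly: residual around attention, then around the FFN."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, ffn_mult: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_mult * d_model),  # expand
            nn.GELU(),
            nn.Linear(ffn_mult * d_model, d_model),  # project back
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]    # residual 1: attention sublayer
        x = x + self.ffn(self.norm2(x))  # residual 2: feed-forward sublayer
        return x
```

The forward pass makes the assembly order explicit: normalize, attend, add; then normalize, expand, add. Why this order works better than the alternatives is exactly what module 02 examines.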
🚀 Tier 3: Training
(Planned)
🎓 Tier 4: Advanced
(Planned)
⚡ Quick Start
Environment setup
# 1. Activate your virtual environment
source venv/bin/activate
# 2. Download experiment data (~60 MB)
cd modules/common
python datasets.py --download-all

30-minute quick tour
Run three key experiments to grasp core design choices:
# Experiment 1: Why normalization? (5 min)
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py
# Experiment 2: Why RoPE? (10 min)
cd ../../02-position-encoding/experiments
python exp2_rope_vs_absolute.py --quick
# Experiment 3: Why residual connections? (5 min)
cd ../../../02-architecture/01-residual-connection/experiments
python exp1_with_vs_without.py --quick

Systematic study path
Recommended order:
Foundation layer (5.5 hours)
- Study 01 → 02 → 03 → 04 in order
- Each module: read teaching.md → run experiments → finish quiz
Architecture layer (2.5 hours)
- Learn how to assemble components
Practice project (optional)
- Train a tiny model from scratch
- Test on a real task
📖 Learning method
The recommended flow for each module
1. Read README.md # Overview (5 min)
↓
2. Read teaching.md # Core concepts (20 min)
↓
3. Run experiments # Validate theory (20 min)
↓
4. Read code_guide.md # Understand implementation (10 min)
↓
5. Finish quiz.md      # Self-check (5 min)

Experiment usage
All experiments support:
# Full run (recommended)
python exp_xxx.py
# Quick mode (concept check, < 2 min)
python exp_xxx.py --quick
# Help
python exp_xxx.py --help

Experiment results are saved under each module’s experiments/results/ directory.
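If you add your own experiment, the flag handling can follow the same pattern. Here is a hypothetical skeleton (the real scripts live in each module's experiments/ directory):

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="Experiment skeleton")
    parser.add_argument("--quick", action="store_true",
                        help="shortened concept-check run (< 2 min)")
    args = parser.parse_args()

    steps = 100 if args.quick else 2000  # hypothetical step budgets
    print(f"running {steps} optimization steps")

if __name__ == "__main__":
    main()
```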
🎯 Design philosophy
1️⃣ Principles first, not command copying
- ❌ “Run this command and you’ll get a model”
- ✅ “Understand why the design works”
2️⃣ Validate with controlled experiments
Each design choice answers:
- What breaks if we remove it?
- Why do other options fail?
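In code, such an ablation often reduces to a boolean switch. A toy illustration of the pattern (not one of the module experiments): build a deep stack with and without LayerNorm and compare the gradient that reaches the input:

```python
import torch
import torch.nn as nn

def input_grad_norm(depth: int, use_norm: bool) -> float:
    """Gradient magnitude at the input of a deep tanh stack."""
    torch.manual_seed(0)  # fixed seed for reproducibility
    layers = []
    for _ in range(depth):
        layers.append(nn.Linear(64, 64))
        if use_norm:
            layers.append(nn.LayerNorm(64))
        layers.append(nn.Tanh())
    model = nn.Sequential(*layers)

    x = torch.randn(32, 64, requires_grad=True)
    model(x).sum().backward()
    return x.grad.norm().item()

for use_norm in (False, True):
    print(f"LayerNorm={use_norm}: input grad norm = "
          f"{input_grad_norm(20, use_norm):.2e}")
```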
3️⃣ Progressive learning
- Single components → assembled architecture → full training
- Clear goals and validation at every step
4️⃣ Runs on a normal laptop
- Experiments use TinyShakespeare (1 MB) or TinyStories (10–50 MB)
- No GPU required (CPU/MPS works)
- Each experiment < 10 minutes
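If you want to pin the device explicitly in your own runs, the standard PyTorch pattern works here:

```python
import torch

# Prefer Apple-silicon MPS when available; otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(8, 16, device=device)
print(x.device)
```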
🛠️ Common tools
Shared tools live in modules/common/:
datasets.py - Dataset manager
from modules.common.datasets import get_experiment_data
# TinyShakespeare
text = get_experiment_data('shakespeare')
# TinyStories subset
texts = get_experiment_data('tinystories', size_mb=10)

experiment_base.py - Experiment base class
from modules.common.experiment_base import Experiment
class MyExperiment(Experiment):
    def run(self):
        # experiment code
        pass

visualization.py - Visualization helpers
from modules.common.visualization import (
    plot_attention_heatmap,
    plot_activation_distribution,
    plot_gradient_flow,
    plot_loss_curves,
)

See docstrings in each file for details.
🤝 Contribution guide
Contributions welcome:
- New controlled experiments
- Better intuitive analogies
- Visualizations
- Bug fixes
Before submitting, please ensure:
- [ ] Experiments run independently
- [ ] Code has sufficient Chinese comments
- [ ] Results are reproducible (fixed random seeds)
- [ ] Follows the existing file structure
📜 Acknowledgements
This teaching module is based on jingyaogong/minimind.
All experiments link to real implementations in the upstream repository to help learners understand production-grade code.