MiniMind Modular Teaching

Understand every LLM design choice through controlled experiments


📚 Module Navigator

🧱 Tier 1: Foundation (core components)

Core question: How do the basic building blocks of a Transformer work?

Module               | Core question                                      | Time      | Status
01-normalization     | Why normalization? Pre-LN vs Post-LN?              | 1 hour    |
02-position-encoding | Why RoPE? How does extrapolation work?             | 1.5 hours |
03-attention         | What is the intuition behind QKV? Why multi-head?  | 2 hours   |
04-feedforward       | What does FFN store? Why expansion?                | 1 hour    |

Completion criteria:

  • ✅ Understand the math behind each component
  • ✅ Run controlled experiments and observe what breaks if a component is removed (see the sketch below)
  • ✅ Explain design choices in your own words
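
To make that concrete, here is a minimal, self-contained sketch of the kind of controlled experiment these modules run. It is plain PyTorch, not the repo's actual experiment code: the same deep stack is measured with and without LayerNorm, and the only question is how much gradient survives back to the input.

python
import torch
import torch.nn as nn

def gradient_norm_at_input(depth=16, use_norm=True, dim=64, seed=0):
    """Push a batch through `depth` Linear+ReLU layers and measure
    how much gradient flows back to the input."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        if use_norm:
            layers.append(nn.LayerNorm(dim))   # the component under test
        layers.append(nn.Linear(dim, dim))
        layers.append(nn.ReLU())
    model = nn.Sequential(*layers)
    x = torch.randn(8, dim, requires_grad=True)
    model(x).sum().backward()
    return x.grad.norm().item()

print("with LayerNorm:   ", gradient_norm_at_input(use_norm=True))
print("without LayerNorm:", gradient_norm_at_input(use_norm=False))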

🏗️ Tier 2: Architecture (assembly)

Core question: How do we assemble components into a full Transformer?

Module                 | Core question                                    | Time      | Status
01-residual-connection | Why residuals? How do they stabilize gradients?  | 1 hour    | 📋
02-transformer-block   | Why this assembly order?                         | 1.5 hours | 📋

Completion criteria:

  • ✅ Understand residual connections
  • ✅ Understand why Pre-Norm works better
  • ✅ Implement a Transformer block from scratch (minimal reference sketch below)
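
For reference, a minimal Pre-Norm block in plain PyTorch. This is a sketch that leans on torch's built-in nn.MultiheadAttention, not the module's own implementation:

python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-Norm Transformer block: x + Attn(LN(x)), then x + FFN(LN(x))."""
    def __init__(self, dim=64, n_heads=4, ffn_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_mult * dim),
            nn.GELU(),
            nn.Linear(ffn_mult * dim, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.ffn(self.norm2(x))                    # residual around FFN
        return x

block = PreNormBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])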

🚀 Tier 3: Training

(Planned)


🎓 Tier 4: Advanced

(Planned)


⚡ Quick Start

Environment setup

bash
# 1. Activate your virtual environment
source venv/bin/activate

# 2. Download experiment data (~60 MB)
cd modules/common
python datasets.py --download-all

30-minute quick tour

Run three key experiments to grasp core design choices:

bash
# Experiment 1: Why normalization? (5 min)
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py

# Experiment 2: Why RoPE? (10 min)
cd ../../02-position-encoding/experiments
python exp2_rope_vs_absolute.py --quick

# Experiment 3: Why residual connections? (5 min)
cd ../../../02-architecture/01-residual-connection/experiments
python exp1_with_vs_without.py --quick

Systematic study path

Recommended order:

  1. Foundation tier (5.5 hours)
    • Study modules 01 → 02 → 03 → 04 in order
    • Each module: read teaching.md → run experiments → finish quiz
  2. Architecture tier (2.5 hours)
    • Learn how to assemble the components
  3. Practice project (optional)
    • Train a tiny model from scratch
    • Test it on a real task

📖 Learning method

Each module follows the same loop (about 60 minutes total):

  1. Read README.md        # Overview (5 min)
  2. Read teaching.md      # Core concepts (20 min)
  3. Run experiments       # Validate theory (20 min)
  4. Read code_guide.md    # Understand implementation (10 min)
  5. Finish quiz.md        # Self-check (5 min)

Experiment usage

All experiments support:

bash
# Full run (recommended)
python exp_xxx.py

# Quick mode (concept check, < 2 min)
python exp_xxx.py --quick

# Help
python exp_xxx.py --help

Experiment results are saved under each module’s experiments/results/ directory.
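
Internally these flags follow the standard argparse pattern. A hypothetical sketch of the wiring (the actual scripts may differ in their details), with --help generated by argparse for free:

python
import argparse

# Hypothetical flag wiring; the repo's actual scripts may differ.
parser = argparse.ArgumentParser(description="Controlled experiment")
parser.add_argument("--quick", action="store_true",
                    help="concept check with reduced steps (< 2 min)")
args = parser.parse_args()

steps = 100 if args.quick else 2000  # shrink the run in quick mode
print(f"running {steps} steps (quick={args.quick})")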


🎯 Design philosophy

1️⃣ Principles first, not command copying

  • ❌ “Run this command and you’ll get a model”
  • ✅ “Understand why the design works”

2️⃣ Validate with controlled experiments

Each design choice answers:

  • What breaks if we remove it?
  • Why do other options fail?

3️⃣ Progressive learning

  • Single components → assembled architecture → full training
  • Clear goals and validation at every step

4️⃣ Runs on a normal laptop

  • Experiments use TinyShakespeare (1MB) or TinyStories (10–50MB)
  • No GPU required (CPU/MPS works; device-selection sketch below)
  • Each experiment < 10 minutes
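
If you want to opt into acceleration in your own code, the standard PyTorch device pick looks like this (a generic pattern, not something the experiments require):

python
import torch

# Prefer CUDA, then Apple-silicon MPS, then plain CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(torch.randn(4, 4, device=device).device)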

🛠️ Common tools

Shared tools live in modules/common/:

datasets.py - Dataset manager

python
from modules.common.datasets import get_experiment_data

# TinyShakespeare
text = get_experiment_data('shakespeare')

# TinyStories subset
texts = get_experiment_data('tinystories', size_mb=10)

experiment_base.py - Experiment base class

python
from modules.common.experiment_base import Experiment

class MyExperiment(Experiment):
    def run(self):
        # experiment code
        pass

visualization.py - Visualization helpers

python
from modules.common.visualization import (
    plot_attention_heatmap,
    plot_activation_distribution,
    plot_gradient_flow,
    plot_loss_curves
)

See docstrings in each file for details.
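
As a flavor of the intended usage, here is a hypothetical call; the exact signatures live in the docstrings, so treat the arguments as assumptions:

python
from modules.common.visualization import plot_loss_curves

# Hypothetical arguments; check plot_loss_curves' docstring for the real signature.
plot_loss_curves(
    {"with residual": [2.8, 1.9, 1.4], "without residual": [2.8, 2.6, 2.5]},
    title="Effect of residual connections",
)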


🤝 Contribution guide

Contributions welcome:

  • New controlled experiments
  • Better intuitive analogies
  • Visualizations
  • Bug fixes

Before submitting, please ensure:

  • [ ] Experiments run independently
  • [ ] Code has sufficient Chinese comments
  • [ ] Results are reproducible (fixed random seeds; see the seeding sketch below)
  • [ ] Follows the existing file structure
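
For the reproducibility item, a common way to pin every RNG a PyTorch experiment touches (a generic pattern, not a repo-mandated helper):

python
import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    """Fix all random sources so reruns produce identical results."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(42)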

📜 Acknowledgements

This teaching module is based on jingyaogong/minimind.

All experiments link to real implementations in the upstream repository to help learners understand production-grade code.


Built on MiniMind for learning and experiments