🎓 Deep Mastery (30+ hours)

Train a complete LLM from scratch

🎯 Learning goals

After 30 hours you will be able to:

  • ✅ Train a usable LLM from scratch
  • ✅ Understand the full training pipeline
  • ✅ Debug and optimize training runs
  • ✅ Fine-tune models on your own data

📋 Learning path

Week 1: Fundamentals (6 hours)

✅ Complete 📚 Systematic Study


Week 2: Data preparation (8 hours)

1. Tokenizer training (2 hours)

```bash
python scripts/train_tokenizer.py
```

What to learn:

  • Understand the BPE algorithm (see the sketch after this list)
  • Train a custom tokenizer
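
To see what the script is doing conceptually, here is a minimal, self-contained sketch of the core BPE training loop: count adjacent-symbol pairs, merge the most frequent pair, repeat. This is a toy illustration of the algorithm, not MiniMind's actual tokenizer code.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: words is a list of symbol tuples, e.g. [('l','o','w'), ...]."""
    corpus = Counter(words)
    merges = []
    for _ in range(num_merges):
        # Count frequencies of all adjacent symbol pairs in the corpus.
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new token
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with the merged symbol.
        new_corpus = Counter()
        for word, freq in corpus.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_merges([("l","o","w"), ("l","o","w","e","r"), ("l","o","w","e","s","t")], 3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Real tokenizers add byte-level fallback, special tokens, and far faster pair counting, but the merge loop above is the heart of BPE.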

2. Data cleaning and preprocessing (4 hours)

  • Read dataset/lm_dataset.py
  • Understand packing strategies (see the sketch after this list)
  • Create your own dataset
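
Packing concatenates many short documents (separated by an EOS token) into fixed-length training rows so no compute is wasted on padding. A minimal sketch of the idea, not the exact logic in dataset/lm_dataset.py:

```python
def pack_sequences(token_lists, seq_len, eos_id):
    """Concatenate tokenized documents, separated by EOS, then cut into fixed-length chunks."""
    stream = []
    for toks in token_lists:
        stream.extend(toks)
        stream.append(eos_id)  # document boundary marker
    # Drop the trailing remainder that doesn't fill a full chunk.
    return [stream[i:i + seq_len] for i in range(0, len(stream) - seq_len + 1, seq_len)]

chunks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)
print(chunks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```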

3. Data format conversion (2 hours)

  • Pretrain format
  • SFT format
  • DPO format
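
Each stage consumes a different record shape. The field names below are hypothetical, for illustration only; check the JSONL files under dataset/ for the exact schema MiniMind uses.

```python
import json

# Hypothetical examples of one record per training stage.
pretrain_row = {"text": "Raw document text, used for next-token prediction."}
sft_row = {"conversations": [
    {"role": "user", "content": "What is BPE?"},
    {"role": "assistant", "content": "A subword tokenization algorithm..."},
]}
dpo_row = {"prompt": "Explain LoRA briefly.",
           "chosen": "LoRA adds low-rank adapter matrices...",
           "rejected": "LoRA is a kind of sandwich."}

for row in (pretrain_row, sft_row, dpo_row):
    print(json.dumps(row, ensure_ascii=False))  # one JSON object per line = JSONL
```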

Completion criteria:

  • [ ] Train a custom tokenizer
  • [ ] Understand data formats for each training stage
  • [ ] Prepare training data

Week 3: Model training (10 hours)

1. Pretraining (4 hours)

```bash
cd trainer
python train_pretrain.py \
    --data_path ../dataset/pretrain_hq.jsonl \
    --epochs 1 \
    --batch_size 32 \
    --hidden_size 512 \
    --num_hidden_layers 8
```

Key points:

  • Understand the causal language modeling objective (sketch below)
  • Monitor training curves (loss, learning rate)
  • Debug common issues (NaN, OOM)
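
The causal LM objective is plain next-token prediction: shift the targets one position relative to the logits and apply cross-entropy. A minimal PyTorch sketch, assuming logits of shape [batch, seq, vocab]:

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids, ignore_index=-100):
    """Next-token prediction: position t predicts token t+1."""
    shift_logits = logits[:, :-1, :].contiguous()  # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:].contiguous()   # targets are the next tokens 1..T-1
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )

logits = torch.randn(2, 8, 100)          # [batch, seq, vocab]
ids = torch.randint(0, 100, (2, 8))
print(causal_lm_loss(logits, ids).item())
```

Perplexity, used in the Week 3 completion criterion, is just the exponential of this loss measured on held-out data.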

2. Supervised Fine-Tuning (3 hours)

```bash
python train_full_sft.py \
    --data_path ../dataset/sft_mini_512.jsonl \
    --from_weight pretrain
```

Key points:

  • Understand the role of instruction tuning (loss-masking sketch below)
  • Compare pretrain vs SFT behavior
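
Mechanically, SFT often differs from pretraining mainly in loss masking: prompt tokens are excluded from the loss so gradients only come from the assistant's reply. A sketch of the idea (whether and how MiniMind masks prompts is something to verify in train_full_sft.py):

```python
import torch

def mask_prompt_labels(input_ids, prompt_len, ignore_index=-100):
    """Copy input_ids into labels, but ignore the prompt portion in the loss."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = ignore_index  # cross_entropy skips ignore_index positions
    return labels

ids = torch.randint(0, 100, (1, 10))
print(mask_prompt_labels(ids, prompt_len=4))  # first 4 positions become -100
```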

3. LoRA Fine-Tuning (3 hours)

```bash
python train_lora.py \
    --data_path ../dataset/lora_identity.jsonl \
    --from_weight full_sft
```

Key points:

  • Understand parameter-efficient fine-tuning (PEFT)
  • Learn the math behind LoRA (sketch below)
  • Domain adaptation strategy
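
The core of LoRA: freeze the pretrained weight W and learn a low-rank update, so the effective weight becomes W + (alpha/r)·B·A with rank r much smaller than the layer width. A minimal sketch of a LoRA-wrapped linear layer (illustrative, not MiniMind's implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # y = Wx + b + scale * B(Ax); only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```

Because only A and B are trained, the number of trainable parameters drops from in·out to r·(in + out), which is why LoRA fits on small GPUs.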

Completion criteria:

  • [ ] Train a small model successfully (perplexity < 3.0)
  • [ ] Understand the full path: pretrain → SFT → LoRA
  • [ ] Debug training issues

Week 4: Advanced topics (6+ hours)

Optional track 1: RLHF / RLAIF (4 hours)

  • DPO (Direct Preference Optimization; loss sketch below)
  • PPO/GRPO (Reinforcement Learning)
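
DPO removes the explicit reward model: it directly pushes up the policy's log-probability margin for the chosen response over the rejected one, measured relative to a frozen reference model. A sketch of the loss, where each argument is the summed log-probability of a response and beta is the usual temperature hyperparameter:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Inputs are summed log-probs of each response under the policy / reference models."""
    # Margin of how much more the policy prefers 'chosen' than the reference does.
    logits = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

print(dpo_loss(torch.tensor([-5.0]), torch.tensor([-9.0]),
               torch.tensor([-6.0]), torch.tensor([-8.0])).item())
```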

Optional track 2: Inference optimization (2 hours)

  • KV cache (sketch below)
  • Flash Attention
  • Quantization (INT8/INT4)
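
The KV cache is why autoregressive decoding is not quadratic per step: keys and values for past tokens are stored once, and each new token only computes attention of its single query against the cached K/V. A single-head sketch:

```python
import torch
import torch.nn.functional as F

def decode_step(q_new, k_new, v_new, cache):
    """q_new/k_new/v_new: [1, d] for the newest token; cache holds all past K and V."""
    cache["k"] = torch.cat([cache["k"], k_new])  # append instead of recomputing
    cache["v"] = torch.cat([cache["v"], v_new])
    scores = q_new @ cache["k"].T / cache["k"].shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ cache["v"]  # attention output for the new token

d = 16
cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
for _ in range(5):  # each decode step is O(current length), not O(length^2)
    out = decode_step(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d), cache)
print(cache["k"].shape, out.shape)  # torch.Size([5, 16]) torch.Size([1, 16])
```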

Optional track 3: Evaluation and analysis (2 hours)

  • C-Eval / MMLU benchmarks (scoring sketch below)
  • Error analysis
  • Ablation studies
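
C-Eval and MMLU are multiple-choice; a common scoring recipe is to take the model's log-probability for each answer letter, pick the argmax, and report accuracy. A tiny sketch of that tally (the log-probabilities here are made up):

```python
def mc_accuracy(items):
    """items: list of (option_logprobs, gold_letter); pick the argmax letter per question."""
    correct = sum(max(lp, key=lp.get) == gold for lp, gold in items)
    return correct / len(items)

print(mc_accuracy([({"A": -2.1, "B": -0.4}, "B"),
                   ({"A": -0.2, "B": -1.1}, "B")]))  # 0.5
```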

📝 Learning tips

1. Experiment first, then theory

❌ Do not read all the docs before trying things.
✅ Run experiments first to build intuition, then read the theory.

2. Learn by comparison

Each module uses experiments to answer questions like:

  • What breaks if we don’t do this?
  • Why do other options fail?

3. Iterate in passes

  • First pass: skim to grasp the big picture
  • Second pass: go deep on details and math
  • Third pass: implement yourself to solidify understanding

4. Keep notes

Record your progress in the Learning Log.


🎯 Checklist

Fundamentals

  • [ ] Complete Systematic Study (6 hours)

Data preparation

  • [ ] Train a custom tokenizer
  • [ ] Prepare training data
  • [ ] Understand data formats

Model training

  • [ ] Finish Pretrain
  • [ ] Finish SFT
  • [ ] Finish LoRA

Advanced topics

  • [ ] Try RLHF/RLAIF
  • [ ] Optimize inference performance
  • [ ] Evaluate model quality

Ready to begin the deep mastery journey? 🚀

Built on MiniMind for learning and experiments