# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Structure

This repository now has a dual-track system:

- Public Teaching Track (`modules/`) - Modular teaching architecture
- Personal Learning Track (`docs/`) - Personal learning records
### modules/ - Public Teaching Modules

```
modules/
├── common/          # Shared utilities
├── 01-foundation/   # Basic components (4 modules)
└── 02-architecture/ # Architecture assembly (2 modules)
```

Each module contains:

- `README.md` - Module navigation
- `teaching.md` - Teaching document (Why/What/How structure)
- `code_guide.md` - Source code walkthrough
- `quiz.md` - Self-assessment questions
- `experiments/` - Hands-on experiments
When the user learns new content:

- Update the personal learning track (`docs/`)
- If completing a module concept, optionally update/create the module in `modules/`
### docs/ - Personal Learning Track

```
docs/
├── learning_log.md   # Chronological learning record
├── knowledge_base.md # Topic-organized knowledge
└── notes.md          # Index and navigation
```

## Learning Notes Maintenance
IMPORTANT: This repository uses a three-tier note system for organized learning:

### Note System Structure

```
minimind/
├── notes.md            ← Master index (entry point)
├── learning_log.md     ← Learning log (chronological)
├── knowledge_base.md   ← Knowledge base (topical)
└── learning_materials/ ← Executable example code
```

### 1. notes.md - Master Index

- Purpose: Central index and navigation hub
- Content:
  - Links to other documents
  - Current progress overview
  - Quick reference table
  - File structure diagram
- Update: When structure changes or new sections are added
### 2. learning_log.md - Learning Log

- Purpose: Chronological record of the learning journey
- Content:
  - Date-stamped entries (format: `### 2025-MM-DD: Topic`)
  - Daily completed tasks (✅ checkbox list)
  - Problems encountered and solutions (🐛 section)
  - Personal thoughts and reflections (💭 section)
  - Learning plans for the next session
- Update: At the end of each learning session
- Format Example:

```markdown
### 2025-11-07: 深度理解 Transformer 核心组件

#### ✅ 完成事项
- [x] 理解 RMSNorm 原理
- [x] 理解 RoPE 位置编码

#### 💭 个人思考
- **收获**: ...
- **疑问解答**: ...
```
### 3. knowledge_base.md - Knowledge Base

- Purpose: Systematic knowledge organization by topic
- Content:
  - Technical concepts and principles (numbered sections)
  - Comparison tables (e.g., RMSNorm vs LayerNorm)
  - Mathematical formulas
  - Code snippets with explanations
  - Q&A records (separate section at the bottom)
- Update: When new concepts are learned or questions are answered
- Structure:

```markdown
## 1. Topic Name

### 1.1 Subtopic
[Detailed explanation]

## 问答记录

### Q: Question?
**A**: Answer
```
### 4. learning_materials/ - Supporting Learning Materials

- Purpose: Executable code examples for hands-on learning
- Content:
  - Python files demonstrating concepts
  - README.md with usage instructions
  - Organized by topic (normalization, position encoding, attention)
- Update: When creating new learning examples
- Naming: Descriptive names like `rope_basics.py`, `why_normalization.py`
### Update Workflow
IMPORTANT: Every conversation round with new content MUST update the notes system.
After each learning session OR after answering user questions:
Update learning_log.md:
- Add new date section (if new day)
- Add new subsection for additional learning within the same day
- List completed tasks
- Record problems and solutions
- Write personal reflections
- Record user questions and answers
Update knowledge_base.md:
- Add new knowledge sections
- Add Q&A records for ALL user questions (even follow-up questions)
- Add comparison tables if needed
- Number questions sequentially (Q1, Q2, Q3...)
- Mark particularly important questions with ⭐️
Update notes.md:
- Update progress indicator
- Add new date to the "按日期查找" (find by date) section
- Update file structure if needed
Create learning materials (if applicable):
- Write executable examples
- Update learning_materials/README.md
- Add references in learning_log.md
Update learning_materials/README.md (if new files created):
- Add new file descriptions
- Update recommended learning order
- Mark important files with ⭐️
### When to Update Notes
Update notes in these scenarios:
✅ After teaching a new concept
- Add to knowledge_base.md
- Add to learning_log.md
✅ After answering user questions
- Add Q&A to knowledge_base.md
- Add reflection to learning_log.md
- Even if it's a follow-up question in the same conversation
✅ After solving a problem
- Add to learning_log.md (problems section)
✅ After creating learning materials
- Update all three files
- Add file references
✅ User explicitly requests note updates
- Follow user's guidance on what to record
### Notes Update Checklist
Before ending a conversation, ensure:
- [ ] All user questions have Q&A entries in knowledge_base.md
- [ ] New learning has date-stamped entry in learning_log.md
- [ ] New files are listed in learning_materials/README.md
- [ ] Question numbers are updated sequentially
- [ ] Important discoveries are marked with ⭐️
Quick Reference: See NOTE_UPDATE_GUIDE.md for detailed templates and examples.
## Interactive Learning Approach
- The user prefers to learn at a slower pace with deep understanding
- Use dialogue and questions to help clarify concepts before moving forward
- For each knowledge point, organize the Q&A discussion into knowledge_base.md
- Don't rush through multiple concepts - focus on one at a time until the user fully understands
- Create executable examples in learning_materials/ to demonstrate concepts
- Always ask if the user is ready to continue before moving to the next topic
## Key Principles
Separation of Concerns:
- Chronological (learning_log.md) vs Topical (knowledge_base.md)
- Theory (knowledge_base.md) vs Practice (learning_materials/)
Easy Navigation:
- notes.md provides quick links to all sections
- Clear table of contents in each document
No Information Loss:
- When reorganizing, preserve all content
- Move, don't delete
User-Friendly:
- Clear headings and formatting
- Emoji markers for quick scanning (✅ ❌ 💡 🐛 💭 etc.)
- Code examples with explanations
## Git Workflow for Learning Notes
IMPORTANT: After completing each learning session, commit and push notes to the remote repository.
### Remote Repository Setup
This repository has two remotes:
- `origin`: Main MiniMind project (https://github.com/jingyaogong/minimind.git)
- `notes`: Personal learning notes backup (https://github.com/joyehuang/minimind-notes.git)
### Commit Workflow

After each learning session:

Stage note files:

```bash
git add notes.md learning_log.md knowledge_base.md learning_materials/
```

Commit with a concise message:

```bash
# Use simple, descriptive commit messages
# Examples:
git commit -m "学习 RMSNorm 归一化原理"
git commit -m "理解 RoPE 多频率机制"
git commit -m "添加 Attention 学习材料"
git commit -m "完成环境搭建和首次运行"
```

Push to the remote:

```bash
git push origin master
```
### Commit Message Guidelines
- DO: Use concise, descriptive Chinese messages (one sentence)
- DO: Focus on what was learned (e.g., "学习 Attention 机制原理")
- DON'T: Include generic phrases like "Generated with Claude Code"
- DON'T: Include emojis or formatting in commit messages
- DON'T: Make multi-paragraph commit messages
### When to Commit
Commit after:
- Completing a major concept (e.g., after learning RMSNorm)
- Adding new learning materials (e.g., new .py examples)
- Solving a significant problem (documented in learning_log.md)
- End of each learning session (even if work is in progress)
### Example Workflow

```bash
# After learning session on Attention
git add notes.md learning_log.md knowledge_base.md learning_materials/
git commit -m "学习 Attention 注意力机制基础"
git push origin master
```

### Important Notes
- DO NOT commit generated model weights, datasets, or cache files
- All commits should only include learning note files:
  - `notes.md`
  - `learning_log.md`
  - `knowledge_base.md`
  - `learning_materials/*.py`
  - `learning_materials/README.md`
  - `CLAUDE.md` (when updating guidelines)
  - `NOTE_UPDATE_GUIDE.md` (when updating templates)
## Project Overview
MiniMind is an educational implementation of a complete large language model (LLM) training pipeline from scratch. The project aims to train ultra-small language models (starting at just 25.8M parameters) using minimal resources (3 RMB + 2 hours on a single NVIDIA 3090 GPU).
Key differentiators:
- All core algorithms implemented from scratch using PyTorch (not abstracted behind third-party libraries)
- Complete training pipeline: tokenizer training, pretraining, supervised fine-tuning (SFT), LoRA, RLHF (DPO), RLAIF (PPO/GRPO/SPO), and model distillation
- Compatible with transformers, trl, peft, and third-party inference engines (llama.cpp, vllm, ollama)
- Supports both Dense and MoE (Mixture of Experts) architectures
- Includes distilled reasoning model capabilities (MiniMind-Reason, inspired by DeepSeek-R1)
## Core Architecture

### Model Structure
The codebase implements two main architectures:
MiniMind-Dense: Transformer Decoder-Only architecture similar to Llama3.1
- Pre-normalization with RMSNorm on inputs (not outputs); a minimal sketch follows this list
- SwiGLU activation function (instead of ReLU)
- Rotary Position Embeddings (RoPE) instead of absolute position embeddings
- Supports YaRN algorithm for long-context extrapolation
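For reference, a minimal RMSNorm sketch in the standard formulation (the project's own implementation lives in `model/model_minimind.py` and may differ in details such as dtype handling):

```python
# Minimal RMSNorm sketch (standard formulation; illustrative only).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root mean square: no mean subtraction, no bias.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

print(RMSNorm(512)(torch.randn(2, 8, 512)).shape)  # torch.Size([2, 8, 512])
```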
MiniMind-MoE: Mixture of Experts based on DeepSeek-V2/V3
- Shared + routed expert architecture (a toy routing sketch follows this list)
- Fine-grained expert splitting
- Load balancing loss for expert utilization
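A toy sketch of shared + top-k routed experts, just to illustrate the data flow (the real gating, fine-grained expert splitting, and load-balancing loss are in `model/model_minimind.py`; the layer sizes below are arbitrary):

```python
# Toy MoE routing sketch: every token goes through the shared expert,
# plus its top-k routed experts weighted by the gate's softmax scores.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, n_routed, top_k = 512, 4, 2
experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_routed))
shared_expert = nn.Linear(hidden, hidden)       # always applied
gate = nn.Linear(hidden, n_routed, bias=False)  # router

x = torch.randn(3, hidden)                      # 3 tokens
with torch.no_grad():
    scores = F.softmax(gate(x), dim=-1)
    topk_scores, topk_idx = scores.topk(top_k, dim=-1)
    out = shared_expert(x)
    for t in range(x.size(0)):                  # route each token separately
        for w, e in zip(topk_scores[t], topk_idx[t]):
            out[t] += w * experts[int(e)](x[t])
print(out.shape)  # torch.Size([3, 512])
```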
### Directory Structure

```
minimind/
├── model/ # Model implementations
│ ├── model_minimind.py # Main MiniMindConfig and MiniMindForCausalLM
│ └── model_lora.py # LoRA implementation from scratch
├── dataset/ # Dataset handling
│ └── lm_dataset.py # Dataset classes for all training stages
├── trainer/ # Training scripts (all stages)
│ ├── train_pretrain.py # Pretraining
│ ├── train_full_sft.py # Supervised fine-tuning
│ ├── train_lora.py # LoRA fine-tuning
│ ├── train_dpo.py # Direct Preference Optimization (RLHF)
│ ├── train_ppo.py # Proximal Policy Optimization (RLAIF)
│ ├── train_grpo.py # Group Relative Policy Optimization (RLAIF)
│ ├── train_spo.py # Simple Policy Optimization (RLAIF)
│ ├── train_distillation.py # White-box distillation
│ ├── train_distill_reason.py # Reasoning model distillation (R1-style)
│ └── trainer_utils.py # Shared training utilities
├── scripts/ # Inference and utilities
│ ├── train_tokenizer.py # Custom tokenizer training
│ ├── serve_openai_api.py # OpenAI-compatible API server
│ ├── web_demo.py # Streamlit web UI
│ └── convert_model.py # Model format conversion
└── eval_llm.py # Model evaluation and chat interface
```

## Commands Reference
### Environment Setup

```bash
# Install dependencies
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple
```

### Testing Existing Models

```bash
# Download model (choose one)
git clone https://huggingface.co/jingyaogong/MiniMind2
# or
git clone https://www.modelscope.cn/models/gongjy/MiniMind2
# Command-line chat interface
python eval_llm.py --load_from ./MiniMind2
# Start web UI (requires streamlit)
streamlit run scripts/web_demo.py
# Third-party inference
ollama run jingyaogong/minimind2
vllm serve ./MiniMind2/ --served-model-name "minimind"
```

### Training Pipeline (all commands run from the ./trainer directory)
Important: All training scripts should be executed from the ./trainer directory:
```bash
cd trainer
```

#### 1. Pretraining

```bash
# Single GPU
python train_pretrain.py
# Multi-GPU (DDP)
torchrun --nproc_per_node N train_pretrain.py
# Common arguments:
# --data_path ../dataset/pretrain_hq.jsonl
# --epochs 1
# --batch_size 32
# --learning_rate 5e-4
# --max_seq_len 512
# --hidden_size 512 # or 768 for larger model
# --num_hidden_layers 8 # or 16 for larger model
# --use_moe 0 # 1 to enable MoE
# --use_wandb               # Enable wandb/swanlab logging
```

Output: `../out/pretrain_*.pth`
#### 2. Supervised Fine-Tuning (SFT)

```bash
# Single GPU
python train_full_sft.py
# Multi-GPU
torchrun --nproc_per_node N train_full_sft.py
# Common arguments (similar to pretrain):
# --data_path ../dataset/sft_mini_512.jsonl
# --from_weight pretrain    # Load pretrained weights
```

Output: `../out/full_sft_*.pth`
#### 3. LoRA Fine-Tuning

```bash
python train_lora.py
# Common arguments:
# --data_path ../dataset/lora_identity.jsonl # or lora_medical.jsonl
# --from_weight full_sft # Base model to add LoRA to
# --lora_r 8 # LoRA rank
# --lora_alpha 16
```

Output: `../out/lora/lora_*_*.pth`
Test LoRA:

```bash
cd ..
python eval_llm.py --weight full_sft --lora_weight lora_medical
```

#### 4. RLHF: Direct Preference Optimization (DPO)

```bash
python train_dpo.py
# Common arguments:
# --data_path ../dataset/dpo.jsonl
# --from_weight full_sft
# --beta 0.1                # KL penalty coefficient
```

Output: `../out/dpo_*.pth`
#### 5. RLAIF: PPO/GRPO/SPO
Prerequisites:
- Download the reward model to a sibling directory:

```bash
cd ../.. # Go to parent of minimind
git clone https://modelscope.cn/Shanghai_AI_Laboratory/internlm2-1_8b-reward.git
# or
git clone https://huggingface.co/internlm/internlm2-1_8b-reward
cd minimind/trainer

# PPO (Proximal Policy Optimization)
python train_ppo.py
# GRPO (Group Relative Policy Optimization)
python train_grpo.py
# SPO (Simple Policy Optimization)
python train_spo.py
# Common arguments:
# --data_path ../dataset/rlaif-mini.jsonl
# --from_weight dpo
# --reward_model_path ../../internlm2-1_8b-reward
```

Output: `../out/ppo_*.pth`, `../out/grpo_*.pth`, `../out/spo_*.pth`
#### 6. Reasoning Model Training (R1-style)

```bash
python train_distill_reason.py
# Common arguments:
# --data_path ../dataset/r1_mix_1024.jsonl
# --from_weight dpo # Usually based on RLHF model
# --max_seq_len 1024        # Match the data max length
```

Output: `../out/reason_*.pth`
The reasoning model uses special tags:
- `<think>思考过程</think>` for chain-of-thought reasoning
- `<answer>最终回答</answer>` for the final response
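A sketch of what one reasoning training record might look like (assumption: `r1_mix_1024.jsonl` reuses the SFT conversations schema shown under Dataset Formats below, with the assistant turn wrapped in these tags):

```python
# Compose a single R1-style JSONL record (illustrative content).
import json

record = {
    "conversations": [
        {"role": "user", "content": "1+1 等于几?"},
        {
            "role": "assistant",
            "content": "<think>用户问的是简单算术,1 加 1 等于 2。</think>"
                       "<answer>1+1 等于 2。</answer>",
        },
    ]
}
print(json.dumps(record, ensure_ascii=False))
```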
#### 7. White-box Distillation

```bash
python train_distillation.py
# This is primarily for educational reference
# Requires a teacher model of the same architecture
```

### Training Resumption
All training scripts support checkpoint resumption:
- Checkpoints are saved in the `../checkpoints/` directory
- Use the `--resume` flag to continue from the last checkpoint
- Supports cross-GPU resumption (the number of GPUs can change)
- Wandb/SwanLab logging continuity maintained
### Monitoring Training

```bash
# Enable wandb (requires VPN outside China)
# Login first: wandb login
python train_*.py --use_wandb
# Enable SwanLab (China-friendly, API compatible with wandb)
# Modify import in training script: import swanlab as wandb
python train_*.py --use_wandb
```

### Running Tests
The project doesn't have traditional unit tests, but you can evaluate models on benchmarks:
- C-Eval
- C-MMLU
- OpenBookQA
Refer to third-party evaluation frameworks or the README for detailed benchmark instructions.
## Key Implementation Details

### Model Configuration

Key parameters in `model/model_minimind.py`:

```python
MiniMindConfig(
hidden_size=512, # 512 for small, 768 for base
num_hidden_layers=8, # 8 for small, 16 for base
num_attention_heads=8,
num_key_value_heads=2, # GQA (Grouped Query Attention)
vocab_size=6400, # Custom minimind tokenizer
rope_theta=1000000.0, # RoPE base frequency
max_position_embeddings=32768,
use_moe=False, # Enable for MoE variant
n_routed_experts=4, # MoE: number of experts
n_shared_experts=1, # MoE: shared experts
num_experts_per_tok=2, # MoE: top-k routing
flash_attn=True, # Use Flash Attention
inference_rope_scaling=False, # YaRN long-context extrapolation
)
```
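A hedged sketch of instantiating the model from this config (assumption: `MiniMindForCausalLM` follows the transformers convention of taking its config in the constructor; check `model/model_minimind.py` for the actual signature):

```python
# Build a small Dense model from a config and count its parameters.
from model.model_minimind import MiniMindConfig, MiniMindForCausalLM

config = MiniMindConfig(hidden_size=512, num_hidden_layers=8, use_moe=False)
model = MiniMindForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```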
### Dataset Formats

Pretrain (`pretrain_hq.jsonl`):

```json
{"text": "如何才能摆脱拖延症? 治愈拖延症并不容易,但以下建议可能有所帮助..."}
```

SFT (`sft_*.jsonl`):

```json
{
"conversations": [
{"role": "user", "content": "你好"},
{"role": "assistant", "content": "你好!"},
{"role": "user", "content": "再见"},
{"role": "assistant", "content": "再见!"}
]
}
```

DPO (`dpo.jsonl`):

```json
{
"chosen": [
{"role": "user", "content": "Q"},
{"role": "assistant", "content": "good answer"}
],
"rejected": [
{"role": "user", "content": "Q"},
{"role": "assistant", "content": "bad answer"}
]
}
```

RLAIF (`rlaif-mini.jsonl`):

```json
{
"conversations": [
{"role": "user", "content": "请解释一下什么是光合作用?"},
{"role": "assistant", "content": "无"}
]
}
```

Note: Assistant content is ignored during RLAIF training (the model generates responses on-policy).
Reasoning (`r1_mix_1024.jsonl`): Same as the SFT format, but assistant content uses `<think>...</think><answer>...</answer>` tags.
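All of these files appear to be plain JSONL (one JSON object per line, as the examples above suggest); a minimal sketch for inspecting a sample (the real dataset classes live in `dataset/lm_dataset.py`):

```python
# Peek at the first record of an SFT-style JSONL file.
import json

with open("dataset/sft_mini_512.jsonl", "r", encoding="utf-8") as f:
    sample = json.loads(next(f))
for turn in sample["conversations"]:
    print(turn["role"], ":", turn["content"][:60])
```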
### Tokenizer

- Custom tokenizer with a 6400-token vocabulary (`minimind_tokenizer`)
- Located in `./model/tokenizer.json` and `./model/tokenizer_config.json`
- To train a new tokenizer: `python scripts/train_tokenizer.py` (usually not needed)
- Uses special tokens: `<|im_start|>`, `<|im_end|>`, `<think>`, `<answer>`, `<tool_call>`, etc.
### Model Weights Naming Convention

Saved weights follow the pattern `{stage}_{dimension}[_moe].pth`:

- `pretrain_512.pth` - Pretrained small model
- `full_sft_768.pth` - SFT'd base model
- `dpo_512.pth` - DPO small model
- `reason_768.pth` - Reasoning base model
- `pretrain_640_moe.pth` - MoE variant
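A hypothetical helper (not part of the codebase) that just illustrates the pattern:

```python
# Illustrates the {stage}_{dimension}[_moe].pth naming convention.
def weight_filename(stage: str, hidden_size: int, use_moe: bool = False) -> str:
    return f"{stage}_{hidden_size}{'_moe' if use_moe else ''}.pth"

print(weight_filename("pretrain", 512))        # pretrain_512.pth
print(weight_filename("full_sft", 768))        # full_sft_768.pth
print(weight_filename("pretrain", 640, True))  # pretrain_640_moe.pth
```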
### Chat Template

The model uses a chat template similar to ChatML:

```
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
```

For reasoning models:

```
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
<think>
{reasoning_process}
</think>
<answer>
{final_answer}
</answer><|im_end|>
```
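A sketch of producing this prompt format with transformers (assumption: the MiniMind tokenizer ships with a matching ChatML-style chat template, as the files in `./model` suggest):

```python
# Render a ChatML-style prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./MiniMind2")
messages = [{"role": "user", "content": "你好"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # expected to end with "<|im_start|>assistant\n"
```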
## Training Tips

- Quick Start (Fastest): Use `pretrain_hq.jsonl` + `sft_mini_512.jsonl` for a functional chatbot in ~2 hours on a single 3090.
- Better Quality: Use the full dataset combination: `pretrain_hq.jsonl` + `sft_512.jsonl` + `sft_2048.jsonl` + `dpo.jsonl` (~38-122 hours depending on model size).
- Model Size vs Depth: For small models (<1B), "deep and narrow" architectures perform better than "wide and shallow". Prefer increasing `num_hidden_layers` over `hidden_size` at the same parameter count.
- Gradient Accumulation: Use `--accumulation_steps` to simulate larger batch sizes with limited VRAM (see the sketch after this list).
- Mixed Precision: The default dtype is `bfloat16` (change with `--dtype`). Use an autocast context for memory efficiency.
- Checkpoint Frequency: The default save interval is 100 steps. Adjust with `--save_interval` based on dataset size.
- Long Context: For sequences >512, use YaRN by setting `inference_rope_scaling=True` in the config. Train with a longer `max_seq_len` using `sft_1024.jsonl` or `sft_2048.jsonl`.
- LoRA for Domain Adaptation: When adapting to specific domains (medical, legal, etc.), use LoRA to avoid catastrophic forgetting. Mix domain data with general SFT data.
- Reasoning Models: When training R1-style models, increase the loss weight on special tokens (`<think>`, `<answer>`) to enforce format compliance (see `train_distill_reason.py`).
- RLAIF Reward Sparsity: For very small models, use continuous reward signals (reward models) rather than binary rule-based rewards to avoid sparse-gradient problems.
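A generic, self-contained sketch of gradient accumulation under bfloat16 autocast (assumption: this mirrors the effect of `--accumulation_steps` and the default `bfloat16` dtype; the real loop lives in the `trainer/` scripts):

```python
# Gradient accumulation: take one optimizer step per N micro-batches,
# scaling each loss by 1/N so gradients average rather than sum.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 512).to(device)          # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
accumulation_steps = 8

optimizer.zero_grad()
for step in range(32):
    x = torch.randn(4, 512, device=device)      # stand-in for a micro-batch
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()           # dummy loss
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```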
## Compatibility Notes
- Models trained after 2025-04-26 use updated naming convention aligned with transformers
- Old models (minimind-v1 series) are no longer maintained
- The codebase supports both the checkpoint format (`.pth`) and the transformers format
- Use `scripts/convert_model.py` to convert between formats
## API Server

Start the OpenAI-compatible API server:

```bash
cd scripts
python serve_openai_api.py --load_from ../MiniMind2
# Test with client:
python chat_openai_api.py
```

Default endpoint: `http://localhost:8000/v1/chat/completions`
## Important File Locations

- Model outputs: `./out/` directory (create if it does not exist)
- Checkpoints: `./checkpoints/` directory (for resumption)
- Datasets: `./dataset/` directory (download from ModelScope/HuggingFace)
- Tokenizer: `./model/tokenizer.json`
- Model implementations: `./model/model_minimind.py` and `./model/model_lora.py`
## Recommended Training Sequence

For a complete model from scratch:

- Pretrain → `pretrain_*.pth`
- SFT (load pretrain weights) → `full_sft_*.pth`
- DPO (load SFT weights) → `dpo_*.pth`
- GRPO/PPO (load DPO weights) → `grpo_*.pth` / `ppo_*.pth`
- (Optional) Reasoning distillation (load RLHF weights) → `reason_*.pth`
Alternative for domain-specific models:
- Start from pretrained/SFT checkpoint
- Apply LoRA with domain data → `lora_*_*.pth`
- Run inference with base + LoRA weights
## Common Issues

- CUDA Out of Memory: Reduce `batch_size`, increase `accumulation_steps`, or reduce `max_seq_len`
- Model generates nonsense: Ensure the correct tokenizer and chat template are used
- Training loss not decreasing: Check the learning rate schedule, verify the data format, and ensure `from_weight` loads the correct checkpoint
- RLAIF training unstable: Verify the reward model path, check reward signal variance, and ensure data quality
- Long context performance: Enable YaRN rope scaling and train with longer sequences
## External Resources
- Dataset downloads: ModelScope or HuggingFace
- Model weights: HuggingFace Collection
- Reward model: internlm2-1_8b-reward
## Development Philosophy
This codebase prioritizes educational clarity over abstraction:
- Core algorithms (attention, RoPE, LoRA, DPO, PPO, GRPO) are implemented from scratch in PyTorch
- Avoid black-box third-party wrappers when possible
- Code is heavily commented in Chinese (original documentation language)
- Designed for learning LLM internals, not production deployment