Curious how a 1.5B parameter model can solve maths problems better than far larger models? In this video, I demonstrate how DeepSeek R1 leverages lengthy chains of thought to enhance its mathematical reasoning. We take a close look at how DeepSeek R1 prompts are structured and generated according to the R1 paper, then reproduce these chain-of-thought prompts via the DeepSeek R1 cold start method and my own maths compiler to create synthetic training data. I then walk through the entire fine-tuning process step by step, showing how even a relatively modest model can outperform bulkier rivals using DeepSeek R1's cold start technique. If you're fascinated by AI breakthroughs or simply enjoy seeing a thorough training pipeline, this detailed behind-the-scenes session is for you.

GitHub repo for the math compiler: https://github.com/chrishayuk/chuk-math
GitHub repo for the verifiers: https://github.com/chrishayuk/verifiers
Mathematical Reasoning Enhancement
🧠 A 1.5B parameter model fine-tuned with the DeepSeek R1 approach can outperform far larger models such as ChatGPT at math problem-solving by leveraging long chains of thought and reinforcement learning.
🏷️ The model generates "think" tags for its reasoning and "answer" tags for its solutions, enabling automated verification of correctness and improving mathematical reasoning capabilities.
🔄 Reflection moments in the generated chains of thought allow the model to re-evaluate its calculations step by step and identify discrepancies, significantly improving accuracy.
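The think/answer tag structure is what makes automated verification possible: because the final answer is wrapped in a predictable tag, a verifier can extract it and compare it against a known-correct value without parsing the reasoning itself. A minimal sketch of that idea (the helper names here are illustrative, not taken from the verifiers repo):

```python
import re

def extract_answer(completion: str):
    """Pull the final answer out of a <think>...</think><answer>...</answer> completion."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else None

def is_correct(completion: str, reference: str) -> bool:
    """Verify the extracted answer against a known-correct reference value."""
    answer = extract_answer(completion)
    return answer is not None and answer == reference

sample = "<think>7 * 6 = 42</think><answer>42</answer>"
```

A real verifier would normalize the answers (e.g. numerically) rather than compare raw strings, but the extraction step is the same.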
Training Techniques
🚀 The "cold start" technique uses a small amount of high-quality supervised fine-tuning data to improve reasoning performance and convergence speed, followed by reinforcement learning.
🤖 Synthetic data generation combines a math compiler with large language models to create accurate, detailed chain-of-thought explanations for math problems, enabling effective fine-tuning.
🏆 Reward modeling evaluates the quality of generated chains of thought using accuracy rewards (correctness of answers) and format rewards (correct use of think/answer tags).
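The two reward signals described in the R1 paper are simple to sketch: the format reward only checks that the completion uses the think/answer tags in the expected order, while the accuracy reward checks the extracted answer against the reference. A hedged sketch (the function names and 0/1 reward scale are assumptions for illustration):

```python
import re

# Format reward: completion must be exactly <think>...</think> followed by <answer>...</answer>.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the think/answer tags are present and correctly ordered, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the tagged answer matches the known-correct reference, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == reference else 0.0

def total_reward(completion: str, reference: str) -> float:
    return format_reward(completion) + accuracy_reward(completion, reference)
```

Because both rewards are rule-based rather than learned, they are cheap to compute at scale during reinforcement learning and cannot be gamed the way a neural reward model can.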
Model Performance and Scalability
📊 Only 120 synthetic math samples with long chains of thought were needed to fine-tune the model, enabling complex reasoning comparable to larger models.
🔬 The fine-tuned model performs math reasoning at a level comparable to larger models, but would need more diverse cold start data to handle general tasks.
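For the supervised cold start stage, each of the ~120 synthetic samples has to be rendered into a single training string that teaches the model the think/answer format. A minimal sketch of that formatting step (the template and field names are illustrative, not the exact ones from the chuk-math repo):

```python
# Hypothetical SFT template: question, chain of thought inside <think> tags,
# final answer inside <answer> tags.
SFT_TEMPLATE = (
    "User: {question}\n"
    "Assistant: <think>{chain_of_thought}</think>\n"
    "<answer>{answer}</answer>"
)

def to_sft_text(sample: dict) -> str:
    """Render one synthetic sample into a supervised fine-tuning string."""
    return SFT_TEMPLATE.format(**sample)

sample = {
    "question": "What is 12 * 9?",
    "chain_of_thought": "12 * 9 = 12 * 10 - 12 = 120 - 12 = 108",
    "answer": "108",
}
```

Because the tags appear verbatim in every training example, even a small dataset is enough to make the model emit them reliably, which is what the later reinforcement learning stage depends on.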
Open-Source and Community Impact
🌐 The DeepSeek R1 cold start technique is powerful and reproducible, with all code available on GitHub, enabling a community-driven approach to fine-tuning models.
💡 This approach demonstrates how even relatively modest models can outperform bulkier rivals using advanced training techniques and synthetic data generation.
🔓 The open-source nature of this project allows for further improvements and adaptations by the AI community, potentially leading to more efficient and capable models in the future.