“The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of ‘good’ RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.” – Jack Clark, Anthropic
Jack Clark, co-founder of Anthropic, co-chair of Stanford University's AI Index, and co-chair of the OECD working group on AI & Compute, shed light on the significance of DeepSeek-R1, a reasoning model developed by China's DeepSeek team. In a newsletter article published on 27 January 2025, Clark highlighted that it takes only approximately 800k samples of "good" RL (reinforcement learning) reasoning to convert other models into RL reasoners.
The Power of Fine-Tuning
DeepSeek-R1 is not just a powerful AI model; it also provides a recipe for fine-tuning existing models to enhance their reasoning capabilities. By fine-tuning on the roughly 800k reasoning samples curated with DeepSeek-R1, researchers can distill its reasoning ability into other models. The DeepSeek team demonstrated this by fine-tuning open-source models such as Qwen and Llama on the same dataset.
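The distillation step above amounts to ordinary supervised fine-tuning on reasoning traces sampled from the teacher model. A minimal sketch of the data-preparation side is below; the field names and the `<think>` tag layout are illustrative assumptions, not DeepSeek's exact data schema.

```python
# Sketch: packing distilled reasoning traces into supervised
# fine-tuning (SFT) examples. A student model trained on strings
# like these learns to emit its reasoning before the final answer,
# mimicking the teacher's chains of thought.
# NOTE: field names and the <think>...</think> format are assumptions
# for illustration, not the published DeepSeek-R1 data format.

def format_sft_example(question: str, chain_of_thought: str, answer: str) -> str:
    """Pack one reasoning sample into a single training string."""
    return (
        f"User: {question}\n"
        f"Assistant: <think>\n{chain_of_thought}\n</think>\n{answer}"
    )

# A toy distilled sample (in practice, ~800k of these sampled
# from the teacher and filtered for correctness).
samples = [
    {
        "question": "What is 17 * 24?",
        "chain_of_thought": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        "answer": "408",
    },
]

corpus = [format_sft_example(**s) for s in samples]
print(corpus[0])
```

The resulting corpus would then be fed to any standard fine-tuning loop; the key point from the article is that this dataset, not novel architecture, is what converts a base model into a reasoner.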
Implications for AI Policy
The release of DeepSeek-R1 has significant implications for AI policy and control. As Clark notes, if you need fewer than a million samples to convert any model into a “thinker,” it becomes much harder to control AI systems. This is because the valuable data, including chains of thought from reasoning models, can be leaked or shared openly.
A New Era in AI Development
The availability of DeepSeek-R1 and its associated techniques has created a new era in AI development. With an open weight model floating around the internet, researchers can now bootstrap any other sufficiently powerful base model into being an AI reasoner. This has the potential to accelerate AI progress worldwide.
Key Takeaways:
- Fine-tuning is key: DeepSeek-R1 demonstrates that fine-tuning existing models on a relatively small dataset (~800k samples) can significantly enhance their reasoning capabilities.
- Open-source and accessible: The model weights and its techniques are now available for anyone to use, making it easier for researchers to build powerful AI reasoners.
- Implications for control: The release of DeepSeek-R1 highlights the difficulty of controlling AI systems when the valuable data, such as chains of thought, can be leaked or shared openly.
Conclusion
DeepSeek-R1 has marked a significant milestone in AI development, showcasing the power of fine-tuning and open-source collaboration. As researchers continue to build upon this work, we can expect to see even more advanced AI models emerge, with far-reaching implications for various industries and applications.