We are announcing Open Thoughts, an open-source effort to curate the best open reasoning datasets. Open Thoughts is a collaboration led by Bespoke Labs and the DataComp community from Stanford, UC Berkeley, UT Austin, UW, UCLA, UNC, TRI, and LAION.
Recent breakthroughs such as Sky-T1, STILL-2, and DeepSeek-R1 have shown that a few hundred thousand reasoning demonstrations suffice to substantially improve the reasoning capabilities of a language model. With the release of DeepSeek-R1, such thinking demonstrations can now be synthetically created at low cost and at scale.
While this process of reasoning distillation works surprisingly well, the corresponding datasets and data generation strategies unfortunately remain closed. Moreover, there is a rich design space in data generation for reasoning that the community is only beginning to explore.
The goal of Open Thoughts is to bridge this gap and create state-of-the-art open reasoning datasets. In the process, we are publicly iterating on and sharing the best datasets and data recipes for reasoning data. We invite the community to join us to build, explore, and push the frontier of reasoning models forward together.
| Model | AIME24 | MATH500 | GPQA-D | LCB Easy | LCB Med | LCB Hard |
|---|---|---|---|---|---|---|
| Open-Thinker-7B | 43.3 | 83.0 | 42.4 | 75.3 | 28.6 | 6.5 |
| Bespoke-Stratos-7B | 16.6 | 79.6 | 38.9 | 71.4 | 25.2 | 0.8 |
| DeepSeek-R1-Distill-Qwen-7B | 60.0 | 88.2 | 46.9 | 79.7 | 45.1 | 14.6 |
| gpt-4o-2024-08-06 | 10.0 | 75.8 | 46.5 | 87.4 | 42.7 | 8.9 |
| o1-mini | 63.0 | 85.6 | 60.0 | 92.8 | 74.7 | 39.8 |
Today, we are also releasing our first dataset, Open-Thoughts-114k, and our first model, Open-Thinker-7B, which is based on Qwen-2.5-7B-Instruct. We scaled up the data strategy from Bespoke-Stratos-17k, and the resulting model improves on Bespoke-Stratos-7B on every benchmark in the table above, most notably on AIME24 (43.3 vs. 16.6). All numbers in the table were obtained with our open-source evaluation tool, Evalchemy.
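For readers who want the comparison as numbers, here is a minimal sketch that computes the per-benchmark gap between Open-Thinker-7B and Bespoke-Stratos-7B. The scores are copied by hand from the table above; `score_deltas` is a hypothetical helper written for this illustration and is not part of Evalchemy.

```python
# Scores copied from the results table above, in column order.
BENCHMARKS = ["AIME24", "MATH500", "GPQA-D", "LCB Easy", "LCB Med", "LCB Hard"]
SCORES = {
    "Open-Thinker-7B": [43.3, 83.0, 42.4, 75.3, 28.6, 6.5],
    "Bespoke-Stratos-7B": [16.6, 79.6, 38.9, 71.4, 25.2, 0.8],
}


def score_deltas(new_model, old_model):
    """Return {benchmark: new score - old score}, rounded to one decimal.

    Hypothetical helper for illustration only.
    """
    return {
        name: round(new - old, 1)
        for name, new, old in zip(BENCHMARKS, SCORES[new_model], SCORES[old_model])
    }


deltas = score_deltas("Open-Thinker-7B", "Bespoke-Stratos-7B")
for name, delta in deltas.items():
    # Print each gap with an explicit sign, e.g. "+3.4".
    print(f"{name:>9}: {delta:+.1f}")
```

The gap is positive on all six benchmarks, with the largest difference on AIME24.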
We are just getting started. A new field of exciting research has just opened up. If you want to contribute to or sponsor the Open Thoughts effort, get in touch or raise an issue on GitHub. Come and join us on this journey!