Olmo 3
Olmo 3 is a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales, developed by the Allen Institute for AI (AI2). This release includes the entire Model Flow: every training stage, checkpoint, data point, and dependency used to build the models.
Key features:
- Fully open: All training data, code, and intermediate checkpoints are publicly released
- Diverse capabilities: Long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall
- Flagship model: Olmo 3.1 Think 32B is the strongest fully open reasoning model released to date
Paper: arXiv:2512.13961
Contents
Base Model Training
Post-training
Model Variants
Key Results
Training Cost
Open Artifacts
Olmo 3 Base: Foundation model (7B, 32B); the strongest fully open base model
Olmo 3 Think: Reasoning model with explicit step-by-step thinking; outperforms Qwen 2.5, Gemma 2/3, and DeepSeek R1
Olmo 3 Instruct: Generates concise, direct responses; optimized for function calling
Olmo 3 RL-Zero: Trained with RL directly from the Base model; a fully open RL benchmark
Key Results
Key benchmark results for Olmo 3.1 Think 32B:
| Category | Benchmark | Score |
|---|---|---|
| Math | MATH | 96.2 |
| Math | AIME 2024 | 80.6 |
| Reasoning | BigBenchHard | 88.6 |
| Reasoning | ZebraLogic | 80.1 |
| Coding | HumanEvalPlus | 91.5 |
| Coding | LiveCodeBench v3 | 83.3 |
| IF | IFEval | 93.8 |
| Knowledge | MMLU | 86.4 |
Training Cost
Approximately 56 days using 1,024 H100 GPUs (estimated cost: $2.75M)
- Pretraining: ~47 days
- Post-training: ~9 days
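The budget above can be sanity-checked with quick arithmetic. A minimal sketch (the per-GPU-hour rate below is derived from the stated figures, not given in the source):

```python
# Back-of-the-envelope check of the published Olmo 3 training budget.
# Figures from the text: 1,024 H100 GPUs for ~56 days total
# (~47 pretraining + ~9 post-training), estimated cost $2.75M.
GPUS = 1024
DAYS = 56
TOTAL_COST_USD = 2_750_000  # estimated, from the text

gpu_hours = GPUS * DAYS * 24                  # total GPU-hours consumed
cost_per_gpu_hour = TOTAL_COST_USD / gpu_hours

print(f"{gpu_hours:,} GPU-hours")             # 1,376,256 GPU-hours
print(f"${cost_per_gpu_hour:.2f}/GPU-hour")   # $2.00/GPU-hour
```

The implied rate of roughly $2 per H100-hour is in the range of typical large-scale cloud GPU pricing, which suggests the $2.75M estimate covers compute only, not staff or storage.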
Open Artifacts
All intermediate checkpoints, training data, code, and evaluation tools are released:
- Models: All checkpoints for Base, Think, Instruct, and RL-Zero
- Data: Dolma 3 (pretraining), Dolci (post-training)
- Code: OLMo-core, Open Instruct, duplodocus, OLMES
Core philosophy: To truly advance open-source AI, it is necessary to make not just the final model but the entire “path” to it transparent and accessible.