Appendix III
Large Model Compute Rankings and GPU Capacity Utilization
This appendix presents a comprehensive ranking of 182 notable AI models, combining data from Epoch AI's "Notable AI Models" database (Epoch AI) with the organizational compute capacity estimates from Appendix I. For each model, we track:
- Model name and developing organization(s)
- Training compute requirements (FLOPs)
- Lab/Cloud provider responsible for training
- Parent organization’s 2024 estimated peak annual FLOP capacity
- Three metrics of organizational impact:
  - Share of organization's publicly known models: the model's training FLOPs divided by the total known training FLOPs across that organization's models
  - Share of peak annual FLOP budget: the model's training FLOPs divided by the parent organization's 2024 estimated peak annual FLOP capacity
  - Share of peak annual FLOP budget with 100x sweep: the same ratio, assuming each model required 100x its reported training compute for development and testing
The data is presented in six tables, ordered by decreasing training compute requirements. This allows tracking the evolution of model scale over time and comparing relative organizational investments in different AI capabilities. Note that training compute estimates for the most recent models are based on publicly available information and may be incomplete or imprecise.
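The three metrics above can be sketched in a few lines of code. This is a minimal illustration using GPT-4o's figures from Table 7.1; the organization-wide total of known training FLOPs is back-solved here from the table's own 53.36% entry, so it is an assumption for demonstration, not a source number.

```python
def impact_metrics(train_flops, org_total_known_flops, org_peak_annual_flops,
                   sweep_factor=100):
    """Return the three percentage metrics reported in the tables."""
    share_public = 100 * train_flops / org_total_known_flops   # share of org's known models
    share_peak = 100 * train_flops / org_peak_annual_flops     # share of 2024 peak annual budget
    share_peak_sweep = sweep_factor * share_peak               # same, with 100x dev/test sweep
    return share_public, share_peak, share_peak_sweep

# GPT-4o: 3.81e25 training FLOPs; Microsoft/OpenAI estimated peak annual
# capacity of 4.35e28 FLOPs. The total is back-solved from 53.36% (assumption).
openai_total_known = 3.81e25 / 0.5336
pub, peak, sweep = impact_metrics(3.81e25, openai_total_known, 4.35e28)
print(f"{pub:.2f}% {peak:.3f}% {sweep:.2f}%")  # ≈ 53.36% 0.088% 8.76%
```

Small discrepancies against the tables (e.g. 8.76 vs. 8.75) reflect rounding in the published per-column values.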
Table 7.1: AI Model Training Compute Requirements (Part 1 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| Gemini 1.0 Ultra | Google DeepMind | Google DeepMind | $5.00 \times 10^{25}$ | $3.87 \times 10^{28}$ | 45.65 | 0.129 | 12.93 |
| Claude 3.5 Sonnet | Anthropic | Anthropic/Amazon | $4.98 \times 10^{25}$ | $2.27 \times 10^{28}$ | 69.74 | 0.220 | 21.96 |
| GPT-4o | OpenAI | Microsoft/OpenAI | $3.81 \times 10^{25}$ | $4.35 \times 10^{28}$ | 53.36 | 0.088 | 8.75 |
| Llama 3.1-405B | Meta AI | Meta AI | $3.80 \times 10^{25}$ | $5.65 \times 10^{28}$ | 66.32 | 0.067 | 6.72 |
| GPT-4 | OpenAI | Microsoft/OpenAI | $2.10 \times 10^{25}$ | $4.35 \times 10^{28}$ | 29.41 | 0.048 | 4.82 |
| Gemini 1.0 Pro | Google DeepMind | Google DeepMind | $1.83 \times 10^{25}$ | $3.87 \times 10^{28}$ | 16.71 | 0.047 | 4.73 |
| Claude 3 Opus | Anthropic | Anthropic/Amazon | $1.64 \times 10^{25}$ | $2.27 \times 10^{28}$ | 22.97 | 0.072 | 7.23 |
| Gemini 1.5 Pro | Google DeepMind | Google DeepMind | $1.58 \times 10^{25}$ | $3.87 \times 10^{28}$ | 14.43 | 0.041 | 4.09 |
| Llama 3-70B | Meta AI | Meta AI | $7.86 \times 10^{24}$ | $5.65 \times 10^{28}$ | 13.72 | 0.014 | 1.39 |
| GPT-4o mini | OpenAI | Microsoft/OpenAI | $7.36 \times 10^{24}$ | $4.35 \times 10^{28}$ | 10.31 | 0.017 | 1.69 |
| PaLM 2 | Google DeepMind | Google DeepMind | $7.34 \times 10^{24}$ | $3.87 \times 10^{28}$ | 6.70 | 0.019 | 1.90 |
| Llama 3.3 | Meta AI | Meta AI | $6.86 \times 10^{24}$ | $5.65 \times 10^{28}$ | 11.98 | 0.012 | 1.21 |
| Amazon Nova Pro | Amazon | Anthropic/Amazon | $6.00 \times 10^{24}$ | $2.27 \times 10^{28}$ | 8.40 | 0.026 | 2.65 |
| Amazon Titan | Amazon | Anthropic/Amazon | $4.80 \times 10^{24}$ | $2.27 \times 10^{28}$ | 6.72 | 0.021 | 2.12 |
| Claude 2 | Anthropic | Anthropic/Amazon | $3.87 \times 10^{24}$ | $2.27 \times 10^{28}$ | 5.41 | 0.017 | 1.70 |
| Minerva (540B) | Google DeepMind | Google DeepMind | $2.74 \times 10^{24}$ | $3.87 \times 10^{28}$ | 2.50 | 0.007 | 0.71 |
| GPT-3.5 (text-davinci-003) | OpenAI | Microsoft/OpenAI | $2.58 \times 10^{24}$ | $4.35 \times 10^{28}$ | 3.61 | 0.006 | 0.59 |
| U-PaLM (540B) | Google DeepMind | Google DeepMind | $2.53 \times 10^{24}$ | $3.87 \times 10^{28}$ | 2.31 | 0.007 | 0.65 |
| PaLM (540B) | Google Research | Google DeepMind | $2.53 \times 10^{24}$ | $3.87 \times 10^{28}$ | 2.31 | 0.007 | 0.65 |
| Flan-PaLM 540B | Google DeepMind | Google DeepMind | $2.50 \times 10^{24}$ | $3.87 \times 10^{28}$ | 2.28 | 0.006 | 0.65 |
| FLAN 137B | Google Research | Google DeepMind | $2.05 \times 10^{24}$ | $3.87 \times 10^{28}$ | 1.87 | 0.005 | 0.53 |
| Meta Movie Gen Video | Meta AI | Meta AI | $1.65 \times 10^{24}$ | $5.65 \times 10^{28}$ | 2.88 | 0.003 | 0.29 |
| Megatron-Turing NLG 530B | Microsoft, NVIDIA | Microsoft/OpenAI | $1.17 \times 10^{24}$ | $4.35 \times 10^{28}$ | 1.64 | 0.003 | 0.27 |
| Llama 2-70B | Meta AI | Meta AI | $8.10 \times 10^{23}$ | $5.65 \times 10^{28}$ | 1.41 | 0.003 | 0.14 |
| Gopher (280B) | DeepMind | Google DeepMind | $6.31 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.58 | 0.002 | 0.16 |
| Chinchilla | DeepMind | Google DeepMind | $5.76 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.53 | 0.001 | 0.15 |
| LLaMA-65B | Meta AI | Meta AI | $5.50 \times 10^{23}$ | $5.65 \times 10^{28}$ | 0.96 | 0.001 | 0.10 |
| OPT-175B | Meta AI | Meta AI | $4.30 \times 10^{23}$ | $5.65 \times 10^{28}$ | 0.75 | 0.001 | 0.08 |
| BlenderBot 3 | McGill University, Meta AI, Mila | Meta AI | $4.30 \times 10^{23}$ | $5.65 \times 10^{28}$ | 0.75 | 0.001 | 0.08 |
| Parti | Google Research | Google DeepMind | $3.96 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.36 | 0.001 | 0.10 |
| FunSearch | Google DeepMind | Google DeepMind | $3.87 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.35 | 0.001 | 0.10 |
Table 7.2: AI Model Training Compute Requirements (Part 2 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| GLaM | Google DeepMind | Google DeepMind | $3.64 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.33 | 0.001 | 0.09 |
| LaMDA | Google DeepMind | Google DeepMind | $3.55 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.32 | 0.001 | 0.09 |
| AlphaGo Zero | DeepMind | Google DeepMind | $3.41 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.31 | 0.001 | 0.09 |
| Galactica | Meta AI | Meta AI | $3.24 \times 10^{23}$ | $5.65 \times 10^{28}$ | 0.57 | 0.001 | 0.06 |
| InstructGPT 175B | OpenAI | Microsoft/OpenAI | $3.19 \times 10^{23}$ | $4.35 \times 10^{28}$ | 0.45 | 0.001 | 0.07 |
| GPT-3 175B (davinci) | OpenAI | Microsoft/OpenAI | $3.14 \times 10^{23}$ | $4.35 \times 10^{28}$ | 0.44 | 0.001 | 0.07 |
| ST-MoE | Google, Google Brain, Google Research | Google DeepMind | $2.90 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.26 | 0.001 | 0.07 |
| Flamingo | DeepMind | Google DeepMind | $2.19 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.20 | 0.001 | 0.06 |
| AlexaTM 20B | Amazon | Anthropic/Amazon | $2.04 \times 10^{23}$ | $2.27 \times 10^{28}$ | 0.29 | 0.001 | 0.09 |
| AlphaGo Master | DeepMind | Google DeepMind | $2.00 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.18 | 0.001 | 0.05 |
| ViT-22B | Google DeepMind | Google DeepMind | $1.93 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.18 | 0.001 | 0.05 |
| PaLI | Google DeepMind | Google DeepMind | $1.69 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.15 | 0.000 | 0.04 |
| AlphaCode | DeepMind | Google DeepMind | $1.64 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.15 | 0.000 | 0.04 |
| Llama Guard | Meta AI | Meta AI | $1.60 \times 10^{23}$ | $5.65 \times 10^{28}$ | 0.28 | 0.000 | 0.03 |
| UL2 | Google Research, Google Brain | Google DeepMind | $1.20 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.11 | 0.000 | 0.03 |
| Meena | Google Brain | Google DeepMind | $1.12 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.10 | 0.000 | 0.03 |
| OpenVLA | Stanford, UC Berkeley, Toyota, DeepMind, MIT | Google DeepMind | $1.10 \times 10^{23}$ | $3.87 \times 10^{28}$ | 0.10 | 0.000 | 0.03 |
| Llama 2-7B | Meta AI | Meta AI | $8.40 \times 10^{22}$ | $5.65 \times 10^{28}$ | 0.15 | 0.000 | 0.01 |
| Switch | Google DeepMind | Google DeepMind | $8.22 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.08 | 0.000 | 0.02 |
| mT5-XXL | Google, Google Research | Google DeepMind | $8.20 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.07 | 0.000 | 0.02 |
| ByT5-XXL | Google, Google Research | Google DeepMind | $8.10 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.07 | 0.000 | 0.02 |
| LLaVA 1.5 | UW Madison, Microsoft Research | Microsoft/OpenAI | $7.81 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.11 | 0.000 | 0.02 |
| LLaVA | UW Madison, Microsoft, Columbia | Microsoft/OpenAI | $7.80 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.11 | 0.000 | 0.02 |
| ProtT5-XXL | TU Munich, Med AI, NVIDIA, Oak Ridge, Google | Google DeepMind | $7.37 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.07 | 0.000 | 0.02 |
| ESM2-15B | Meta AI, NYU, Stanford, MIT | Meta AI | $7.35 \times 10^{22}$ | $5.65 \times 10^{28}$ | 0.13 | 0.000 | 0.01 |
| Codex | OpenAI | Microsoft/OpenAI | $7.34 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.10 | 0.000 | 0.02 |
| CoCa | Google Research | Google DeepMind | $7.30 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.07 | 0.000 | 0.02 |
| OpenAI Five | OpenAI | Microsoft/OpenAI | $6.70 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.09 | 0.000 | 0.02 |
| AlphaStar | DeepMind | Google DeepMind | $5.93 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.05 | 0.000 | 0.02 |
| ViT-G/14 | Google Brain, Google Research | Google DeepMind | $5.85 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.05 | 0.000 | 0.02 |
| XGLM-7.5B | Meta AI, Facebook AI Research | Meta AI | $2.25 \times 10^{22}$ | $5.65 \times 10^{28}$ | 0.04 | 0.000 | 0.00 |
Table 7.3: AI Model Training Compute Requirements (Part 3 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| GraphCast | Google DeepMind | Google DeepMind | $2.10 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.02 | 0.000 | 0.01 |
| NLLB | Meta AI | Meta AI | $1.75 \times 10^{22}$ | $5.65 \times 10^{28}$ | 0.03 | 0.000 | 0.00 |
| RETRO-7B | DeepMind | Google DeepMind | $1.68 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.02 | 0.000 | 0.00 |
| Turing-NLG | Microsoft | Microsoft/OpenAI | $1.57 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.02 | 0.000 | 0.00 |
| Imagen | Google Brain | Google DeepMind | $1.46 \times 10^{22}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| OpenAI Five Rerun | OpenAI | Microsoft/OpenAI | $1.30 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.02 | 0.000 | 0.00 |
| CLIP (ViT L/14@336px) | OpenAI | Microsoft/OpenAI | $1.05 \times 10^{22}$ | $4.35 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| AudioGen | Meta AI, Hebrew University | Meta AI | $9.50 \times 10^{21}$ | $5.65 \times 10^{28}$ | 0.02 | 0.000 | 0.00 |
| T5-3B | Google DeepMind | Google DeepMind | $9.00 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| iGPT-L | OpenAI | Microsoft/OpenAI | $8.91 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| ContextNet + Noisy Student | Google DeepMind | Google DeepMind | $8.16 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| Segment Anything Model | Meta AI | Meta AI | $7.80 \times 10^{21}$ | $5.65 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| Conformer + Wav2vec 2.0 | Google, Google Research, Google Brain | Google DeepMind | $7.60 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| GNMT | Google DeepMind | Google DeepMind | $6.62 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| ADM | OpenAI | Microsoft/OpenAI | $6.20 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| XLNet | CMU, Google Brain | Google DeepMind | $6.19 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| NUWA | Microsoft Research, Peking University | Microsoft/OpenAI | $4.84 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| AlphaFold-Multimer | Google DeepMind, DeepMind | Google DeepMind | $4.35 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ViT-Huge/14 | Google Brain, Google Research | Google DeepMind | $4.26 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Whisper | OpenAI | Microsoft/OpenAI | $4.21 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| Gato | DeepMind | Google DeepMind | $4.02 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ViT-G (model soup) | UW, Columbia, Google, Meta, Tel Aviv | Meta AI | $3.40 \times 10^{21}$ | $5.65 \times 10^{28}$ | 0.01 | 0.000 | 0.00 |
| ViT-G (model soup) | UW, Columbia, Google, Meta, Tel Aviv | Google DeepMind | $3.40 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ELECTRA | Stanford, Google, Google Brain | Google DeepMind | $3.10 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| AlphaFold 2 | DeepMind | Google DeepMind | $2.99 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
Table 7.4: AI Model Training Compute Requirements (Part 4 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| ALBERT-xxlarge | Toyota Tech Institute, Google | Google DeepMind | $2.39 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| NASv3 (CIFAR-10) | Google Brain | Google DeepMind | $2.20 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| GPT-2 (1.5B) | OpenAI | Microsoft/OpenAI | $1.92 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| EMDR | Mila, McGill, DeepMind | Google DeepMind | $1.91 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| AlphaGo Lee | DeepMind | Google DeepMind | $1.90 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| BigGAN-deep 512x512 | Heriot-Watt, DeepMind | Google DeepMind | $1.80 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MnasNet-A3 | Google DeepMind | Google DeepMind | $1.50 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MnasNet-A1 + SSDLite | Google DeepMind | Google DeepMind | $1.50 \times 10^{21}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Swin Transformer V2 | Microsoft Research Asia | Microsoft/OpenAI | $1.10 \times 10^{21}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| JFT | Google Research, CMU | Google DeepMind | $8.43 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| OpenAI TI7 DOTA 1v1 | OpenAI | Microsoft/OpenAI | $6.05 \times 10^{20}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| BERT-Large-CAS (PTB+WT2+WT103) | Amazon | Anthropic/Amazon | $5.21 \times 10^{20}$ | $2.27 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
Table 7.5: AI Model Training Compute Requirements (Part 5 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| Big Transformer for Back-Translation | Facebook AI Research, Google Brain | Google DeepMind | $4.78 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Xception | Google DeepMind | Google DeepMind | $4.36 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| AmoebaNet-A (F=448) | Google Brain | Google DeepMind | $3.85 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| AlphaGo Fan | DeepMind | Google DeepMind | $3.80 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| SNM-skip | Google DeepMind | Google DeepMind | $2.98 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| BERT-Large | Google DeepMind | Google DeepMind | $2.85 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| IMPALA | DeepMind | Google DeepMind | $1.68 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Mesh-TensorFlow Transformer 4.9B | Google Brain | Google DeepMind | $1.62 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Contriever | Meta AI, UCL, PSL, Grenoble | Meta AI | $1.57 \times 10^{20}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| AlphaFold | DeepMind | Google DeepMind | $1.00 \times 10^{20}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| EfficientNetV2-XL | Google, Google Brain | Google DeepMind | $9.56 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MoE-Multi | Jagiellonian University, Google Brain | Google DeepMind | $9.39 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Adaptive Input Transformer + RD | Microsoft Research Asia, Soochow | Microsoft/OpenAI | $8.20 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| DeiT-B | Meta AI, Sorbonne University | Meta AI | $7.88 \times 10^{19}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| BEIT-3 | Microsoft | Microsoft/OpenAI | $7.00 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Mesh-TensorFlow Transformer 2.9B | Google Brain | Google DeepMind | $6.84 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| PNASNet-5 | Johns Hopkins, Google AI, Stanford | Google DeepMind | $6.63 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Sparse all-MLP | Meta AI | Meta AI | $6.08 \times 10^{19}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ConvS2S (ensemble of 8 models) | Meta AI | Meta AI | $5.64 \times 10^{19}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Seq2Seq LSTM | Google DeepMind | Google DeepMind | $5.60 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MuZero | DeepMind | Google DeepMind | $4.80 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Population-based DRL | DeepMind | Google DeepMind | $3.49 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| QT-Opt | Google Brain,UC Berkeley | Google DeepMind | $3.49 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| LSTM (Hebbian, Cache, MbPA) | DeepMind, UCL | Google DeepMind | $3.33 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ResNet-200 | Microsoft Research Asia | Microsoft/OpenAI | $2.97 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Segatron-XL large, M=384 + HCP | Microsoft Research, Waterloo | Microsoft/OpenAI | $2.65 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MultiBand Diffusion | Meta AI, Hebrew U, LORIA | Meta AI | $2.60 \times 10^{19}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Transformer local-attention (NesT-B) | Google Cloud, Google Research | Google DeepMind | $2.41 \times 10^{19}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| MSRA (C, PReLU) | Microsoft Research | Microsoft/OpenAI | $2.40 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Detic | Meta AI, UT Austin | Meta AI | $2.34 \times 10^{19}$ | $5.65 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| GPT-1 | OpenAI | Microsoft/OpenAI | $1.76 \times 10^{19}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| TransE | UTC-CNRS, Google | Google DeepMind | $1.34 \times 10^{18}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
Table 7.6: AI Model Training Compute Requirements (Part 6 of 6)
| Model | Organization | Lab/Cloud | Train FLOPs | Parent Org Peak Annual FLOPs | Model/Public Models (%) | Model/Peak Annual (%) | Model/Peak w/100x (%) |
|---|---|---|---|---|---|---|---|
| KN-LM | Google DeepMind | Google DeepMind | $7.73 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| WeNet (Penn Treebank) | Amazon | Anthropic/Amazon | $7.30 \times 10^{17}$ | $2.27 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Unsupervised High-level Feature Learner | Google DeepMind | Google DeepMind | $6.00 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| CT-MoS (WT2) | Google, National Tsing Hua University | Google DeepMind | $5.62 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| DistBelief Speech | Google DeepMind | Google DeepMind | $3.11 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Mogrifier LSTM (WT2) | DeepMind | Google DeepMind | $1.40 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ReLU-Speech | Google, Toronto, NYU | Google DeepMind | $1.28 \times 10^{17}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Large regularized LSTM | NYU, Google Brain | Google DeepMind | $9.10 \times 10^{16}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| R-FCN | Tsinghua, Microsoft Research | Microsoft/OpenAI | $6.15 \times 10^{16}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ADAM (CIFAR-10) | Amsterdam, OpenAI, Toronto | Microsoft/OpenAI | $6.05 \times 10^{16}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Word2Vec (large) | Google DeepMind | Google DeepMind | $3.89 \times 10^{16}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ENAS | Google Brain, CMU, Stanford | Google DeepMind | $2.01 \times 10^{16}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| DARTS | DeepMind, CMU | Google DeepMind | $1.10 \times 10^{16}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| NAS with base 8 and shared embeddings | Google Brain | Google DeepMind | $1.05 \times 10^{16}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| ISS | Duke University, Microsoft | Microsoft/OpenAI | $3.40 \times 10^{15}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| Search-Proven Best LSTM | Google DeepMind | Google DeepMind | $3.34 \times 10^{15}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| DQN | DeepMind | Google DeepMind | $2.30 \times 10^{15}$ | $3.87 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
| RankNet | Microsoft Research, Microsoft | Microsoft/OpenAI | $3.48 \times 10^{12}$ | $4.35 \times 10^{28}$ | 0.00 | 0.000 | 0.00 |
References
- Epoch AI. "Data on Machine Learning Hardware." Updated December 30, 2024.