Appendix II
Large Model Compute Rankings and GPU Capacity Utilization
This appendix presents a ranking of 140 notable AI models, combining data from Epoch AI's "Notable AI Models" database with the organizational compute capacity estimates from Appendix I. For each model we track the fields below (Tables 3 and 4 report only model, organization, and training compute; the sketch after this list shows how the capacity-share column is computed):
- Model name and developing organization(s)
- Training compute requirements (FLOPs)
- Lab/Cloud provider responsible for training
- Parent organization's 2024 estimated peak annual FLOP capacity
- Share of organization's publicly known models
- Share of peak annual FLOP budget
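The following minimal sketch shows how the "Model/Peak Annual (%)" column is derived. The capacity constants are not the Appendix I values themselves; they are illustrative stand-ins back-solved from the Table 1 shares, and the function name is hypothetical.

```python
# Illustrative stand-in capacities, back-solved from Table 1 shares
# (NOT the authoritative Appendix I estimates).
PEAK_ANNUAL_FLOPS = {
    "Google DeepMind":  3.88e28,  # implied by Gemini 1.0 Ultra at 0.129%
    "Anthropic/Amazon": 2.26e28,  # implied by Claude 3.5 Sonnet at 0.220%
    "Microsoft/OpenAI": 4.33e28,  # implied by GPT-4o at 0.088%
    "Meta AI":          5.67e28,  # implied by Llama 3.1-405B at 0.067%
}

def share_of_peak_annual(train_flops: float, lab_cloud: str) -> float:
    """Training compute as a percentage of a lab/cloud's peak annual FLOPs."""
    return 100.0 * train_flops / PEAK_ANNUAL_FLOPS[lab_cloud]

# Reproduces the GPT-4 row of Table 1 to rounding (~0.048%).
print(f"{share_of_peak_annual(2.10e25, 'Microsoft/OpenAI'):.3f}%")
```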
Frontier Models ($10^{24}$+ FLOPs)
Table 1: AI Model Training Compute Requirements - Frontier Scale
| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Gemini 1.0 Ultra | Google DeepMind | Google DeepMind | $5.00 \times 10^{25}$ | 0.129 |
| Claude 3.5 Sonnet | Anthropic | Anthropic/Amazon | $4.98 \times 10^{25}$ | 0.220 |
| GPT-4o | OpenAI | Microsoft/OpenAI | $3.81 \times 10^{25}$ | 0.088 |
| Llama 3.1-405B | Meta AI | Meta AI | $3.80 \times 10^{25}$ | 0.067 |
| GPT-4 | OpenAI | Microsoft/OpenAI | $2.10 \times 10^{25}$ | 0.048 |
| Gemini 1.0 Pro | Google DeepMind | Google DeepMind | $1.83 \times 10^{25}$ | 0.047 |
| Claude 3 Opus | Anthropic | Anthropic/Amazon | $1.64 \times 10^{25}$ | 0.072 |
| Gemini 1.5 Pro | Google DeepMind | Google DeepMind | $1.58 \times 10^{25}$ | 0.041 |
| Llama 3-70B | Meta AI | Meta AI | $7.86 \times 10^{24}$ | 0.014 |
| GPT-4o mini | OpenAI | Microsoft/OpenAI | $7.36 \times 10^{24}$ | 0.017 |
| PaLM 2 | Google DeepMind | Google DeepMind | $7.34 \times 10^{24}$ | 0.019 |
| Llama 3.3 | Meta AI | Meta AI | $6.86 \times 10^{24}$ | 0.012 |
| Amazon Nova Pro | Amazon | Anthropic/Amazon | $6.00 \times 10^{24}$ | 0.026 |
| Amazon Titan | Amazon | Anthropic/Amazon | $4.80 \times 10^{24}$ | 0.021 |
| Claude 2 | Anthropic | Anthropic/Amazon | $3.87 \times 10^{24}$ | 0.017 |
| Minerva (540B) | Google DeepMind | Google DeepMind | $2.74 \times 10^{24}$ | 0.007 |
| GPT-3.5 | OpenAI | Microsoft/OpenAI | $2.58 \times 10^{24}$ | 0.006 |
| PaLM (540B) | Google Research | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| U-PaLM (540B) | Google DeepMind | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| Flan-PaLM 540B | Google DeepMind | Google DeepMind | $2.50 \times 10^{24}$ | 0.006 |
| FLAN 137B | Google Research | Google DeepMind | $2.05 \times 10^{24}$ | 0.005 |
| Meta Movie Gen Video | Meta AI | Meta AI | $1.65 \times 10^{24}$ | 0.003 |
| Megatron-Turing NLG 530B | Microsoft, NVIDIA | Microsoft/OpenAI | $1.17 \times 10^{24}$ | 0.003 |
Large Models ($10^{22}$ - $10^{24}$ FLOPs)
Table 2: AI Model Training Compute Requirements - Large Scale
| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Llama 2-70B | Meta AI | Meta AI | $8.10 \times 10^{23}$ | 0.001 |
| Gopher (280B) | DeepMind | Google DeepMind | $6.31 \times 10^{23}$ | 0.002 |
| Chinchilla | DeepMind | Google DeepMind | $5.76 \times 10^{23}$ | 0.001 |
| LLaMA-65B | Meta AI | Meta AI | $5.50 \times 10^{23}$ | 0.001 |
| OPT-175B | Meta AI | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| BlenderBot 3 | Meta AI, McGill, Mila | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| Parti | Google Research | Google DeepMind | $3.96 \times 10^{23}$ | 0.001 |
| FunSearch | Google DeepMind | Google DeepMind | $3.87 \times 10^{23}$ | 0.001 |
| GLaM | Google DeepMind | Google DeepMind | $3.64 \times 10^{23}$ | 0.001 |
| LaMDA | Google DeepMind | Google DeepMind | $3.55 \times 10^{23}$ | 0.001 |
| AlphaGo Zero | DeepMind | Google DeepMind | $3.41 \times 10^{23}$ | 0.001 |
| Galactica | Meta AI | Meta AI | $3.24 \times 10^{23}$ | 0.001 |
| InstructGPT 175B | OpenAI | Microsoft/OpenAI | $3.19 \times 10^{23}$ | 0.001 |
| GPT-3 175B | OpenAI | Microsoft/OpenAI | $3.14 \times 10^{23}$ | 0.001 |
| ST-MoE | Google DeepMind | Google DeepMind | $2.90 \times 10^{23}$ | 0.001 |
| Flamingo | DeepMind | Google DeepMind | $2.19 \times 10^{23}$ | 0.001 |
| AlexaTM 20B | Amazon | Anthropic/Amazon | $2.04 \times 10^{23}$ | 0.001 |
| AlphaGo Master | DeepMind | Google DeepMind | $2.00 \times 10^{23}$ | 0.001 |
| ViT-22B | Google DeepMind | Google DeepMind | $1.93 \times 10^{23}$ | 0.001 |
| PaLI | Google DeepMind | Google DeepMind | $1.69 \times 10^{23}$ | <0.001 |
| AlphaCode | DeepMind | Google DeepMind | $1.64 \times 10^{23}$ | <0.001 |
| Llama Guard | Meta AI | Meta AI | $1.60 \times 10^{23}$ | <0.001 |
| UL2 | Google Research | Google DeepMind | $1.20 \times 10^{23}$ | <0.001 |
| Meena | Google Brain | Google DeepMind | $1.12 \times 10^{23}$ | <0.001 |
| OpenVLA | Stanford, UC Berkeley, DeepMind | Google DeepMind | $1.10 \times 10^{23}$ | <0.001 |
| Llama 2-7B | Meta AI | Meta AI | $8.40 \times 10^{22}$ | <0.001 |
| Switch | Google DeepMind | Google DeepMind | $8.22 \times 10^{22}$ | <0.001 |
| mT5-XXL | Google Research | Google DeepMind | $8.20 \times 10^{22}$ | <0.001 |
| ByT5-XXL | Google Research | Google DeepMind | $8.10 \times 10^{22}$ | <0.001 |
| LLaVA 1.5 | UW Madison, Microsoft | Microsoft/OpenAI | $7.81 \times 10^{22}$ | <0.001 |
| LLaVA | UW Madison, Microsoft, Columbia | Microsoft/OpenAI | $7.80 \times 10^{22}$ | <0.001 |
| ProtT5-XXL | TU Munich, NVIDIA, Google | Google DeepMind | $7.37 \times 10^{22}$ | <0.001 |
| ESM2-15B | Meta AI, NYU, Stanford, MIT | Meta AI | $7.35 \times 10^{22}$ | <0.001 |
| Codex | OpenAI | Microsoft/OpenAI | $7.34 \times 10^{22}$ | <0.001 |
| CoCa | Google Research | Google DeepMind | $7.30 \times 10^{22}$ | <0.001 |
| OpenAI Five | OpenAI | Microsoft/OpenAI | $6.70 \times 10^{22}$ | <0.001 |
| AlphaStar | DeepMind | Google DeepMind | $5.93 \times 10^{22}$ | <0.001 |
| ViT-G/14 | Google Brain | Google DeepMind | $5.85 \times 10^{22}$ | <0.001 |
| XGLM-7.5B | Meta AI | Meta AI | $2.25 \times 10^{22}$ | <0.001 |
| GraphCast | Google DeepMind | Google DeepMind | $2.10 \times 10^{22}$ | <0.001 |
| NLLB | Meta AI | Meta AI | $1.75 \times 10^{22}$ | <0.001 |
| RETRO-7B | DeepMind | Google DeepMind | $1.68 \times 10^{22}$ | <0.001 |
| Turing-NLG | Microsoft | Microsoft/OpenAI | $1.57 \times 10^{22}$ | <0.001 |
Medium Models ($10^{19}$ - $10^{22}$ FLOPs)
Table 3: AI Model Training Compute Requirements - Medium Scale
| Model | Organization | Train FLOPs |
|---|---|---|
| Imagen | Google Brain | $1.46 \times 10^{22}$ |
| OpenAI Five Rerun | OpenAI | $1.30 \times 10^{22}$ |
| CLIP (ViT L/14) | OpenAI | $1.05 \times 10^{22}$ |
| AudioGen | Meta AI, Hebrew University | $9.50 \times 10^{21}$ |
| T5-3B | Google | $9.00 \times 10^{21}$ |
| iGPT-L | OpenAI | $8.91 \times 10^{21}$ |
| ContextNet + Noisy Student | Google | $8.16 \times 10^{21}$ |
| Segment Anything | Meta AI | $7.80 \times 10^{21}$ |
| Conformer + Wav2vec 2.0 | Google | $7.60 \times 10^{21}$ |
| GNMT | Google | $6.62 \times 10^{21}$ |
| ADM | OpenAI | $6.20 \times 10^{21}$ |
| XLNet | CMU, Google Brain | $6.19 \times 10^{21}$ |
| NÜWA | Microsoft Research, Peking U | $4.84 \times 10^{21}$ |
| AlphaFold-Multimer | Google DeepMind | $4.35 \times 10^{21}$ |
| ViT-Huge/14 | Google Brain | $4.26 \times 10^{21}$ |
| Whisper | OpenAI | $4.21 \times 10^{21}$ |
| Gato | DeepMind | $4.02 \times 10^{21}$ |
| ViT-G (model soup) | UW, Columbia, Google, Meta | $3.40 \times 10^{21}$ |
| ELECTRA | Stanford, Google | $3.10 \times 10^{21}$ |
| AlphaFold 2 | DeepMind | $2.99 \times 10^{21}$ |
| ALBERT-xxlarge | Toyota Tech Institute, Google | $2.39 \times 10^{21}$ |
| NASv3 (CIFAR-10) | Google Brain | $2.20 \times 10^{21}$ |
| GPT-2 (1.5B) | OpenAI | $1.92 \times 10^{21}$ |
| EMDR | Mila, McGill, DeepMind | $1.91 \times 10^{21}$ |
| AlphaGo Lee | DeepMind | $1.90 \times 10^{21}$ |
| BigGAN-deep | DeepMind | $1.80 \times 10^{21}$ |
| MnasNet-A3 | Google | $1.50 \times 10^{21}$ |
| Swin Transformer V2 | Microsoft Research Asia | $1.10 \times 10^{21}$ |
| JFT | Google Research, CMU | $8.43 \times 10^{20}$ |
| OpenAI TI7 DOTA 1v1 | OpenAI | $6.05 \times 10^{20}$ |
| BERT-Large-CAS | Amazon | $5.21 \times 10^{20}$ |
| Big Transformer Back-Trans | Meta AI, Google Brain | $4.78 \times 10^{20}$ |
| Xception | Google | $4.36 \times 10^{20}$ |
| AmoebaNet-A | Google Brain | $3.85 \times 10^{20}$ |
| AlphaGo Fan | DeepMind | $3.80 \times 10^{20}$ |
| SNM-skip | | $2.98 \times 10^{20}$ |
| BERT-Large | Google | $2.85 \times 10^{20}$ |
| IMPALA | DeepMind | $1.68 \times 10^{20}$ |
| Mesh-TensorFlow 4.9B | Google Brain | $1.62 \times 10^{20}$ |
| Contriever | Meta AI, UCL | $1.57 \times 10^{20}$ |
| AlphaFold | DeepMind | $1.00 \times 10^{20}$ |
| EfficientNetV2-XL | Google | $9.56 \times 10^{19}$ |
| MoE-Multi | Jagiellonian, Google Brain | $9.39 \times 10^{19}$ |
| DeiT-B | Meta AI, Sorbonne | $7.88 \times 10^{19}$ |
| BEIT-3 | Microsoft | $7.00 \times 10^{19}$ |
| PNASNet-5 | Johns Hopkins, Google AI | $6.63 \times 10^{19}$ |
| Sparse all-MLP | Meta AI | $6.08 \times 10^{19}$ |
| ConvS2S | Meta AI | $5.64 \times 10^{19}$ |
| Seq2Seq LSTM | Google | $5.60 \times 10^{19}$ |
| MuZero | DeepMind | $4.80 \times 10^{19}$ |
| QT-Opt | Google Brain, UC Berkeley | $3.49 \times 10^{19}$ |
| ResNet-200 | Microsoft Research Asia | $2.97 \times 10^{19}$ |
| MultiBand Diffusion | Meta AI, Hebrew U | $2.60 \times 10^{19}$ |
| Detic | Meta AI, UT Austin | $2.34 \times 10^{19}$ |
| GPT-1 | OpenAI | $1.76 \times 10^{19}$ |
Small/Historical Models ($10^{12}$ - $10^{19}$ FLOPs)
Table 4: AI Model Training Compute Requirements - Small/Historical Scale
| Model | Organization | Train FLOPs |
|---|---|---|
| TransE | UTC-CNRS, Google | $1.34 \times 10^{18}$ |
| KN-LM | | $7.73 \times 10^{17}$ |
| WeNet | Amazon | $7.30 \times 10^{17}$ |
| Unsupervised High-level Feature | Google | $6.00 \times 10^{17}$ |
| CT-MoS | Google, Nat'l Tsing Hua | $5.62 \times 10^{17}$ |
| DistBelief Speech | Google | $3.11 \times 10^{17}$ |
| Mogrifier LSTM | DeepMind | $1.40 \times 10^{17}$ |
| ReLU-Speech | Google, Toronto, NYU | $1.28 \times 10^{17}$ |
| Large regularized LSTM | NYU, Google Brain | $9.10 \times 10^{16}$ |
| R-FCN | Tsinghua, Microsoft | $6.15 \times 10^{16}$ |
| ADAM (CIFAR-10) | Amsterdam, OpenAI, Toronto | $6.05 \times 10^{16}$ |
| Word2Vec (large) | Google | $3.89 \times 10^{16}$ |
| ENAS | Google Brain, CMU, Stanford | $2.01 \times 10^{16}$ |
| DARTS | DeepMind, CMU | $1.10 \times 10^{16}$ |
| NAS base 8 | Google Brain | $1.05 \times 10^{16}$ |
| ISS | Duke, Microsoft | $3.40 \times 10^{15}$ |
| Search-Proven Best LSTM | | $3.34 \times 10^{15}$ |
| DQN | DeepMind | $2.30 \times 10^{15}$ |
| RankNet | Microsoft Research | $3.48 \times 10^{12}$ |
Key Findings
Compute Concentration
The data show that even the largest AI models consume well under 1% of their parent organization's estimated peak annual FLOP capacity; the highest share in Table 1 is 0.220%. For example:
- Gemini 1.0 Ultra ($5 \times 10^{25}$ FLOPs) represents only 0.129% of Google DeepMind's annual capacity
- Claude 3.5 Sonnet ($4.98 \times 10^{25}$ FLOPs) represents 0.220% of Anthropic/Amazon's capacity
- GPT-4o ($3.81 \times 10^{25}$ FLOPs) represents 0.088% of Microsoft/OpenAI's capacity
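As a consistency check, each share can be inverted to recover the implied capacity estimate from Appendix I. For Gemini 1.0 Ultra, $5.00 \times 10^{25} / 0.00129 \approx 3.9 \times 10^{28}$ FLOPs of implied peak annual capacity for Google DeepMind.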
Implications for ABC
These findings support the thesis argument that current AI training dramatically underutilizes available computing resources. If an organization's flagship training run absorbs less than 1% of its estimated annual compute capacity, this suggests (see the sketch after this list):
- Significant compute overhead for experimentation and hyperparameter tuning (potentially 100x the final training run)
- Large amounts of compute dedicated to inference rather than training
- Substantial untapped capacity that could be unlocked through better coordination mechanisms like ABC
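The scale of the gap can be illustrated with a rough utilization sketch. The run count and the 10x overhead multiplier below are loudly illustrative assumptions, not measured values, and the function name is hypothetical.

```python
# Rough training-utilization estimate under illustrative assumptions:
# a few flagship-scale runs per year, each preceded by experimentation
# and tuning costing some multiple of the final run.

def annual_training_utilization(flagship_flops: float,
                                peak_annual_flops: float,
                                runs_per_year: int = 4,
                                overhead_multiplier: float = 10.0) -> float:
    """Percentage of peak annual capacity absorbed by training activity."""
    total_flops = runs_per_year * flagship_flops * (1.0 + overhead_multiplier)
    return 100.0 * total_flops / peak_annual_flops

# Even 4 GPT-4o-scale runs with 10x experimentation overhead absorb only
# a few percent of the ~4.3e28 FLOP capacity implied by Table 1.
print(f"{annual_training_utilization(3.81e25, 4.33e28):.1f}%")  # ~3.9%
```

Even under these generous assumptions, training accounts for single-digit percentages of estimated capacity, leaving the bulk for inference and idle headroom.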
Scaling Trends
The most recent frontier models (2024) train at a few times $10^{25}$ FLOPs, roughly a 100x increase over GPT-3's $3.14 \times 10^{23}$ FLOPs in 2020 (Table 2). This exponential scaling continues to underscore the importance of compute access for AI capability development.
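A back-of-envelope calculation makes the implied growth rate explicit, using the GPT-4o and GPT-3 figures from Tables 1 and 2; the four-year window is an assumption.

```python
# Implied compute growth between GPT-3 (2020) and GPT-4o (2024),
# from the Table 1/2 training FLOP figures.
gpt3_flops = 3.14e23       # GPT-3 175B, 2020 (Table 2)
frontier_flops = 3.81e25   # GPT-4o, 2024 (Table 1)
years = 4

total_growth = frontier_flops / gpt3_flops      # ~121x overall
annual_growth = total_growth ** (1 / years)     # ~3.3x per year
print(f"{total_growth:.0f}x total, ~{annual_growth:.1f}x per year")
```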