Appendix II
Large Model Compute Rankings and GPU Capacity Utilization
This appendix presents a ranking of 140 notable AI models, combining data from Epoch AI's "Notable AI Models" database with the organizational compute capacity estimates from Appendix I. For each model we track the fields below (Tables 3 and 4 report only model, organization, and training compute; the sketch after this list shows how the capacity-share column is computed):
- Model name and developing organization(s)
- Training compute requirements (FLOPs)
- Lab/Cloud provider responsible for training
- Parent organization's 2024 estimated peak annual FLOP capacity
- Share of organization's publicly known models
- Share of peak annual FLOP budget
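The following minimal sketch shows how the "Model/Peak Annual (%)" column is derived. The capacity constants are not the Appendix I values themselves; they are illustrative stand-ins back-solved from the Table 1 shares, and the function name is hypothetical.

```python
# Illustrative stand-in capacities, back-solved from Table 1 shares
# (NOT the authoritative Appendix I estimates).
PEAK_ANNUAL_FLOPS = {
    "Google DeepMind":  3.88e28,  # implied by Gemini 1.0 Ultra at 0.129%
    "Anthropic/Amazon": 2.26e28,  # implied by Claude 3.5 Sonnet at 0.220%
    "Microsoft/OpenAI": 4.33e28,  # implied by GPT-4o at 0.088%
    "Meta AI":          5.67e28,  # implied by Llama 3.1-405B at 0.067%
}

def share_of_peak_annual(train_flops: float, lab_cloud: str) -> float:
    """Training compute as a percentage of a lab/cloud's peak annual FLOPs."""
    return 100.0 * train_flops / PEAK_ANNUAL_FLOPS[lab_cloud]

# Reproduces the GPT-4 row of Table 1 to rounding (~0.048%).
print(f"{share_of_peak_annual(2.10e25, 'Microsoft/OpenAI'):.3f}%")
```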
Frontier Models ($10^{24}$+ FLOPs)
Table 1: AI Model Training Compute Requirements - Frontier Scale
| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Gemini 1.0 Ultra | Google DeepMind | Google DeepMind | $5.00 \times 10^{25}$ | 0.129 |
| Claude 3.5 Sonnet | Anthropic | Anthropic/Amazon | $4.98 \times 10^{25}$ | 0.220 |
| GPT-4o | OpenAI | Microsoft/OpenAI | $3.81 \times 10^{25}$ | 0.088 |
| Llama 3.1-405B | Meta AI | Meta AI | $3.80 \times 10^{25}$ | 0.067 |
| GPT-4 | OpenAI | Microsoft/OpenAI | $2.10 \times 10^{25}$ | 0.048 |
| Gemini 1.0 Pro | Google DeepMind | Google DeepMind | $1.83 \times 10^{25}$ | 0.047 |
| Claude 3 Opus | Anthropic | Anthropic/Amazon | $1.64 \times 10^{25}$ | 0.072 |
| Gemini 1.5 Pro | Google DeepMind | Google DeepMind | $1.58 \times 10^{25}$ | 0.041 |
| Llama 3-70B | Meta AI | Meta AI | $7.86 \times 10^{24}$ | 0.014 |
| GPT-4o mini | OpenAI | Microsoft/OpenAI | $7.36 \times 10^{24}$ | 0.017 |
| PaLM 2 | Google DeepMind | Google DeepMind | $7.34 \times 10^{24}$ | 0.019 |
| Llama 3.3 | Meta AI | Meta AI | $6.86 \times 10^{24}$ | 0.012 |
| Amazon Nova Pro | Amazon | Anthropic/Amazon | $6.00 \times 10^{24}$ | 0.026 |
| Amazon Titan | Amazon | Anthropic/Amazon | $4.80 \times 10^{24}$ | 0.021 |
| Claude 2 | Anthropic | Anthropic/Amazon | $3.87 \times 10^{24}$ | 0.017 |
| Minerva (540B) | Google DeepMind | Google DeepMind | $2.74 \times 10^{24}$ | 0.007 |
| GPT-3.5 | OpenAI | Microsoft/OpenAI | $2.58 \times 10^{24}$ | 0.006 |
| PaLM (540B) | Google Research | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| U-PaLM (540B) | Google DeepMind | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| Flan-PaLM 540B | Google DeepMind | Google DeepMind | $2.50 \times 10^{24}$ | 0.006 |
| FLAN 137B | Google Research | Google DeepMind | $2.05 \times 10^{24}$ | 0.005 |
| Meta Movie Gen Video | Meta AI | Meta AI | $1.65 \times 10^{24}$ | 0.003 |
| Megatron-Turing NLG 530B | Microsoft, NVIDIA | Microsoft/OpenAI | $1.17 \times 10^{24}$ | 0.003 |
Large Models ($10^{22}$ - $10^{24}$ FLOPs)
Table 2: AI Model Training Compute Requirements - Large Scale
| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Llama 2-70B | Meta AI | Meta AI | $8.10 \times 10^{23}$ | 0.001 |
| Gopher (280B) | DeepMind | Google DeepMind | $6.31 \times 10^{23}$ | 0.002 |
| Chinchilla | DeepMind | Google DeepMind | $5.76 \times 10^{23}$ | 0.001 |
| LLaMA-65B | Meta AI | Meta AI | $5.50 \times 10^{23}$ | 0.001 |
| OPT-175B | Meta AI | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| BlenderBot 3 | Meta AI, McGill, Mila | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| Parti | Google Research | Google DeepMind | $3.96 \times 10^{23}$ | 0.001 |
| FunSearch | Google DeepMind | Google DeepMind | $3.87 \times 10^{23}$ | 0.001 |
| GLaM | Google DeepMind | Google DeepMind | $3.64 \times 10^{23}$ | 0.001 |
| LaMDA | Google DeepMind | Google DeepMind | $3.55 \times 10^{23}$ | 0.001 |
| AlphaGo Zero | DeepMind | Google DeepMind | $3.41 \times 10^{23}$ | 0.001 |
| Galactica | Meta AI | Meta AI | $3.24 \times 10^{23}$ | 0.001 |
| InstructGPT 175B | OpenAI | Microsoft/OpenAI | $3.19 \times 10^{23}$ | 0.001 |
| GPT-3 175B | OpenAI | Microsoft/OpenAI | $3.14 \times 10^{23}$ | 0.001 |
| ST-MoE | Google DeepMind | Google DeepMind | $2.90 \times 10^{23}$ | 0.001 |
| Flamingo | DeepMind | Google DeepMind | $2.19 \times 10^{23}$ | 0.001 |
| AlexaTM 20B | Amazon | Anthropic/Amazon | $2.04 \times 10^{23}$ | 0.001 |
| AlphaGo Master | DeepMind | Google DeepMind | $2.00 \times 10^{23}$ | 0.001 |
| ViT-22B | Google DeepMind | Google DeepMind | $1.93 \times 10^{23}$ | 0.001 |
| PaLI | Google DeepMind | Google DeepMind | $1.69 \times 10^{23}$ | <0.001 |
| AlphaCode | DeepMind | Google DeepMind | $1.64 \times 10^{23}$ | <0.001 |
| Llama Guard | Meta AI | Meta AI | $1.60 \times 10^{23}$ | <0.001 |
| UL2 | Google Research | Google DeepMind | $1.20 \times 10^{23}$ | <0.001 |
| Meena | Google Brain | Google DeepMind | $1.12 \times 10^{23}$ | <0.001 |
| OpenVLA | Stanford, UC Berkeley, DeepMind | Google DeepMind | $1.10 \times 10^{23}$ | <0.001 |
| Llama 2-7B | Meta AI | Meta AI | $8.40 \times 10^{22}$ | <0.001 |
| Switch | Google DeepMind | Google DeepMind | $8.22 \times 10^{22}$ | <0.001 |
| mT5-XXL | Google Research | Google DeepMind | $8.20 \times 10^{22}$ | <0.001 |
| ByT5-XXL | Google Research | Google DeepMind | $8.10 \times 10^{22}$ | <0.001 |
| LLaVA 1.5 | UW Madison, Microsoft | Microsoft/OpenAI | $7.81 \times 10^{22}$ | <0.001 |
| LLaVA | UW Madison, Microsoft, Columbia | Microsoft/OpenAI | $7.80 \times 10^{22}$ | <0.001 |
| ProtT5-XXL | TU Munich, NVIDIA, Google | Google DeepMind | $7.37 \times 10^{22}$ | <0.001 |
| ESM2-15B | Meta AI, NYU, Stanford, MIT | Meta AI | $7.35 \times 10^{22}$ | <0.001 |
| Codex | OpenAI | Microsoft/OpenAI | $7.34 \times 10^{22}$ | <0.001 |
| CoCa | Google Research | Google DeepMind | $7.30 \times 10^{22}$ | <0.001 |
| OpenAI Five | OpenAI | Microsoft/OpenAI | $6.70 \times 10^{22}$ | <0.001 |
| AlphaStar | DeepMind | Google DeepMind | $5.93 \times 10^{22}$ | <0.001 |
| ViT-G/14 | Google Brain | Google DeepMind | $5.85 \times 10^{22}$ | <0.001 |
| XGLM-7.5B | Meta AI | Meta AI | $2.25 \times 10^{22}$ | <0.001 |
| GraphCast | Google DeepMind | Google DeepMind | $2.10 \times 10^{22}$ | <0.001 |
| NLLB | Meta AI | Meta AI | $1.75 \times 10^{22}$ | <0.001 |
| RETRO-7B | DeepMind | Google DeepMind | $1.68 \times 10^{22}$ | <0.001 |
| Turing-NLG | Microsoft | Microsoft/OpenAI | $1.57 \times 10^{22}$ | <0.001 |
Medium Models ($10^{19}$ - $10^{22}$ FLOPs)
Table 3: AI Model Training Compute Requirements - Medium Scale
| Model | Organization | Train FLOPs |
|---|---|---|
| Imagen | Google Brain | $1.46 \times 10^{22}$ |
| OpenAI Five Rerun | OpenAI | $1.30 \times 10^{22}$ |
| CLIP (ViT L/14) | OpenAI | $1.05 \times 10^{22}$ |
| AudioGen | Meta AI, Hebrew University | $9.50 \times 10^{21}$ |
| T5-3B | Google | $9.00 \times 10^{21}$ |
| iGPT-L | OpenAI | $8.91 \times 10^{21}$ |
| ContextNet + Noisy Student | Google | $8.16 \times 10^{21}$ |
| Segment Anything | Meta AI | $7.80 \times 10^{21}$ |
| Conformer + Wav2vec 2.0 | Google | $7.60 \times 10^{21}$ |
| GNMT | Google | $6.62 \times 10^{21}$ |
| ADM | OpenAI | $6.20 \times 10^{21}$ |
| XLNet | CMU, Google Brain | $6.19 \times 10^{21}$ |
| NÜWA | Microsoft Research, Peking U | $4.84 \times 10^{21}$ |
| AlphaFold-Multimer | Google DeepMind | $4.35 \times 10^{21}$ |
| ViT-Huge/14 | Google Brain | $4.26 \times 10^{21}$ |
| Whisper | OpenAI | $4.21 \times 10^{21}$ |
| Gato | DeepMind | $4.02 \times 10^{21}$ |
| ViT-G (model soup) | UW, Columbia, Google, Meta | $3.40 \times 10^{21}$ |
| ELECTRA | Stanford, Google | $3.10 \times 10^{21}$ |
| AlphaFold 2 | DeepMind | $2.99 \times 10^{21}$ |
| ALBERT-xxlarge | Toyota Tech Institute, Google | $2.39 \times 10^{21}$ |
| NASv3 (CIFAR-10) | Google Brain | $2.20 \times 10^{21}$ |
| GPT-2 (1.5B) | OpenAI | $1.92 \times 10^{21}$ |
| EMDR | Mila, McGill, DeepMind | $1.91 \times 10^{21}$ |
| AlphaGo Lee | DeepMind | $1.90 \times 10^{21}$ |
| BigGAN-deep | DeepMind | $1.80 \times 10^{21}$ |
| MnasNet-A3 | Google | $1.50 \times 10^{21}$ |
| Swin Transformer V2 | Microsoft Research Asia | $1.10 \times 10^{21}$ |
| JFT | Google Research, CMU | $8.43 \times 10^{20}$ |
| OpenAI TI7 DOTA 1v1 | OpenAI | $6.05 \times 10^{20}$ |
| BERT-Large-CAS | Amazon | $5.21 \times 10^{20}$ |
| Big Transformer Back-Trans | Meta AI, Google Brain | $4.78 \times 10^{20}$ |
| Xception | Google | $4.36 \times 10^{20}$ |
| AmoebaNet-A | Google Brain | $3.85 \times 10^{20}$ |
| AlphaGo Fan | DeepMind | $3.80 \times 10^{20}$ |
| SNM-skip | | $2.98 \times 10^{20}$ |
| BERT-Large | Google | $2.85 \times 10^{20}$ |
| IMPALA | DeepMind | $1.68 \times 10^{20}$ |
| Mesh-TensorFlow 4.9B | Google Brain | $1.62 \times 10^{20}$ |
| Contriever | Meta AI, UCL | $1.57 \times 10^{20}$ |
| AlphaFold | DeepMind | $1.00 \times 10^{20}$ |
| EfficientNetV2-XL | Google | $9.56 \times 10^{19}$ |
| MoE-Multi | Jagiellonian, Google Brain | $9.39 \times 10^{19}$ |
| DeiT-B | Meta AI, Sorbonne | $7.88 \times 10^{19}$ |
| BEIT-3 | Microsoft | $7.00 \times 10^{19}$ |
| PNASNet-5 | Johns Hopkins, Google AI | $6.63 \times 10^{19}$ |
| Sparse all-MLP | Meta AI | $6.08 \times 10^{19}$ |
| ConvS2S | Meta AI | $5.64 \times 10^{19}$ |
| Seq2Seq LSTM | Google | $5.60 \times 10^{19}$ |
| MuZero | DeepMind | $4.80 \times 10^{19}$ |
| QT-Opt | Google Brain, UC Berkeley | $3.49 \times 10^{19}$ |
| ResNet-200 | Microsoft Research Asia | $2.97 \times 10^{19}$ |
| MultiBand Diffusion | Meta AI, Hebrew U | $2.60 \times 10^{19}$ |
| Detic | Meta AI, UT Austin | $2.34 \times 10^{19}$ |
| GPT-1 | OpenAI | $1.76 \times 10^{19}$ |
Small/Historical Models ($10^{12}$ - $10^{19}$ FLOPs)
Table 4: AI Model Training Compute Requirements - Small/Historical Scale
| Model | Organization | Train FLOPs |
|---|---|---|
| TransE | UTC-CNRS, Google | $1.34 \times 10^{18}$ |
| KN-LM | | $7.73 \times 10^{17}$ |
| WeNet | Amazon | $7.30 \times 10^{17}$ |
| Unsupervised High-level Feature | Google | $6.00 \times 10^{17}$ |
| CT-MoS | Google, Nat'l Tsing Hua | $5.62 \times 10^{17}$ |
| DistBelief Speech | Google | $3.11 \times 10^{17}$ |
| Mogrifier LSTM | DeepMind | $1.40 \times 10^{17}$ |
| ReLU-Speech | Google, Toronto, NYU | $1.28 \times 10^{17}$ |
| Large regularized LSTM | NYU, Google Brain | $9.10 \times 10^{16}$ |
| R-FCN | Tsinghua, Microsoft | $6.15 \times 10^{16}$ |
| ADAM (CIFAR-10) | Amsterdam, OpenAI, Toronto | $6.05 \times 10^{16}$ |
| Word2Vec (large) | Google | $3.89 \times 10^{16}$ |
| ENAS | Google Brain, CMU, Stanford | $2.01 \times 10^{16}$ |
| DARTS | DeepMind, CMU | $1.10 \times 10^{16}$ |
| NAS base 8 | Google Brain | $1.05 \times 10^{16}$ |
| ISS | Duke, Microsoft | $3.40 \times 10^{15}$ |
| Search-Proven Best LSTM | | $3.34 \times 10^{15}$ |
| DQN | DeepMind | $2.30 \times 10^{15}$ |
| RankNet | Microsoft Research | $3.48 \times 10^{12}$ |
Key Findings
Compute Concentration
The data show that even the largest AI models consume well under 1% of their parent organization's estimated peak annual FLOP capacity; the highest share in Table 1 is 0.220%. For example:
- Gemini 1.0 Ultra ($5 \times 10^{25}$ FLOPs) represents only 0.129% of Google DeepMind's annual capacity
- Claude 3.5 Sonnet ($4.98 \times 10^{25}$ FLOPs) represents 0.220% of Anthropic/Amazon's capacity
- GPT-4o ($3.81 \times 10^{25}$ FLOPs) represents 0.088% of Microsoft/OpenAI's capacity
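As a consistency check, each share can be inverted to recover the implied capacity estimate from Appendix I. For Gemini 1.0 Ultra, $5.00 \times 10^{25} / 0.00129 \approx 3.9 \times 10^{28}$ FLOPs of implied peak annual capacity for Google DeepMind.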
Implications for ABC
These findings support the thesis argument that current AI training dramatically underutilizes available computing resources. If an organization's flagship training run absorbs less than 1% of its estimated annual compute capacity, this suggests (see the sketch after this list):
- Significant compute overhead for experimentation and hyperparameter tuning (potentially 100x the final training run)
- Large amounts of compute dedicated to inference rather than training
- Substantial untapped capacity that could be unlocked through better coordination mechanisms like ABC
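The scale of the gap can be illustrated with a rough utilization sketch. The run count and the 10x overhead multiplier below are loudly illustrative assumptions, not measured values, and the function name is hypothetical.

```python
# Rough training-utilization estimate under illustrative assumptions:
# a few flagship-scale runs per year, each preceded by experimentation
# and tuning costing some multiple of the final run.

def annual_training_utilization(flagship_flops: float,
                                peak_annual_flops: float,
                                runs_per_year: int = 4,
                                overhead_multiplier: float = 10.0) -> float:
    """Percentage of peak annual capacity absorbed by training activity."""
    total_flops = runs_per_year * flagship_flops * (1.0 + overhead_multiplier)
    return 100.0 * total_flops / peak_annual_flops

# Even 4 GPT-4o-scale runs with 10x experimentation overhead absorb only
# a few percent of the ~4.3e28 FLOP capacity implied by Table 1.
print(f"{annual_training_utilization(3.81e25, 4.33e28):.1f}%")  # ~3.9%
```

Even under these generous assumptions, training accounts for single-digit percentages of estimated capacity, leaving the bulk for inference and idle headroom.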
Scaling Trends
The most recent frontier models (2024) train at a few times $10^{25}$ FLOPs, roughly a 100x increase over GPT-3's $3.14 \times 10^{23}$ FLOPs in 2020 (Table 2). This exponential scaling continues to underscore the importance of compute access for AI capability development.
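A back-of-envelope calculation makes the implied growth rate explicit, using the GPT-4o and GPT-3 figures from Tables 1 and 2; the four-year window is an assumption.

```python
# Implied compute growth between GPT-3 (2020) and GPT-4o (2024),
# from the Table 1/2 training FLOP figures.
gpt3_flops = 3.14e23       # GPT-3 175B, 2020 (Table 2)
frontier_flops = 3.81e25   # GPT-4o, 2024 (Table 1)
years = 4

total_growth = frontier_flops / gpt3_flops      # ~121x overall
annual_growth = total_growth ** (1 / years)     # ~3.3x per year
print(f"{total_growth:.0f}x total, ~{annual_growth:.1f}x per year")
```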