[Summary figure: training-compute scale of notable models (frontier ~$10^{25}$ FLOPs, large ~$10^{23}$, medium ~$10^{21}$); the top frontier models (Gemini 1.0 Ultra, Claude 3.5 Sonnet, GPT-4o, Llama 3.1) cluster near $10^{25}$ FLOPs; flagship training runs use under 1% of estimated organizational capacity; and frontier training compute grew roughly 100x from GPT-3 (2020) to GPT-4o (2024), underscoring the capacity a coordination mechanism like ABC could unlock.]

Appendix II

Large Model Compute Rankings and GPU Capacity Utilization

This appendix presents a comprehensive ranking of 182 notable AI models, combining data from Epoch AI's "Notable AI Models" database with organizational compute capacity estimates from Appendix I. For each model, we track:

  • Model name and developing organization(s)
  • Training compute requirements (FLOPs)
  • Lab/Cloud provider responsible for training
  • Parent organization's 2024 estimated peak annual FLOP capacity
  • Share of organization's publicly known models
  • Share of peak annual FLOP budget (the calculation is sketched in the example after this list)
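
The share columns are simple ratios: a model's training compute divided by its parent organization's estimated peak annual FLOP capacity. A minimal sketch follows; the capacity figures in it are placeholders back-calculated from the Table 1 ratios (the actual estimates are in Appendix I), so only the arithmetic should be read as authoritative.

```python
# Minimal sketch of how the "Model/Peak Annual (%)" column is derived.
# The capacity figures below are placeholders roughly consistent with the
# Table 1 ratios, NOT the Appendix I estimates themselves.

ORG_PEAK_ANNUAL_FLOPS = {
    "Google DeepMind": 3.9e28,    # placeholder, implied by Gemini 1.0 Ultra's row
    "Microsoft/OpenAI": 4.3e28,   # placeholder, implied by GPT-4o's row
}

def share_of_peak_annual(train_flops: float, org: str) -> float:
    """Training compute as a percentage of the org's peak annual FLOP capacity."""
    return 100.0 * train_flops / ORG_PEAK_ANNUAL_FLOPS[org]

print(f"{share_of_peak_annual(5.00e25, 'Google DeepMind'):.3f}%")   # ~0.128%
print(f"{share_of_peak_annual(3.81e25, 'Microsoft/OpenAI'):.3f}%")  # ~0.089%
```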

Frontier Models ($10^{24}$+ FLOPs)

Table 1: AI Model Training Compute Requirements - Frontier Scale

| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Gemini 1.0 Ultra | Google DeepMind | Google DeepMind | $5.00 \times 10^{25}$ | 0.129 |
| Claude 3.5 Sonnet | Anthropic | Anthropic/Amazon | $4.98 \times 10^{25}$ | 0.220 |
| GPT-4o | OpenAI | Microsoft/OpenAI | $3.81 \times 10^{25}$ | 0.088 |
| Llama 3.1-405B | Meta AI | Meta AI | $3.80 \times 10^{25}$ | 0.067 |
| GPT-4 | OpenAI | Microsoft/OpenAI | $2.10 \times 10^{25}$ | 0.048 |
| Gemini 1.0 Pro | Google DeepMind | Google DeepMind | $1.83 \times 10^{25}$ | 0.047 |
| Claude 3 Opus | Anthropic | Anthropic/Amazon | $1.64 \times 10^{25}$ | 0.072 |
| Gemini 1.5 Pro | Google DeepMind | Google DeepMind | $1.58 \times 10^{25}$ | 0.041 |
| Llama 3-70B | Meta AI | Meta AI | $7.86 \times 10^{24}$ | 0.014 |
| GPT-4o mini | OpenAI | Microsoft/OpenAI | $7.36 \times 10^{24}$ | 0.017 |
| PaLM 2 | Google | Google DeepMind | $7.34 \times 10^{24}$ | 0.019 |
| Llama 3.3 | Meta AI | Meta AI | $6.86 \times 10^{24}$ | 0.012 |
| Amazon Nova Pro | Amazon | Anthropic/Amazon | $6.00 \times 10^{24}$ | 0.026 |
| Amazon Titan | Amazon | Anthropic/Amazon | $4.80 \times 10^{24}$ | 0.021 |
| Claude 2 | Anthropic | Anthropic/Amazon | $3.87 \times 10^{24}$ | 0.017 |
| Minerva (540B) | Google | Google DeepMind | $2.74 \times 10^{24}$ | 0.007 |
| GPT-3.5 | OpenAI | Microsoft/OpenAI | $2.58 \times 10^{24}$ | 0.006 |
| PaLM (540B) | Google Research | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| U-PaLM (540B) | Google | Google DeepMind | $2.53 \times 10^{24}$ | 0.007 |
| Flan-PaLM 540B | Google | Google DeepMind | $2.50 \times 10^{24}$ | 0.006 |
| FLAN 137B | Google Research | Google DeepMind | $2.05 \times 10^{24}$ | 0.005 |
| Meta Movie Gen Video | Meta AI | Meta AI | $1.65 \times 10^{24}$ | 0.003 |
| Megatron-Turing NLG 530B | Microsoft, NVIDIA | Microsoft/OpenAI | $1.17 \times 10^{24}$ | 0.003 |

Large Models ($10^{22}$ - $10^{24}$ FLOPs)

Table 2: AI Model Training Compute Requirements - Large Scale

| Model | Organization | Lab/Cloud | Train FLOPs | Model/Peak Annual (%) |
|---|---|---|---|---|
| Llama 2-70B | Meta AI | Meta AI | $8.10 \times 10^{23}$ | 0.001 |
| Gopher (280B) | DeepMind | Google DeepMind | $6.31 \times 10^{23}$ | 0.002 |
| Chinchilla | DeepMind | Google DeepMind | $5.76 \times 10^{23}$ | 0.001 |
| LLaMA-65B | Meta AI | Meta AI | $5.50 \times 10^{23}$ | 0.001 |
| OPT-175B | Meta AI | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| BlenderBot 3 | Meta AI, McGill, Mila | Meta AI | $4.30 \times 10^{23}$ | 0.001 |
| Parti | Google Research | Google DeepMind | $3.96 \times 10^{23}$ | 0.001 |
| FunSearch | Google DeepMind | Google DeepMind | $3.87 \times 10^{23}$ | 0.001 |
| GLaM | Google | Google DeepMind | $3.64 \times 10^{23}$ | 0.001 |
| LaMDA | Google | Google DeepMind | $3.55 \times 10^{23}$ | 0.001 |
| AlphaGo Zero | DeepMind | Google DeepMind | $3.41 \times 10^{23}$ | 0.001 |
| Galactica | Meta AI | Meta AI | $3.24 \times 10^{23}$ | 0.001 |
| InstructGPT 175B | OpenAI | Microsoft/OpenAI | $3.19 \times 10^{23}$ | 0.001 |
| GPT-3 175B | OpenAI | Microsoft/OpenAI | $3.14 \times 10^{23}$ | 0.001 |
| ST-MoE | Google | Google DeepMind | $2.90 \times 10^{23}$ | 0.001 |
| Flamingo | DeepMind | Google DeepMind | $2.19 \times 10^{23}$ | 0.001 |
| AlexaTM 20B | Amazon | Anthropic/Amazon | $2.04 \times 10^{23}$ | 0.001 |
| AlphaGo Master | DeepMind | Google DeepMind | $2.00 \times 10^{23}$ | 0.001 |
| ViT-22B | Google | Google DeepMind | $1.93 \times 10^{23}$ | 0.001 |
| PaLI | Google | Google DeepMind | $1.69 \times 10^{23}$ | <0.001 |
| AlphaCode | DeepMind | Google DeepMind | $1.64 \times 10^{23}$ | <0.001 |
| Llama Guard | Meta AI | Meta AI | $1.60 \times 10^{23}$ | <0.001 |
| UL2 | Google Research | Google DeepMind | $1.20 \times 10^{23}$ | <0.001 |
| Meena | Google Brain | Google DeepMind | $1.12 \times 10^{23}$ | <0.001 |
| OpenVLA | Stanford, UC Berkeley, DeepMind | Google DeepMind | $1.10 \times 10^{23}$ | <0.001 |
| Llama 2-7B | Meta AI | Meta AI | $8.40 \times 10^{22}$ | <0.001 |
| Switch | Google | Google DeepMind | $8.22 \times 10^{22}$ | <0.001 |
| mT5-XXL | Google Research | Google DeepMind | $8.20 \times 10^{22}$ | <0.001 |
| ByT5-XXL | Google Research | Google DeepMind | $8.10 \times 10^{22}$ | <0.001 |
| LLaVA 1.5 | UW Madison, Microsoft | Microsoft/OpenAI | $7.81 \times 10^{22}$ | <0.001 |
| LLaVA | UW Madison, Microsoft, Columbia | Microsoft/OpenAI | $7.80 \times 10^{22}$ | <0.001 |
| ProtT5-XXL | TU Munich, NVIDIA, Google | Google DeepMind | $7.37 \times 10^{22}$ | <0.001 |
| ESM2-15B | Meta AI, NYU, Stanford, MIT | Meta AI | $7.35 \times 10^{22}$ | <0.001 |
| Codex | OpenAI | Microsoft/OpenAI | $7.34 \times 10^{22}$ | <0.001 |
| CoCa | Google Research | Google DeepMind | $7.30 \times 10^{22}$ | <0.001 |
| OpenAI Five | OpenAI | Microsoft/OpenAI | $6.70 \times 10^{22}$ | <0.001 |
| AlphaStar | DeepMind | Google DeepMind | $5.93 \times 10^{22}$ | <0.001 |
| ViT-G/14 | Google Brain | Google DeepMind | $5.85 \times 10^{22}$ | <0.001 |
| XGLM-7.5B | Meta AI | Meta AI | $2.25 \times 10^{22}$ | <0.001 |
| GraphCast | Google DeepMind | Google DeepMind | $2.10 \times 10^{22}$ | <0.001 |
| NLLB | Meta AI | Meta AI | $1.75 \times 10^{22}$ | <0.001 |
| RETRO-7B | DeepMind | Google DeepMind | $1.68 \times 10^{22}$ | <0.001 |
| Turing-NLG | Microsoft | Microsoft/OpenAI | $1.57 \times 10^{22}$ | <0.001 |

Medium Models ($10^{19}$ - $10^{22}$ FLOPs)

Table 3: AI Model Training Compute Requirements - Medium Scale

| Model | Organization | Train FLOPs |
|---|---|---|
| Imagen | Google Brain | $1.46 \times 10^{22}$ |
| OpenAI Five Rerun | OpenAI | $1.30 \times 10^{22}$ |
| CLIP (ViT L/14) | OpenAI | $1.05 \times 10^{22}$ |
| AudioGen | Meta AI, Hebrew University | $9.50 \times 10^{21}$ |
| T5-3B | Google | $9.00 \times 10^{21}$ |
| iGPT-L | OpenAI | $8.91 \times 10^{21}$ |
| ContextNet + Noisy Student | Google | $8.16 \times 10^{21}$ |
| Segment Anything | Meta AI | $7.80 \times 10^{21}$ |
| Conformer + Wav2vec 2.0 | Google | $7.60 \times 10^{21}$ |
| GNMT | Google | $6.62 \times 10^{21}$ |
| ADM | OpenAI | $6.20 \times 10^{21}$ |
| XLNet | CMU, Google Brain | $6.19 \times 10^{21}$ |
| NÜWA | Microsoft Research, Peking U | $4.84 \times 10^{21}$ |
| AlphaFold-Multimer | Google DeepMind | $4.35 \times 10^{21}$ |
| ViT-Huge/14 | Google Brain | $4.26 \times 10^{21}$ |
| Whisper | OpenAI | $4.21 \times 10^{21}$ |
| Gato | DeepMind | $4.02 \times 10^{21}$ |
| ViT-G (model soup) | UW, Columbia, Google, Meta | $3.40 \times 10^{21}$ |
| ELECTRA | Stanford, Google | $3.10 \times 10^{21}$ |
| AlphaFold 2 | DeepMind | $2.99 \times 10^{21}$ |
| ALBERT-xxlarge | Toyota Tech Institute, Google | $2.39 \times 10^{21}$ |
| NASv3 (CIFAR-10) | Google Brain | $2.20 \times 10^{21}$ |
| GPT-2 (1.5B) | OpenAI | $1.92 \times 10^{21}$ |
| EMDR | Mila, McGill, DeepMind | $1.91 \times 10^{21}$ |
| AlphaGo Lee | DeepMind | $1.90 \times 10^{21}$ |
| BigGAN-deep | DeepMind | $1.80 \times 10^{21}$ |
| MnasNet-A3 | Google | $1.50 \times 10^{21}$ |
| Swin Transformer V2 | Microsoft Research Asia | $1.10 \times 10^{21}$ |
| JFT | Google Research, CMU | $8.43 \times 10^{20}$ |
| OpenAI TI7 DOTA 1v1 | OpenAI | $6.05 \times 10^{20}$ |
| BERT-Large-CAS | Amazon | $5.21 \times 10^{20}$ |
| Big Transformer Back-Trans | Meta AI, Google Brain | $4.78 \times 10^{20}$ |
| Xception | Google | $4.36 \times 10^{20}$ |
| AmoebaNet-A | Google Brain | $3.85 \times 10^{20}$ |
| AlphaGo Fan | DeepMind | $3.80 \times 10^{20}$ |
| SNM-skip | Google | $2.98 \times 10^{20}$ |
| BERT-Large | Google | $2.85 \times 10^{20}$ |
| IMPALA | DeepMind | $1.68 \times 10^{20}$ |
| Mesh-TensorFlow 4.9B | Google Brain | $1.62 \times 10^{20}$ |
| Contriever | Meta AI, UCL | $1.57 \times 10^{20}$ |
| AlphaFold | DeepMind | $1.00 \times 10^{20}$ |
| EfficientNetV2-XL | Google | $9.56 \times 10^{19}$ |
| MoE-Multi | Jagiellonian, Google Brain | $9.39 \times 10^{19}$ |
| DeiT-B | Meta AI, Sorbonne | $7.88 \times 10^{19}$ |
| BEIT-3 | Microsoft | $7.00 \times 10^{19}$ |
| PNASNet-5 | Johns Hopkins, Google AI | $6.63 \times 10^{19}$ |
| Sparse all-MLP | Meta AI | $6.08 \times 10^{19}$ |
| ConvS2S | Meta AI | $5.64 \times 10^{19}$ |
| Seq2Seq LSTM | Google | $5.60 \times 10^{19}$ |
| MuZero | DeepMind | $4.80 \times 10^{19}$ |
| QT-Opt | Google Brain, UC Berkeley | $3.49 \times 10^{19}$ |
| ResNet-200 | Microsoft Research Asia | $2.97 \times 10^{19}$ |
| MultiBand Diffusion | Meta AI, Hebrew U | $2.60 \times 10^{19}$ |
| Detic | Meta AI, UT Austin | $2.34 \times 10^{19}$ |
| GPT-1 | OpenAI | $1.76 \times 10^{19}$ |

Small/Historical Models ($10^{12}$ - $10^{19}$ FLOPs)

Table 4: AI Model Training Compute Requirements - Small/Historical Scale

| Model | Organization | Train FLOPs |
|---|---|---|
| TransE | UTC-CNRS, Google | $1.34 \times 10^{18}$ |
| KN-LM | Google | $7.73 \times 10^{17}$ |
| WeNet | Amazon | $7.30 \times 10^{17}$ |
| Unsupervised High-level Feature | Google | $6.00 \times 10^{17}$ |
| CT-MoS | Google, Nat'l Tsing Hua | $5.62 \times 10^{17}$ |
| DistBelief Speech | Google | $3.11 \times 10^{17}$ |
| Mogrifier RLSTM | DeepMind | $1.40 \times 10^{17}$ |
| ReLU-Speech | Google, Toronto, NYU | $1.28 \times 10^{17}$ |
| Large regularized LSTM | NYU, Google Brain | $9.10 \times 10^{16}$ |
| R-FCN | Tsinghua, Microsoft | $6.15 \times 10^{16}$ |
| ADAM (CIFAR-10) | Amsterdam, OpenAI, Toronto | $6.05 \times 10^{16}$ |
| Word2Vec (large) | Google | $3.89 \times 10^{16}$ |
| ENAS | Google Brain, CMU, Stanford | $2.01 \times 10^{16}$ |
| DARTS | DeepMind, CMU | $1.10 \times 10^{16}$ |
| NAS base 8 | Google Brain | $1.05 \times 10^{16}$ |
| ISS | Duke, Microsoft | $3.40 \times 10^{15}$ |
| Search-Proven Best LSTM | Google | $3.34 \times 10^{15}$ |
| DQN | DeepMind | $2.30 \times 10^{15}$ |
| RankNet | Microsoft Research | $3.48 \times 10^{12}$ |

Key Findings

Compute Concentration

The data reveals that even the largest AI models consume less than 1% of their parent organization's estimated peak annual FLOP capacity. For example (a short calculation after this list inverts these ratios to show the implied organizational capacities):

  • Gemini 1.0 Ultra ($5 \times 10^{25}$ FLOPs) represents only 0.129% of Google DeepMind's annual capacity
  • Claude 3.5 Sonnet ($4.98 \times 10^{25}$ FLOPs) represents 0.220% of Anthropic/Amazon's capacity
  • GPT-4o ($3.81 \times 10^{25}$ FLOPs) represents 0.088% of Microsoft/OpenAI's capacity
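
The same ratios can be inverted to give a rough sense of the implied organizational capacities. The sketch below uses only the Table 1 figures; the outputs are order-of-magnitude illustrations, not restatements of the Appendix I estimates.

```python
# Inverting the share column: implied capacity ≈ train_flops / (share_pct / 100).
# Inputs are taken from Table 1; outputs are order-of-magnitude figures only.

examples = [
    ("Gemini 1.0 Ultra",  5.00e25, 0.129),
    ("Claude 3.5 Sonnet", 4.98e25, 0.220),
    ("GPT-4o",            3.81e25, 0.088),
]

for name, train_flops, share_pct in examples:
    implied_capacity = train_flops / (share_pct / 100.0)
    print(f"{name}: ~{implied_capacity:.1e} FLOPs/year implied capacity")
# Gemini 1.0 Ultra: ~3.9e+28, Claude 3.5 Sonnet: ~2.3e+28, GPT-4o: ~4.3e+28
```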

Implications for ABC

These findings support the thesis's central argument that current AI training dramatically underutilizes available computing resources. If organizations devote less than 1% of their annual compute to their flagship training runs, this suggests:

  • Significant compute overhead for experimentation and hyperparameter tuning (potentially 100x the final training run; see the back-of-envelope sketch after this list)
  • Large amounts of compute dedicated to inference rather than training
  • Substantial untapped capacity that could be unlocked through better coordination mechanisms such as ABC
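
Even under the most generous reading of the first bullet, flagship development leaves most capacity on the table. The sketch below is a rough illustration only: the 0.2% figure is the largest share observed in Table 1, and the 100x experimentation multiplier is the speculative upper bound named above, not a measured value.

```python
# Back-of-envelope: how much annual capacity flagship development could absorb
# under the assumptions stated above (0.2% flagship run, 100x experiment overhead).

flagship_share = 0.002        # largest flagship share in Table 1 (~0.2% of capacity)
experiment_multiplier = 100   # assumed experimentation/tuning overhead (upper bound)

development_share = flagship_share * experiment_multiplier  # 0.20 -> 20%
remaining_share = 1.0 - development_share                   # 0.80 -> 80%

print(f"Flagship development (incl. experiments): ~{development_share:.0%}")
print(f"Capacity left for inference and other uses: ~{remaining_share:.0%}")
```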

Scaling Trends

The most recent frontier models (2024) train at on the order of $10^{25}$ FLOPs: GPT-4o's $3.81 \times 10^{25}$ FLOPs is about 120x GPT-3's $3.14 \times 10^{23}$ FLOPs from 2020, roughly a hundredfold increase in four years. The calculation below annualizes this trend. This exponential scaling continues to underscore the importance of access to compute resources for AI capability development.
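
Annualized, a hundredfold increase over four years corresponds to roughly 3x growth per year, i.e. a doubling time of about seven months. A minimal calculation using the Table 1 and Table 2 figures for GPT-4o and GPT-3:

```python
import math

# Annualizing the frontier scaling trend: GPT-3 (2020, 3.14e23 FLOPs) to
# GPT-4o (2024, 3.81e25 FLOPs), figures taken from Tables 1 and 2.

growth_factor = 3.81e25 / 3.14e23                             # ~121x over 4 years
annual_factor = growth_factor ** (1 / 4)                      # ~3.3x per year
doubling_months = 12 * math.log(2) / math.log(annual_factor)  # ~7 months

print(f"Total growth:  ~{growth_factor:.0f}x")
print(f"Annualized:    ~{annual_factor:.1f}x per year")
print(f"Doubling time: ~{doubling_months:.0f} months")
```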