2026
Mar 2026
TSMC adds $100 billion to Arizona chip manufacturing TSMC
TSMC announced a further $100 billion investment in its Arizona fab complex, on top of $65 billion already committed. The expansion adds fabrication plants, advanced packaging, and a research centre. The first 4nm fab began production in early 2025.
Infrastructure
TSMC Press Release ↗
Jan 2026
OpenAI and Cerebras sign a $10 billion inference infrastructure deal Cerebras
OpenAI partnered with Cerebras to deploy 750 megawatts of wafer-scale inference hardware, designed for real-time GPT-5 inference. Cerebras separately revived its IPO plans for mid-2026.
Milestone
Cerebras ↗
2025
Dec 2025
NVIDIA acquires Groq for $20 billion NVIDIA
NVIDIA bought Groq's inference chip technology and engineering team in its largest acquisition. Groq's engineers joined a new Real-Time Inference division, reflecting the industry shift from training hardware toward inference-optimised systems.
Milestone
NVIDIA ↗
Mid-2025
NVIDIA Blackwell B200 GPUs reach mass production NVIDIA
NVIDIA's Blackwell architecture hit full-scale production after initial GB200 NVL72 systems shipped to cloud providers in late 2024. The B200 offered roughly 2.5x speed and 25x energy efficiency gains over Hopper for inference work.
Milestone
NVIDIA ↗
2025
AI data centre power demands strain US electrical grids Industry
AI compute pushed electrical grids toward capacity limits. The largest US grid operator projected a six-gigawatt reliability shortfall by 2027. Chip designers responded by making energy efficiency a first-class design goal alongside raw performance.
Bottleneck
Utility Dive ↗
2025
Memory shortages delay 40–60% of AI deployments Industry
High-bandwidth memory hit severe shortages, creating bottlenecks even as GPU supply improved. Enterprise customers reported significant deployment delays. The pattern showed that AI hardware supply chains involve far more than GPUs alone — memory, packaging, and cooling all became chokepoints in sequence.
Bottleneck
AI News ↗
Jun 2025
AMD ships MI355X, its most competitive data centre GPU AMD
AMD released the Instinct MI355X, claiming four times the performance of its MI300X for AI training and inference. The chip gave cloud providers a credible alternative to NVIDIA and some leverage on pricing.
Milestone
AMD Newsroom ↗
2024
2024
Cerebras builds WSE-3 with 4 trillion transistors on a single wafer Cerebras
Cerebras announced its third-generation Wafer-Scale Engine on TSMC 5nm, packing roughly 4 trillion transistors onto a single wafer-sized die. The company raised $1.1 billion at an $8.1 billion valuation to scale production. It remains the largest chip ever built.
Milestone
Cerebras ↗
2022
Mar 2022
NVIDIA announces the H100 Hopper GPU NVIDIA
The H100 introduced a Transformer Engine built specifically for large language models, with up to 9x faster training over the A100. Demand massively outstripped supply throughout 2023, with individual GPUs trading above $40,000 on secondary markets. This was the chip behind the GPT-4 era.
Milestone
NVIDIA Newsroom ↗
2020
Nov 2020
Apple ships the M1, bringing neural engines to consumer laptops Apple
Apple’s first custom ARM silicon for Mac included a 16-core Neural Engine running 11 trillion operations per second on TSMC 5nm. It proved that dedicated ML hardware in a consumer device could outperform general-purpose processors in both speed and energy efficiency.
Milestone
Apple Newsroom ↗
May 2020
NVIDIA launches the A100, its first GPU built for AI from scratch NVIDIA
The A100 delivered 19.5 teraflops of FP32 performance with multi-instance GPU technology that let one chip run multiple AI jobs simultaneously. It became the standard training hardware for GPT-3, DALL-E, and the first wave of foundation models.
Milestone
NVIDIA Newsroom ↗
2018
2018
Google opens TPU access to cloud customers Google
After developing Tensor Processing Units internally since 2015, Google made TPU v3 pods available through Google Cloud. TPUs became the training hardware behind BERT and later PaLM, establishing the template for tech giants building their own AI silicon rather than relying entirely on NVIDIA.
Custom Silicon
Google Cloud ↗
2016
May 2016
Google reveals it has been running custom AI chips since 2015 Google
At Google I/O, Google disclosed that custom Tensor Processing Units had been running in its data centres since 2015, powering Search, Street View, and AlphaGo. The announcement showed that the largest AI workloads were already outgrowing general-purpose hardware.
Custom Silicon
DataCenter Knowledge ↗
2012
2012
AlexNet wins ImageNet using two gaming GPUs NVIDIA
Alex Krizhevsky’s deep neural network won the ImageNet competition by a wide margin, trained on two NVIDIA GTX 580 consumer graphics cards with 3 GB of memory each. The result proved that GPUs designed for gaming could train neural networks far faster than CPUs — the insight that would eventually redirect NVIDIA’s entire business.
Milestone
IEEE Spectrum ↗
Training and running AI models involves billions of simple math operations happening at once. General-purpose CPUs handle tasks one at a time; GPUs and custom AI chips run thousands in parallel, which cuts training time from months to days. As models got larger, the gap between general and specialised hardware widened. Read more about how AI tools work on our AI infrastructure page.
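You can see that serial-versus-parallel gap on an ordinary laptop. The sketch below (a toy illustration, not a benchmark; the matrix size is an arbitrary assumption) multiplies two matrices once with a plain Python loop, one multiply-add at a time, and once with NumPy, which hands the work to an optimised kernel that uses SIMD lanes and multiple cores at once:

```python
import time
import numpy as np

n = 200  # illustrative size; real model layers are far larger
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_serial(a, b):
    """One multiply-add at a time, the way a purely serial CPU loop works."""
    n = a.shape[0]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out

t0 = time.perf_counter()
slow = matmul_serial(a, b)
t_serial = time.perf_counter() - t0

# Vectorised: NumPy's @ dispatches to a BLAS kernel that runs
# many multiply-adds in parallel.
t0 = time.perf_counter()
fast = a @ b
t_vec = time.perf_counter() - t0

print(f"serial: {t_serial:.2f}s  vectorised: {t_vec:.4f}s")
```

The two results are numerically identical; only how the hardware is used differs. Scale the same gap up to the billions of operations in a large model and it is the difference between months and days of training.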
Training is the process of building a model by feeding it data and adjusting its parameters — this requires massive compute for weeks or months. Inference is running the finished model to answer questions or generate content, which needs speed and efficiency rather than raw power. The industry is shifting attention from training chips toward inference-optimised systems as deployed AI usage grows. See more timelines on AI breakthroughs.
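The training/inference split above can be sketched with a toy model (the model, data, and learning rate here are illustrative assumptions, not anyone's real system). Training is the expensive loop that repeatedly adjusts a parameter; inference is the single cheap pass that uses the frozen result:

```python
import numpy as np

# Toy task: learn the weight w in y = w * x from data generated with w = 3.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = 3.0 * x

# --- Training: many passes over the data, updating the parameter
# by gradient descent on mean squared error. This is the compute-heavy phase.
w = 0.0
lr = 0.1
for epoch in range(100):
    pred = w * x
    grad = np.mean(2 * (pred - y) * x)  # d(MSE)/dw
    w -= lr * grad

# --- Inference: a single forward pass with the frozen parameter.
# Fast and cheap; this is what runs every time a user sends a query.
def infer(x_new):
    return w * x_new

print(f"learned w ≈ {w:.3f}")   # close to 3.0
print(f"infer(2.0) ≈ {infer(2.0):.3f}")  # close to 6.0
```

The asymmetry is the point: training ran 100 passes over the whole dataset to produce `w`, while each inference call is one multiplication. At production scale that asymmetry is why deployed AI shifts hardware demand toward inference-optimised chips.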
Demand for AI hardware grew faster than manufacturing could expand. Advanced chips require cutting-edge fabrication (mostly done by TSMC in Taiwan), and the supply chain includes high-bandwidth memory, advanced packaging, power delivery, and cooling — each of which can bottleneck independently. Even when GPU supply improved, memory shortages created new delays. We track ongoing supply chain issues across all our timelines.