From mathematical daydreams to trillion-parameter models — the ideas, inventions, and nations shaping the long march toward Artificial General Intelligence.
British mathematician Alan Turing published "On Computable Numbers," introducing the concept of a universal machine that could simulate any computation. This theoretical framework laid the mathematical bedrock for every computer and every AI system that would follow.
Established that a single machine could, in principle, perform any computation — the philosophical ancestor of the idea that one machine could think generally.
Turing published "Computing Machinery and Intelligence," posing the question "Can machines think?" and proposing a practical test (now called the Turing Test) to evaluate machine intelligence. He predicted that by the year 2000 a machine would fool an average interrogator 30% of the time after five minutes of questioning.
Gave the world its first concrete benchmark for machine intelligence and framed the AGI challenge as a scientific question rather than a philosophical one.
John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon organised the Dartmouth Summer Research Project on Artificial Intelligence. The workshop coined the term "artificial intelligence" and established AI as a distinct academic discipline. McCarthy later founded the Stanford AI Laboratory.
Created the institutional and intellectual framework for AI research, bringing together the founding generation whose students would dominate the field for decades.
At Cornell, Frank Rosenblatt built the Mark I Perceptron, the first machine that could learn from examples. Funded by the US Navy, it was a hardware neural network that learned to classify simple visual patterns using adaptive weights.
Proved that machines could learn, not just follow rules — the seed from which all modern deep learning would eventually grow.
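The perceptron's learning rule is small enough to sketch directly. A hedged illustration in Python (toy data and hyperparameters of my choosing, not the Mark I's photocell hardware): the rule nudges the adaptive weights toward each misclassified example until a linearly separable task, here logical OR, is solved.

```python
import numpy as np

# Toy training set: the logical OR function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)  # adaptive weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(20):              # a few passes over the examples
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)
        # Nudge the weights toward each misclassified example.
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

preds = [int(w @ xi + b > 0) for xi in X]
print(preds)  # → [0, 1, 1, 1]
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this loop settles on a correct set of weights.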
John McCarthy at MIT created LISP (List Processing), a programming language built on lambda calculus with features like recursion, dynamic typing, and garbage collection. It became the standard language for AI research for the next three decades.
Gave AI researchers a tool powerful enough to express symbolic reasoning, enabling the first wave of expert systems and knowledge-based AI.
Joseph Weizenbaum at MIT created ELIZA, a simple pattern-matching program that simulated a psychotherapist. Despite using no real understanding, many users became emotionally attached to it, revealing humanity's readiness to attribute intelligence to machines.
Demonstrated the "ELIZA effect" — humans' tendency to read deep intelligence into simple systems — a phenomenon still shaping AI product design and public perception today.
Marvin Minsky and Seymour Papert published "Perceptrons," a mathematical analysis proving that single-layer perceptrons could not learn certain functions (like XOR). The book was widely interpreted as a death blow to neural network research.
Triggered the first "AI winter" for neural networks, pushing funding and attention toward symbolic AI for over a decade — but the limitations they identified also pointed toward the multi-layer solution.
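The book's central example is easy to verify numerically. In the sketch below (illustrative weight grid and hand-picked hidden units), a brute-force search finds no single linear threshold unit that computes XOR, while two layers, the fix the book's limitations pointed toward, express it easily.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.array([0, 1, 1, 0])

def step(z):
    return (z > 0).astype(int)

# Brute-force over a coarse grid of weights: no single linear
# threshold unit reproduces XOR on all four inputs.
grid = np.linspace(-2, 2, 17)
separable = any(
    np.array_equal(step(X @ [w1, w2] + b), xor)
    for w1 in grid for w2 in grid for b in grid
)
print(separable)  # → False (XOR is not linearly separable)

# Two layers suffice: hidden units compute OR and NAND, and an
# output unit ANDs them together (x XOR y = OR(x, y) AND NAND(x, y)).
h = np.column_stack([step(X @ [1, 1] - 0.5),     # OR
                     step(X @ [-1, -1] + 1.5)])  # NAND
out = step(h @ [1, 1] - 1.5)                     # AND
print(out.tolist())  # → [0, 1, 1, 0]
```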
British mathematician James Lighthill published a devastating government-commissioned report concluding that AI had failed to deliver on its promises. The UK government cut nearly all AI funding, and the report influenced funders worldwide.
Proved that a single country's assessment could reshape global research priorities — AI funding collapsed across Europe and slowed in the US for most of the 1970s.
Philosopher John Searle at UC Berkeley proposed the Chinese Room thought experiment, arguing that a computer manipulating symbols according to rules does not truly "understand" anything. The argument challenged whether symbol-processing AI could ever achieve genuine intelligence.
Became the most debated philosophical argument against strong AI, forcing researchers to clarify what "intelligence" and "understanding" actually mean — a debate that intensifies with every new frontier model.
Japan's Ministry of International Trade and Industry (MITI) launched the Fifth Generation Computer Systems project, a $400 million national initiative to build intelligent computers using logic programming and parallel processing. It aimed to achieve conversational AI and expert reasoning within a decade.
Triggered a global AI arms race — the US launched the Strategic Computing Initiative and the UK started the Alvey Programme in direct response. Though the project fell short of its goals, it demonstrated how national ambition could mobilise AI research at scale.
David Rumelhart, Geoffrey Hinton, and Ronald Williams published a clear, practical method for training multi-layer neural networks using backpropagation of errors. Though the algorithm had been discovered earlier, this paper demonstrated it could learn useful internal representations.
Resolved the limitation Minsky and Papert had identified — multi-layer networks could now learn XOR and much more. This single paper reignited neural network research and set the stage for deep learning.
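A minimal sketch of the method (layer sizes, random seed, and learning rate are illustrative): gradients of a squared error are propagated backwards through two sigmoid layers via the chain rule, letting the network learn XOR, exactly the function a single layer cannot represent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # output layer
lr = 1.0
losses = []

for _ in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # Backward pass: propagate the error layer by layer (chain rule).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print((out > 0.5).astype(int).ravel().tolist())
```

The hidden layer learns internal representations no one hand-designed, which was the paper's central demonstration.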
Roboticist Hans Moravec at Carnegie Mellon articulated what became known as Moravec's Paradox: high-level reasoning (chess, logic) is computationally cheap for machines, but low-level sensorimotor skills (walking, catching a ball) are enormously hard. "It is comparatively easy to make computers exhibit adult-level performance on intelligence tests, and difficult to give them the skills of a one-year-old."
Revealed that AGI would require solving embodied intelligence, not just abstract reasoning — a lesson the robotics and multimodal AI communities are still working through today.
Yann LeCun at Bell Labs demonstrated that convolutional neural networks (CNNs) trained with backpropagation could recognise handwritten digits with high accuracy. The system was deployed commercially to read ZIP codes on US mail.
Proved that neural networks could solve real commercial problems — and that the right architecture matched to the right data could outperform hand-engineered solutions.
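The core architectural idea, sliding one small set of shared weights across the whole image, can be shown in a few lines (toy image and kernel, not LeCun's actual LeNet):

```python
import numpy as np

# A minimal 2-D convolution (valid padding): the same small kernel is
# slid over every position of the image, sharing weights everywhere.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

# Toy image: a vertical bright stripe. A vertical-edge kernel lights
# up exactly at the stripe's two boundaries, wherever they appear.
img = np.zeros((5, 5))
img[:, 2] = 1.0
edge_kernel = np.array([[1.0, -1.0]])
fmap = conv2d(img, edge_kernel)
print(fmap)
```

In a trained CNN the kernels are learned rather than hand-picked, but the weight sharing that matches the structure of images is the same.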
Sepp Hochreiter's diploma thesis (supervised by Jürgen Schmidhuber in Munich) formally identified the vanishing gradient problem — the mathematical reason deep neural networks were failing to learn. Gradients shrank exponentially through layers, making training beyond a few layers impractical.
Diagnosing the disease was the first step to curing it. This work directly led to LSTM networks and later to the architectural innovations (residual connections, normalization) that made modern deep learning possible.
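The arithmetic of the problem is stark. The sigmoid's derivative never exceeds 0.25, and backpropagation multiplies one such factor per layer, so even in the best case the gradient signal decays geometrically with depth (a simplified illustration that ignores the weight terms):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 (at z = 0).
# Each layer contributes at most one such factor to the backpropagated
# gradient, so the product shrinks geometrically with depth.
best_slope = sigmoid(0.0) * (1 - sigmoid(0.0))   # exactly 0.25
for depth in (2, 10, 30):
    print(depth, best_slope ** depth)
# depth 30 gives ~8.7e-19: effectively zero, so early layers stop learning
```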
IBM's Deep Blue defeated world chess champion Garry Kasparov in a six-game match. The system used brute-force search evaluating 200 million positions per second, combined with hand-tuned evaluation functions and an opening book crafted by grandmasters.
Showed the world that machines could outperform the best human minds in complex strategic tasks — but also highlighted that raw computation without learning was a dead end for general intelligence.
Sepp Hochreiter and Jürgen Schmidhuber published Long Short-Term Memory (LSTM), a recurrent neural network architecture with gated memory cells that could learn to store, retrieve, and forget information over long sequences. Developed between Munich and Switzerland.
Solved the vanishing gradient problem for sequences, enabling breakthroughs in speech recognition, machine translation, and time-series prediction — the dominant sequence architecture until transformers arrived 20 years later.
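The gating mechanism can be sketched compactly (dimensions and random parameters are illustrative, and real implementations add per-gate initialisation tricks): the forget and input gates decide what the cell state keeps and absorbs, and the output gate decides what it exposes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x; h_prev] to the four gate pre-activations."""
    z = np.concatenate([x, h_prev]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g   # gated memory: keep old info, blend in new
    h = o * np.tanh(c)       # exposed hidden state
    return h, c

# Toy usage: 3-dim inputs, 5-dim state, random parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.normal(scale=0.1, size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):  # run over a 10-step sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

The additive update `c = f * c_prev + i * g` is the key design choice: gradients flow through the cell state without being repeatedly squashed, which is what defeats the vanishing gradient.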
The Canadian Institute for Advanced Research (CIFAR) launched its Neural Computation and Adaptive Perception programme, providing sustained funding to Geoffrey Hinton, Yoshua Bengio, and Yann LeCun when neural network research was deeply unfashionable. This "Canadian Mafia" would go on to win the 2018 Turing Award.
Demonstrated that a small country's patient investment in unfashionable science could reshape an entire field. Canada became the birthplace of modern deep learning, and the three researchers it funded became the "godfathers of AI."
Geoffrey Hinton at the University of Toronto published a breakthrough method for training deep neural networks using layer-by-layer unsupervised pretraining followed by fine-tuning. For the first time, networks with many layers could be trained effectively.
Launched the deep learning revolution. The paper's title — "A Fast Learning Algorithm for Deep Belief Nets" — popularised the "deep" label that would define the next two decades of AI.
Stanford professor Fei-Fei Li and her team published ImageNet, a dataset that grew to more than 14 million hand-labelled images across 20,000+ categories. The associated annual competition (ILSVRC) became the benchmark that drove computer vision forward.
Proved that data at scale was as important as algorithms — a lesson that would define the entire trajectory toward AGI. ImageNet competitions became the arena where deep learning proved itself.
Jeff Dean and Andrew Ng launched Google Brain, a deep learning research project within Google. Using 16,000 CPU cores across 1,000 machines, the team trained a neural network that learned to detect cats in YouTube videos without being told what a cat was — unsupervised feature learning at scale.
Marked the moment Big Tech realised deep learning was not an academic curiosity but a core technology. Google, Facebook, and others began building massive AI research labs, shifting the centre of gravity from universities to corporations.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto trained AlexNet, a deep convolutional neural network, on GPUs. It won the ImageNet competition with a top-5 error rate of 15.3% — crushing the runner-up at 26.2%. The margin of victory was unprecedented.
The single most consequential moment in modern AI. Proved that deep networks + GPUs + big data could demolish traditional approaches. Every major AI company traces its current strategy back to this result.
Google acquired London-based DeepMind Technologies for approximately $500 million. Founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman, DeepMind's explicit mission was to "solve intelligence." The acquisition brought together Google's infrastructure with DeepMind's reinforcement learning expertise.
Signalled that the world's most valuable companies now viewed AGI as a realistic engineering goal, not science fiction. DeepMind's London lab became one of the most important AI research centres on the planet.
Ian Goodfellow at the University of Montreal invented generative adversarial networks (GANs) — a framework where two neural networks compete, one generating data and one judging it. The idea reportedly came to him during a conversation at a bar. GANs became the dominant generative model for half a decade.
Gave machines the ability to create, not just classify — a fundamental capability for any system aspiring to general intelligence.
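The competition is captured by a single value function, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator D pushes up and the generator G pushes down. A toy 1-D illustration (distributions and parameters invented for the example): as the generator's output distribution approaches the real data, V falls, which is exactly what G wants.

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, a=1.0, b=-2.0):
    """Toy discriminator: a fixed logistic score, higher for larger x."""
    return 1 / (1 + np.exp(-(a * x + b)))

def G(z, mu=0.0):
    """Toy generator: shifts standard noise by a learnable mean mu."""
    return z + mu

# GAN objective: V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].
x_real = rng.normal(4.0, 1.0, size=10_000)  # "real" data centred at 4
z = rng.normal(size=10_000)

vs = []
for mu in (0.0, 2.0, 4.0):
    v = np.log(D(x_real)).mean() + np.log(1 - D(G(z, mu))).mean()
    vs.append(float(v))
    print(f"mu={mu}: V={v:.3f}")
```

As mu moves from 0 toward the real mean of 4, the fakes fool this fixed discriminator more often and V drops; in real GAN training, D and G update in alternation rather than staying fixed.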
DeepMind's AlphaGo defeated world Go champion Lee Sedol 4-1 in Seoul. Go has more possible positions than atoms in the universe, making brute-force search impossible. AlphaGo combined deep neural networks with Monte Carlo tree search to develop intuitive, human-like play.
Proved that AI could master domains requiring intuition and creativity, not just calculation. Move 37 in Game 2 — a move no human would play — demonstrated genuine machine creativity.
Eight researchers at Google (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin) published "Attention Is All You Need," introducing the transformer architecture and replacing recurrence entirely with self-attention mechanisms. The paper's deceptively simple title belied its revolutionary impact.
Created the architectural foundation for GPT, BERT, PaLM, Claude, LLaMA, and every other large language model. Without this paper, the current path to AGI would not exist.
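The mechanism at the paper's heart, scaled dot-product attention, fits in a few lines of numpy (shapes and data are illustrative; real transformers add multiple heads, masking, and learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (4, 8): one output vector per position
print(w.sum(axis=-1))  # each position's weights sum to 1
```

Every position attends to every other position in a single step, which is why the architecture parallelises so well compared with recurrence.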
China's State Council released a national strategy aiming to make China the world leader in AI by 2030, with a domestic AI industry worth $150 billion. The plan committed massive government funding, designated national AI champions (Baidu, Alibaba, Tencent, iFlytek, SenseTime), and integrated AI into education at all levels.
Turned AI development into an explicit geopolitical race. The US-China AI competition that defines the 2020s traces directly to this document.
DeepMind's AlphaZero mastered chess, Go, and shogi (Japanese chess) in hours, starting from only the rules — no human games, no human knowledge, no opening books. It defeated the world's strongest specialised programs in all three games.
Demonstrated that a single general learning algorithm could achieve superhuman performance across multiple domains from scratch — arguably the closest thing to "general" intelligence demonstrated in a narrow system.
In 2018, Google released BERT (bidirectional pretraining) and OpenAI released GPT-1 (autoregressive pretraining), two competing approaches to making language models understand context. BERT dominated NLP benchmarks; GPT's approach — simply predicting the next word — would ultimately prove more scalable.
Established that pretraining on massive text data creates powerful representations of language — the core insight that "next-word prediction is all you need" to learn about the world.
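The objective itself is old and simple; what changed was the model behind it. A toy illustration: a bigram "language model" that predicts the next word purely from counts. GPT optimises the same next-token objective, with a transformer in place of the count table.

```python
from collections import Counter, defaultdict

# Tiny corpus; a real pretraining run uses billions of tokens.
corpus = "the cat sat on the mat the cat ate".split()

# Estimate P(next token | current token) by counting bigrams.
counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen in training."""
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" (seen twice, vs "mat" once)
```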
The United Arab Emirates established the Technology Innovation Institute (TII) and began building what would become the Falcon series of large language models. The UAE had already appointed the world's first Minister of State for Artificial Intelligence in 2017, signalling AI as a core national strategy.
Showed that AGI-relevant research was no longer confined to traditional tech powers. Smaller nations with strategic vision and funding could become serious players in frontier AI development.
OpenAI trained GPT-2, a 1.5-billion parameter language model that generated remarkably coherent text. They initially withheld the full model, citing concerns about misuse — the first major public debate about whether AI capabilities should be freely shared.
Marked the moment AI safety went from academic concern to front-page news. The tension between open research and responsible deployment that defines today's AI landscape crystallised around this release.
OpenAI released GPT-3, a 175-billion parameter model that could perform tasks it was never explicitly trained for — translation, code generation, arithmetic — simply from a few examples in its prompt. The "scaling hypothesis" — that bigger models develop qualitatively new abilities — gained its strongest evidence.
Convinced much of the AI community that scale itself might be a path to general intelligence. The race to build bigger models accelerated dramatically.
DeepMind's AlphaFold 2 solved the 50-year-old protein folding problem, predicting 3D structures of proteins to near-experimental accuracy. It later predicted the structure of nearly every known protein — over 200 million structures — and contributed to the 2024 Nobel Prize in Chemistry.
Showed that AI could solve fundamental scientific problems that had defeated human researchers for decades. If AGI means "as good as the best humans at everything," AlphaFold was the strongest evidence yet.
AI21 Labs released Jurassic-1, a 178-billion parameter model rivalling GPT-3, built in Tel Aviv. Israel — with more AI startups per capita than any other nation — demonstrated disproportionate impact: companies like Mobileye (autonomous driving), AI21, and Habana Labs (AI chips, acquired by Intel) positioned the country as a per-capita AI superpower.
Proved that population size matters less than talent density, military-technical pipeline (Unit 8200), and startup culture in the race toward advanced AI.
DeepMind researchers published findings showing that most large language models were significantly undertrained. Their "Chinchilla" model, with 70 billion parameters trained on 1.4 trillion tokens, outperformed the 280-billion parameter Gopher. The key insight: model size and data size should scale together.
Shifted the entire industry's approach to training. Instead of just building bigger models, labs began focusing on data quality, quantity, and the optimal balance between parameters and training tokens.
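The paper's headline heuristic reduces to simple arithmetic: training compute is roughly C ≈ 6ND FLOPs for N parameters and D tokens, and the compute-optimal ratio is roughly 20 tokens per parameter. A sketch (the function name and the rounded constants are mine):

```python
# Chinchilla's rule of thumb, as an approximation of the paper's fits:
# compute-optimal training uses roughly 20 tokens per parameter, with
# training compute C ≈ 6 * N * D FLOPs for N parameters and D tokens.

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into params N and tokens D with D = 20 N."""
    # From C = 6 * N * D and D = r * N:  N = sqrt(C / (6 r))
    n = (compute_flops / (6 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

# Chinchilla itself: 70B params on 1.4T tokens => C ≈ 6 * 70e9 * 1.4e12.
c = 6 * 70e9 * 1.4e12
n, d = chinchilla_optimal(c)
print(f"N = {n:.2e} params, D = {d:.2e} tokens")
```

Plugging Chinchilla's own compute budget back in recovers its 70-billion-parameter, 1.4-trillion-token configuration, which is what "model size and data size should scale together" means in practice.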
OpenAI released ChatGPT, a conversational interface to GPT-3.5. It reached 1 million users in 5 days and 100 million in 2 months — the fastest-growing application in history. For the first time, hundreds of millions of people experienced AI that could hold a conversation.
Transformed AGI from an abstract research goal into a mainstream cultural conversation. The entire world began debating whether machines were on the verge of thinking, and billions in investment followed.
OpenAI released GPT-4, a multimodal model that could process text and images. It scored around the 90th percentile on a simulated bar exam, placed in the top percentiles on several AP exams, and demonstrated reasoning abilities that led some researchers to publish papers about "sparks of AGI."
Made the AGI debate urgent. When a system can pass the bar exam and write working software, the question "how far are we from AGI?" shifted from decades to years in many researchers' estimates.
Three former Google DeepMind and Meta researchers founded Mistral AI in Paris. Within months, they released Mixtral 8x7B, an open-source mixture-of-experts model that rivalled GPT-3.5. The company raised €385 million in its first year, making it Europe's most valuable AI startup.
Proved Europe could compete at the frontier of AI research, not just regulate it. France's investment in mathematics education — producing Fields Medal winners at a disproportionate rate — finally found its commercial AI application.
OpenAI released o1, a model that spends more time "thinking" before answering. Using reinforcement learning to develop internal reasoning chains, o1 achieved state-of-the-art results on mathematics, coding, and scientific reasoning benchmarks — in some cases matching PhD-level performance.
Opened a new scaling axis: instead of just making models bigger, making them think longer during inference. This "test-time compute" approach may be a critical ingredient for AGI-level reasoning.
Anthropic released Claude 3.5 Sonnet with the ability to see, understand, and control a computer screen. For the first time, an AI could autonomously navigate software, fill forms, write documents, and execute multi-step workflows across applications.
Moved AI from answering questions to doing work. Agentic AI — systems that act autonomously in the real world — represents one of the final bridges between narrow AI and general-purpose intelligence.
Chinese lab DeepSeek released R1, an open-source reasoning model that matched proprietary frontier models at a fraction of the training cost. The model demonstrated strong chain-of-thought reasoning and was released with full weights, challenging the assumption that cutting-edge AI required Western-scale budgets.
Shattered the narrative that only well-funded US labs could build frontier AI. Proved that algorithmic innovation and efficiency could compensate for raw compute, intensifying global competition.
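Why spending more compute at inference time helps, as reasoning models like o1 and R1 do, can be illustrated with a deliberately crude stand-in for learned reasoning (this is majority voting over independent samples, not how those models actually work): if each sampled answer is right with probability above one half, accuracy rises as more samples are drawn.

```python
import random

random.seed(0)

# Toy model of test-time compute: sample k answers, take the majority.
# If each sample is independently correct with probability p > 0.5,
# the vote's accuracy climbs toward 1 as k grows.
def vote_accuracy(p, k, trials=20_000):
    wins = 0
    for _ in range(trials):
        correct = sum(random.random() < p for _ in range(k))
        wins += correct > k / 2  # strict majority of k samples
    return wins / trials

results = {k: vote_accuracy(0.6, k) for k in (1, 5, 25)}
print(results)  # accuracy increases with the inference budget k
```

The same qualitative curve, more inference compute yielding better answers, is the "new scaling axis" the o1 release opened.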
India surpassed the US and China in the number of AI and machine learning engineers, with its IIT system and tech ecosystem producing more AI practitioners than any other country. Companies like Krutrim (founded by Ola's Bhavish Aggarwal) raised $50 million to build India-first foundation models in multiple Indian languages.
Demonstrated that the road to AGI runs through talent as much as compute. India's massive engineering pipeline ensures it will be a defining force in AI development for decades.
Anthropic released Claude Opus 4 and Sonnet 4, models capable of sustained, multi-hour agentic work — coding, research, and analysis across complex, multi-step tasks. Extended thinking capabilities allowed models to reason through problems requiring hundreds of steps.
Shifted the benchmark from "can it answer a question?" to "can it do a day's work?" Long-horizon autonomous agents represent the frontier of what separates current AI from AGI.
OpenAI released GPT-5.4 with a 1-million token context window, unifying previously separate reasoning, coding, and general capabilities into a single model. The system demonstrated 33% fewer factual errors and improved agentic workflow completion.
Consolidation of capabilities into single models mirrors how general intelligence works — not separate modules, but unified understanding. Each generation closes the gap between narrow and general.
Despite extraordinary progress, researchers continue to debate what's still needed for AGI. Open challenges include: genuine causal reasoning (not just pattern matching), persistent memory and learning from experience, embodied intelligence and physical world understanding, robust common sense, goal-setting and autonomous motivation, and consciousness — if it matters at all.
The road to AGI may be 90% travelled or only 10%. The honest answer is that nobody knows — but the pace of progress suggests we'll find out sooner than most expected.
This page documents publicly reported milestones on the path toward artificial general intelligence, for informational and educational purposes. All descriptions are based on published research, official announcements, and news reporting. The ordering reflects when each development occurred; it does not imply a single linear path to AGI.
Some content on this page was created with the assistance of AI tools.