IA Descubre
Everything AI changes.
LTX 2.3 generates synchronised video and audio in a single pass
Lightricks released LTX 2.3, a 22-billion-parameter diffusion transformer model that generates synchronised video and audio in a single forward pass. The model supports resolutions up to 4K at 50 frames per second, marking a significant leap in real-time media generation quality.
Demonstrated that unified video-and-audio generation is now possible without separate pipeline stages, pointing toward a future where multimedia content can be produced end-to-end by a single model.
Read more →
GPT-5.4 launches with 1 million token context window
OpenAI released GPT-5.4, its most capable frontier model, available in Standard, Thinking, and Pro variants. The model features a context window of up to 1 million tokens (the largest from OpenAI), a reported 33% reduction in factual errors compared to GPT-5.2, and improved capabilities across coding, reasoning, and agentic workflows.
Consolidated multiple prior model specialisations into a single frontier release, signalling a shift away from separate reasoning and coding models toward unified general-purpose systems.
OpenAI Announcement
Atlas Humanoid with Neural Large Behavior Models
Boston Dynamics demonstrated a newly redesigned Atlas humanoid robot powered by neural Large Behavior Models from Toyota Research Institute. The robot performed complex multi-task sequences with self-correction, learning control policies without hand-coded routines. Boston Dynamics has deployed over 500 robots with revenue exceeding $130 million.
Showed that neural behavior models could enable humanoid robots to learn and adapt autonomously, marking a shift from scripted robotics to learned control policies and moving practical robotics deployment beyond research labs.
Toyota Research Institute
DeepSeek R1 Open-Source Reasoning Model
DeepSeek released R1, an open-source reasoning model that demonstrates competitive performance with proprietary frontier models. The release includes both full weights and distilled versions, making advanced reasoning capabilities accessible to the open-source community.
Shifted the competitive landscape by proving open-source models could match closed proprietary systems on reasoning benchmarks, accelerating the pace of public AI development.
GitHub Release
Claude 3.5 Sonnet with Computer Use
Anthropic released Claude 3.5 Sonnet with native computer interaction capabilities, allowing the model to see, understand, and control a computer screen. This enables autonomous execution of multi-step digital workflows without relying on separate tool APIs.
Introduced practical agent capabilities where AI can autonomously navigate and control software systems, bridging the gap between language understanding and real-world digital task execution.
Anthropic Blog
OpenAI o1 Reasoning Model Launch
OpenAI introduced o1, a model trained to spend more time thinking through problems before responding. It achieves state-of-the-art performance on mathematical, coding, and scientific reasoning tasks by using reinforcement learning to develop internal reasoning processes.
Demonstrated that scaling compute during inference—not just training—unlocks new reasoning capabilities, opening a new frontier for AI model development.
OpenAI Announcement
Meta Llama 3.1 405B Open-Source Release
Meta released Llama 3.1 405B, a 405-billion parameter open-source model that rivals closed proprietary models on performance benchmarks. The full weights were made freely available for research and commercial use.
Demonstrated that large-scale, high-quality open models could be both competitive and responsibly deployed, energizing the open-source AI ecosystem and reducing proprietary model dependency.
Meta Blog
GPT-4o Multimodal Model
OpenAI released GPT-4o, a single model trained end-to-end across text, vision, and audio. It delivers significant performance improvements over GPT-4 and processes audio and images natively, without intermediate conversion steps.
Advanced multimodal AI beyond text and image to include audio, expanding the types of problems AI systems can solve and improving efficiency through native input handling.
OpenAI Announcement
AlphaFold 3 Predicts Protein-Ligand Complexes
DeepMind released AlphaFold 3, expanding beyond protein structure prediction to accurately predict protein-DNA, protein-RNA, and protein-ligand interactions. The model achieved at least a 50% accuracy improvement over existing prediction methods on some interaction categories, extending the line of work recognized by the 2024 Nobel Prize in Chemistry.
Extended AI's impact on structural biology beyond proteins to the broader proteome-scale prediction of molecular interactions, accelerating drug discovery and systems biology research.
Nature: AlphaFold 3
Google Gemini 1.5 Pro with 1M Token Context
Google introduced Gemini 1.5 Pro, a model capable of processing a context window of up to 1 million tokens. This enables the model to work with entire books, lengthy video transcripts, and massive code repositories in a single prompt.
Pushed the boundary of AI context understanding from hundreds of thousands to millions of tokens, enabling entirely new classes of long-form understanding and analysis tasks.
Google Blog
Mixtral 8x7B Mixture-of-Experts Model
Mistral AI released Mixtral 8x7B, a sparse mixture-of-experts model that achieves performance comparable to much larger models while maintaining efficiency. The model uses 8 expert networks, activating only 2 per token for computational efficiency.
Proved that mixture-of-experts architecture could deliver frontier performance at reduced computational cost, influencing subsequent model designs across the industry.
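The routing idea is simple to sketch: a gating network scores all eight experts for each token, only the top two actually run, and their outputs are mixed by the softmaxed gate scores. A minimal NumPy illustration of that top-2 routing (the experts here are plain linear maps, not Mixtral's actual SwiGLU feed-forward blocks):

```python
import numpy as np

def moe_top2(x, gate_w, expert_ws, k=2):
    """Sparse mixture-of-experts forward pass: route each token to its
    top-k experts and combine their outputs, weighted by the softmaxed
    gate scores. Illustrative sketch only."""
    logits = x @ gate_w                          # (tokens, n_experts) gate scores
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                             # softmax over the selected experts only
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ expert_ws[e])  # each "expert" is a linear map
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 8, 8, 4
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_exp))
expert_ws = rng.normal(size=(n_exp, d, d))
y = moe_top2(x, gate_w, expert_ws)               # only 2 of 8 experts run per token
```

Because only two experts fire per token, the per-token compute is roughly a quarter of a dense 8-expert layer, which is the efficiency claim above.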
Mistral AI Blog
GraphCast Achieves Superior Weather Forecasting
DeepMind released GraphCast, a graph neural network model that predicts global weather at 0.25-degree resolution in under one minute, a computation that takes traditional physics-based systems hours on a supercomputer. The model outperformed the European Centre for Medium-Range Weather Forecasts (ECMWF) operational system on 90% of evaluated meteorological variables.
Demonstrated that deep learning could outperform traditional physics-based weather models on practical operational forecasts, opening new possibilities for rapid, accurate climate prediction.
DeepMind Blog
GPT-4 Turbo with 128K Context
OpenAI released GPT-4 Turbo with a 128,000 token context window, 4x the original GPT-4 context. The model also features reduced hallucination rates and lower API costs compared to previous versions.
Made frontier model capabilities more accessible and practical, enabling developers to work with longer documents and more complex tasks within a single API call.
OpenAI Blog
Technology Innovation Institute Releases Falcon 180B
The Technology Innovation Institute (TII) released Falcon 180B, a 180-billion parameter open-source language model trained on 3.5 trillion tokens. At release, it was the largest openly available language model, surpassing Llama 2 on multiple benchmarks including MMLU, LAMBADA, and HellaSwag.
Advanced the frontier of open-source models and demonstrated that large-scale open training could match or exceed contemporary proprietary systems in capability.
TII
Meta Llama 2 Open-Source Release
Meta released Llama 2, a family of open-source language models ranging from 7B to 70B parameters. Trained on 2 trillion tokens of public data, the models were made freely available for research and commercial use.
Democratized access to large-scale language models and sparked rapid innovation in the open-source community, reducing reliance on closed proprietary models.
Meta AI Blog
Claude 2 Language Model Release
Anthropic released Claude 2, a significantly improved version with longer context (100K tokens), better performance on complex reasoning tasks, and improved safety properties. The model set new benchmarks for instruction following and factuality.
Established Anthropic as a major player in frontier AI development and demonstrated that safety-focused training could coexist with competitive performance.
Anthropic Blog
GPT-4 Launch
OpenAI released GPT-4, a multimodal model accepting both text and image inputs. It demonstrated significant improvements in reasoning, safety, and reliability compared to GPT-3.5, with performance surpassing human experts on many professional benchmarks.
Established multimodal AI as a major capability and set a new standard for reasoning quality, influencing the entire industry's approach to model development and evaluation.
OpenAI Research
ChatGPT Public Launch
OpenAI released ChatGPT to the public, a conversational interface powered by GPT-3.5. It reached 1 million users in 5 days and 100 million in 2 months, making it the fastest-growing consumer application in history at the time.
Catalyzed mainstream awareness of AI capabilities and sparked global conversation about AI's potential and risks, transforming AI from a technical domain into a cultural phenomenon.
OpenAI Blog
OpenAI Open-Sources Whisper Speech Recognition
OpenAI released Whisper, an open-source automatic speech recognition (ASR) model trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model achieves 50% fewer errors than current specialized models and supports transcription in 96 languages.
Democratized high-quality speech recognition across languages and made multilingual ASR accessible to developers worldwide, reducing the barrier to integrating speech capabilities into applications.
OpenAI Blog
Stable Diffusion Public Release
Stability AI released Stable Diffusion, an open-source text-to-image generation model. Available under an open license, it could run on consumer hardware and sparked a wave of creative applications and fine-tuned variants.
Democratized image generation technology by making it open-source and computationally accessible, enabling millions of developers to build creative applications.
Hugging Face
BigScience Releases BLOOM 176B Multilingual Model
The BigScience collaborative initiative released BLOOM, a 176-billion parameter open-source language model trained across 46 natural languages and 13 programming languages. At release, it was the largest openly available language model.
Demonstrated that large-scale open multilingual training was feasible at frontier scale and made diverse language capabilities available to the global research community without proprietary restrictions.
Hugging Face
DALL-E 2 Image Generation Model
OpenAI released DALL-E 2, a significantly improved text-to-image model with better understanding of natural language prompts and higher image quality. The model demonstrated zero-shot generalization to novel concepts and creative variations.
Advanced generative models beyond language into vision, demonstrating that transformer-based architectures could scale across modalities.
OpenAI Research
AlphaFold 2 Solves Protein Folding
DeepMind's AlphaFold 2 solved the protein folding problem, predicting 3D protein structures to near-experimental accuracy at the CASP14 competition. The breakthrough came from combining attention mechanisms with evolutionary biology insights. The achievement later contributed to the 2024 Nobel Prize in Chemistry awarded to David Baker, Demis Hassabis, and John Jumper.
Demonstrated AI's transformative potential for fundamental science, revolutionizing structural biology, drug discovery, and biological research. The system enabled the prediction of the entire human proteome and countless others.
Nature: AlphaFold 2 paper
GPT-3 Language Model Breakthrough
OpenAI published GPT-3, a 175-billion parameter language model demonstrating few-shot learning across diverse tasks without task-specific fine-tuning. The model showed emergent abilities like chain-of-thought reasoning and simple code generation.
Proved that scaling transformer models to billions of parameters unlocked emergent few-shot capabilities, establishing the scaling paradigm that dominates modern AI development.
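Few-shot here means the "training" happens entirely in-context: task examples are placed directly in the prompt and the model completes the pattern, with no weight updates. A sketch of the prompt format the GPT-3 paper popularized (the translation pairs are illustrative):

```python
# Build a few-shot prompt: demonstrations in-context, no fine-tuning.
examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = "Translate English to French:\n"
prompt += "".join(f"{en} => {fr}\n" for en, fr in examples)
prompt += "plush giraffe =>"   # the model is expected to complete this line
```

The same scaffold works for classification, arithmetic, or code by swapping the demonstration pairs, which is what made the result so striking: one frozen model, many tasks.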
arXiv Paper
"Attention Is All You Need" Transformer Paper
Google researchers published "Attention Is All You Need," introducing the Transformer architecture built entirely on attention mechanisms. This paper became one of the most cited in machine learning, fundamentally changing how neural networks are designed.
Established the architectural foundation for all modern large language models, eliminating the need for recurrence and enabling efficient parallel training on massive datasets.
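The paper's core operation fits in a few lines. A NumPy sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, omitting the multi-head projections and masking of the full architecture:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every query attends to all keys,
    and the softmax weights mix the corresponding values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # each row sums to 1
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 queries of dimension 4
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 4))   # 5 values
out, w = attention(Q, K, V)   # out: (3, 4), attention weights w: (3, 5)
```

Because every position attends to every other in one matrix multiply, the whole sequence is processed in parallel, which is exactly the training-efficiency win over recurrent models noted above.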
arXiv Paper
AlphaGo Defeats Lee Sedol
DeepMind's AlphaGo defeated world champion Lee Sedol in a 5-game match of Go, winning 4-1. Using deep neural networks combined with tree search, AlphaGo exhibited intuitive play and strategic understanding previously thought impossible for machines.
Demonstrated that deep learning combined with search could master complex domains with astronomically large decision spaces, proving AI could compete with and exceed human expertise in abstract reasoning.
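The move-selection rule at the heart of that search can be sketched from the published PUCT formula: each simulation descends the tree by maximizing a value estimate Q plus an exploration bonus driven by the policy network's prior P and the visit counts N (the numbers below are illustrative):

```python
import numpy as np

def select_action(Q, N, P, c_puct=1.0):
    """PUCT rule from AlphaGo's tree search: exploit moves with high
    value estimates (Q) while exploring moves the policy prior (P)
    favors but that have few visits (N)."""
    U = c_puct * P * np.sqrt(N.sum()) / (1.0 + N)  # exploration bonus per move
    return int(np.argmax(Q + U))

Q = np.array([0.0, 0.0, 0.0])   # mean simulation values per candidate move
N = np.array([1.0, 2.0, 3.0])   # visit counts per candidate move
P = np.array([0.5, 0.3, 0.2])   # policy-network priors
a = select_action(Q, N, P)      # picks the under-visited, high-prior move 0
```

As visit counts grow, the bonus shrinks and the choice is dominated by Q, so search effort concentrates on moves the simulations actually confirm are strong.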
DeepMind Case Study
Generative Adversarial Networks (GANs) Introduced
Ian Goodfellow and collaborators introduced Generative Adversarial Networks, a framework where two neural networks compete—one generating data and one discriminating real from fake. This sparked a revolution in generative modeling and unsupervised learning.
Created a new paradigm for generative models that powered subsequent breakthroughs in image synthesis, style transfer, and creative AI applications.
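The adversarial game can be shown end-to-end even in one dimension. A toy NumPy sketch (not from the paper) with hand-derived gradients: a linear generator learns to imitate samples from N(3, 1) while a logistic discriminator tries to tell real from fake:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# Generator: fake = a*z + b.  Discriminator: D(x) = sigmoid(u*x + v).
a, b = 0.1, 0.0            # generator parameters
u, v = 0.1, 0.0            # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    ds_real = sigmoid(u * real + v) - 1.0   # dLoss/ds for real samples (label 1)
    ds_fake = sigmoid(u * fake + v)         # dLoss/ds for fake samples (label 0)
    u -= lr * ((ds_real * real).mean() + (ds_fake * fake).mean())
    v -= lr * (ds_real.mean() + ds_fake.mean())

    # Generator step: push D(fake) -> 1 (non-saturating loss).
    fake = a * z + b
    ds = (sigmoid(u * fake + v) - 1.0) * u  # gradient through the discriminator
    a -= lr * (ds * z).mean()
    b -= lr * ds.mean()
```

The two updates pull in opposite directions, which is the "competition" described above; in practice the generator's output distribution drifts toward the real data as the discriminator's edge shrinks.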
arXiv Paper
AlexNet Wins ImageNet Competition
A deep convolutional neural network called AlexNet won the ImageNet Large Scale Visual Recognition Challenge with a top-5 error rate of 15.4%, far exceeding traditional computer vision approaches at 26.2%. The win sparked the deep learning revolution in vision.
Demonstrated that deep learning could dramatically outperform hand-crafted features, launching the era of deep learning and transforming computer vision, and subsequently all of AI.
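The headline metric is easy to pin down: top-5 error is the fraction of images whose true class is absent from the model's five highest-scoring predictions. A small NumPy sketch with a toy check:

```python
import numpy as np

def top5_error(logits, labels):
    """Top-5 error: an example counts as correct if its true label is
    among the five highest-scoring classes."""
    top5 = np.argsort(logits, axis=1)[:, -5:]   # indices of the 5 best classes
    hits = [label in row for row, label in zip(top5, labels)]
    return 1.0 - np.mean(hits)

# Toy check: 2 examples over 10 classes.
logits = np.array([[9, 8, 7, 6, 5, 0, 0, 0, 0, 0],   # top-5 = classes 0..4
                   [0, 0, 0, 0, 0, 5, 6, 7, 8, 9]])  # top-5 = classes 5..9
err = top5_error(logits, labels=[4, 0])  # first example hits, second misses
```

The lenient five-guess criterion was standard for ImageNet's 1,000 classes, and AlexNet's 15.4% versus 26.2% on exactly this metric is what made the result undeniable.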
NIPS Paper
NOAA deploys AI weather models operationally for the first time
The US National Oceanic and Atmospheric Administration deployed three AI-driven global weather models into operational service: AIGFS, AIGEFS, and a hybrid system combining AI with traditional physics-based ensembles. Built on DeepMind's GraphCast foundation and fine-tuned on NOAA data, the models use up to 99.7% fewer computing resources while extending forecast skill by 18–24 hours beyond traditional systems.
Marked the first time AI weather models replaced components of an operational national forecasting system, validating years of research into machine-learning-based atmospheric prediction.
NOAA Announcement
AlphaEvolve discovers new mathematical structures and improves on Strassen's algorithm
DeepMind released AlphaEvolve, a Gemini-powered evolutionary coding agent that discovered new algorithms for matrix multiplication — improving upon Strassen's 1969 result — and established a new lower bound for the kissing number problem in 11 dimensions. Tested across 67 open problems in mathematics, it rediscovered best-known solutions in most cases and improved several.
Showed that AI can make original contributions to pure mathematics at a level validated by Fields Medal-winning mathematicians, including Terence Tao who collaborated on the work.
DeepMind Blog
First fully autonomous AI research system attempts end-to-end science
Sakana AI launched The AI Scientist, the first comprehensive system designed to automate the entire research lifecycle — from idea generation and coding to running experiments and writing complete scientific manuscripts. The system produced research papers for approximately $15 each, though independent evaluation revealed coding errors in 42% of experiments and limited novelty detection.
Established a new benchmark for what autonomous AI research systems can attempt, while also clearly illustrating the current gap between automated research quantity and human-level scientific quality.
Sakana AI
GenCast AI outperforms world's best weather ensemble on 97% of targets
DeepMind's GenCast, a diffusion-based probabilistic weather model, was shown to outperform the European Centre for Medium-Range Weather Forecasts (ECMWF) full 51-member ensemble system on 97.2% of 1,320 evaluation targets across 1–15 day forecast windows. The model generates forecasts in minutes rather than hours.
Demonstrated that AI-based probabilistic forecasting can surpass the gold standard of operational meteorology, setting the stage for the operational deployments that followed in 2025.
DeepMind Blog
GNoME discovers 2.2 million new crystal structures
DeepMind released GNoME (Graph Networks for Materials Exploration), a deep learning tool that predicted the stability of 2.2 million new crystal structures — equivalent to roughly 800 years of traditional materials science knowledge. Of these, 380,000 were identified as highly stable and promising for experimental synthesis. External labs independently created 736 of these new structures.
Transformed materials discovery from a slow, trial-and-error process into an AI-guided search, with 528 potential lithium-ion conductors among the findings, directly relevant to next-generation batteries and energy storage.
Nature
GraphCast outperforms traditional weather forecasting on 90% of variables
DeepMind's GraphCast, a graph neural network weather model, outperformed the ECMWF's operational forecasting system on 90% of evaluated meteorological variables. The model produces global 0.25-degree resolution forecasts in under one minute — a process that takes traditional physics-based models significantly longer.
Provided the first large-scale evidence that deep learning could match and exceed traditional physics-based atmospheric modelling, triggering operational adoption by national weather services worldwide.
DeepMind Blog