AI News May 29 – June 5, 2026: Microsoft Build 2026, RTX Spark, Nemotron 3 Ultra, MiniMax M3

June 2, 2026 Product

Microsoft launches seven in-house MAI models at Build 2026, led by its first flagship reasoning model

Microsoft used its Build 2026 conference to release seven in-house MAI models in Microsoft Foundry, headed by MAI-Thinking-1, its first flagship reasoning model. MAI-Thinking-1 is a sparse mixture-of-experts system with a 256K-token context that the company says matches Claude Opus 4.6 on the SWE-Bench Pro coding benchmark. The line-up also includes MAI-Image-2.5 with a faster Flash variant, MAI-Voice-2 with voice cloning across more than 15 languages, and MAI-Transcribe-1.5, which Microsoft claims is state-of-the-art across 43 languages.

Why it matters: The breadth marks Microsoft's most serious push yet toward building its own frontier stack across text, image, voice and speech rather than relying on partner models.

Microsoft introduces Scout, its first always-on "Autopilot" agent inside Microsoft 365

Microsoft unveiled Scout, the first of a new product class it calls "Autopilot" — an always-on autonomous agent that works in the background across Teams, Outlook, OneDrive and SharePoint, joining group chats and handling email threads as a participant rather than a sidebar panel. Each agent runs under its own governed Entra identity, with sensitive actions gated behind human sign-off and Microsoft Purview data-protection policies enforced before anything is sent. Scout is available now for Copilot Frontier users, with general availability targeted for October.

Why it matters: Scout is the first OS-scale autonomous agent embedded directly inside mainstream productivity surfaces, moving agents from experiment toward an everyday default.

NVIDIA and Microsoft unveil RTX Spark — a unified superchip for local frontier AI on Windows

NVIDIA and Microsoft announced RTX Spark, a unified superchip pairing a 20-core Grace CPU with a Blackwell GPU and up to 128GB of unified memory, delivering around one petaflop of AI performance alongside high-end gaming. NVIDIA says it can run a 120-billion-parameter model with a one-million-token context entirely on-device, with no cloud round-trip and no data leaving the machine. RTX Spark laptops and compact desktops — including a Microsoft Surface Laptop Ultra — are due from Asus, Dell, HP, Lenovo and MSI in autumn 2026.

Why it matters: Practical local inference of frontier-scale models on a laptop is a major enabler for on-device agents and private, offline AI.
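
The on-device claim for RTX Spark can be sanity-checked with simple arithmetic on weight storage. The sketch below is a rough estimate only: the parameter count and memory size come from the announcement, but the numeric precisions are assumptions chosen for illustration (the announcement does not say which one NVIDIA used), and the memory needed for the KV cache and activations at a one-million-token context is ignored.

```python
# Back-of-the-envelope weight footprint for a 120B-parameter model.
# Assumption: the precisions below are illustrative, not NVIDIA's stated setup.
# KV cache and activation memory are deliberately ignored.

PARAMS = 120e9      # 120 billion parameters (from the announcement)
MEMORY_GB = 128     # maximum unified memory in GB (from the announcement)

# Hypothetical storage precisions, in bits per parameter.
BITS_PER_PARAM = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_gb(params: float, bits: int) -> float:
    """Return the weight footprint in decimal gigabytes."""
    return params * bits / 8 / 1e9


def fits(params: float, bits: int, memory_gb: float) -> bool:
    """Return True if the weights alone fit in memory_gb."""
    return weight_gb(params, bits) <= memory_gb


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        gb = weight_gb(PARAMS, bits)
        print(f"{name}: {gb:.0f} GB of weights, fits={fits(PARAMS, bits, MEMORY_GB)}")
```

Under these assumptions, 16-bit weights (240 GB) would not fit in 128GB, 8-bit weights (120 GB) would fit with little headroom, and 4-bit weights (60 GB) would leave roughly half the memory for everything else. That suggests some form of low-precision storage, although the source does not confirm it.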

NVIDIA releases Nemotron 3 Ultra, a 550B open-weights model built for long-running agents

NVIDIA released Nemotron 3 Ultra, a 550-billion-parameter open-weights model (55B active) built on a hybrid Mamba-Transformer mixture-of-experts architecture with a one-million-token context, post-trained specifically for multi-step agentic tasks. NVIDIA published the weights, training data and recipes on Hugging Face, OpenRouter and NIM under a permissive licence. On the Artificial Analysis Intelligence Index it scores 48 — the highest of any US open-weight model.

Why it matters: It narrows the gap between open and closed frontier models for agent workloads, and ships with unusually open training materials.
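
To make the "550B total, 55B active" figures for Nemotron 3 Ultra concrete, the sketch below computes the share of parameters used per token and a rough per-token compute estimate. The "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, not a number from NVIDIA, and it ignores the cost of attention and Mamba sequence mixing, so treat the output as an order-of-magnitude illustration.

```python
# Active-parameter share and a rough forward-pass compute estimate for a
# mixture-of-experts model. Parameter counts are from the announcement;
# the FLOPs rule of thumb is an assumption, not an NVIDIA figure.

TOTAL_PARAMS = 550e9    # total parameters
ACTIVE_PARAMS = 55e9    # parameters active per token


def active_fraction(active: float, total: float) -> float:
    """Return the fraction of parameters used for each token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 x active parameters."""
    return 2 * active


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
    dense_flops = forward_flops_per_token(TOTAL_PARAMS)
    print(f"Active share per token: {share:.0%}")
    print(f"Approx. FLOPs per token (MoE): {moe_flops:.2e}")
    print(f"Approx. FLOPs per token if all 550B were active: {dense_flops:.2e}")
```

By this estimate each token touches about a tenth of the weights, so per-token compute is closer to that of a 55B dense model, while all 550B parameters still have to be stored.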

MiniMax releases M3 — an open-weight coding model with a one-million-token context

MiniMax released M3, an open-weight coding model with a one-million-token context window and native multimodal input. The company reports 59% on SWE-Bench Pro — ahead of OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro, though still trailing Claude Opus 4.7 — at roughly a twentieth of the per-token compute of its previous generation. Several benchmark figures were produced on MiniMax's own infrastructure and await independent verification.

Why it matters: It pairs very long context with frontier-level coding performance at low cost, pushing open models further into serious software engineering.

OpenAI brings Codex computer use to Windows, with remote control from mobile

OpenAI shipped computer use for its Codex app on Windows, letting the agent see, click and type inside any Windows application to test, debug and refine code, with remote control from ChatGPT on iOS, Android or a Mac. The agent takes over the active desktop in the foreground while a task runs. At release, the feature is available on macOS and Windows but not in the UK, EEA or Switzerland.

Why it matters: Reliable desktop computer use on Windows is a significant step toward general-purpose agents that operate the everyday software people already use.

GitHub launches a Copilot desktop app built around agents and any model

GitHub introduced a dedicated Copilot desktop app for Windows, Mac and Linux, designed around AI agents rather than chat. Users pick a session mode — Interactive, Plan or Autopilot — and choose any model, with bring-your-own-key support for OpenAI, Anthropic, Microsoft Foundry and other providers. The app is in technical preview.

Why it matters: Letting developers swap in the best available model per task, free of provider lock-in, speeds the adoption of frontier models.

Microsoft Execution Containers turn Windows into a governed sandbox for AI agents

Microsoft announced Microsoft Execution Containers (MXC), a policy-driven sandbox built into Windows that lets developers and administrators declare exactly which files, directories and network resources an agent may access, enforced at runtime by the operating-system kernel. Launch partners include GitHub Copilot, NVIDIA, OpenAI, OpenClaw, Hermes and Manus. Process and session isolation reach Windows Insiders shortly after Build, with Defender, Entra, Intune and Purview controls in preview from July.

Why it matters: Turning Windows into a governed runtime for agents is foundational infrastructure for safe consumer and enterprise agent deployment.

Microsoft Project Solara reimagines devices that run agents instead of apps

Microsoft detailed Project Solara, an Android-based platform for devices that run AI agents in place of apps. Reference designs include a wearable badge with a built-in camera and one-press recording and transcription, and a desk hub that signs users in by facial recognition and, with a monitor attached, becomes a full cloud Windows machine. Microsoft will not ship the hardware itself; partners including AccuWeather, Best Buy, CVS Health, Levi's and Target are expected to pilot devices.

Why it matters: It tests whether dedicated, always-on agent hardware can move beyond the phone into cheaper, purpose-built form factors.

Mayo Clinic and Microsoft to build a frontier AI model for healthcare

Mayo Clinic and Microsoft announced a collaboration to develop a frontier AI model purpose-built for healthcare, combining Mayo's de-identified clinical data and longitudinal expertise with Microsoft's AI and cloud capabilities. The model will be owned by Mayo Clinic and made available to other institutions through Azure Foundry APIs, with the stated aim of supporting earlier diagnoses and more personalised treatment decisions.

Why it matters: It is one of the first serious attempts at a vertical, domain-specific frontier model in medicine, owned by the clinical institution rather than the technology provider.

Ideogram 4.0 ships with open weights, 2K resolution and layout control

Ideogram released version 4.0, a 9.3-billion-parameter diffusion transformer trained from scratch, with native 2K resolution, a JSON prompt format offering bounding-box layout and hex-colour control, and native background transparency. The weights are published on Hugging Face under a commercial licence. It topped the open-weight tier of the DesignArena leaderboard, placing behind only closed models from OpenAI and Google.

Why it matters: High-quality open image generation with strong text and layout control reduces dependence on closed image models.

Reve 2.0 reaches No. 2 on the image Arena with a layout-based approach

Reve launched 2.0, a 4K image model that replaces text prompts with an editable, code-like layout in which every element has a position, size and description. It ranked second on the public text-to-image Arena, behind only OpenAI's GPT Image 2, despite being trained on far fewer GPUs than its larger rivals.

Why it matters: Layout-based, agent-readable image generation is a distinct take on controllability, and shows small labs can still reach the top tier.

xAI previews Grok Imagine 1.5 with audio-native image-to-video

xAI released a preview of Grok Imagine 1.5, an image-to-video model that turns a still image or text prompt into 720p clips of six to fifteen seconds with synchronised native audio, including music, sound effects and lip-synced dialogue. xAI says it took the top spot on the Image-to-Video Arena at launch, ahead of Google Veo and Seedance 2.0.

Why it matters: It adds a strong, audio-native entrant to an increasingly crowded generative-video field.
