## LLMs
Model releases and benchmarks dominated the week. ZhipuAI’s GLM-4.7 arrived with standout coding performance (near the top on SWE-bench), strong math and tool use, and broad availability, sparking industry interest and community AMAs. Google’s Gemini 3 line expanded: Flash launched with major speed gains, new long-context techniques pushed accuracy to 90% at a 1M-token context window, and Pro began surfacing inside Google products. Baidu’s ERNIE-5.0-Preview climbed to the top of Chinese leaderboards, and MiniMax introduced M2.1, targeting agentic coding in production. Anthropic’s Claude Opus 4.5 is proving faster than Sonnet 4.5 in real-world workloads. On the research frontier, LLaDA2.0 scaled diffusion-style language models to 100B parameters. New evaluation suites also landed: Medmarks became the largest medical LLM benchmark, and MoReBench exposed persistent moral-reasoning gaps across models.
## News / Update
Google introduced the Interactions API with Gemini Deep Research integration to support stateful, background AI workflows, and announced a Gemini 3 hackathon in Singapore. YouTube unveiled a Gemini 3-powered Playables Builder that lets creators generate mini-games. Uber and Baidu plan a self-driving pilot in London starting in 2026. Groq launched a developers-meet-F1 series, and LangChain hosted an SF breakfast on agent systems. A major GAO report highlighted long-running U.S. efforts to keep China multiple generations behind in semiconductors, while discussion grew around DeepSeek potentially becoming China’s next national tech champion. A large-scale survey found 100+ AI agent frameworks amassing 400k+ GitHub stars, underscoring both rapid ecosystem growth and widespread developer confusion. Teams using OpenHands agents reported 8B tokens processed in two weeks, signaling accelerating adoption of agentic coding.
## New Tools
Several launches expanded the builder toolkit. Anthropic and MATS released Bloom, an open-source framework for generating robust, customizable behavioral evals. Paperedge opened early access to a free, collaborative AI research manager with chat over paper libraries. Qwen-Image-Layered open-sourced Photoshop-style image decomposition with promptable RGBA layers for powerful, controllable editing. Pollen Robotics’ Reachy Mini made humanoid experimentation accessible via a Python SDK. Google’s new Interactions API gives developers server-side state, background tasks, and access to Gemini Deep Research for orchestrating more adaptive applications.
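To make the “server-side state plus background tasks” idea concrete, here is a minimal hypothetical sketch in Python. The base URL, endpoint path, field names, and polling flow are all assumptions for illustration, not the documented Interactions API surface:

```python
import time
import requests

API_BASE = "https://example.googleapis.com/v1"  # hypothetical base URL, not the real endpoint
API_KEY = "YOUR_API_KEY"

# Kick off a stateful, background deep-research interaction
# (agent name and request fields are assumed for illustration).
resp = requests.post(
    f"{API_BASE}/interactions",
    headers={"x-goog-api-key": API_KEY},
    json={
        "agent": "deep-research",  # assumed agent identifier
        "input": "Survey recent work on long-context evaluation.",
        "background": True,        # run server-side; return a handle immediately
    },
)
interaction = resp.json()

# Poll the server-held state until the background task finishes.
while interaction.get("status") not in ("completed", "failed"):
    time.sleep(10)
    interaction = requests.get(
        f"{API_BASE}/interactions/{interaction['id']}",
        headers={"x-goog-api-key": API_KEY},
    ).json()

print(interaction.get("output"))
```

The point of the pattern is that the server, not the client, owns conversation state and long-running work, so clients can disconnect and resume.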
## Features
Major platforms shipped meaningful upgrades. OpenAI hardened ChatGPT Atlas and its browser agent with RL-driven red teaming to detect and mitigate prompt-injection attacks; it also rolled out “Your Year with ChatGPT” to select countries. Google integrated Gemini 3 Pro into Search for dynamic layouts and interactive explanations, and added a Data Table feature to NotebookLM for faster analysis. Video creation advanced with Kling 2.6’s one-take motion control and Runway Gen-4.5’s improved realism and physics. vLLM delivered a flurry of improvements: Omni for unified multimodal serving, a 0.13.0 engine with selective kernel compilation and advanced attention, Blackwell Ultra SM103 support, and faster time to first token via DeepSeek kernels. Additional updates included mlx-swift-audio’s new TTS/ASR models, OpenCode’s remote MCP sessions with sandboxing, Base44’s 10× DB speedups, and Apple’s Sharp model raising Vision Pro scene-reconstruction fidelity. Context Arena added bias-analysis tools for more transparent model evaluation.
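For readers who haven’t used vLLM, a minimal offline-inference sketch shows the serving surface those engine improvements feed into. The model id is a placeholder (any HF-compatible checkpoint your build supports works), and the 0.13.0 features above are applied inside the engine rather than through this API:

```python
from vllm import LLM, SamplingParams

# Load a model into the vLLM engine (placeholder checkpoint).
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Batched generation: the engine handles continuous batching,
# KV-cache paging, and attention-kernel selection internally.
outputs = llm.generate(
    [
        "Explain KV-cache paging in two sentences.",
        "What does time to first token measure?",
    ],
    sampling,
)
for out in outputs:
    print(out.outputs[0].text.strip())
```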
## Tutorials & Guides
Hands-on learning resources proliferated. NVIDIA and Unsloth published a practical fine-tuning guide (LoRA, FFT, RL, hardware tips). PatronusAI demystified RL environments with examples, while a widely shared post broke down prompt techniques for context engineering. An 87-page SLM survey mapped the trade-offs in small language models, and a weekly research digest highlighted advances in code, VLM synergy, and transformer modeling. Additional learning drops covered few-shot learning with MAML, a cinematic Veo 3.1 prompt recipe, a clear primer on distillation (with historical context from LUPI’s “teacher” framework), a vLLM deployment recipe for MiMo-V2-Flash, and practical lessons on avoiding “AI failures” caused by missing context or unclear instructions. Talks and threads explored how coding agents reshape workflows and why software reliability still hinges on human oversight and complexity management.
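As a concrete companion to the distillation primer, here is the standard temperature-scaled distillation loss (Hinton-style soft targets blended with the hard-label loss) in PyTorch. This is the generic textbook formulation, not code from the primer itself:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-target KL against the teacher with hard-label CE.

    T softens both distributions; the T*T factor keeps soft-target
    gradients on the same scale as the hard loss (Hinton et al., 2015).
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits over a 10-class vocabulary.
s = torch.randn(4, 10)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y).item())
```

The “teacher/student” framing is exactly the LUPI lineage the primer invokes: privileged information from a stronger model shapes the student’s targets.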
## Showcases & Demos
Demos spanned code, robotics, and media. GLM-4.7, running locally on Apple Silicon, generated a complete Space Invaders game; GPT-5.2-Codex iteratively built a 3D dog-walking sim from reference images; and a developer taught a robot to dance jazz in three days. Robotics progressed as π completed all Robot Olympics tasks and fine-tuned Segment Anything Models accelerated flood mapping. Creative pipelines impressed: generative refocusing changed photo depth after capture, procedural methods brought films natively into VR, and new systems animated any character in any world. 3D-RE-GEN showcased indoor scene reconstruction, EpsteinVR’s JVR offered controversial immersive tours, and community creators produced polished AI anime sequences as tool quality continues to rise.
## Discussions & Ideas
Debates focused on where intelligence comes from and how to deploy it responsibly. Demis Hassabis and Yann LeCun clashed on the nature of general intelligence, while multiple studies examined how reasoning emerges across pre-training, mid-training, and RL, cautioning that skill gains hinge on specific conditions. Researchers warned that retraining on low-quality social content can cause lasting reasoning decay. The browser emerged as a likely “body” for agents, reframing how AI takes real actions online. Verification, not training time, was framed as the new bottleneck; Brandolini’s Law highlighted the uphill battle against AI-generated “slop,” and public sentiment still skews toward hype and mistrust. Agentic systems research pointed to adapting both reasoning and toolsets, moving beyond rigid multi-agent pipelines to flexible workflows. Hiring shifted toward “super-unicorns” who blend engineering, design, and agent integration. Additional threads argued that image+text+knowledge integration is the next model frontier, that scientific and physical intelligence demands richer data (e.g., egocentric human capture), and that even as AI transforms coding, software still fails for familiar human reasons—making precise prompts, visible state, and robust verification critical.