## News / Update
The week brought a flurry of industry moves and evaluation initiatives. Google DeepMind is opening a Gemini research hub in Singapore, and funding continues to shape the ecosystem, with Google’s $10M gift establishing the Geoff Hinton AI Chair at the University of Toronto. Enterprises are scaling agents in production: Brazil’s Itaú Unibanco rolled out the Devin agent across its SDLC, and major insurers report heavy use of automation to cut operational overhead. Perplexity attracted investment from Cristiano Ronaldo, Meta’s early GPU bet is paying dividends post-Metaverse pivot, and Stripe now powers payments inside Amazon’s Kiro AI IDE. On measurement, NIST and the US CAISI published frameworks pushing rigorous, construct-valid evaluations, while major labs launched the AI Evaluator Forum for independent testing; new benchmarks such as Global MMLU 2.0 and OlmOCR-Bench broaden multilingual and document-understanding assessment. NeurIPS highlights included cross-organization insights (Google, Anthropic, Qwen), Best Paper honors for Artificial Hivemind, and Kaiming He’s Test of Time reflections. On the security front, ICLR 2026 disclosed a breach, and a coordinated X/Twitter crypto hack hit several accounts. Market signals show Gemini’s user base rising since Gemini 3 even as ChatGPT remains larger; some sites report that ChatGPT referrals convert better than those from Google. Anthropic released a dataset of 1,250 work interviews on real AI usage, Liquid AI launched Liquid Labs for efficient/adaptive systems, and a new search engine for economic data debuted. Adjacent tech updates included record-detail commercial satellite imagery.
## New Tools
A stream of creative and developer tools landed. Runway Gen-4.5 and Kling Avatar 2.0 expand video and avatar generation with more expressive, longer clips and faster turnarounds, while Z-Image Turbo and ByteDance’s Seedream 4.5 raise the bar for image generation and editing. PosterCopilot targets professional graphic design with layout reasoning and editable controls; MagicPath now supports image-on-canvas workflows for visual coding; and an upgraded full-text search engine delivers up to 20x speedups. Teams can now automate bug logging from Slack to Linear via an AI agent and delegate Codex cloud tasks directly from Linear, while Base_44 enables “import any NPM package by prompt.” Microsoft introduced VibeVoice-Realtime-0.5B for real-time voice generation. New platforms surfaced for CI-style data pulls by chat, along with a specialized engine for official economic data. Seasonal touches included a more realistic, full-body AI “Santa.” Many of these tools emphasize reliability and workflow fit, with LangChain’s new retry middleware helping agents recover gracefully from flaky tools.
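To make the retry idea concrete, here is a minimal sketch of retrying a flaky tool call with exponential backoff. It is not LangChain’s middleware API: the function `call_with_retries` and its parameters (`max_attempts`, `base_delay`, `sleep`) are hypothetical names chosen for illustration, and the backoff policy is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    tool: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call tool(); on failure, wait and retry with exponential backoff.

    Hypothetical helper for illustration only. The delay doubles after each
    failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

An agent loop could wrap each tool invocation in such a helper so that a transient failure costs a short delay instead of ending the run. A production version would typically retry only on errors known to be transient and add jitter to the delay.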
## LLMs
Model releases and scoreboards kept shifting. Google’s Gemini 3 Deep Think mode drives stronger reasoning by exploring multiple hypotheses in parallel, with Gemini 3 Pro also topping Yupp’s SVG generation board. Mistral 3 introduced three compact models plus the Large 3 MoE, and Mistral Large 3 now leads open-source coding leaderboards, with local support rolling out via Ollama. Anthropic’s Claude Opus 4.5 set the pace on AutoCodeBench V2 and solved CORE-Bench for scientific reproducibility tasks, signaling progress in agent-like competence. OpenAI’s GPT-5.1 Codex Max arrived in the API and “Code Arena,” positioning itself for high-agency coding workflows. DeepSeek v3.2 posted big latency and throughput gains alongside more constructive disagreement styles. Multimodal advances included Meituan’s OneThinker for unified visual reasoning, Meta and KAUST’s MoS (Mixture of States) to better fuse diffusion dynamics with text, and new image/video leaders like Nano Banana Pro and Seedream 4.5. Evaluation infrastructure also expanded with Global MMLU 2.0 for multilingual testing and OlmOCR-Bench for document understanding. New entrants such as APT (Agentic Pretrained Transformer) target reduced hallucinations, while Microsoft’s VibeVoice brings real-time voice modeling into the mix.
## Features
Several products shipped meaningful upgrades. Gemini 3’s Deep Think mode is now live for Ultra users, offering iterative, parallel reasoning for complex math, science, and coding, with early creative coding examples already appearing. Mistral Large 3 support is landing in Ollama for local experimentation. Cursor redesigned its model picker to simplify selection, ChatGPT updated its browsing to include Grokipedia as a source, and Stripe’s MCP enables payment flows directly within Amazon’s Kiro AI IDE. LangChain added automatic retries and middleware for more resilient agents, while engineering workflows benefited from agents that file and manage Linear issues from Slack and a Codex integration that executes tasks from within Linear. A high-performance full-text search engine update promises large speed gains, and MagicPath’s canvas now supports images for code-driven visual creation.
## Tutorials & Guides
Hands-on learning took center stage. Andrew Ng announced a practical course on building coding agents with tool use, and a joint course with CrewAI focuses on designing and deploying collaborative multi-agent systems. The OSS AI Summit will demo real agent workflows with LangChain and MCP, while a fireside chat with leaders behind coding agents offers UX and product lessons. Weaviate published nine new recipes for builders, and OpenAI released a prompting guide to get the most from GPT-5.1 Codex Max.
## Showcases & Demos
Long-horizon, reliable control and agency were on display. X-VLA folded cloth flawlessly for two hours in an uncut demo, with checkpoints available for fine-tuning; Microsoft’s Copilot in “Agent Mode” went head-to-head with Excel world champions; and live terminals streamed model tokens during candid discussions. Gemini 3 Deep Think produced complex creative coding from a single prompt, while creators demonstrated accelerated pipelines for full-length AI anime, highlighting how improved image/video models can compress production timelines from weeks to days. Upgraded seasonal avatars showcased more emotive, full-body performance for consumer experiences.
## Discussions & Ideas
Debate centered on what makes progress meaningful and measurable. Yejin Choi’s NeurIPS keynote critiqued sloppy synthetic data and highlighted how certain RL fine-tuning methods can degrade reasoning, underscoring the need for better training signals; her talk also spotlighted approaches like Quiet-STaR and collaborative thought processes. Researchers and standards bodies warned of leaderboard pitfalls and called for construct-valid evaluations, while workshops argued that memorization-driven privacy fears are often overstated and that optimizing solely for math/coding doesn’t translate to human-centric utility. Broader reflections covered AGI as the capacity to acquire arbitrary new skills, the resurgence of semantic code search via multi-vector and token-level embeddings, and using filesystems to tame context management. Discussions also probed whether open-licensed corpora can sustain competitive foundation models, “Nested Learning” as a more efficient training paradigm, and the practical limits of AI lie detection after months of negative results. Commentary ranged from Jeff Dean’s defense of publishing Transformers to industry analyses on AV safety reform and Meta’s GPU “windfall” reshaping its AI trajectory.
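As a rough illustration of the multi-vector, token-level approach to code search mentioned above, the sketch below scores a document with late-interaction MaxSim in the style of ColBERT: each query token vector is matched to its best document token vector, and the per-token maxima are summed. The function names and the toy 2-d vectors are hypothetical; a real system would get the embeddings from a trained encoder and would usually normalize them.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    if not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector], docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return document names ordered from highest to lowest MaxSim score."""
    return sorted(docs, key=lambda name: maxsim_score(query_vecs, docs[name]), reverse=True)


# Toy example with made-up 2-d token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "covers_both_tokens": [[1.0, 0.0], [0.0, 1.0]],
    "covers_one_token": [[1.0, 0.0]],
}
ranking = rank(query, docs)
```

Because every query token has to find its own match, a snippet that covers only part of the query scores lower than one that covers all of it, which a single pooled vector per document can blur.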
## Memes & Humor
No notable memes or humor surfaced in this cycle.