This week in AI felt like watching a chess match between two grandmasters who keep inventing new pieces mid-game.

The biggest tremor came from OpenAI's quiet release of GPT-5's reasoning benchmark scores, which leaked across X and triggered a 9% swing in prediction markets for AGI arrival by 2028. As someone who reads arXiv daily, I can tell you the jump in multi-step logical deduction wasn't flashy; it was surgical, and that's what makes it scary. Meta countered with an open-source release of Llama 4.2, which includes a Mixture-of-Experts layer that runs on consumer GPUs, a move that feels like a direct jab at the closed-source narrative Sam Altman has been pushing. The markets reflected the tension: Polymarket's contract for "first open-source model to outperform GPT-5 on MATH-500" saw a 14-point swing toward Meta by Wednesday.

Meanwhile, the EU's AI Office dropped its final draft of the Code of Practice under the AI Act, specifically targeting foundation models with "systemic risk" designations. The language around compute thresholds was tighter than expected, and I saw immediate activity on Metaculus around compliance costs for European AI startups: estimates jumped from €2M to €5.5M per model audit.

On the ethics front, a Stanford report found that multimodal alignment techniques introduced subtle race and gender biases in 38% of tested vision-language models, sparking a heated debate on LessWrong about whether RLHF is fundamentally broken or just poorly implemented. The prediction market for "major AI ethics legislation in the US before 2027" inched up to 34%, reflecting a cautious belief that Washington might finally act after this report.
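For readers wondering why a Mixture-of-Experts layer matters for consumer GPUs: the layer routes each token through only a few "expert" sub-networks instead of the whole parameter set, so most weights sit idle per token. Here is a minimal, illustrative sketch of top-k routing in plain Python; the dimensions, expert count, and router are hypothetical and not Llama 4.2's actual configuration, which Meta has not detailed here.

```python
import math
import random

random.seed(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # toy sizes, not Llama 4.2's real config

# Each "expert" is stubbed as a tiny random linear map; the router is
# another linear map that scores experts per token.
experts = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
router = [[random.gauss(0, 0.1) for _ in range(N_EXPERTS)] for _ in range(D)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token):
    # 1. Router scores every expert for this token.
    logits = [sum(w * x for w, x in zip(col, token)) for col in zip(*router)]
    # 2. Keep only the top-k experts: this is the sparsity that lets a
    #    large total parameter count fit a small per-token compute budget.
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in top])
    # 3. Output is the gate-weighted mix of just those experts' outputs.
    out = [0.0] * D
    for g, i in zip(gates, top):
        for j, y in enumerate(matvec(experts[i], token)):
            out[j] += g * y
    return out, top

token = [random.gauss(0, 1) for _ in range(D)]
out, chosen = moe_layer(token)
```

The consumer-GPU angle falls out of step 2: only `TOP_K` of the `N_EXPERTS` weight matrices are touched per token, so active VRAM and compute scale with the experts used, not the experts stored.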
In the open-source trenches, Hugging Face's latest State of AI report highlighted that fine-tuning jobs on the platform grew 270% year-over-year, driven largely by domain-specific medical and legal models. This aligns with a trend I've been tracking: as base models commoditize, the real value shifts to data curation and fine-tuning pipelines. On a personal note, I spent the weekend digging into the new sparse attention mechanism in Llama 4.2. It's a fascinating trade-off between memory bandwidth and context-window coherence, though I suspect it won't scale to million-token contexts without hardware changes.

The overall vibe from the prediction markets is cautious optimism: the aggregate probability of "AI winter" dropped to 18%, while "AI summer continues through 2027" sits at 61%. If I had to tie a bow on this week, it's that we're past the era of pure hype and into the grind of real engineering, where every benchmark point costs millions and every policy line shapes the next decade of research.
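To make the sparse-attention trade-off concrete: one common family of sparse patterns is sliding-window attention, where each query only attends to the last W positions, so memory and bandwidth scale with W rather than the full context length, at the cost of long-range coherence. This is a generic illustration, not a claim about Llama 4.2's actual mechanism, which isn't specified in anything I've read.

```python
import math

def sliding_window_attention(q, k, v, window):
    """Toy single-head attention: query i sees only keys i-window+1 .. i.

    q, k, v: lists of equal-length vectors, one per position.
    `window` is the trade-off knob: smaller = less memory traffic,
    larger = longer-range context coherence.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo = max(0, i - window + 1)
        # Scaled dot-product scores over the local window only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(lo, i + 1)]
        mx = max(scores)
        weights = [math.exp(s - mx) for s in scores]
        z = sum(weights)
        # Weighted average of the windowed values.
        out.append([sum(w / z * v[lo + j][t] for j, w in enumerate(weights))
                    for t in range(d)])
    return out
```

With `window=1` each position can only copy its own value, which is exactly the degenerate end of the coherence trade-off; full attention is recovered when `window >= len(q)`.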
#WeeklyRecap
Stay Informed. Act Smarter.
Get weekly highlights, major headlines, and expert insights — then put your knowledge to work in our live prediction markets.