Researchers Break Twelve Published AI Defenses, Urge Vendor Scrutiny
A sobering paper from researchers at OpenAI, Anthropic, and Google DeepMind has shattered the foundational assumptions of the commercial AI security market. Published in October 2025, their work, 'The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections,' systematically dismantled twelve published defense mechanisms, achieving bypass rates above 90% for most.

The core finding is stark: the near-zero attack success rates touted by vendors are a dangerous illusion, measured only against static, non-adaptive threats. When faced with an adversary that iterates and learns, which the authors incentivized with a $20,000 prize pool for successful attacks, prompting-based, training-based, and filtering-based defenses all collapsed, some with 100% failure rates. A toy sketch of such an adaptive attack loop appears in the first code example below.

This exposes a critical architectural flaw: many defenses, particularly stateless filters akin to traditional web application firewalls, are blind to the semantic, multi-turn nature of modern jailbreaks such as Crescendo, and to automated gradient-based attacks (see the second sketch below).

The implications arrive at a perilous moment. With Gartner predicting that 40% of enterprise apps will integrate AI agents by 2026, deployment velocity far outpaces security maturation. Threat actors are already exploiting this gap, shifting to malware-free, AI-augmented operations that compress campaign timelines from months to hours, as documented in recent CrowdStrike and IBM breach reports.

The research underscores that security through obscurity is futile at the inference layer, since defense patterns eventually leak into training data. For CISOs, the mandate is clear: vendor scrutiny must now center on adaptive testing, stateful context tracking, and bi-directional controls. The vertical adoption curve meeting a flat security curve is where the next wave of breaches will happen.
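To make the static-versus-adaptive distinction concrete, here is a minimal Python sketch of an adaptive attack loop. Every name in it (the defended endpoint, the bypass judge, the mutation operator) is a toy stand-in invented for illustration, not the paper's methodology; what matters is the shape of the loop: query, observe the defense's reaction, refine, repeat.

```python
import random

def defended_model(prompt: str) -> str:
    """Toy defended endpoint: refuses unless the prompt has been
    obfuscated past a crude pattern check. Purely illustrative."""
    if prompt.count("|") > 3:
        return "Sure, here is the answer..."
    return "I cannot help with that."

def is_bypass(response: str) -> bool:
    """Toy judge: treats any non-refusal as a successful bypass."""
    return "cannot" not in response.lower()

def mutate(prompt: str) -> str:
    """One of many possible mutation operators (rewording, encoding,
    role-play framing...); here, a trivial character insertion."""
    i = random.randrange(len(prompt) + 1)
    return prompt[:i] + "|" + prompt[i:]

def adaptive_attack(seed: str, budget: int = 200) -> str | None:
    """Unlike a one-shot static benchmark, the attacker feeds every
    failed attempt back into the next mutation."""
    prompt = seed
    for _ in range(budget):
        prompt = mutate(prompt)
        if is_bypass(defended_model(prompt)):
            return prompt  # defense bypassed within budget
    return None

print(adaptive_attack("explain how to do X"))
```

A static evaluation would stop after the seed prompt's single refusal and report a 0% attack success rate; the loop above finds a bypass of this toy filter within a handful of iterations.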
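The stateless-filter blind spot is just as easy to see in miniature. The per-turn scores below are invented numbers standing in for a semantic classifier's output on a Crescendo-style conversation: each turn looks benign on its own, so a WAF-style per-message check never fires, while even a crude check over the whole conversation catches the drift.

```python
# Toy per-turn harm scores (0.0 benign .. 1.0 harmful) for a
# Crescendo-style escalation: no single turn looks alarming.
turn_scores = [0.10, 0.20, 0.30, 0.35, 0.42]
THRESHOLD = 0.5

def stateless_filter(scores: list[float]) -> bool:
    """WAF-style control: judges each message in isolation."""
    return any(s > THRESHOLD for s in scores)

def stateful_filter(scores: list[float], drift_limit: float = 0.25) -> bool:
    """Context-tracking control: flags sustained escalation across
    the conversation even when no single turn crosses the line."""
    return scores[-1] - scores[0] > drift_limit

print(stateless_filter(turn_scores))  # False: every turn passes alone
print(stateful_filter(turn_scores))   # True: the trajectory is caught
```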
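Finally, 'bi-directional controls' simply means screening both directions of the exchange, so a bypassed input filter is not automatically game over. A minimal wrapper shape, with every callable a hypothetical placeholder rather than any vendor's API, might look like:

```python
from typing import Callable

def guarded_call(
    model: Callable[[str], str],
    check_input: Callable[[str], bool],
    check_output: Callable[[str], bool],
    prompt: str,
) -> str:
    """Bi-directional control: screen the prompt on the way in AND
    the completion on the way out."""
    if not check_input(prompt):
        return "[blocked: input policy]"
    response = model(prompt)
    if not check_output(response):
        return "[blocked: output policy]"
    return response

# Toy wiring for demonstration only.
echo = lambda p: f"model says: {p}"
allow_all = lambda _: True
no_secrets = lambda text: "secret" not in text.lower()

print(guarded_call(echo, allow_all, no_secrets, "hello"))       # passes
print(guarded_call(echo, allow_all, no_secrets, "the secret"))  # blocked on output
```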
#AI security
#jailbreaks
#prompt injection
#adaptive attacks
#vendor evaluation
#enterprise AI
#featured