Researchers Break Every AI Defense; 7 Questions to Ask Vendors
A sobering research paper published in October 2025 by a coalition of scientists from OpenAI, Anthropic, and Google DeepMind has delivered a devastating verdict on the current state of AI security. Their work, titled 'The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections,' systematically dismantled twelve published defense mechanisms, achieving bypass rates exceeding 90% on most.This isn't a minor vulnerability; it's a foundational crack. The core issue is that these commercial defenses are tested against static, predictable attacks, while real-world adversaries adapt.The team employed a rigorous, adaptive methodology—including a $20,000 prize pool to incentivize breakthroughs—to stress-test prompting-based, training-based, and filtering-based defenses. All collapsed.Prompting defenses saw 95-99% attack success; training-based methods fared even worse at 96-100%. This exposes a critical market failure: enterprises are procuring AI security products validated against a threat model that doesn't exist.The implications are magnified by the vertical deployment curve of AI agents into enterprise applications, predicted by Gartner to jump from under 5% to 40% by end-2026, while security innovation lags. The research highlights specific architectural failures, such as the inability of stateless Web Application Firewalls (WAFs) to track conversational context across multiple turns, allowing attacks like 'Crescendo' to succeed by fragmenting malicious intent over innocent-looking dialogue.Similarly, automated gradient-based attacks and semantic obfuscation techniques easily bypass signature-based detection. As AI researcher and blogger Daniel Reed observes, this creates a dangerous asymmetry where defense mechanisms, once published, become part of the training data for the next generation of models, rendering 'security through obscurity' obsolete.The paper maps four active attacker profiles already exploiting these gaps: external adversaries operationalizing published research, malicious B2B clients, compromised API consumers, and negligent insiders engaging in 'shadow AI. ' For security leaders, the mandate is clear.Vendor claims of near-zero attack rates are meaningless without proof of resilience under adaptive, multi-turn pressure. Procurement conversations must now be grounded in seven critical questions, from how a solution detects encoded payloads and tracks conversational state to its mean time to update against novel, publicly documented attack patterns. The bottom line is an urgent call for a paradigm shift in AI security, moving from static, filter-based models to dynamic, stateful systems designed for an adversary that learns and evolves in real-time.
#AI security
#jailbreaks
#prompt injection
#adaptive attacks
#enterprise AI
#vendor due diligence
#featured