Baidu claims new open-source AI model beats GPT-5 and Gemini.


In a development that could significantly alter the competitive dynamics of the global AI landscape, Baidu has unveiled ERNIE-4.5-VL-28B-A3B-Thinking, an open-source multimodal model that makes the audacious claim of outperforming industry titans such as OpenAI's GPT-5 and Google's Gemini 2.5 Pro on specific vision-language benchmarks. This is more than an incremental update; it is a strategic bet on architectural efficiency.

The model's core innovation lies in its Mixture-of-Experts (MoE) design, which holds 28 billion parameters in total but routes each input through only 3 billion active parameters. This selective activation is akin to consulting a small team of specialists for a problem rather than mobilizing an entire army, and it sharply reduces the compute needed for each input.

Baidu's documentation asserts that the model can run on a single 80GB GPU, hardware that is now widely available in enterprise data centers. That lowers the barrier to entry for powerful multimodal AI in a way that models requiring multi-GPU, six-figure deployments cannot match.

The technical underpinnings are fascinating. The 'Thinking with Images' feature is a genuine departure from conventional vision-language models, which typically process an image at a single fixed, often suboptimal resolution. By letting the AI dynamically zoom in and out of an image to examine details, it mimics a fundamental human problem-solving behavior. This capability, especially when combined with external tools such as image search, suggests applications in industrial quality control, where detecting a microscopic crack is critical, and in complex document analysis, where a tiny footnote in a schematic can change the meaning of the whole page.
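
To make the routing idea concrete, here is a minimal Python sketch of generic top-k Mixture-of-Experts gating, a common MoE pattern. It is not Baidu's code: the class name `ToyMoELayer`, the eight experts, `top_k=2`, and the 4-dimensional inputs are illustrative assumptions, far smaller than ERNIE's real configuration, which this sketch does not claim to reproduce. A router scores every expert, but only the highest-scoring experts actually run for a given input.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical top-k MoE layer with toy sizes (not ERNIE's real configuration)."""

    def __init__(self, dim=4, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        self.dim, self.num_experts, self.top_k = dim, num_experts, top_k
        # Router: one score vector per expert (a linear gating layer).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: each one is a dim x dim linear map.
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, x):
        """Score all experts, keep the top_k, and softmax over just those."""
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        ranked = sorted(range(self.num_experts), key=lambda i: scores[i], reverse=True)
        top = ranked[: self.top_k]
        gates = softmax([scores[i] for i in top])
        return top, gates

    def forward(self, x):
        """Gate-weighted sum of the chosen experts' outputs; the rest never run."""
        top, gates = self.route(x)
        out = [0.0] * self.dim
        for idx, gate in zip(top, gates):
            weights = self.experts[idx]
            y = [sum(weights[r][c] * x[c] for c in range(self.dim))
                 for r in range(self.dim)]
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, top

    def expert_param_counts(self):
        """(total expert parameters, expert parameters used per input)."""
        per_expert = self.dim * self.dim
        return per_expert * self.num_experts, per_expert * self.top_k


layer = ToyMoELayer()
token = [1.0, -0.5, 0.25, 2.0]
output, chosen = layer.forward(token)
total, active = layer.expert_param_counts()
print("experts used for this input:", chosen)
print(f"expert params used: {active} of {total}")  # 32 of 128 with the defaults
```

With the toy defaults, each input touches 32 of 128 expert weights, a quarter of the total; ERNIE's reported 3 billion active out of 28 billion is about 11 percent. Routing saves compute, not storage: all 28 billion parameters must still sit in memory. As rough arithmetic, 28 billion parameters at 2 bytes each (16-bit precision) come to about 56 GB of weights before activations and caches, which makes the single-80GB-GPU claim plausible under that precision assumption.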
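
The 'Thinking with Images' behavior can be illustrated with a toy crop-and-resize experiment. This is a minimal sketch under stated assumptions, not Baidu's implementation: the helpers `resize_nearest` and `zoom`, the 64x64 synthetic image, and the hypothetical 8x8 model input resolution are all invented for illustration. It shows why a model limited to one fixed resolution can miss a small defect that becomes obvious once it zooms into the right region.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D grayscale image stored as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def zoom(img, box, out_size):
    """Hypothetical 'zoom in' step: crop box = (top, left, bottom, right) in pixels,
    then resize the crop to the model's fixed out_size x out_size input."""
    top, left, bottom, right = box
    crop = [row[left:right] for row in img[top:bottom]]
    return resize_nearest(crop, out_size, out_size)


# 64x64 blank "part" with a 1-pixel-high defect on row 37, columns 40..47.
SIZE, INPUT = 64, 8  # INPUT: hypothetical fixed model input resolution
image = [[0] * SIZE for _ in range(SIZE)]
for c in range(40, 48):
    image[37][c] = 255

whole_view = resize_nearest(image, INPUT, INPUT)     # entire image at fixed resolution
zoomed_view = zoom(image, (32, 40, 40, 48), INPUT)   # 8x8 region around the defect

print("defect visible in whole view:", any(255 in row for row in whole_view))
print("defect visible in zoomed view:", any(255 in row for row in zoomed_view))
```

Here the whole-image view samples only every eighth row, so the defect on row 37 disappears entirely, while the zoomed crop keeps it at full intensity. Nearest-neighbour resampling exaggerates the effect; a box-averaging downsample would instead shrink the defect to a faint patch at one eighth of its original intensity, which is still easy to overlook.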

Furthermore, its release under the permissive Apache 2.0 license is a deliberate and savvy move. It contrasts sharply with the more guarded, often commercially restrictive approaches of its Western counterparts, and it is likely to encourage rapid adoption and experimentation within the global developer community.

However, in the rigorous world of AI research, claims of benchmark superiority must be met with healthy skepticism until they are independently validated. Baidu's technical report describes an 'extensive mid-training phase' on a 'diverse corpus of premium visual-language reasoning data,' and the AI community will now subject these claims to intense scrutiny. The real test will be the model's performance on novel, real-world enterprise tasks beyond curated benchmarks.

Does this signal a broader trend in which model efficiency and clever architecture begin to matter more than sheer scale and parameter count? If Baidu's claims hold, they could force a recalibration across the field, showing that the path to advanced AI need not be paved exclusively with ever-larger models but also with smarter, more resource-conscious designs. The release is a clear declaration that Chinese tech giants are not just competing domestically but are positioning themselves to challenge the global AI infrastructure market with open-source, commercially viable alternatives.

#Baidu

#ERNIE

#multimodal AI

#open-source

#computer vision

#enterprise AI

