OpenCV founders launch AI video startup CraftStory
The creators of OpenCV, the foundational computer vision library that has become the de facto standard for developers worldwide with over 84,000 GitHub stars, have re-emerged on the AI scene with a startup that challenges the very architecture of contemporary video generation. CraftStory, launched by Victor Erukhimov, who kept OpenCV maintained and advancing after Intel reduced its support by founding Itseez, later acquired by Intel, has unveiled Model 2.0, a system capable of producing coherent, human-centric videos up to five minutes long. This is a monumental leap beyond current benchmarks: OpenAI's much-hyped Sora 2 is constrained to 25-second clips, while most other models, including Google's Veo, typically generate snippets of ten seconds or less.

The startup's modest $2 million in funding, primarily from Andrew Filev, the founder of Wrike, which Citrix acquired for $2.25 billion, stands in stark contrast to the billions flowing into competing labs, yet the company's technical approach is arguably more sophisticated.

CraftStory's breakthrough hinges on a parallelized diffusion architecture, a fundamental departure from the sequential methods that dominate the field. Where traditional models process video as an expanding three-dimensional volume, demanding rapidly growing data, compute, and parameter budgets for longer durations, CraftStory runs multiple smaller diffusion processes simultaneously across the entire timeline. Bidirectional constraints let later parts of a video influence earlier ones, preventing the accumulation of visual artifacts that plagues autoregressive generation (see the sketch below).

This architectural insight is compounded by a data strategy that prioritizes quality over quantity. Instead of relying solely on internet-scraped videos, the company hired studios to film actors with high-frame-rate cameras, capturing crisp detail in fast-moving elements like fingers and avoiding the motion blur inherent in standard footage. This focus on high-quality proprietary training data challenges the prevailing 'bigger is better' paradigm in large language and multimodal models, suggesting that targeted, expertly curated datasets can yield superior results without exorbitant computational budgets.

Erukhimov's deep background in computer vision, rather than in the transformer architectures that have recently dominated AI, provides a distinct advantage on temporal coherence, facial dynamics, and human motion, problems fundamentally different from text generation. The current Model 2.0 operates as a video-to-video system: it animates a user-uploaded still image by replicating movements from a 'driving video,' with professional actors receiving revenue shares when their motion data is used, and includes advanced lip-sync and gesture alignment to match speech rhythm and emotional tone.

CraftStory's enterprise-focused strategy targets a glaring market gap: the need for long-form, consistent videos for corporate training, software tutorials, and product demonstrations, where a polished ten-second clip is functionally inadequate. Filev envisions a fragmented market ecosystem in which large tech companies serve as API providers for general-purpose models while specialized players like CraftStory build the 'production studio' on top, delivering tangible value in specific verticals.
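To make the parallel-diffusion idea concrete, here is a minimal conceptual sketch. Everything in it is an illustrative assumption rather than CraftStory's implementation: the chunk length, overlap, step count, and toy denoising function are invented, and the "parallel" chunk updates run in a plain loop. What it does show is the mechanism described above: many small diffusion processes denoise overlapping chunks of the timeline at once, and overlapping frames are reconciled at every step so information flows both forward and backward.

```python
# Conceptual sketch of parallelized diffusion over a video timeline.
# All names and the toy denoising step are illustrative assumptions;
# this is not CraftStory's code, just the general idea from the article.

import numpy as np

CHUNK_LEN = 16  # frames per chunk (assumed)
OVERLAP = 4     # frames shared between neighboring chunks (assumed)
STEPS = 50      # denoising steps (assumed)


def toy_denoise(latents: np.ndarray, step: int) -> np.ndarray:
    """Stand-in for one denoising step of a small diffusion model."""
    # A real model would predict and subtract noise; here we just damp it.
    return latents * (1.0 - 1.0 / (STEPS - step + 1))


def generate_video(num_frames: int, latent_dim: int = 8) -> np.ndarray:
    # One latent vector per frame, initialized to pure noise.
    latents = np.random.randn(num_frames, latent_dim)

    # Chunk start positions with overlap, spanning the whole timeline.
    starts = list(range(0, num_frames - OVERLAP, CHUNK_LEN - OVERLAP))

    for step in range(STEPS):
        # 1) Denoise every chunk "in parallel" (a real system would
        #    dispatch these to separate devices or streams).
        denoised = {s: toy_denoise(latents[s:s + CHUNK_LEN], step)
                    for s in starts}

        # 2) Bidirectional reconciliation: frames covered by several
        #    chunks are averaged, so a late chunk pulls earlier frames
        #    toward consistency with it, and vice versa.
        acc = np.zeros_like(latents)
        count = np.zeros((num_frames, 1))
        for s, chunk in denoised.items():
            n = len(chunk)
            acc[s:s + n] += chunk
            count[s:s + n] += 1
        latents = acc / np.maximum(count, 1)

    return latents


frames = generate_video(num_frames=120)
print(frames.shape)  # (120, 8): one denoised latent per frame
```

The reconciliation step is what makes the constraints bidirectional: averaging each overlap pulls the tail of one chunk and the head of the next toward agreement at every step, instead of letting errors compound front to back the way frame-by-frame autoregressive generation does.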
The roadmap includes a direct text-to-video model and support for moving-camera scenarios, pushing further into a niche that plays to the company's core competency: understanding human movement and narrative consistency rather than raw, undirected generative power.
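As a final illustration, here is roughly what the Model 2.0 video-to-video workflow described above might look like from a user's side. Every class, function, and field name below is hypothetical, invented for this sketch; CraftStory has not published a public API, and none of these identifiers come from the company.

```python
# Hypothetical sketch of a video-to-video animation request: a still image
# is animated with motion transferred from a "driving video" of an actor.
# All names here are invented for illustration, not a real API.

from dataclasses import dataclass


@dataclass
class AnimationJob:
    source_image: str   # still image of the subject to animate
    driving_video: str  # actor footage supplying motion and gestures
    audio_track: str    # speech the lip-sync should follow
    actor_id: str       # identifies the actor for motion-data revenue share


def submit(job: AnimationJob) -> None:
    # A real system would upload the assets and poll for the render;
    # this stub only echoes the inputs the article says the model takes.
    print(f"Animating {job.source_image} with motion from {job.driving_video}")
    print(f"Aligning lips and gestures to {job.audio_track}")
    print(f"Crediting motion data to actor {job.actor_id}")


submit(AnimationJob(
    source_image="presenter.png",
    driving_video="actor_take_03.mp4",
    audio_track="training_script.wav",
    actor_id="actor-0042",
))
```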
#CraftStory
#AI video generation
#long-form video
#computer vision
#parallel diffusion
#enterprise AI
#OpenAI Sora
#Google Veo
#featured