Keling O1: China’s AI Video Model Challenges Global Leaders with Multi-Modal Capabilities
The competition in artificial intelligence-powered video generation is heating up, with Kuaishou’s Keling AI launching what it calls the “world’s first unified multi-modal video large model,” Keling O1, on December 1st. The launch follows recent advances from overseas competitors such as Runway, whose Gen-4.5 model surpassed Google’s Veo3, and signals a new phase in the rapidly evolving landscape of AI-driven content creation.
The emergence of powerful AI image generation models, such as Google’s Nano Banana Pro, has fueled a technological surge, culminating in a fierce “big melee” for dominance in video generation. Keling O1 aims to simplify video creation, which has traditionally required a combination of image generation, video generation, and editing software – a workflow often hampered by inconsistencies and the need for repeated regeneration, or what industry users call “card draws.”
Keling O1 distinguishes itself by integrating multiple tasks – reference video integration, video editing, and content transformation – into a single, unified model. According to testing by Daily Economic News reporters, the model offers “full-process semantic control,” allowing users to generate or modify videos with simple, one-sentence commands. This means users can input text, images, or even subject references to guide the creation of detailed video content.
Specifically, the model allows for the upload of up to seven reference images or subjects, enabling users to combine elements like characters, props, and scenes to bring static images to life. Post-generation, users can further refine the video by adding, deleting, or modifying elements and by adjusting styles and colors.
Although high-definition images are recommended for optimal results, the model can independently lock and maintain the characteristics of each character or prop. This capability is particularly useful for creating promotional videos by quickly combining product images and scene elements.
Furthermore, the model supports the combination of different skills, allowing users to simultaneously apply multiple instructions, such as combining reference images with style modifications. These advancements broaden Keling’s potential applications, positioning it as a valuable “productivity tool” for film, advertising, and video post-production.
Despite its advancements, some users have expressed concerns about the cost of using Keling O1. Pricing is based on usage, with rates of 8 “inspiration points” per second for input without video and 12 per second for input that includes video. A one-month Keling gold membership, costing 66 yuan, provides 660 inspiration points. At the 8-point rate, a 5-second high-quality video without video input costs 40 points, so the monthly allowance covers roughly 16 such videos.
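The arithmetic behind these figures can be checked with a short Python sketch. It assumes the quoted rates apply per second of generated video and that a clip is billed as a simple rate-times-duration product. The constant and function names are illustrative only and are not part of any Keling API.

```python
# Figures as quoted in the article; all names here are hypothetical.
RATE_WITHOUT_VIDEO_INPUT = 8    # inspiration points per second
RATE_WITH_VIDEO_INPUT = 12      # inspiration points per second
GOLD_MONTHLY_POINTS = 660       # points included in a one-month gold membership
GOLD_MONTHLY_PRICE_YUAN = 66    # price of that membership


def rate_for(video_input: bool) -> int:
    """Return the per-second point rate for the given input type."""
    return RATE_WITH_VIDEO_INPUT if video_input else RATE_WITHOUT_VIDEO_INPUT


def clip_cost(seconds: int, video_input: bool = False) -> int:
    """Points needed for one clip, assuming cost = rate * duration."""
    return rate_for(video_input) * seconds


def clips_per_month(seconds: int, video_input: bool = False) -> int:
    """Whole clips of the given length covered by the monthly points."""
    return GOLD_MONTHLY_POINTS // clip_cost(seconds, video_input)


def yuan_per_second(video_input: bool = False) -> float:
    """Effective price per second at the gold membership's yuan-per-point ratio."""
    return rate_for(video_input) * GOLD_MONTHLY_PRICE_YUAN / GOLD_MONTHLY_POINTS


if __name__ == "__main__":
    print(clip_cost(5), clips_per_month(5))
    print(clip_cost(5, video_input=True), clips_per_month(5, video_input=True))
    print(yuan_per_second(), yuan_per_second(video_input=True))
```

Under these assumptions, a 5-second clip without video input costs 40 points, so 660 points cover 16 such clips, or 11 clips when video input is used. That works out to about 0.8 yuan per second without video input and 1.2 yuan per second with it.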
Behind the technological advancements, Keling is projecting notable financial growth. The company announced that it expects commercial revenue to exceed 1 billion yuan in 2025. Keling is currently focused on B-side (business) customers, but Kuaishou Technology CEO Cheng Yixiao indicated a future shift towards accelerating C-side (consumer) commercialization by “further productizing Keling’s technical capabilities and combining them with social interaction.” This strategy is partly influenced by the release of OpenAI’s Sora 2, which deeply integrates video generation with social platforms.
Pan Helin, a member of the Information and Communication Economy Expert Committee of the Ministry of Industry and Information Technology, believes content creation platforms will ultimately benefit most from advancements in video generation. “These platforms have the most relevant user groups and the largest user audience,” he stated. “The upgrade in content creation brought about by generative AI will further affect creators and viewers.” He added that creators on platforms like Kuaishou will likely leverage AI tools like Keling to enhance content quality and attract users.
Keling recently launched its video generation model 2.6, which adds synchronized audio and video output, and upgraded its text-to-video and image-to-video functions with audio. These now support both Chinese and English voice generation, with video lengths of up to 10 seconds. The company also unveiled Keling Digital Human 2.0, the next generation of its digital human product, capable of generating videos up to 5 minutes long with improved expressions and precise control of hand gestures and mouth shapes.
As the video generation landscape continues to evolve, the question remains: who will emerge as the dominant force? The competition is fierce, but Keling O1’s multi-modal capabilities and rapid commercialization speed position it as a significant contender in the global arena.
