Google unveiled its latest video-generating AI model, Veo 3, at the 2025 Google I/O developer conference. The model generates not only high-quality video footage but also synchronized audio, including sound effects, background noise, and dialogue. Google says it also improves on its predecessor, Veo 2, in the quality of the footage it produces.
Veo 3 is available to subscribers of Google’s $249.99-per-month AI Ultra plan through the Gemini chatbot app. The model accepts text or image prompts, letting users describe characters and settings and even dictate how dialogue should sound. Demis Hassabis, CEO of Google DeepMind, described the release as a move away from “the silent era of video generation,” pointing to the more immersive viewing experience that generated audio makes possible.
The market for video generation tools is increasingly crowded, with startups and tech giants, including OpenAI and Alibaba, racing to release similar technologies. Veo 3 aims to distinguish itself through its audio capabilities: it analyzes the raw pixels of its generated video and automatically syncs generated sounds to the clip, a feature that may set it apart from competitors.
Veo 3’s audio capabilities likely build on DeepMind’s earlier work in “video-to-audio” AI, in which models were trained to create soundtracks for videos from a combination of sounds, dialogue, and video clips. Google has not disclosed the exact content used to train Veo 3, but YouTube, which Google owns, is a plausible source of training data.
The development holds promise for industries from filmmaking to content creation, where such tools could support creativity and productivity. As the technology evolves, integrating audio with video generation could open new opportunities for storytelling and user engagement.