At Mobile World Congress (MWC) 2024 in Shanghai, China, one attraction is a 3D display showcased at the Nubia Global booth. Among the event's many announcements, however, a standout comes from ByteDance, the company behind TikTok, which has officially launched an AI voice assistant for smartphones.
The assistant is built on ByteDance’s Doubao large language model and operates with considerable autonomy, behaving more like a personal assistant than a standard app. Its capabilities recall the AI in the film “Her”: it can open tabs, book tickets, and search for information across the device.
The assistant will debut on the Nubia M153 smartphone, though availability will be limited. According to Eastmoney, a Chinese financial news platform, ByteDance plans to license the technology to other Chinese smartphone manufacturers.
Because the Doubao model operates at the system level, the assistant can interact directly with the screen and with apps. This lets it handle tasks such as organizing files, filling out forms, and suggesting restaurants that fit the user’s budget and preferences. The assistant also has a memory feature that stores information such as meeting notes or personal preferences and turns it into reminders or to-do lists. Users can then ask specific questions, such as which seat they had on a previous train or where their favorite cafés near work are.
The Doubao assistant marks a clear step beyond older voice technologies, which often suffered from lag and flat, emotionally limited responses. Rather than the traditional pipeline of recording speech, converting it to text, and sending that text to a server, ByteDance’s technology uses a speech-to-speech system. This enables faster exchanges and lets users interrupt the assistant mid-response, much as they would in a natural human conversation.
According to the Guangdong Yangcheng Evening News, the latest versions of Doubao’s voice capabilities are nearly indistinguishable from human speech in realism and emotional expression. ByteDance acknowledges that the technology is still in beta and needs further testing, but these developments suggest that AI assistants capable of conversing with real understanding and emotional nuance are drawing closer.
The unveiling reflects a broader ambition shared by tech hubs such as San Francisco, Beijing, Seoul, and Cupertino, where engineers are working to build AI that can listen, understand, and act naturally without the user having to navigate the underlying software. In the U.S., tools such as OpenAI’s GPT-4o and Google’s Gemini Live already support real-time conversation. What sets Doubao apart is that it is designed specifically for the Chinese market and embedded in a domestically produced smartphone, a significant step in integrating AI into consumer technology in China.
