OpenAI has introduced real-time video capabilities for ChatGPT, nearly seven months after first demonstrating them. During a livestream on Thursday, the company showed that Advanced Voice Mode now includes visual understanding: users with ChatGPT Plus, Team, or Pro subscriptions can point their smartphones at objects and receive near-real-time responses from ChatGPT.
The upgrade also lets ChatGPT understand and describe what is visible on a device’s screen through screen sharing, so it can, for example, explain a settings menu or help with a math problem.
To use Advanced Voice Mode with vision, tap the voice icon next to the ChatGPT chat bar, then tap the video icon in the bottom left corner to start video interaction. Screen sharing can be started from the three-dot menu.
The rollout began Thursday and is expected to be completed within a week, but not all users will get access right away: ChatGPT Enterprise and Edu subscribers will have to wait until January, and there is currently no timeline for availability in the EU, Switzerland, Iceland, Norway, or Liechtenstein.
In a recent demonstration on CBS’s “60 Minutes,” OpenAI President Greg Brockman had Advanced Voice Mode with vision quiz Anderson Cooper on anatomy, and it accurately recognized Cooper’s drawings. While its capabilities were impressive, it faltered at times, such as making an error on a geometry problem, suggesting it remains prone to inaccuracies.
Advanced Voice Mode was delayed repeatedly, in part because the feature was announced before it was ready for production. Although initially slated for a swift rollout, its visual components took more time to perfect.
As OpenAI continues to innovate, competitors like Google and Meta are also working on similar technology. Google has recently made strides with its own advanced AI feature, Project Astra, which is currently in testing for Android users.
In addition to the new visual capabilities, OpenAI has launched a festive “Santa Mode,” allowing users to hear responses in Santa’s voice by selecting the snowflake icon in the ChatGPT app.
Adding visual processing to conversational AI is a notable step for chatbot interaction, pointing toward exchanges that feel more natural: the model can now respond to what a user sees, not just what they type or say.
Summary: OpenAI has launched real-time video capabilities in ChatGPT’s Advanced Voice Mode, enabling interaction through visual recognition. ChatGPT Plus, Team, and Pro subscribers get the feature first, and a festive “Santa Mode” has also been introduced. Despite earlier delays, the launch underscores OpenAI’s continued push in conversational AI as companies like Google and Meta develop similar technology.