OpenAI has officially launched Advanced Voice Mode with vision for ChatGPT, a feature first demonstrated nearly seven months ago. The announcement came during a livestream event on Thursday.
With this new functionality, users subscribed to ChatGPT Plus, Team, or Pro can utilize their mobile devices to point at objects and receive near real-time responses from ChatGPT. The Advanced Voice Mode can also interpret what is visible on the screen through screen sharing, providing assistance with various tasks, such as explaining settings menus or offering help with math problems.
To enable the feature, users tap the voice icon next to the chat bar in the ChatGPT app, then tap the video icon at the bottom left to activate video mode. Screen sharing is accessed by opening the three-dot menu and tapping “Share Screen.”
The rollout of Advanced Voice Mode with vision begins immediately and is expected to conclude within a week, though not all users will gain access right away. Subscribers to ChatGPT Enterprise and Edu will have to wait until January, while no timeline has been set for users in the EU, Switzerland, Iceland, Norway, or Liechtenstein.
In a demonstration shown during a segment on CBS’s “60 Minutes,” OpenAI President Greg Brockman showcased the capabilities of Advanced Voice Mode with vision by having ChatGPT quiz host Anderson Cooper on his anatomy knowledge. As Cooper drew body parts on a blackboard, ChatGPT recognized and commented on the drawings.
While the demonstration was impressive, the feature also showed its limits, making a mistake on a geometry problem — a reminder that its responses can be inaccurate.
The release comes after several delays: OpenAI originally promised the feature much earlier this year but needed additional time for refinement. Competitors like Google and Meta are developing similar functionality for their chatbots, and Google recently made its real-time, video-analyzing feature, Project Astra, available to select testers on Android.
In addition to these advancements, OpenAI also introduced a new “Santa Mode,” allowing users to engage with ChatGPT using a preset voice resembling Santa Claus, adding a playful and festive touch to user interactions.
The addition of vision marks a significant step for ChatGPT, moving conversational AI toward more immersive, real-time interactions with the world around its users.