ChatGPT Can Now Watch and Interact With You in Real Time
OpenAI finally rolls out ChatGPT's video capabilities after a seven-month wait, just as Google debuts Project Astra and Meta pushes its own AI assistant.
OpenAI took the wraps off ChatGPT's long-promised video capabilities Thursday, letting users point their phones at objects for real-time AI analysis—a feature that's been gathering dust since its first demo in May.
Previously, you could interact with ChatGPT using text, charts, voice, or still photos. This feature, released late Thursday, allows ChatGPT to watch you in real time and provide conversational feedback. For instance, in my tests, this mode was able to solve math problems, suggest recipes, tell stories, and even become my daughter’s new best friend, interacting with her while making pancakes, offering suggestions, and encouraging her learning through different games.
The release comes just a day after Google showed its own take on a camera-enabled AI assistant powered by the newly minted Gemini 2.0. Meta's been playing in this sandbox too, with its own AI that can see and chat through phone cameras.
ChatGPT's new tricks aren't for everyone though. Only Plus, Team, and Pro subscribers can access what OpenAI calls "Advanced Voice Mode with vision." The Plus subscription costs $20 a month, and the Pro tier costs $200.
“We're excited to announce that we're bringing video to Advanced voice mode so you can bring live video and also live screen sharing into your conversations with ChatGPT,” Kevin Weil, OpenAI’s Chief Product Officer, said in a video Thursday.
The stream was part of the company's “12 Days of OpenAI” campaign, which will feature 12 announcements over as many consecutive days. So far, OpenAI has launched its o1 model for all users, unveiled the $200-per-month ChatGPT Pro plan, introduced reinforcement fine-tuning for customized models, released its generative video app Sora, updated its canvas feature, and brought ChatGPT to Apple devices via the tech giant's Apple Intelligence feature.
The company gave a peek at what the feature can do during Thursday's livestream. Users can activate video mode from the same interface as advanced voice mode and start interacting with the chatbot in real time. The chatbot shows strong visual understanding and provides relevant feedback with low latency, making the conversation feel natural.
Getting here wasn't exactly smooth sailing. OpenAI first promised these features "within a few weeks" in May, but the release was postponed following controversy over advanced voice mode mimicking actress Scarlett Johansson's voice without her permission. Since video mode relies on advanced voice mode, that apparently slowed the rollout.
And rival Google is not sitting idle. Project Astra just landed in the hands of "trusted testers" on Android this week, promising a similar feature: an AI that speaks multiple languages, taps into Google's search and maps, and remembers conversations for up to 10 minutes.