OpenAI, the leading artificial intelligence research laboratory, has released GPT-4o (the "o" stands for "omni"), an upgraded version of the model behind its popular ChatGPT chatbot. GPT-4o is a multimodal AI capable of understanding and processing text, images, video, and audio. The upgrade brings GPT-4-level reasoning and natural language capabilities to the free tier of ChatGPT for the first time.
The new model is available in ChatGPT through chatgpt.com and a macOS desktop app. The rollout is happening gradually in batches, so users gain access to the enhanced features over time.
Microsoft has also introduced GPT-4o as a flagship multimodal model on Azure AI, with text, vision, and audio capabilities available in preview on the Azure OpenAI Service. The integration lets applications handle multimodal inputs through a single model, enabling a richer user experience.
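As a rough illustration, a developer could reach a GPT-4o deployment on Azure OpenAI Service through the official openai Python package's Azure client. The endpoint, key variable, API version, and deployment name below are placeholders for this sketch, not values from the announcement:

```python
import os

from openai import AzureOpenAI  # pip install openai>=1.0

# Placeholder endpoint, API version, and deployment name -- substitute
# the values from your own Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# Azure routes requests by deployment name rather than raw model name.
response = client.chat.completions.create(
    model="my-gpt-4o-deployment",
    messages=[{"role": "user", "content": "Summarize GPT-4o in one sentence."}],
)
print(response.choices[0].message.content)
```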
The initial release of GPT-4o focuses on text and image inputs, with audio and video support to follow. Possible use cases for the model include enhanced customer service, advanced analytics, and content creation.
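To show what a combined text-and-image request looks like in practice, here is a minimal sketch using the OpenAI Python SDK's chat completions endpoint; the prompt and image URL are made-up examples:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single user message can mix text and image parts; the URL below is
# a made-up example, not a real asset.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what this chart shows."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sales-chart.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```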
OpenAI CEO Sam Altman compared the new model's conversational abilities to those of the AI assistant voiced by Scarlett Johansson in the film 'Her.' However, it is important to remember that while GPT-4o can mimic human conversation and emotional expression, it does not possess true consciousness or feelings.
Google and OpenAI are also competing to combine a ChatGPT-style assistant with web search. Shivakumar Venkataraman, a veteran vice president from Google Search, has joined OpenAI to lead its search efforts.
It is crucial for journalists to remain unbiased when reporting on AI advancements. The potential implications of these technologies are vast and can significantly impact various aspects of our lives. By providing factual information and avoiding sensationalism or speculation, we can help ensure that the public has a clear understanding of the current state and future possibilities of AI.