New ‘multimodal’ AIs (meaning they handle multiple types of input and output, such as text, images, audio and video) have advanced beyond simply responding to text and can now analyze images and engage in spoken conversation. OpenAI released a multimodal version of its ChatGPT software, powered by the LLM GPT-4, and Google and Meta have incorporated image and audio features into their chatbot models as well. These multimodal AIs can perform a variety of tasks, such as accurately splitting a bar tab from a photo of a receipt or providing detailed descriptions of images.
Multimodal AIs combine language-based neural networks with AI algorithms designed specifically for image or audio analysis, either by stacking the systems or by integrating their code more tightly. Although the exact inner workings of these models are undisclosed, they rely on transformers, a type of neural network architecture that converts inputs into vector data, enabling more humanlike interaction.
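To make the "inputs become vectors" idea concrete, here is a minimal sketch using the open-source CLIP model via the Hugging Face transformers library. It is only an illustration of the general technique (mapping an image and text into a shared vector space and comparing them), not the undisclosed pipeline behind GPT-4, Google's, or Meta's models; the file name and captions are hypothetical.

```python
# Sketch: turning text and an image into comparable vectors with CLIP.
# Illustrates the general idea (inputs -> shared embedding space), not the
# proprietary internals of GPT-4 or other commercial multimodal models.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("receipt.jpg")  # hypothetical input photo
texts = ["a restaurant receipt", "a cat", "a road map"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# The image and each caption are now vectors in the same embedding space;
# a higher score means the caption better matches the image.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(texts, probs[0].tolist()):
    print(f"{caption}: {p:.2f}")
```

Chatbot-style multimodal systems go further by feeding image-derived vectors into a language model so it can reason and converse about them, but the shared vector representation sketched above is the common starting point.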
The whytry.ai article you just read is a brief synopsis; the original article can be found here: Read the Full Article…