OpenAI Announces New Conversational AI Service 'GPT-4o' with Natural Interaction Capabilities

May 14, 2024, 12:47

On May 13th (local time), OpenAI announced a new interactive generative AI service called "GPT-4o."

Overview of GPT-4o

GPT-4o is now available to subscribers of the paid "ChatGPT Plus" and "ChatGPT Team" plans, and an enterprise version is planned. Free ChatGPT users can also use it, subject to a limit on the number of messages per day.

Naturalness of ChatGPT's Voice and Reactions

The most noteworthy aspect is the naturalness of ChatGPT's voice and reactions. Other companies also offer generative AI services that handle multiple types of input such as images (multimodal), but the demo of ChatGPT running GPT-4o was remarkably natural.

For example, in the iPhone version of the ChatGPT app, you can start a conversation by saying "Hey, ChatGPT." The AI can keep up even if the topic changes mid-conversation, and it also responds appropriately if another person joins the conversation.

User Impressions

Not only was the response speed impressive, but the fluency was also striking. The demo was conducted in English, and the voice sounded almost like a native speaker. It also sang songs on a given theme and went along with unreasonable requests such as "sing like a robot."

These features are useful not only in everyday life but also in business settings. The demo also showed the voice assistant feature being launched in the macOS app and reviewing the source code shown on screen by voice.

Future Development

One downside, however, is that the natural conversation feature shown in the demo is not yet a final release. GPT-4o itself, with its multimodal analysis capabilities, will be rolled out gradually, but the "new Voice Mode" for natural conversation will first be released as an alpha (early test version) for Plus users.

Comparison with Competitors

Impressive as the naturalness and response speed are, other companies such as Google already offer similar features. For instance, on May 2nd Google launched the "Gemini" app with support for languages other than English, including Japanese. The app lets users ask questions by voice, and ask questions or search using their smartphone camera (still images), free of charge.

Movements of Major Companies

OpenAI's announcement came one day before Google's annual developer conference, "Google I/O." Just before OpenAI's event, Google posted a demo on social media of an AI app that analyzes video in real time, a sign of how closely the two companies are competing.

Upcoming events include Microsoft's "Microsoft Build" on May 21st and Apple's "WWDC" on June 10th. While OpenAI has taken the lead with its announcement, it will be interesting to see what competing services the major tech companies will reveal.

#OpenAI #GPT4o #AItechnology
