
[Matthew's AI News] Read the English commentary in Japanese [September 30, 2023 | @Matthew Berman]

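OpenAI released DALL-E 3, brought web browsing back to ChatGPT, and added multimodal capabilities. Sam Altman jokingly announced that AGI had been achieved. Rumor has it that Jony Ive and Sam Altman are working on an AI iPhone. Tesla's Optimus robot can now calibrate itself. Meta released AI-enabled sunglasses in partnership with Ray-Ban, along with Meta AI, AI characters, and its AI art generation product Emu. Microsoft built the AI assistant Copilot into Windows 11. SpaceX won a satellite communications contract with the US Space Force. The CIA is developing an AI tool to counter China. Google and IBM are investing in quantum computing. YouTube added AI features for creators. Google is developing Gemini, an AI model to rival OpenAI.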

And we're back.


Not only am I back from traveling last week, but the world of AI is back in a big way with an absolutely insane week of AI news.


This week, the AI battle between Meta and OpenAI heated up with numerous game-changing launches from OpenAI, bringing incredible new capabilities to ChatGPT.


Meta is also launching more AI features and a pair of AI-enabled sunglasses.


Amazon is catching up in the AI race with a massive investment in a leading AI company.


Tesla is showing off its updated Optimus robot, and Microsoft launches Windows 11 with AI built into everything.


Sit back, relax, remember to subscribe for breakdowns of all the most important AI news, and let's go!


OpenAI absolutely dominated AI news this week with a number of new launches.


Honestly, even if they launched just one of these things, it would have been incredible.


First, OpenAI launched DALL-E 3.

This actually happened last week, but I didn't get a chance to talk about it.


DALL-E 3 is the newest version of their generative art product, which directly competes with Midjourney and Leonardo.

From the initial samples I've seen, DALL-E is now on par with the newest version of Midjourney.


Check out some of these images.


What also impresses me is the range of styles it can create.


Version 3 is a big leap forward compared to version 2.


Check out this example comparing V2 to V3 with the prompt: "An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula."

DALL-E 3 also seems to be really capable of producing legible text in these images, which has always been a struggle for generative art.


Additionally, it's built natively on ChatGPT, which means you can use ChatGPT as a brainstorming partner to help create the best prompts.


This is already a popular technique to create prompts with Midjourney, and now it's seamlessly built into the ChatGPT workflow.


OpenAI spent a lot of time safety testing DALL-E 3.

According to their blog post, DALL-E 3 has mitigations to decline requests that ask for public figures by name.

They improve safety performance in risk areas like generation of public figures and harmful biases related to visual over/under representation.


In partnership with red teamers and domain experts who stress test the model, they help inform risk assessment and mitigation efforts in areas like propaganda and misinformation.


This, of course, is going to be a huge problem as all of these AI tools become better.


Right now, DALL-E 3 is only available to ChatGPT Plus and Enterprise users, but that $20 a month is continuing to increase in value, especially when you hear some of these next stories.


Another OpenAI launch that went under the radar but is incredibly important is web browsing in ChatGPT.


Now, ChatGPT has access to the entire internet instead of just what is built into the model.


Before, we would frequently get the "my knowledge cutoff date is September 2021" warnings.


But you're probably thinking, "Didn't ChatGPT already have web browsing?"


And the answer is, yeah, they did.


It was an incredible feature.


But a couple of months ago, OpenAI disabled it without much explanation.


The only reason they gave is that ChatGPT browsing would occasionally display content in unintended ways.


So, the example that they provided is users asking for the full text of a URL, and ChatGPT would actually give it.


And that's probably a big copyright risk exposure for the company, and why they decided to take it down.


But now, website owners can decide whether they want ChatGPT to be able to pull content from their site through the robots.txt file, which is the same mechanism that web crawlers like Google's already respect.
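To make the robots.txt point concrete, here is a minimal Python sketch of how a site's rules could be checked for a given crawler. The robots.txt content and the example.com URL are made up for illustration, and "ChatGPT-User" is assumed here to be the user agent that ChatGPT's browsing identifies itself with; check OpenAI's own documentation before relying on that name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. "ChatGPT-User" is an assumed
# user-agent name for ChatGPT browsing; every other agent is allowed.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-post"  # placeholder URL
    for agent in ("ChatGPT-User", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

With these made-up rules, the site blocks ChatGPT browsing while staying open to other crawlers. Note that robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.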


I'm glad browsing is back because it makes ChatGPT much more powerful.


And in the biggest and most impressive launch this week by OpenAI, ChatGPT now has the ability to see, hear, and speak.


This is called multimodal, and this multimodal capability allows ChatGPT to be able to read images and have voice dialogue with users.


In an example provided in the launch blog post, a user asks ChatGPT how to lower their bike seat and provides a picture of the bike for context.


ChatGPT then provides some advice, and the user follows up with another image showing the specific part of the bike that might need to be adjusted.


After this back and forth, ChatGPT provides advice for the user's specific bike.


Then, the user shows a picture of their toolset, and ChatGPT tells the user which tool to use.


It's absolutely mind-blowing.


I've been collecting some insane examples of ChatGPT vision, such as taking a handwritten website flowchart, and ChatGPT just builds the entire website.


Let me know in the comments if you want me to put together a video showing off all the amazing examples of ChatGPT vision that I've been collecting.


But that's not all.


ChatGPT also can now communicate with voice and can have full conversations.


Simply open the ChatGPT app on your phone and start talking.


ChatGPT will also reply back with a voice, rather than only text.


They trained it using voice actors, and the voice is actually really good, not robotic at all.


Larry was a unique Hedgehog, unlike any other.


These features are rolling out over the next two weeks to ChatGPT Plus users.


OpenAI has also given Spotify podcasters the ability to translate their voice into different languages.


But it's not dubbed or transcribed, it's the actual podcaster's voice but in different languages.


Imagine this video with my voice, except I'm speaking Spanish, Portuguese, Italian, French, Mandarin.


It's really incredible and opens up my content and everybody's content to a much wider audience throughout the world.


Check out this clip from Lex Fridman showing him speaking in Spanish.


I think all of these features are what Siri could have been all this time.


And Apple has some big goals to hit to compete with ChatGPT, although I know they're working on a lot of this functionality and bringing it into Siri.


Okay, I know it seems like just OpenAI news this week, but we're almost done.


On Reddit, Sam Altman seemingly confirmed OpenAI achieved AGI, but quickly followed up to make sure people knew he was kidding.




For a company whose goal is AGI and the clear leader in cutting-edge AI technology, this joke landed flat on its face.


I don't know why he would have joked about it.


It could easily have been real and scared a lot of people.


But again, he clarified that he's joking and that if he were actually to announce it, he wouldn't do it in a Reddit comment.


Fair enough, but still not funny.


He should probably stick to building AI and leave the jokes for comedians.


The last story about OpenAI is an interesting rumor.


The famed designer Jony Ive, who helped transform Apple into the design powerhouse it is today with his work on products like Mac, iPods, iPhones, is reportedly in talks with Sam Altman to create the iPhone of AI.


Apparently, they've raised a billion dollars from SoftBank CEO and founder Masayoshi Son for the project and could include chipmaker ARM for the hardware.


There's very little in the way of confirmations about this story, but as it unfolds, I'll definitely keep you posted.


Next, as we accelerate into the future, Tesla released a new video of their Optimus robot.


Since launching just a couple of years ago, Optimus has vastly improved.


I mean, at the initial launch, it was literally humans dressed in robot suits dancing around.


In this update video, Optimus is now capable of self-calibrating its arms and legs using only vision and joint position encoders, sorting blocks by color even when the environment is dynamically changing, and balancing on one leg.


Boston Dynamics is still the king of robotics for now, with its robots able to literally do parkour.


But they've been working on it for decades, and as mentioned, Tesla is only a couple of years into their development, and the progress they've made is impressive.


Next, Meta had a few major AI launches this week.


First, Meta launched Meta AI, which is a new AI experience across all their family of products.


Meta AI, in beta, is an advanced conversational assistant that will be available in WhatsApp, Messenger, and Instagram.

It'll also be coming to the new Quest 3 VR, as well as their new sunglasses product.


According to Meta's blog post, Meta AI is powered by a custom model that leverages technology from LLaMA 2 and their latest large language model research.


In text-based chats, Meta AI has access to real-time information through their search partnership with Bing and also offers a tool for image generation.


So, Microsoft is not only powering ChatGPT browsing, but now also Meta AI browsing.


Seems like the clear winner here is Microsoft.


Additionally, Meta is creating AIs that have more personality, opinions, and interests, and are a bit more fun to interact with.


Along with Meta AI, there are 28 more AIs that you can message on WhatsApp, Messenger, and Instagram.

You can think of these AIs as a new cast of characters, all with unique backstories.


Some of these characters include TikTok star Charli D'Amelio, Chris Paul, Kendall Jenner, MrBeast, and Snoop Dogg.

A full list of characters can be found on the blog post, which I'll link to in the description below.


I'm all about AI Snoop Dogg.


Who are you going to use?


Meta also launched Emu, their next-generation AI generative art product.


Emu is looking to compete directly with Midjourney and is built directly into a number of their different products, including Messenger.


Emu will also be capable of creating stickers, which are incredibly popular on the Messenger platform.


They're also building AI generative art functionality into Instagram and WhatsApp.


Continuing the theme of adding generative art to their products, Meta is also adding AI image editing using learnings from their Segment Anything research paper.


For example, you'll be easily able to change the backdrop of a photo to change the location with a feature called Backdrop.


And in the name of safety, they're going to clearly mark images that were created or manipulated with AI.


And you already know I'm a big fan of doing this.


I think all AI-generated content should be marked as such.


Next, as mentioned, Meta is launching sunglasses in partnership with Ray-Ban.


These glasses actually look normal.


Remember Google's attempt at making smart glasses about a decade ago?


Yeah, this isn't that.


Meta glasses will include a ton of AI functionality and allow you to livestream, capture photos, play music, make phone calls, and chat with Meta AI easily.


These glasses will come in two styles and a number of color variations.


And the only real feature that sets them apart from regular sunglasses is the cameras on the front.


This seems like a privacy nightmare, but everyone already has cameras in their pocket, so maybe this isn't much different.


What do you think?


Meta also launched a new version of their Quest VR headset.


With all of the AI news coming from Meta lately, it's easy to forget that Mark Zuckerberg pivoted their entire company around the metaverse.


Quest 3 will come with increased processing power, improved graphics and resolution, a slimmer profile, and improved sound quality.


Meta is racing to prepare for the soon-to-be-launched Apple Vision headset.


I've played around with VR headsets in the past, but they've never really become part of my daily workflow.


They're really cool, but I just haven't found daily use cases for them.


I'm incredibly excited for Apple Vision, and maybe that's because I'm a big Apple fanboy.


But this new Meta Quest also looks really cool, and the Meta Quest will come in at $499, which is 1/7th the cost of the Apple Vision.


So, Meta is really taking a very different go-to-market approach than Apple.


However, Meta is clearly labeling the Quest 3 as a mixed reality headset, whereas before, I believe they only called it virtual reality.


This is likely in response to Apple calling their headset mixed reality and never using the words virtual reality.


It seems the term VR has gone out of style.


Next, Mistral AI has launched its own 7 billion parameter large language model.


This new model, called Mistral 7B, beats LLaMA 2 7B on all benchmarks and LLaMA 1 34B on many benchmarks.

And best of all, it's truly 100% open source, coming with an Apache 2.0 license.


According to the launch blog post, it approaches Code LLaMA 7B performance on code while remaining good at English tasks.
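For a rough sense of what "7 billion parameters" means in practice, here is a small Python sketch that estimates how much memory the weights alone would need at different numeric precisions. The round 7,000,000,000 figure and the precision list are assumptions for illustration; real memory use is higher once activations, caches, and runtime overhead are included.

```python
# Back-of-the-envelope estimate of weight storage for a model of a given size.
# Assumption: only the weights are counted, at a fixed number of bytes each.

PARAMS_7B = 7_000_000_000  # "7 billion parameters", rounded for illustration

# Bytes per parameter for some common storage formats (illustrative choices).
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt}: about {weight_memory_gb(PARAMS_7B, nbytes):.1f} GB")
```

Under these assumptions, a model of this size needs about 14 GB for its weights at 16-bit precision, which is part of why 7B models are popular for running on a single consumer GPU.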

All of the AI benchmarks are fine, but I found that they don't necessarily translate to real-world use cases.


Do you want me to run a full test on it myself and make a video about it?


Let me know in the comments.


Next, not to be left out of the AI race, Amazon made a couple of AI announcements this week.


First, Amazon acquired a large stake in the AI company Anthropic.


Anthropic is the maker of Claude, a direct and extremely capable competitor to ChatGPT.


Amazon invested $4 billion into Anthropic but also signaled a larger collaboration between the two companies, including AWS becoming Anthropic's primary cloud provider.


The two companies already launched the Claude model on Amazon Bedrock, which is one of their many AWS cloud services.

You'll be able to customize and fine-tune Claude using Bedrock.


Claude's AI capabilities will also start to be incorporated into other Amazon products.


This is a smart move by Amazon, echoing a similar strategy between Microsoft and OpenAI, with Microsoft's enormous investment in OpenAI and also providing them with cloud services through Azure.


Amazon is also bringing generative AI functionality to Alexa.


According to Dave Limp, the SVP of Devices and Services at Amazon, "Our latest model has been specifically optimized for voice and the things we know our customers love, like having access to real-time information, efficiently controlling their smart home, and getting the most out of their home entertainment."


Amazon's new AI and Alexa will be conversational and will not only take into account voice but also body language, eye contact, and gestures.


You'll also be able to control your smart home, for which Amazon has been the clear winner in the space for a while.


Next, Leonardo, a competitor to Midjourney but with an amazing interface, has launched a new feature called Elements.


Elements adds the ability to incorporate LoRAs into your image generation workflow.


According to the announcement, "We have simplified the process for you to seamlessly blend various styles, mix models, and achieve incredible effects that align perfectly with your creative vision."


You can create an array of powerful effects on your generated images by combining artistic styles such as Baroque, Glass, Steel, Inferno, and many more.


Leonardo is clearly the David versus multiple Goliaths, including Midjourney and DALL-E 3, but I've been a big fan of Leonardo from the very early days.

One of my first videos was about Leonardo, so I'm rooting for them.


Elements is available for all users right now, so be sure to check it out.


Microsoft this week launched Windows 11 with Copilot.


According to the announcement, "New for Windows 11, Copilot in Windows is an AI-powered intelligent assistant that helps you get answers and inspirations from across the web, supports creativity and collaboration, and helps you focus on the task at hand."

Copilot has been built into nearly every aspect of the Windows operating system and can not only answer your questions but can also control different aspects of your Windows environment.


I haven't had a chance to download and play around with Copilot yet, but I'm definitely going to do that soon.


Next, while not AI news, it's certainly futuristic.


Elon Musk's SpaceX has won a big US Space Force contract with Starshield.

SpaceX will provide customized satellite communications for the military under the company's new Starshield program.

According to a quote from CNBC, "The SpaceX contract provides for Starshield end-to-end service via the Starlink constellation, user terminals, ancillary equipment, network management, and other related services," Space Force spokesperson Lynn Stepanic said.

Starshield is a new line of business for SpaceX, which it launched just last year, and the Pentagon already purchases the company's rockets, so they have an existing relationship.


Not much more detail is available about this yet, but I'll keep an eye on it.


Also in government news, the CIA is building its own AI tool to rival China's capabilities.


According to decrypt.co, the tool, nameless for now, will be trained on publicly available data and aims to help US spies quickly verify information.


It doesn't have a launch date yet, and I wonder if they're going to be working with leading AI companies like OpenAI and Meta on this.


The CIA AI will be able to analyze large swaths of data to help keep the US safe.


Next, apparently the real technology to be worried about is not AI, but rather quantum computing.


Quantum computing not only promises to give us incredible computing power to improve our world, but also threatens to upend many other security technologies, such as encryption.


Quantum computing doesn't operate like standard computers, which use ones and zeros, known as binary. Instead, it uses quantum bits, or qubits, which can be in a combination of one and zero at the same time, allowing a quantum computer to work with a huge number of possible outcomes at once.
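As a rough illustration of how qubits differ from ordinary bits, here is a minimal Python sketch that simulates a single qubit with plain numbers. It is a toy model with hypothetical helper names, not how a real quantum computer is programmed: it puts one qubit into an equal superposition with a Hadamard gate, then shows how the number of values needed to describe n qubits grows as 2 to the power of n.

```python
import math


def hadamard(state):
    """Apply a Hadamard gate to a single-qubit state [amp_zero, amp_one]."""
    amp_zero, amp_one = state
    scale = 1 / math.sqrt(2)
    return [scale * (amp_zero + amp_one), scale * (amp_zero - amp_one)]


def probabilities(state):
    """Return the chance of measuring each outcome for the given amplitudes."""
    return [abs(amplitude) ** 2 for amplitude in state]


def num_amplitudes(n_qubits):
    """Number of amplitudes needed to describe n qubits: 2 to the power n."""
    return 2 ** n_qubits


if __name__ == "__main__":
    zero_state = [1.0, 0.0]  # a qubit that is definitely 0
    superposed = hadamard(zero_state)
    print("measurement probabilities:", probabilities(superposed))
    for n in (1, 10, 50):
        print(f"{n} qubits -> {num_amplitudes(n)} amplitudes")
```

The growth is large but finite: 50 qubits already correspond to over a quadrillion amplitudes, which is why simulating quantum machines on classical hardware gets expensive so quickly.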


This method has the potential to transform many industries, including logistics, healthcare, finance, cybersecurity, weather predictions, and more.


Leading tech companies are investing heavily in developing quantum computing, including Google and IBM.


But as Spider-Man's Uncle Ben said, "With great power comes great responsibility."


Next, YouTube is launching a number of AI features, and I'm really excited about this being a YouTube creator myself.


Thanks to Bilawal Sidhu for putting together a summary of these AI features, including AI video with Dream Screen, which is a way to visually transport yourself anywhere by typing a prompt.

And this is going to be available in YouTube Shorts and generates fantastical backgrounds in both image and video form.


Also, a free editing app called YouTube Create, which is a mobile app that provides easy professional editing tools to craft high-quality videos in minutes, probably very similar to CapCut.


Personalized AI insights, which allows you to get tailored video ideas and outlines in YouTube Studio based on your channel and current trends.


Auto-dubbing with a feature called Aloud, which automatically dubs and localizes your videos into other languages with one click.


Assistive music search, which automatically finds the perfect free soundtrack for your video, and AI will recommend songs and beats that fit best into your music.


And I bet they're also going to include AI-generated music in that.


So there are a number of features coming for YouTube creators, and I can't wait to try them out.


Now, for the AI video of the week, Tim Grman provided the suggestion this week, so thank you to Tim.


In this video, we see a spectacular giant video during a concert show, showing a rapidly evolving skeleton guy.


The visuals are stunning, and I can't even imagine what it was like to actually be there, seeing it on a huge screen, all the people around you, music playing, and the incredible energy.


Check out the video now.


A reflection of eternity, a further and beyond.


For our last story, Google is nearing the launch of Gemini, its direct competitor to GPT-4.


And although OpenAI beat Google to launching multimodal features, it's rumored that Gemini will include multimodal features at launch.


Google is currently giving a small set of companies access to Gemini for testing purposes.


Gemini is a collection of AI models that will have access to the internet as well as all of your information, such as email, calendars, and docs.


It'll also be capable of writing code and generating images, all of the features that ChatGPT already supports.


I've mentioned this before, but it seems like every tech company, including Google, is playing catch-up with OpenAI.


This must be especially frustrating for Google, given they published the original research paper that kickstarted this wave of AI technology.


"Attention Is All You Need."


If you liked this video, please consider giving it a like and subscribe, and I'll see you in the next one.

