
[Matt Wolfe's AI News: One Step Closer to AGI!] English commentary with Japanese translation [July 20, 2024 | @Matt Wolfe]

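This is a roundup of this week's AI news. OpenAI announced five steps toward AGI (artificial general intelligence). Level 1 is conversational AI such as today's ChatGPT and Claude. Level 2 is reasoning AI with human-level problem-solving ability, which OpenAI is said to be very close to. Level 3 is agents that automate tasks such as booking flights and replying to emails, Level 4 is innovator AI that generates new ideas, and Level 5 is AI that can do the work of an organization. We are reportedly moving from Level 1 to Level 2 right now. OpenAI is developing a new reasoning technology code-named "Strawberry," which is believed to be what was previously called "Q*." The technology aims at AI that can navigate the internet autonomously and plan and carry out complex tasks. Anthropic's AI chatbot Claude got an Android app, and Google announced that Gemini can answer questions even while the phone is locked. Google is also testing "Google Vids," an AI-powered video creation app, and YouTube is testing a new feature called "YouTube Music Sound Search." Finally, NVIDIA and Mistral developed a new 12B-parameter model, "Mistral NeMo," and said it can deliver high-performance AI even in offline environments.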

Here's the AI news that you might have missed this week.


Starting with the fact that OpenAI mapped out their five levels towards the progress of AGI.


Here's a quick breakdown of those five levels.


Level one, they say, would be chatbots: AI with conversational language.


That's essentially what we're getting right now out of ChatGPT, Claude, Llama 3, things like that.


You have level two, which is reasoners that can do human level problem solving.


They claim they're very, very close to level two right now.


It moves on to level three, which is agents, or systems that can take actions on our behalf: book flights for us, respond to emails for us, things like that.


There's level four, which they say is the innovators: AI that can aid in invention.


It's actually going to create novel ideas.


Finally, you have level five, which is organizations: AI that can do the work of an organization.


Basically we're right here right now.


We're at level one, almost at level two.


We're right on that precipice of level two and OpenAI believes that we'll sort of move through each of these levels on our way to a true AGI.


This was actually released on July 11th, last week, and it didn't make it into last week's video, but it felt a little extra relevant this week because this week we got the news that OpenAI has been working on a new reasoning technology code-named Strawberry.


I've seen a lot of other YouTube videos and a lot of X posts about this with people speculating.


A lot of people believe that this was what was originally called Q* and they've now rebranded it to Strawberry.


This comes from a leaked internal document.


It says teams inside of OpenAI are working on Strawberry, according to a copy of a recent internal OpenAI document seen by Reuters.


Reuters couldn't ascertain the precise date of the document and they could not establish how close Strawberry is to actually being publicly available.


Likely not very close.


The aim of Strawberry is to not just generate answers to queries, but to plan ahead enough to navigate the internet autonomously and reliably to perform what OpenAI terms "deep research." The article does claim that the Strawberry project was formerly known as Q*.


This exact article here was actually updated after it was published to add this section.


It says a different source briefed on the matter said OpenAI has tested AI internally that scored over 90% on a MATH dataset, a benchmark of championship math problems.


Reuters couldn't actually figure out if they were referring to the Strawberry project or not, but it kind of sounds like they're probably the same project.


Outside of this little information that we have on it, there's a lot of speculation around what this is, but it sounds like Strawberry is pretty close to that level two towards AGI that we were just talking about.


At the moment it sounds like the main purpose of this research is for this new model to essentially do research.


Among the capabilities OpenAI is aiming Strawberry at is performing long-horizon tasks, or complex tasks that require a model to plan ahead and perform a series of actions over an extended period of time.


OpenAI specifically wants its model to use these capabilities to conduct research by browsing the web autonomously with the assistance of a "computer-using agent," or CUA, that can then take actions based on its findings.


Not much more is known about this and OpenAI has notoriously been kind of hush hush about their upcoming models.


Usually we don't know much about them until literally the day they make the announcement of them.


While we're on the topic of OpenAI, more people from OpenAI are coming out and talking about some of the questionable practices of OpenAI's business.


It came out this week that some whistleblowers are saying that OpenAI illegally keeps employees from talking to government regulators about problems at work and removes their rights to rewards for blowing the whistle.


This comes from a letter that was sent to Gary Gensler, the chair of the SEC.


OpenAI refutes the claims saying that they have a policy on whistleblowers that protects employees' rights to make protected disclosures.


This isn't the first time that OpenAI's policies and contracts with their employees have been under scrutiny.


Several weeks ago, it came out that OpenAI was forcing people to sign non-disparagement agreements.


If they talked badly about OpenAI, they could lose their vested equity in the company.


It sounds like people are coming forward and claiming, "If we blow the whistle on anything we think OpenAI is doing that's somewhat suspicious, we can also lose our vested equity," and that's not legal.


The sources are anonymous.


OpenAI claims that that's not actually happening, but I have a feeling OpenAI is probably in the process of overhauling a lot of their contracts that get signed by any new employees that join the company due to all this scrutiny.


Back when most of these people probably signed up for OpenAI, the company wasn't nearly as big or as much in the public eye.


Now that they are this big and this much in the public eye, a lot of this stuff is kind of starting to come under the microscope.


While we're on the topic of OpenAI, there's some speculation that maybe the DALL·E image model recently got an update.


This is a post from my buddy Angry Penguin over on X where he shows off an image that he created that has pretty legible writing in it.


This clearly says "Evolve" all over it.


Previously, DALL·E struggled with words.


If I go to DALL·E and say, "Create an image of a robot holding a sign that says 'please subscribe,'" I actually get images that kind of nail the words.


I think DALL·E did make some updates because the text seems to be much more clear than it used to be.


If you're interested in using DALL·E, but you don't have a ChatGPT Plus account, you can always go to Bing.com/images/create and use DALL·E 3 for free over on Bing's website. If DALL·E 3 did get an update, it appears to have also rolled out here inside of Bing Image Creator.


Two and a half out of four sort of nailed what I was going for.


We also got some new demo videos from Sora.


We can see this like black and white video showing all sorts of different clips in black and white that actually look pretty dang impressive.


These were shared on Matthew Berman's X account.


Here's another one that he shared of like ocean crashing and I don't know, a gas station or motel or something.


But yeah, we're getting more demos from Sora, which is just making people more anxious to actually get their hands on it.


But right now we do sort of have that itch scratched in the form of Runway Gen-3 and Luma's Dream Machine.


We can actually create some pretty good AI generated videos now with those tools.


That sort of tamped down the excitement for Sora a little bit, but Sora can create much longer videos, and OpenAI tends to kind of set the bar with almost everything they put out.


I'm still excited about it, but I have gotten that need met with some other tools recently.


Andrej Karpathy, who previously worked at OpenAI and then recently stepped away, just announced a new venture that he's working on.


He said, excited to share that I'm starting an AI plus education company called Eureka Labs.


At Eureka Labs, they're building a new kind of school that is AI native.


They say that subject matter experts who are deeply passionate, great at teaching, infinitely patient, and fluent in all of the world's languages are also very scarce and cannot personally tutor all 8 billion of us on demand.


It sounds like he's creating a sort of online education where the teacher still designs the course materials, but they are supported, leveraged and scaled with an AI teaching assistant who is optimized to help guide the students through them.


This announcement here is really all that we have.


He hasn't really talked a whole lot about this more than the announcement, but what I'm sort of imagining is that a teacher with subject matter expertise goes in, creates an entire course on their subject matter.


All of that information is then sort of trained in the AI.


I don't know if they're going to use retrieval augmented generation or they're going to fine tune the model.


I don't know exactly how they're going to do it, but all of the information that the teacher taught is now available inside of the model.


Anybody who wants to learn this stuff can then work with a tutor who understands all of the training material and can speak to the student in whatever language they want to learn in.


This will massively scale the ability of an individual teacher who can teach the concept once and then let their AI assistant teach it to everybody else who wants to learn that information.


I'm just sort of speculating on what this is going to look like.


I don't know exactly, but that's sort of what the concept sounds like to me.


If you're a fan of Anthropic's Claude and you don't have an iPhone, well, good news.


They just released it on Android.


It's been on iOS for a couple of months now and they just now rolled out an Android version.


Personally, I'm still a fan of the ChatGPT app a little bit more than the Anthropic app, just because the conversational voice portion of the ChatGPT app is actually really, really good.


When I'm on my computer, I usually use either Claude or Perplexity.


When I'm using my phone, I still go to the ChatGPT app, but I also understand most people probably don't want to pay for three separate chat subscriptions.


If you really like the ability to have a voice conversation with an AI, ChatGPT is still the way.


If you don't care about that and you just want the best model in your hand, Claude is probably the best.


They now have an Android app.


Since we're talking about Android phones, Gemini now answers general questions when your Android phone is locked.


There's not too much more to share on this story.


It is exactly what it sounds like.


Google now lets you get answers from Gemini without actually unlocking your device.


Also this week, Google announced Google Vids.


Vids is an AI-powered video creation app that's designed for work and deeply integrated with the Workspace suite you use every day.


You can actually find it over at workspace.google.com/products/Vids.


Right now it's not available to everybody.


They say, "We're currently testing this new application with a select group of trusted testers."


According to the video on their website, it looks like you give it a prompt like "help me create a sales training video."


It will help create this, like, slide-style video for you.


There's a bunch of different styles that you can choose from.


Once you pick your style, you can speak out a script, add a voiceover to it, and add stock footage to it to get the perfect sort of layout for your video.


It creates that sort of slide presentation video for you.


Since we're talking about Google and we're talking about video, let's talk about this new feature that YouTube is rolling out called YouTube Music sound search.


It sounds like a feature that's very similar to Shazam where you can have it listen to a snippet of music and it will figure out what song it is.


You can also hum the song and it'll be able to figure out what song it was based on your humming.


We can see some screenshots that they shared here.


They've got a little search box here with a microphone next to it.


I'm assuming they click the microphone and then it says play, sing, or hum a song.


It figures out what song that you were trying to find just based on the singing or humming.


YouTube's also testing an AI-generated conversational radio.


It'll let you create a custom radio by describing exactly what you want to hear.


This article goes on to say to be on the lookout for an "Ask for music any way you like" card in your Home feed.


This will open the chat-based UI with a field at the bottom that lets you ask for music.


There's been a little bit more controversy this week about the source of training data for various AI models.


This article on Proof News here claims that Apple, NVIDIA, and Anthropic used thousands of swiped YouTube videos to train AI.


Basically here's what's happening with this.


There's an organization called EleutherAI, an open-source AI research group that collects a whole bunch of data from everywhere and puts it into what they call the Pile.


The Pile is this giant data set that companies then use to train their AI models initially so that they can just sort of learn how the language works and just get injected with a ton of data to start.


While the Pile is built from publicly available data, it turns out that a lot of that publicly available data was transcripts that were copied and pasted straight from YouTube videos.


A lot of YouTubers started to notice there's data in there from people like MKBHD, MrBeast, PewDiePie, and others.


This site, proofnews.org, actually put up a little search engine so that you can see if a video that you created, or literally anybody's video, is found within the Pile's data set.


I did a search for my own name and no results were found.


I don't know whether I should be offended or relieved.


At the time the data was scraped, my channel probably just wasn't big enough.


After all this came out, Apple stepped up to say, yes, we've used the Pile for some research purposes and some training, but the model that we're using inside of Apple Intelligence is not trained on the Pile's data.


That information is not inside of Apple's training set according to them.


Microsoft has a platform called Designer, which, if you're not familiar with it, is very similar to Canva.


It's a platform to create things like YouTube thumbnails and banner ads and Instagram images and things like that.


This Designer platform is now being rolled out into a whole bunch of different Microsoft apps directly, where you can use the Copilot sidebar over here, ask it to create a specific image in a specific style, and it will actually use Microsoft's Designer to create that image and allow you to pull it directly inside of your document or your PowerPoint or whatever Microsoft tool you're using.


Here's another example of it being shown off inside of Microsoft PowerPoint where they create some images with Designer over here.


It generates some images and then they just pull that in as the background of the slide.


Designer also got a free mobile app on both iOS and Android so you can easily create and edit images on the go on a mobile device.


There's a whole bunch of other new features for Designer.


If you want to dive deeper into it and this is something that you're really interested in, I will make sure it's linked up in the description so you can see all of the updates here.


The article is quite long and there are quite a few updates, but it seems like it's got some other pretty cool features like this restyle feature.


You upload an image and it restyles it to a different style of image.


Mistral, the French AI company that develops large language models, released a new model called Codestral Mamba.


This is a model designed for code generation.


It is open source and it can handle an input of up to 256,000 tokens, which is double what OpenAI currently offers with ChatGPT.


That's roughly 192,000 words between the amount of text inputted and the amount of text outputted.
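
As a rough check on that figure, the sketch below applies the common rule of thumb of about 0.75 English words per token. The ratio and the function name `tokens_to_words` are assumptions for illustration and do not come from Mistral's announcement; real ratios vary by tokenizer and language.

```python
# Rule-of-thumb conversion; the 0.75 ratio is an assumption, not a tokenizer fact.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    print(tokens_to_words(256_000))  # 192000
    print(tokens_to_words(128_000))  # 96000
```

Under this assumption a 128,000-token window works out to roughly 96,000 words, which is why the 256,000-token figure is described as double.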


This is a 7B parameter model and offers a fast response time even with longer input text.


If you're a coder and you're looking to try another large language model to see if it outperforms the other models you've tried, maybe Codestral Mamba is a choice to try out.


Amazon started rolling out an AI shopping assistant called Rufus, which apparently answers questions about shopping and also politics.


Rufus is essentially a chat bot just like ChatGPT, but it's built directly inside of the Amazon app and it's trained on the data that's in Amazon.


You can ask what are the best lawn games for kids birthday parties?


It will suggest lawn games as well as where to find them and buy them on Amazon.


This Verge article also tested some other questions and got it to answer questions about the candidates for the 2024 election.


I've got more bad news if you're in the EU.


It sounds like Meta is not going to be offering their multimodal models in the European Union.


They will be offering their normal text input-output models like Llama, but you're probably not going to be able to create AI images, AI videos, or anything other than more text if you're in the EU, due to the EU's, I guess, unclear policies.


They say here, "We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment."


It says here that Meta's issue isn't with the still-being-finalized AI Act, but rather with how it can train models using data from European customers while complying with GDPR, the EU's existing data protection law.


The United Kingdom has nearly identical laws to GDPR, but Meta says it isn't seeing the same level of regulatory uncertainty and plans to launch its new model for UK users.


Here's something that I came across on X from Johannes Stelzer.


I just thought it was really cool.


They hooked up a little MIDI device to their computer and they can turn the knobs to change different aspects of the images.


They appear to be using Stable Diffusion here.


They're using these knobs to change different elements within Stable Diffusion, different sorts of parameters.


I just thought it looked really cool.


I wanted to share it.


They also put the code for it up on GitHub.


If you want to play around with something like this and hook up SDXL to a MIDI device, well, that's available for you to do.


Here's another article that I came across that I couldn't find a whole lot on.


I just thought it looked cool. It's from Gizmochina.


"Turn your selfie into a printable 3D character with Tencent's AI-powered app."


Apparently this is an app where you can upload a selfie and it will generate a 3D model based on that one selfie that is so good that you can 3D print it.


I actually did some digging to try to find more info about what they're doing here.


This was literally the only article I could find about it.


But I do have a 3D printer and I do love AI, so as I learn more about it, this is something I will be playing with if I can get my hands on it.


Here's something interesting.


AI systems achieve a 96% accuracy in determining the sex from dental X-rays.


They basically trained an AI model on a whole bunch of dental images.


When they ran new dental images through it, it was able to determine the sex of the person whose teeth those were with 96% accuracy.


The ones that it wasn't accurate on, that was mostly children.


The article claims that it's less accurate if you're six or under, or basically haven't lost your teeth yet.


The main use case for something like this would be in forensics.


If they find skeletal remains or something, they can actually identify the sex of the skeletal remains.


But I just thought it was fascinating.


I thought I'd share with you.


I started recording that video while I was in San Diego still.


I'm on vacation in Colorado and a few more pieces of news came out that I wanted to make sure got shared in Friday's news video, including the fact that OpenAI just launched a new model today, on Thursday, the day I record this, called GPT-4o mini.


With pretty much every large language model creator out there creating models that are smaller and designed to be more cost-efficient and faster, OpenAI needed to create a small model to compete.


This new GPT-4o mini is replacing the old GPT-3.5. It's not quite as powerful as the full-on GPT-4o, but it is faster and smarter than the previous GPT-3.5.


We can see that right now today, GPT-4o mini supports text and vision in the API with support for text, image, video, and audio inputs and outputs in the future.


It's got a 128,000-token context window.


You should still be able to put large amounts of text as your input.


However, the output only supports 16,000 tokens.
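
To make the input and output limits concrete, here is a minimal sketch of a budget check. It assumes the prompt and the reply share the 128,000-token window and that replies are capped at 16,000 tokens, the figures quoted above. The function name `fits_budget` is hypothetical, and this is not an OpenAI API.

```python
CONTEXT_WINDOW = 128_000  # total tokens, as quoted in the video
MAX_OUTPUT = 16_000       # output cap, as quoted in the video


def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """Return True if the prompt plus the requested reply fit both limits.

    Assumes input and output share one context window (an assumption here).
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    return (
        requested_output <= MAX_OUTPUT
        and prompt_tokens + requested_output <= CONTEXT_WINDOW
    )
```

Under these assumptions a 100,000-token prompt leaves room for a full-length reply, while a 120,000-token prompt does not.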


We can see this comparison here of model evaluation scores with GPT-4o in pink being the best model.


It pretty much performs the best across the board here in every test, with GPT-4o mini, this new model that was just released, performing second best across pretty much all of these benchmarks here.


Keep in mind this is comparing it to these other companies' smaller models.


It almost kind of feels unfair to be putting GPT-4o in here compared against Claude Haiku and Gemini Flash, which are those platforms' smaller models, while GPT-4o is OpenAI's current state-of-the-art model.


But nonetheless, we can see how this new mini version of GPT-4o outperforms all the other mini models that are out there.


If we log into our ChatGPT account, here up in the top left corner, where you select the model, you can see that we now have access to 4o, 4o mini, and the legacy GPT-4.


At the time of this recording, when I try to not log in and just use the free version, it's still claiming it's using GPT-3.5, although it does say here, "In ChatGPT, Free, Plus, and Team users will be able to use GPT-4o mini starting today."


In other large language model news, NVIDIA and Mistral teamed up to create Mistral NeMo.


This is a 12B-parameter model and it also has a 128,000-token context window, just like the new GPT-4o mini model.


What's cool about this model is it's actually designed to be run on device.


We can see here, it says this model's efficiency and local deployment capabilities could attract businesses operating in environments with limited internet connectivity or those with stringent data privacy requirements.


They do go on to say that it's more designed for laptops and desktop PCs than smartphones.


Companies that want to run a really, really powerful large language model with a large context window that can take a lot of input and a lot of output text, and that may be concerned about privacy or not have internet access, now have a model that they can use that's going to provide pretty much everything they're going to need.


It says the model is immediately available, and we have a link here with a downloadable version promised in the near future.


You can actually try this model out over on NVIDIA's website.


If we come over to build.nvidia.com/explore/discover and click on reasoning over on the left side, we can see, fresh off the press, Mistral NeMo 12B Instruct.


If we click in here, we get a chat window where we can actually play around with this model if we want to.


Again, this is a cloud version where you can just sort of play around with it, but a desktop version is coming soon.


Finally, if you're planning on watching the Summer Olympics this year, it looks like Google's AI is going to be everywhere.


Google is apparently the official AI sponsor for Team USA and claims that they're going to have ads all over for all of the various Google AI products.


If you haven't heard enough about AI lately on TV, well, watching the Olympics, you're going to see a lot of it.


If you haven't already, check out futuretools.io, where I curate the coolest AI tools that I come across.

I keep the AI news page up to date on pretty much a daily basis.


We've got a free newsletter where you can get all of the coolest AI tools and most interesting AI news delivered directly to your email inbox.


You can find it all over at futuretools.io completely free.


Thank you so much for tuning into this video.


I really appreciate you.


I have a feeling the AI news is really going to start heating up again real soon.


There's a lot of cool things in the works that I've sort of been getting some sneak peeks of, and I'm excited to share what's on the way.


