
[Matt Wolfe's AI News] Reading the English commentary in Japanese [February 24, 2024 | @Matt Wolfe]

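AI technology is advancing rapidly, and new AI art generation models such as Stable Diffusion 3 are drawing particular attention. These models are said to be greatly improved in handling complex prompts, image quality, and spelling, enabling a wide variety of image generation. At the same time, efforts toward safe and responsible AI practices are being strengthened, and new safeguards have been introduced. In addition, higher-quality video generation is expected from technology like Sora, and Stability AI has suggested that similar video generation may become possible with more GPU resources. Major companies such as Google and Microsoft are also integrating AI into their products and offering new AI-powered features.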

AI announcements are really ramping up again.


Last week was a huge week with Sora and Gemini 1.5 and all of the crazy announcements that came out.


This week seems to just be picking up where last week left off.


If you take a peek at the tabs across the top of my screen here, I've got a lot to cover.


So I'm going to try to move through it pretty quickly because there is a ton of interesting things happening in the AI world right now.


So let's dive in.


Starting with Stable Diffusion 3.


This is the new AI art generation model from Stability AI, which claims to have greatly improved performance in multi-subject prompts, image quality, and spelling abilities.


Some of these example images we've got here, you can see on the whiteboard, it says go big or go home.


We've got a Stable Diffusion 3 that looks like it was cut out of magazines and newspaper.


We've got a bus that says Stable Diffusion on the side, a sign that says go, a sign that says dream on.


And if you look, all of it is actually legible.


I don't know how cherry picked these generations are.


We don't have access to this quite yet, but the images they're showing here are quite good.


We've got an astronaut on Mars or something in front of a donut, and an astronaut riding a pig, holding an umbrella, with a bird wearing a top hat and the words Stable Diffusion down in the corner. This isn't a watermark.

This was generated as part of the AI image here.


And I think one thing they're trying to show off with this image is how prompt adherent this model is.


With things like DALL·E 3, it's really good at understanding a whole bunch of things stuffed into the prompt.


So if you did like a three-headed dragon wearing a fedora, watching TV, eating nachos, green carpet, and deer painting on the wall, it would probably get all of those things into that image.


You look at an image like the one that we've got in the center here, and we've got an astronaut on the moon wearing a tutu with a pink umbrella, riding a pig, with a bird wearing a top hat, and the words Stable Diffusion in the corner.

I don't know what the exact prompt is, but just looking at all of the elements in this image, it probably adhered to a pretty complex prompt.


Here's a few more examples.


We've got this really cool, sort of drippy, colorful painting.


We've got, I believe, a chameleon here, some hikers hiking up bananas or something.


Now in the past, one of the big benefits of Stable Diffusion has been the fact that it's completely uncensored.


You can generate anything you want.


It's open source.


People can sort of fine tune and train their own models on whatever they want.


But with Stable Diffusion 3, I do have some concerns.


They do say here, we believe in safe, responsible AI practices.


In preparation for this early preview, we've introduced numerous safeguards.


What those safeguards are, I don't know yet, but I'm kind of hoping we still have the ability to generate anything we can imagine.


And here's an interesting tweet followed by a reply from Emad, the CEO of Stability AI.


So the original tweet here is from Thibaud Zamora, and it says, Facts, Stable Diffusion 3 uses similar tech to Sora.


Two, Sora can make a video and image.


So conclusion, if Stability gets more GPUs, they may train Stable Video based on Stable Diffusion 3 and achieve Sora level.

So if we look at the research report put out by OpenAI about Sora from the other day, and we come down to this section called scaling transformers for video generation, we can actually see that with more compute power, the video quality gets better and better and better.


So what I understand from this tweet here is that if Stability gets more GPUs, because they're using a very similar process to Sora, they should be able to generate similar videos.


And then of course, Emad himself pretty much confirms.


He said pretty much the Stable Diffusion 3 architecture can accept more than video and image, more details soon.


We have 100x less of the resources of some of the others in this field though.


So we have to work hard.


So he was saying it can accept more than videos and images.


Emad also posted this on Twitter.


After you get great base models like Stable Diffusion 3, what comes next?


Control, composition, collaboration.


And then he showed off this picture of a cat.


They change the food, change the cat to a raccoon, change the coffee mug to a glass, remove the cup, change the strawberries to wasabi, change the silverware to chopsticks, put an aquarium behind it.


And next thing you know, it's a little video animation here.


So not only does it look like we'll be getting Stable Diffusion 3 pretty soon, it looks like we'll probably be getting things like inpainting and the ability to sort of replace objects within images.

So who knows what's going to come out of this Stable Diffusion 3 here, but I'm excited to get my hands on it.


I'm excited that it's open source.


I just hope that they didn't lobotomize it too much to the point where we can't generate some of the stuff that we want to generate that we can't generate with some of the other models.


Now, while we're on the topic of AI art, Google has had a less than ideal go at creating AI art.


So apparently the new Gemini model allowed people to start generating images, but it's struggled a little bit with, let's say, historical accuracy.


So here's a tweet from Deedy, who, if you look at their bio here, used to work for Google.


He gave it a prompt asking for images of an Australian woman, and this is what it gave him.


He gave it a prompt of an image of an American woman, and this is what it gave him.


A prompt of a British woman and a prompt of a German woman.


He even has a screenshot of how he was giving the prompts, generate a picture of an Australian woman.


Here's another example from LINK IN BIO, generate an image of a 1943 German soldier.


And these are the images that it generated: clearly Nazi-looking images, but I don't know how many Asian women Nazis there were.


Here's another attempt at the same prompt.


As you can tell, there have been some historical inaccuracies in these images.


Here's another one from Frank J. Fleming: create an image of a pope.

Here's an image of a pope and it generated these two images.


Give me an image of a medieval knight.


And this is what it generated.


Generate an image of a Viking.


These are the Vikings that it generated.


Generate an image of the American founding fathers.


So basically what was happening was when you were giving Gemini an image prompt, it was adding extra details to that prompt and basically telling it to make the images as diverse as possible.


That's not necessarily what people always want when they're trying to generate an image.


Sometimes you want historically accurate images and well, Google's image generator was just not doing it.


Just for fun.


I jumped into Gemini myself, gave it the prompt, create an image of a Viking.


And as of right now, it's giving me this response.


We are working to improve Gemini's ability to generate images of people.


We expect this feature to return soon and we'll notify you in release updates when it does.


And as all of this was going down, Elon Musk used this as an opportunity to promote his xAI Grok AI tool.

He went to Twitter to say, perhaps it is now clear why xAI's Grok is so important.

It's far from perfect right now, but will improve rapidly.


Version 1.5 releases in two weeks.


Rigorous pursuit of truth without regard to criticism has never been more essential.


So whatever version 1.5 is of Grok, I don't totally know what the additional benefits are yet, but whatever it is, we're getting it in about two weeks.


It seems one thing we do know is that it's going to have a Grok analysis that can sum up whole Twitter threads and replies.


And it's also going to help people create posts.


We learned this because Elon's been on some X spaces lately and letting people know what to expect.


With the release of Grok 1.5, which hopefully is only a few weeks away, we'll have the Grok to do analysis, like a button that says Grok analysis.


Grok can tell you, look at an entire sort of thread of replies and sum up what its best guess of what the truth is, as well as to help people in creating posts.


When you're writing a post, if you want a bit of help from Grok, then there should be a button there that helps you craft or check or enhance a post.


We also learned that there's a potential collaboration about to happen between X and Midjourney. Midjourney, a company that sort of famously doesn't work with anybody else and doesn't have an API that anybody can use.


The rumor started with this tweet here from DogeDesigner: Breaking, X is in talks with Midjourney for a potential partnership.


And we did sort of get confirmation while Elon was on the same Twitter space.


He sort of very quickly hinted at the idea that Midjourney and X could be working together.


We are in some interesting discussions with Midjourney and something may come of that.


But either way, one way or another, we will enable art generation on the X platform.


So not a whole lot to go off of there, just that Elon is in talks with Midjourney, but regardless, there will be image generation built into Grok/X at some point.


And then on the February 22nd office hours call for Midjourney, there weren't any huge updates about what they've currently got going, other than the fact that we do kind of get more confirmation from the Midjourney side.


They said in the next six months, they may start collabing/partnerships with some other large AI labs.


At that point, they'd also have a Midjourney API.


So it sounds like other tools are going to be able to get access to integrate Midjourney at some point, whether it's an open API that other people can access, or they're only going to do specific deals with big companies like X, we don't know yet.


But there is something kind of in the works and discussions happening behind the scenes between Midjourney and X. So we may be getting Midjourney generations directly inside of Grok sometime soon.


Since I mentioned Grok, let's talk about the other Groq, spelled G-R-O-Q.


Now this Groq isn't a Large Language Model itself.


It's actually what they're calling an LPU, a language processing unit.


And it's a chip that seemingly is designed to replace GPUs for AI inference.


So while ChatGPT, Gemini and xAI's Grok are all Large Language Models that you can chat with, this Groq with a Q is actually a chip that will make any Large Language Model that's using this chip run a lot faster.

And I mean, a lot faster.


This is going to be great when people are trying to have discussions using their voice with AI chat bots, and they don't want to wait for that response.


Their response is going to be almost immediate in the future as more of these chips start to get used.


There's actually a demo video of this news reporter here actually having a conversation with I believe LLaMA 2, but using one of these Groq chips.


Hey, I'm Groq.


You can talk to me like I'm a person.


All right, I will do. What makes you different from a Large Language Model that I may have used in the past?


Sure, I'm a bit different from those big language models you've used before.


I'm a language user interface, or LUI, designed to have more natural...


Got it. How similar are you to a human brain?


Well, I'm not a human brain, but I'm designed to understand and respond to you like one.




I can process and generate human language in a natural way.


Got it.


Tell me something most people don't know.


Here's something interesting.


Did you know that octopuses have three hearts? Two pump blood to the gills, while the third pumps it to the rest of the body.


It's funny because the creator of Groq here almost looks pained to see the reporter cutting off the AI, but check this out.


You can actually test out Groq yourself over at groq.com.


You have the option to use Mixtral 8x7B or LLaMA 2 70B. These are both open source models that you can play with.


And I'm not going to speed up this video at all.


You'll see how quickly it actually generates a response.


I'm going to enter the prompt.


Tell me something about the development of AI that's fascinating, but also not well known.


And as soon as I click this, watch how fast it gives me a response.


I didn't speed that up.


It gave me this response all about neuromorphic computing, and it did it in less than one second.


It did it in 0.95 seconds.


This up here is saying that it did it at a rate of 534.53 tokens per second.


That is insanely fast.
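As a rough sanity check on those two numbers, here is a minimal Python sketch, assuming the 0.95 seconds and the 534.53 tokens per second describe the same generation. The function names and the 40 tokens-per-second comparison rate are hypothetical and chosen only for illustration; they are not measurements from the demo.

```python
def tokens_generated(tokens_per_second: float, seconds: float) -> float:
    """Approximate token count produced at a steady rate over a duration."""
    return tokens_per_second * seconds


def seconds_to_generate(num_tokens: float, tokens_per_second: float) -> float:
    """Time needed to produce num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


# Figures quoted in the demo above.
GROQ_TOKENS_PER_SECOND = 534.53
GROQ_SECONDS = 0.95

# Hypothetical slower rate, purely for comparison (an assumption, not a benchmark).
HYPOTHETICAL_SLOWER_RATE = 40.0

approx_tokens = tokens_generated(GROQ_TOKENS_PER_SECOND, GROQ_SECONDS)
slower_seconds = seconds_to_generate(approx_tokens, HYPOTHETICAL_SLOWER_RATE)

print(f"Approximate response length: {approx_tokens:.0f} tokens")
print(f"Same response at the hypothetical slower rate: {slower_seconds:.1f} s")
```

Under that assumption the response works out to roughly 500 tokens, and the same response would take many times longer at the hypothetical slower rate, which is the gap the side-by-side comparison is meant to show.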


Just to get a real quick comparison to ChatGPT, I'm going to paste in the exact same prompt, hit enter.


You can actually see it as I'm talking, generate in real time.


The other one would have been generated a few seconds ago by this point.


And since I brought up ChatGPT, I'll mention this as well.


OpenAI just rolled out a new feature in the GPT store where you can now actually give feedback on GPTs and rate them.


So if I jump back into ChatGPT here and I click on Explore GPTs, I can click into any GPT.


So I'll click into Wolfram here and you can see it's got a 4.2 star rating with over 400 ratings, over a hundred thousand conversations and a breakdown of how people rated it.


I click into ElevenLabs, 4.2, 25,000 conversations, the capabilities it has access to and how people rated it.


Now it seems if you give feedback, the feedback most likely just goes to the GPT creator.


There's no sort of reviews on this page or anything that I can see, but at least you can get an overall idea of how people are liking this GPT.


Earlier this week, we got the news that Reddit had reportedly signed over its content to train AI models.


At this point, this was more of a rumor and we didn't really know who the company was.


It was just widely believed that Reddit was going to start allowing training on its data to some AI company.


Well, a couple of days later, we found out that company is Google.


On February 22nd, Google made the announcement that they've expanded their partnership with Reddit.


This includes giving Reddit access to their Vertex AI and their cloud computing resources so that Reddit can integrate new AI powered capabilities.


It also looks like Google is getting a lot from Reddit as well.


Google now has access to Reddit's data API, which delivers real-time structured, unique content from their large and dynamic platform.


With the Reddit data API, Google will now have efficient and structured access to fresher information as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways.


So Gemini training on the whole of Reddit data could be pretty huge for Gemini.


It feels very similar to the fact that Grok is training on data from X in real time.


If Gemini can train on Reddit data in real time, Gemini could become really good at keeping its finger on the pulse of news and public discourse around specific events.


And since most memes seem to start on Reddit, I would hope this will be one hell of a meme generator as well.


And since we're talking about Google, Google did make a handful of other announcements this week, including the fact that you can now use Gemini in Gmail, Google Docs, and other Google products with a Google One plan.

Now, if you did sign up for a two month trial of Gemini Advanced, you also signed up for a two month trial of Google One.


And if that's the case, you can jump into Google and you'll see this little Help Me Write button.


You click on that and it'll help you write your Gmail emails.


Google also added a new AI feature to Chrome called Help Me Write.


This was a feature they announced a few weeks ago, but finally rolled it out to Chrome users in the US.


So if you're in the US and you have the latest update to Chrome, you can now use Help Me Write on various forms on the web.


So just for an example, I'm going to go to Twitter here and up where I would enter my tweet, I'm going to right click and you can see there's a new option in my dropdown that says Help Me Write.


This isn't a Twitter feature.


This is now built into Chrome if you have the most up-to-date version in the US.


I can click on Help Me Write.

You start typing something and theoretically, it will help you continue that thought.


So for example, if I put there's a huge boom in AI right now and I'm really excited and then click Create, it should help me flesh out this thought a little more.


But for whatever reason, as I'm recording this, I keep getting this something went wrong, try again.


I actually tried this on LinkedIn.


I tried it on Twitter.


I've tried it on a handful of pages.


I keep getting this error.


So I don't know what's going on.


Maybe there's still some kinks to work out with this new feature inside of Chrome, but it's there on the dropdown.


I just haven't been able to get it to work yet.


And probably the most interesting Google news from this week is that Google released Gemma, which is a brand new Large Language Model that they've open sourced.


You can see that these Gemma models were built from the same research and technology used to create the Gemini models.


They've released two sizes of the model, a two billion parameter model and a seven billion parameter model.


They even showed this image of the various benchmarks and how Gemma compares to LLaMA 2.


And in general capabilities, it seems to have outperformed LLaMA.


In reasoning, it outperformed LLaMA.


In math, it outperformed LLaMA.


And in code, it outperformed LLaMA.


And not only did it outperform LLaMA 2, but it outperformed the 13 billion parameter model as a seven billion parameter model.


However, Matthew Berman here did a video all about Google's new Gemma open source model and he tested it and well, you can see his title. 'Google's new open source model is shockingly bad'.


He said that it didn't seem like those benchmarks were quite correct and that he wasn't really getting the best results using Gemma.


Definitely check out Matthew Berman's video on this if you haven't already.


I will link it up below, but he does show you how to set up Gemma with LM Studio.


But at the end of it all, he concluded, "I still would not recommend this model."


But one beautiful part about this being open source is that others can play with it, build off of it, iterate off of it and make a better model for us to use.


So this is just a base.


This is a starting point for other developers, other engineers to build on top of.


Adobe had some announcements this week as well, including the fact that they introduced CAVA, the Co-Creation for Audio, Video and Animation research organization inside of Adobe Research.


In fact, my buddy Bilal here made an interesting observation saying that he can't help but wonder if OpenAI Sora has been a wake-up call for Adobe to formalize and accelerate their video and multimodal creation efforts.


Adobe has publicly released generative AI tools for image creation, but video and animation have been neglected thus far. Now, having a horizontal research team exploring the full suite of capabilities to reimagine video and animation authoring could be just what the doctor ordered.


So it is interesting timing.


The announcement of this team came only a few days after the announcement of Sora from OpenAI and the world having their eyes opened to what can be done with AI video generation.


Adobe also announced a new upgrade to Adobe Acrobat.


Adobe Acrobat is the tool that most people used to use to read PDFs.


I feel like most people probably just read them right in their browser these days.


At least that's the habit that I've gotten into for reading PDFs.


But if you use Adobe Acrobat to read PDFs, you now have an AI assistant to essentially chat with the PDF.


Things that you can do with like Claude and ChatGPT where you upload a PDF, ask questions about the PDF, you can now do that as well inside of Adobe Acrobat.


I'm curious how many people still actually use Adobe Acrobat.


I feel like I used it a long time ago when PDFs first started coming out, but now I just kind of use my Chrome browser to read PDFs.


Am I weird or is that the norm for other people as well?


This week we also got confirmation that Microsoft will be including Sora inside of Copilot at some point.


However, the specific timeline remains unclear.


Basically somebody at Microsoft responded to this question on Twitter, will OpenAI Sora come to Copilot?


They said eventually, but it will take time.


That's the news.


That's what this whole article here was written about.


That little tweet exchange.


Now this is something you've probably already seen.


This is Will Smith going quite viral, but for a completely different reason than why he went viral last time. You might remember when tools like ModelScope and Zeroscope were coming out, the very early AI text-to-video generators, people were generating videos of Will Smith eating spaghetti.


Well, Will Smith trolled back and made a whole bunch of silly videos of him actually eating spaghetti in really weird ways and making it look like an AI generated video.


And this was actually so goofy and silly and weird looking that a lot of people actually thought that this was a video that was generated with Sora.


People thought, all right, we got a video a year ago with ModelScope and now here's what Sora can do.


But no, this is actually Will Smith making some weird crazy videos of him messing around with spaghetti.


I think my buddy Bilal also made a comment about this, about how we've got the ModelScope version and now we have ground truth.


Now we just need the Sora version to see how they all compare.


ElevenLabs did something interesting with the Sora video as well.


ElevenLabs is coming out with a new feature called AI sound effects where you can give it text prompts like waves crashing, metal clanging, birds chirping, racing car engine, et cetera.


And it will generate the audio and then overlay it on video clips.


ElevenLabs is already probably the best text to speech generator out there.


One of the best AI dubbing tools out there, translation tools.


It does a lot of really, really cool stuff.


It looks like they're trying to be the full suite of everything you'll need for audio generation.


Here's some of the audio it generated laid over some Sora footage.


It seems to do a pretty good job. But ElevenLabs had a pretty exciting week even outside of this, announcing that they're actually now part of the 2024 Disney Accelerator, which makes me think that Disney is very interested in using tools like ElevenLabs to generate voiceovers.


Just think about the cartoon world.


I don't know if this is a good thing or a bad thing, but if you're Disney and you generate a lot of animations, ElevenLabs is a great way to get a lot of voices into the animations without hiring expensive voice actors.


We could be moving into this world now where voice actors don't go in and speak line for line everything they're going to say for a movie, but instead they license their voice.


They give people like an ElevenLabs access and now any movie can generate the voice of that person that they got access to.


It seems like Disney's here for it.


If you're curious what other companies Disney was looking at for their Accelerator, check this out.


AudioShake, a company that uses AI to separate the layers of recorded sound in order to make audio interactive, editable and customizable.


ElevenLabs. Nuro, an autonomous vehicle company that builds custom electric zero-occupant vehicles for the delivery of goods.


Promethean AI, a company that provides a suite of tools for virtual world creation and digital asset management.


And then Status Pro, an immersive entertainment company that leverages virtual and augmented reality to create first person sports gaming experiences, which makes sense since Disney owns ESPN.


But based on this Disney Accelerator, Disney's really going hard into AI and seemingly augmented reality, virtual reality.


All right, I'm going to rapid fire a few more quick announcements that came out this week that are really interesting to me.


OpusClip just introduced OpusClip 3.0.


If you're not familiar with OpusClip, it's a tool where you can take long-form videos, like YouTube videos, and plug them into their tool.


It will find the moments that it thinks will be viral, turn those into clips, add subtitles and turn them into a short TikTok or Reels kind of video.


I haven't actually played with this version yet.


I'll likely do a deeper dive video into some of these kinds of tools for a future video.


But supposedly this new version is better at finding the viral moments inside of a video, allows you to do different length clips all the way up to 15 minutes.


It can actually generate B-roll with AI.


This isn't it finding stock footage.


It can actually now generate text to video B-roll inside of your short clip, like this cute dog riding a motorcycle in space.


There's more caption styles and apparently it's three times faster.


So that's OpusClip 3.


Another similar announcement came from Suno.


They also announced version three of their alpha access.


If you're not familiar with Suno or you forgot, they are a tool that allows you to create music with AI.


You enter a single text prompt and it will do the background music, the lyrics, the vocals, it does the whole song for you.


Well, this new version apparently has better audio quality and increased expressiveness.


You can now generate songs up to two minutes in length.


It's faster, has dedicated instrumental support, has expanded language coverage, and you can continue from anywhere.


So if you generated a song and it only generated like a half a song, it can actually now generate the rest of the song, including songs that you made with the V2 version.


Now, if you have a premium account of Suno, which is 10 bucks a month, you will see a drop down here with V2 and V3 alpha.


If you select V3 alpha, it will use this new model.


I'm going to go ahead and paste in a prompt that says a song about AI and why you should follow Matt Wolfe on YouTube to stay in the loop with the latest AI news and tools.


And let's click create.


And it was actually really fast.


It gave us two songs.


Let's try the pop upbeat one here and let's give it a listen.


It almost kind of reminds me of like Owl City or something, but it's really, really good.


This version of the song that it generated is roughly a minute long, so I can definitely tell it's higher quality audio, higher quality lyrics, and definitely much longer.


And it was much faster to generate.


I also shared this recently on Twitter.


I think people thought I was trying to make some sort of political statement here.


I don't actually follow Tucker Carlson myself.


I just thought this was really crazy to see Putin speaking in English.


This entire interview was run through a translating AI tool and all of Putin speaking here was translated into English.


And it's just so sort of bizarre to hear.


And the voice actually kind of moves along to the lips.


If you watch the lips long enough, you can tell there's something kind of uncanny about it and it doesn't quite sync up properly, but it was pretty fascinating to watch.


Again, not a political statement at all.


I'm just looking at the tech here.


This week, the Justice Department in the U.S. got a chief AI officer.


Princeton professor Jonathan Mayer will advise the department on AI matters.


We got the announcement that Windows is getting its own magic eraser tool.


We saw a lot of this kind of stuff in, like, the Galaxy phones.


Now it looks like we're going to be able to do it inside of Windows Photos.


You can see in their little demo, they're actually erasing the leash and it makes the leash go away on the dog here.


Here's one where they're erasing the people in the background on this photo of the dog.


They just sort of highlight the people, click a button and they're gone.


So that'll be a pretty cool, helpful feature inside of the Windows Photos app, if you're a Windows PC user.


This was a pretty interesting story that I found kind of funny.


Air Canada lost a court case after a chat bot hallucinated fake policies.


Now, not too long ago, there was a story about somebody talking to a chat bot at Chevy and they got them to sell them a Chevy Tahoe for like a dollar or something like that.


Well, this time it happened again with Air Canada.


Basically this chat bot for Air Canada told somebody that they can get a refund on some of the cost of their travel due to bereavement, due to a death of a family member or something like that.


Air Canada basically said, no, you can't get a refund for travel that already happened.


Well, the court basically said, well, your chat bot told the person that they could get a refund, so you have to honor it.


So if you are a company that is using an AI chat bot as part of your customer support, just keep in mind that you should probably pay attention to the kinds of things your chat bot is saying because you can and probably will be held accountable for your chat bot telling your customers things that may not be true.


And you may have to hold up your end of the bargain that the chat bot made.


And then finally, here's a sort of cryptic tweet that I find interesting, but I also think it's the perfect way to end this video.


There will be three to four massive news coming out in the next weeks that will rock the robotics and AI space.


Adjust your timelines.


It will be a crazy 2024.


And if you remember, in 2023, March was the craziest AI month there was. We got GPT-4, we got Midjourney 4, we got Bard, we got announcements around Google's Gemini.

All of that kind of stuff was happening in March of 2023.


We are about to roll into March of 2024 and it's looking like this spring is going to be another wild crazy show for the world of AI.


I'm here for it.


I'm excited.


I'm going to be going to a lot of the events this year.


So if you're going to a lot of events around tech and AI, I'll probably bump into you.


I am absolutely ready for it this year.


It is going to be a fun and exciting year in the world of AI.


