

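OpenAI is set to release DALL-E3, an image generation tool that the speaker suspects amounts to GPT-4.5. DALL-E3 is useful for design work, and combining multiple AI models is seen as the coming trend. 3D generation is also advancing. Amid debate over the risks of AI and how to address them, YouTube announced a set of AI tools for creators, Microsoft is adding AI features to Windows 11, Google's Bard is integrating more tightly with Google's own apps, and Amazon is working to improve Alexa's voice.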

So, one of the things that was just recently released was DALL·E3.


Now, DALL·E3, if you don't know, is similar to DALL·E2.


It's essentially image generation software, quite like Midjourney.


But this one is a bit different, you see, OpenAI actually did something that I didn't predict they were going to do.


And what they were able to do was they were able to add text.


As you can see on screen, it says Larry is so cute, what makes him super duper.


So essentially, they've integrated this chatbot with DALL·E3, which is, of course, an image generation software.


So you can see they're able to design some stickers, they're able to talk about many different things.


And I think this is probably GPT-4.5.


And you might be wondering, why are you saying that this is GPT-4.5?


Because previously, we did talk about how ChatGPT is going to get incrementally upgraded, more things are going to be released, and we know that it is going to be more multimodal than it is going to be anything else.


So of course, this is technically DALL·E3, but it's also an upgrade to ChatGPT.


So of course, everyone's wondering when is GPT5, yada yada yada.


It seems like we're getting these updates in ChatGPT that are essentially moving, you know, slowly but surely, um, towards GPT5.


So this is a major update for ChatGPT because, um, something like this is really cool because you can now actually talk with the text and, you know, ChatGPT can now give you an image.


And of course, we do know that this isn't exactly like the previous image capabilities where you can analyze images.


But in order to create these images, quite like MidJourney, this is really, really cool.


Now, we aren't sure how good this is compared to MidJourney in terms of realism, but we do know that in terms of stickers and various artistic things, this is pretty, pretty good.


So we don't know when this is going to be released.


I think OpenAI did say that it's going to be released sometime in September, but it will be really, really interesting to see how this does work.


So if you did want to see some comparisons between ChatGPT and, of course, MidJourney in terms of DALL·E3 vs MidJourney, the top one that we can see here, this is, of course, MidJourney.


And the bottom one that we can see, no, no, top one is actually DALL·E3.


And of course, the bottom one right here is, of course, MidJourney.


Now, this one, you're supposed to have like a heart with a universe inside of it, um, with clouds coming from the sky.


And I do think that this one that they've shown us does actually beat MidJourney in this area.


But once again, I do think that what we're starting to see from certain image generations is that certain image generations are best at certain styles.


I don't think there's one image generation that is best for all styles.


Even in MidJourney, you have to use V4 for more artistic and V5 for more realistic, and version 5.2 for more realistic and, I guess, artistic combined together.


For example, this is an example of um, some leaves dressed up as dancers or anthropomorphic leaves, um, as some country folklore singers.


And this is DALL·E, which looks pretty good.


And then, of course, this is Midjourney, but like I said before, I'm not sure which version the person did use in Midjourney, so I can't really judge that.


But this is why I'm saying various different softwares, and depending on which version is released, they're going to generate a different specific result.


And then, of course, we have this one right here, which is a close-up of a hermit crab nestled in wet sand.


And the top is, of course, DALL·E3, and the bottom is, of course, Midjourney.


Like we said, Midjourney does look more realistic, um, like insanely realistic.


I would believe this is a real picture.


Um, and then this one, of course, does look realistic as well, but I wouldn't say this is actually a hermit crab; this is actually just a normal crab.


So, I mean, it's up to you which one you think is better.


I'd love to know your thoughts down in the comment section below.


But, um, let's move now.


This was something I did want to talk about last week, but the video was like 28 minutes, so we just didn't have time to put it in.


But this is NExT-GPT: Any-to-Any Multimodal LLM, and this is generally where I think GPT-5 and all these other large language models are going to go.


I know that many different people in the industry are thinking that, you know, we're going to design one large language model that can do absolutely everything, and maybe that's going to be possible.


But I like to think that what we can see here with NExT-GPT, um, the any-to-any multimodal LLM, like previously with Microsoft's JARVIS and other systems, is that this is much more realistic.


So I think what we're going to have is one large language model in the middle, and then this large language model is going to call on every single other large language model for the specific requests.


So, for example, when it needs to generate an image, it's going to use Midjourney or maybe DALL·E3.

When it needs text, it's going to use ChatGPT.


When it needs to do audio, it's going to pull on the ElevenLabs API.


Um, and when it needs Vision, it's going to pull on the best Vision generator out there.


And I think with all that combined, that's going to be more like an AGI rather than doing one AI that can do absolutely everything.
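To make that "one model in the middle" idea a bit more concrete, here is a minimal sketch of a router that hands each request to a specialist. Everything in it is an assumption for illustration: the function names, the modality labels, and the stub return values are hypothetical, and none of them call Midjourney, DALL·E3, ChatGPT, or ElevenLabs.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for specialist models; none of these call a real API.
def generate_image(prompt: str) -> str:
    return f"[image for: {prompt}]"


def generate_text(prompt: str) -> str:
    return f"[text for: {prompt}]"


def generate_audio(prompt: str) -> str:
    return f"[audio for: {prompt}]"


def analyze_image(prompt: str) -> str:
    return f"[vision analysis of: {prompt}]"


# The central model only needs to know which specialist handles which modality.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "image": generate_image,
    "text": generate_text,
    "audio": generate_audio,
    "vision": analyze_image,
}


def route(modality: str, prompt: str) -> str:
    """Pick the specialist registered for the requested modality and call it."""
    handler = SPECIALISTS.get(modality)
    if handler is None:
        raise ValueError(f"no specialist registered for modality: {modality!r}")
    return handler(prompt)


if __name__ == "__main__":
    print(route("image", "a sticker of a hedgehog"))
    print(route("audio", "read this sentence aloud"))
```

In a real system the routing decision would presumably be made by the central language model itself rather than by a fixed lookup table; the table just keeps the sketch runnable.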


I mean, I think it's going to be similar to how the human body works, where you have the brain that pulls on, you know, the nose for smelling, you know, the tongue for tasting, the eyes for seeing, um, rather than the brain just absolutely doing everything.


So I think it's going to be kind of like a body.


Um, and this was a paper, so you know, it says, "As we humans always perceive the world and communicate with people through various modalities, developing an any-to-any large language model capable of accepting and delivering content becomes essential to human-level AI."


So I think that is going to be very interesting because it can do any-to-any, um, which is even video.


So I think that's where things are going to go.


Um, and it will be interesting to see how that does work.


But I think that this is where these large language model systems are going to go.


And even when we saw just before how ChatGPT is already using DALL·E3, I think that is exactly where we are headed.


So if you're wondering what the future of AI is going to look like, this is a good show.


This is another thing I did want to include as well.


This is called MVDream: Multi-view Diffusion for 3D Generation.


This is really, really good.


I mean, I've seen tons of different 3D model generations, and this is really, really good.


Like, it's approaching that level of detail where you can actually use this stuff right now.


You can see Viking Axe fantasy weapon AK Blender.


Um, and this is just a text-to-3D prompt.


So I mean, look at this.


I mean, Gandalf smiling white hair examples.


I mean, this is really, really good.


Like, before, you would see a lot of stuff as, of course, like right here, this is where you're seeing every other project, and all of the examples, I mean, you can see how it looks.


You know, this one's blurry.


This one's, that one's not too bad, but these ones aren't as good.


And of course, you can see they've managed to fix various issues and various quality issues.


Um, and yeah, this is really, really promising because when you compare it and you see the level of detail on this one, um, it just goes to show that with 3D, like, I feel like every week we're getting increases.


And here's the thing that someone previously said on Reddit, I'm not sure what the post is, but what they did say is that sometimes what people are looking for is one major breakthrough.


And of course, that does happen.


But sometimes all of these smaller breakthroughs on the smaller level allow us to create that much more smoother transition into whatever it is that we might be moving towards.


So of course, you can see right here that 3D is getting maybe 10% better, you know, every now and again, like 10% better every month.


You know, eventually you can have something that's crazy.
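To put rough numbers on that, this is just compound growth. Here is a minimal sketch, assuming the illustrative "10% better every month" figure from above (it is not a measured rate, and `compounded_gain` is a hypothetical helper name).

```python
def compounded_gain(monthly_rate: float, months: int) -> float:
    """Return the overall improvement factor after compounding monthly gains."""
    return (1.0 + monthly_rate) ** months


if __name__ == "__main__":
    for months in (6, 12, 24):
        factor = compounded_gain(0.10, months)
        print(f"{months} months at 10% per month: {factor:.2f}x")
```

At 10% a month, twelve months works out to roughly a 3.1x improvement, which is why a run of small steps can end up looking like one big jump.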


Remember Midjourney, every month it got better and better.


And eventually now we have something that is pretty crazy.


So this is crazy.


Like Jack Sparrow wearing sunglasses, boom, a 3D model.


Um, and that's pretty crazy for me.


So, um, yeah, I think this is, uh, really, really cool.


I think it's very, very interesting.


Um, I know something that I wanted to put out there.


I'm pretty sure there is a GitHub page that you can use or a Hugging Face Space.

But this shows promising results because once this is perfected, guys, it's really going to change the entire.


Then, of course, you can see right here, tech tycoons with a combined net worth of roughly $550 billion gathered in the same room today for a Senate forum on the future and regulation of AI, from Bloomberg.


So here you can see all of the people who are leaders and pioneers in the AI space.


They pretty much, you know, own the decision on where this stuff is going to move, and they are discussing AI.


Now, what was interesting was one of the conversations that they had.


Now, the conversation wasn't that great, but there was something that I did want to pick up on because I don't think enough people are paying attention to this.


And I think once again, it shows that we are right now in a race to the bottom.


So you can see right here, according to the Washington Post, one of the 22 tech titans at that Senate meeting, Tristan Harris of the Center for Humane Technology, told the room that with $800 and a few hours of work, his team was able to strip Meta's safety controls off its open-source large language model, Llama 2, and the AI responded to prompts with instructions to develop a biological weapon.

Now, you have to understand that this is a problem because, of course, Meta chief Mark Zuckerberg reportedly replied that those instructions are available on the internet.


This is an example of AI doing sophisticated research.


Fair enough.


But like we stated, this is just the beginning.


So remember, okay, um, this is pretty crazy, okay?


Because, of course, as they state, DeepMind was able to develop AlphaFold, which is pretty pretty crazy.


That solved a lot of stuff that would have taken us years to solve.


So the point here is that, of course, right now, these AI systems aren't great.


I mean, they're great, but they aren't great to the point where they can develop completely new biological agents.


But the point is, is that if they're able to do that, you know, five, ten years from now, that is going to be crazy.


Because if we have open source tools which anyone can access, of course, this is good that anyone can access it, but that, of course, opens up to bad actors.


Maybe someone wants to develop something that could ruin a whole town, ruin parts of the world.


I mean, it's definitely something that we need to be careful of.


Because if we don't have safeguards and we don't have regulations, then this kind of stuff is going to fall into the wrong hands.


And of course, you know, they equate this to being somewhat of a nuclear bomb.


And giving a nuclear bomb to everyone on the planet is a recipe for disaster.


Because it only takes one person to set it off.


And we know that with eight billion people, there's definitely at least a few of those who are crazy enough to just do something just to see how it goes.


So it is definitely something that shows that although open source AI models are good, I don't think they're the best, because the risk of people using them to, you know, commit fraud and just do many different things, it's just too much.


But it'll be interesting to know what you guys think.


If these open source AI models should still be allowed, or if you think that the risk of bad actors is just too high.


Then what we have here is something that is very interesting.


I'm glad that this is now starting to get more recognition.


This is called autonomous driving with chain of thought: autopilot thinking out loud in text.

It says, LINGO-1 is the most interesting work I've read in auto driving for a while.

Before: perception, then driving action. After: perception, then textual reasoning, and then action.


So if you don't know what Chain of Thought prompting is, it's essentially where you ask a large language model a question.


Okay, so for example, let's say I asked GPT, What's two plus two?


Um, it might just say back four.


But let's say I asked it, What's two plus two?


And then I say, let's think step by step and show your reasoning and explain your reasoning before you give your answer.


Then it's going to say two plus two is four because when you add two and then you add two, it's gonna be four.
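As a rough illustration of the difference between those two ways of asking, here is a minimal sketch that only builds the two prompt strings. It doesn't call any model; the helper names and the exact wording of the suffix are assumptions for the example.

```python
# Hypothetical wording; any instruction asking for step-by-step reasoning
# before the answer plays the same role.
COT_SUFFIX = (
    "Let's think step by step. "
    "Explain your reasoning before you give your answer."
)


def direct_prompt(question: str) -> str:
    """Ask the question as-is; the model may reply with just the answer."""
    return question


def chain_of_thought_prompt(question: str) -> str:
    """Ask the same question, but request the reasoning before the answer."""
    return f"{question}\n{COT_SUFFIX}"


if __name__ == "__main__":
    question = "What's two plus two?"
    print(direct_prompt(question))
    print("---")
    print(chain_of_thought_prompt(question))
```

The only thing that changes between the two is the extra instruction appended to the question.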


And of course, it's meant for more complex questions, but they're now applying this to, I guess you could say, driving.


So it says LINGO-1 trains a video language model that comments on the ongoing scene.

And then you can ask it to explain its decisions and planning, like why did you stop and what are you going to do next.


And of course, it shows you okay, why it's made these decisions.
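Just to picture the "perception, then textual reasoning, then action" ordering, here is a toy sketch. It is not how LINGO-1 works internally: the `Decision` type, the `decide` function, the two boolean inputs, and the hand-written rules are all hypothetical, and a real system would learn this from video rather than use if-statements.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    reasoning: str  # the text explanation, produced before the action
    action: str     # the driving action that follows from it


def decide(lead_vehicle_stopped: bool, road_clear: bool) -> Decision:
    """Toy rules: turn a perceived scene into an explanation plus an action."""
    if lead_vehicle_stopped:
        return Decision(
            "Remaining stationary as the lead vehicle is also stopped.", "stop"
        )
    if road_clear:
        return Decision("Accelerating since the road is clear.", "accelerate")
    return Decision("Edging forward due to slow-moving traffic.", "creep")


if __name__ == "__main__":
    decision = decide(lead_vehicle_stopped=False, road_clear=True)
    print(decision.reasoning)
    print(decision.action)
```

The point of the structure is simply that every action comes with a sentence you can read, which is what makes the decision inspectable.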


So, um, at the start, we can see here it says, I'm edging due to the slow-moving traffic.


And then, of course, um, as you move on, it says, I'm overtaking a vehicle that's parked on the side.


So I think this is interesting because it gives us an insight as to how these large language models are making their decisions.


I'm accelerating now since the road is clear and remaining stationary as the lead vehicle is also stopped.


Um, and I think this might be, it might be a breakthrough.


I'm not too sure, but um, I think Elon Musk did make a comment on this because he did talk about LLMs.


But yeah, I can't, I can't, I can't find the actual tweet from Elon Musk where he does talk about LLMs.


But I do think that this is going to be interesting to see if this is, uh, more successful than other decisions.


Of course, textual reasoning, as we know, does improve a model's response by around, you know, 20 to 30 percent, or in some cases, even five times.


So it will be interesting to see how, right now, what's also very, very interesting as well is that Elon Musk actually said something literally an hour ago that means he has inside information about.


So he said, Okay, that Midjourney will be releasing something significant soon.


Okay, and this was in response to a tweet where it was talked about DALL·E3.


Once deployed, we'll improve at a faster rate.


And Elon Musk says, Midjourney will be releasing something significant soon.


So that means, of course, Elon Musk has insider information about what's going on at Midjourney.


And it's no surprise, I mean, if I work there, but I'm trying to think what exactly is Midjourney releasing?


Is it going to be finally 3D?


Because apparently that's what they're working on.


Or is it going to finally be the desktop browser/area that they were working on that was leaked?


It was leaked and it was, and I did see, I do have the screenshot to show you because they don't want anyone to see that stuff.


And I'm guessing they're just waiting on the competition and they want to save that kind of stuff.


Now, there's a tweet here that does say, according to David Holz, Midjourney V6 will be a bigger jump from V5 with better image quality and text prompting.


And Midjourney 3D should come out in the next six months.


Now, if you want to know what 3D might look like, we do have an image trailer from this account, Nick Floats.


Now, I'm not sure if he made this himself.


I'm pretty sure it was him because I didn't manage to find this trailer, but this does look very interesting.


And this is possible because the software and technology does exist to do this already.


Okay, and I'm not surprised if this does actually happen with Midjourney because, like we stated before, being able to do this, I know looking at crime scenes as we discussed before, like a snapshot of a crime scene or something like that, that would be pretty interesting for detectives or managing to do real estate or, you know, try to virtually explore and figure out where you're going to be, where you place certain things.


I definitely think it's going to, you know, be that next.


Then, of course, we have RoboFab introducing the world's first factory for humanoid robots.


And essentially, this is pretty crazy because, of course, as you know, humanoid robots are becoming more and more popular.


But this is a factory for the, you know, humanoid robots that are basically going to be everywhere.


And this is what they want.


So they're trying to reduce the cost of these automated robots.


Essentially, what's even crazier about this, which I forgot to mention, is that they're going to be using the robots to actually assist in the factory in which they're building robots.


So I guess you could say this is somewhat kind of exponential and very interesting because I didn't expect to see this, you know, technological announcement happening so quickly.


So these robots, I'm definitely sure that these companies are going to be getting more funding and more crowdfunding because, of course, investors definitely want to benefit from this huge, huge industry that is projected to be worth billions and billions of dollars, maybe trillions.


It will be interesting to see how this does play out.


Then essentially what we had was a surprise. We had this platform, YouTube, saying that they're going to release a bunch of new AI tools and it's going to be helping creators and anyone that does want to become a content creator.


So it will be interesting to see exactly how this stuff does work, how it does change the entire platform in terms of content creation.


If it's actually going to be good or if it's going to be just pretty bad.


So it says AI images for shorts and stuff.


So what I'm going to do is I'm going to leave some of the video in here so you guys can see exactly how it looks and how it works because this explanation is vastly better than mine.


Yeah, are you actually rolling?


We're actually, oh, we're rolling.


Okay, yeah, we're good to go. Let's go.


Okay, so YouTube just announced this set of AI and editing tools that are going to revolutionize the platform, making creation easier and more fun for everybody.


The aim is to unlock more creativity for more creators than ever before.


The most exciting part to me is that what was announced is supposedly just the beginning.


Let's get into it. Okay, first up, let's talk about DreamScreen, the new image and video generation experiment that's making its way to YouTube shorts.


Powered by amazing AI technology, DreamScreen lets you bring your imagination to life by simply typing in ideas as text prompts.


It then generates super fun images and videos that you can use to set the scene.


Alright, let's see this in action. (laughs) I kind of don't want to leave.


This is nice.


These new tools are expanding the boundaries of digital art.


And that's not all.


Meet YouTube Create.


This is a new app that YouTube is building to make editing easier for everybody.


And it's free of charge.


It includes access to thousands of royalty-free tracks and sound effects.


And you can automatically create captions for your video with just one tap.


Cleo Jade, are you not mindless?


I'm very excited about this in a genuine way.


Last but not least, and my personal favorite, there's a feature that lets you clean up and remove any background noise.


I live in New York.


That would be really helpful.


The beta for YouTube Create is available first on Android to creators in select countries right now.


So go and check it out.


These announcements show that YouTube is really starting to transform the way that content is created, helping more creators make more content in more ways than ever before.


They're shrinking the gap between our wildest ideas and what we can actually create.


Until then, just keep making things.


I'll see you on YouTube.


Okay, so of course we had Microsoft announce Copilot, which comes with the Windows 11 update.

And it's going to include, um, pretty much just a ton of different AI updates, including to Paint, Photos, Clipchamp, and more, all on your Windows PC.

And of course, well, they also talked about how Bing is going to be adding support for the latest DALL·E3 model from OpenAI and deliver more personalized answers based on your search history.


And of course, just essentially a whole update that just makes everything a lot better.


So it's going to be really interesting because, of course, Microsoft is pushing this entire AI into the entire system.


Now, I do think that this is going to be interesting because, of course, as you know, Apple's operating software doesn't utilize any of this at all.


And it seems like Apple is currently being left behind in the AI race.


They haven't even spoken about or announced anything just yet.


So it will be interesting to see how Microsoft manages to gain ground because, of course, things with AI do move quickly, a lot quicker than people expect.


And you can be left behind because, of course, Internet Explorer did actually catch Google off guard.


Now, of course, I would show you this entire video.


I probably might make a dedicated video on this.


But of course, you can see Bard can now connect to your Google apps and services.

So it says, Use Bard alongside your Google Apps and services.


Easily double-check its response and access features in more places.


So what does this actually mean?


So you know how you have the Google Drive, you have Gmail, you have YouTube.


You can actually check and use Google Drive or, I mean, you can actually use Bard to check your Google Drive, to check your Gmail, to check your YouTube, to check Google Flights, to check pretty much everything.

So this is like actually having a great personal assistant.


And this is really, really, really cool because this is something that's more valuable than ChatGPT.


Because of course, ChatGPT is great, but I mean, you have person.


One of the large problems with ChatGPT is that if I need information, I have to give it that information.


And it is pretty time-consuming, especially if you're using ChatGPT to respond to email, to do work with your business, with your company.


Having to constantly feed it information, especially after updates, is very, very time-consuming.


This is where Bard comes in.


And it's already connected with Gmail.


You know, it can already help you with shared conversations, just so much stuff.


And it's really, really cool that now they have this.


Okay, and this is going to be something that, once again, like we said, is a step up here.


So of course, as you can see, when you go into Bard, this is it introducing Bard Extensions.


Of course, if you click next, then you're going to see Bard meets Google Workspace because it has all of your stuff.


And of course, what's really cool is that you can double-check Bard's responses.


You can check how accurate Bard's responses actually are.


So if you're not confident about Bard or if Bard's not confident about something, you can click a button and it's going to show you how accurate that response really was.


So of course, you can see right there, it's giving you a lot more stuff.


And of course, as you know, it's now, of course, you can upload images.


So for one of Bard's new features, I took a random picture of a car that I found on the internet.


And then, I literally just said, YouTube this.


So, if you don't know what it is, you can say, Google this or YouTube this.


And then, essentially, it's going to give you like YouTube videos that are about that.


So, I think this is going to be really interesting to see what kinds of cars it chooses.


Um, and what kinds of things are linked in here.


So, it's really, really interesting.


And it also says, YouTube video views will be stored in your YouTube history.


And so, yeah, I think this is going to be used a lot more than people did expect.


Then, of course, we have Amazon's Alexa voice.


It's about to become a lot more natural, a lot better, and a lot clearer.


So, this is going to be interesting because this is something that we did expect earlier in the year.


We did get Bedrock, which brought a bunch of different foundation models to all of their different APIs and services.


But this is, of course, to their lead product, which is, of course, Alexa or as many of you may know, Amazon's Astros.


So, take a look at this clip because I think it showcases exactly what Amazon is doing.


And, of course, Amazon Alexa hasn't really had much of the spotlight since Siri and other things like Google have had all the say.

But I think whatever company manages to first get a really good home device off the ground, one that is integrated with a large language model, that sounds natural, is actually useful, probably can make jokes and stuff like that, it's definitely going to take this next wave by storm.

