[Kling: The Chinese AI That Surpasses Sora] Reading the English Commentary in Japanese [June 7, 2024 | @TheAIGRID]

June 9, 2024, 11:51

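Kling was developed by Kuaishou, a major Chinese technology company founded in 2011 and based in Beijing. According to the video, the tool is currently among the best in the consistency and quality of its video generation. In one demo, a man rides a horse through the Gobi Desert at sunset, rendered like a movie scene. Other demos show an astronaut running smoothly across the lunar surface and demonstrate character consistency across different camera angles. Kling can also generate videos up to two minutes long at 30 frames per second, with striking consistency. It is also good at simulating physical-world properties: in a scene of milk being poured into a cup, for example, the flow of the milk and the gradually filling cup look very realistic. And in a clip of a Chinese man eating noodles, the details are realistic enough that it is hard to believe the footage is AI-generated.
Published: June 7, 2024
Note: It is recommended to play the video before reading.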

China just went ahead and released their text to video tool, and it is pretty, pretty incredible.

中国がテキストから動画を生成するツールをついにリリースしましたが、これが本当にすごいのです。

I'm going to show you guys a quick sample of some of the clips, and then we'll dive into the good stuff.

私は皆さんにいくつかのクリップのサンプルをお見せしますが、その後、本題に入ります。

What you just saw was, of course, the very, very impressive Kling AI.

ご覧になったのは、もちろん非常に印象的なKling AIです。

This is the Kling AI video generation tool, and it was launched by Kuaishou, a major Chinese technology company that was founded in 2011 with its headquarters in Beijing.

これはKling AIの動画生成ツールで、2011年に設立され北京に本社を置く中国の大手テクノロジー企業、快手（Kuaishou）がリリースしたものです。

This I genuinely have to say, on some of these demos, I would argue that it actually genuinely surpasses Sora in terms of the consistency and what it's able to do regarding the quality of the clips.

正直に言って、これらのデモの一部では、一貫性とクリップの品質という点で、実際にSoraを上回っていると思います。

Trust me when I say that you need to watch this video until the end, because then you will truly understand how great this system is in terms of its ability to generate high-quality clips with a very decent amount of consistency across the scene, and in terms of ensuring that characters remain stable and consistent.

このビデオはぜひ最後まで見てください。シーン全体でかなりの一貫性を保ち、キャラクターを安定して一貫させながら高品質なクリップを生成できるという点で、このシステムがどれほど優れているかが本当にわかるはずです。

For example, clips like this that are just a remarkable display of true understanding of exactly what's going on.

例えばこのようなクリップは、何が起きているのかを本当に正確に理解していることを見事に示しています。

It is showing us that right now we are seeing that AI is coming to a point where other nations are truly starting to catch up and slowly even surpass some of the state-of-the-art models in certain areas like text to video.

これは、他の国々が本当に追いつき始め、テキストからビデオへの変換のような特定の分野では、最先端モデルを徐々に上回りさえする段階にAIが来ていることを示しています。

Let's dive in to exactly what makes this entire system so effective and how it actually works and how the team managed to crack this in such a short time frame.

この全体のシステムがどうやって効果的なのか、実際にどのように機能し、チームがどのようにしてこの短期間でこれを解決したのか、を詳しく見ていきましょう。

There are six different things that they talk about on their webpage, and I'm going to show you guys exactly what they are.

彼らのウェブページで話している6つの異なることがあり、それらが何かを皆さんに正確に示します。

One of the things that they talk about is 3D spatial temporal attention.

彼らが取り上げていることの1つは、3D空間時間アテンションです。

You can see right here that the prompt we have is a man riding a horse in the Gobi desert with a beautiful sunset behind him, movie quality like scene.

ここで見ることができるのは、私たちが持っているプロンプトが、ゴビ砂漠で馬に乗る男性が美しい夕日を背景にしている、映画のようなシーンです。

Essentially, this is where they've adopted a 3D spatial temporal attention mechanism which can better model complex spatial temporal motion and generate video content with larger movements while conforming to the laws of motion.

基本的に、彼らは3D空間時間アテンションメカニズムを採用しており、これにより複雑な空間的時間的動きをよりよくモデル化し、動きの法則に従いながらより大きな動きを持つビデオコンテンツを生成することができます。
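
To make the idea of joint attention over space and time concrete, here is a minimal Python sketch. It is not Kling's implementation, which is not described in this level of detail in the video; the function names `attention` and `spatiotemporal_attention`, the nested-list video layout, and the absence of learned projections are all assumptions chosen for illustration. The only point is that every (time, row, column) token can attend to every other token, including tokens in other frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of feature vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return out


def spatiotemporal_attention(video):
    """video[t][y][x] is a feature vector (hypothetical layout).

    All T*H*W positions are flattened into one token sequence, so a token
    in one frame can attend to tokens in every other frame.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    tokens = [video[t][y][x] for t in range(T) for y in range(H) for x in range(W)]
    mixed = iter(attention(tokens, tokens, tokens))  # no learned projections here
    return [[[next(mixed) for _ in range(W)] for _ in range(H)] for _ in range(T)]
```

A spatial-only variant would call `attention` once per frame instead, so no information would flow between frames; the joint version is what lets motion in one frame inform the others.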

This isn't by far their best clip.

これは彼らの最高のクリップではありません。

In fact, this is probably the worst clip that you're going to see for the entire video.

実際、これはおそらくビデオ全体で見ることになる最悪のクリップです。

But essentially what they talk about here is the ability to ensure that when they're generating clips that have a lot of different moving parts and a lot of things that you know have motion in them, it's very difficult to ensure that certain things are actually pretty consistent.

しかし、基本的に彼らがここで話しているのは、異なる動く部分や動きのあるものが多く含まれるクリップを生成する際に、特定のものが実際にかなり一貫していることを確認する能力です。

With this clip, we can see that things remain consistent.

このクリップでは、物事が一貫していることがわかります。

We have the person riding, and we can see their body doing what it should be.

私たちは人が乗っていて、その人の体が正しい動きをしているのが見えます。

This is how riders move when they're on a horse.

これが乗馬者が馬に乗っているときの動き方です。

We also have the dust trails, and of course, we have the legs of the horse moving in sync with the entirety of this clip, as well as the background that is moving correctly.

また、砂ぼこりの軌跡もあり、もちろん馬の脚もクリップ全体と同期して動いていて、背景も正しく動いています。

This is something that is remarkably impressive.

これは非常に印象的なものです。

They also demonstrate another example here where an astronaut runs on the lunar surface.

ここでも、宇宙飛行士が月の表面を走る例が示されています。

The low angle shot shows the vast background of the moon, and the movements are smooth and light.

低い角度からのショットは、月の広大な背景を示し、動きは滑らかで軽やかです。

So what we can see here is that this is a clip where we can see an astronaut running across the moon, a very, very decent one.

ここで見ることができるのは、宇宙飛行士が月を横切って走る様子を見ることができるクリップであり、非常に立派なものです。

I don't think this is their highly scaled model.

これは彼らの高度にスケールされたモデルではないと思います。

I'm guessing that this was just where they wanted to showcase what you can do when you have the camera angle panning from below all the way to above.

私は、これはカメラアングルが下から上までパンすることを示したかった場所だと推測しています。

So this was an example where they're trying to show as much character consistency among different camera angles.

これは、異なるカメラアングルの間でキャラクターの一貫性をできるだけ示そうとしている例でした。

I think this, while yes, the quality isn't remarkably incredible, I still think that it shows what this system is able to do because if you take a look at, for example, the shadows that most people wouldn't even think to look at, they do look remarkably accurate.

このシステムが何ができるかを示していると思います。品質は驚くほど素晴らしいとは言えませんが、例えば、ほとんどの人が見ることさえ考えない影を見ると、非常に正確に見えると思います。

Now let's take a look at another example of their 3D spatial temporal attention mechanism in action.

さて、彼らの3D空間時間アテンションメカニズムが働いている別の例を見てみましょう。

This is where we have the most interesting thing, and this is by far one of the most interesting demos that you'll probably see in the entire video.

これは、最も興味深いものであり、おそらくビデオ全体で見るであろう中で最も興味深いデモの1つです。

There are a lot more that are remarkably impressive, and I was truly shocked by this.

非常に印象的なものがたくさんあり、私はこれに本当に驚かされました。

I know we have that as a meme on this channel and amongst the AI community, but this was genuinely pretty surprising, that they managed to catch up or at least be on the level of Sora in such a short time frame.

「衝撃を受けた」という言い回しがこのチャンネルやAIコミュニティでミームになっているのは承知していますが、これほど短期間でSoraに追いつく、あるいは少なくとも同じレベルに達したことには本当に驚きました。

This is where they say that, thanks to the efficient training infrastructure, extreme inference optimization, and scalable infrastructure, the Kling large model can generate videos up to two minutes long at a rate of 30 frames per second.

ここでは、効率的なトレーニングインフラ、極限までの推論最適化、そしてスケーラブルなインフラのおかげで、Klingの大規模モデルが1秒あたり30フレームで最大2分のビデオを生成できると説明されています。
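
As a quick back-of-the-envelope check of what that claim implies, the sketch below converts the stated duration and frame rate into a frame count. The function name `frame_count` is made up for this example; the two input figures are the ones quoted from the website.

```python
def frame_count(duration_seconds: float, fps: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return round(duration_seconds * fps)


# Two minutes at 30 frames per second, the figures quoted above.
kling_max_frames = frame_count(2 * 60, 30)
print(kling_max_frames)  # 3600
```

So a maximum-length clip means keeping 3,600 consecutive frames coherent with one another.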

That's the info that they have on their website, and I think this is arguably more impressive than the OpenAI Sora video because what we are seeing here is a two-minute long video that is remarkably consistent with the background animation.

それは彼らのウェブサイトに掲載されている情報ですが、これはOpenAIのSoraビデオよりも印象的だと思います。なぜなら、ここで見ているのは背景アニメーションと非常に一貫した2分間のビデオだからです。

I guess you could say whatever the background footage is.

背景の映像、とでも言えばいいでしょうか。

I mean, it's truly, truly impressive with as to what we are seeing here.

つまり、ここで見ているものは本当に、本当に印象的です。

I would argue that this is much longer than some of the Sora demonstrations because Sora demonstrations, as far as I know, were limited to around one minute.

私の知る限りSoraのデモは約1分に制限されていたので、これは一部のSoraのデモよりもはるかに長いと言えます。

Now they might be working on Sora 2, but if we're taking a look at what we're truly seeing here, this is truly a remarkable level of consistency and a remarkable level of temporal consistency.

今、彼らはSora 2に取り組んでいるかもしれませんが、私たちが本当に見ているものを見ていると、これは本当に驚くほどの一貫性と時間的一貫性の高いレベルです。

Because what we have to truly think about here is that the AI system must need to understand exactly what's going on over a longer period of time.

なぜなら、ここで真に考えなければならないのは、AIシステムが長い時間の間に正確に何が起こっているかを理解する必要があるからです。

You have to understand that the longer the context is, the harder it is for these AI systems to, I guess you could say, be remarkably consistent.

コンテキストが長いほど、これらのAIシステムが非常に一貫していることが難しくなることを理解する必要があります。

We can see that this consistency is maintained across the entirety of this two-minute long clip generation.

この2分間のクリップ生成全体にわたって、この一貫性が維持されていることがわかります。

There was another example, but I didn't choose to include it because it's not as good as this one in terms of actually explaining what's going on.

別の例もありましたが、実際に何が起こっているかを説明する点で、この例ほど良くないので含めることを選択しませんでした。

I think this example right here, and you can even see that there are literal train lines as the train is going across, and of course, maybe the background doesn't make sense because it looked like that was Rome.

この例は、列車が通過する際に文字通りの鉄道路線があることがわかりますし、もちろん、背景がローマのように見えるので意味が通じないかもしれません。

And then it looked like another place was the Arctic, so maybe that's a small detail that it might be missing, but I think video generation up to two minutes long with this level of consistency is impressive.

そして、別の場所は北極のように見えたので、細かい部分では抜けがあるのかもしれませんが、このレベルの一貫性を保った最大2分間のビデオ生成は素晴らしいと思います。

And usually with the kind of AI systems that we're working with, the longer the systems generate things for, the more errors you start to see because things just get lost in translation, I guess you could say.

そして、私たちが取り組んでいるAIシステムでは、システムが生成する時間が長くなるほど、エラーが増えてくる傾向があります。翻訳の過程で情報が失われるため、と言えるかもしれません。

Like as the information is processed through the AI system, a lot of it does get lost, which is why early on a lot of the AI video systems that we used to see were only two to three seconds worth of videos.

情報がAIシステムの中で処理されるにつれて多くが失われてしまうため、以前私たちが目にしていたAIビデオシステムの多くは、わずか2〜3秒分の動画しか生成できなかったのです。
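
The intuition that longer generations accumulate more errors can be illustrated with a deliberately simple toy model. This is an assumption made purely for illustration, not a description of how Kling or any real video model behaves: suppose each frame independently stays consistent with a fixed probability, so the chance that a whole clip is glitch-free shrinks geometrically with its length. The function name and the sample probability are hypothetical, and 30 frames per second is assumed for both clip lengths.

```python
def clean_clip_probability(per_frame_ok: float, frames: int) -> float:
    """Chance that every frame is consistent, under an independence assumption."""
    if not 0.0 <= per_frame_ok <= 1.0:
        raise ValueError("per_frame_ok must be a probability")
    return per_frame_ok ** frames


# Hypothetical per-frame reliability, compared across two clip lengths.
short_clip = clean_clip_probability(0.999, 3 * 30)    # about three seconds
long_clip = clean_clip_probability(0.999, 120 * 30)   # two minutes
print(short_clip > long_clip)  # True: the longer clip is far more likely to glitch
```

Under this toy model, a system has to be much more reliable per frame before two-minute clips look clean, which is one way to read why early systems stopped at a few seconds.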

And now you can see we've got things that are up to two minutes long, and there doesn't seem to be any real glitchiness or any real loss of quality regarding what's going on here.

そして今では、2分間までのビデオがあり、ここで何が起こっているかに関して、実際のところ、本当のグリッチや品質の低下は見られません。

This is something that I think is remarkably impressive because it shows that this system is able to generate consistency, especially when the AI system is able to look at what the scenery is like, and it's able to generate consistent footage for whatever system or whatever scene may be next.

このことは非常に印象的だと思います。なぜなら、このシステムが一貫性を生み出す能力があることを示しているからです。特に、AIシステムが風景がどのようなものかを見ることができ、次に何のシステムやシーンが来るかに関係なく一貫した映像を生成することができるからです。

And all of the motion that's going on here, I think this is genuinely really, really remarkable and impressive.

ここで起こっているすべての動きは、本当に非常に印象的で素晴らしいと思います。

Now, one of the most impressive things that we did see with other AI systems was their ability to simulate the physical world properties.

今、他のAIシステムで見た最も印象的なことの1つは、物理世界の特性をシミュレートする能力でした。

This was something that was talked about in the Sora paper because it was hailed as a new capability that was, I guess you could say, kind of emergent because it was something that we didn't really expect.

これはSoraの論文でも取り上げられ、新しい能力として称賛されました。私たちがあまり予想していなかったものなので、一種の創発的な能力と言えるかもしれません。

But of course, as these AI systems are trying to predict the next frame or, I guess you could say, make the videos all in one go, which is usually the architecture that we know that they use, they have to, I guess you could say, understand how the physical world works in order to create a video clip that actually looks realistic.

しかしもちろん、これらのAIシステムは、次のフレームを予測しようとするにせよ、通常使われていると知られているアーキテクチャのようにビデオを一気に生成するにせよ、実際にリアルに見えるビデオクリップを作るためには、物理世界がどのように機能するかを理解しなければなりません。

And whatever kind of world model they may have internally, this shows us that they're able to simulate the physical properties of the real world and generate videos that conform to the laws of physics.

そして、内部にどのような世界モデルを持っているにせよ、これは現実世界の物理的特性をシミュレートし、物理法則に従ったビデオを生成できることを示しています。

So here we can see that the prompt is carefully pour the milk into the cup.

ここでは、提示された指示が注意深くミルクをカップに注ぐことです。

The milk flows steadily and the cup is gradually filled with milky white.

ミルクは着実に流れ、カップは徐々に乳白色で満たされていきます。

So that's the actual extracted prompt from the website, and what we can see here is remarkable consistency in such a short clip.

これがウェブサイトから抽出された実際の指示であり、ここで見られるのは、非常に短いクリップでの驚くべき一貫性です。

Now, there's another clip that I do want to show you guys that has been remarkably impressive because I would say if there is probably one video clip that you do take away from this video, it's going to be this one.

さて、非常に印象的だった別のクリップがありますので、このビデオから持ち帰るべきビデオクリップがあるとすれば、おそらくこれになるでしょう。

So take a look at this clip right here: a Chinese man sitting at a table eating noodles with chopsticks.

では、このクリップをご覧ください：テーブルに座って箸で麺を食べている中国人男性です。

And I would have to argue that if I personally saw this clip like 480p maybe on a forum or something, I wouldn't for the life of me think that this is AI generated at all.

個人的には、もしフォーラムなどで480pくらいのこのクリップを見たら、どう考えてもAIが生成したものだとは思わないでしょう。

But we can clearly see here that this actually is AI generated, but it looks remarkably impressive because one of the things that you don't see here is that the man doesn't actually have sauce around his lips.

しかし、これは明らかにAIが生成したものです。それでも非常に印象的に見えるのは、最初は男性の唇の周りにソースが付いていないという点です。

But as he inhales, not inhales, the sauce, you can see that there is all of this mess around the lips, and that's because of the sauce that is orange at the bottom of the, I think, noodles here.

しかし彼がソースを吸い込む、いや吸い込むというより、すするにつれて、唇の周りが汚れていくのがわかります。これは、麺の底にあると思われるオレンジ色のソースのせいです。

So I think it's rather impressive that such a subtle detail is captured by the AI system, which is truly, truly, in my opinion, remarkable because it shows that all of these small details are captured by the system, and it's not really missing out on any of the finer quality details that we do expect from traditional video footage.

私は、AIシステムがこのような微妙なディテールを捉えることができることはかなり印象的だと思います。実際、私の意見では、これは本当に驚くべきことです。なぜなら、これらの小さなディテールがすべてシステムによって捉えられており、私たちが伝統的なビデオ映像から期待する細かい品質のディテールを見逃していないからです。

So this was one of the clips that I think truly showed people that, hey, this is a system that is really, really up there in terms of its ability to generate clips that are impressive.

これは、本当に人々に示したクリップの1つだと思います。ねえ、これは本当に印象的なクリップを生成する能力において非常に高い位置にあるシステムだということを。

And I think unless you're actually just focusing on the hands because the hands don't look as realistic, I mean, you can see just a little bit of inconsistency, just a little bit, but enough to let you know that what you're watching is AI generated.

手だけに注目しない限りは、ですが。手はそれほどリアルには見えず、ほんの少しだけ不整合が見られます。ほんの少しですが、見ているものがAI生成だとわかるには十分です。

But I think this is, of course, something that is just remarkable in itself, especially the way the noodles move and the fact that the guy's emotions look very, very realistic.

しかし、これは、もちろん、非常に印象的なものであり、麺の動きや男性の感情が非常にリアルに見える点が特に素晴らしいと思います。

There was also this example, and this is a chef chopping onions in the kitchen, preparing for a dish.

また、この例もあります。これは、料理を準備するキッチンで玉ねぎを刻むシェフの映像です。

And I would argue that yes, this isn't as good as the previous one, and it isn't as long as the previous one, but it still is a demonstration of simulating the physical world's properties.

確かに前のものほど良くはなく、前のものほど長くもありませんが、それでも物理世界の特性をシミュレートする能力を示すデモだと言えます。

And the reason why they've likely included this one is because what you are doing in this video clip right here is that you are basically changing the physical nature of that onion.

そして、おそらく彼らがこれを含めた理由は、このビデオクリップで行っていることが、実際にその玉ねぎの物理的性質を変えているからだと思われます。

Okay, so essentially the reason that this is, of course, so impressive is because you have to truly understand what is going to happen to an onion when it is cut by this blade.

では、これがもちろん非常に印象的な理由は、この刃で切られたときに玉ねぎに何が起こるかを本当に理解しなければならないからです。

And you can see that as it is cut, you can see more onions are processed.

そして、切られると、より多くの玉ねぎが処理されているのが見えます。

And then they are split apart by the knife, which is pretty impressive because this shows a decent level of understanding by this AI system.

そして、それらはナイフによって分割され、これはかなり印象的です。なぜなら、これはこのAIシステムによる理解のかなりのレベルを示しているからです。

I would say that it is very, very, very hard to get this kind of consistency with whatever AI system you are using.

私は、どのAIシステムを使用しているにせよ、この種の一貫性を得るのは非常に難しいと言えます。

This AI system was truly, truly impressive because there are also other examples of it being able to generate high-quality things and just do a whole bunch of other useful things that we may not have even thought about.

このAIシステムは本当に素晴らしかったです。なぜなら、高品質なものを生成したり、我々が考えもしなかった便利なことをたくさん行うことができる他の例もあるからです。

So, one of the things that they spoke about was, of course, the strong concept combination ability.

彼らが話したことの1つは、もちろん、強力な概念の組み合わせ能力です。

So, this AI system is remarkably good at combining different concepts together.

このAIシステムは、異なる概念をうまく組み合わせることに非常に優れています。

So, this is a white cat driving a car through a busy downtown street with tall buildings and pedestrians in the background.

これは、背景に高いビルや歩行者がいる賑やかなダウンタウンの通りを車で走る白い猫です。

The reason that they've done this example is because this footage doesn't exist.

この例を行った理由は、この映像が存在しないからです。

So, a cat driving a car downtown through a busy city street, of course, footage like this hasn't been recorded before.

白い猫が車を運転して賑やかな都市の通りを走る映像は、もちろん、これまで記録されたことがありません。

It doesn't really exist on any, I guess you could say, person's hard drive or any of those large databases where they just house millions of royalty-free stock videos.

これは、どのような人のハードドライブや何百万ものロイヤリティフリーのストックビデオを収容する大規模なデータベースにも存在しません。

I'm guessing that what we have here is a situation where they're demonstrating this AI system's ability to generate new and interesting videos that haven't existed before, and combine existing videos with other new concepts to create new pieces of material.

ここで示されているのは、これまで存在しなかった新しく興味深いビデオを生成し、既存の映像を別の新しいコンセプトと組み合わせて新しい素材を作り出す、このAIシステムの能力だと思います。

Which is, of course, very, very fascinating because it shows us that this is a system that doesn't fail when it tries to mimic exactly what is going on with the real world.

もちろん、これは非常に魅力的です。なぜなら、これはこのシステムが現実世界で起こっていることをまさに模倣しようとするときに失敗しないことを示しているからです。

We can see the background is, of course, very, very good in its consistency and we can see that even the subtle movements of the cat as it looks around and drives the car, those seem quite realistic, if I say so myself.

背景はもちろん非常に一貫性があり、猫が周りを見回したり車を運転する際の微妙な動きさえも、私が言うのも変ですがかなりリアルに見えます。

Now, once again, you can see that they've demonstrated this ability in this clip here, where we have a macro-lens shot of a volcano erupting in a coffee cup, a scene that, of course, you wouldn't ever see unless you somehow managed to have a volcano erupting in your coffee cup.

ここでもまた、この能力が示されています。コーヒーカップの中で噴火する火山をマクロレンズで捉えたシーンです。もちろん、自分のコーヒーカップの中で火山を噴火させでもしない限り、決して見ることのないシーンです。

But what we have here is a demonstration of exactly how great this system is.

しかし、ここにあるのは、このシステムがどれほど素晴らしいかを実証しているものです。

So, we've got a situation on our hands where it's not just good at replicating some of the footage that we've seen before, it manages to show us how the liquid from the volcano actually transfers into this like coffee style liquid and gets melted along the cup edge here.

私たちが手にしている状況は、以前に見た映像の一部を複製するだけでなく、実際に火山から液体がどのようにコーヒー風の液体に移り、カップの縁に沿って溶ける様子を示すことができるという点です。

And one of my personal favorites from this entire strong concept combination ability was this LEGO character visiting an art gallery.

この強力な概念の組み合わせ能力の例の中で、私の個人的なお気に入りの1つは、美術館を訪れるレゴのキャラクターでした。

I thought that the reason that this was so good was because this video clip actually captured the nuances of how LEGO characters actually walk.

私は、これがとても良かった理由は、このビデオクリップが実際にレゴキャラクターがどのように歩くかの微妙なニュアンスを捉えていたからだと思いました。

If you've ever seen a LEGO movie, you'll know that those characters in the movie actually walk exactly like this, which is remarkably surprising.

もしレゴの映画を見たことがあれば、映画のキャラクターたちが実際にまさにこのように歩くことをご存じでしょう。これには本当に驚かされます。

The fact that they were able to actually capture exactly how this LEGO character walks.

彼らが実際にこのレゴキャラクターがどのように歩くかを正確に捉えることができたという事実。

And of course, you can see even on the right there, as a little easter egg, there is also a LEGO character there too.

そしてもちろん、右側にはちょっとしたお楽しみとして、そこにもレゴのキャラクターがいます。

It's very interesting because what's fascinating, as well, was that this character on the right was in focus.

興味深いのは、右側のこのキャラクターに最初はピントが合っていたことです。

As the LEGO character keeps walking forward and forward, it then shifts to being out of focus.

レゴのキャラクターが前に前に歩き続けると、焦点が外れるようになります。

Which is, like I said before, if you're someone that doesn't really understand how videos work because you've never worked in media before, you might miss some of the subtle details as well as some of the subtle mistakes.

これは、以前に述べたように、メディアで働いたことがないためにビデオの仕組みを理解していない人は、微妙なディテールや微妙なミスを見逃すかもしれません。

But I think you can grow to appreciate them more, especially if you've had that background, which is why when I look at some of these clips, they truly do make me pretty impressed.

しかし、私はそれらをもっと評価するように成長できると思います。特に、そのような背景がある場合は、だからこそ、私がこれらのクリップのいくつかを見ると、本当にかなり感心します。

I think that this one here was really, really cool because it showed the ability to capture specific details across many, many different clips.

このクリップは、多くの異なるクリップで特定の詳細を捉える能力を示していたので、本当に素晴らしいと思います。

One of the things that I really did like, and I have to say that I think personally this is my favorite feature from this video system, and what we have here is movie quality image generation.

私が本当に気に入っていることの1つは、個人的にこれがこのビデオシステムのお気に入りの機能だと思うと言わざるを得ません。ここにあるのは、映画の品質の画像生成です。

One of the biggest gripes that we've had, and that I've personally had, with video AI systems is the fact that they just don't have good quality.

私たちが、そして私個人もビデオAIシステムに対して抱いてきた最大の不満の1つは、とにかく品質が良くないということです。

Whilst yes, temporal consistency is something that we do look for in these video clips, the problem is right now that the quality is just not there.

確かに、時間的な一貫性はこれらのビデオクリップで求めるものですが、問題は今のところ品質がそこにないということです。

But you can see right here with the prompt that we have, this is a very high quality clip that looks remarkably accurate of what we've described.

しかし、ここにあるプロンプトを見ると、これは私たちが説明したものと非常に正確に見える非常に高品質なクリップです。

Now, I want to show you guys this clip instead because this is the clip right here that showcases just how good the quality is in terms of what you're getting here.

さて、こちらのクリップを見せたいと思います。なぜなら、こちらのクリップは、ここで得られる品質の良さを示しているからです。

And if I'm being completely honest with you, the quality here might not look as good as it can be because I've of course downloaded this clip.

そして正直に言うと、ここでの品質は本来ほど良く見えないかもしれません。というのも、もちろん私はこのクリップをダウンロードしているからです。

And then, I've uploaded this clip, and then I've recorded my screen, and then I've once again processed the video again, and then it's been uploaded to YouTube, and of course, it's been compressed again.

そして、このクリップをアップロードし、画面を録画し、再びビデオを処理し、YouTubeにアップロードし、もちろん再度圧縮されました。

So trust me when I say, when you see this raw video, it actually looks remarkably impressive in terms of its quality.

だから、生のビデオを見ると、その品質は実際には非常に印象的に見えると言っても信じてください。

This is something that I'm not just stating for the video, but it does look really, really high quality, higher quality than anything I've seen.

これはただ動画のために述べているだけではなく、実際に非常に高品質に見えます。私が見た中でこれまでのどんなものよりも高品質です。

Now, of course, post-processing with any upscaling video softwares that you want to use, I think in the future, this is not going to be a difficult problem to solve at all.

今後、使用したい任意のアップスケーリングビデオソフトウェアでのポストプロセッシングは、全く解決が難しい問題ではないと思います。

But I do think that having a system that can natively output high-quality footage is going to be something that is a game-changer for industries.

しかし、ネイティブで高品質な映像を出力できるシステムを持つことは、産業にとってゲームチェンジャーになると思います。

And of course, you can see right here that this is a prompt of a chimney under the sunset, and this is where you can start to see that the high-quality nature of this AI system isn't just for show.

そしてもちろん、ここで見ることができるのは、夕日の下の煙突のプロンプトであり、このAIシステムの高品質な性質が見せかけだけではないことがわかります。

It's something that is truly, truly, truly impressive.

それは本当に、本当に、本当に印象的なものです。

So I think that when we take a look at all of these factors combined and the fact that this system apparently is in alpha testing as in some people are actively being able to use this, shows us that China is rapidly advancing with their video models and all models that they are currently using.

だから、これらすべての要素を総合して見ると、このシステムが現在アルファテスト中であり、一部の人々がこれを積極的に使用できるということは、中国が現在使用しているビデオモデルやすべてのモデルで急速に進化していることを示しています。

Now, another feature that they actually spoke about was the varied aspect ratio.

さらに、彼らが実際に話した別の機能は、異なるアスペクト比です。

So they spoke about how Kling adopts a variable resolution training strategy which allows it to output a variety of different video aspect ratios for the same content during the inference process, meeting the needs for video materials in richer scenarios.

彼らの説明によると、Klingは可変解像度のトレーニング戦略を採用しており、推論時に同じコンテンツをさまざまなアスペクト比のビデオとして出力でき、より多様なシナリオでのビデオ素材のニーズに応えられるとのことです。

So that's what the website said, but essentially we have it here in a 1080 by 1080 scene, and then on the left here we have it in a 920 by 1080 scene, which is basically just, of course, the portrait edition.

ウェブサイトに書かれていることですが、基本的にはここには1080×1080のシーンがあり、左側には920×1080のシーンがあり、これは基本的にポートレート版です。
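
For readers who want to relate pixel dimensions to aspect ratios, here is a small Python sketch. It has nothing to do with how Kling handles resolutions internally; the helper names `aspect_ratio` and `orientation` are invented for this example, and the sample dimensions are simply the ones mentioned in the narration.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce pixel dimensions to the simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def orientation(width: int, height: int) -> str:
    """Classify a frame as square, landscape, or portrait."""
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"


# Dimensions as quoted in the narration above.
for w, h in [(1080, 1080), (920, 1080)]:
    print(w, h, aspect_ratio(w, h), orientation(w, h))
```

Any frame that is taller than it is wide counts as portrait here, which matches how the narration describes the second clip.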

And then, of course, we have this square edition.

そしてもちろん、こちらには四角い版もあります。

There was a landscape edition, but I didn't include it because I'm sure you guys can completely understand the picture of whatever this AI system is trying to do.

ランドスケープ版もありましたが、このAIシステムが何をしようとしているかを完全に理解していただけると思うので、含めませんでした。

But I genuinely have to say that this clip in there is something that I personally think is probably one of the most realistic clips.

ただ、このクリップは、個人的にはおそらく最もリアルなクリップの1つだと思うと正直に言わざるを得ません。

And of course, when we do take a look at some of these, for example, this bird right here being very, very high quality, and of course, this road right here showing us the kind of consistency that I just wouldn't even think of.

もちろん、例えば、この鳥が非常に高品質であり、もちろん、この道路が私たちに考えもしなかったような一貫性を示しているのを見ると、いくつかを見てみるときに。

This right here showing us the real-world physics being demonstrated, a very, very consistent fish underwater, and of course, one of my favorites was the panda playing the guitar.

こちらは現実世界の物理が再現されている様子、非常に一貫性のある水中の魚、そしてもちろん、私のお気に入りの1つであるギターを弾くパンダです。

Now, there was one clip that I actually did forget to add, but I'm going to show it to you all now because the consistency of that clip is remarkable.

今、実際に追加し忘れたクリップが1つありましたが、そのクリップの一貫性は驚くほどですので、今皆さんに見せます。

And I'm going to show you guys why, although it was slightly demonstrated a little bit before.

少し前にも多少は示されていましたが、その理由をお見せします。

So this is the clip I wanted to show you guys before the video ended.

これは、ビデオが終わる前に皆さんに見せたかったクリップです。

So this was a clip where we have a little boy eating a burger, but take a look at what happens because there was also this clip from Sora, and I would argue that it was remarkably impressive for this very reason.

これは、少年がハンバーガーを食べているクリップで、何が起こるかを見てください、なぜなら、Soraからのこのクリップもあり、私はそれが非常に印象的だと主張します。

So he takes a bite of the burger.

彼がハンバーガーを一口かじります。

And then you can see, literally as he's taken a bite, that there is quite a lot of mess around his mouth, which I think is remarkably accurate for, of course, how kids eat.

そして、彼が一口かじったとたん、口の周りがかなり汚れているのが見えます。これは子供の食べ方として驚くほど正確だと思います。

It's managing to simulate the fact that there might be certain particles left on, well not actually particles, just call these actual crumbs.

それは、実際には粒子とは言わず、ただ実際のクラムと呼ぶべきものが残っているかもしれないことをシミュレートしています。

But of course, that they'd be on his face.

もちろん、それらは彼の顔にあるでしょう。

I just thought that this is an eerily, eerily realistic generation for such an AI system.

このようなAIシステムとしては、不気味なほどリアルな生成だと思いました。

Overall, I think that what this is going to do for the dynamics of the AI marketplace is it's going to show us that China can compete quickly and efficiently, not only keeping up with what the United States is doing in terms of its AI development, but in some instances even managing to surpass it.

全体として、これがAI市場の力学にもたらすのは、中国が迅速かつ効率的に競争でき、アメリカのAI開発の現状に追いつくだけでなく、場合によってはそれを上回ることさえできると示すことだと思います。

Which means that now that China is of course focusing a lot of their efforts on these kinds of systems, and of course we've seen a variety of different advancements across many different domains, I genuinely wouldn't be surprised if in a couple of months we do get a bunch of different Chinese AI tools that are far superior to what the United States has.

つまり、中国がもちろんこの種のシステムに多くの努力を集中していることを考えると、私たちはさまざまな分野でさまざまな進歩を見てきましたが、数か月後に米国が持っているものよりも優れた多くの中国のAIツールを手に入れることに驚かないでしょう。

And it may create an even worse terminal race condition where other nations are fighting to develop the very best AI systems, which could lead to detrimental outcomes.

そして、他の国々が最高のAIシステムを開発するために戦っている状況を悪化させ、有害な結果につながる可能性があるかもしれません。

Now I know that yes, this is literally just a video about a text-to-video AI, but of course it shows us that this kind of technology was something that we really looked at and heralded as if it was going to be completely impossible just 18 months ago.

これが文字どおり、テキストからビデオを生成するAIについてのビデオにすぎないことは承知しています。しかしもちろんこれは、ほんの18か月前には完全に不可能だと見なされていた種類の技術であることを示しています。

And now we have a system that some people would say is just remarkably you know just realistic.

そして今、私たちには、一部の人々が単に驚くほど現実的だと言うシステムがあります。

So I would say overall, what does this do for your timelines in terms of where you think AI is going to go? Because if it wasn't for Sora or for Google's recent Veo, I think we would be even more shocked by this kind of demo.

全体として、これはAIがどこへ向かうかについてのあなたのタイムラインにどのような影響を与えるでしょうか。というのも、SoraやGoogleの最近のVeoがなかったら、私たちはこの種のデモにもっと衝撃を受けていたと思うからです。

But for me personally, this kind of makes me believe that the kind of capable systems that we're going to be getting in the future, even if they're not from the United States but from another country that is developing them, are definitely going to be absolutely incredible.

しかし個人的には、将来私たちが手にする高性能なシステムは、たとえ米国ではなく開発を進めている別の国から出てくるとしても、間違いなく途方もなくすごいものになると思わせてくれます。

Because if another country releases this tool and a lot of people are using it then man the fight for customers and of course the marketplace is going to be very very incredible to watch.

このツールを他の国がリリースした場合、多くの人々がそれを使用すると、顧客獲得の競争やもちろん市場は非常に驚くべきものになるでしょう。

With that being said, let me know what your favorite demo was.

ということで、あなたのお気に入りのデモは何だったか教えてください。

Was it the man eating the noodles with chopsticks?

箸で麺を食べる男性でしたか？

Was it the high-quality blue rose petals in HD?

HDの高品質な青いバラの花びらでしたか？

Was it the chimney under the sunset that looked remarkably interesting?

非常に興味深いと思われる夕焼けの下の煙突でしたか？

Or was it the very long video generation up to two minutes long, which showed remarkable consistency across many different areas and demonstrated a very good ability to generate the physical world?

それとも、非常に長いビデオ生成で、多くの異なる領域で驚くほど一貫性があり、物理世界を生成する能力が非常に優れていることを示した、最大2分の長さのビデオでしたか？

Or was it, of course, the cat driving around the city with tall buildings and pedestrians in the background?

それとも、もちろん、背景に高い建物や歩行者がいる都市を走る猫でしたか？

I'd love to know what you guys thought about this.

皆さんがこれについてどう思ったか知りたいです。

Do you think this is actually a major AI update, or do you think this is not something worth your time?

これは実際に重要なAIのアップデートだと思いますか、それともあなたの時間に値するものではないと思いますか？

Otherwise, I'll see you guys in the next video.

それでは、次のビデオでお会いしましょう。
