[Gen-2] Reading the English Commentary in Japanese [April 27, 2023 | @Theoretically Media]

April 30, 2023, 10:34

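An introduction to the appeal of Gen-2, which turns images and text into full-fledged movies.
Published: April 27, 2023
Note: we recommend starting the video before you read.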

Hey everyone, so last week I did a video on how to take your Midjourney or AI generated images and turn them into cinematic presentation videos.

先週、MidjourneyやAIが生成した画像を映画のようなプレゼンテーションビデオにする方法についてビデオを撮りました。

If you missed it, it's linked below, but today we're going into an even crazier direction.

見逃した方は下にリンクがあります。しかし、今日はさらにクレイジーな方向へ進んでいきます。

Today we're going to take our images and turn them into actual movies.

今日は、私たちの画像を実際のムービーに変えてみましょう。

Buckle up, this is wild.

これはワイルドです。

So today we're going to take a look at Gen 2 by RunwayML.

そこで今日は、RunwayMLの「Gen 2」を見てみましょう。

It's a text to video generator that honestly is nothing short of amazing.

これはテキストからビデオへの変換ツールで、正直言って素晴らしいの一言に尽きます。

We're going to do a full walkthrough and go over some of the pros and cons of where the AI is right now.

これから一通りの使い方を紹介し、現時点でのAIの長所と短所を説明していきます。

Plus, I'm going to share with you some tips and tricks so that when you start using Gen 2, you'll be able to hit the ground running right away.

さらに、Gen 2を使い始めるときに、すぐに使いこなすことができるように、いくつかのヒントとトリックを紹介します。

Currently Gen 2 is Discord based, which should make all of you Midjourney users feel quite at home.

現在、Gen 2はDiscordをベースにしているので、Midjourneyのユーザーの皆さんはとてもくつろげるはずです。

Generating videos is really, really simple.

ビデオの作成は、本当に、本当に簡単です。

All you do is @ the Gen 2 bot and type in a prompt of whatever it is that you want to see.

Gen 2のボットに@でメンションして、見たいものをプロンプトとして入力するだけです。

In this case, I did a man on a boat on the ocean near some islands, and this is the video that it generated.

この場合は「いくつかの島の近くの海でボートに乗っている男性」と入力しました。そしてこれが生成されたビデオです。

I mean, that's pretty crazy when you think about it.

考えてみれば、これはかなりクレイジーなことです。

That man doesn't exist, that boat doesn't exist, that moment in time doesn't exist, and yet we have video of it.

その男も、その船も、その瞬間も存在しないのに、その映像があるのです。

Currently the output from a prompt is about four seconds, but I've got a couple of creative tricks that I'll show you later on in the video to extend that time out just a little bit.

現在、プロンプトからの出力は約4秒ですが、この後ビデオで紹介するいくつかのクリエイティブなトリックを使えば、この時間をほんの少し延ばすことができますよ。

Overall, I would say the rendering happens fairly quickly.

全体として、レンダリングはかなり速いですね。

I think the longest that I've waited so far is about two minutes, which is pretty remarkable when you think about what it's actually doing.

今までで一番長く待ったのは2分くらいですが、これは実際にやっていることを考えると、かなり驚くべきことだと思います。

Your video will output at a resolution of 768 by 448 in a 16:9 aspect ratio.

出力される動画の解像度は768×448、アスペクト比は16対9です。
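
As a quick check on the numbers quoted above, the following sketch reduces 768 by 448 to its lowest terms and compares it with 16:9. It uses only the two figures from the video; the helper name `aspect_ratio` is made up for this illustration. The arithmetic suggests the native frame is close to 16:9 rather than exactly 16:9.

```python
from fractions import Fraction


def aspect_ratio(width: int, height: int) -> Fraction:
    """Return width:height reduced to lowest terms (hypothetical helper)."""
    return Fraction(width, height)


native = aspect_ratio(768, 448)   # resolution quoted in the video
widescreen = Fraction(16, 9)      # aspect ratio quoted in the video

# Reduced ratio of the native frame
print(native)

# Decimal values of both ratios, for comparison
print(float(native), float(widescreen))

# Relative difference between the native ratio and 16:9
relative_gap = abs(native - widescreen) / widescreen
print(float(relative_gap))
```

The gap works out to a few percent, which is small enough that calling the output 16:9 is a reasonable approximation.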

There currently isn't a way to change that aspect ratio to say like 9:16 for vertical videos or 2:1 for, you know, a big cinemascope kind of vibe to it yet.

アスペクト比を9:16のような縦長の動画や、2:1のような大きなシネマスコープ的な雰囲気に変更する方法は今のところまだありません。

I mean, it's still early in, so I presume that will happen at some point, and it's gonna be awesome when it does.

まだ初期段階なので、いつかは実現すると思いますし、実現したらすごいことだと思います。

Now that said, we do have commands to upscale our video to a higher resolution, and we can also interpolate our footage via a command to kind of get rid of that choppiness.

とはいえ、ビデオをより高い解像度にアップスケールするコマンドもありますし、コマンドで映像を補間して、途切れ途切れになるのを解消することも可能です。

We're going to take a look at that in one second.

これについては、1秒後に見てみましょう。

So let's take a look at how that looks with a New York City scene.

それでは、ニューヨークのシーンでどのように見えるか見てみましょう。

So, I generated this video using the prompt: New York City Street, busy people walking.

このビデオは、プロンプトを使用して作成しました：ニューヨークの街並み、忙しく歩く人たち。

And I utilized the upscale and interpolate commands, again --upscale and --interpolate, and it returned to me a video with the size 1536 by 896, which, you know, is a pretty decent size.

そして、アップスケールと補間を指定するコマンド、つまり --upscale と --interpolate を使用し、サイズが1536 x 896のビデオが返されました。これは、かなりまともなサイズだと思います。
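
The prompt described above can be sketched as a simple string-building step. This is only an illustration: `build_prompt` is a hypothetical helper, not part of any Runway tooling, and it assumes the two flags are appended after the prompt text separated by spaces. The flag names and both resolutions come from the video; the message would still have to be addressed to the Gen 2 bot in Discord.

```python
def build_prompt(text: str, upscale: bool = False, interpolate: bool = False) -> str:
    """Append the flags named in the video to a text prompt (hypothetical helper)."""
    parts = [text.strip()]
    if upscale:
        parts.append("--upscale")      # flag name as spoken in the video
    if interpolate:
        parts.append("--interpolate")  # flag name as spoken in the video
    return " ".join(parts)


prompt = build_prompt(
    "New York City Street, busy people walking",
    upscale=True,
    interpolate=True,
)
print(prompt)

# Scale factor implied by the two resolutions quoted in the video
base_w, base_h = 768, 448
up_w, up_h = 1536, 896
scale_w = up_w / base_w
scale_h = up_h / base_h
print(scale_w, scale_h)
```

Both dimensions grow by the same factor, so the upscaled clip keeps the aspect ratio of the native output.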

So let's take a look at that real quick.

では、早速見てみましょう。

Yeah, it's pretty remarkable.

うん、なかなか素晴らしいね。

So, I did just want to briefly point out that a few weeks ago I was thinking about doing a video on text-to-video, and the level of output that we were getting at that point in time via the various tools was this.

つまり、数週間前にテキストから動画への変換に関する動画を作成しようと考えていたことを簡単に触れておきたいのですが、その時点でのさまざまなツールを使った出力レベルはこれでした。

This is like literally nearly the same prompt.

これは文字通りほぼ同じプロンプトのようなものです。

It was New York City Street, and this was what we got.

ニューヨークの街並みで、このようなものが出てきました。

So, and, and now we're, you know, and now we're here.

それで、そして、今、私たちは、ここにいます。

We've come a long way in a very, very short amount of time, so I can't even imagine where we're going to be, you know, two years from now.

2年後にはどうなっているのか、想像もつかないくらい、短い時間で長い道のりを歩んできました。

So having done our establishing shot, I wanted to start populating our city street a little bit, so I prompted a businessman walking on a phone, and this is the video that we got.

そこで、エスタブリッシング・ショットを撮った後、街並みを少しずつ再現しようと思い、電話をしながら歩いているビジネスマンを登場させたところ、このような映像になりました。

That's pretty crazy remarkable.

この映像は、とても素晴らしいものです。

Yeah, there's some jank in the hands, but, you know, that's to be expected.

手元が少し乱れていますが、これは想定内のことです。

We're very early into this technology, but still, that's super, super, super impressive.

この技術の初期段階ですが、それでも、超、超、超、超感動的です。

And just to give you an idea of the difference between running upscale and interpolate on versus off, here's the same prompt with those commands turned off.

また、アップスケールとインターポレーションのオンとオフの違いを理解していただくために、同じプロンプトでこれらのコマンドをオフにした状態もご覧ください。

So with upscale and interpolate turned off, we get this, which, my god, look at the size of that phone.

つまり、アップスケールとインターポレートをオフにすると、このようになります。それにしても、あの電話の大きさを見てください。

That's amazing.

これはすごいことです。

I hope AI weirdness never completely goes away, because I love it.

AIの奇妙さが完全になくなってしまわないことを願っています。私はそれが大好きなので。

Yeah, I mean, overall, though, you can see that there's a lot more choppiness between the frames, and our resolution is definitely taking a hit.

でも、全体的に見ると、フレーム間のカクカク感が増し、解像度が低下しているのがわかりますね。

So I continued on and generated a few more videos and strung the whole thing out and cut it together, and this is what we got.

そこで、さらに数本のビデオを作成し、全体をつないでカットしたものがこれです。

So yeah, in about 20 minutes, I had a video sequence of an alien invasion of New York City without ever leaving my desk.

そう、約20分で、自分の机を離れることなく、エイリアンがニューヨークを侵略する映像が完成したのです。

And I got to say, for the most part, despite the fact that there's a couple of, you know, weird AI things, like this guy's got way too many buttons on his jacket and, you know, her hand is a little bit weird, those are things that I don't think you notice when things are in motion.

そして、この男性のジャケットのボタンが多すぎるとか、彼女の手がちょっと変だとか、AIらしい奇妙な点がいくつかありますが、大体において、そういうことは動いているときには気づかないものだと思います。

I mean, there is the one issue of this girl in the background, her eye gets a little bit wonky, but that's pretty nitpicky considering, you know, what is happening here, that this is just a text input and then video is being outputted.

もちろん、背景にいる女の子の目が少し変になっているという問題がありますが、これはテキスト入力だけで動画が出力されていることを考えると、かなり細かい指摘だと思います。

But now we're going to take a look at taking reference images and using those as part of your prompt to really hone in on the cinematic vision that you're looking for.

しかし、ここからは参照画像を用意し、それをプロンプトの一部として使うことで、求めている映画的なビジョンに近づけていく方法を見ていきます。

Before we jump into the next section, I would briefly like to invite you to hit the like and subscribe button if you have not had the chance to yet.

次のセクションに入る前に、もしまだ機会がなければ、「いいね！」と「購読」ボタンを押すことをお勧めします。

Additionally, I want to thank everybody that's donated to the Midjourney cheat sheets that have appeared in previous videos.

さらに、これまでのビデオに登場したMidjourneyのチートシートに寄付してくれた人たちにも感謝したいと思います。

Honestly, your support means a lot to me.

正直なところ、皆さんのサポートは私にとって大きな意味を持ちます。

I really truly do thank you from the bottom of my heart.

本当に心から感謝しています。

Uh, okay, let's jump in.

では、さっそく始めてみましょう。

Bouncing over to Midjourney, last week we created a sequence of about seven images to create kind of a gothic Victorian spooky story.

Midjourneyに話を移しますが、先週は7枚の画像を組み合わせて、ゴシック・ビクトリア調の不気味な物語を作りました。

We then took all of those images and brought them into Wondershare Filmora to do some parallaxing and masking to create a cinematic presentation.

そして、これらの画像をすべてWondershare Filmoraに取り込み、パララックスとマスキングを行い、映画のようなプレゼンテーションを作成しました。

The video ended up being about 30 seconds long.

最終的に約30秒のビデオになりました。

Let's take a look at that real quick.

それでは早速見てみましょう。

If you want to see the full process on how that video was made, that link is below.

この動画がどのように作られたのか、全工程をご覧になりたい方は、以下のリンクからどうぞ。

So, taking our images that we generated in Midjourney and using them as reference images in Gen 2 is fairly simple, but like anything else, it kind of, you know, gets a little more complex as we go along.

Midjourneyで生成した画像をGen 2の参照画像として使うのはかなり簡単ですが、他のことと同じように、進むにつれて少し複雑になっていきます。

So the simplicity part is just hitting this plus button and then uploading your reference image as you would if you were using Midjourney and using an image prompt.

つまり、このプラスボタンを押すだけで、Midjourneyを使用して画像プロンプトを使用する場合と同じように、参照画像をアップロードすることができるのです。

Now you'll still need to add a text prompt.

ただし、テキストプロンプトを追加する必要があります。

And this is actually a pretty good tip that came out of the Gen 2 Discord discussions: you could just type in your own text prompt, but you kind of get better results, oddly enough, if you take your image and run it over to Clip Interrogator.

そして、これは実際にGen 2 Discordの議論から得られたかなり良いヒントですが、自分のテキストプロンプトを入力するだけでもいいのですが、不思議なことに、画像をClip Interrogatorにかけると、より良い結果が得られます。

I'll have a link to Clip Interrogator down below, but basically you just drag your image in, you hit submit, and then it almost kind of functions as the describe feature in Midjourney.

Clip Interrogatorへのリンクは下に貼っておきますが、基本的には画像をドラッグして送信するだけで、Midjourneyのdescribe機能とほぼ同じような働きをします。

It just analyzes through the image and then comes up with a prompt based off of it.

画像を分析し、それに基づいてプロンプトを表示するのです。

So my initial run without Clip Interrogator gave us this video, which yeah, it's not quite working right.

だから、Clip Interrogatorを使わない最初の実行では、このような動画ができましたが、たしかにうまく機能していません。

Taking the Clip Interrogator prompt, which came back with an old mansion with a dark cloud over it, dark and spooky themes, light green and amber, coastal scenery.

Clip Interrogatorのプロンプトを見ると、暗い雲に覆われた古い屋敷、暗く不気味なテーマ、ライトグリーンとアンバー、海岸の風景という結果が返ってきました。

It actually had more, but I kind of whittled it down because there was a lot of sort of irrelevant stuff in there.

本当はもっとあったのですが、関係ないものがたくさんあったので、絞り込みました。
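
The whittling step described above can be sketched as a small filter over a comma-separated prompt. This is a minimal illustration, not the author's workflow: `whittle_prompt` is a hypothetical helper, the first four phrases are the ones quoted in the video, and the two "hypothetical extra phrase" entries are placeholders standing in for the irrelevant parts that the video does not list.

```python
from typing import Iterable


def whittle_prompt(raw: str, drop: Iterable[str]) -> str:
    """Split a comma-separated prompt and remove the phrases listed in drop."""
    unwanted = {phrase.strip().lower() for phrase in drop}
    phrases = [p.strip() for p in raw.split(",") if p.strip()]
    kept = [p for p in phrases if p.lower() not in unwanted]
    return ", ".join(kept)


# First four phrases are quoted in the video; the last two are placeholders.
raw_output = (
    "an old mansion with a dark cloud over it, dark and spooky themes, "
    "light green and amber, coastal scenery, "
    "hypothetical extra phrase A, hypothetical extra phrase B"
)

trimmed = whittle_prompt(
    raw_output,
    drop=["hypothetical extra phrase A", "hypothetical extra phrase B"],
)
print(trimmed)
```

Deciding which phrases count as irrelevant is still a manual judgment; the code only removes what you tell it to.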

And then also playing around with a command called the CFG scale, which is --cfg_scale.

さらに、CFGスケールと呼ばれるコマンドも使ってみました。これは --cfg_scale です。

What you can think of that as is almost like the stylize command in Midjourney, which is the higher you go, the more it will look at your reference image, but the more unstable your output is going to be.

それをMidjourneyのスタイライズコマンドのようなものだと考えることができます。つまり、値を高くするほど、参照画像をより参照してくれますが、出力が不安定になる傾向があります。

The lower you go, the less consideration it's going to give to your reference image, but the more stable your video is going to be.

低くすればするほど、参照画像への配慮は少なくなりますが、映像はより安定したものになります。
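
To compare the trade-off described above, one option is to prepare the same prompt at a few CFG scale values and run them side by side. The sketch below is an assumption-laden illustration: `with_cfg_scale` is a hypothetical helper, it assumes the value follows the flag after a space, and the value 5 is an arbitrary low setting chosen for contrast. Only the value 20 and the flag itself are mentioned in the video.

```python
from typing import Optional


def with_cfg_scale(prompt: str, cfg_scale: Optional[int] = None) -> str:
    """Append a CFG scale flag, or leave the prompt at the bot's default."""
    if cfg_scale is None:
        return prompt  # no flag: the bot uses its default setting
    # Assumed syntax: flag, space, value
    return f"{prompt} --cfg_scale {cfg_scale}"


base = "an old mansion with a dark cloud over it, dark and spooky themes"

# None = default, 5 = arbitrary low value (assumption), 20 = value tried in the video
variants = [with_cfg_scale(base, value) for value in (None, 5, 20)]
for variant in variants:
    print(variant)
```

Keeping an unflagged variant in the comparison is worthwhile, since the default setting can turn out to be the best of the set.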

So running at a CFG scale of 20 got us this, which is cool.

CFGのスケールを20にすると、このようになります。

It's not quite what I was looking for, but yeah, pretty neat.

私が求めていたものとはちょっと違いますが、なかなかいい感じです。

Super dark actually, but is definitely closer than that first one was and didn't have that weird ending.

実際、超ダークですが、最初の作品よりは確実に近いですし、あの変な終わり方もありません。

So another quick note is that you can't just link your reference image.

もう1つ注意してほしいのは、参照画像をリンクさせるだけではいけないということです。

Otherwise you end up with something completely insane.

そうしないと、とんでもないものができてしまうからです。

You actually have to re-upload your reference image each time.

参考画像は毎回アップロードし直さなければなりません。

If you try to give them a link, it basically gets super weird from that point.

もしリンクを貼ろうとすると、そこからが超不思議なことになります。

So we had our house image that I just linked and then our prompt, and then this was the output, which is completely wrong.

つまり、先ほどリンクした家の画像と、プロンプトがあり、そしてこれが出力されましたが、これは完全に間違っています。

Hilarious, but wrong.

面白いけど、間違っている。

Ultimately, once I figured that out and actually weirdly enough stopped playing with the CFG scale and left it at default, I got something that I actually really liked.

最終的に、このことを理解し、奇妙なことにCFGスケールを弄るのをやめてデフォルトのままにしたところ、実に気に入ったものができました。

So it goes to show sometimes default is best.

つまり、デフォルトがベストであることもあるのです。

So this was actually the output that we got.

実際、このような出力が得られました。

And I was pretty happy with that.

これにはかなり満足しています。

So moving on to the second shot, this one was nailed almost right away.

2枚目の撮影に移りますが、こちらはほとんどすぐに決まりました。

The prompt was: a woman in a black dress walks upstairs; in the background, storm clouds slowly move.

プロンプトは「黒いドレスを着た女性が階段を上り、背景では嵐雲がゆっくりと動いている」というものでした。

And the first output was this.

そして、最初の出力はこれでした。

I did run it one more time just to see, and actually ended up with something that I liked even more.

試しにもう1回やってみたら、もっと気に入ったものができた。

And we got this shot, which I actually, I like this a little bit more.

そして、このショットが出来上がりました。実は、このショットはもう少し気に入っています。

The angle is just a little bit more dramatic.

アングルがもう少しドラマチックなんです。

It just feels more cinematic.

より映画的な感じがします。

Yeah, I like this one.

そうですね、この写真が好きです。

That's another tip: if you're trying to get shots like this, if you put "cinematic action" at the front of your prompt, you're more apt to get kind of more cinematic looking things.

もうひとつのヒントですが、このようなショットを狙うなら、プロンプトの先頭に「cinematic action」と入れると、より映画的な映像が得られやすくなります。

So our third image is where we started to run into some real trouble.

3枚目の画像は、実際に問題に直面し始めたところです。

You know, we have a hand reaching out to a door.

これは、ドアに手を伸ばしているところです。

And I think as we all know, AI does not do hands well.

ご存知のように、AIは手をうまく表現できません。

So there was a lot of like David Cronenberg esque video outputs coming.

そのため、デヴィッド・クローネンバーグのような映像出力がたくさん出てきました。

So let's take a look at a couple of them.

では、そのうちのいくつかを見てみましょう。

Yeah, there was this guy, there was this one that just ignored the hand altogether.

手を完全に無視したものがありました。

We had this one, which I don't even know what's happening here.

これは、何が起こっているのかわからないものです。

This guy, which, yeah, again, that's fairly horrific.

この男は、またしても、かなり恐ろしいです。

And this one was my favorite.

そして、これが一番好きでした。

That's just super surreal.

超シュールです。

So the walking down the hallway shot was one that I was really curious to see how Gen 2 would handle considering it's our sort of face reveal shot.

廊下を歩くショットは、私たちの顔出しショットということで、Gen 2がどう扱うかとても興味があったんです。

And Gen 2 really didn't disappoint.

そして、Gen 2は期待を裏切りませんでした。

So this is the shot that it gave back, which is, I think, pretty good.

これがそのショットなのですが、これはかなりいい出来だと思います。

It really does kind of capture the essence of our character.

私たちのキャラクターの本質を捉えていると思います。

So let's take all of this, string it all together, and create a film version of our cinematic presentation from last week.

では、これらをすべてつなぎ合わせて、先週のシネマティック・プレゼンテーションの映画版を作りましょう。

So overall, I think that's pretty amazing.

全体として、これはとても素晴らしいことだと思います。

It actually reminds me a lot of experimental films that I saw when I was in college and maybe even made one or two experimental films like it when I was in college.

大学時代に見た実験的な映画を思い出させますし、大学時代にはこのような実験的な映画を1、2本作ったかもしれません。

But overall, yeah, it's not a one to one recreation of our original presentation, but that's okay, because we're essentially changing from one medium to another.

でも、全体的には、そうですね、私たちのオリジナルのプレゼンテーションを1対1で再現しているわけではありませんが、それはそれでいいんです、私たちは本質的にあるメディアから別のメディアへと変化しているのですから。

And more importantly, it nailed the tone, which I think has a large part to do with sort of storyboarding this thing beforehand in Midjourney.

さらに重要なのは、トーンをしっかり捉えていたことです。これは、Midjourneyで事前に絵コンテを作っておいたことが大きく関係していると思います。

And then ultimately, for the problem shots, like the door here in particular, and for some pacing things, I think that the best way to go is to actually use a combination of Gen 2 output and some of the Diasho stuff from Midjourney.

そして最終的に、問題のあるショット、特にここでのドアや、いくつかのペーシングの問題については、Gen 2の出力とMidjourneyのDiashoのものを組み合わせて使うのが最善の方法だと思います。

Like in this case, just swapping out the door handle for our Diasho version and, you know, just fixing some pacing issues by using some of the Midjourney stuff, I think, creates an overall better aesthetic presentation.

例えば、この場合、ドアノブをDiasho版に交換し、Midjourneyのものを使ってペーシングの問題を少し改善することで、全体的により美しい見た目のプレゼンテーションができると思います。

As a final experiment, I wanted to see how Gen 2 would handle animation.

最後の実験として、Gen 2がアニメーションをどう扱うか見てみたかったんです。

I caught a couple of episodes of Samurai Jack recently again, still holds up, by the way.

最近、『サムライ・ジャック』のエピソードをいくつか観たんですが、今でも十分楽しめますよ、ところで。

So that was on my brain.

それが頭にあったんです。

So, I just went over to Midjourney and generated a Samurai Jack-esque character, and ran that with the prompt: Samurai walks to camera autumn forest concept art Samurai Jack.

そこで、Midjourneyでサムライ・ジャック風のキャラクターを作成し、それを次のプロンプトで実行しました：サムライがカメラに向かって歩く 秋の森 コンセプトアート サムライ・ジャック。

And ended up getting this, which it's okay.

で、結局これができたんだけど、まあ、いいんじゃない。

It's animated.

アニメーションです。

It's not quite what I was looking for.

でも、私が求めていたものとはちょっと違うんです。

So close ups, it really wasn't handling very well with the image reference.

クローズアップは、イメージリファレンスとの相性があまり良くなかったんです。

So that was one, and then here was another one.

これが1つで、もう1つはこちらです。

But I did ultimately land on this and this, which I thought were cool, not, you know, in the Samurai Jack style, but had an aesthetic that I actually thought was pretty cool.

しかし、最終的にこの作品とこの作品にたどり着きました。サムライ・ジャックのようなスタイルではなく、とてもクールな美学を持っていると思いました。

So I'm always pretty big on narrative when I'm doing like these little experiments.

だから、こういった小さな実験をするときは、いつも物語性を重視しているんだ。

It doesn't have to be anything grand or anything.

壮大なものである必要はないんです。

It's just kind of a little short story.

ちょっとした短編小説のようなものです。

And so as I was generating out these ideas, I came up with the story of a Samurai coming across another Samurai and they duel.

それで、アイデアを練っているうちに、侍が他の侍と出会って決闘するというストーリーを思いつきました。

It's very, very simple.

とてもシンプルな話です。

So let's take a look at how it ultimately ended up looking.

では、最終的にどのような作品に仕上がったのか、見てみましょう。

I think that came out kind of cool and sort of in that minimalist Samurai Jack style, although it didn't ape the aesthetics of it necessarily.

サムライ・ジャックの美学を模倣したわけではありませんが、ミニマルなサムライ・ジャックのスタイルで、クールな仕上がりになっていると思います。

So one thing to note is that the sword fighting sequence was not actually Gen 2.

ここでひとつ注意しておきたいのは、剣術のシークエンスは、実はGen 2ではなかったということです。

This was actually Gen 1.

これは実はGen1だったのです。

The reason being is that Gen 2 apparently has not been trained on samurai sword fighting.

なぜかというと、Gen 2はサムライの剣術の映像で学習されていないようなのです。

So you can't get that animation.

だから、あのアニメーションはできないんです。

It just ends up being characters just standing there.

結局、キャラクターがただ立っているだけになってしまうのです。

But yeah, overall, I think it kind of works.

でも、全体的に見れば、うまくいっていると思います。

I mean, I think maybe if I had spent some more time on it, it could have been a little more dramatic and cool looking, but it works.

もう少し時間をかければ、もっとドラマチックでクールな仕上がりになったかもしれませんが、でも、うまくいっています。

Personally, I'm very excited about all of this.

個人的には、このすべてがとても楽しみです。

This is kind of the moment that I've been waiting for since, you know, those original DALL-E images first came out, you know, and I was thinking to myself, man, I can't wait until you can add, you know, motion to those pictures.

これは、初期のDALL-Eの画像が最初に出てきたときからずっと待ち望んでいた瞬間で、自分自身に「ああ、その絵に動きを加えることができる日が待ちきれない」と思っていました。

And, you know, here we are only much, much sooner than I had anticipated.

そして、予想よりもずっとずっと早く、今、私たちはここにいるのです。

So Gen 2 is currently in beta.

Gen 2は、現在ベータ版です。

If you're looking to get access, I would recommend joining the Discord.

もしアクセスしたいのであれば、Discordに参加することをお勧めします。

The link to that is below.

そのリンクは下にあります。

And honestly, just kind of hanging out and being nice.

そして正直なところ、ただそこに居て、感じよく振る舞っていればいいんです。

Usually about once a week, they open up the door to let people in.

通常、1週間に1回程度、ドアを開けて人を入れてくれる。

So, you know, if you're part of the community and you're not being a jerk, you know, you've got a pretty good chance of getting in.

だから、もしあなたがコミュニティの一員で、嫌な奴でなければ、参加できる可能性はかなり高いと思う。

Other than that, it probably won't be too long until Gen 2 is publicly released.

それ以外は、Gen2が一般にリリースされるまで、おそらくそれほど長くはかからないだろう。

I want to say the beta for Gen 1 lasted about a month or a month and a half or so.

Gen1のベータ版は1カ月か1カ月半くらいだったと思う。

So alternatively, you could just wait it out.

だから、それを待つという手もある。

In the meantime, I do invite you to stick around and watch another video from the channel.

その間に、このチャンネルから別のビデオを見てください。

My name is Tim.

私の名前はティムです。

I thank you for watching.

ご視聴ありがとうございました。
