[Stable Diffusion XL] Reading the English Commentary in Japanese [July 1, 2023 | @Matthew Berman]

July 2, 2023 10:42

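This video introduces the new Stable Diffusion XL 0.9, an open-source AI model for generating high-quality images. It demonstrates the improvement in image quality over the beta, with examples covering landscapes, hands, and a range of artistic styles. The Stable Diffusion XL model uses a large parameter count and two CLIP models to produce more detailed, higher-resolution, more realistic images. The video also covers the system requirements, availability for research purposes, and the upcoming release of version 1.0. Finally, it compares Stable Diffusion XL with Midjourney, another AI model, and emphasizes that Stable Diffusion XL is free with nearly unlimited use.
Published: July 1, 2023
Note: We recommend starting the video before you read.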

If you want to create absolutely stunning AI generated images that are similar in quality to Midjourney, absolutely free and nearly unlimited, this video is for you.

もしあなたが、Midjourneyと同じようなクオリティで、絶対的に無料で、ほぼ無制限に、絶対的に素晴らしいAI生成画像を作成したいのであれば、このビデオはあなたのためのものです。

So strap in, we're going to take a look at the new Stable Diffusion, which is absolutely stunning.

さあ、準備はいいですか。本当に素晴らしい新しいStable Diffusionを見ていきましょう。

I'm going to tell you about it, then I'm going to show you how to use it, and then we're going to do some direct comparisons.

まず概要を説明し、次に使い方をお見せして、そのあと直接比較をしていきます。

Let's go!

さあ、行こう！

This is the blog post today: Stability AI announces SDXL (Stable Diffusion XL) 0.9, a huge leap forward in AI image generation.

これが今日のブログ記事です： Stability AIがSDXL（Stable Diffusion XL）0.9を発表、AI画像生成における大きな飛躍です。

So Stable Diffusion XL beta was released in April, and just a few months later, Stable Diffusion XL 0.9 produces massively improved image and composition detail over its predecessor.

Stable Diffusion XLのベータ版は4月にリリースされ、それからわずか数ヶ月でStable Diffusion XL 0.9は、前作よりも画像と構図のディテールが大幅に改善されました。

The rate of improvement of these open source models, whether you're talking about large language models for text or AI generative art models, is remarkable: there are absolutely huge gains seemingly every single day.

テキスト用の大規模言語モデルであれ、AIによる生成アートモデルであれ、これらのオープンソースモデルの改善速度は驚くほどで、ほぼ毎日のように非常に大きな進歩を遂げているように見えます。

And the best part about it, it's open source, it's completely free, and soon you'll be able to run it on your local computer.

そして何より素晴らしいのは、オープンソースであり、完全に無料であり、すぐにローカルのコンピューターで実行できるようになることだ。

Despite its ability to be run on a modern consumer GPU, SDXL 0.9 presents a leap in creative use cases for generative AI imagery.

最新のコンシューマー向けGPUで実行できるにもかかわらず、SDXL 0.9はジェネレーティブAIイメージのクリエイティブなユースケースに飛躍的な進歩をもたらす。

First, let's take a look at some examples from the previous beta to this new version.

まず、前回のベータ版から今回の新バージョンまでの例を見てみよう。

On the left, we're seeing the beta, and on the right, we're seeing the new version.

左側ではベータ版を見ていますが、右側では新しいバージョンを見ています。

These are the same exact prompts, and as you could tell, the left is pretty good, and the right is so much more detailed, so much more color.

これらはまったく同じプロンプトで、お分かりのように、左はかなり良い出来で、右はとても詳細で、色彩が豊かです。

There's bokeh in the background, it just looks so much better.

背景にはボケがあり、本当に良く見えます。

Let's take a look at this second image.

この2番目の画像を見てみましょう。

The left again is the beta version, and the right is the most recent release.

左側はベータ版で、右側は最新のリリースです。

A wolf in Yosemite National Park, chilly nature documentary film photography.

ヨセミテ国立公園のオオカミ、肌寒い雰囲気のネイチャードキュメンタリー風フィルム写真。

So this one looks pretty good, although the wolf is far away and the details are not great.

オオカミは遠くにいて、ディテールはよくないが、これはかなりよく見える。

The log doesn't look super real.

丸太は超リアルには見えない。

But if we look on the right, this one is fantastic.

しかし、右側を見てみると、これは素晴らしい。

It's gorgeous.

ゴージャスだ。

You can see all the little hairs, the details are phenomenal, the eyes look perfect.

小さな毛がすべて見えるし、ディテールも驚異的で、目も完璧に見える。

This is a really great image.

これは本当に素晴らしい画像です。

Here's another one.

もう一つこちらです。

Aesthetic manicured hand holding up a takeout coffee, pastel chilly dawn beach, Instagram film photography.

テイクアウトコーヒーを掲げる、美しくマニキュアを施した手、パステル調の肌寒い夜明けのビーチ、インスタグラム風のフィルム写真。

Now, to be honest, I think this left one looks really good.

さて、正直なところ、この左の1枚はとてもいい感じだと思う。

The only criticism I have is that you can tell that there are lots of fingers here, many more than what humans have.

唯一の批判は、ここにはたくさんの指があることがわかるということです。人間よりもはるかに多くの指があります。

AI generative art has always struggled with hands.

AIのジェネレーティブ・アートは、常に手と格闘してきた。

Midjourney's new version has really solved it, but it looks like now Stable Diffusion XL 0.9 has solved it as well.

Midjourneyの新しいバージョンはそれを解決してくれたが、Stable Diffusion XL 0.9もそれを解決してくれたようだ。

This hand looks flawless.

この手は完璧に見える。

And the SDXL series offers a huge range of functionalities that extend beyond basic text prompting.

そして、SDXLシリーズは、基本的なテキストプロンプトにとどまらない膨大な機能性を提供している。

Image-to-image prompting; inpainting, which is basically taking portions of the image and replacing them with generative art; and outpainting, which is constructing a seamless extension of an existing image, basically taking a little piece of the image and creating AI art extensions of what's around it.

画像から画像を生成するイメージ・トゥ・イメージ・プロンプティング、画像の一部を取り出してジェネレーティブ・アートで置き換えるインペインティング、そして既存の画像をシームレスに拡張するアウトペインティング、つまり画像の小さな一部をもとにその周囲をAIアートで描き足す機能です。
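
To make the difference between inpainting and outpainting concrete, here is a minimal sketch of the inputs an outpainting step typically needs: an enlarged canvas plus a mask marking which pixels are to be generated. This is only an illustration in plain Python. The function name `make_outpaint_inputs` and the mask convention (1 = generate, 0 = keep) are assumptions made for this example, not anything taken from Stability AI's code.

```python
def make_outpaint_inputs(image, pad, fill=0):
    """Pad a 2D image (a list of rows) on every side and build a matching mask.

    Mask value 1 marks pixels a model would be asked to generate;
    mask value 0 marks original pixels that should be kept.
    (Hypothetical convention chosen for this sketch.)
    """
    height = len(image)
    width = len(image[0])
    new_height = height + 2 * pad
    new_width = width + 2 * pad

    # Start with an empty canvas and a mask that says "generate everything".
    canvas = [[fill] * new_width for _ in range(new_height)]
    mask = [[1] * new_width for _ in range(new_height)]

    # Copy the original pixels into the centre and mark them as "keep".
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


tiny_image = [[5, 6], [7, 8]]
canvas, mask = make_outpaint_inputs(tiny_image, pad=1)
for row in mask:
    print(row)
```

Inpainting can be described with the same kind of mask, except that the 1s sit inside the existing image, over the portion to be replaced, rather than around its border.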

So how did they do it?

では、彼らはどうやってそれを実現したのか？

The key driver of this advancement in composition for SDXL 0.9 is its significant increase in parameter count over the beta version.

SDXL 0.9のコンポジションが進化した主な要因は、ベータ版よりもパラメータ数が大幅に増えたことです。

SDXL 0.9 has one of the largest parameter counts of any open source image model, boasting a 3.5 billion parameter base model and a 6.6 billion parameter ensemble pipeline.

SDXL 0.9はオープンソースの画像モデルの中でも最大級のパラメータ数を誇り、35億パラメータのベースモデルと66億パラメータのアンサンブルパイプラインを備えています。

And the beta version, for comparison, has 3.1 billion parameters and uses just a single model.

また、ベータ版は31億パラメータで、単一のモデルしか使用していません。
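
As a rough back-of-the-envelope check on what these parameter counts mean for memory, the sketch below converts each count into an approximate weight size. It assumes 2 bytes per parameter (16-bit floats), which is an assumption for illustration and not something stated in the video. Real memory use is higher because activations and other buffers also take space.

```python
# Parameter counts as quoted in the video.
PARAM_COUNTS = {
    "sdxl_beta": 3.1e9,
    "sdxl_0.9_base": 3.5e9,
    "sdxl_0.9_ensemble": 6.6e9,
}


def weight_gib(num_params, bytes_per_param=2):
    """Return the approximate size of the weights in GiB.

    bytes_per_param=2 assumes 16-bit weights (an assumption for this sketch).
    """
    return num_params * bytes_per_param / 2**30


for name, count in PARAM_COUNTS.items():
    print(f"{name}: about {weight_gib(count):.1f} GiB of weights")
```

Under that assumption the base model's weights alone come to roughly 6.5 GiB, while the full ensemble comes to roughly twice that, so memory needs depend heavily on which parts of the pipeline are loaded at once.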

SDXL uses two models, CLIP models they're called, including one of the largest OpenCLIP models trained to date, which beefs up the processing power, and you get upgraded realistic imagery, greater depth, and resolution up to 1024 by 1024.

SDXLは2つのモデル（CLIPモデルと呼ばれる）を使用しており、そのうちの1つは現在までにトレーニングされた最大級のOpenCLIPモデルで、処理能力が強化され、よりリアルな画像、より深い奥行き、最大1024×1024の解像度を得ることができる。

They're going to be releasing a blog post talking about all of their advancements soon.

彼らはすぐに彼らの進歩についてのブログ投稿を公開する予定です。

Here's another image, absolutely stunning galaxies in the background, a little Galaxy in this bottle, a really beautiful image.

もう一つの画像ですが、背景には絶対に素晴らしい銀河があり、このボトルの中には小さな銀河があり、本当に美しい画像です。

Now, here are the system requirements.

さて、これがシステム要件だ。

Despite its powerful output and advanced model architecture, SDXL 0.9 is able to run on a modern consumer GPU.

SDXL 0.9は、その強力な出力と高度なモデル・アーキテクチャにもかかわらず、最新のコンシューマー向けGPUで動作可能です。

You need Windows 10 or 11 or a Linux operating system, 16 gigabytes of RAM.

Windows 10か11、またはLinuxオペレーティングシステム、16ギガバイトのRAMが必要です。

Now, that's not VRAM, that's just regular RAM, and most modern computers have 16 gigabytes.

なお、これはVRAMではなく、普通のRAMです。ほとんどの現代のコンピュータには16ギガバイトのRAMが搭載されています。

You need an NVIDIA GeForce RTX 20 graphics card, which is kind of a mid-range, lower mid-range graphics card, and eight gigabytes of VRAM.

NVIDIA GeForce RTX 20グラフィックカード（ミドルレンジからやや下のクラスのグラフィックカード）と、8ギガバイトのVRAMが必要です。

So very doable for many video cards and really a minimum requirement for any modern video game right now.

多くのビデオカードで実現可能であり、最新のビデオゲームに最低限必要なものです。
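
The requirements listed above can be written down as a simple checklist. The sketch below compares a machine description against the minimums mentioned in the video (Windows 10/11 or Linux, 16 GB of RAM, 8 GB of VRAM). The `Machine` class and `unmet_requirements` function are hypothetical names made up for this example. The GPU model check (RTX 20 or better) is left out because comparing GPU generations reliably needs more than a string match.

```python
from dataclasses import dataclass

# Minimums as stated in the video.
MIN_RAM_GB = 16
MIN_VRAM_GB = 8
SUPPORTED_OS = {"windows 10", "windows 11", "linux"}


@dataclass
class Machine:
    """Hypothetical description of a computer, filled in by hand."""
    os_name: str
    ram_gb: float
    vram_gb: float


def unmet_requirements(machine):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if machine.os_name.lower() not in SUPPORTED_OS:
        problems.append(f"unsupported OS: {machine.os_name}")
    if machine.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM {machine.ram_gb} GB is below {MIN_RAM_GB} GB")
    if machine.vram_gb < MIN_VRAM_GB:
        problems.append(f"VRAM {machine.vram_gb} GB is below {MIN_VRAM_GB} GB")
    return problems


my_pc = Machine(os_name="Windows 11", ram_gb=16, vram_gb=6)
for problem in unmet_requirements(my_pc):
    print(problem)
```

Returning a list of problems rather than a single yes/no makes it easy to see which requirement a given machine misses.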

You can use Stable Diffusion XL on Clipdrop, which is what we're going to be testing it on today, and API and DreamStudio customers have been able to access it as of three days ago, so they'll be able to use it as well.

Stable Diffusion XLはClipdropで使用でき、今日はそこでテストします。また、APIとDreamStudioの利用者も3日前からアクセスできるようになっているので、同様に使うことができます。

SDXL 0.9 will be provided for research purposes only during a limited period to collect feedback and fully refine the model before its general open release.

SDXL 0.9は、一般公開前にフィードバックを収集し、モデルを完全に改良するために、限られた期間、研究目的でのみ提供されます。

The code is available and you can already use the code today.

コードは公開されており、今日からすでに使うことができる。

So they've released the code, it's on the Stability AI GitHub page, you can download it, you can play around with it right now.

Stability AIのGitHubページにコードが公開されているので、ダウンロードして今すぐ試すことができる。

I'm not going to do that, but maybe I'll create another video showing you how to get it up and running on your local computer.

私はそれをするつもりはありませんが、あなたのローカルコンピューターでそれを立ち上げて実行する方法を紹介する別のビデオを作るかもしれません。

And what's next?

次の予定は？

Stable Diffusion XL 0.9 will be followed by the full open release of SDXL 1.0, targeted for mid-July.

Stable Diffusion XL 0.9に続き、SDXL 1.0のフルオープンリリースは7月中旬を予定している。

So very, very soon.

とても、とても近いうちに。

Now, here's the Stable Diffusion XL model on Clipdrop's website.

さて、これがClipdropのウェブサイトにあるStable Diffusion XLモデルだ。

Let's take a look at some of these examples.

これらのサンプルを見てみよう。

I mean, these are absolutely gorgeous.

つまり、これらは本当に美しいです。

We have anime style right there, realistic with a skeleton there, hyper-realism right there, tilt-shift effect, unbelievable.

アニメ風のスタイル、骸骨を描いたリアルな画像、ハイパーリアリズム、ティルトシフト効果、信じられないほどです。

These images seem absolutely on par with Midjourney.

これらの画像はMidjourneyに匹敵する。

First, let's take a look at pricing.

まず、価格を見てみましょう。

As I mentioned, it is completely free.

前述したように、これは完全に無料だ。

With Midjourney, you can't even test for free, whereas here you get 400 images per day.

Midjourneyは無料ではテストすらできませんが、こちらは1日あたり400枚の画像を生成できます。

So it's basically unlimited.

つまり、基本的に無制限なのだ。

I don't know about you, but I've never even come close to generating 400 images in a single day.

皆さんはどうかわかりませんが、私は1日に400枚もの画像を生成したことは一度もなく、それに近づいたことすらありません。

And with the free version, you get a bunch of other AI features: background removal, cleanup pictures, relight, which we'll take a look at, image upscaler, really cool.

さらに無料版には、背景除去、クリーンアップ画像、リライト、画像アップスケーラーなど、たくさんのAI機能がついています。

Now let's do some direct comparisons.

では、直接比較してみましょう。

I'm on the Midjourney website.

私はMidjourneyのウェブサイトにいます。

I'm gonna grab some of these prompts.

プロンプトをいくつか見てみよう。

And then, I'm going to test it out and compare the images directly with Clipdrop.

それから、それをテストして、直接Clipdropの画像と比較します。

So, first, I'm going to test this one out - pill head surreal surrealism abstract Harry Clark.

まず最初に、これを試してみます - pill head surreal surrealism abstract Harry Clark.

This one looks really, really cool.

これは本当に、本当にクールに見える。

Now, I don't expect the images to look very similar because they're trained on completely separate models.

この2つの画像は全く別のモデルで学習されたものなので、よく似ているとは思いません。

What I'm curious about is, is the quality equal?

気になるのは、クオリティは同等なのか、ということだ。

Let's see.

見てみましょう。

Alright, there we go.

さあ、やりました。

Now, as I mentioned, the images really don't look like what I found on Midjourney, but these are gorgeous.

先ほど言ったように、この画像はMidjourneyで見つけたものとは本当に似ていませんが、とても美しいです。

These are very artistic as well.

これらも非常に芸術的です。

I'd say even more artistic than what was on Midjourney.

Midjourneyに掲載されていたものよりも芸術的だ。

I'm going to show both of these on the screen right now so you can see a direct comparison.

今、この2つをスクリーンに映し出しますので、直接比較して見てください。

Next, I really like this fingerprint right here, so let's try to recreate it using SDXL - a large fingerprint stamp on paperboard.

次に、この指紋がとても気に入ったので、SDXLを使って再現してみよう - ペーパーボード上の大きな指紋スタンプ。

That's a super broad and generic prompt, so I suspect we're going to get something very different.

これは超広範囲で一般的なプロンプトなので、非常に異なるものが得られると思う。

But let's see.

でも見てみよう。

Alright, there it is.

さあ、それです。

So, it's black and white, but I think these look really good as well.

それで、白黒ですが、これらもとても良いように思います。

Let's see what happens if I try to use blue and red in the prompt to get something more akin to what we found in Midjourney.

では、もしプロンプトに青と赤を使って、Midjourneyで見つけたものに近いものを取得しようとしたら、どうなるか見てみましょう。

Okay, here we go.

よし、いくぞ。

These all look fantastic.

どれも素晴らしい。

I'd say this one looks really close to the Midjourney version.

これはMidjourneyのバージョンに近いと思う。

I'll put these both up on the screen now so you can see a couple versions versus Midjourney.

Midjourneyと比較できるように、この2つをスクリーンに映し出します。

Alright, lastly, I love this image of a lion, and I think SDXL is going to do a fantastic job on this one.

最後に、このライオンの画像が大好きで、SDXLはこの画像で素晴らしい仕事をしてくれると思います。

So, let's test it out.

では、テストしてみましょう。

Alright, take a look at this.

さて、これを見てください。

I'd say this is okay.

これはまあまあだと言えます。

This isn't quite what I was asking for.

これは僕が求めていたものとはちょっと違う。

It's much more artistic.

より芸術的です。

I was looking for something much more realistic, and it didn't seem to be able to achieve that.

僕はもっとリアルなものを求めていたんだけど、それは実現できなかったようだ。

But that's okay.

でも、それでいいんだ。

Alright, these are absolutely stunning.

さて、これらは絶対に素晴らしいです。

This is exactly what I wanted - hyper-detailed, super realistic.

これはまさに私が望んでいたものです - 超詳細で超リアルです。

Let's click into one of them.

クリックしてみましょう。

Look at that, you can see all the skin textures, you can see each hair and the eyebrows and the beard on the head.

見てください、肌のテクスチャが全部見えて、髪の毛も、眉毛も、頭の髭も全部見えます。

So, this is so, so good.

これはとても素晴らしい。

Now, I want to show you a couple other features.

では、他の機能もお見せしましょう。

If you click on this top right, the three dots right here, you have a remove background feature, cleanup imperfections, relight, enhance, upscale, imagine, and uncrop.

この右上の3つの点をクリックすると、背景の除去、不完全な部分のクリーンアップ、リライト、強調、アップスケール、イメージ、切り抜き解除があります。

I want to test out relight, and what relight allows you to do is exactly what it sounds like - you get to position lights in any direction on this face.

私はリライトを試してみたいのですが、リライトでできることはその名の通り、この顔のあらゆる方向にライトを配置することです。

So, you can already tell the face is much different.

すでに顔の印象がかなり違うのがわかります。

So, if I click over to the red light, I can easily increase the red light, and in real time, you can see the lighting on his face change, the shadows change, everything.

赤いライトをクリックすると、簡単に赤いライトを増やすことができ、リアルタイムで彼の顔の照明が変わり、影が変わるのがわかります。

So cool!

とてもクールだ！

You can change the intensity, you can change the radius.

強度を変えることができますし、半径も変えることができます。

And then you can change the distance and so look at this.

そして、距離も変えることができますので、これを見てください。

If I change the distance, you can actually see the light passing over his face and changing the shadows in real time.

距離を変えると、彼の顔の上を光が通り、リアルタイムで影が変化するのがわかります。

So if I turn the red light off completely and then I turn the blue light up, you can see that it changes the light, and the green one too.

赤いライトを完全に消してから青いライトを強くすると、照明が変わるのがわかります。緑のライトも同様です。

You can add more lights to it, look at that.

さらにライトを追加することもできます。

So I'll move it around his head and in real time, the whole face is updated.

頭の周りを動かしてみると、リアルタイムで顔全体が更新されます。

And then you can turn the lights off and then download that image.

そして、ライトを消して、その画像をダウンロードすることができます。

So cool!

とてもクールだ！

So I am blown away by these images.

この画像には圧倒されました。

The progress of this model is substantial and just knowing that the 1.0 version is coming soon is so exciting.

このモデルの進歩は相当なもので、1.0バージョンがもうすぐ登場すると知っているだけで、とてもわくわくする。

This is free, this is open source.

これは無料で、オープンソースだ。

Soon there's going to be no limit because you can run it on your own computer.

もうすぐ制限がなくなるでしょう。自分自身のコンピュータで実行することができますから。

But for now, you'll have to deal with the 400 images per day, which is a ton.

しかし今のところは、1日あたり400枚という上限で我慢するしかありませんが、それでも十分すぎる量です。

Midjourney really has their work cut out for them.

Midjourneyは本当に大変な仕事を抱えています。

I'm so excited to see open source AI continue to proliferate, continue to get better at such a rapid clip.

オープンソースのAIが普及し続け、急速に改良されていくのを見るのはとても楽しみだ。

Let me know what you think in the comments.

コメントで感想を聞かせてください。

If you like this video, please consider giving me a like and subscribe, and I'll see you in the next one.

このビデオが気に入ったら、いいねやチャンネル登録を考えていただけると嬉しいです。次回をお楽しみに。
