Trying Cool Japan Diffusion in the AUTOMATIC1111 Web UI

December 13, 2022, 03:49

Cool Japan Diffusion 2.1.1.1 was released on January 24, 2023, so I tried it out in the AUTOMATIC1111 Web UI.

1. What is Cool Japan Diffusion?

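Cool Japan Diffusion 2.1.1.1 is a model specialized for illustrations, made by fine-tuning the VAE (variational autoencoder) and the U-Net (a convolutional neural network widely used in diffusion models) of Stable Diffusion 2.0. Its algorithms are the Latent Diffusion Model and OpenCLIP-ViT/H.
As training data, the VAE used 600,000 items and the U-Net used 1.8 million pairs. In both cases the data complies with Japanese domestic law and excludes unauthorized-repost sites such as Danbooru.
There are reportedly three changes from Cool Japan Diffusion 2.1.1: input prompts compatible with Waifu Diffusion, a new "masterpiece" art style, and stronger manga-style expression.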

〇 Cool Japan Diffusion 2.1.1.1 demo site

2. Code for using Cool Japan Diffusion 2.1.1.1 in the AUTOMATIC1111 Web UI

Below is the code I prepared for using this model.
Copy it into a Colab notebook, select a GPU runtime, and run the cell. Clicking the link shown after "Running on public URL:" opens the operation screen.

# Fetch the AUTOMATIC1111 Web UI and move into its directory
!git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
%cd /content/stable-diffusion-webui
# Download the fp16 checkpoint and its matching YAML config into the model folder
!wget https://huggingface.co/aipicasso/cool-japan-diffusion-2-1-1-1/resolve/main/v2-1-1-1_fp16.safetensors -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1-1-1_fp16.safetensors
!wget https://huggingface.co/aipicasso/cool-japan-diffusion-2-1-1-1/raw/main/v2-1-1-1_fp16.yaml -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1-1-1_fp16.yaml
# Launch with a public share URL, xformers attention, and extension installs allowed
!python launch.py --share --xformers --enable-insecure-extension-access

(Reference) Previous version
〇 Cool Japan Diffusion 2.1.1

!git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
%cd /content/stable-diffusion-webui
!wget https://huggingface.co/aipicasso/cool-japan-diffusion-2-1-1/resolve/main/v2-1-1.safetensors -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1-1.safetensors
!wget https://huggingface.co/aipicasso/cool-japan-diffusion-2-1-1/raw/main/v2-1-1.yaml -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1-1.yaml
!python launch.py --share --xformers --enable-insecure-extension-access
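
The two cells above differ only in the Hugging Face repository and the file stem. As a rough sketch of that pattern (not part of the original notebook), the helper below rebuilds the two wget lines from those two values. The function name download_commands is hypothetical. The checkpoint and the YAML file are given the same stem in the same folder, as in the cells above, which I understand is how the Web UI finds the matching config.

```python
from __future__ import annotations

from pathlib import PurePosixPath

HF_BASE = "https://huggingface.co"
# Model folder used by the Colab cells above.
MODEL_DIR = PurePosixPath("/content/stable-diffusion-webui/models/Stable-diffusion")


def download_commands(repo: str, stem: str) -> list[str]:
    """Return the two wget lines for a checkpoint and its YAML config.

    `repo` is the Hugging Face repository and `stem` the shared file name
    without extension (hypothetical helper, for illustration only).
    """
    weights = f"{stem}.safetensors"
    config = f"{stem}.yaml"
    return [
        # Large weight files are fetched through /resolve/main/.
        f"wget {HF_BASE}/{repo}/resolve/main/{weights} -O {MODEL_DIR / weights}",
        # The small text config is fetched through /raw/main/, as in the cells above.
        f"wget {HF_BASE}/{repo}/raw/main/{config} -O {MODEL_DIR / config}",
    ]


if __name__ == "__main__":
    for repo, stem in [
        ("aipicasso/cool-japan-diffusion-2-1-1-1", "v2-1-1-1_fp16"),
        ("aipicasso/cool-japan-diffusion-2-1-1", "v2-1-1"),
    ]:
        for line in download_commands(repo, stem):
            print("!" + line)  # prefix with "!" to paste into a Colab cell
```

Running it prints the same four wget lines used in the two cells, so switching versions only means changing the repository and the stem.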

3. Image generation examples

masterpiece, Anime, ((Beautiful Japanese woman in kimono)), (beautiful cute face), beautiful proportions, background of cherry blossoms and green mountains, ultra high definition, colorful, hypermaximalist Negative prompt: (((deformed))), blurry, ((((bad anatomy)))), bad pupil, disfigured, poorly drawn face, mutation, mutated, (extra limb), (ugly), (poorly drawn hands), bad hands, fused fingers, messy drawing, broken legs censor, low quality, ((mutated hands and fingers:1.5), (long body :1.3), (mutation, poorly drawn :1.2), ((bad eyes)), ui, error, missing fingers, fused fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, one hand with more than 5 digit, one hand with less than 5 digit, extra digit, fewer digits, fused digit, missing digit, bad digit, liquid digit, long body, uncoordinated body, unnatural body, lowres, jpeg artifacts, 2d, 3d, cg, text Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 2056728409, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

masterpiece, game, touhou project, a portrait of a beautiful girl with back long hair, hakurei reimu, touhou project official artwork, detailed Negative prompt: low quality, bad face, ((((bad anatomy)))), ((bad hand)), lowres, jpeg artifacts, 2d, 3d, cg, text Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 4213488408, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

game, touhou project, a portrait of a beautiful girl, kirisame marisa, touhou project official artwork, detailed Negative prompt: low quality, bad face, ((((bad anatomy)))), ((bad hand)), lowres, jpeg artifacts, 2d, 3d, cg, text Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 1425960707, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

Anime, (shinkai makoto) Beautiful Japanese landscape, ((Mt. Fuji and many cherry blossoms)), vivid color, official art, 4k, 8k, highly detailed, Dynamic Lighting Negative prompt: (((deformed))), photo, people, girl, boy Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 336752197, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

masterpiece, Anime, (shinkai makoto) , ((Nakamise Street in Asakusa)), vivid color, official art, 4k, 8k, highly detailed, Dynamic Lighting Negative prompt: (((deformed))), photo, people, girl, boy Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 1131027119, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16
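
The prompts above rely on the Web UI's emphasis syntax. As I understand it, each pair of parentheses multiplies a term's attention weight by 1.1, and "(text:1.3)" sets the weight explicitly. The sketch below only illustrates that arithmetic for a single, fully wrapped term. It is not the Web UI's own parser, and the function name emphasis_weight is hypothetical.

```python
from __future__ import annotations

import re


def emphasis_weight(term: str) -> tuple[str, float]:
    """Return (text, weight) for one prompt term.

    Assumes the term is a single unit with balanced outer parentheses,
    e.g. "((bad eyes))" or "(long body :1.3)". Unbalanced or compound
    input is outside the scope of this sketch.
    """
    term = term.strip()
    depth = 0
    # Peel off matching outer parentheses one layer at a time.
    while term.startswith("(") and term.endswith(")"):
        term = term[1:-1].strip()
        depth += 1
    # An explicit "text:number" weight replaces the innermost 1.1 factor.
    match = re.fullmatch(r"(.*):\s*(\d+(?:\.\d+)?)", term)
    if match and depth > 0:
        text = match.group(1).strip()
        return text, float(match.group(2)) * 1.1 ** (depth - 1)
    # Otherwise each layer of parentheses contributes a factor of 1.1.
    return term, 1.1 ** depth


if __name__ == "__main__":
    for term in ["colorful", "(ugly)", "((((bad anatomy))))", "(long body :1.3)"]:
        text, weight = emphasis_weight(term)
        print(f"{text!r}: {weight:.4f}")
```

Under this reading, "((((bad anatomy))))" in the negative prompts carries a weight of about 1.46, somewhat stronger than the explicit 1.3 on "long body".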

4. Comparison with other models

(1) Comparison with the previous model

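I compared it with Cool Japan Diffusion 2.1.1 (the previous model), released on January 14, 2023, using the same prompts, the same seed values, and the same conditions.
The colors may be even more vivid. Hands still seem to be a struggle. The difference in quality between the two appears to have become small.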

[positive prompt missing] Negative prompt: inaccurate limb , lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, blurry Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 620075266, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

masterpiece, anime, A beautiful girl with long, black hair and blue eyes standing against a summer sky. Negative prompt: inaccurate limb , lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, blurry Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 2843868587, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

masterpiece, anime, A regal girl with gold hair and an elegant presence. Negative prompt: inaccurate limb , lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, blurry Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 498249553, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16

masterpiece, anime, A fluffy white kitten with big, curious eyes and a pink nose. Negative prompt: inaccurate limb , lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, blurry Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 3937185007, Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16
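
A same-prompt, same-seed comparison only holds if every setting except the model really matches. The sketch below shows one way to check that. It assumes each record is a single flattened line in the form shown above (prompt, "Negative prompt:", then "Steps: ..." pairs). The helper names parse_infotext and differing_fields are hypothetical, and the second record's model hash is a made-up placeholder.

```python
from __future__ import annotations


def parse_infotext(line: str) -> dict[str, str]:
    """Split one flattened settings line into prompt, negative prompt, and parameters."""
    head, sep, params = line.rpartition(" Steps: ")
    if not sep:
        raise ValueError("no 'Steps:' field found")
    prompt, _, negative = head.partition(" Negative prompt: ")
    info = {"Prompt": prompt.strip(), "Negative prompt": negative.strip()}
    # The tail is a comma-separated list of "Key: value" pairs.
    for field in ("Steps: " + params).split(", "):
        key, _, value = field.partition(": ")
        info[key.strip()] = value.strip()
    return info


def differing_fields(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return the sorted names of the fields whose values differ between two records."""
    return [key for key in sorted(set(a) | set(b)) if a.get(key) != b.get(key)]


if __name__ == "__main__":
    new = parse_infotext(
        "masterpiece, anime, A fluffy white kitten with big, curious eyes and a pink nose. "
        "Negative prompt: lowres, blurry "
        "Steps: 40, Sampler: Euler a, CFG scale: 7, Seed: 3937185007, "
        "Size: 512x512, Model hash: b0c60ae560, Model: v2-1-1-1_fp16"
    )
    # Hypothetical record for the previous model; the hash is a placeholder.
    old = {**new, "Model": "v2-1-1", "Model hash": "0000000000"}
    print(differing_fields(new, old))
```

If anything other than the model name and hash shows up in the result, the two images were not generated under the same conditions.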

(2) Comparison with Waifu Diffusion 1.4 Anime Epoch 2

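I compared it with Waifu Diffusion 1.4 Anime Epoch 2 (a test model for checking the training settings), released on January 16, 2023, using the same prompts, the same seed values, and the same conditions.
Because the prompt styles that suit the two models differ considerably, I asked ChatGPT to write the prompts in order to keep the conditions as fair as possible. I added "masterpiece, anime" and the negative prompt afterwards.
The results are below. For the first image, I was told that with Waifu Diffusion, adding "holding_branch" makes the character hold a branch, so I added it to the prompt. With CJD, however, the character did not hold a cherry branch with this prompt.
For the fourth image, Waifu Diffusion initially produced a picture of a girl. I heard that adding "no_humans" makes it easier to keep humans out of the picture, and the image below is the result after adding it to the prompt.
CJD 2.1.1.1 is said to have made its input prompts compatible with Waifu Diffusion, but the two models appear to respond to the same prompts differently.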