Using safetensors Models with Diffusers and Upscaling to High Resolution [ControlNet Tile]

January 6, 2024, 18:51

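Happy New Year, everyone.
This year I would like to post more AI illustrations and the like on X (formerly Twitter), but generating a good number of high-resolution illustrations in one go for free turned out to be harder than I expected. This article is the result of my trial and error.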

The overall flow is:
① Generate an image as usual with a safetensors model in Diffusers
② Enlarge the image from ① to a higher resolution with ControlNet Tile in Diffusers

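These days safetensors models seem to be more common than Diffusers-format models (I think?), so I will use safetensors. The working environment is the free tier of Colab.
In the end, I think the best setup is to save the images to Google Drive and split image generation and upscaling into separate notebooks. If that is all you want to know, skip ahead to the Summary.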
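
As a rough sketch of the Google Drive idea, assuming the notebook runs on Colab: the helper below mounts Drive and returns an output folder, and falls back to a local folder anywhere else. The function name `get_output_dir` and the folder names `ai_images` and `outputs` are placeholders for illustration, not names taken from the notebooks.

```python
from pathlib import Path


def get_output_dir(folder_name="ai_images"):
    """Return a folder for generated images, on Google Drive when running on Colab."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Not on Colab: fall back to a local folder (an assumption of this sketch).
        base = Path("outputs")
    else:
        # Asks for authorization the first time it runs in a session.
        drive.mount("/content/drive")
        base = Path("/content/drive/MyDrive")
    out_dir = base / folder_name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

With this in place, a generated image could be saved with something like `image.save(get_output_dir() / "test.png")`, so the file survives after the runtime is deleted.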

Using a safetensors model with Diffusers

That said, it is simple to do.
This time I will use the anime-style model aamAnyloraAnimeMixAnime_v1.safetensors.

Sample code follows.


# Install dependencies
!apt -y install -qq aria2
!pip install safetensors diffusers transformers omegaconf

# Download the model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://civitai.com/api/download/models/89927 -d /content/ -o aamAnyloraAnimeMixAnime_v1.safetensors

# Set up the pipeline (the NSFW filter is disabled; otherwise the output is very likely to come out pure black)
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_single_file("aamAnyloraAnimeMixAnime_v1.safetensors", safety_checker=None).to("cuda")

# Parameters
prompt = "plain white t-shirt,short cut hair,red hair,blue eyes,jeans,inside the room,smile,girl,solo, in the forest"
n_prompt = "nsfw, worst quality, low quality, medium quality, deleted, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digits, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry"
height = 512
width = 512

# Generate
image = pipe(prompt, negative_prompt=n_prompt, height=height, width=width, num_inference_steps=30).images[0]
image.save("test.png")

This is enough to generate an image. Here is what the code above produced for me:

Upscaling with ControlNet Tile in Diffusers

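Next, I apply ControlNet Tile to the file generated above.
Some things are already installed and downloaded from the previous step, but I list everything again on purpose. Sample code follows.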

# Install dependencies
!pip install diffusers transformers omegaconf
!pip install accelerate
!pip install xformers
!apt -y install -qq aria2

# Download the model
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://civitai.com/api/download/models/89927 -d /content/ -o aamAnyloraAnimeMixAnime_v1.safetensors

# Set up ControlNet and the pipeline
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1e_sd15_tile")

# Built-in img2img pipeline with ControlNet, loaded from the single safetensors file
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "aamAnyloraAnimeMixAnime_v1.safetensors",
    safety_checker=None,
    controlnet=controlnet).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# Load the image
image_path = "/content/test.png"
init_img = Image.open(image_path).convert("RGB")
# Enlarge to 2x
new_size = (init_img.width * 2, init_img.height * 2)
resized_img = init_img.resize(new_size)
# Prompts
prompt = "plain white t-shirt,short cut hair,red hair,blue eyes,jeans,inside the room,smile,girl,solo, in the forest"
n_prompt = "nsfw, worst quality, low quality, medium quality, deleted, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digits, fewer digits, cropped, jpeg artifacts, signature, watermark, username, blurry"

# Generate: the enlarged image is both the img2img input and the Tile control image
image = pipe(prompt=prompt,
    negative_prompt=n_prompt,
    image=resized_img,
    control_image=resized_img,
    width=resized_img.size[0],
    height=resized_img.size[1],
    strength=1.0,
    generator=torch.manual_seed(0),
    num_inference_steps=32,
    ).images[0]

# Save the image
image.save("test2.png")

Here is the result.

Since 512×512 was doubled, the output is 1024×1024.
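
The resize step can be wrapped in a small helper. Stable Diffusion pipelines in Diffusers generally expect the width and height to be multiples of 8, so this sketch also rounds the scaled size down to the nearest multiple of 8. That makes no difference for 512×512, but it matters for odd-sized inputs. `scaled_size` is a hypothetical helper name, not part of Diffusers.

```python
def scaled_size(width, height, scale=2, multiple=8):
    """Scale (width, height) and round each side down to a multiple of `multiple`."""
    new_w = int(width * scale) // multiple * multiple
    new_h = int(height * scale) // multiple * multiple
    return new_w, new_h


print(scaled_size(512, 512))       # (1024, 1024)
print(scaled_size(500, 750, 1.5))  # (744, 1120)
```

In the upscaling code, the result could be passed straight to `init_img.resize(scaled_size(init_img.width, init_img.height))`.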
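If you run into Out Of Memory partway through, one option (not really recommended) is to run

[_ for _ in range(10000000000)]

to use up all the memory, force the runtime to crash, and then run everything again. Alternatively, download test.png first, choose "Disconnect and delete runtime", and upload the file again when you rerun.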
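
A gentler option that sometimes avoids the crash-and-restart route is to drop references to the pipeline and ask PyTorch to release its cached GPU memory. The sketch below assumes PyTorch is installed, as it is on Colab, and does nothing GPU-related otherwise. `free_gpu_memory` is a hypothetical helper name. It may not recover memory in every case, so restarting the runtime remains the fallback.

```python
import gc


def free_gpu_memory():
    """Run garbage collection and release cached CUDA memory when a GPU is present.

    Returns True if the CUDA cache was cleared, False otherwise.
    """
    # Collect Python objects that are no longer referenced (for example a deleted pipeline).
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Hand cached, unused GPU memory back to the driver.
        torch.cuda.empty_cache()
        return True
    return False
```

In the notebook this would be used as `del pipe` followed by `free_gpu_memory()` before building the next pipeline. Loading the pipeline with `torch_dtype=torch.float16` is another common way to reduce memory use, though it is not tested in this article.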

Summary

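So, how was it? The art style does change a little, but I think it produced a fairly large image quite cleanly. I am a complete beginner at Python, so apologies if anything looks off.
The sample code above only outputs one image at a time, so I made Colab notebooks that output images in bulk, one for image generation and one for upscaling. Have a look if you are interested.
Oh, and remember to set the runtime type to GPU. Also note that running the same code a second time overwrites the existing images.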
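
To process a whole folder without overwriting earlier results, one possible approach is to list the input PNGs and give every output a name that is not taken yet. This is only a sketch of the file handling, not the code from the notebooks. `unique_path`, `plan_upscale_jobs` and the `_x2` suffix are illustrative choices.

```python
from pathlib import Path


def unique_path(path):
    """Return `path`, or a numbered variant (name_1.png, name_2.png, ...) if it exists."""
    path = Path(path)
    candidate = path
    counter = 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        counter += 1
    return candidate


def plan_upscale_jobs(in_dir, out_dir, suffix="_x2"):
    """Pair every PNG in `in_dir` with a free output path in `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for src in sorted(Path(in_dir).glob("*.png")):
        dst = unique_path(out_dir / f"{src.stem}{suffix}.png")
        jobs.append((src, dst))
    return jobs
```

Each `(src, dst)` pair can then go through the resize and `pipe(...)` call from the upscaling section, ending with `image.save(dst)`.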

safetensors image generation

ControlNet Tile upscaling

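One last thing: I have just joined Amazon Associates, so I would appreciate it if you clicked through the product links, even if you do not buy anything.
(I want to check that they are working properly...)

That is all. Thank you for reading this far!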
