Let's generate Genba Neko (現場猫) images with Flux.1-dev

inada

September 7, 2024 06:20

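Introduction

This article briefly summarizes how well the latest image generation AI (as of September 2024) performs when trained on Genba Neko. It also gives step-by-step instructions for generation and training.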

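Target audience

AI artists who never slack off on keeping up with the latest technology trends
Anyone who wants to try reproducing Genba Neko generation (on their own GPU card with 24 GB or more of VRAM, or on a rented machine)
Other Genba Neko enthusiasts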

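Background

A quick recap of the state of image generation AI as of September 2024

A new image generation model called Flux.1 has been released, and it can now be trained on a local PC with a GPU card that has 24 GB or more of VRAM
Training Flux.1 produces models that can be controlled more finely than before (this article shows how much finer that control has become)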

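What is Flux.1

A state-of-the-art text-to-image AI model developed by Black Forest Labs
Flux.1-dev is released under a non-commercial license and cannot be used commercially
Flux.1-schnell, a faster variant, is released under the Apache 2.0 license and can be used commercially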

https://blackforestlabs.ai/

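Training and generating images with Flux.1-dev

Image generation

Let's start by looking at what kinds of images can be generated.

Example images and prompts

Simple generation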

<lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, cat, helmet, no humans

<lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, japanese anime style, cat, gray cat wearing a yellow helmet pointing sideways with one foot, hardhat, helmet, no humans, yellow headwear, solo, animal focus, pointing, open mouth, animal, full body, :3, looking at viewer. The 'YOSHI!' text should be prominently displayed in bold, large letters in speech bubble, ensuring it is highly visible.

Self-Defense Forces

<lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, a cute cat with a military helmet, holding a smartphone, a Twitter logo in the background, anime style, illustration

<lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, a cute cat with a military helmet, holding a megaphone, a crowd of people with different expressions, anime style, illustration

<lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, a cute cat with a military helmet, sitting on a pile of anime DVDs, a variety of anime characters in the background, anime style, illustration

A dream collaboration between Zundamon and Genba Neko (multiple LoRAs applied)

1cat and 1girl hold hands each other, <lora:gembacat_flux_lora_20240814_000000500:1>, gembacat, japanese anime style, cat, gray cat wearing a yellow helmet pointing sideways with one foot, hardhat, helmet, no humans, yellow headwear, solo, animal focus, pointing, open mouth, animal, full body, :3, looking at viewer. <lora:zundamon_flux_lora_20240811:1>, zundamon, 1girl, white shirts, green suspenders shorts, green hair, open mouth, pointing sideways with one foot.
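The prompts above load a LoRA with a `<lora:file_name:weight>` tag. Here, file_name is the LoRA file in models/Lora without its extension, and the number is the weight. When you stack several LoRAs, as in the Zundamon example, it can help to assemble the prompt in code. The helper below is a hypothetical sketch and is not part of Forge. It only builds the prompt string.

```python
def lora_tag(name: str, weight: float = 1.0) -> str:
    """Return a LoRA tag in the <lora:name:weight> form used in the prompts above."""
    # :g drops trailing zeros, so a weight of 1.0 is written as "1"
    return f"<lora:{name}:{weight:g}>"


def build_prompt(loras: list[tuple[str, float, str]], prefix: str = "") -> str:
    """Join an optional scene description with one block per LoRA.

    Each entry is (lora_file_stem, weight, description); the description
    normally starts with that LoRA's trigger word (e.g. "gembacat").
    """
    parts = [prefix] if prefix else []
    for name, weight, description in loras:
        parts.append(f"{lora_tag(name, weight)}, {description}")
    return ", ".join(parts)
```

For example, `build_prompt([("gembacat_flux_lora_20240814_000000500", 1.0, "gembacat, cat, helmet, no humans")])` reproduces the first "simple generation" prompt exactly. Adding a second tuple for the Zundamon LoRA lets you adjust each LoRA's weight independently.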

Model downloads

Flux.1-dev

https://huggingface.co/lllyasviel/flux1_dev/blob/main/flux1-dev-fp8.safetensors

Genba Neko model

https://huggingface.co/takaaki-inada/flux1_lora/blob/main/gembacat_flux_lora_20240814_000000500.safetensors

Generation steps

The following steps are for readers who own a GPU card with 24 GB or more of VRAM or can rent a machine with one.

Downloading the web UI (tool)

https://github.com/lllyasviel/stable-diffusion-webui-forge

Clone it with git clone.

Placing the models
Download the models from the links under "Model downloads" and place them under the cloned directory as follows:
- models/Stable-diffusion/flux1-dev-fp8.safetensors
- models/Lora/gembacat_flux_lora_20240814_000000500.safetensors
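You can also script the download and placement instead of doing them by hand. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`). It uses that package's `hf_hub_download` with `local_dir`, so each file is written directly into the Forge folder listed above. The repo IDs and file names are taken from the "Model downloads" links. The Forge directory name in the usage line is an assumption based on the default clone name.

```python
from pathlib import Path

# (Hugging Face repo, file in the repo, destination folder inside the Forge clone),
# taken from the "Model downloads" links and the placement list above
MODELS = [
    ("lllyasviel/flux1_dev", "flux1-dev-fp8.safetensors", "models/Stable-diffusion"),
    ("takaaki-inada/flux1_lora", "gembacat_flux_lora_20240814_000000500.safetensors", "models/Lora"),
]


def target_paths(forge_dir: str) -> list[Path]:
    """Return where each model file should end up inside the cloned Forge directory."""
    return [Path(forge_dir) / subdir / filename for _, filename, subdir in MODELS]


def download_models(forge_dir: str) -> list[Path]:
    """Download both files straight into the Forge model folders."""
    # Imported here so target_paths() works even without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    saved = []
    for repo_id, filename, subdir in MODELS:
        dest_dir = Path(forge_dir) / subdir
        dest_dir.mkdir(parents=True, exist_ok=True)
        # With local_dir set, the file is written to dest_dir/filename
        saved.append(Path(hf_hub_download(repo_id=repo_id, filename=filename, local_dir=dest_dir)))
    return saved
```

Usage: `download_models("stable-diffusion-webui-forge")`, run from the directory that contains the clone. The Flux.1-dev checkpoint is large, so expect the first call to take a while.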

The following steps are for Linux.

Starting the web UI (this also performs the installation)

./webui.sh

Open the link shown on screen to bring up the web UI. Select flux, enter a prompt, and press the Generate button to generate a Genba Neko image. Easy, right?

Model training

Training images

I trained on these four images (their captions are listed below).

gembacat, cat, no humans, black background, animal focus, simple background, yellow eyes, solo, looking at viewer, animal, open mouth, yellow sclera, portrait

gembacat, cat, white background, hardhat, helmet, no humans, yellow headwear, simple background, solo, animal focus, pointing, open mouth, animal, full body, :3, looking at viewer

gembacat, cat, animal focus, no humans, simple background, animal, white background, solo, :3, open mouth, looking at viewer, full body

gembacat, cat, white background, hardhat, helmet, no humans, yellow headwear, simple background, solo, animal focus, open mouth, animal, full body, :3, looking at viewer
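ai-toolkit reads each image's caption from a .txt file that has the same base name as the image (image2.jpg and image2.txt, according to the config comments further down). If you train on your own images, a minimal sketch like the one below can write those caption files. The function name, folder, and image file names are placeholders for your own.

```python
from pathlib import Path


def write_captions(folder: str, captions: dict[str, str]) -> list[Path]:
    """Write each caption to <image base name>.txt in the same folder as the image."""
    written = []
    for image_name, caption in captions.items():
        # image2.jpg -> image2.txt, as the ai-toolkit config comments describe
        txt_path = Path(folder) / Path(image_name).with_suffix(".txt").name
        txt_path.write_text(caption, encoding="utf-8")
        written.append(txt_path)
    return written
```

For example, `write_captions("my_images", {"img1.png": "gembacat, cat, solo, looking at viewer"})` writes my_images/img1.txt next to img1.png. Starting each caption with the trigger word, as the four captions above do, ties the learned features to that word.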

Training image downloads

https://huggingface.co/takaaki-inada/flux1_lora/tree/main/datasets/gembacat

Scripts

Downloading and installing the training tool (ai-toolkit)

https://github.com/ostris/ai-toolkit

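Clone it with git clone and install it by following the Installation section of the README.

Creating the config file and running training
The following is for the Linux command line.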

cp config/examples/train_lora_flux_24gb.yaml config/gembacat.yaml

Edit the following parts of config/gembacat.yaml.

      datasets:
        # datasets are a folder of images. captions need to be txt files with the same name as the image
        # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
        # images will automatically be resized and bucketed into the resolution specified
        - folder_path: "/path/to/images/folder"

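The training images are available from the "Training image downloads" link above. Download them into a folder of your choice and set that folder's path as folder_path in the snippet above.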
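Before starting training, you can run a quick check to confirm that every image in folder_path has a matching caption file. This is a hypothetical helper and is not part of ai-toolkit. It only applies the rules stated in the config comments: captions are .txt files with the same base name, and images must be jpg, jpeg, or png.

```python
from pathlib import Path

# Image types the config comments say ai-toolkit currently supports
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def missing_captions(filenames: list[str]) -> list[str]:
    """Return image filenames (sorted) that have no same-named .txt caption."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        path = Path(name)
        if path.suffix.lower() in IMAGE_EXTS and path.with_suffix(".txt").name not in names:
            missing.append(name)
    return missing


def check_folder(folder_path: str) -> list[str]:
    """List the files in folder_path and report images that lack a caption."""
    return missing_captions([p.name for p in Path(folder_path).iterdir() if p.is_file()])
```

An empty list from `check_folder("/your/folder_path")` means every image is paired with a caption. Files with other extensions are ignored rather than reported.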

        prompts:
          # you can add [trigger] to the prompts here and it will be replaced with the trigger word
         - "[trigger] holding a sign that says 'I LOVE PROMPTS!'"

After changing `[trigger]` to `gembacat`, run the training as follows.

python run.py config/gembacat.yaml
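If you would rather script the two edits to config/gembacat.yaml (setting folder_path and replacing `[trigger]` with `gembacat`) before running the command above, plain string replacement is enough for this file. This sketch assumes the copied example still contains the placeholder `/path/to/images/folder`. The function names and the dataset path in the usage line are hypothetical.

```python
from pathlib import Path

# Placeholder value in ai-toolkit's train_lora_flux_24gb.yaml example, as quoted above
PLACEHOLDER_FOLDER = "/path/to/images/folder"


def edit_config(text: str, folder_path: str, trigger: str = "gembacat") -> str:
    """Apply the folder_path and [trigger] edits to the YAML text."""
    if PLACEHOLDER_FOLDER not in text:
        raise ValueError("folder_path placeholder not found; was the file already edited?")
    text = text.replace(PLACEHOLDER_FOLDER, folder_path)
    # Replaces [trigger] in the sample prompts (and, harmlessly, in comments too)
    return text.replace("[trigger]", trigger)


def edit_config_file(path: str, folder_path: str, trigger: str = "gembacat") -> None:
    """Read the config, apply the edits, and write it back in place."""
    config = Path(path)
    new_text = edit_config(config.read_text(encoding="utf-8"), folder_path, trigger)
    config.write_text(new_text, encoding="utf-8")
```

Usage: `edit_config_file("config/gembacat.yaml", "/home/user/datasets/gembacat")`. The check for the placeholder keeps a second run from silently doing nothing.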

Train for about 500 steps (roughly 30 minutes), then press Ctrl+C to stop. Easy, right?
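Stopping with Ctrl+C works because ai-toolkit saves LoRA checkpoints at regular step intervals during training (the interval is set in the config). To use a checkpoint in Forge, copy it into models/Lora. The sketch below assumes the output layout `<training_folder>/<name>/<name>_<9-digit step>.safetensors`, which matches the published file name gembacat_flux_lora_20240814_000000500.safetensors. The `name` value and directory paths in the usage line are assumptions, so check them against your own config and output folder.

```python
import shutil
from pathlib import Path


def checkpoint_path(training_folder: str, name: str, step: int) -> Path:
    """Assumed ai-toolkit layout: <training_folder>/<name>/<name>_<step as 9 digits>.safetensors."""
    return Path(training_folder) / name / f"{name}_{step:09d}.safetensors"


def install_lora(training_folder: str, name: str, step: int, forge_dir: str) -> Path:
    """Copy the checkpoint saved at `step` into Forge's models/Lora folder."""
    src = checkpoint_path(training_folder, name, step)
    if not src.exists():
        raise FileNotFoundError(f"no checkpoint for step {step}: {src}")
    dest_dir = Path(forge_dir) / "models" / "Lora"
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's timestamps
    return Path(shutil.copy2(src, dest_dir / src.name))
```

Usage (hypothetical values): `install_lora("output", "gembacat_flux_lora_20240814", 500, "../stable-diffusion-webui-forge")`. The copied file can then be referenced in a prompt as `<lora:gembacat_flux_lora_20240814_000000500:1>`.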

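Impressions

With just four training images, the model captures Genba Neko's features and can generate the poses and situations I have in mind.
When I trained a Genba Neko model two years ago, the best I could get was an anthropomorphized girl version of the character, so performance has improved enormously.

Two years ago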

https://huggingface.co/sd-dreambooth-library/gemba-cat

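Closing

This article is cross-posted from Qiita. I wasn't sure cross-posting was a good idea, but the Qiita version got few views and the content isn't really a fit for Qiita, so I decided to post it on note as well (my first note post!).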
