Trying Moore-AnimateAnyone on Google Colab

January 13, 2024, 09:55

I tried "Moore-AnimateAnyone" on "Google Colab", so here is a summary.

1. Moore-AnimateAnyone

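"Moore-AnimateAnyone" is a reproduction of "AnimateAnyone". To match the results demonstrated in the original paper, it adopts various approaches and tricks, which may differ somewhat from the paper and from other implementations.

This is a very preliminary version that aims to approximate the performance shown in AnimateAnyone (about 80% in testing).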

2. Installation

The installation steps are as follows.

(1) Install the packages.

# Install the packages
!git clone https://github.com/MooreThreads/Moore-AnimateAnyone
%cd Moore-AnimateAnyone
!pip install -r requirements.txt

3. Preparing the weights

Prepare the weights in the following folder layout.

./pretrained_weights/
|-- DWPose
|   |-- dw-ll_ucoco_384.onnx
|   `-- yolox_l.onnx
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- denoising_unet.pth
|-- motion_module.pth
|-- pose_guider.pth
|-- reference_unet.pth
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

・weights (denoising_unet.pth, reference_unet.pth, pose_guider.pth and motion_module.pth)
・StableDiffusion V1.5
・sd-vae-ft-mse
・image_encoder
・DWPose (dw-ll_ucoco_384.onnx, yolox_l.onnx)
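
Before running inference, it can help to confirm that every file in the tree above is in place. The following is a minimal sketch that assumes the layout shown above; the name `missing_weights` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the folder layout listed above.
EXPECTED_FILES = [
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
    "image_encoder/config.json",
    "image_encoder/pytorch_model.bin",
    "denoising_unet.pth",
    "motion_module.pth",
    "pose_guider.pth",
    "reference_unet.pth",
    "sd-vae-ft-mse/config.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
    "stable-diffusion-v1-5/feature_extractor/preprocessor_config.json",
    "stable-diffusion-v1-5/model_index.json",
    "stable-diffusion-v1-5/unet/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/v1-inference.yaml",
]


def missing_weights(root="./pretrained_weights"):
    """Return the expected files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

Calling `missing_weights()` from the repository root after the download steps should return an empty list if nothing was skipped.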

(1) Create the "pretrained_weights" folder and move into it.

!mkdir pretrained_weights
%cd pretrained_weights

(2) Prepare the "Moore-AnimateAnyone" weights.

!wget https://huggingface.co/patrolli/AnimateAnyone/resolve/main/denoising_unet.pth
!wget https://huggingface.co/patrolli/AnimateAnyone/resolve/main/motion_module.pth
!wget https://huggingface.co/patrolli/AnimateAnyone/resolve/main/pose_guider.pth
!wget https://huggingface.co/patrolli/AnimateAnyone/resolve/main/reference_unet.pth

(3) Prepare the "stable-diffusion-v1-5" weights.

!mkdir stable-diffusion-v1-5
!mkdir stable-diffusion-v1-5/feature_extractor
!mkdir stable-diffusion-v1-5/unet
!wget -P stable-diffusion-v1-5 https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/model_index.json
!wget -P stable-diffusion-v1-5 https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-inference.yaml
!wget -P stable-diffusion-v1-5/feature_extractor https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/feature_extractor/preprocessor_config.json
!wget -P stable-diffusion-v1-5/unet https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/unet/config.json
!wget -P stable-diffusion-v1-5/unet https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/unet/diffusion_pytorch_model.bin

(4) Prepare the "sd-vae-ft-mse" weights.

!git clone https://huggingface.co/stabilityai/sd-vae-ft-mse

(5) Prepare the "image_encoder" weights.

!mkdir image_encoder
!wget -P image_encoder https://huggingface.co/lambdalabs/sd-image-variations-diffusers/resolve/main/image_encoder/config.json
!wget -P image_encoder https://huggingface.co/lambdalabs/sd-image-variations-diffusers/resolve/main/image_encoder/pytorch_model.bin

(6) Prepare the "DWPose" weights.
Create the "DWPose" folder.

!mkdir DWPose

Then download "dw-ll_ucoco_384.onnx" and "yolox_l.onnx" from the "DWPose" repository and place them in that folder.
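
The article does not state where the two files are hosted. The following is a minimal sketch that assumes they are available from the Hugging Face repository `yzd-v/DWPose` (an assumption that should be verified before use) and that the cell runs from inside "pretrained_weights". The helper names `dwpose_urls` and `download_dwpose` are hypothetical.

```python
import urllib.request
from pathlib import Path

# ASSUMPTION: the repository id is not given in the article; verify it first.
DWPOSE_REPO = "yzd-v/DWPose"
DWPOSE_FILES = ["dw-ll_ucoco_384.onnx", "yolox_l.onnx"]


def dwpose_urls(repo=DWPOSE_REPO):
    """Build URLs in the same resolve/main form used by the wget calls above."""
    return [
        f"https://huggingface.co/{repo}/resolve/main/{name}"
        for name in DWPOSE_FILES
    ]


def download_dwpose(dest_dir="DWPose"):
    """Download both ONNX files into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in dwpose_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
```

Calling `download_dwpose()` would then populate the "DWPose" folder, provided the assumed repository id is correct.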

(7) Return to the root folder.

%cd ..

4. Configuration

Configure the run as follows.

(1) Prepare the character image and the pose video.
Prepare the image "girl.png" (512x784 px) and place it in "configs/inference/ref_images".

・girl.png

For the pose video, use "./configs/inference/pose_videos/anyone-video-2_kps.mp4".

・anyone-video-2_kps.mp4

(2) Edit "configs/prompts/animation.yaml" as follows.

・animation.yaml

pretrained_base_model_path: "./pretrained_weights/stable-diffusion-v1-5/"
pretrained_vae_path: "./pretrained_weights/sd-vae-ft-mse"
image_encoder_path: "./pretrained_weights/image_encoder"
denoising_unet_path: "./pretrained_weights/denoising_unet.pth"
reference_unet_path: "./pretrained_weights/reference_unet.pth"
pose_guider_path: "./pretrained_weights/pose_guider.pth"
motion_module_path: "./pretrained_weights/motion_module.pth"

inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'

test_cases:
  "./configs/inference/ref_images/girl.png":
    - "./configs/inference/pose_videos/anyone-video-2_kps.mp4"
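
The same file can also be produced from a notebook cell, which is convenient when the reference image or pose video changes. The following is a minimal sketch that only formats text; the helper name `build_animation_config` is hypothetical, and the keys and paths are copied from the file above.

```python
# Fixed part of the config, copied from the animation.yaml shown above.
BASE_CONFIG = """\
pretrained_base_model_path: "./pretrained_weights/stable-diffusion-v1-5/"
pretrained_vae_path: "./pretrained_weights/sd-vae-ft-mse"
image_encoder_path: "./pretrained_weights/image_encoder"
denoising_unet_path: "./pretrained_weights/denoising_unet.pth"
reference_unet_path: "./pretrained_weights/reference_unet.pth"
pose_guider_path: "./pretrained_weights/pose_guider.pth"
motion_module_path: "./pretrained_weights/motion_module.pth"

inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'
"""


def build_animation_config(ref_image, pose_videos):
    """Return config text with one reference image and its pose videos."""
    lines = [BASE_CONFIG, "test_cases:", f'  "{ref_image}":']
    lines += [f'    - "{video}"' for video in pose_videos]
    return "\n".join(lines) + "\n"
```

Writing the returned string to "configs/prompts/animation.yaml" gives a file with the same keys as the one above, with only the test case entries changed.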

5. Video generation

Generate the video as follows.

(1) Generate the video.
It took about 2 minutes 30 seconds on an A100.

# Generate the video
!python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 784 -L 64

Memory consumption was as follows.

Trying Moore-AnimateAnyone on Google Colab, part 2.
It looks like the head-to-body ratio and face size of the pose should also be adjusted. https://t.co/m93Qbf5zzo pic.twitter.com/SkuMyS8auO
— 布留川英一 / Hidekazu Furukawa (@npaka123) January 13, 2024

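6. Pose video generation

The bundled "vid2pose.py" can generate a pose video from a video of a person.

(1) Prepare a video of a person and place it in "./configs/inference/pose_videos/".
This time I used a video from Mixamo.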

・dance.mp4

Original Mixamo video pic.twitter.com/0zexQ15MMr
— 布留川英一 / Hidekazu Furukawa (@npaka123) January 13, 2024

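(2) Copy "tools/vid2pose.py" to the repository root.
The script did not work when run from directly under "tools", so I copied it up one level (there is probably a smarter solution).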

!cp tools/vid2pose.py vid2pose.py
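
One possible alternative to copying the script is offered here as an untested assumption: if the failure comes from "vid2pose.py" importing modules relative to the repository root, putting the root on `PYTHONPATH` may let the script run from "tools" in place. The following is a minimal sketch; the helper names `env_with_repo_root` and `run_vid2pose` are hypothetical.

```python
import os
import subprocess
import sys


def env_with_repo_root(repo_root="."):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ)
    root = os.path.abspath(repo_root)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = root if not existing else root + os.pathsep + existing
    return env


def run_vid2pose(video_path, repo_root="."):
    """Run tools/vid2pose.py in place; raises CalledProcessError on failure."""
    cmd = [sys.executable, "tools/vid2pose.py", "--video_path", video_path]
    subprocess.run(cmd, cwd=repo_root, env=env_with_repo_root(repo_root), check=True)
```

If this does not resolve the error, copying the script as shown above remains the approach that worked in this article.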

(3) Generate the pose video.
After about 4 minutes, "./configs/inference/pose_videos/dance_kps.mp4" is generated.

!python vid2pose.py --video_path ./configs/inference/pose_videos/dance.mp4
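
Judging from the file names used in this article ("dance.mp4" is later referenced as "dance_kps.mp4"), the output appears to be written next to the input with a "_kps" suffix. The following small sketch expresses that naming rule; it is inferred from the article rather than taken from the script, and `pose_video_path` is a hypothetical helper.

```python
from pathlib import Path


def pose_video_path(video_path):
    """Return the assumed output path: same folder, stem + '_kps', same suffix."""
    p = Path(video_path)
    return p.with_name(p.stem + "_kps" + p.suffix).as_posix()
```

The result can be passed straight into the `test_cases` entry of "animation.yaml".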

Pose video after conversion pic.twitter.com/saADqK5KyN
— 布留川英一 / Hidekazu Furukawa (@npaka123) January 13, 2024

(4) Edit "configs/prompts/animation.yaml" as follows.

・animation.yaml

pretrained_base_model_path: "./pretrained_weights/stable-diffusion-v1-5/"
pretrained_vae_path: "./pretrained_weights/sd-vae-ft-mse"
image_encoder_path: "./pretrained_weights/image_encoder"
denoising_unet_path: "./pretrained_weights/denoising_unet.pth"
reference_unet_path: "./pretrained_weights/reference_unet.pth"
pose_guider_path: "./pretrained_weights/pose_guider.pth"
motion_module_path: "./pretrained_weights/motion_module.pth"

inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'

test_cases:
  "./configs/inference/ref_images/girl.png":
    - "./configs/inference/pose_videos/dance_kps.mp4"

(5) Generate the video.
With -L 64 the clip is only about 1 second long, so I increased it to 256 for about 4 seconds.

!python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 784 -L 256
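
The relation between -L and clip length is simple arithmetic. The following is a minimal sketch that assumes a constant frame rate and takes the figure observed above (64 frames for roughly 1 second) as the default; other pose videos may play at a different rate. The helper names are hypothetical.

```python
import math


def frames_for_duration(seconds, frames_per_second=64):
    """Number of frames to pass as -L for a clip of the given length."""
    return math.ceil(seconds * frames_per_second)


def duration_for_frames(num_frames, frames_per_second=64):
    """Approximate clip length in seconds for a given -L value."""
    return num_frames / frames_per_second
```

Under these assumptions, `frames_for_duration(4)` gives the 256 used in the command above.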

Generating a pose from Mixamo with AnimateAnyone's vid2pose, then generating the video.
L also increased to 4 seconds (256). https://t.co/m93Qbf5zzo pic.twitter.com/ZVfOQGhwlc
— 布留川英一 / Hidekazu Furukawa (@npaka123) January 13, 2024
