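"FLUX.1": A Model That Surpasses Midjourney and SD3 Ultra Has Arrived! The New Revolution from the Developers Behind Stable Diffusion

August 4, 2024 17:10

Black Forest Labs (BFL), founded by former Stable Diffusion developers, has announced its long-awaited new model, "FLUX.1". With 12 billion parameters, FLUX.1 not only delivers remarkable image generation quality but also excels at things earlier AI models have struggled with, such as drawing human hands and reproducing complex scenes. It also comes in three versions, ranging from commercial to open source, to meet a wide range of user needs.

In this article, we take a closer look at FLUX.1's innovative features and characteristics and at the possibilities they open up. We also generate images with ComfyUI and FLUX.1 to see its performance firsthand.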

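1. About FLUX.1

FLUX.1 is the latest text-to-image AI model developed by Black Forest Labs. It sets a new standard for generating images from text, showing strong results in image detail, prompt adherence, style diversity, and scene complexity.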

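Key features of FLUX.1

1. State-of-the-art performance: According to Black Forest Labs' announcement, it outperforms leading existing models such as Midjourney v6.0, DALL·E 3, and Stable Diffusion 3.

https://blackforestlabs.ai/announcing-black-forest-labs/

2. Multiple variants: Offered in three versions: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell].

3. Innovative technology: Uses a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled up to 12B parameters.

4. Flexibility: Supports a variety of aspect ratios and resolutions between 0.1 and 2.0 megapixels.

5. Accessibility: Available through an API, with some versions released as open weights.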
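Feature 4 gives the supported output size in megapixels rather than in pixel dimensions. As a rough sketch, the helper below checks whether a width × height falls in that range and picks a size for a given aspect ratio. It assumes 1 megapixel = 1,000,000 pixels and rounds each side to a multiple of 16; both are our own assumptions rather than specifications from Black Forest Labs, and the function names are hypothetical.

```python
import math

# Assumption: 1 megapixel = 1,000,000 pixels (the announcement only gives
# the range "0.1 to 2.0 megapixels").
MIN_MP = 0.1
MAX_MP = 2.0
# Assumption: sides are snapped to a multiple of 16, a common constraint for
# latent diffusion models; not confirmed for FLUX.1 by the source.
MULTIPLE = 16


def megapixels(width: int, height: int) -> float:
    """Image area in megapixels."""
    return width * height / 1_000_000


def in_supported_range(width: int, height: int) -> bool:
    """True if width x height falls inside the advertised 0.1-2.0 MP range."""
    return MIN_MP <= megapixels(width, height) <= MAX_MP


def size_for_aspect(aspect_w: int, aspect_h: int, target_mp: float = 1.0,
                    multiple: int = MULTIPLE) -> tuple[int, int]:
    """Pick a (width, height) near target_mp megapixels for an aspect ratio."""
    if not MIN_MP <= target_mp <= MAX_MP:
        raise ValueError(f"target_mp must be within {MIN_MP}-{MAX_MP}")
    ratio = aspect_w / aspect_h
    # Solve width * height = target area and width / height = ratio.
    height = math.sqrt(target_mp * 1_000_000 / ratio)
    width = height * ratio
    # Snap both sides to the nearest multiple (never below one multiple).
    w = max(multiple, round(width / multiple) * multiple)
    h = max(multiple, round(height / multiple) * multiple)
    return w, h


if __name__ == "__main__":
    for aspect in [(1, 1), (16, 9), (9, 16)]:
        w, h = size_for_aspect(*aspect)
        print(aspect, (w, h), round(megapixels(w, h), 3), in_supported_range(w, h))
```

Because of the snapping step, a target right at 0.1 or 2.0 MP can land slightly outside the range, so it is worth re-checking the result with in_supported_range.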

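The three variants of FLUX.1

To meet different needs, Black Forest Labs offers FLUX.1 in three variants.

1. FLUX.1 [pro]: The top-tier model for professionals

FLUX.1 [pro] is the highest-performing version of FLUX.1.

State-of-the-art image generation performance
Top-class prompt adherence
Outstanding visual quality and image detail
Diverse outputs

Black Forest Labs is currently scaling up the inference compute for FLUX.1 [pro] step by step. It is available through the BFL API as well as through partners such as Replicate and Fal.ai, and customized solutions are also offered for enterprises.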

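2. FLUX.1 [dev]: A new ally for researchers and creators

FLUX.1 [dev] is an open-weight model for non-commercial applications.

An efficient model distilled directly from FLUX.1 [pro]
More efficient than a standard model of the same size
Retains high quality and prompt adherence

The FLUX.1 [dev] weights are published on Hugging Face, and you can also try it directly on Replicate and Fal.ai. If you are considering commercial use, you can contact Black Forest Labs directly.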

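3. FLUX.1 [schnell]: A powerful ally for individual developers

FLUX.1 [schnell] is the fastest model, designed for local development and personal use.

Released under the Apache 2.0 license
Weights available on Hugging Face
Inference code published on GitHub
Also available on Replicate and Fal.ai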
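The access and licensing differences among the three variants can be condensed into a small data structure, which makes questions such as "which variant can I run locally and use commercially?" easy to answer. This is only our summary of the points made in this article; the class and field names are hypothetical, not an official BFL schema, and the actual license terms should be checked on the official model pages before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxVariant:
    name: str
    open_weights: bool         # weights can be downloaded (e.g. from Hugging Face)
    access: str                # how the variant is offered, per this article
    free_commercial_use: bool  # commercial use without a paid API or separate agreement


# Summary of the three variants as described in this article.
VARIANTS = {
    "pro": FluxVariant("FLUX.1 [pro]", False,
                       "API (BFL, Replicate, Fal.ai); enterprise solutions", False),
    "dev": FluxVariant("FLUX.1 [dev]", True,
                       "open weights, non-commercial; contact BFL for commercial use", False),
    "schnell": FluxVariant("FLUX.1 [schnell]", True,
                           "Apache 2.0 weights and GitHub inference code", True),
}


def runnable_locally() -> list[str]:
    """Variants whose weights can be downloaded and run on your own machine."""
    return [v.name for v in VARIANTS.values() if v.open_weights]


def local_and_commercial() -> list[str]:
    """Variants that are downloadable and usable commercially without extra cost."""
    return [v.name for v in VARIANTS.values()
            if v.open_weights and v.free_commercial_use]
```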

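Why is FLUX revolutionary?

Overwhelming speed: The ability to generate artwork for 1,000 people in five minutes fundamentally changes the creative process.
High customizability: The three variants cover a wide range of needs, from professionals to individual developers.
Open development: By releasing some versions as open source, BFL contributes to the growth of the AI community as a whole.
Enterprise-grade solutions: FLUX.1 [pro] makes advanced customization possible for enterprises.
Continuous evolution: Backed by Black Forest Labs' strong research team, FLUX keeps incorporating cutting-edge technology.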
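Taking the speed claim at face value, and assuming that "artwork for 1,000 people" means 1,000 images, the implied throughput is a simple back-of-the-envelope calculation (our arithmetic, not a benchmark we measured):

```latex
\frac{5 \times 60\ \text{s}}{1000\ \text{images}} = 0.3\ \text{s/image}
\qquad\Longleftrightarrow\qquad
\frac{1000\ \text{images}}{300\ \text{s}} \approx 3.3\ \text{images/s}
```

The claim does not say which variant, step count, resolution, or hardware it refers to, and all of these affect real generation time.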

Future outlook

Building on FLUX.1, Black Forest Labs plans to develop a high-performance text-to-video AI system. This is expected to make precise, high-resolution video creation and editing possible at unprecedented speed.

2. Using FLUX in ComfyUI

comfyanonymous, the creator of ComfyUI, was quick to publish a workflow. The details are summarized at the link below, and we will follow it to get everything set up.

Downloading the weights

Download the FLUX.1 [dev] weights from the link below. Download flux1-dev.sft and place it in ComfyUI/models/unet.

Downloading the text encoders (CLIP and T5)

Download the text encoder models from the link below. Download clip_l.safetensors and t5xxl_fp16.safetensors (or the memory-saving t5xxl_fp8_e4m3fn.safetensors) and place them in ComfyUI/models/clip.

Downloading the VAE

Download the VAE from the link below. Download ae.sft and place it in ComfyUI/models/vae.
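Before loading the workflow, it can help to confirm that the files from the three download steps are where ComfyUI expects them. The sketch below checks the layout described in this article; the ComfyUI root directory you pass in is an assumption about your installation, and missing_flux_files is a hypothetical helper name.

```python
from pathlib import Path
from typing import Union

# Layout described in this article: (subdirectory, accepted file names).
# For the T5 encoder, either the fp16 file or the memory-saving fp8 file works.
REQUIRED_FILES = [
    ("models/unet", ["flux1-dev.sft"]),
    ("models/clip", ["clip_l.safetensors"]),
    ("models/clip", ["t5xxl_fp16.safetensors", "t5xxl_fp8_e4m3fn.safetensors"]),
    ("models/vae", ["ae.sft"]),
]


def missing_flux_files(comfy_root: Union[str, Path]) -> list[str]:
    """Describe each required file that is not present under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for subdir, candidates in REQUIRED_FILES:
        # A requirement is satisfied if any one of its candidates exists.
        if not any((root / subdir / name).is_file() for name in candidates):
            missing.append(f"{subdir}/" + " or ".join(candidates))
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    problems = missing_flux_files(root)
    for item in problems:
        print("missing:", item)
    print("all files found" if not problems else f"{len(problems)} file(s) missing")
```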

Loading the workflow

Download the image below and drag and drop it onto the ComfyUI canvas. Although it is an image, it contains the workflow information, so it can be loaded onto the canvas.
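An image can carry a workflow because PNG files can store text chunks alongside the pixel data. To our understanding, ComfyUI writes the graph as JSON into tEXt chunks under the keywords "workflow" (the editor graph) and "prompt" (the API-format graph). The standard-library sketch below reads those chunks; it only handles uncompressed tEXt chunks (not zTXt or iTXt), and the function names are hypothetical.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(chunk_type + body)
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Collect keyword -> text for every uncompressed tEXt chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + body) != crc:
            raise ValueError(f"CRC mismatch in {chunk_type!r} chunk")
        if chunk_type == b"tEXt":
            # tEXt layout: Latin-1 keyword, a NUL byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return texts


def extract_workflow(data: bytes, key: str = "workflow"):
    """Parse the JSON stored under key, or return None if it is absent."""
    texts = read_png_text_chunks(data)
    return json.loads(texts[key]) if key in texts else None
```

For a downloaded file, something like extract_workflow(Path("workflow.png").read_bytes()) (the file name is a placeholder) would return the embedded graph if the image follows this layout.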

Once it is loaded, the canvas shows a flow like the one below.

The dev version needs 50 steps to produce high-quality images, so change the steps value of the BasicScheduler node to 50.
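If you drive ComfyUI from an exported JSON file rather than the canvas, the same change can be scripted. The sketch assumes the API-format export, in which each node ID maps to an object with class_type and inputs, and that BasicScheduler exposes a steps input; the node IDs and the example fragment are made up for illustration.

```python
import copy


def set_scheduler_steps(graph: dict, steps: int = 50) -> dict:
    """Return a copy of an API-format graph with every BasicScheduler's steps updated."""
    updated = copy.deepcopy(graph)  # leave the caller's graph untouched
    schedulers = [node for node in updated.values()
                  if node.get("class_type") == "BasicScheduler"]
    if not schedulers:
        raise ValueError("no BasicScheduler node found in the graph")
    for node in schedulers:
        node["inputs"]["steps"] = steps
    return updated


# Hypothetical fragment of an API-format export (node IDs are made up).
example = {
    "17": {"class_type": "BasicScheduler",
           "inputs": {"scheduler": "simple", "steps": 20, "denoise": 1.0,
                      "model": ["12", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["11", 0]}},
}
```

Returning a copy instead of editing in place lets you keep the original template and derive several variants (for example, different step counts) from it.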

3. Generating images with FLUX.1 [dev]

I tried generating images with FLUX.1 [dev]. The prompts I used and the resulting images are shown below.

A woman diving from a helicopter

A cinematic image capturing a Japanese woman with long black hair, performing a dramatic dive from a helicopter into the vast open sky. The background features a breathtaking view of the sky filled with soaring birds, accentuating a sense of freedom and exhilaration. The woman's expression is focused and fearless, her hair flowing dramatically behind her as she dives. The helicopter is visible in the upper part of the frame, adding a touch of adventure and scale to the scene. The lighting is dynamic, highlighting the action and the expansive atmosphere.

A bearded, muscular man sprinting

A cinematic image depicting a rugged Japanese man with a beard, sprinting through the bustling streets of Shibuya, Tokyo. He is portrayed as muscular and intense, with his strong physique evident even through his clothing. The scene captures him mid-dash, with the iconic Shibuya crossing in the background blurred by the motion. Neon lights and the vibrant city life add to the dynamic and energetic atmosphere of the image. The lighting is urban and dramatic, emphasizing the man's determined expression and the fast-paced action of the scene.

A dragon and a hero

A cinematic fantasy image inspired by RPG themes, featuring a heroic scene with a dragon, a warrior, a wizard, a martial artist, and a cleric. Set in a mystical landscape, the dragon looms large in the background, spewing fire into the sky. The warrior, clad in armor, stands boldly in the foreground with a sword raised. Beside him, a wizard prepares a spell, glowing with magical energy. The martial artist, in dynamic pose, is ready to strike, and the cleric, with a staff in hand, invokes a protective spell. The scene is bathed in the ethereal light of magic and fire, creating a dramatic and epic atmosphere.

A couple fleeing from zombies

A cinematic image depicting a male and female couple frantically running from a massive horde of zombies. The scene is set in a chaotic urban environment with the army in the background, engaged in a fierce battle to contain the zombie outbreak. The couple appears desperate and terrified, dodging between abandoned cars and debris. Soldiers can be seen in the periphery, firing at the advancing zombies, providing a grim backdrop. The atmosphere is tense and suspenseful, with dark, ominous lighting amplifying the sense of impending danger.

Foot generation test

A cinematic image of a Japanese woman casually displaying the soles of her feet, seated on a park bench. The scene captures her in a relaxed pose, perhaps during a leisurely afternoon in a tranquil urban park. The focus is on her bare feet, crossed elegantly as she enjoys a book or the peaceful surroundings. The background is softly blurred, emphasizing her and the detail of her feet. The lighting is warm and natural, highlighting the simplicity and quiet mood of the moment.

A military beach landing

A cinematic image depicting a military landing at a beachfront during a defensive operation. The scene captures the intensity of the moment with troops disembarking from landing craft under the cover of smoke and gunfire. The ocean is rough, reflecting the turmoil of battle, with waves crashing against the shore. Soldiers in full gear advance onto the beach, facing resistance from defensive positions in the distance. The sky is overcast, adding a dramatic and somber tone to the scene, emphasizing the gravity of the military engagement.

Several women playing in a pool

A cinematic image featuring multiple Japanese women in swimsuits, enjoying a playful moment in a pool, surrounded by splashing water that creates a fantastical atmosphere. The scene captures them laughing and splashing water at each other, with the sun casting a shimmering glow on the droplets, creating a sparkling effect. The background shows a beautifully designed pool area that enhances the dreamlike quality of the image. The overall mood is joyful and ethereal, with soft, diffused lighting that adds a magical touch to the setting.

Illustration: A wizard casting an explosion spell

A cinematic image inspired by anime, depicting a dramatic scene of magical alchemy leading to an explosion. The setting is a dark, mystic chamber filled with ancient symbols and glowing artifacts. In the center, a character performs a complex magical ritual, hands raised as they channel energy into a vibrant, swirling mass of light that culminates in a sudden, intense explosion. The explosion sends colorful magical energies radiating outward, casting vivid shadows and illuminating the room with a spectrum of light. The atmosphere is tense and charged with the power of unleashed magic.

Illustration: A programmer reincarnated in another world

A cinematic image blending realistic and anime styles, featuring a programmer who has been reincarnated into a fantastical other world. The scene shows the programmer sitting at a magical, glowing workstation filled with ancient scrolls and futuristic screens, coding to manipulate the laws of this new world. Around him, elements of a traditional fantasy setting—enchanted forests, distant castles, and mythical creatures—merge with digital effects to symbolize his unique role in this realm. The lighting is dynamic, highlighting the contrast between the old world's mystique and the new digital influence he brings.
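Running a list of prompts like these one at a time on the canvas is tedious. A minimal batching sketch, assuming an API-format export (each node ID maps to an object with class_type and inputs) whose only text-encode node is a single CLIPTextEncode: it copies the template once per prompt, swaps in the text, and builds the JSON body for ComfyUI's /prompt endpoint. Actually sending the request (for example to a default local server at http://127.0.0.1:8188/prompt) is left out, and the function names are hypothetical.

```python
import copy
import json


def graphs_for_prompts(template: dict, prompts: list[str]) -> list[dict]:
    """Make one API-format graph per prompt by replacing the CLIPTextEncode text."""
    text_nodes = [node_id for node_id, node in template.items()
                  if node.get("class_type") == "CLIPTextEncode"]
    if len(text_nodes) != 1:
        raise ValueError(f"expected one CLIPTextEncode node, found {len(text_nodes)}")
    node_id = text_nodes[0]
    graphs = []
    for text in prompts:
        graph = copy.deepcopy(template)  # keep the template unchanged
        graph[node_id]["inputs"]["text"] = text
        graphs.append(graph)
    return graphs


def request_body(graph: dict) -> bytes:
    """JSON body for ComfyUI's /prompt endpoint: the graph under the "prompt" key."""
    return json.dumps({"prompt": graph}).encode("utf-8")
```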

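4. Impressions: It could replace Midjourney

Like Kolors, which I tried previously, FLUX.1 has strong language understanding and generates high-quality images. Using it felt much like using Midjourney, and I found it very convenient. (For Kolors, see the article below.)

Models of this caliber are gradually appearing as open source, and just as Llama 3.1 has risen among language models, the era in which open source becomes mainstream in image-generation AI may be close.

At present, the open-weight dev version can be used only for non-commercial purposes, and the higher-performing pro version is available only through the API. Aside from the Apache 2.0-licensed [schnell] version, then, using FLUX.1 commercially for high-quality images costs money just as other commercial models do, which makes it hard to use casually. Let's keep an eye on how things develop.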

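5. For those who want to learn more

AICU media has published an expanded version of this article!
It explains the background of Black Forest Labs, FLUX.1's performance, how to run FLUX.1, and more in greater detail, so please take a look at the article below as well.

If you would like to learn more about applying the AI technologies introduced in this article, or are considering bringing AI into your own business, we offer AI consulting services that support companies in adopting AI. We can help with needs such as:

Improving operational efficiency with AI
Developing business strategies based on data analysis
End-to-end support from AI adoption through operation
Customized AI solutions proposed by experts

The initial consultation is free, so feel free to reach out via the link below.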
