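Stable Diffusion 3.5 Is Here! How to Run It in ComfyUI and a Review of the Generated Images
On October 22, 2024, Stable Diffusion 3.5, the latest version of Stable Diffusion, was announced. This release reportedly adds more powerful models and greater customizability, making it more convenient to use. This article gives a clear overview of the key points and new features of Stable Diffusion 3.5.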
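1. Features of Stable Diffusion 3.5
The new version is particularly strong in the following areas.
High customizability
Stable Diffusion 3.5 is designed so that users can easily customize it to their needs. This makes it easier to pursue a distinctive style or to build a workflow tailored to a specific project.
High performance
The Stable Diffusion 3.5 Medium and Large Turbo models are optimized to run on standard consumer hardware, so they run smoothly without special equipment. Large Turbo also offers very fast inference, which helps make projects more efficient.
Diverse output
The model can generate images of diverse people with different skin tones and features, enabling a wide range of representation. It produces images with natural variation without additional prompting.
Wide range of styles
Another attraction is its flexibility: it can generate a broad range of visual styles, including 3D renders, photographs, paintings, and line art. Users can pick the style they prefer and create their own work.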
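2. Released models
The release comprises the following three models.
Stable Diffusion 3.5 Large
A large model with 8 billion parameters, characterized by high-quality image generation and strong prompt adherence. It is particularly suited to professional use cases and can generate high-definition images at 1 megapixel resolution.
Stable Diffusion 3.5 Large Turbo
A faster version of the Large model that can generate an image in just 4 steps, allowing faster processing. The generated images are high quality and follow the prompt faithfully.
Stable Diffusion 3.5 Medium (scheduled for release on October 29)
A model with 2.5 billion parameters, designed to run on consumer hardware as well. This makes high-quality image generation possible without special equipment, and it supports resolutions from 0.25 megapixels to 2 megapixels.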
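To make the megapixel figures above more concrete, here is a minimal sketch that converts a megapixel budget and an aspect ratio into a width and height. It assumes that 1 megapixel means 1024 × 1024 pixels and that dimensions are snapped to multiples of 64; both are assumptions for illustration and are not stated in the announcement. The function name `dimensions_for` is hypothetical.

```python
import math


def dimensions_for(megapixels: float, aspect_ratio: float = 1.0,
                   multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) near the pixel budget for a given aspect ratio.

    Assumptions (not from the article): 1 megapixel = 1024 * 1024 pixels,
    and both sides are rounded to the nearest multiple of `multiple`.
    """
    if megapixels <= 0 or aspect_ratio <= 0:
        raise ValueError("megapixels and aspect_ratio must be positive")

    target_pixels = megapixels * 1024 * 1024
    # width = aspect_ratio * height and width * height = target_pixels
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value: float) -> int:
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    for mp in (0.25, 1.0, 2.0):
        print(mp, "MP ->", dimensions_for(mp))
    print("1.0 MP at 16:9 ->", dimensions_for(1.0, 16 / 9))
```

Because of the snapping, the resulting pixel count is only approximately equal to the requested budget.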
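3. Model performance comparison
Stable Diffusion 3.5 Large is among the top models on the market for prompt adherence and delivers very high image quality. Its performance holds up even against models with more parameters. Stable Diffusion 3.5 Large Turbo has extremely fast inference and remains competitive in image quality and prompt adherence even when compared with non-distilled models.
Stable Diffusion 3.5 Medium, meanwhile, outperforms other medium-sized models. It is an efficient model with a good balance of prompt adherence and image quality, which makes it a strong choice for a wide range of users. It is especially recommended for creators who want both quality and low cost.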
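4. License
The models are provided under the Stability AI Community License, the same license as Stable Diffusion 3 Medium. The license is summarized below.
Free for non-commercial use: Individuals and organizations can use the models free of charge for non-commercial purposes, including scientific research.
Free for commercial use (annual revenue under $1 million): Startups, small and medium-sized businesses, and creators can use the models for commercial purposes free of charge as long as their annual revenue is under $1 million.
Ownership of outputs: You retain ownership of the media you generate, without restrictive licensing implications.
Organizations with annual revenue above $1 million need to apply for an Enterprise License.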
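The rule summarized above can be sketched as a small decision function. This is only an illustration of the summary in this article, not of the license text itself: it assumes that the revenue threshold applies only to commercial use, and it conservatively treats revenue of exactly $1 million as requiring an Enterprise License. The function name and return values are hypothetical.

```python
REVENUE_THRESHOLD_USD = 1_000_000  # threshold quoted in the article


def required_license(commercial_use: bool, annual_revenue_usd: float) -> str:
    """Return "community" or "enterprise" based on the article's summary.

    Assumptions: non-commercial use is always covered by the Community
    License, and revenue at or above the threshold needs Enterprise.
    """
    if annual_revenue_usd < 0:
        raise ValueError("annual_revenue_usd must not be negative")
    if not commercial_use:
        return "community"
    if annual_revenue_usd < REVENUE_THRESHOLD_USD:
        return "community"
    return "enterprise"


if __name__ == "__main__":
    print(required_license(commercial_use=True, annual_revenue_usd=250_000))
    print(required_license(commercial_use=True, annual_revenue_usd=3_000_000))
```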
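5. Environment setup
This section sets up the environment for running the models in ComfyUI.
Downloading the model
Download the SD3.5 base model and place it in the "ComfyUI/models/checkpoints" folder.
Downloading CLIP
Download the CLIP models and place them in the "ComfyUI/models/clip" folder.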
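As a quick sanity check of the placement described above, the following sketch lists files that are missing from the two folders. The folder names come from this article; the file names in `EXPECTED_FILES` are assumptions for illustration and should be replaced with the names of the files you actually downloaded. The function name `find_missing` is hypothetical.

```python
from pathlib import Path

# Folder names are from the article. The file names are assumed examples,
# not confirmed by the article; adjust them to your downloads.
EXPECTED_FILES = {
    "models/checkpoints": ["sd3.5_large.safetensors"],
    "models/clip": [
        "clip_g.safetensors",
        "clip_l.safetensors",
        "t5xxl_fp16.safetensors",
    ],
}


def find_missing(comfyui_root, expected=None) -> list[Path]:
    """Return the expected files that do not exist under comfyui_root."""
    expected = EXPECTED_FILES if expected is None else expected
    root = Path(comfyui_root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    for path in find_missing("ComfyUI"):
        print("missing:", path)
```

Running the script from the directory that contains the ComfyUI folder prints one line per missing file and nothing if everything is in place.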
Downloading the workflows
Download the workflows for SD3.5 Large (SD3.5L) and Large Turbo (SD3.5LT) separately.
These workflows are the SD3.5 workflows provided officially by ComfyUI. Because no Turbo version was available, the widget values were adjusted with reference to the Turbo workflow published by Stability AI.
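The kind of adjustment described above can be sketched in code. This is not the author's actual procedure: it assumes a workflow exported in ComfyUI's API format (a dictionary of nodes with `class_type` and named `inputs`), whereas the workflows in this article are edited through widgets in the UI. The 4 steps come from this article; the CFG value of 1.0 and the checkpoint file names are assumptions, and `to_turbo` is a hypothetical helper.

```python
import copy


def to_turbo(workflow: dict, checkpoint_name: str,
             steps: int = 4, cfg: float = 1.0) -> dict:
    """Return a copy of an API-format workflow adjusted for Large Turbo.

    steps=4 follows the article; cfg=1.0 is an assumed value.
    """
    turbo = copy.deepcopy(workflow)  # leave the original workflow untouched
    for node in turbo.values():
        inputs = node.get("inputs", {})
        if node.get("class_type") == "KSampler":
            inputs["steps"] = steps
            inputs["cfg"] = cfg
        elif node.get("class_type") == "CheckpointLoaderSimple":
            inputs["ckpt_name"] = checkpoint_name
    return turbo


if __name__ == "__main__":
    # Tiny stand-in for a real workflow; all values here are illustrative.
    large = {
        "3": {"class_type": "KSampler",
              "inputs": {"seed": 1, "steps": 28, "cfg": 4.5}},
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd3.5_large.safetensors"}},
    }
    print(to_turbo(large, "sd3.5_large_turbo.safetensors"))
```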
6. Verifying the generated results
The results below were generated with both SD3.5L and SD3.5LT using the same seed and the same prompts.
Japanese anime-style illustration of a young female swordswoman with long silver hair and blue armor stands confidently in front of a towering fantasy castle. She holds a gleaming katana, her expression resolute and determined. The sunset casts a warm, golden light, casting long shadows and highlighting the ancient stone bridge and castle towers. The scene is reminiscent of a shot from a Hayao Miyazaki film, with soft, diffused lighting and a slight film grain, enhancing the magical and adventurous atmosphere. The vibrant colors of the sunset and the sharp lines of the castle create a visually striking composition.
A giant fire-breathing dragon with black scales reflecting the night sky stands atop a desolate volcanic mountain, its mouth releasing scorching flames. In the background, jagged mountains with flowing lava and a red-black sky. Cinematography inspired by Christopher Nolan, with a wide-angle lens capturing the vast landscape, deep shadows, and dramatic highlights. Soft film grain adds texture, and a desaturated color grading enhances the desolate atmosphere. Key light from a low-angle position casts long shadows, while a rim light from behind the dragon creates a silhouette, emphasizing its imposing presence.
In a dimly lit, ancient library with wooden shelves and stone floors, an elderly wizard with long graying hair and a deeply wrinkled face sits reading a magical book. The book emits a pale blue light, casting an ethereal glow around him. Candles flicker on the shelves, creating a dark and mysterious ambiance. The scene is framed with a soft film grain, deep shadows, and a cold color grading, evoking the visual style of Tim Burton.
In a softly lit room with pastel watercolor walls, a young woman with fair skin, long wavy auburn hair, and emerald green eyes stands beside a gentle stream of light. She wears a flowing, pastel pink dress with delicate lace trim. Her hand gently traces the smooth, flowing lines of the word "Dream" written in elegant cursive. The letters glow with a soft, ethereal light, casting a warm, golden hue around her. The scene is bathed in a gentle, diffused light from a hidden window, creating a dreamy, almost mystical atmosphere. The film grain adds a subtle texture, and the color grading enhances the pastel tones, giving the image a serene, almost otherworldly feel, reminiscent of a Wes Anderson film.
A serene Japanese courtyard with a traditional wooden veranda, where a young Japanese woman with fair skin and delicate features, dressed in a light pink kimono, is holding a scroll with the kanji "愛" (love) rendered in a silk-screen texture. Her eyes are warm and inviting, and she is gently handing the scroll to a young man with short, dark hair and a calm expression, dressed in a dark blue kimono. The background features a washi paper pattern with subtle gradients, creating a sense of depth and tradition. The scene is bathed in soft, warm lighting, with a key light placed from the upper left, casting gentle shadows and highlighting the texture of the characters' kimonos and the scroll. The color grading is slightly desaturated, emphasizing the traditional elegance and tranquility of the moment, with a hint of film grain adding to the nostalgic feel.
In a futuristic cyberpunk city at night, neon lights illuminate towering glass skyscrapers. Hovercars zip through the sky, and holographic ads flicker on street signs. Rain falls in the lower levels, creating wet streets that reflect vibrant neon colors. A diverse couple stands under a flickering neon sign, their faces illuminated by the blue and pink lights. The man, with dark skin and a sleek black trench coat, leans in to kiss the woman, who has pale skin and electric blue hair, as she smiles softly. The scene is framed with a shallow depth of field, emphasizing the couple against the bustling city. The lighting is soft yet dramatic, with a mix of cool and warm tones, creating a moody, atmospheric feel. Inspired by the visual style of Ridley Scott, the film grain adds a gritty texture, and the color grading enhances the neon hues, giving the scene a vivid, futuristic vibe.
A young steampunk adventurer with fair skin, bronze goggles, and a determined expression stands on the deck of a steam-powered airship. His Victorian-style coat flutters in the wind, and leather gloves grip the railing. Behind him, a mechanical world of gears and pipes glows under a warm, golden sunset. Cinematography inspired by Christopher Nolan, with deep shadows and high contrast, emphasizing the intricate details of the steam-powered machinery. Soft, ambient light from overhead lanterns casts a warm glow, creating a nostalgic and adventurous atmosphere. Film grain adds texture, and color grading enhances the golden hues, bringing the scene to life.
A 5-year-old girl with short brown hair stands in a sunlit meadow, her radiant smile brightening the scene. Soft, golden sunlight filters through the blue sky with fluffy white clouds, casting gentle shadows on her freckled cheeks. A light breeze ruffles her hair, adding a playful touch. The background meadow is a vibrant tapestry of green grass and wildflowers, with the horizon stretching out into a serene, picturesque landscape. The scene is bathed in a warm, nostalgic glow, reminiscent of the cinematography of Wes Anderson, with a subtle film grain and a slightly desaturated color palette that enhances the dreamy, whimsical atmosphere.
In a mystical forest with towering ancient trees and a thin mist hanging in the air, blue-glowing mushrooms illuminate the ground. Two fairies, one with shimmering golden hair and translucent wings, the other with deep emerald hair and delicate green wings, gently embrace, their eyes closed in a moment of serene connection. Glowing orbs drift gently around them, casting a soft, ethereal light. The scene is bathed in a dreamlike atmosphere, with a cool, blue-tinted color grading and subtle film grain, reminiscent of a Tim Burton film. The lighting is primarily from bioluminescent sources, creating a mystical, otherworldly glow.
A photorealistic tiger with vibrant orange and black stripes slowly walks through a dense, lush jungle. Sunlight filters through the canopy, casting deep, dappled shadows on the forest floor. The tiger's eyes are piercing and alert, its muscles rippling under its fur. The background is a lush tapestry of green leaves and vines, with a soft, natural film grain and warm, golden color grading, evoking the cinematic style of Ang Lee.
In a sleek, sci-fi spaceship control room with holographic displays and a large glass window revealing countless stars, two crew members, a fair-skinned woman with short, dark hair and a Middle Eastern man with a trimmed beard, share a tense conversation. Blue and white lights cast soft, cool shadows, enhancing the futuristic atmosphere. The woman’s expression is concerned, while the man looks determined, his eyes fixed on a holographic display. The scene is framed with subtle film grain and a cool, desaturated color grade, evoking the visual style of Christopher Nolan’s high-stakes sci-fi.
In a chillingly eerie abandoned hospital, a dark, long hallway is illuminated by the cold moonlight seeping through broken windows. An old wheelchair sits eerily abandoned, casting long shadows that dance with the cold wind. Faded bloodstains mar the walls, adding to the tense atmosphere. Shadows seem to move of their own accord, creating a sense of impending dread. The scene is framed with a low-key lighting setup, using soft blue and red gels to enhance the horror, with film grain adding to the gritty, unsettling mood.
In a luxurious palace interior, a middle-aged woman with warm olive skin and deep brown eyes, wearing an elegant emerald dress, sits gracefully in a gilded chair. Her expression is serene, framed by soft, natural light filtering through ornate windows, casting a gentle glow on her face. A large, shining chandelier hangs above, casting intricate shadows and adding a touch of opulence. The scene is bathed in a warm, golden hue, reminiscent of a David Fincher film, with subtle film grain and a rich, muted color palette.
Zeus atop Mount Olympus, his white hair and beard flowing in the wind, holding a crackling thunderbolt. Dark, stormy clouds swirl around him, lightning flashing in the background. His powerful, commanding presence is illuminated by dramatic, directional lighting from above, casting deep shadows and emphasizing his muscular form. The scene is framed with a wide, sweeping lens, capturing the epic scale and intensity of the moment. Film grain adds a gritty, textured quality, while the color grading is cool and moody, enhancing the atmosphere of divine power and foreboding.
A anthropomorphized rabbit dressed in a dapper gentleman's suit, with a proud and calm expression, sipping tea at an elegantly set table in a Victorian-style living room. Luxurious curtains drape the windows, and a warm fireplace casts a soft, golden glow across the scene. The rabbit’s fur is sleek and well-groomed, with a hint of film grain adding to the timeless feel. The lighting is a mix of natural sunlight filtering through the curtains and the warm, amber tones from the fireplace, creating a cozy and refined atmosphere reminiscent of Wes Anderson’s visual style.
A dreamlike scene of floating islands suspended in a pink and purple sky, waterfalls cascading into the air, and a giant glowing moon. Ancient ruins stand on the islands, bathed in soft, diffused light. Two figures, one with fair skin and flowing dark hair, the other with a warm, olive complexion and sharp eyes, stand on the edge of an island, gazing at the magical vista. They hold hands, their expressions filled with wonder and awe. The lighting is soft and ethereal, casting a warm, golden glow that enhances the mystical atmosphere. Film grain adds a subtle texture, and the color grading emphasizes the pink and purple hues, creating a serene and enchanting mood.
In a lyrical autumn landscape, an elderly man with a weathered, kind face and silver hair walks along a path lined with golden trees. Fallen leaves swirl in the wind, catching the soft, golden light that filters through the branches. The distant lake and mountains shimmer in warm tones, creating a peaceful, serene atmosphere. The scene is framed with subtle film grain and warm color grading, evoking the cinematographic style of Terrence Malick, with a low-angle shot emphasizing the majestic trees and the man's contemplative expression.
A small village glows under a starry night sky, with swirling stars and a bright moon casting a soft blue glow. Distant mountains are silhouetted against the sky, and bold brushstrokes and vivid colors create a dynamic and energetic scene, reminiscent of Van Gogh’s style. The village is bathed in a warm, golden light from lanterns, contrasting with the cool blue of the sky. Film grain adds texture, and the color grading enhances the mystical atmosphere.
In a surreal, otherworldly landscape with a sky painted in fiery red and orange, massive rocks float in zero-gravity space. Orbs of light drift lazily, casting ethereal glows. A pale, ethereal woman with long, silver hair and luminescent blue eyes stands on a floating rock, her skin glowing softly against the fiery backdrop. She holds out a glowing orb to a dark-skinned man with intense, golden eyes, his expression a mix of awe and curiosity. The scene is framed with a soft film grain, the color grading enhances the warmth of the sky and the cool glow of the orbs, creating a dreamlike atmosphere. Lighting is primarily from an overhead softbox, casting gentle shadows and emphasizing the textures of the rocks and characters.
A slender, metallic humanoid AI with glowing blue eyes stands atop a skyscraper, overlooking a futuristic cityscape. Holographic advertisements illuminate the scene, casting vibrant, multi-colored light. The robot's sleek body reflects the neon glow, emphasizing its advanced design. The background city is a maze of towering skyscrapers, interconnected by sky bridges and flying vehicles. The lighting is dramatic, with blue and purple hues dominating, creating a moody, atmospheric feel. Film grain adds a gritty texture, while the color grading enhances the cyberpunk aesthetic, making the scene both dystopian and mesmerizing.
A Japanese girl with fair skin, long black hair, and a radiant smile stands on a sunlit beach. She wears a blue and white polka-dot swimsuit, the fabric clinging to her figure. The beach is golden, with soft sand and gentle waves lapping at the shore. The sky is a vibrant blue, with a few fluffy white clouds. The lighting is natural, with the sun casting a warm, golden glow from the upper right, creating soft shadows and highlighting her features. The scene is captured with a shallow depth of field, blurring the background slightly, and a subtle film grain adds texture. The color grading is warm, with a slight desaturation to give a nostalgic, sun-kissed feel.
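A same-seed, same-prompt comparison like the one above could also be scripted rather than run by hand. The sketch below assumes API-format workflows and a ComfyUI server running locally on the default port 8188. The node ids passed as `sampler_id` and `positive_prompt_id` are hypothetical and depend on the workflow, and the helper names are illustrative.

```python
import copy
import json
import urllib.request


def with_seed_and_prompt(workflow, seed, prompt, sampler_id, positive_prompt_id):
    """Return a copy of an API-format workflow with the seed and prompt set."""
    variant = copy.deepcopy(workflow)
    variant[sampler_id]["inputs"]["seed"] = seed
    variant[positive_prompt_id]["inputs"]["text"] = prompt
    return variant


def paired_variants(large, turbo, prompts, seed, sampler_id, positive_prompt_id):
    """Yield (large, turbo) workflow pairs that share the seed and prompt."""
    for prompt in prompts:
        yield (
            with_seed_and_prompt(large, seed, prompt, sampler_id, positive_prompt_id),
            with_seed_and_prompt(turbo, seed, prompt, sampler_id, positive_prompt_id),
        )


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """Submit one workflow to a running ComfyUI server via POST /prompt."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Each pair returned by `paired_variants` can be passed to `queue_prompt` one workflow at a time, so that the only difference between the two runs is the model and its sampler settings.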
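Here are my impressions after generating a variety of images.
Comparing SD3.5L and SD3.5LT, SD3.5L produces more detailed images
Generation of multiple people and of hands is weak
Illustration-style images are somewhat underwhelming
Japanese text cannot be rendered in images (English only)
Prompt understanding is strong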
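7. Summary
Stable Diffusion 3.5 is the latest version, with substantially improved performance and customizability, and it is very useful for a wide range of creative projects. In particular, the improved prompt adherence and image quality, together with the ability to run on consumer hardware, are major advantages.
However, challenges remain in generating hands and multiple people, and some styles, illustrations in particular, still need some improvement. In addition, Japanese prompts are not supported, so instructions must be given in English.
Overall, Stable Diffusion 3.5 can be regarded as a powerful tool that is usable across a broader range of applications, while still leaving room for further improvement.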