[Video Generation AI] Making a Short Movie with Runway Gen-3 Alpha + ElevenLabs + Canva + ChatGPT + Claude 3.5 Sonnet!

July 25, 2024, 09:43

Who this article is for

Anyone who wants to learn how to make videos with generative AI
Beginners in video production

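Introduction

Video generation AI has advanced to the point where clips of a few seconds can be generated with a high degree of visual consistency, so even beginners in video production can now make reasonably good videos.
In this article, I make a short movie using video generation AI together with several other generative AI tools. The finished video is shown below.

I used Claude 3.5 Sonnet for brainstorming and scriptwriting, Runway Gen-3 Alpha for the video clips, ChatGPT for writing the Runway Gen-3 prompts, ElevenLabs for the narration, and Canva for assembling those materials into the final video. I am a beginner in video production myself, so fellow beginners should be able to make a video like the one above by following the same steps.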

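1. Tools Used

Claude 3.5 Sonnet

Claude 3.5 Sonnet is a large language model developed by Anthropic, the company's latest at the time of writing. It has strong natural language processing capabilities and is well suited to creative brainstorming and scriptwriting. It understands the user's intent and can generate engaging content across a wide range of genres and styles.
It can be used for free. However, the number of messages you can send per day is limited, and the free tier's limit is low, so keep that in mind.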

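ChatGPT

ChatGPT is an AI chatbot built on large language models developed by OpenAI. It uses the GPT (Generative Pre-trained Transformer) architecture and is highly capable at natural language processing and generation. Like Claude, it can be used for free. Free users face stricter usage limits than paid users, so be careful not to run through them.
ChatGPT also lets you create customized GPTs. In this article, I use a GPT that generates prompts for Runway Gen-3 Alpha.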

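Runway Gen-3 Alpha

Runway Gen-3 Alpha is a state-of-the-art AI video generation tool. It creates high-quality video from text prompts and can generate footage in a wide variety of styles and moods. You can edit existing videos or create new ones from scratch, which greatly expands the range of creative expression.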

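Using the latest model, Gen-3 Alpha, requires a paid plan. At minimum, you need the Standard plan at 15 dollars per month.
The Standard plan grants 625 credits per month. Generating video with Gen-3 Alpha costs 50 credits for a 5-second clip and 100 credits for a 10-second clip, that is, 10 credits per second. Note that on the Standard plan you can therefore generate only about 60 seconds of video per month.
The Pro plan above it grants 2,250 credits per month, enough for 225 seconds of video per month. The Unlimited plan above that lets you generate without limit.
Gen-3 is regarded as highly capable, but depending on the content you want, you may need to regenerate many times to get a clip that stays consistent. Credits can run out before you notice, so take care not to overuse them if you are on the Standard or Pro plan.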
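To make the credit arithmetic above concrete, here is a minimal Python sketch. It only uses the figures quoted in this article (625 and 2,250 credits per month, 50 credits per 5-second clip, 100 credits per 10-second clip). The function names are my own, and actual prices may have changed since this was written.

```python
# Credit costs quoted in the article; actual plans and prices may differ today.
CREDITS_PER_CLIP = {5: 50, 10: 100}  # clip length in seconds -> credits
MONTHLY_CREDITS = {"Standard": 625, "Pro": 2250}


def max_seconds(credits: int, clip_seconds: int = 10) -> int:
    """Total seconds obtainable using only whole clips of one length."""
    clips = credits // CREDITS_PER_CLIP[clip_seconds]
    return clips * clip_seconds


def max_seconds_mixed(credits: int) -> int:
    """Use as many 10-second clips as possible, then 5-second clips."""
    tens = credits // CREDITS_PER_CLIP[10]
    remainder = credits - tens * CREDITS_PER_CLIP[10]
    fives = remainder // CREDITS_PER_CLIP[5]
    return tens * 10 + fives * 5


for plan, credits in MONTHLY_CREDITS.items():
    # Prints the plan name, seconds from 10-second clips only, and the mixed total.
    print(plan, max_seconds(credits, 10), max_seconds_mixed(credits))
```

Keep in mind that every retry of a scene costs the same as the first attempt, so the number of usable seconds is usually lower than these upper bounds.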

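ElevenLabs

ElevenLabs is a platform that provides high-quality AI speech synthesis. It generates natural, expressive speech and supports multiple languages and emotional expression. It also supports creating custom voices and cloning existing ones, which makes it well suited to producing narration and voice-overs.

With ElevenLabs, you can initially generate up to 10,000 characters of speech for free. Try the free tier first, and consider a paid plan if you need more.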
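If you want a rough idea of how much of the free allowance a script will use, you can count the characters in its narration before generating anything. The sketch below assumes narration lines are written as `Narration: "..."`, as in the script later in this article, and that every character including spaces and punctuation counts toward the limit. That counting rule is my assumption, not something confirmed by ElevenLabs, and the function names are hypothetical.

```python
FREE_CHARACTER_LIMIT = 10_000  # free allowance quoted in the article


def narration_text(script: str) -> list[str]:
    """Collect the spoken text from lines of the form: Narration: "..." """
    prefix = "Narration:"
    spoken = []
    for line in script.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Drop the prefix and the surrounding double quotes.
            spoken.append(line[len(prefix):].strip().strip('"'))
    return spoken


def characters_used(script: str) -> int:
    """Assumed counting rule: every character of the spoken text counts."""
    return sum(len(text) for text in narration_text(script))


sample = '''Space. Countless stars form the double helix structure of DNA.
Narration: "Our story is etched in the stars."
Sound effects: Cosmic music, heartbeat'''

used = characters_used(sample)
print(f"used {used} characters, {FREE_CHARACTER_LIMIT - used} remaining")
```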

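Canva

Canva is an intuitive, easy-to-use online design tool. It makes it simple to create professional-quality images, videos, presentations, and more. It offers a large library of templates and design elements and works by drag and drop, so even people with little design experience can produce attractive work. Its video editing features are also solid, letting you combine various materials into a polished video.

Canva's free tier covers a lot. You can create videos for free as well. The free tier's restrictions are that you cannot use Pro-only assets and that you can download video only at 1080p (1920x1080) resolution. If you want to download at higher quality or the free assets are not enough, consider a paid plan.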

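2. Brainstorming and Scriptwriting with Claude

Brainstorming

First, brainstorm ideas with Claude. Claude is capable enough to give good results even from a rough prompt. I entered the following prompt.

I want to make an interesting short movie using video generation AI. Please give me some ideas for the video.

Here is the output.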

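1. **A Dialogue Across Time**
   - Depicts great figures from history debating modern problems. For example, create a scene in which Leonardo da Vinci, Einstein, and Confucius discuss environmental issues and the ethics of artificial intelligence. AI is used to recreate each figure's appearance and voice, and deepfake technology produces a natural conversation.

2. **Visualizing Emotions**
   - Expresses human emotions as abstract imagery. For example, emotions such as joy, sadness, anger, and fear are expressed through color, shape, and motion, showing how they influence one another and change. AI is used to map emotions to visual expression and generate fluid animation.

3. **A Day in a Future City**
   - Presents the city of 100 years from now, as imagined by AI, in a 24-hour time-lapse. It depicts the city changing from morning to night while incorporating futuristic elements such as flying cars, holographic advertisements, and eco-friendly buildings. It is produced by combining AI-based future prediction with 3DCG generation.

4. **Visualizing Music**
   - Expresses classical masterpieces visually. For example, Beethoven's Ninth Symphony is rendered as imagery in which abstract shapes and colors change with the dynamics and chord changes. AI is used to analyze the music and generate the corresponding visual elements[1].

5. **Mysteries of Nature**
   - Uses AI's imagination to visualize natural mysteries invisible to the naked eye, such as the world of microorganisms or phenomena at the edge of the universe. For example, the process of cell division or the region near a black hole's event horizon is expressed artistically while staying grounded in scientific knowledge.

6. **The Evolution of Language**
   - Expresses the evolution of human language visually. From ancient pictographs to modern writing and even future communication methods, the changing forms of language are shown in dynamic animation. AI is used to generate the letterforms and connect their changes smoothly.

7. **A Journey Through Dreams**
   - Creates footage that feels like traveling through a person's dreams. It depicts a strange world where reality and unreality blend, scenes change abruptly, and objects transform. AI's image generation capability is used to create imagery that is surreal yet beautiful[2].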

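It suggested several interesting ideas. I will pick one of them and have a script written for it.

Scriptwriting

"The Evolution of Language" looked interesting, so I had Claude write a script for it. I wanted the video to be in English, so I asked for the script in English. I entered the following prompt.

Please write the script for "The Evolution of Language" in English.

The output is as follows.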

# The Evolution of Language: A History of Human Communication

**Prologue (20 seconds)**
Space. Countless stars form the double helix structure of DNA.
Narration: "Our story is etched in the stars. The birth of language began with the pulse of the universe."
The DNA transforms into a human shape. Tears fall from its eyes, forming the Earth.
Narration: "And that pulse was passed on to the Earth, to humanity, and to words."
Sound effects: Cosmic music, heartbeat

**Act 1: The First Words (40 seconds)**
Primitive Earth. Early humans gathered around a fire in a cave.
Narration: "The first word was probably a cry. Fear, joy, warning—emotions became sounds that shook the air."
One person opens their mouth and makes a sound. Others turn in surprise.
Narration: "In that moment, the world changed. Sounds gained meaning, and meaning became shared."
Handprints appear on the cave wall, followed by a bison painting. The painting comes to life, depicting a hunting scene.
Narration: "And so, humanity acquired the power to tell stories. To record the past and imagine the future—words became wings that transcended time."
Sound effects: Wind, crackling fire, primitive drum rhythms

**Act 2: Dawn of Civilization (50 seconds)**
Egyptian pyramid construction site. Workers calling to each other as they move stones.
Narration: "As civilization developed, words took form. Voices were carved in stone, aiming for eternity."
Hieroglyphs appear on the wall.
Narration: "Pictographs abstracted the visible world, becoming tools to convey thought."
Scene changes to Mesopotamia, with cuneiform being inscribed on clay tablets.
Narration: "Writing continued to evolve, capable of expressing increasingly complex concepts."
Phoenician port town, merchants using alphabets for trade.
Narration: "With the birth of the alphabet, words prepared for a journey."
Letters move across a map, spreading east and west along the Silk Road.
Narration: "Languages spread with cultures, encountering new ideas and evolving further."
Chinese characters emerge, then travel to Japan where kana develops.
Narration: "In the East, the characters themselves became art, and words were clothed in beauty."
Sound effects: Construction sounds, brush writing, port bustle

**Act 3: Wings of Knowledge (40 seconds)**
Medieval monastery. Monks carefully copying text onto parchment.
Narration: "For a long time, knowledge was the privilege of the few. Words became power, sometimes a tool of oppression."
Gutenberg's printing press appears. Books are produced one after another.
Narration: "But technological progress gave words new wings. With the invention of printing, knowledge was liberated."
Library scene. People eagerly reading books.
Narration: "Books became treasure troves of knowledge, opening new worlds in people's minds."
Newspapers and magazines appear, information spreading to the masses.
Narration: "The democratization of information began, and words gained the power to change society."
Sound effects: Quill pen writing, printing press operation, pages turning

**Act 4: Electronic Words (45 seconds)**
20th century telegraph. Morse code signals race around the world.
Narration: "The Industrial Revolution brought revolution to words as well. An era arrived where words clothed in electricity circled the globe in an instant."
Invention of the telephone. Voices traveling far along wires.
Narration: "Distance became powerless before words. Voices began to resound across time and space."
Computers appear. Programming languages show on the screen.
Narration: "And then, words became logic, gaining the power to move machines. The dialogue between humans and machines began."
Explosive spread of the Internet. Information from around the world connecting.
Narration: "With the birth of the Internet, words flew freely, crossing borders and fusing cultures."
Smartphone screen. Posts in various languages flying back and forth on social media.
Narration: "An era where we connect instantly with the world at our fingertips. Individual voices now resound on a global scale."
Sound effects: Telegraph sounds, dial tones, keyboard typing, notification sounds

**Act 5: AI Language Revolution (40 seconds)**
Voice recognition waveforms appear, converting into multilingual text.
Narration: "In the 21st century, artificial intelligence opened new horizons for language. An era began where machines understand and respond to human speech."
Real-time translation between different languages.
Narration: "Language barriers began to crumble. The day approaches when people worldwide can converse freely in their mother tongues."
AI poet generating poetry, collaborating with a human poet.
Narration: "AI acquired creativity and began co-creation with humans. The possibilities of language were led to even greater heights."
Project to database and preserve endangered languages.
Narration: "Simultaneously, we began efforts to protect past languages. The challenge continues to save disappearing languages and maintain diversity."
Sound effects: AI processing sounds, poetry recitation, minority ethnic songs

**Act 6: The Future of Language (45 seconds)**
Brain wave patterns displayed, showing direct thought transmission.
Narration: "The future of language is filled with endless possibilities. A day may come when brain connects directly with brain."
Cross-cultural exchange in virtual space. Communication transcending language barriers.
Narration: "In virtual spaces, new communities are emerging that transcend differences in language and culture."
In outer space, a message from extraterrestrial intelligent life glows.
Narration: "And someday, we may encounter unknown words from beyond the stars."
Creation of a universal language composed of equations and music.
Narration: "Humanity's dream is to create a universal language that reflects the truth of the universe. That challenge has only just begun."
Sound effects: Futuristic electronic sounds, signals from space

**Epilogue (20 seconds)**
Various characters, symbols, and voices swirl, reforming into a human shape.
Narration: "Words are our very history. They reflect the past, shape the present, and point to the future."
That figure splits into countless parts, spreading into the universe as stars.
Narration: "We evolve with words and spread into the universe. The story of language continues eternally."
Sound effects: Grand orchestral climax, fade out

Now we're ready. Next, we generate the video with Gen-3.
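Before spending credits, it can help to total the runtimes written in the script's section headers. The sketch below parses headers of the form `**Title (N seconds)**` and estimates the credits needed if every second were covered by generated footage, at the 10 credits per second implied by the Gen-3 Alpha pricing quoted earlier. This is an upper-bound estimate under that assumption, not a description of how the final video was actually assembled. The function name is hypothetical.

```python
import re

# Matches headers such as: **Act 1: The First Words (40 seconds)**
HEADER = re.compile(r"^\*\*(?P<title>.+?) \((?P<seconds>\d+) seconds\)\*\*$")
CREDITS_PER_SECOND = 10  # from 50 credits per 5-second clip


def section_durations(script: str) -> dict[str, int]:
    """Map each section title to the duration stated in its header."""
    durations = {}
    for line in script.splitlines():
        match = HEADER.match(line.strip())
        if match:
            durations[match.group("title")] = int(match.group("seconds"))
    return durations


# Section headers copied from the script above.
headers = """**Prologue (20 seconds)**
**Act 1: The First Words (40 seconds)**
**Act 2: Dawn of Civilization (50 seconds)**
**Act 3: Wings of Knowledge (40 seconds)**
**Act 4: Electronic Words (45 seconds)**
**Act 5: AI Language Revolution (40 seconds)**
**Act 6: The Future of Language (45 seconds)**
**Epilogue (20 seconds)**"""

durations = section_durations(headers)
total_seconds = sum(durations.values())
print(total_seconds, "seconds,", total_seconds * CREDITS_PER_SECOND, "credits")
```

If the estimate exceeds your monthly credits, you can shorten the script or reuse clips across several narration lines.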

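3. Creating the Video with Gen-3

Next, convert the scene descriptions generated in the previous section into prompts for Gen-3. The custom GPT below turns each scene description you enter into a Gen-3 prompt, so use it for this step.

Enter the following prompt into this GPT.

Please give me a prompt for generating the following scene.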
- Space. Countless stars form the double helix structure of DNA.

It then generates the following prompt for Gen-3.

Slow zoom: Vast, deep space. Countless stars twinkling, slowly forming a DNA double helix structure. The light of the stars pulsates as the helix rotates gracefully. Colors shift from deep blue to purple in a gradient.
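The script contains many scenes, so you will repeat this request once per scene. As a convenience, the following Python sketch pulls the scene descriptions out of the script and formats one request for each. It assumes that any line that is not a header, a `Narration:` line, or a `Sound effects:` line is a scene description. The request wording is an English rendering of the prompt shown above, and the function names are hypothetical. You still paste each request into the GPT by hand.

```python
SKIP_PREFIXES = ("Narration:", "Sound effects:")
# Request wording from the article (English rendering).
REQUEST = "Please give me a prompt for generating the following scene."


def scene_descriptions(script: str) -> list[str]:
    """Return lines that describe visuals, skipping headers, narration, and sound."""
    scenes = []
    for line in script.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "**")):
            continue  # blank line or section header
        if line.startswith(SKIP_PREFIXES):
            continue  # narration or sound effects
        scenes.append(line)
    return scenes


def build_requests(script: str) -> list[str]:
    """Format one request per scene description."""
    return [f"{REQUEST}\n- {scene}" for scene in scene_descriptions(script)]


prologue = '''**Prologue (20 seconds)**
Space. Countless stars form the double helix structure of DNA.
Narration: "Our story is etched in the stars. The birth of language began with the pulse of the universe."
The DNA transforms into a human shape. Tears fall from its eyes, forming the Earth.
Narration: "And that pulse was passed on to the Earth, to humanity, and to words."
Sound effects: Cosmic music, heartbeat'''

for request in build_requests(prologue):
    print(request)
    print()
```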

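Generate the video with this prompt. Go to Runway Gen-3 Alpha and open the Gen-3 video generation screen. A screen like the one below appears. Enter the prompt above in the prompt field at the top left, set the video length to "10 seconds" in the duration setting at the bottom, and finally click the "Generate" button.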