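【AI News】Reading English Commentary in Japanese【June 27, 2023｜@TheAIGRID】

June 27, 2023, 22:22

Last week saw significant developments in artificial intelligence (AI). These included a data leak in which more than 100,000 ChatGPT user accounts were compromised, and warnings from companies such as Google and Salesforce about privacy concerns around chatbots and AI tools. There were also exciting advances: the introduction of Zeroscope XL, a new text-to-video model; Salesforce bringing generative AI into its sales processes; and updates to Midjourney's text-to-image model and Stability AI's image generation model. Perplexity.ai showed how an AI research tool can deliver fast, comprehensive research results. DeepMind's RoboCat showcased a self-improving robot that adapts quickly. GPT-4 with multimodal capabilities appeared, surpassing existing AI image analysis capabilities. Meta AI announced Voicebox, a high-quality multilingual text-to-speech AI tool, and its similarities to Descript's audio editing tool were highlighted.
Published: June 27, 2023
Note: We recommend playing the video before reading.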

With another amazing week in artificial intelligence, this video will highlight around 15 different things that occurred last week that were very noteworthy.

今週も人工知能の素晴らしい1週間となったが、このビデオでは、先週起こったことのうち、非常に注目すべき15個の事柄を紹介する。

So let's get straight into this.

それではさっそく見ていこう。

Coming in at number one was something that was quite concerning, but not in terms of artificial intelligence development, but rather a data breach.

第1位には、人工知能の開発に関してではなく、データの漏洩という点で非常に懸念されることがありました。

So here on the Independent, it states that over a hundred thousand ChatGPT user accounts were compromised over the last year.

インディペンデント紙によると、昨年1年間で10万人以上のChatGPTのユーザーアカウントが漏洩したという。

It also stated that logs containing user information like IP addresses are being actively traded on the dark web.

また、IPアドレスのようなユーザー情報を含むログが、ダークウェブで活発に取引されているとも述べている。

And if you aren't familiar with the dark web, well, essentially it's a version of the internet that people use for many illegal activities, such as trading compromised accounts and other illegal activities.

ダークウェブに馴染みがない人もいるかもしれないが、基本的には、漏洩したアカウントの取引やその他の違法行為など、多くの違法行為に利用されているインターネットのバージョンだ。

A recent report published by Singapore-based security firm Group-IB identified 101,000 compromised accounts, the credentials of many of which have been traded over the last year on illicit dark web marketplaces.

シンガポールを拠点とするセキュリティ企業Group-IBが発表した最近のレポートでは、101,000の漏洩アカウントが確認されており、その多くの認証情報が昨年1年間に不正なダークウェブのマーケットプレイスで取引されていた。

At its peak in early May, nearly 27,000 credentials of compromised ChatGPT accounts were traded on the dark web.

5月初旬のピーク時には、危殆化したChatGPTアカウントの約27,000の認証情報がダークウェブ上で取引されていた。

And they added that the Asia-Pacific region experienced the highest concentration of ChatGPT credentials offered for sale.

そして、アジア太平洋地域が最も集中してChatGPTの認証情報が売りに出されたと付け加えた。

Now, it's important to understand that whilst ChatGPT is good for use in your daily life for completing many different personal tasks, you must be aware that sometimes data breaches do occur.

さて、ChatGPTは日常生活で様々な個人的なタスクを完了するために使用するのに適している一方で、時にはデータ漏洩が発生することを理解することが重要です。

And although companies like OpenAI and larger companies like Google and Microsoft largely do strive to keep user credentials safe, sometimes these data breaches can occur.

OpenAIのような企業や、GoogleやMicrosoftのような大企業は、ユーザーのクレデンシャルを安全に保つ努力をしていますが、時にはこのようなデータ漏洩が起こる可能性があります。

And when this thing does happen, it's important to note that your personal data may be out there on the internet.

そして、このようなことが起こった場合、あなたの個人データがインターネット上に出回っている可能性があることに注意することが重要です。

So this is a simple friendly reminder to just be very careful about the information that you are submitting to ChatGPT.

ですので、これは単なる友好的なリマインダーですが、ChatGPTに提出する情報には非常に注意してください。

And continuing on from the ChatGPT data leak, very similarly, Google actually warned employees about chatbots, including its own Bard, out of privacy concerns.

そして、ChatGPTのデータ流出の続きになりますが、非常に類似したこととして、Googleは実際に、プライバシーの懸念から、自社のBardを含むチャットボットについて従業員に警告しました。

And this is exactly what we were just talking about.

そして、これはまさに私たちが今話していたことです。

So Google's parent company, Alphabet, is warning employees not to enter confidential information into chatbots, including its own chatbot Bard.

Googleの親会社であるAlphabetは、自社のBardを含むチャットボットに機密情報を入力しないよう従業員に警告している。

So that is a growing number of companies that are really concerned about sensitive internal information being leaked through AI.

このように、AIを通じて社内の機密情報が漏れることを懸念する企業は増えています。

So essentially, they warned engineers last Thursday to avoid direct use of computer code that chatbots can produce, because AI can reproduce the data it absorbs during training, risking a potential leak. Additionally, potential leaks from the AI technology could help Bard's competitor ChatGPT in the ongoing race to dominate AI, where billions of dollars in investment and advertising are still up for grabs.

つまり、先週木曜日、チャットボットが生成するコンピューターコードの直接使用を避けるよう、エンジニアたちに警告が出されました。AIはトレーニング中に吸収したデータを再現することがあり、情報漏えいのリスクがあるためです。さらに、AI技術からの情報漏えいは、何十億ドルもの投資と広告がまだ争奪の対象となっているAIの覇権争いにおいて、Bardの競合であるChatGPTを利することになりかねません。

And this is not the only company that has done this recently.

そして、このようなことを最近行っているのはこの企業だけではない。

Apple has restricted employees from using AI tools like OpenAI's ChatGPT over fears that confidential information entered into these systems will be leaked or collected.

アップルは、OpenAIのChatGPTのようなAIツールに入力された機密情報が漏れたり、収集されたりすることを恐れて、従業員の使用を制限している。

And according to a report from The Wall Street Journal, Apple employees have also been warned against using GitHub's AI programming assistant Copilot.

また、Wall Street Journalの報道によると、アップルの従業員はGitHubのAIプログラミング・アシスタントCopilotを使用しないよう警告されているという。

So you have to understand that currently whilst these AI tools might seem very safe and very easy to use and can help us in every single scenario, there is the element of risking our personal data.

つまり、現在のところ、これらのAIツールは非常に安全で使いやすく、あらゆる場面で私たちを助けてくれるように見えるかもしれないが、私たちの個人データを危険にさらす要素があるということを理解しなければならない。

So that is something to be aware of when you use these online tools.

ですから、これらのオンラインツールを使う際には、注意しなければならないことがあるのです。

So do you remember Runway Gen 2?

RunwayのGen-2を覚えていますか？

Essentially, Gen 2 was a text-to-video model which was from the company Runway that has been the dominant force in text-to-video models, something that is particularly very hard to do, especially in the AI landscape.

基本的に、Gen 2はテキストからビデオに変換するモデルで、それはRunwayという会社から来ており、特にAIの領域では非常に難しいことです。

Well, last week something changed in the marketplace.

さて、先週、市場で何か変化がありました。

You see, now this company does have very big competition, and genuinely, this seems like the most realistic text-to-video that we've seen, and that is taking into account Google's, Nvidia's, and other companies' efforts that are still in the early stages of text-to-video.

実際、この会社には今や非常に大きな競争相手がいます。これは、テキストからビデオへの取り組みがまだ初期段階にあるGoogleやNvidiaなどの他社を考慮に入れても、これまで見た中で最もリアルなテキストからビデオへの変換に見えます。

So what you're looking at is something called Zeroscope version 2 XL, a watermark-free ModelScope-based video model capable of generating high-quality video at 1024 by 576.

Zeroscopeバージョン2 XLと呼ばれるもので、1024×576の高画質ビデオを生成できる、透かしのないModelScopeベースのビデオモデルです。

So the model was trained with offset noise using 9923 clips and 29,769 tagged frames at 24 frames.

このモデルは、9923のクリップと24フレームでタグ付けされた29,769フレームを使用して、オフセットノイズでトレーニングされました。

So this looks absolutely incredible, and I do think that this does seem, I wouldn't say particularly realistic in terms of the materials that you're currently seeing on screen, because of course none of these creatures do exist, but in terms of the quality, it looks absolutely incredible.

これは本当に信じられないくらい素晴らしく見えますし、画面で見ている素材としては特にリアルなものではないと思いますが、もちろんこれらの生物は存在しないので、品質の点では本当に素晴らしいです。

And in terms of the smoothness, that is something that also looks great.

そして、滑らかさの点では、それも素晴らしく見えます。

And in terms of the coherence, it definitely also does take the cake.

首尾一貫性という点でも、間違いなく群を抜いている。

I mean, if this model does manage to get fine-tuned in the future, and we actually do get things which are quite realistic, I could see this becoming the leading video model.

つまり、このモデルが将来的に微調整され、かなりリアルなものができるようになれば、これが代表的なビデオモデルになるのは目に見えている。

But there are also some other examples that do show just how great this text-to-video does look.

しかし、このテキストからビデオへの変換がいかに素晴らしいかを示す例もいくつかある。

And remember, this will slowly be refined over the coming years as many technologies are.

そして、多くの技術がそうであるように、この技術も今後何年もかけて徐々に洗練されていくでしょう。

So you can see that this is being generated in many different styles, but what we do have here definitely does look promising.

さまざまなスタイルで生成されていることがわかりますが、ここにあるものは間違いなく有望です。

And my question to you is, what do you think looks better?

あなたに質問ですが、どちらがより良く見えると思いますか？

Do you think this synthesis of these video clips looks a lot better than Runway's Gen 2 text-to-video?

このビデオクリップの合成は、ランウェイのテキストからビデオへの第2世代よりもずっとよく見えると思いますか？

Or do you think this new Zeroscope XL matches or exceeds what we've seen in previous video generations?

それとも、この新しいZeroscope XLは、私たちがこれまでのビデオ生成で見てきたものに匹敵する、あるいはそれを超えていると思いますか？

If I'm being completely honest and totally unbiased, this software does seem like it manages to generate more coherent and more fluid pieces of video data than Runway's Gen 2, although very impressive.

もし私が完全に正直に、かつ全く公平に見ているのであれば、このソフトウェアは、非常に印象的ではあるが、ランウェイの第2世代よりも、より首尾一貫した、より流動的なビデオデータを生成することに成功しているように見える。

This is definitely quite impressive in its own regard.

これは間違いなく、かなり印象的なものだ。

And I would recommend checking out the videos and links below to see further examples and more documentation.

そして、以下のビデオやリンクで、さらに詳しい例やドキュメントをチェックすることをお勧めする。

Then, of course, we had something that is once again quite concerning, but at the same time, quite innovative.

そしてもちろん、またしても非常に気になる、しかし同時に非常に革新的なものもありました。

So there's this company called Salesforce.

セールスフォースという会社があります。

You may have heard of them before, but essentially, it's a marketing company that does a lot of sales and helps a giant number of companies across the United States in terms of their entire sales process.

聞いたことがあるかもしれませんが、基本的にはマーケティング会社で、多くのセールスを行い、全米の非常に多くの企業のセールス・プロセス全体を支援しています。

Now, if you don't know what sales is, it's essentially where someone sometimes calls you up out of the blue to sell you a product that you might need, or essentially when you're trying to buy something.

セールスとは何かご存じない方もいらっしゃるかもしれませんが、セールスとは基本的に、誰かがあなたに突然電話をかけてきて、必要そうな商品を売りつけたり、何かを買おうとしているときに電話をかけてきたりすることです。

And then, essentially, there's a sales process that you walk through before you finish buying your product, and this can happen in many different industries.

そして、基本的には、製品を購入する前に進む必要がある販売プロセスがあり、これはさまざまな産業で起こります。

Now, what this announcement is, is that this very, very large multi-billion dollar company actually announced something very recently in terms of their own generative pre-trained Transformer AI, which they're going to be embedding into their multiple sales processes.

さて、この発表とは、非常に大きな数十億ドル規模の企業が最近、彼ら自身の生成事前学習トランスフォーマーAIについて実際に発表したものであり、それを彼らの複数の販売プロセスに組み込むというものです。

Now, what they're doing is truly interesting because essentially what they're doing is they're personalizing every campaign and shopping experience with generative artificial intelligence.

彼らがやっていることは実に興味深いもので、本質的に彼らがやっていることは、あらゆるキャンペーンやショッピング体験を生成的人工知能でパーソナライズするということだ。

So, what that means is, you know how currently when you browse Google or maybe you're on Snapchat or TikTok and you see a certain advertisement that may be broad in its generalizations, now sometimes you click them because sometimes they do relate to you, but what if that advertisement had your name on it or what if that advertisement was really specified to you?

つまり、現在、Googleを閲覧したり、SnapchatやTikTokで広告を見たりすると、一般的な広告は幅広い一般化をしている場合があります。時には関連する広告をクリックすることもありますが、もしもその広告にあなたの名前が表示されるか、その広告があなたに特化しているとしたらどうでしょうか？

This is what generative AI is set to do.

これが生成AIの目指すところだ。

Now, not only is this truly interesting and groundbreaking, some people are saying that this is one of those things that is also going to lead to a lot of significant job loss.

これによって起こるのは興味深く、画期的なことだけでなく、一部の人々はこれが大規模な雇用の減少につながる可能性があると言っています。

Now, let me explain.

では、説明しよう。

You see, they also introduced something called Einstein GPT, so essentially with Einstein GPT, it's actually the world's first generative AI for CRM.

実は、彼らはEinstein GPTと呼ばれるものも導入しました。つまり、Einstein GPTとは、実際にはCRM向けの世界初の生成AIです。

So, essentially, a CRM stands for customer relationship management, and it's a set of integrated data-driven software solutions that help manage, track, and store information related to your company's current and potential customers.

つまり、基本的にCRMとはカスタマー・リレーションシップ・マネジメントの略で、企業の現在の顧客や潜在的な顧客に関連する情報を管理、追跡、保存するのに役立つ、統合されたデータ駆動型のソフトウェア・ソリューションのことです。

Now, what makes this so crazy is that, like you're seeing on screen right now, Einstein GPT is personalizing these sales processes, and you have to understand that many people were already concerned about their jobs being taken by AI, but this Einstein GPT is going to be able to generate leads for you, add a sign-up form for you, and do many different tasks, and people are starting to wonder: if this generative AI tool is able to do this for us, then what use is our labor?

さて、これがすごいところです。画面で見ているように、Einstein GPTはこれらの販売プロセスを個別に対応しており、すでに多くの人々がAIによって仕事を奪われることを心配していましたが、このEinstein GPTはリードを生成し、登録フォームを追加し、さまざまなタスクを実行できるようになりました。人々はこの生成AIツールがこれを私たちのために行えるのであれば、私たちの労働の意味は何なのか、と考え始めています。

And this is definitely something that is going to be talked about in another video, but I do think that a generative AI-driven CRM is going to have a wide range of impacts.

そして、これは間違いなく別のビデオで話題になることですが、私は生成型のAI駆動型CRMが幅広い影響を持つだろうと思っています。

So, you know the company Midjourney, a company focused on text-to-image generation that has pretty much solved the common problems that many text-to-image generators have? Well, they've announced a recent update including a game-changing feature that changes what we can realistically do with Midjourney.

ですので、Midjourneyという会社は、テキストから画像を生成することに焦点を当てた会社で、多くのテキスト画像生成器が抱える一般的な問題を解決してきた会社です。そして、彼らは最近、私たちが現実的に行えることを変える画期的な機能を発表しました。

Now, before we talk about the game-changing feature, we first need to talk about the actual update.

さて、ゲームを変える機能について話す前に、まず実際のアップデートについて話す必要がある。

So, a couple of days ago, they announced version 5.2, and they actually improved aesthetics and allowed for sharper images.

数日前、バージョン5.2が発表され、実際に美観が改善され、よりシャープな画像が生成できるようになった。

They slightly improved coherence and text understanding.

また、一貫性とテキストの理解度が若干向上した。

They also increased diversity, which essentially means that when you try to generate something, sometimes you get images that are far too similar, and essentially when you also try to get variations, sometimes the variations aren't true variations, they're just far too similar, and they introduce something called High variation mode, which makes all variation jobs much more varied.

また、多様性を高めました。つまり、何かを生成しようとすると、あまりにも似通った画像が得られることがあります。また、バリエーションを得ようとすると、真のバリエーションではなく、あまりにも似通ったバリエーションになることがあります。ハイバリエーションモードというものを導入し、すべてのバリエーションジョブをより多様なものにしました。

And essentially, the new feature which has taken everyone by storm is called zoom out.

そして基本的に、新機能はズームアウトと呼ばれるもので、皆を驚かせました。

So, essentially, a zoom out feature is something that we've seen right across the industry.

ですので、ズームアウト機能は業界全体で見られるものです。

Now, if you're not sure as to what I'm referencing, just take a look at some of these clips because this will let you know exactly how this zoom out feature works.

今、私が参照しているものが何を指しているのかわからない場合は、これらのクリップを見てください。これによってこのズームアウト機能がどのように機能するか正確にわかるはずです。

So, essentially, every single time you upscale an image, it's going to have a zoom out button underneath that you can use to reframe said image.

基本的に、画像をアップスケールするたびに、その下にズームアウトボタンが表示され、それを使って画像をリフレーミングすることができます。

So, you've got two versions of zoom out, zoom out 1.5 and zoom out times two, and essentially what they do, they pull the camera out and fill in all the details on the sides.

ですので、ズームアウトにはズームアウト1.5とズームアウト2という2つのバージョンがあります。そして、基本的にはカメラを引き出し、両側の詳細を埋めるということを行います。
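
The arithmetic behind "pulling the camera out" can be sketched roughly as follows. This is only an illustration under stated assumptions, not Midjourney's actual implementation (which is not public): it assumes the output frame keeps the same pixel size, the original picture is shrunk by the zoom factor and centered, and everything around it is newly generated. The function name `zoom_out_layout` and the `ZoomOutLayout` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoomOutLayout:
    inner_left: int            # x offset of the shrunken original in the frame
    inner_top: int             # y offset of the shrunken original in the frame
    inner_width: int           # width of the shrunken original
    inner_height: int          # height of the shrunken original
    generated_fraction: float  # share of the frame that must be newly generated


def zoom_out_layout(width: int, height: int, factor: float) -> ZoomOutLayout:
    """Assumed model: same frame size, original scaled by 1/factor and centered."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    inner_w = round(width / factor)
    inner_h = round(height / factor)
    left = (width - inner_w) // 2
    top = (height - inner_h) // 2
    generated = 1 - (inner_w * inner_h) / (width * height)
    return ZoomOutLayout(left, top, inner_w, inner_h, generated)


if __name__ == "__main__":
    for factor in (1.5, 2.0):
        print(factor, zoom_out_layout(1024, 1024, factor))
```

Under these assumptions, a 2x zoom out leaves the original in the middle quarter of the frame, so roughly three quarters of the final picture is new content, which is why the surroundings differ so much between variations.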

So, when it comes to demonstrating the capability of an artificial intelligence tool, it's best to show you with some of my personal examples.

人工知能ツールの能力をデモンストレーションする場合は、私の個人的な例をいくつかお見せするのが最適です。

Now, I will show you also some of the community's examples because they are far better and far smoother, but take a look at this example that I quickly generated with the prompt of Apple headquarters in New York, a white sleek futuristic building.

また、コミュニティの例もいくつかお見せしますが、ぜひニューヨークのアップル本社というプロンプトで素早く生成したこの例もご覧ください。白くてすっきりとした未来的な建物です。

So, this is of course a standard image that we do get from the likes of Midjourney, but what is interesting to delve into is of course the new features.

これはもちろん、Midjourneyのような会社から送られてくる標準的な画像ですが、掘り下げてみると面白いのは、もちろん新しい機能です。

So, if we take a look at the zoom out feature, you're going to see that this current image that we have here, we're able to zoom out on this and create multiple different variations.

ですので、ズームアウト機能について見てみましょう。現在の画像に対してズームアウトして、複数の異なるバリエーションを作成できます。

So, you can now see what it looks like when we zoom out from that image.

つまり、この画像をズームアウトすると、どのように見えるかがわかります。

So, if we go back over to here, you can see this is the close-up of the image, and this is what is standard by Midjourney.

ここに戻ると、これが画像のクローズアップで、これがMidjourneyの標準です。

This is simply what you get when you enter your prompt.

これはプロンプトを入力したときに表示されるものです。

And then of course, we have the zoom out feature.

そしてもちろん、ズームアウト機能があります。

And then, this is exactly what we have right here: a zoomed out version of that specific image.

そして、これがまさに私たちがここで持っているものです。特定の画像の縮小表示です。

Now, what's also cool is that Midjourney allows you to generate much more than just one prompt.

Midjourneyでは、1つのプロンプトだけでなく、もっと多くのプロンプトを作成することができます。

So, Midjourney actually gives you the ability to have four different zoomed out looks.

Midjourneyでは、4つの異なるズームアウト・ルックを作成できます。

And it's very interesting when you combine them side by side because you immediately see what the different renditions are for your specific project.

それらを並べて組み合わせると、特定のプロジェクトでどのような表現ができるかがすぐにわかるので、とても面白い。

So, for example, right here we can see that this looks like that.

ですので、例えばここで私たちが見ているものは、これに似ています。

So now, if I decide to switch between these image generations, you can clearly see the differences in these zoomed out pictures.

では、これらの生成画像を切り替えてみると、ズームアウトした画像の違いがはっきりとわかります。

You can see that with the variations that Midjourney does give you every single time you manage to generate a new image, the exterior of the image is going to be a little bit different.

Midjourneyが新しい画像を生成するたびに、画像の外観が少し異なることがわかります。

And it's really good for generating variations on what would otherwise be a pretty standard concept.

そして、そうでなければごく標準的なコンセプトになってしまうものにバリエーションを生み出すのに、とても適しています。

Now, I do think that this zoom out feature is very, very good and very, very effective.

さて、このズームアウト機能はとても優れていて、とても効果的だと思います。

But one thing that would be interesting would be to simply test this against Adobe's generative fill.

しかし、ひとつ興味深いのは、これをAdobeのジェネレーティブ・フィルとテストしてみることだ。

Now, if I'm being completely honest with you, although a generative fill is pretty good, I do think that Midjourney's prompt feature here, including the zoom out feature, is going to be far superior since it is a native feature and not based on simply trained data.

正直に申し上げると、ジェネレーティブ・フィルはかなり優れていますが、ズームアウト機能を含むMidjourneyのプロンプト機能は、ネイティブ機能であり、単に学習されたデータに基づいているわけではないので、はるかに優れていると思います。

We're not entirely sure as to how Midjourney does this, but we do know that Midjourney is by far the most powerful text to image generator at the moment and the most realistic.

Midjourneyがどのようにしてこのような機能を実現しているのか完全にはわかりませんが、Midjourneyが現時点で最も強力で、最も現実的なテキスト画像ジェネレーターであることは確かです。

And of course, the most diverse in terms of the many different models that it can use, all the way from version 4 all the way up to the now newly released version 5.2.

もちろん、バージョン4から新しくリリースされたバージョン5.2に至るまで、使用できるモデルの種類が最も豊富である。

So, what will be interesting is to see if Adobe's generative fill feature is something that Midjourney does implement to its platform.

興味深いのは、Adobeのジェネレーティブ・フィル機能がMidjourneyのプラットフォームに実装されるかどうかだ。

And if you don't know what that is, that is basically the generative fill feature in which Adobe can use any existing image, not just one generated by the Midjourney text-to-image generator, but any image, maybe one of your own.

もしもそれが分からない場合は、それは基本的にAdobeがMidjourneyのテキストから画像生成器で生成されたものだけでなく、あなた自身の画像の一つでも使用できる生成フィル機能です。

And then of course, extend that image by adding any other image into that and then merging those into it.

そしてもちろん、その画像に他の画像を追加して拡張し、それらを合成することができます。

So, let me know what your thoughts are on that because it is definitely interesting to see this feature being added.

この機能が追加されるのは間違いなく興味深いことなので、あなたの考えを聞かせてください。

Then, we had Stability AI launch Stable Diffusion XL 0.9, which they described as a leap forward in AI image generation.

そして、Stability AIが、AI画像生成の飛躍的前進と表現したStable Diffusion XL 0.9を発表しました。

So, on the 22nd of June, they announced that their most advanced development in the Stable Diffusion text-to-image suite of models is finally here.

6月22日、彼らはStable Diffusionのテキストから画像へのモデル群における最も先進的な開発がついに登場したと発表した。

Essentially, this is a huge upgrade compared to their prior model because this contains a lot more quality compared to the previous versions.

基本的に、これは以前のモデルと比べて大幅なアップグレードです。以前のバージョンと比べて品質がずっと高くなっています。

What's also great is that it has now added the hyper-realism that we've seen in Midjourney's version 5 and beyond.

さらに素晴らしいのは、Midjourneyのバージョン5以降で見られたハイパーリアリズムが追加されたことだ。

They actually do showcase some key examples in which we get to see the differences with simple prompts.

実際に、簡単なプロンプトで違いを見ることができるいくつかの重要な例が紹介されている。

To be honest with you, it does seem quite good.

正直なところ、かなりいい感じだ。

For example, as you can see from this prompt here, we have "aesthetic aliens walk among us in Las Vegas, scratchy found film photograph."

例えば、ここにあるプロンプトは「aesthetic aliens walk among us in Las Vegas, scratchy found film photograph」（ラスベガスで私たちの中を歩く美的なエイリアン、傷だらけのファウンドフィルム写真）です。

On the left, we have the Stable Diffusion XL beta, and on the right, we have Stable Diffusion XL 0.9, the newly released model.

左はStable Diffusion XLベータ版、右は新しくリリースされたStable Diffusion XL 0.9です。

To be honest with you guys, this definitely does look like what we've seen in Midjourney's versions 5, 5.1, and 5.2.

正直に言うと、これはMidjourneyのバージョン5、5.1、5.2で見たものに間違いなく似ている。

Let me know if you're going to be using this over Midjourney.

Midjourneyの代わりにこれを使うつもりかどうか教えてくれ。

I do doubt it because many people are quite accustomed to using Midjourney.

多くの人がMidjourneyを使い慣れてるから、どうだろうね。

I do think these new examples are pretty good, and you can also see this additional prompt that they also added with these two wolves.

これらの新しい例はかなり良いと思いますし、また、これらの2匹のオオカミと一緒に追加のプロンプトも追加されたことも見ることができます。

On the left, once again, the Stable Diffusion beta, and then of course, on the right, the newly released version, a hyper-realistic wolf with almost minimal chance of you realizing that it was AI-generated.

左側はStable Diffusionのベータ版、そしてもちろん右側は新しくリリースされたバージョンです。ハイパーリアルなオオカミであり、それがAIによって生成されたものだと気づくことはほとんどありません。

And of course, we have the big deal for Stable Diffusion, which is why they released this new AI model.

そしてもちろん、Stable Diffusionにとっての目玉があり、それがこの新しいAIモデルをリリースした理由です。

Essentially, this AI model, which they released, the big deal was that they could finally generate hands.

基本的に、彼らがリリースしたこのAIモデルは、最終的に手を生成できるようになったことが大きな特徴です。

Hands are a very tricky thing for AI to generate because they are particularly confusing, and we've known that in the past.

手はAIが生成するのが非常に難しいものです。以前から私たちはそれを知っています。

It took a very long time for this model to be perfected, even when we were looking at the likes of so.

私たちがsoのようなものを見ていたときでさえ、このモデルが完成するまでには非常に長い時間がかかった。

Although it does seem strange, this does seem a bit too realistic for me because if I saw this in my feed, I would arguably say that there's no way that that is AI generated, but of course, we do know that it is.

奇妙に見えるが、私には少し現実的すぎるように思える。なぜなら、もし私がフィードでこれを見たら、間違いなくAIが生成したものであるはずがないと言うだろうからだ。

You can see that on the left-hand side, that version of whoever's hands it may be don't look very real at all.

左側の画像を見ていただくと、その手がとてもリアルには見えません。

The contrast that we do see at the time of recording this video is honestly so surprising because it just goes to show that with every single major upgrade that there is in these artificial intelligence tools, it's always interesting to see the large differences that do get made.

このビデオを録画する時点での対比は、正直驚くほどです。なぜなら、これらの人工知能ツールの各主要なアップグレードごとに、大きな違いが生まれることをいつも興味深く見るからです。

Then, of course, we had a very interesting AI tool that I saw being demoed across apps such as TikTok and Twitter, and this was being touted as an AI research tool that could arguably be better than Microsoft's Bing.

そして、もちろん、TikTokやTwitterなどのアプリでデモが行われていた非常に興味深いAIツールもありました。これはAIの研究ツールとしてMicrosoftのBingよりも優れている可能性があるAIツールとされていました。

Now, that is in and of itself a very bold statement, but here we are in Perplexity Pro, or perplexity.ai, and this is something that you can try for yourself.

さて、それ自体は非常に大胆な発言だが、ここにあるPerplexity Pro（perplexity.ai）は、自分で試すことができるものだ。

I've got to be honest with you; this seems like the most comprehensive AI research tool that we currently do have.

正直に言って、これは現在利用可能な最も包括的なAI研究ツールのように思えます。

So, let's do a test because, of course, you want to understand how exactly this tool works and what exactly it can be used for.

では、テストしてみましょう。もちろん、このツールがどのように機能するのか、何に使えるのかを理解したいでしょう。

Let's say, for example, I wanted to research something which I recently did, and I wanted that information immediately.

例えば、最近私が調査したいと思ったことがあるとします。そして、その情報を即座に知りたかったのです。

All I'd have to do is go ahead over here and add this Copilot button, and what you can immediately see is that this is powered by GPT-4.

私がしなければならないのは、ここでこのCopilotボタンを追加するだけです。そして、すぐにわかるように、これはGPT-4によって動作しています。

So, of course, as you know, Bing is also powered by GPT-4, but I do like the way that this information is presented better.

もちろん、ご存知のようにBingもGPT-4で動いていますが、私はこの情報がよりよく表示される方法が好きです。

One question I did want to ask it because, of course, as you know, we are an artificial intelligence Channel.

1つ質問がありました。なぜなら、私たちは人工知能チャンネルですので。

I've asked it, What are the top 10 things that happened in artificial intelligence this week?

私はそれについて尋ねました。「今週の人工知能のトップ10の出来事は何ですか？」

So, then we hit the search button, and you can see that, of course, first, it seeks to understand my question, then it goes ahead and considers eight results.

それから、検索ボタンを押すと、まずは私の質問を理解し、その後8つの結果を考慮します。

And then, eventually, it's going to give me an answer.

そして、最終的には私に答えを与えてくれるでしょう。

Now, of course, what you can additionally do if it does manage to struggle, sometimes you can give it more information.

もしもそれが苦労する場合は、追加の情報を与えることもできます。

But more often than not, what I've seen is that this is actually quite a bit faster and more accurate than the GPT-4 that's in OpenAI's version.

しかし、私が見てきたところでは、OpenAIのバージョンにあるGPT-4よりも、こちらの方がより速く、より正確です。

And it is very interesting to see that Perplexity has managed to do that.

Perplexityがそれを実現したのは非常に興味深いことです。

Now, what you are currently seeing is that I do think over time, what we will see is that we will largely see specified AI tools for specified tasks, or more commonly known as narrow AI.

現在、皆さんが目にしているのは、時間の経過とともに、特定のタスクのための特定のAIツール、より一般的にはナローAIと呼ばれるものが主流になっていくだろうということです。

A lot of people do have the idea that we are moving towards an AI that is going to be able to do everything, and whilst this is possible, I think this showcases that if something like Perplexity AI is able to immediately get you a lot of different research papers and various different sources faster than GPT-4 by OpenAI, then people are most likely to use these specific tailored versions on other applications.

多くの人が、私たちは何でもできるAIに向かっているという考えを持っています。これは可能ではありますが、Perplexity AIのようなものが、OpenAIのGPT-4よりも早く、様々な研究論文や様々なソースを即座に得ることができるのであれば、人々は他のアプリケーションにこれらの特別に調整されたバージョンを使用する可能性が高いということを示していると思います。

And I do think that that is fine.

そして、私はそれでいいと思う。

This isn't really a knock on GPT-4.

これはGPT-4を非難しているわけではない。

It's just saying that I do think that people are going to individually build applications like this one that are going to be better than the base one, and that's something that we are going to see.

ただ言っているのは、私は個々にこのようなアプリケーションを構築する人々が、基本的なものよりも優れたものを作るだろうと思っているということであり、それは私たちが見ることになるものです。

Now, you can see here, and why I like this much better than ChatGPT, is because it actually gives me a lot more references.

ここで見ていただけるように、私がChatGPTよりこちらの方がずっと好きな理由は、より多くのリファレンスを提供してくれるからです。

The problem with GPT-4's browsing with Bing is that it usually references one or two articles, and it does take a lot of time to read that page.

GPT-4のBingでのブラウジングの問題点は、通常1つか2つの記事を参照し、そのページを読むのに多くの時間がかかることです。

And remember, with GPT-4, you only get 25 messages per day, but with this, you get 597.

それに、GPT-4では1日に25メッセージしか使えないのに、これでは597も使える。

So it's definitely very interesting.

それは間違いなく非常に興味深いです。

You can see all the different articles referenced, you can see just how many pieces there are, and usually, it gives you the information straight away.

参照できるさまざまな記事や情報を見ることができますし、通常、情報をすぐに提供してくれます。

Now, another feature that we can look at in Perplexity AI, which I found to be very, very cool, is that you can do specified research.

次に、Perplexity AIの別の機能を見てみましょう。私はこれが非常にクールだと思いましたが、特定の対象に絞った調査を行うことができます。

So, for example, you can search Reddit, and this is something that a lot of people do on Google.

例えば、Redditを検索することができます。これは、多くの人がGoogleで行っていることです。

If you're someone that uses Google a lot and uses Reddit for certain research, although it does seem uncanny, it is something that people do.

もしあなたがGoogleをよく使う人で、ある調査にRedditを使うのであれば、不気味に見えるかもしれませんが、それは人々がやっていることなのです。

This is a very useful tool.

これはとても便利なツールだ。

Also, you can use it to search YouTube, and you might be thinking, Why don't you just use YouTube search to search what you're looking for?

また、YouTubeの検索にも使えます。YouTubeの検索で探しているものを検索すればいいじゃないか、と思うかもしれません。

When you're looking for a specified tool, what it does is it crawls every YouTube video and searches through the transcript of those videos to get you your specified answer.

特定のものを探しているとき、YouTubeのすべての動画をクロールし、それらの動画のトランスクリプトを検索して、求めている答えを返してくれるのです。

So, that is why this is very, very effective.

だから、これはとても効果的なのです。
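
To make the idea of searching through video transcripts concrete, here is a toy sketch. It is not how Perplexity actually works (that is not described in the video); it simply assumes the transcripts have already been collected into a dictionary keyed by a video ID, and ranks videos by how often the query words appear. The function name `search_transcripts` and the sample data are hypothetical.

```python
def search_transcripts(
    transcripts: dict[str, str], query: str, top_k: int = 3
) -> list[tuple[str, int]]:
    """Rank videos by how many times the query words occur in their transcript."""
    terms = query.lower().split()
    scored = []
    for video_id, text in transcripts.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((video_id, score))
    # Highest score first; ties broken alphabetically by video ID.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    sample = {
        "video_a": "midjourney zoom out feature lets you zoom the camera out",
        "video_b": "a self improving robot learns new tasks",
        "video_c": "how to join a zoom call",
    }
    print(search_transcripts(sample, "zoom out"))
```

A real system would need far more than word counts (ranking by meaning, handling punctuation, summarizing the hits), but the sketch shows why transcript search can surface an answer that a title-only search would miss.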

So, I'm going to do this again to show you how quickly this works.

では、これがいかに素早く機能するかをお見せするために、もう一度やってみましょう。

This simply understands your question, searches the news, considers the results, wraps it up.

これは単にあなたの質問を理解し、ニュースを検索し、結果を検討し、それをまとめるだけです。

And then, just like that, we have this data.

そして、このようなデータが出来上がります。

And to be honest with you guys, if you're someone that needs information reliably quickly with resources, this is what you want to use.

正直に言うと、もしあなたがリソースを使って素早く確実に情報を必要とする人なら、これを使いたいだろう。

I know in the first instance of the example it wasn't that promising, but this is what it is usually like, and this is definitely going to be what I use now on a day-to-day basis when I'm doing my research online.

最初の例ではあまり期待できるものではありませんでしたが、これは通常の状況であり、これが今後日常的にオンラインの研究を行う際に使用するものになります。

Because I do think that whilst Bard and ChatGPT are good, this is something that is a specified research tool that allows you to search YouTube transcripts, Reddit, Wikipedia, and pretty much everything that we want to see.

BardやChatGPTもいいけれど、これはYouTubeのトランスクリプトやReddit、ウィキペディアなど、私たちが見たいものほとんどすべてを検索できる、指定されたリサーチツールだと思うから。

Then, of course, we had DeepMind's RoboCat, which is essentially something out of science fiction.

そしてもちろん、SFの世界から飛び出してきたようなDeepMindのRoboCatもありました。

I mean, it's a self-improving robot that is eventually going to be at this stage where it's going to need less than 100 demonstrations in order to perform an action successfully.

つまり、これは自己改良型のロボットで、最終的には、ある行動を成功させるために必要なデモンストレーションが100回以下になる段階に到達する。

And you have to understand just how crazy that is because self-improving robots are literally the bane of what people are thinking when they think about Terminator robots that get scarily smart.

そして、それがどれほど驚くべきことか理解していただく必要があります。自己改善ロボットは、スマートになりすぎるターミネーターロボットの典型的な例です。

And then, of course, put the human race out of existence.

そして、もちろん、人類を存在から追放します。

But deepmind's Robocat is essentially based off a deep mind multimodal framework called gato, which is essentially an AI model that was released last year which can pretty much do 600 random tasks across a huge different name of domains.

しかし、DeepMindのRobocatは、基本的には去年リリースされたAIモデルであるDeepMindのマルチモーダルフレームワークであるgatoを基にしています。gatoはさまざまなドメインで600のランダムなタスクをほぼこなすことができます。

But this Robocat which they released, I'll play a small segment from my video.

しかし、今回発表されたRobocatについては、私のビデオから少し抜粋して再生します。

I did see this in earlier papers from Google before, but this was still nice to see even on an artificial intelligence program which is still in relatively early stages, which means that these robots are going to be very effective in real-world scenarios because, as you know, the real world isn't just a test facility where we have a few objects; there are always going to be things that happen that don't go according to plan.

Googleの以前の論文でも見たことはありましたが、これはまだ比較的初期の段階にある人工知能プログラムでも見ることができてうれしかったです。これは、これらのロボットが実世界のシナリオで非常に効果的に活動することを意味します。なぜなら、実際の世界は常に計画通りに進まないことがあるからです。常に予期しない出来事が起こります。

And it's important for these robots to be able to quickly and robustly adapt to these scenarios, which is what we see demonstrated here.

そして、これらのロボットがこうしたシナリオに迅速かつ確実に適応できることが重要であり、それがここで実演されています。

Now, yeah, from what you've seen there, just to wrap it up, it's pretty much a robot that can self-improve, doesn't need that many demonstrations to get the tasks done, and ushers in a new way for robots to learn very, very quickly.

さて、ここまで見ていただいた内容をまとめると、これは自己改善ができるロボットで、タスクをこなすのに多くのデモンストレーションを必要とせず、ロボットが非常に速く学習するための新しい方法をもたらすものです。

Now, this is something that didn't get the recognition it deserves.

さて、これは相応の評価を得られなかったものだ。

This is GPT-4 with actual multimodal capabilities, the first instance that we've seen online.

これは実際のマルチモーダル機能を備えたGPT-4であり、我々がオンラインで見た最初の例である。

So credit to AI Breakfast for this tweet, because Bing managed to break its own rule by solving a CAPTCHA, and actually, this multimodal capability of analyzing images is currently only available to, apparently, five percent of users.

このツイートはAI Breakfastのおかげです。BingがCAPTCHAを解くことで、自らのルールを破ってしまったのです。実は、画像を分析するこのマルチモーダル機能は、現在5％のユーザーしか利用できないらしい。

But strangely enough, I haven't seen anyone talk about this, which is why it's in this video.

しかし不思議なことに、このことについて話している人を見たことがありません。だからこそ、このビデオで取り上げています。

So you can see right here what we have.

では、ここにあるものを見てみましょう。

This image is a typical CAPTCHA.

この画像は典型的なCAPTCHAです。

It says, "Type the two words."

「2つの単語を入力してください」と書かれています。

We can see, of course, "overlooks" and "inquiry" because, of course, we are human.

もちろん、私たちは人間なので、「overlooks」と「inquiry」と読み取ることができます。

But the way that these words are designed on the screen, they're designed to not be able to be identified by a standard computer system.

しかし、これらの単語は画面上でデザインされており、標準的なコンピューターシステムでは識別できないようにデザインされています。

But of course, here you can see we have GPT-4 or ChatGPT being able to easily identify the words "overlooks" and "inquiry."

しかしもちろん、ここではGPT-4あるいはChatGPTが、「overlooks」と「inquiry」という単語を簡単に識別できていることがわかります。

And it also is able to see that this is actually a CAPTCHA test.

また、これが実はCAPTCHAテストであることも認識しています。

And then it says, I'm afraid I can't help you with that.

そして、「申し訳ありませんが、それについてはお手伝いできません」と答えています。

So I do think that this shows us that very, very soon, maybe next month, maybe the month after, we are likely going to be slowly introduced to the GPT-4 version that was actually announced.

ですから、私は非常に近い将来、おそらく来月か、その次の月に、実際に発表されたGPT-4のバージョンが徐々に導入されることになると思います。

You know, the version they touted to us, the one that could really easily identify what was going on in images.

ご存知のように、画像内で何が起こっているかを非常に簡単に特定できると宣伝されていたバージョンです。

And I think this version will truly be the next level in AI because although text is great, it's only one form of modality.

このバージョンは本当にAIの次のレベルになると思う。テキストは素晴らしいけれど、それはモダリティの1つの形態に過ぎないからね。

And there were tons of examples in the GPT-4 paper where they showed exam questions, literal screenshots, and GPT-4 aced those exams.

GPT-4の論文には、試験問題を文字通りスクリーンショットのまま見せた例がたくさんあり、GPT-4はそれらの試験で高得点を取っていました。

So once this feature does actually get rolled out to everyone, which it's supposed to be, then this is going to be truly incredible.

ですから、この機能が実際に全員に行き渡るようになれば、それは本当にすごいことになるでしょう。

So I do think that the reason it's only out to around five percent of users is so that they can collect feedback, see what people are doing with it, refine it, make sure it's safe.

だから、まだ5パーセント程度のユーザーにしか配布されていないのは、フィードバックを集め、人々がこの機能を使ってどんなことをしているかを見て、改良を加え、安全性を確認するためだと思う。

And then, of course, put it out into the open.

そして、もちろん、それを公開するのです。

So then, of course, we had Meta AI release something truly game-changing.

そしてもちろん、Meta AIは本当に画期的なものをリリースした。

But at the same time, there is something else that is quite like this that I will explain later on in the video.

しかし同時に、これと似たようなものが他にもあるのですが、それはビデオの後半で説明します。

So just keep that in mind because although there are tons of different AI models being released, when you have a true understanding of every single AI model out there, you start to see certain comparisons.

というのも、さまざまなAIモデルが大量にリリースされていますが、世の中にあるすべてのAIモデルを正しく理解すると、ある種の比較が見えてくるからです。

And Meta is very similar to a tool that was always AI-based but just hasn't been receiving the hype it deserves.

Metaは、常にAIベースでありながら、それにふさわしい誇大宣伝を受けてこなかったツールに非常によく似ている。

So, Meta recently announced something called Voice Box, a multilingual high-quality text-to-speech AI.

Metaは最近、Voice Boxと呼ばれる多言語の高品質音声合成AIを発表した。

Voice Box can remove background noise from a clip.

Voice Boxは、クリップからバックグラウンドノイズを除去することができます。

Hi guys, thank you for tuning in today.

こんにちは、ご視聴ありがとうございます。

We are going to show you by re-synthesizing a specific segment.

私たちは特定のセグメントを再合成することで、それをお見せします。

Hi guys, thank you for tuning in today.

こんにちは、ご視聴ありがとうございます。

We are going to show you incorrectly spoken words via text to speech, eliminating the need to re-record.

今回は、テキストから音声に変換することで、再録音の必要性をなくし、間違った話し言葉をお見せします。

Hi everyone, thank you for tuning in today.

皆さん、本日はご視聴ありがとうございます。

We are going to show you.

これからお見せするのは、このような例です。

These are just a few examples of how Voice Box can perform across a variety of tasks.

これらは、Voice Boxが様々なタスクでどのように機能するかのほんの一例です。

Like to hear a sample of what Voice Box can do first hand?

Voice Boxでできることを実際に聞いてみたいですか？

Well, you already have because all of the voiceover featured in this video was generated using Voice Box.

というのも、このビデオに登場するナレーションはすべてVoice Boxを使って作成されているからです。

And apparently, the quality is so good that they're not making the Voice Box model code available to the public yet because they want to avoid misuse.

そのクオリティの高さから、悪用を避けるためにVoice Boxのモデルコードはまだ公開されていないようだ。

So essentially, what this is, if you know what ElevenLabs is, that's something that can clone your voice just from maybe even three to five seconds of you speaking into a mic.

要するに、ElevenLabsをご存知なら分かると思いますが、あれはマイクに向かって3秒から5秒ほど話すだけで声をクローンできるツールです。

But with this, they can do the same.

でも、これを使えば同じことができる。

So for example, I'll just play a few clips from the official Twitter.

例えば、公式ツイッターからいくつかのクリップを再生してみましょう。

And as you can see, you can use different styles, you can use different text, you can use different, I guess you could say, references.

そして、さまざまなスタイルやテキスト、さまざまな参照を使用することができます。

It is truly the ultimate tool for use.

それは本当に究極のツールです。

But I do think that this is very similar to an AI tool released about one to two years ago.

しかし、これは1～2年ほど前にリリースされたAIツールと非常に似ていると思います。

And this was something that I did actually mess around with.

そして、これは私が実際にいじくり回したものだ。

Better than edit all the blather out of your videos because my time is very precious.

私の時間はとても貴重なので、あなたのビデオからすべてのおしゃべりを編集するよりはましです。

Oh, that's fire. It's been said that manatees are the Cadillac of marine mammals.

おお、これは最高だね。マナティーは海洋哺乳類のキャデラックだと言われている。

Now, Descript was a tool that was released quite some time ago, but it was really, really cool because it allowed you to essentially edit your voice without you having to re-record it again.

Descriptはかなり前にリリースされたツールですが、再録音することなく音声を編集できるという点で、とてもとてもクールなものでした。

So let's say, for example, I made a mistake whilst talking.

例えば、話している最中にミスをしたとしよう。

I could simply look at the transcript, edit the text, and it would also edit my voice at the same time.

トランスクリプトを見てテキストを編集するだけで、同時に声も編集してくれるんだ。

So I do want to play a small clip from the Descript trailer because it perfectly encapsulates what this software can do and how similar it is to Meta's Voice Box.

そこで、Descriptの予告編から短いクリップを再生したいと思います。このソフトウェアができること、そしてMetaのVoice Boxにどれだけ似ているかを完璧に表現しているからです。

So it will be interesting to see how this tool develops over the next year and how they change in response to ElevenLabs and Meta's new Voice Box being added to the new tool base in terms of AI text to audio.

というわけで、このツールが今後1年間でどのように発展していくのか、また、AIテキストから音声への変換という点で、ElevenLabsとMetaの新しいVoice Boxが新しいツールベースに追加されたことを受けて、彼らがどのように変化していくのか、興味深いところだ。
