

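Last week saw important developments in the field of artificial intelligence (AI). These include a data leak in which more than 100,000 ChatGPT user accounts were compromised, and warnings from companies such as Google and Apple about privacy concerns around chatbots and AI tools. There were also exciting advances: the introduction of a new text-to-video model called Zeroscope XL, Salesforce bringing generative AI into its sales processes, and updates to Midjourney's text-to-image model and Stability AI's image generation model. Perplexity.ai showed how an AI research tool can deliver fast and comprehensive research results. DeepMind's RoboCat showcased a self-improving robot that adapts quickly. GPT-4 with multimodal capabilities appeared, going beyond existing AI image analysis. Meta AI announced Voice Box, a high-quality multilingual text-to-speech AI tool, and its similarities to the Descript audio editing tool were highlighted.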

With another amazing week in artificial intelligence, this video will highlight around 15 different things that occurred last week that were very noteworthy.


So let's get straight into this.


Coming in at number one is something that is quite concerning, not in terms of artificial intelligence development, but rather because it's a data breach.


So here on The Independent, it states that over a hundred thousand ChatGPT user accounts were compromised over the last year.


It also stated that logs containing user information like IP addresses are being actively traded on the dark web.


And if you aren't familiar with the dark web, well, essentially it's a part of the internet that people use for many illegal activities, such as trading compromised accounts.


A recent report published by Singapore-based security firm Group-IB identified 101,000 compromised accounts, the credentials of many of which have been traded over the last year on illicit dark web marketplaces.

At its peak in early May, nearly 27,000 credentials of compromised ChatGPT accounts were traded on the dark web.


And they added that the Asia-Pacific region experienced the highest concentration of ChatGPT credentials offered for sale.


Now, it's important to understand that whilst ChatGPT is good for use in your daily life for completing many different personal tasks, you must be aware that sometimes data breaches do occur.


And although companies like OpenAI and larger companies like Google and Microsoft largely do strive to keep user credentials safe, sometimes these data breaches can occur.


And when this thing does happen, it's important to note that your personal data may be out there on the internet.


So this is a simple friendly reminder to just be very careful about the information that you are submitting to ChatGPT.


And continuing on from the ChatGPT data leak, very similarly, Google actually warned employees about chatbots, including its own Bard, out of privacy concerns.


And this is exactly what we were just talking about.


So Google's parent company, Alphabet, is warning employees not to enter confidential information into chatbots, including its own chatbot Bard.


So that adds to a growing number of companies that are really concerned about sensitive internal information being leaked through AI.


So essentially, last Thursday they warned engineers to avoid direct use of computer code that chatbots can produce, because AI can reproduce the data it absorbs during training, risking a potential leak. Additionally, potential leaks from the AI technology could help Bard's competitor ChatGPT in the ongoing race to dominate AI, where billions of dollars in investment and advertising are still up for grabs.


And this is not the only company that has done this recently.


Apple has restricted employees from using AI tools like OpenAI's ChatGPT over fears that confidential information entered into these systems will be leaked or collected.


And according to a report from The Wall Street Journal, Apple employees have also been warned against using GitHub's AI programming assistant Copilot.


So you have to understand that currently whilst these AI tools might seem very safe and very easy to use and can help us in every single scenario, there is the element of risking our personal data.


So that is something to be aware of when you use these online tools.


So do you remember Runway Gen 2?


Essentially, Gen 2 is a text-to-video model from the company Runway, which has been the dominant force in text-to-video models, something that is particularly hard to do in the AI landscape.

Well, last week something changed in the marketplace.


You see, now this company does have very big competition, and genuinely, this seems like the most realistic text-to-video that we've seen, and that is taking into account Google's, NVIDIA's, and other companies' models that are still in the early stages of their text-to-video.


So what you're looking at is something called Zeroscope version 2 XL, a watermark-free ModelScope-based video model capable of generating high-quality video at 1024 by 576.

So the model was trained with offset noise using 9,923 clips and 29,769 tagged frames at 24 frames.


So this looks absolutely incredible, and I do think that this does seem, I wouldn't say particularly realistic in terms of the materials that you're currently seeing on screen, because of course none of these creatures do exist, but in terms of the quality, it looks absolutely incredible.


And in terms of the smoothness, that is something that also looks great.


And in terms of the coherence, it definitely also does take the cake.


I mean, if this model does manage to get fine-tuned in the future, and we actually do get things which are quite realistic, I could see this becoming the leading video model.


But there are also some other examples that show just how great this text-to-video does look.


And remember, this will slowly be refined over the coming years as many technologies are.


So you can see that this is being generated in many different styles, but what we do have here definitely does look promising.


And my question to you is, what do you think looks better?


Do you think this synthesis of these video clips looks a lot better than Runway's Gen 2 text-to-video?


Or do you think this new Zeroscope XL matches or exceeds what we've seen in previous video generations?

If I'm being completely honest and totally unbiased, this software does seem like it manages to generate more coherent and more fluid pieces of video than Runway's Gen 2, although Gen 2 is very impressive too.


This is definitely quite impressive in its own regard.


And I would recommend checking out the videos and links below to see further examples and more documentation.


Then, of course, we had something that is once again quite concerning, but at the same time, quite innovative.


So there's this company called Salesforce.


You may have heard of them before, but essentially, it's a marketing company that does a lot of sales and helps a giant number of companies across the United States with their entire sales process.


Now, if you don't know what sales is, it's essentially where someone sometimes calls you up out of the blue to sell you a product that you might need, or where you're trying to buy something.


And then, essentially, there's a sales process that you walk through before you finish buying your product, and this can happen in many different industries.


Now, what this announcement is, is that this very, very large multi-billion dollar company announced something very recently in terms of their own generative pre-trained transformer AI, which they're going to be embedding into their multiple sales processes.


Now, what they're doing is truly interesting because essentially what they're doing is they're personalizing every campaign and shopping experience with generative artificial intelligence.


So, what that means is, you know how currently when you browse Google, or maybe you're on Snapchat or TikTok, you see a certain advertisement that may be broad in its generalizations? Now, sometimes you click them because sometimes they do relate to you, but what if that advertisement had your name on it, or what if that advertisement was really specific to you?


This is what generative AI is set to do.


Now, not only is this truly interesting and groundbreaking, some people are saying that this is one of those things that is also going to lead to a lot of significant job loss.


Now, let me explain.


You see, they also introduced something called Einstein GPT, so essentially with Einstein GPT, it's actually the world's first generative AI for CRM.


So, essentially, CRM stands for customer relationship management, and it's a set of integrated data-driven software solutions that help manage, track, and store information related to your company's current and potential customers.


Now, what makes this so crazy is that, like you're seeing on screen right now, Einstein GPT is personalizing these sales processes, and you have to understand that many people were already concerned about their jobs being taken by AI. But this Einstein GPT is going to be able to generate leads for you, add a sign-up form for you, and do many different tasks, and people are starting to wonder: if this generative AI tool is able to do this for us, then what use is our labor?

And this is definitely something that is going to be talked about in another video, but I do think that a generative AI-driven CRM is going to have a wide range of impacts.


So, you know the company Midjourney, a company focused on text-to-image generation that has pretty much solved the common problems that many text-to-image generators have? Well, they've announced a recent update including a game-changing feature that changes what we can realistically do with Midjourney.


Now, before we talk about the game-changing feature, we first need to talk about the actual update.


So, a couple of days ago, they announced version 5.2, and they improved aesthetics and allowed for sharper images.


They slightly improved coherence and text understanding.


They also increased diversity, which essentially means that when you try to generate something, sometimes you get images that are far too similar, and essentially when you also try to get variations, sometimes the variations aren't true variations, they're just far too similar, and they introduce something called High variation mode, which makes all variation jobs much more varied.


And essentially, the new feature which has taken everyone by storm is called zoom out.


So, essentially, a zoom out feature is something that we've seen right across the industry.


Now, if you're not sure as to what I'm referencing, just take a look at some of these clips because this will let you know exactly how this zoom out feature works.


So, essentially, every single time you upscale an image, it's going to have a zoom out button underneath that you can use to reframe said image.


So, you've got two versions of zoom out, Zoom Out 1.5x and Zoom Out 2x, and essentially what they do is pull the camera out and fill in all the details on the sides.
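To make the zoom-out idea concrete, here is a minimal Python sketch of the framing arithmetic only. It assumes the canvas keeps its size, the original image is shrunk by the zoom factor and centered, and the border left around it is what the model has to fill in. The names `ZoomOutLayout` and `zoom_out_layout` are hypothetical, and this is not how Midjourney implements the feature, which isn't public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoomOutLayout:
    inner_width: int   # width of the shrunken original inside the canvas
    inner_height: int  # height of the shrunken original inside the canvas
    left: int          # x offset of the original within the canvas
    top: int           # y offset of the original within the canvas


def zoom_out_layout(width: int, height: int, factor: float) -> ZoomOutLayout:
    """Place the original image, shrunk by `factor`, at the canvas center.

    Everything outside the returned rectangle is the area a generative
    model would have to fill in (an assumption for illustration).
    """
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    inner_w = round(width / factor)
    inner_h = round(height / factor)
    left = (width - inner_w) // 2
    top = (height - inner_h) // 2
    return ZoomOutLayout(inner_w, inner_h, left, top)


if __name__ == "__main__":
    # The two zoom levels mentioned in the video, on a square 1024 canvas.
    for factor in (1.5, 2.0):
        print(factor, zoom_out_layout(1024, 1024, factor))
```

With a factor of 2, the original ends up occupying the middle quarter of the canvas, so three quarters of the final picture is newly generated content.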


So, when it comes to demonstrating the capability of an artificial intelligence tool, it's best to show you with some of my personal examples.


Now, I will show you also some of the community's examples because they are far better and far smoother, but take a look at this example that I quickly generated with the prompt of Apple headquarters in New York, a white sleek futuristic building.


So, this is of course a standard image that we do get from the likes of Midjourney, but what is interesting to delve into is of course the new features.


So, if we take a look at the zoom out feature, you're going to see that this current image that we have here, we're able to zoom out on this and create multiple different variations.


So, you can now see what it looks like when we zoom out from that image.


So, if we go back over to here, you can see this is the close-up of the image, and this is what is standard by Midjourney.


This is simply what you get when you enter your prompt.


And then of course, we have the zoom out feature.


And then, this is exactly what we have right here: a zoomed out version of that specific image.


Now, what's also cool is that Midjourney allows you to generate much more than just one result per prompt.


So, Midjourney actually gives you the ability to have four different zoomed out looks.


And it's very interesting when you combine them side by side because you immediately see what the different renditions are for your specific project.


So, for example, right here we can see that this looks like that.


So now, if I decide to switch between these image generations, you can clearly see the differences in these zoomed out pictures.


You can see that with the variations that Midjourney does give you every single time you manage to generate a new image, the exterior of the image is going to be a little bit different.


And it's really good for generating variations on what would otherwise be a pretty standard concept.


Now, I do think that this zoom out feature is very, very good and very, very effective.


But one thing that would be interesting would be to simply test this against Adobe's generative fill.


Now, if I'm being completely honest with you, although generative fill is pretty good, I do think that Midjourney's prompt feature here, including the zoom out feature, is going to be far superior since it is a native feature and not based on simply trained data.


We're not entirely sure as to how Midjourney does this, but we do know that Midjourney is by far the most powerful text to image generator at the moment and the most realistic.


And of course, the most diverse in terms of the many different models that it can use, all the way from version 4 all the way up to the now newly released version 5.2.


So, what will be interesting is to see if Adobe's generative fill feature is something that Midjourney does implement to its platform.


And if you don't know what that is, that is basically the generative fill feature in which Adobe can use any existing image, not just one generated by Midjourney's text-to-image generator, but any image, maybe one of your own.

And then, of course, extend that image by adding any other image into it and merging them together.


So, let me know what your thoughts are on that because it is definitely interesting to see this feature being added.


Then, we had Stability AI launch Stable Diffusion XL 0.9, which they described as a leap forward in AI image generation.

So, on the 22nd of June, they announced that their most advanced development in the Stable Diffusion text-to-image suite of models is finally here.


Essentially, this is a huge upgrade compared to their prior model because this contains a lot more quality compared to the previous versions.


What's also great is that it's now added the hyper-realism that we've seen in Midjourney's version 5 and beyond.

They actually do showcase some key examples in which we get to see the differences with some simple prompts.


To be honest with you, it does seem quite good.


For example, as you can see from this prompt here, we have "aesthetic aliens walk among us in Las Vegas, scratchy found film photograph".


On the left, we have the Stable Diffusion XL beta, and on the right, we have Stable Diffusion XL 0.9, the newly released model.

To be honest with you guys, this definitely does look like what we've seen in Midjourney's versions 5, 5.1, and 5.2.

Let me know if you're going to be using this over Midjourney.


I do doubt it because many people are quite accustomed to using Midjourney.


I do think these new examples are pretty good, and you can also see this additional prompt that they also added with these two wolves.


On the left, once again, the Stable Diffusion beta, and then of course, on the right, the newly released version, a hyper-realistic wolf with almost no chance of you realizing that it was AI-generated.


And of course, we have the big deal for stable diffusion, which is why they released this new AI model.


Essentially, this AI model, which they released, the big deal was that they could finally generate hands.


Hands are a very tricky thing for AI to generate because they are particularly confusing, as we've seen in the past.


It took a very long time for this model to be perfected, even when we were looking at the likes of so.


Although it does seem strange, this does seem a bit too realistic for me because if I saw this in my feed, I would arguably say that there's no way that that is AI generated, but of course, we do know that it is.


You can see that on the left-hand side, those hands, whoever's they may be, don't look very real at all.


The contrast that we do see at the time of recording this video is honestly so surprising because it just goes to show that with every single major upgrade that there is in these artificial intelligence tools, it's always interesting to see the large differences that do get made.


Then, of course, we had a very interesting AI tool that I saw being demoed across apps such as TikTok and Twitter, and this was being touted as an AI research tool that could arguably be better than Microsoft's Bing.


Now, that is in and of itself a very bold statement, but here we are in Perplexity Pro, or perplexity.ai, and this is something that you can try for yourself.

I've got to be honest with you; this seems like the most comprehensive AI research tool that we currently do have.


So, let's do a test because, of course, you want to understand how exactly this tool works and what exactly it can be used for.


Let's say, for example, I wanted to research something which I recently did, and I wanted that information immediately.


All I'd have to do is go over here and turn on this Copilot button, and what you can immediately see is that this is powered by GPT-4.


So, of course, as you know, Bing is also powered by GPT-4, but I do like the way that this information is presented better.


There's one question I did want to ask it because, of course, as you know, we are an artificial intelligence channel.


I've asked it, What are the top 10 things that happened in artificial intelligence this week?


So, then we hit the search button, and you can see that, of course, first, it seeks to understand my question, then it goes ahead and considers eight results.


And then, eventually, it's going to give me an answer.


Now, of course, what you can additionally do if it does manage to struggle, sometimes you can give it more information.


But more often than not, what I've seen is that this is actually quite a bit faster and more accurate than the GPT-4 that's in OpenAI's version.


And it is very interesting to see that Perplexity has managed to do that.


Now, I do think that over time, what we will largely see is specific AI tools for specific tasks, more commonly known as narrow AI.


A lot of people do have the idea that we are moving towards an AI that is going to be able to do everything, and although this is possible, I think this showcases that if something like Perplexity AI is able to immediately get you a lot of different research papers and various different sources faster than GPT-4 by OpenAI, then people are most likely to use these specifically tailored versions over other applications.

And I do think that that is fine.


This isn't really a knock on GPT-4.


It's just saying that I do think that people are going to individually build applications like this one that are going to be better than the base one, and that's something that we are going to see.


Now, you can see here, and why I like this much better than ChatGPT, is because it actually gives me a lot more references.


The problem with GPT-4's browsing with Bing is that it usually references one or two articles, and it does take a lot of time to read that page.


And remember, with GPT-4, you only get 25 messages per day, but with this, you get 597.


So it's definitely very interesting.


You can see all the different articles referenced, you can see just how many pieces there are, and usually, it gives you the information straight away.


Now, another feature that we can look at in Perplexity AI, which I found to be very, very cool, is that you can do targeted research.

So, for example, you can search Reddit, and this is something that a lot of people do on Google.


If you're someone that uses Google a lot and uses Reddit for certain research, although it does seem odd, it is something that people do.


This is a very useful tool.


Also, you can use it to search YouTube, and you might be thinking, Why don't you just use YouTube search to search what you're looking for?


When you're looking for something specific, what it does is it crawls every YouTube video and searches through the transcripts of those videos to get you your specific answer.


So, that is why this is very, very effective.
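As a rough illustration of what searching through transcripts could look like, here is a minimal Python sketch that ranks videos by how often the query words appear in their transcripts. This is purely an assumption for illustration: the function name `search_transcripts` and the sample data are hypothetical, and it is not how Perplexity actually works, which isn't described here.

```python
import re
from collections import Counter


def search_transcripts(transcripts: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Rank video titles by how many times the query words occur in each transcript."""
    terms = set(re.findall(r"\w+", query.lower()))
    results = []
    for title, text in transcripts.items():
        # Count every word in the transcript once, then sum the query terms.
        counts = Counter(re.findall(r"\w+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((title, score))
    # Highest score first; videos with no matching words are left out.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up transcripts, only to show the call shape.
    sample = {
        "Midjourney update": "the zoom out feature lets you zoom out of an image",
        "Robot news": "a self-improving robot learns new tasks",
    }
    print(search_transcripts(sample, "zoom feature"))
```

A real tool would need much more than word counts, such as ranking by meaning and pointing to the exact timestamp, but the basic idea of matching a question against transcript text is the same.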


So, I'm going to do this again to show you how quickly this works.


This simply understands your question, searches the news, considers the results, wraps it up.


And then, just like that, we have this data.


And to be honest with you guys, if you're someone that needs information reliably quickly with resources, this is what you want to use.


I know in the first instance of the example it wasn't that promising, but this is what it is usually like, and this is definitely going to be what I use now on a day-to-day basis when I'm doing my research online.


Because I do think that whilst Bard and ChatGPT are good, this is a dedicated research tool that allows you to search YouTube transcripts, Reddit, Wikipedia, and pretty much everything that we want to see.


Then, of course, we had DeepMind's RoboCat, which is essentially something out of science fiction.


I mean, it's a self-improving robot that is eventually going to be at this stage where it's going to need less than 100 demonstrations in order to perform an action successfully.


And you have to understand just how crazy that is, because self-improving robots are literally what people are thinking of when they think about Terminator robots that get scarily smart and then, of course, put the human race out of existence.


But DeepMind's RoboCat is essentially based on a DeepMind multimodal framework called Gato, which is essentially an AI model that was released last year which can pretty much do 600 random tasks across a huge number of domains.


But this Robocat which they released, I'll play a small segment from my video.


I did see in earlier papers from Google before, but this was still nice to see even on an artificial intelligence program which is still in relatively early stages, which means that these robots are going to be very effective in real-world scenarios because, as you know, the real world isn't just a test facility where we have a few objects. There are always going to be things that happen that don't go according to plan.


And it's important for these robots to be able to quickly and robustly adapt to these scenarios, which is what we see demonstrated here.


Now, yeah, from what you've seen there, just to wrap it up, it's pretty much a robot that can self-improve, doesn't need that many demonstrations to get the tasks done, and ushers in a new way for robots to learn very, very quickly.


Now, this is something that didn't get the recognition it deserves.


This is GPT-4 with actual multimodal capabilities, the first instance that we've seen online.


So credit to AI Breakfast for this tweet, because Bing managed to break its own rule by solving a CAPTCHA, and actually, this multimodal capability of analyzing images is currently only available to, apparently, five percent of users.


But strangely enough, I haven't seen anyone talk about this, which is why it's in this video.


So you can see right here what we have.


This image is a typical CAPTCHA.


It says, "Type the two words."


We can see, of course, "overlooks" and "inquiry" because, of course, we are human.


But the way that these words are designed on the screen, they're designed to not be able to be identified by a standard computer system.


But of course, here you can see we have GPT-4 or ChatGPT being able to easily identify the words "overlooks" and "inquiry".


And it is also able to see that this is actually a CAPTCHA test.


And then it says, I'm afraid I can't help you with that.


So I do think that this shows us that very, very soon, maybe next month, maybe the month after, we are likely going to be slowly introduced to the GPT-4 version that was actually announced.


You know, the version they touted as being able to really easily identify what was going on in images.


And I think this version will truly be the next level in AI because although text is great, it's only one form of modality.


And there were tons of examples in the GPT-4 paper where they showed exam questions, literal screenshots, and GPT-4 aced those exams.


So once this feature does actually get rolled out to everyone, which it's supposed to be, then this is going to be truly incredible.


So I do think that the reason it's only out to around five percent of users is so that they can collect feedback, see what people are doing with it, refine it, make sure it's safe.


And then, of course, put it out into the open.


So then, of course, we had Meta AI release something truly game-changing.


But at the same time, there is something else that is quite like this that I will explain later on in the video.


So just keep that in mind because although there are tons of different AI models being released, when you have a true understanding of every single AI model out there, you start to see certain comparisons.


And Meta's tool is very similar to a tool that was always AI-based but just hasn't been receiving the hype it deserves.


So, Meta recently announced something called Voice Box, a multilingual high-quality text-to-speech AI.


Voice Box can remove background noise from a clip.


Hi guys, thank you for tuning in today.


We are going to show you by re-synthesizing a specific segment.


Hi guys, thank you for tuning in today.


We are going to show you incorrectly spoken words via text to speech, eliminating the need to re-record.


Hi everyone, thank you for tuning in today.


We are going to show you.


These are just a few examples of how Voice Box can perform across a variety of tasks.


Like to hear a sample of what Voice Box can do first hand?


Well, you already have because all of the voiceover featured in this video was generated using Voice Box.


And apparently, the quality is so good that they're not making the Voice Box model code available to the public yet because they want to avoid misuse.


So essentially, what this is, if you know what ElevenLabs is, that's something that can clone your voice just from maybe even three to five seconds of you speaking into a mic.


But with this, they can do the same.


So for example, I'll just play a few clips from the official Twitter.


And as you can see, you can use different styles, you can use different text, you can use different, I guess you could say, references.


It is truly the ultimate tool for use.


But I do think that this is very similar to an AI tool released about one to two years ago.


And this was something that I did actually mess around with.


Better than edit all the blather out of your videos because my time is very precious.


Oh, that's fire. It's been said that manatees are the Cadillac of marine mammals.


Now, Descript is a tool that was released quite some time ago, but it was really, really cool because it allowed you to essentially edit your voice without having to re-record it.


So let's say, for example, I made a mistake whilst talking.


I could simply look at the transcript, edit the text, and it would also edit my voice at the same time.


So I do want to play a small clip from the Descript trailer because it perfectly encapsulates what this software can do and how similar it is to Meta's Voice Box.


So it will be interesting to see how this tool develops over the next year and how they change in response to ElevenLabs and Meta's new Voice Box being added to the new tool base in terms of AI text to audio.

