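Although the AI world has seen a temporary slowdown this summer, notable new technologies keep appearing. First up is a new AI video tool called Odyssey, which aims to deliver Hollywood-grade visual effects and claims it can generate footage such as drone shots and islands. Next is a tool called Live Portrait, which lets you upload a driving video and an image and then animate the image to match the video. A research project called Paints Undo is also drawing attention: you upload a finished image and it produces an animation that plays back, in reverse, the process of drawing it. Like a reverse AI art generator, it helps you understand how a finished piece of art was created. In addition, OpenAI recently blocked access from China, so users in China now have to go through Microsoft Azure to reach GPT-4o. Stability AI also announced new license terms under which commercial use of Stable Diffusion 3 is free for anyone with annual revenue of 1 million dollars or less. Also covered are Meta's new mobile language model MobileLLM, NVIDIA AI's video creation tool, new features from Anthropic, and more. Samsung's new gadgets also include AI features, showing how the everyday use of AI keeps advancing.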

AI has definitely had a little bit of a slowdown for the summer, but that doesn't mean that there hasn't been some really cool stuff in the AI world happening that we can talk about.


Let's start by talking about the newest AI video tool that we've been getting a preview of, a tool called Odyssey.


Odyssey claims to be Hollywood-grade visual AI.


Here's some of the clips that it showed off that it can generate, like this drone-looking shot, this stickies, these islands.


I'm not sure if that person was generated or not, but those landscapes and most of this B-roll that we're seeing here was created with this Odyssey tool.


This isn't something that's publicly available yet, kind of like Sora, and their goal is to make it capable of producing glitch-free and mind-blowing visuals.


They say here that they're training four generative models that enable full control of the core layers of visual storytelling, high-quality geometry, photorealistic materials, stunning lighting, and controllable motion.


As this little animation here shows you, each model will enable you to precisely configure the minutiae of your scene.


They claim they're working alongside Hollywood to shape the technology.


The team behind this project has worked on some big things, specifically mostly in the self-driving car area like Cruise and Waymo and Tesla, but also from the video game world.


They're working alongside artists who have worked on things like Dune 2, Godzilla, The Creator, Avengers, and more. So it's a pretty solid team.


I'm excited to learn more about this one as it comes out.


We're having this sort of renaissance in AI video lately where we're getting all sorts of really cool AI video tools to use and previews of other ones that hopefully we'll be able to use soon.


If you've been on Twitter/X recently at all, you've probably seen this going a little bit viral.


It's called Live Portrait.


It's a tool that gives you the ability to upload a driving video and a source image and then animate that image to look like the video.


What's so cool about this is it's available right now.


You can download the code on GitHub and run it locally if you'd like, but they also set up a Hugging Face space where you can actually use it completely free.


I will make sure it is linked up below so that you can easily find it.


Here's what it looks like in practice.


I made this image here with Leonardo.


I uploaded a driving video that looks like this: "You should subscribe to Matt Wolfe."

I will tell you right now, it struggles a little bit if you've got a beard, because this was the output.


You can see me talking on the left, but the character on the right is struggling to open his mouth at the same pace that I'm talking.


When I start doing these crazy expressions on my face, it actually starts to match up.


If you're real expressive, it works well.


But if I'm just talking down here, it actually doesn't look like it's syncing up to my lips very well.


I'm guessing it's probably because of the beard.


I don't know, maybe the hat is throwing it off.


But if I had to guess, it's probably the beard.


Let's go ahead and select this input image along with this driving video here of this person looking around, click animate.


We get our final output video over here on the left.


We get a sort of side-by-side comparison video over here on the right.


We can see what it generated.


You can see the mouth and the eyes try to mimic what the driving video was doing and apply it to the source image here.


Another piece of cool research started circulating this week called Paints Undo.

The idea is you upload a finished image here like this anime girl, and it will actually make an animation showing you essentially how to draw that image.


It's almost like a reverse of what we're getting with AI art generators.


You start with a piece of art and it will reverse engineer how to create that art yourself.


If you make a piece of art inside of Midjourney or Leonardo or something like that, you can throw it into a tool like this and then actually get a little bit of instruction on how to recreate that image yourself on paper.


There's all sorts of cool examples here of different anime art and then showing it reverse engineered into its initial sketches, including the painting and shading and all of the steps to get to that image.


I'll link up to this GitHub below.


You can see there's all sorts of other examples here.


It's not just for anime girls, although that seems to be the main use case they're showing off here.


This one, the code is available on GitHub right now.


You can search for Paints Undo and find it.


I'll also link it up.


However, it doesn't have a Hugging Face space.


If you read through the GitHub page, it actually says that because the processing time in most cases is significantly longer than most tasks on Hugging Face, they personally do not recommend deploying this to Hugging Face. But they do say one option is to wait for them to release a Colab notebook.

It sounds like they will be setting up a Google Colab where you'll be able to use this soon.

But if you know what you're doing, you can download this code and run it locally, as long as you meet the system requirements. They did say they tested on an NVIDIA 4090 with 24 gigabytes of VRAM, but it does say it should work with 16 gigabytes of VRAM as well.

If you're using a 3080 or a 4080 with 16 gigabytes of VRAM from NVIDIA, it theoretically should still work.


I haven't tested it myself yet, but maybe for a future video.

We're still seeing some amazing generations come out of Gen-3.

If you're on X or Twitter at all, you're probably getting inundated with people making videos using Gen-3.


This was one of my favorite videos I've seen recently, from The Dor Brothers, where they're showing a water slide sliding through all sorts of cities and fire.


I just like it when people figure out a way to create a cohesive theme but make a really long video that looks like it all pieces together well.


I think they did a really, really good job on this video and I wanted to share it because I was impressed.


If you use the chatbot Poe, it just got a new update this week called Previews.

It's a new feature that lets you see and interact with web applications generated directly in chats on Poe.

Previews work particularly well with LLMs that excel at coding, including Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5.

Poe is a subscription-based chatbot, but when you're using it, you can actually choose the model.


You're not stuck with just using GPT or Claude or Gemini.


You get to choose which model you use.


To me, this seems very much like what Anthropic just released with their Artifacts, but it's in Poe and you can use it with multiple different models.


You can see from this clip right here that after it was prompted, it actually generated the code and executed the code in real time right in line with the rest of the chat.


You can actually interact with whatever it built right inside the chat window and the previews can be shared with anyone via a dedicated link.


If you make a really cool coded-up thing inside of Poe and you want other people to use it, you can share a link with them and they'll get access to it inside of their Poe account.


I just mentioned Anthropic and their artifact feature.


They actually just made artifacts shareable as well.


That is new as of this week.


Artifacts itself isn't new: you enter your prompt on the left, it generates the code and the preview on the right, and then you can interact with it on the right.

That's been out for a couple weeks now, but the ability to actually share that with others, so they can use it, try it, remix it, and build on top of it, is a new feature. It brings Claude even closer to what we can do with things like the custom GPTs that we were getting out of the GPT Store.


In my opinion, Claude is just upleveling and upleveling and shipping features that just improve the quality of life of using this app.


I'm really, really a fan of what Anthropic has been doing lately.


Since we're on the topic of Anthropic this week, they also rolled out the ability to evaluate prompts inside of the developer console.


This isn't on the Claude website.


This is in the developer console, which you can find by going to console.anthropic.com.

You can see we've got this generate prompt here where we can enter a basic prompt, click generate prompt, and it will make an improved prompt for us.


I just had it generate a prompt about generating more prompts.


Let's go ahead and click continue.


You can see that it wrote up this long detailed prompt, and then it gave me a variable of topic.


I can plug in whatever topic I want here, and it will wrap this entire prompt around that topic.


But what's cool is that we can actually do this in bulk with multiple topics.


If I come over here to evaluate, I can add rows here and give it multiple topics.


Let's go robotics, 3D printing, artificial intelligence, and I can tell it to run all, and it's going to run all three of these prompts and just change the one variable for each time.


We can see all of the outputs for each of these topics were generated.


But I can also create a second prompt and have it compare the second prompt using the same topic.


This will be really handy to test out a handful of prompts to see which prompts are the best and also change individual variables within each prompt to see how those change as well.
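The bulk evaluation idea described here, one prompt wrapped around several topics and then compared against a second prompt, can be sketched in a few lines of Python. This is only an illustration of the concept, not how the Anthropic console works internally: the two template strings, the `fill` and `build_grid` helpers, and the use of Python's `{topic}` placeholder syntax are all assumptions made for this example, and no model is actually called.

```python
# Hypothetical sketch: the templates and helper names below are made up
# for illustration and are not part of any Anthropic tool or API.
PROMPT_A = "Write five prompt ideas about {topic}."
PROMPT_B = "You are an expert in {topic}. Suggest five detailed prompts a beginner could try."


def fill(template: str, topic: str) -> str:
    """Wrap a prompt template around a single topic variable."""
    return template.format(topic=topic)


def build_grid(templates: dict[str, str], topics: list[str]) -> list[dict[str, str]]:
    """Build one row per topic, with one filled-in prompt per template."""
    rows = []
    for topic in topics:
        row = {"topic": topic}
        for name, template in templates.items():
            row[name] = fill(template, topic)
        rows.append(row)
    return rows


topics = ["robotics", "3D printing", "artificial intelligence"]
grid = build_grid({"prompt_a": PROMPT_A, "prompt_b": PROMPT_B}, topics)

# Print the two prompt variants side by side for each topic.
for row in grid:
    print(f"[{row['topic']}]")
    print("  A:", row["prompt_a"])
    print("  B:", row["prompt_b"])
```

Each row holds the same topic filled into both prompts, which is the side-by-side comparison the evaluate view is showing; in a real workflow each filled prompt would be sent to a model and the outputs compared.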


Meta this week also announced a new language model called MobileLLM.

It's a much smaller language model, obviously developed for mobile devices.


But you can see according to this chart here, the accuracy seems to be much higher than most of the other mobile models that are out there.


Let's talk about OpenAI for a minute because there's been a lot of fascinating things happening over at OpenAI.


Starting with the fact that OpenAI just blocked access from China.


It already was sort of outlawed in China.


You weren't allowed to use ChatGPT in China, but people were figuring out how to get around it using VPNs and things like that.


Apparently OpenAI has even cut off that loophole.


However, it's very interesting because China still has access to Microsoft Azure and Microsoft Azure has GPT-4o access.


They can still sort of use it, but they have to go through Azure instead of OpenAI's APIs.


It's really interesting.


It's got a lot of the AI world speculating, maybe this is a precursor to GPT-5 coming out.


A lot of people think that this blocking of access to China, but not blocking it from Microsoft is an indicator that OpenAI is going to launch GPT-5 really, really soon.


They don't want China to have access to GPT-5, but that's just speculation.


A lot of YouTubers and people on X are acting like this is fact.


It's not, it's speculation, but who knows?


There could be some validity to the idea that GPT-5 is right around the corner and this is OpenAI just sort of prepping for that to happen.


This news was also kind of interesting from OpenAI this week.


If you remember when all the crazy fallout happened with Sam Altman and OpenAI, it was announced that Microsoft was going to get a board member on OpenAI.


It was going to be an observer role.


They didn't have voting rights, but they were going to have a board member.


They were always sort of looped in.


Apparently Microsoft is backing out saying we don't need that board role.


Also when Apple announced that they were going to use OpenAI tech as part of the new Siri feature with the deal, Apple was also going to get an observer board role.


This week Apple also stepped away and basically said, we're not planning on taking that board role.


The reason these companies are doing this is because there's been a lot of scrutiny around like antitrust laws and the government worrying about too big of a consolidation of power in any one AI company.


Because that's happening, both Microsoft and Apple are saying, we're going to distance ourselves.


We don't want to open ourselves up for more legal government antitrust sort of lawsuits.


In other OpenAI news this week, OpenAI and Los Alamos National Laboratory announced a bioscience research partnership.


This partnership could potentially lead to more rapid advancements in innovation in areas like healthcare and bioscience.


Along a similar line, Arianna Huffington and OpenAI are now working together to try to transform healthcare.


The OpenAI Startup Fund and Thrive Global are jointly funding the development of a customized, hyper-personalized AI health coach that will be available as a mobile app and also within Thrive Global's enterprise products.

Imagine like an AI health coach that knows more about you and can provide nutrition advice, exercise advice, mental health advice, and things like that that are specifically tailored to you, your needs, your body type, and you individually.


Existing apps don't have this sort of hyper personalization, so you get a lot of sort of one size fits all health advice.


With a combination of AI and this health coach concept, we can personalize it more and tailor it to each individual's needs, because in most cases what works well for one person isn't the same thing that works well for another person.


This seems like a problem that AI should be able to easily help us solve.


Stability AI updated their license terms around Stable Diffusion 3.


If you remember, it was non-commercial use only.


The website Civitai basically said you can't put SD3 models on here because the licensing is confusing.


It looks like Stability AI listened to the feedback and updated their licensing terms, and they now say you only need a paid enterprise license if your yearly revenue exceeds USD 1 million.

Other than that, if you're under a million dollars a year, you can use the AI models in commercial products or services.


They're kind of following that Unreal Engine model.


We can see the breakdown here.


Non-commercial use remains free.


Free commercial use appropriate for individual use and small businesses.


There's fewer limits and only commercial users need to self-report, and that's only if you exceed revenue of over a million dollars.
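As a rough illustration of the rule as it's described here, the check boils down to a single revenue comparison. The function name, the constant, and the treatment of exactly 1 million dollars as still free are assumptions for this sketch; it is a simplified reading of what was said in this recap, not the license text itself.

```python
# Hypothetical sketch of the licensing rule as described in this recap.
# The threshold constant and function name are illustrative assumptions.
REVENUE_THRESHOLD_USD = 1_000_000


def needs_enterprise_license(annual_revenue_usd: float, commercial_use: bool) -> bool:
    """Return True only for commercial users whose yearly revenue exceeds the threshold."""
    if not commercial_use:
        # Non-commercial use remains free regardless of revenue.
        return False
    return annual_revenue_usd > REVENUE_THRESHOLD_USD


# A small business using the model commercially stays on the free tier.
print(needs_enterprise_license(250_000, commercial_use=True))
# A larger company using it commercially would need the paid license.
print(needs_enterprise_license(5_000_000, commercial_use=True))
```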


Stability AI also released Stable Assistant features.


We can see they added a search and replace feature.


You can see they uploaded this image with an egg in it, searched out egg, replaced it with dragon.


They updated their text to audio here.


If you want to use Stable Assistant, you go to Stability AI, go to applications, click on Stable Assistant, and they have a free trial.


But pretty much all of these various tools are available inside of this Stable Assistant.


Stability AI used to own ClipDrop.


Most of these features were in ClipDrop.


They sold ClipDrop, and now it looks like they're sort of building out their own competitor to ClipDrop.


There was a court ruling this week that suggested that AI systems may be in the clear as long as they don't make exact copies.


This specific court case was against GitHub and OpenAI, claiming that they were using other people's copyrighted code to generate new code.


The court essentially said it couldn't find any clear copyright infringement: as long as the output is different from the input and doesn't just clone or duplicate it, it's not copyright infringement.


This has been settled this way for now, although there could be appeals, so it's probably not totally over. But it does provide some precedent for future lawsuits over models that are trained on copyrighted material but whose output is different enough that it no longer infringes on that material.


It's still very muddy, and I don't know how this is going to play out, but this at least gives a little bit of precedent for future lawsuits.


If you're a fan of the Magnific AI Upscaler, which sort of upscales, but also can do things like reimagine images in new styles and actually hallucinate on purpose to make your images actually more interesting looking, well, they rolled out a Photoshop plugin so you can use Magnific directly inside of Photoshop.


That's a new feature that came out just this week.


You can find it over in the Adobe Exchange.


Quite honestly, I use Magnific quite a bit to help me upscale some of the images that you see on my thumbnails.


Some of these images that you see here may look familiar.


That's because I use Magnific all the time to try to improve the look of some of these thumbnails.


This isn't brand new news, but I did want to touch on it.


There is a bill called SB 1047 that they're trying to get signed into law here in California.


It has the potential to severely hinder researchers from actually doing their work.


We recently interviewed Anjney Midha from a16z on the Next Wave podcast.

He's one of the people out there sort of fighting against this bill.


Let me just show you a quick clip from the podcast so you can hear exactly how he thinks this bill is going to harm researchers.


Basically SB 1047 is a proposed law that's making its way through the California legislature right now.


This bill is drafted to attack underlying model researchers, scientists, and developers.


Among other things, it's trying to place civil and criminal liabilities on developers of AI models, as opposed to focusing on the malicious users of those models.


As proposed by this bill, overseeing these new laws would be a Frontier Model Division, which is kind of like a new DMV they want to form. It would have the power to propose requirements on startups, researchers, and academia that would dictate whether a researcher or engineer could ultimately be thrown in jail.


This bill is now slated for a California assembly vote in August, less than 60 days away.


This is an incredibly dangerous piece of well-intentioned but incredibly misguided regulation that is trying to make AI safer by focusing on the underlying model instead of the malicious misuses, which is really where we should be focusing.


I'm not going to dive too much more into that bill in this video, but that episode of the Next Wave podcast is being released on Tuesday, and Anjney does an entire breakdown of the bill: what it means for AI advancements, what we can do to prevent the bill, and what it means for people outside of California, because it definitely does affect companies outside of California as well.

If you're not subscribed to the Next Wave podcast, it is linked up in the description.

I highly recommend you check that out because that's where I get to have long-form conversations with a lot of the builders, innovators, and people doing interesting things in the world of AI.


This week, Samsung also held their annual Unpacked event, where they announced a whole bunch of new gadgets, and pretty much all of them have AI in them.


For example, they announced their Samsung Galaxy Z Fold 6 and we can see here the Galaxy Z Fold 6 is AI ready with Samsung's latest AI features including Circle to Search, translate and transcribe PDF documents, generate AI based off of people or objects in photos you snap, and a sketch to image feature that turns quick sketches into high quality images.


They also announced the Samsung Galaxy Z Flip 6.


The external display has been enhanced with suggested replies from the on-device AI and it's got AI-powered wallpapers on the display.


They showed off the new Galaxy Watch 7 with a new AI-empowered sleep algorithm.


It's the first of its kind to be FDA authorized to recognize signs of sleep apnea in its wearers as well as providing a host of valuable sleep metrics.


They showed off their Samsung Galaxy Ring, which looks to be a competitor to the Oura Ring, which I actually wear myself.

It's got little sensors inside of it but this version of the ring uses Galaxy AI to generate a comprehensive energy score based on your activity level, sleep quality and other health metrics.


The ring's AI-powered algorithm powers its sleep tracking mode recording insights such as movement during sleep, sleep latency, respiratory rate and heart rate all available each night in the app.


Finally, the Samsung Galaxy Buds 3 Pro, Samsung's answer to the AirPods.


It's got an interpreter setting that leverages AI to translate foreign language dialogue in real time.


That is cool.


Come on, that is cool.


Can you imagine just putting a pair of AirPods in, talking to somebody in a different language, and the correct language just being translated in real time right into your ear?


That's going to be valuable.


Lots of new gadgets, pretty much all of them with some sort of cool new AI feature built into them and that was what was announced at Samsung Unpacked.


Finally here's a robot that navigated the Google DeepMind offices using Gemini.


It's using that vision model to see what's around it and navigates through the hallways making sure not to bump into anything because the vision model knows exactly where it is and can see around itself to make sure it doesn't bump into stuff.


These videos that are here on this TechCrunch article don't have any audio but it does say that it can walk around the office and point out different landmarks with speech.


They use what's called a vision-language-action (VLA) model that combines environment understanding with common-sense reasoning power. Once those processes are combined, the robot can respond to written and drawn commands as well as gestures.


Right now it's kind of like an AI tour guide.


It could roam around a building and point things out to you and give you some information about the things that it's pointing out.


That's what I got for you this week.


It was another week without any major, massive, monumental AI announcements, but a lot of cool stuff is still happening.

Almost every single day there's something interesting to talk about, and I love making these videos at the end of the week and recapping all of the cool stuff that I came across.


