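[rabbit r1 Upgrade: Feature Overview and Future Outlook] Reading an English Commentary in Japanese [January 21, 2024 | @TheAIGRID]

January 21, 2024, 10:15

This video introduces the latest upgrades to rabbit r1, an AI-powered device. The rabbit r1 has proven highly popular: pre-orders for the sixth batch are now open, and the price is $200. A new partnership with Perplexity gives rabbit buyers one year of free access, strengthening the device with AI features that work like Google Search. The device can learn and carry out a variety of tasks, such as automating an Airbnb booking. Microsoft's CEO has also spoken highly of the device. The device is close in size to an iPhone Pro Max, is comfortable for left-handed users, and has a rotating camera to protect privacy. The response speed of the voice AI has also improved, and the device shows promise as a future AI assistant.
Published: January 21, 2024
Note: We recommend playing the video before reading.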

So, the new rabbit r1 device recently just got a major upgrade, and I need to show you all why this is really, really cool and some of the things you didn't know.

新しいrabbit r1デバイスは最近大幅にアップグレードされましたので、なぜこれが本当に素晴らしいのか、そして知らなかったことのいくつかを皆さんにお見せする必要があります。

Because there are some exclusive videos that show us just exactly what's going on with this device, so let's actually take a look at some of the things if you're buying the rabbit, if you're looking forward to it, you do want to know.

なぜなら、このデバイスについての具体的な状況を示す独占的なビデオがいくつかありますので、rabbitを購入する方、楽しみにしている方は、ぜひ見ておきたいと思います。

So, number one is that these things are selling fast, like really fast, okay?

まず、これらのデバイスは非常に速く売れています、本当に速く売れています、わかりますか？

So, the fifth batch of 10,000 rabbit r1 devices has sold out.

10,000台のrabbit r1デバイスの第5バッチは完売しました。

Pre-orders for the sixth batch, totaling 50,000, are now available at rabbit.tech.

第6バッチの予約受付は現在rabbit.techで行われています。総数は50,000台です。

The expected delivery date for the sixth batch is June to July 2024, and for all addresses in the EU and UK, batches 1 to 6 will be shipped by the end of July 2024 on a first-come, first-served basis.

第6バッチの予定配送日は2024年6月から7月であり、EUとUKのすべての住所については、バッチ1から6までが2024年7月末までに先着順で発送されます。

So, if you're one of the people that ordered this when you first saw it, you're likely to get it earlier than someone that orders it currently now.

ですので、最初に注文した人は、現在注文している人よりも早く手に入れることができるでしょう。

And remember, this is only $200, which is around 170 British pounds, around €70.

そして、これはたったの200ドルです、約170ポンド、約70ユーロです。

Now, what was recent was there was actually a really new announcement that essentially they dived into, and essentially this was an announcement with Perplexity.

最近の注目すべき出来事は、実際には非常に新しい発表があり、本質的にはPerplexityに関するものでした。

Now, many people actually don't know what Perplexity is, which is why I'm making this video so you guys can understand.

Perplexityが何かを知らない人が実際に多いので、このビデオを作成して皆さんに理解してもらいたいと思います。

So, essentially, Perplexity is basically like Google Search, but it combines AI to be more effective.

基本的に、PerplexityはGoogle検索のようなものですが、AIを組み合わせてより効果的になっています。

And honestly, you can't knock it until you try it because it is really, really effective.

正直に言って、試してみるまでわからないですが、本当に本当に効果的です。

So, take a look at the Perplexity trailer so you can really understand exactly what's going on, because trust me when I say it's really, really effective.

ですので、Perplexityのトレーラーを見て、実際に何が起こっているのかを本当に理解してください。本当に本当に効果的だと言っていますから、信じてください。

So, take a look at this, and I'm going to explain their announcement to you.

それでは、これをご覧いただき、彼らの発表について説明します。

Whether you're navigating the maze of headphone options, drowning in news noise, or stalled on your Japan trip plans, Perplexity Copilot is your guided search assistant.

ヘッドフォンの選択肢の迷宮、ニュースの騒音に溺れたり、日本旅行の計画が停滞したりしている場合、Perplexity Copilotはあなたのガイド付き検索アシスタントです。

Just turn on Copilot and ask. It takes a deep dive into anything you want to know and delivers tailor-made, concise answers.

Copilotをオンにして質問してみてください。あなたが知りたいことについて徹底的に調査し、適切な簡潔な回答を提供します。

Forget about diving into a sea of links. Copilot does the legwork by grasping the essence of your question to fine-tune your search.

リンクの海に飛び込むことを忘れてください。Copilotは質問の本質を把握して、検索を微調整するための作業を行います。

Copilot engages you with clarifying questions.

Copilotは質問を明確にするためにあなたと対話します。

This ensures you get what you're actually after.

これにより、あなたが本当に求めているものを得ることができます。

Once it gets what you're asking, Copilot scours a vast array of sources to ensure relevance and quality.

あなたが求めているものを見つけるために、Copilotはさまざまな情報源を駆使して関連性と品質を確保します。

Want to know more?

もっと知りたいですか？

Every source is just a click or tap away for deeper exploration.

すべての情報源は、より詳しく調査するためにクリックまたはタップするだけです。

Let's say you asked a quick question, but the answer wasn't what you were looking for.

例えば、簡単な質問をしたけれども、答えが求めていたものではなかったとしましょう。

Easy, at the bottom of your quick answer, hit rewrite and select Copilot to turn your quick search into a guided search experience.

簡単です、簡単な回答の下部で「書き直す」を選択し、Copilotを選ぶことで、簡単な検索をガイド付きの検索体験に変えることができます。

With Perplexity Copilot, you're not just searching, you're gaining a new window into the internet.

Perplexity Copilotを使えば、単に検索するだけでなく、インターネットへの新たな窓を開くことができます。

From the simplest questions to your deepest inquiries, this is where knowledge begins.

最も単純な質問から深い疑問まで、ここから知識が始まります。

So, Perplexity is really, really effective at what it does.

ですので、Perplexityは本当に本当に効果的です。

And for those of you who know what Perplexity is and for those of you who use it, you're going to know exactly what I'm saying is so true.

そして、Perplexityが何かを知っている人や使用している人にとっては、私が言っていることが本当だということがわかるでしょう。

That's why this announcement is so cool because rabbit actually partnered with Perplexity to provide everyone who buys their rabbit device a year of this completely free.

それがこの発表がとても素晴らしい理由です。なぜなら、rabbitは実際にPerplexityと提携して、rabbitデバイスを購入するすべての人に1年間完全に無料で提供するからです。

Now, usually, I think this is around $10 or $20 a month, but currently, they're going to give you guys a year completely free.

通常、これは月に10ドルまたは20ドルほどですが、現在は1年間完全無料で提供されます。

And trust me when I say this is going to make the rabbit device so much better.

私が言っていることは、rabbitデバイスが非常に良くなるということです。

That's why I said that this has been completely supercharged.

だからこそ、これは完全にパワーアップされたと言ったのです。

Now, later on in the video, you're going to see some videos of the actual rabbit device because the founder actually shared some videos on Twitter, things like size references, some other cool stuff like that.

ビデオの後半では、実際のrabbitデバイスのいくつかのビデオを見ることができます。創設者がTwitterでいくつかのビデオを共有しており、サイズの参照などの他のクールなものもあります。

And there was actually a mention of rabbit's device by the CEO of Microsoft.

MicrosoftのCEOがrabbitのデバイスについて言及していました。

But so then we have the founders of both companies here on a Twitter Space talking about this announcement, and I think you guys should listen to this because it is really, really...

しかし、それから両社の創設者がTwitterのスペースでこの発表について話しているので、ぜひ聞いていただきたいと思います。本当に、本当に。

So I'm pretty excited to share that Perplexity and rabbit are partnering together, so we are excited to power real-time precise answers for rabbit r1 using our Perplexity online LLM APIs that have no knowledge cutoff and are always plugged into our search index, and the first 100,000 rabbit r1 purchases will also get one year free Perplexity Pro where that came from.

だから、Perplexityとrabbitがパートナーシップを組んでいることをお知らせできてとても嬉しいです。私たちはPerplexityのオンラインLLM APIを使用して、rabbit r1のリアルタイムで正確な回答を提供することに興奮しています。これには知識のカットオフがなく、常に私たちの検索インデックスに接続されています。また、最初の10万台のrabbit r1の購入者には、1年間の無料Perplexity Proも提供されます。

I didn't know that feature, but yeah, continue everyone.

その機能は知りませんでしたが、続けてください。

Okay, yeah, the first 100,000 rabbit r1 purchases are going to get one year free of Perplexity Pro.

はい、最初の10万台のrabbit r1の購入者には、1年間のPerplexity Proが無料で提供されます。

So, it's basically like Perplexity Pro one year free is 200 bucks.

つまり、Perplexity Proが1年間無料で提供されるということは、200ドルの価値があるということです。

So, if you pay 200 bucks to purchase a rabbit r1, you're getting twice the value.

ですので、rabbit r1を200ドルで購入すると、2倍の価値を得ることができます。

Yeah, so, we had an interaction on X a couple of days ago, and then what's going on next is the following couple days, team's been working really hard together to make this happen.

はい、私たちは数日前にXでやり取りをしました。そして、次に起こることは数日後で、チームは一緒に本当に一生懸命働いてこれを実現させています。

And I think to me, it's a no-brainer if you think about rabbit r1 with price at $199, no, actually not $200, but $199, no subscription, and Perplexity, Aravind, is generous enough to offer Perplexity Pro for a whole year, that's worth actually 200 bucks.

私にとっては、迷う余地のない選択だと思います。rabbit r1の価格は$199、いや、実際には$200ではなく$199で、サブスクリプションは不要です。そしてPerplexityのAravindは寛大にも1年間のPerplexity Proを提供してくれます。これは実際には200ドルの価値があります。

There was that announcement that was really cool, but there was also some other stuff, okay?

それは本当にクールな発表があったけれど、他にもいくつかのことがあったんだ、わかる？

So, like I said, the Microsoft CEO, Satya Nadella, actually talks about just how good rabbit was.

だから、先ほど言ったように、MicrosoftのCEO、サティア・ナデラがrabbitの素晴らしさについて話していました。

And I can't imagine how this must feel as the rabbit founder, seeing the CEO of, I think it is now the world's largest company, talk about the product that you've created.

rabbitの創設者として、自分が作った製品について世界最大の企業のCEOが話すのを見るのはどんな気持ちなのか想像できません。

You see, I thought the demo of, the rabbit OS and, the device was fantastic.

rabbit OSとデバイスのデモは素晴らしかったと思います。

I think I must say, after Jobs's, sort of, launch of the iPhone, probably one of the most impressive presentations I've seen of capturing the vision of what is possible going forward for what is an agent-centric operating system and interface.

私は言わせてもらいますが、ジョブズのiPhoneの発表の後、エージェント中心のオペレーティングシステムとインターフェースの可能性を示す、最も印象的なプレゼンテーションの1つだと思います。

And I think that's what everybody's seeking: which device will make it, and so on.

そして、私はそれがみんなが求めているものであると思います。どのデバイスがそれを実現するのか、など。

It's unclear, but I think it's very, very clear that computer, I go back to that, right?

それははっきりしていませんが、コンピューターに戻ると、非常に、非常に明確だと思いますね。

If you have a breakthrough in natural interface, where this idea that you have to go one app at a time and all of the cognitive load is with you, as a human, does seem like there can be a real breakthrough.

ブレークスルーがあれば、自然なインターフェースで、1つのアプリずつ進む必要がなく、認知負荷がすべて人間にかかるというアイデアは、本当のブレークスルーがある可能性があるように思えます。

Because in the past, when we had the first generation, whether it was Cortana or Alexa or Siri or what have you, it was just not, it was too brittle, where we didn't have these Transformers, these Large Language Models, whereas now we have, I think, the tech to go and come up with a new app model.

過去には、最初の世代のCortanaやAlexa、Siriなどがあったとしても、これらのTransformersやLarge Language Modelsがなかったため、非常に脆弱でしたが、今では新しいアプリモデルを作るための技術があると思います。

And once you have a new interface and a new app model, I think new hardware is also possible.

新しいインターフェースと新しいアプリモデルがあれば、新しいハードウェアも可能だと思います。

And is that an opportunity for Microsoft, or are you moving away from hardware?

それはマイクロソフトにとってのチャンスですか、それともハードウェアからは離れていくのですか？

I mean, always it's an opportunity.

いや、いつでもチャンスですよ。

So, that talk right there was really fascinating because Microsoft seemed to be kind of eyeing up the hardware market.

その話は非常に興味深かったです。Microsoftはハードウェア市場を注視しているようです。

And I mean, you have to remember, it was a couple of years ago, in fact, not just a couple years ago, in fact, it was, I think, around 15 years ago where Microsoft, really just pulled the plug on their device which was the Windows Phone.

そして、覚えておいてください、それは数年前のことで、実際には数年前だけでなく、15年前くらいだったと思いますが、マイクロソフトはWindows Phoneというデバイスを完全に中止しました。

Some of you don't even know what that is, and rightly so because it just didn't go well.

それについて知らない人もいるでしょうし、その通りです。うまくいかなかったので。

And it just goes to show how hard it is to make a consumer hardware device that actually does succeed.

そして、これは実際に成功する消費者向けハードウェアデバイスを作ることがいかに難しいかを示しています。

And it will be interesting to see if Microsoft just jumps back into this, but I don't think they will, unless OpenAI is going to be working on a device too.

マイクロソフトがこれに再び参入するかどうかは興味深いですが、OpenAIもデバイスで取り組むつもりでない限り、彼らは参入しないと思います。

But I think if you watched some of the other videos that I talked about in where I talked about Ray-Ban's AI glasses that are going to be coming in the future, I think that that is going to be an interesting point.

しかし、もし私が過去に語ったRay-BanのAIメガネに関する他のビデオをご覧になったのであれば、それが興味深いポイントになると思います。

Now, something as well that many people did miss was how rabbit actually works.

さらに、多くの人々が見逃したことは、rabbitが実際にどのように機能するかということです。

And in the original video which I made discussing rabbit's amazing tech, I didn't actually show this video from their website where they actually talk about Large Action Models.

rabbitの素晴らしい技術について私が作った最初のビデオでは、彼らがLarge Action Modelについて話しているウェブサイトのビデオは実際には紹介していませんでした。

Essentially, their new proprietary system on how they actually use agents to, I guess you could say interact with the web because LLMs are good, but they are text-based, and that's essentially their purpose.

基本的には、彼らが実際にエージェントを使ってウェブとやり取りするための新しい独自のシステムです。LLMsは良いですが、テキストベースですし、それが本来の目的です。

They can be repurposed for other things, but that's not what they were made for.

他の用途にも使えますが、それが作られた目的ではありません。

So, they essentially made LAMs, and in this demo, they essentially talk about how Large Action Models are pretty much better than anything we've ever seen, and it's a new foundation model that understands human intentions on computers.

だから、彼らは本質的にLAMsを作りました。そして、このデモでは、大規模なアクションモデルがこれまで見た中で最も優れていることを説明しています。それは人間の意図を理解する新しい基盤モデルです。

So, I think this is a really interesting watch.

なので、これは本当に見ていて興味深い映像だと思います。

And then, after this, I want to show you guys, some of the, videos of rabbit, like actually being used, so some more in-person demos because I think it's really, really interesting.

それから、これの後で、実際に使用されているrabbitのビデオを皆さんに見せたいと思っていますので、もっと直接的なデモをいくつかお見せします。とても興味深いと思いますので。

Because, I know, like everyone who's ordered it, you probably want to know, how big it is, you probably want to know how certain things work for certain capabilities, so, I'm going to show you that in a second.

なぜなら、注文した皆さんは、おそらくそれがどれくらい大きいのか知りたいと思うでしょうし、特定の機能についてはどのように動作するのか知りたいと思うでしょうから、すぐにお見せします。

We can teach rabbit OS how to use specific applications.

rabbit OSには、特定のアプリケーションの使い方を教えることができます。

In this video, I'm teaching rabbit how to book an Airbnb while I'm operating normally as a human on the left screen.

このビデオでは、私が左の画面で人間として通常通り操作しながら、rabbitにAirbnbの予約の仕方を教えています。

Watch closely on the right as the Large Action Model is learning all my inputs and imitating my behavior in real time.

右側をよくご覧ください。Large Action Modelが私の入力をすべて学習し、リアルタイムで私の行動を模倣しています。

So, I'm trying to plan a trip to Barcelona with my wife and my daughter.

私は妻と娘と一緒にバルセロナへの旅行を計画しようとしています。

The first thing I'm going to do is navigate to the anywhere option, and I'm going to type Barcelona in the search field.

最初に私がすることは、どこでもオプションに移動し、検索フィールドにバルセロナと入力することです。

The system is suggesting Barcelona, Spain, which is exactly where we want to go.

システムはバルセロナ、スペインを提案してくれます。それが私たちが行きたい場所です。

Using the website's calendar tool, I'm going to mark our check-in on the 15th and check out on the 21st.

ウェブサイトのカレンダーツールを使って、15日にチェックインし、21日にチェックアウトする予定です。

Now, I'll click add guests and adjust the numbers accordingly.

さて、「ゲストを追加」をクリックし、人数を適切に調整します。

Now, let's hit the search button and see what pops up.

それでは、検索ボタンを押してみて、何が出てくるか見てみましょう。

Since we love the beach, let's make sure to select the beachfront option.

私たちはビーチが好きなので、ビーチフロントのオプションを選ぶようにしましょう。

And for a more private experience, I'm going to select entire home, so we have the whole place for ourselves.

そして、よりプライベートな体験のために、私はまるごと貸切のオプションを選びます。そうすれば、私たちだけの場所になります。

For the budget, I'll set a maximum of 400,000 won and a minimum of 100,000, so that all the options are within our price range.

予算については、最大で400,000、最小で100,000に設定します。そうすれば、すべてのオプションが私たちの価格帯内に収まります。

We're going to need at least two bedrooms to make sure we all have our own space.

私たちは少なくとも2つのベッドルームが必要です。それぞれが自分のスペースを持てるように。

Finally, with all of our preferences set, we've got plenty of options that fit the bill.

最後に、私たちの好みが設定されたので、条件に合う多くのオプションがあります。

I'll just start browsing for the perfect one.

ちょうど完璧なものを探してみます。

Each training only takes a few minutes and does not require access to an application programming interface, also known as an API, nor do you need anything installed on your device.

各トレーニングは数分で終わり、アプリケーションプログラミングインターフェース（API）へのアクセスも必要ありませんし、デバイスに何かをインストールする必要もありません。

You only need to train each workflow once.

各ワークフローを一度だけトレーニングするだけです。

Let's try to use rabbit OS and instead book a room in London.

では、rabbit OSを使ってロンドンの部屋を予約してみましょう。

My extended family is going to London.

私の大家族がロンドンに行く予定です。

It's going to be eight of us and four kids.

私たちは大人8人と子供4人です。

We're thinking of December 30th to January 5th.

12月30日から1月5日までを考えています。

It's not set in stone yet, so I just want some general options.

まだ確定ではありませんので、一般的なオプションを教えてください。

Can you look it up for me?

調べてもらえますか？

Sure, I can help you with that.

もちろん、お手伝いします。

The first option is a home in a Portobello mews house, priced at 1,348,3511 per night, with a rating of 4.8.

最初のオプションは、ポートベローのミューズハウスにある家で、1泊あたりの料金は1,348,3511、評価は4.8です。

The Large Action Model supports mobile apps, web apps, and professional desktop apps.

大規模なアクションモデルは、モバイルアプリ、ウェブアプリ、プロフェッショナルデスクトップアプリをサポートしています。

It learns directly on the user interfaces and acts on them.

それはユーザーインターフェース上で直接学習し、それに基づいて行動します。

We have already started the training process for the most popular apps.

私たちはすでに最も人気のあるアプリのトレーニングプロセスを開始しています。

As you're watching this video, rabbit OS is learning fast and adapting to hundreds of applications.

このビデオを見ている間に、rabbit OSは急速に学習し、数百のアプリに適応しています。

The ultimate goal of rabbit is to define the first natural language operating system that replaces apps on your device.

rabbitの究極の目標は、デバイス上のアプリを置き換える最初の自然言語オペレーティングシステムを定義することです。

It's time for the machines to do some serious homework.

機械に真剣に宿題をやらせる時間です。

So, I think you can understand why this product sold the way it did because if what they're saying is even remotely true, I mean, training this takes minutes, it requires no API, you don't need to install any software, and, you know, you only have to do it one time.

だから、この製品がそのように売れた理由がわかると思います。彼らが言っていることが少しでも本当なら、トレーニングは数分で済み、APIは必要なく、ソフトウェアも一度だけで済むということです。

That's what they said, each workflow you just need to train it once.

それが彼らが言ったことで、各ワークフローを一度だけトレーニングするだけです。
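
To make the "train each workflow once" idea concrete, here is a toy Python sketch of recording a demonstration and replaying it with different details. It is not rabbit's Large Action Model, whose internals the video does not describe. The `Step` and `Workflow` classes and the UI element names are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str      # e.g. "type" or "click"
    target: str      # hypothetical UI element name
    value: str = ""  # may hold a {placeholder} filled in at replay time


@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def record(self, action, target, value=""):
        """Store one demonstrated step."""
        self.steps.append(Step(action, target, value))

    def replay(self, **params):
        """Re-run the recorded steps with new parameter values."""
        return [
            f"{s.action} {s.target} {s.value.format(**params)}".strip()
            for s in self.steps
        ]


# Demonstrate the booking flow once, with placeholders for what varies.
booking = Workflow("book_stay")
booking.record("type", "search_field", "{city}")
booking.record("select", "dates", "{check_in} to {check_out}")
booking.record("click", "search_button")

# Reuse the same recording for a different trip.
for line in booking.replay(city="London", check_in="Dec 30", check_out="Jan 5"):
    print(line)
```

The point of the sketch is only that one recorded demonstration (Barcelona in the video) can in principle be reused for another request (London) by swapping the parameters. A real system would also have to cope with changing page layouts, which this toy ignores.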

I mean, if that is really true, and that's a bold claim, they are definitely, definitely breaking new ground here.

本当にそうなら、それは大胆な主張ですが、彼らは確かに新しい領域を切り拓いています。

So I would say that that is absolutely incredible.

それで私はそれを絶対に信じられないと言いたいと思います。

But that's just some understanding of how it works.

しかし、それはただの仕組みの一部の理解です。

Then of course, we had the benchmarks, which I found to be really, really cool because they actually compared it to GPT-4, GPT-3.5, Flan-T5-XL, and some of the other things, and you can see just how good LAM large 1 Neuro-Symbolic, their new proprietary model, is.

そしてもちろん、ベンチマークもありました。これは本当にクールだと思いました。実際にGPT-4やGPT-3.5、Flan-T5-XLなどと比較されていて、彼らの新しい独自モデルであるLAM large 1 Neuro-Symbolicがどれだけ優れているかがわかります。

And then, of course, we get to the size references.

そして、もちろん、サイズの参照に移ります。

So, this is where the founder actually talks about just how big this is because some people might want to see just how big this is, how it works, what the size is, just some cool stuff like that.

だから、ここで創設者が実際にこれがどれだけ大きいかについて話しています。なぜなら、あなたは、これがどれだけ大きいか、どのように機能するか、サイズがどうなっているか、そういったクールなことを見たいと思う人もいるかもしれません。

And then he also shows two other videos.

そして彼はまた、他の2つのビデオも見せてくれます。

So, I want to show you guys this because I think it's important to see just how big it is.

だから、これを見せたいと思います。なぜなら、これがどれだけ大きいかを見ることは重要だと思うからです。

And I kind of wish he did compare it to, like, an iPhone. Because I feel like this might not replace the iPhone, but it's still a similar handheld device.

そして、iPhoneのようなものと比較してほしかったと思います。なぜなら、これはiPhoneを置き換えるものではないかもしれませんが、それでも似たような携帯デバイスだからです。

But nonetheless, definitely worth a watch.

しかし、それにもかかわらず、絶対に見る価値があります。

But the idea is seven years ago when I designed Raven H, I have this magnetic detachable pixelated controller that just sticks on the main device like that.

しかし、アイデアは、7年前にRaven Hを設計したとき、私はこの磁気式の取り外し可能なピクセル化されたコントローラーを持っていて、それがメインデバイスにくっつくんですよ。

But the idea is that you can carry around and you can kind of, like, just hold on, talk.

しかし、アイデアは、あなたが持ち運び、ちょっと待って、話すことができるということです。

But r1 is actually smaller than that.

しかし、r1は実際にはそれよりも小さいです。

If you put it on top, it's smaller than the footprint.

それを上に置くと、フットプリントよりも小さいです。

But it's exactly the footprint.

しかし、それはまさにフットプリントと同じです。

Width-wise, it's exactly like an iPhone Pro Max model, but 50% of the footprint.

幅はまさにiPhone Pro Maxモデルと同じですが、フットプリントは50%です。

That's kind of like the idea.

それがアイデアのようなものです。

So, he said it's pretty much the same size like an iPhone 15 Pro Max, but just half of it.

だから、彼はそれがほぼiPhone 15 Pro Maxと同じサイズだと言っていますが、ちょうどその半分です。

Then, of course, this is something for, I guess, you could say accessibility.

そして、もちろん、これは、アクセシビリティのためのものです。

So, he talks about why you don't need a left-handed version.

だから、彼はなぜ左利きのバージョンは必要ないのかについて話しています。

Hey, this is Jesse, and here's my r1.

こんにちは、私はジェシーです。これが私のr1です。

A lot of people on Twitter have been asking, Hey, can you guys make an L1 for left-hand users?

たくさんの人々がTwitterで尋ねています。「左利きの人向けにL1を作ってもらえませんか？」

They ask because they think that all these controls, the button and the scroll wheel, are on the right side and are probably designed specifically for the right hand.

彼らは、これらのコントロール、つまりボタンやスクロールホイールがすべて右側にあり、おそらく右利きのために特別に設計されていると考えているからです。

Well, that's actually not the case.

実際にはそうではありません。

I'm actually a left-hander, so this is how I feel most comfortable holding the r1 using my left hand, actually.

実は私は左利きなので、これが私がr1を左手で持つのに最も快適な方法です。

But if you look at this, if I hold it like this in my hand, the push-to-talk button, actually my middle finger just naturally lands right here for the PTT button, and for the scroll wheel, I basically scroll from the back like that without breaking your gesture.

でも、これを見てください。このように手に持つと、プッシュ・トゥ・トークボタンについては、実際に私の中指が自然にここ、PTTボタンの位置に着地しますし、スクロールホイールに関しては、ジェスチャーを崩すことなく、基本的にこうやって後ろからスクロールします。

So, I just hold it like this and I press play, Get Lucky from Daft Punk.

このように持って、再生ボタンを押すだけで、Daft Punkの「Get Lucky」が流れます。

Okay, so that was really effective.

それは本当に効果的でした。

So, for any of you who are left-handers, this is not going to be a problem for you.

したがって、左利きの方々にとっては、これは問題にならないでしょう。

Hey, Jessie. And here's my r1. So here with...

ねえ、ジェシー。そして、こちらが私のr1です。では、ここで...

Then, of course, he shows another sneaky demo where he talks about the rotational camera.

そして、もちろん、彼は回転カメラについて話す別のずる賢いデモを見せます。

So, this is obviously worth a look.

これは明らかに見る価値があります。

My r1, let's have a close look at a rotational camera.

私のr1、回転カメラをじっくり見てみましょう。

The camera, by default, points down, which has a physical block for privacy.

カメラはデフォルトで下を向いており、プライバシーのための物理的なブロックがあります。

But if you are about to use it, you go to Vision, double click.

しかし、使おうとすると、ビジョンに行ってダブルクリックします。

And then, you just rotate.

そして、単に回転させるだけです。

Let's try that one more time.

もう一度やってみましょう。

Go back.

戻る。

It points down and enters the Vision.

それは下を向いてビジョンに入ります。

It rotates, and obviously, you can flip to the other side as well.

回転し、もちろん、反対側にもフリップできます。

Cheers.

乾杯。

So yeah, I think it was really, really fascinating on how they managed to make this device.

だから、彼らがこのデバイスをどのように作り上げたのかは、本当に魅力的だと思います。

On the Space, I did get a few clips, so I did manage to listen to the entire thing.

Twitterスペースについては、いくつかのクリップを入手しましたし、全体を聞くこともできました。

It was around 48 minutes.

約48分でした。

It was definitely, some fascinating stuff.

本当に興味深いものでした。

And they actually talked about three things.

そして、彼らは実際に3つのことについて話しました。

So there were three things that I do want to show you from this. Because they talked about the future of AI assistance.

だから、この3つのことをお見せしたいと思います。なぜなら、彼らはAIアシスタントの未来について話していたからです。

They also talked about how they achieved a 500 millisecond response time.

彼らはまた、500ミリ秒の応答時間をどのように実現したかについても話しました。

And they also talked about how they reduced latency.

そして、彼らはレイテンシーをどのように削減したかについても話しました。

And those are the three things that I think are most important for the future.

そして、それらは私が将来において最も重要だと思う3つのことです。

Because reduced latency makes us, I guess you could say, enjoy our AI systems more.

なぜなら、低遅延は私たちがAIシステムをより楽しむことができるようにするからです。

Because it sounds more realistic because they respond quicker.

なぜなら、より現実的に聞こえるからであり、彼らはより迅速に反応するからです。

And of course, the future of AI systems is important because these guys developed a proprietary model which seems to be better than anything currently on the market.

もちろん、AIシステムの未来は重要です。なぜなら、これらの人々は市場に現在存在するものよりも優れた独自のモデルを開発しているようです。

So this talk right here is how they achieved that 500 milliseconds response time, and I think it's an interesting listen.

この話は、彼らが500ミリ秒の応答時間を実現する方法についてのものであり、興味深い話だと思います。

If you press this button, the microphone starts recording.

このボタンを押すと、マイクが録音を開始します。

You're recording in an audio file, and that audio file needs to be converted into strings.

録音はオーディオファイルに保存され、そのオーディオファイルを文字列に変換する必要があります。

And those strings send it to the dictation engine or Speech to Text Engine, and convert to text.

そして、それらの文字列はディクテーションエンジンまたは音声テキストエンジンに送られ、テキストに変換されます。

And then that text to OpenAI ChatGPT API or Perplexity API or whatever Large Language Model for intentional understanding, and then it starts generating based on their speed.

そして、そのテキストはOpenAI ChatGPT APIまたはPerplexity API、または他の大規模言語モデルに送られ、意図的な理解に基づいて生成が始まります。

But we made a streaming model to where we basically cut off the chunks into a very, very small time stamp chunks, and we make the entire model streaming.

しかし、私たちはストリーミングモデルを作成しました。つまり、チャンクを非常に小さなタイムスタンプのチャンクに切り分け、モデル全体をストリーミング化しています。

But we do have a technology to make the sequence into a streaming.

ただし、私たちはシーケンスをストリーミング化する技術を持っています。

We're not necessarily accelerating GPT or Perplexity speed at the moment.

現時点では、私たちはGPTやPerplexityの速度を加速させるわけではありません。

But with this streaming mechanism, that if you ask non-search up-to-date information, we're constantly hitting the benchmark, which is 500 milliseconds per response.

しかし、このストリーミングメカニズムにより、最新の情報を検索しないで尋ねる場合、私たちは常に基準値である1回の応答あたり500ミリ秒に達しています。

Because again, this is whatever we're gonna push, this is going to be industrial standard because right now, this is what it is.

再度言いますが、これは私たちが押し進めるものであり、現時点では産業標準です。

So that right there is how they talk about what they're going to push.

それが彼らが推進する方法について話している内容です。
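
As a rough illustration of what "cutting the audio into very small time stamp chunks" could look like, here is a minimal Python sketch. The function name `timestamped_chunks`, the 16 kHz sample rate, and the 100 ms chunk size are all assumptions made for the example. The clip does not say how rabbit's streaming model is actually implemented.

```python
from typing import Iterator, List, Tuple


def timestamped_chunks(
    samples: List[int], sample_rate: int, chunk_ms: int
) -> Iterator[Tuple[float, List[int]]]:
    """Yield (start_time_in_seconds, chunk) pairs covering the whole recording."""
    chunk_len = max(1, sample_rate * chunk_ms // 1000)
    for start in range(0, len(samples), chunk_len):
        yield start / sample_rate, samples[start:start + chunk_len]


# One second of placeholder audio at 16 kHz, cut into 100 ms chunks.
audio = [0] * 16000
chunks = list(timestamped_chunks(audio, sample_rate=16000, chunk_ms=100))
print(len(chunks), "chunks, first starts at", chunks[0][0], "s")
```

Because the function is a generator, a downstream speech-to-text step could start working on the first chunk while later chunks are still being recorded, which is the general idea behind a streaming pipeline.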

Then, of course, they additionally dive into some more details.

そして、もちろん、彼らはさらに詳細に掘り下げています。

And where would you consider your whatever whatever latency you have today?

そして、今日のあなたのレイテンシーをどのように考えますか？

How do you compare that with other similar apps like ChatGPT voice to voice?

それを他の類似のアプリ、例えばChatGPTの音声対話と比較したことはありますか？

Like, have you tried looking, comparing the two?

例えば、2つを比較してみたことはありますか？

Yeah, so we did have a technology we call kernel that we started working on this pretty early, more than two years, that we basically established a streaming model.

そうですね、私たちはこのストリーミングモデルを確立したのはかなり早い段階で、2年以上前から取り組んでいました。

Because if you think about why there's a latency, so if you press this button, the microphone starts recording, and you're recording in an audio file, and that audio file needs to be converted into strings.

なぜなら、遅延があるのか考えてみると、このボタンを押すとマイクが録音を開始し、オーディオファイルで録音され、そのオーディオファイルを文字列に変換する必要があるからです。

And those strings send it to the dictation engine or Speech to Text Engine, and convert to text.

そして、その文字列はディクテーションエンジンまたは音声テキストエンジンに送られ、テキストに変換されます。

And then, that text needs to OpenAI ChatGPT API or Perplexity API or whatever Large Language Model for intentional understanding.

そして、そのテキストはOpenAI ChatGPT APIまたはPerplexity API、または他の大規模言語モデルに送られ、意図的な理解のために生成されます。

And then, it starts generating based on their speed.

そして、それから、彼らの速度に基づいて生成が始まります。

And then, it's a round trip, right?

そして、それは往復ですよね？

This is a single trip, and everything reversed again.

これは片道で、すべてがもう一度逆向きに行われます。

So, if you add all this together, if you just go there and build a voice AI with no optimization based off GPT-4, we know for a fact that a single dialogue you're looking at probably five to six seconds.

だから、これらすべてを合わせると、GPT-4をベースに最適化されていないボイスAIを構築する場合、単一の対話についてはおそらく5〜6秒かかることがわかっています。

But we made a streaming model to where we basically cut off the chunks into very, very small time stamp chunks, and we make the entire model streaming.

しかし、私たちはストリーミングモデルを作成しました。つまり、チャンクを非常に小さなタイムスタンプのチャンクに切り分け、モデル全体をストリーミング化しています。

I think I'm not the best guy to talk about this.

私はこのことについて話すのに最適な人ではないと思います。

Maybe our C later on can write something about this.

後で私たちのCが、このことについて何か書いてくれるかもしれませんね。

But we do have a technology to make the sequence into a streaming.

しかし、私たちはシーケンスをストリーミングにするための技術を持っています。

We're not necessarily accelerating GPT or Perplexity speed at the moment.

現時点では、GPTやPerplexityのスピードを加速しているわけではありません。

But with this streaming mechanism, if you ask non-search up-to-date information, we're constantly hitting the benchmark, which is 500 milliseconds per response.

しかし、このストリーミングメカニズムにより、最新情報の検索を必要としない質問であれば、私たちは常に基準値である1回の応答あたり500ミリ秒を達成しています。

But I wish everyone, I wish our team, you and me, can do something just on up-to-date information search.

しかし、私は皆さん、私たちのチーム、あなたと私が、最新の情報検索に関して何かできることを願っています。

And maybe we can push this far a little bit because, again, this is whatever we're going to push, this is going to be industrial standard because right now, this is what it is.

そして、もしかしたらこれを少し遠くまで推し進めることができるかもしれません。なぜなら、これは私たちが推進するものであり、現在の産業標準であるからです。

Yeah, absolutely, yeah.

はい、まったくその通りです。

We are certainly at the cutting edge here.

私たちは確かに最先端にいます。

And in fact, like the fact that you wanted to do it through streaming, that already makes it much like the perceived latency is already a lot better than waiting for the full response.

実際、ストリーミングを通じて行うことを希望するという事実は、完全な応答を待つよりも、知覚されるレイテンシーがすでにはるかに良くなっていることを意味します。

And I think there are so many more things we can do to speed it up.

そして、私たちはそれを加速させるためにできることはまだまだたくさんあると思います。

Yeah, that's where they talk about how they are compared to ChatGPT, and it seems like it's going to be getting even better.

そうですね、それが彼らがChatGPTと比較されているところであり、さらに良くなると思われます。
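
The difference between waiting for every stage to finish and streaming small chunks can be sketched with a toy latency model in Python. This is an idealized back-of-the-envelope model, not a description of rabbit's system. The stage names and per-stage timings are made-up illustrative numbers, chosen only so that the sequential total falls within the "five to six seconds" mentioned in the clip, and the model assumes each stage's work splits evenly across chunks.

```python
def sequential_latency(stage_seconds):
    """Total wait when each stage must finish before the next one starts."""
    return sum(stage_seconds)


def streaming_first_response(stage_seconds, n_chunks):
    """Idealized wait for the first output when work flows through in chunks.

    Assumes every stage's work divides evenly across n_chunks and that a stage
    can start on a chunk as soon as the previous stage has finished it.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return sum(t / n_chunks for t in stage_seconds)


# Hypothetical per-stage timings in seconds (illustrative, not measurements).
stages = {"speech_to_text": 1.5, "llm": 2.5, "text_to_speech": 1.5}

print("sequential:", sequential_latency(stages.values()), "s")
print("streaming, 11 chunks:", round(streaming_first_response(stages.values(), 11), 3), "s")
```

Under these assumptions the total amount of work is unchanged, which matches the remark that GPT or Perplexity themselves are not being accelerated. What shrinks is the wait before the first piece of the response arrives, which is the "perceived latency" discussed in the clip.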

And this is the final clip where they talk about how the future of assistance is going to transpire.

そして、これはアシスタンスの未来がどのように展開されるかについて話している最後のクリップです。

So maybe I want to lead from there to, like your thoughts on the whole voice to voice form factor, right?

だから、そこからリードして、音声対音声の形態についてのあなたの考えを聞きたいですね。

Yeah, because the rabbit device is definitely taking us beyond just consuming screens and text in the form of pixels to, like, just interacting more naturally.

そうですね、なぜなら、rabbitデバイスは、単に画面やテキストを消費するだけでなく、より自然に対話することを可能にしてくれるからです。

So what are your thoughts on the next stage of how people consume and interact with all these AI chatbots and assistants?

では、人々がこれらのAIチャットボットやアシスタントとどのように関わり、消費するかについて、あなたの考えはどうですか？

Yeah, so I think being our age we grew up unfortunately where the dictation engine were never invented.

そうですね、私たちは、私たちの年齢であることを考えると、残念ながら、音声認識エンジンは決して発明されませんでした。

And then, it was invented, and it was put in use in a horrible way.

そして、それが発明され、ひどい方法で使用されました。

I think our current generation are victims of the early days of the dictation engine, the early days of Natural Language Processing before Large Language Model, of course, and transformer, and all that.

私たちの現行世代は、ディクテーションエンジンの初期の日々、大規模言語モデルやトランスフォーマーなどが登場する前の自然言語処理の犠牲者だと思います。

So, I think me personally, I identify myself as probably, along with everyone here, is a PTSD with the early version of dictation engine.

だから、私は個人的には、おそらくここにいる皆さんと一緒に、初期バージョンの音声認識エンジンに対してPTSDを抱えていると思います。

That's why, I guess it creates such a strong impact on our mind that, okay, maybe voice is not a right way to go.

だから、おそらく、声は正しい方法ではないかもしれないという強い印象を私たちの心に与えるのかもしれません。

I rather prefer type, um.

私はむしろタイプする方が好きです。

But I think our principle is very, is very simple, is that what's the most intuitive way for communication, right?

しかし、私たちの原則は非常にシンプルです。コミュニケーションにとって最も直感的な方法は何か、ですね。

Like, think about everyone.

皆さんを考えてみてください。

If we convert this Twitter Spaces into a typed Twitter Spaces or into even worse, like a fax Twitter Spaces, non-instant message Twitter Spaces.

もしこのTwitterスペースをタイプ入力のTwitterスペースに、あるいはさらに悪いことに、ファックスのTwitterスペース、つまりインスタントメッセージですらないTwitterスペースに変えたとしたら。

I don't think that we can deliver all this information in a relatively short period of time.

比較的短い時間でこの情報をすべて伝えることはできないと思います。

So if you think about how human communicates with human, and before the Neuralink stuff become put in use, and natural language, especially conversation in voice, is still to be the most efficient way.

だから、人間同士がどのようにコミュニケーションを取るかを考えると、Neuralinkのようなものが実用化されるまでは、自然言語、特に声による会話が依然として最も効率的な方法です。

Now the problem becomes easy because we just need to fix the PTSD.

今、問題は簡単になりました。なぜなら、私たちはPTSDを修正するだけで済むからです。

But I think if you, if you look at the past three, four years, probably like, especially past past three years, a lot of the fundamental infrastructure around that has been significantly improved.

しかし、過去3、4年を見てみると、特に過去3年間を考えると、その周りの基盤インフラストラクチャが大幅に改善されてきたと思います。

To where the younger generation, especially, I'm not sure if you know how many of the listeners here got like, probably like 5-year-old, 6-year-old, 7-year-old kid, but the younger generation, that they were born like after, I guess, after 2010, I see among all the kids that they actually prefer the dictation icon on the keyboard rather than start typing.

特に若い世代、特にここにいるリスナーの中に、おそらく5歳、6歳、7歳の子供がどれくらいいるかはわかりませんが、若い世代、つまり2010年以降に生まれた子供たちの中で、キーボード上の音声入力アイコンを実際に好む子供が多いことが見られます。通常のタイピングよりも、声による入力を選好しているようです。

So, I think the user behavior in a different generation is already starting to shift.

だから、異なる世代の利用行動はもう変わり始めていると思います。

And of course, the fundamental reason is because a lot of infrastructure are good enough, are redundant enough.

もちろん、根本的な理由は、多くのインフラが十分に整っていて、冗長性があるからです。

So for us, we are not saying that you can only talk to r1. If you shake the r1, the keyboard will pop up.

だから、私たちはr1には話しかけることしかできないと言っているわけではありません。r1を振ると、キーボードが表示されます。

But if you think about the most intuitive way, and if you're in a rush, there's nothing better than just find that analog button, press and hold, and start talking.

でも、最も直感的な方法を考えてみてください。急いでいる場合、アナログなボタンを見つけて、押し続けて話し始めることよりも良い方法はありません。

So I guess that's the our design principle.

だから、それが私たちのデザインの原則だと思います。

You know, we understand the current challenges of the difficulties, but we want to push just a little bit further because the method is not wrong, right?

私たちは、現在の困難さを理解していますが、少しでも前進したいと思っています。方法が間違っているわけではないですよね？

The approaching is not wrong.

アプローチが間違っているわけではありません。

It feels wrong because the technology wasn't ready, but I think, like I said, in the past 3-4 years, a lot of infra has been significantly achieved.

技術が準備できていないから違和感があるだけですが、過去3〜4年間で、多くのインフラが大きな進展を遂げてきたと思います。
