
[rabbit r1 Upgrade: Feature Overview and Future Outlook] Read the English Commentary in Japanese [January 21, 2024 | @TheAIGRID]

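This video introduces the latest upgrades to the AI-enabled rabbit r1 device. The rabbit r1 has proven very popular: the sixth batch is now open for pre-order, and the price is $200. Through a new partnership with Perplexity, rabbit buyers get one year of free access, which strengthens the device with Perplexity's Google Search-like AI features. The device can learn and carry out a variety of tasks, such as automating an Airbnb booking. Microsoft's CEO has also praised the device, and the video includes remarks from rabbit's founder. The device is about as wide as an iPhone Pro Max with roughly half the footprint, is comfortable for left-handed users, and has a rotating camera for privacy. The response speed of the voice AI has also improved, and the device shows potential as a future AI assistant.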

So, the new rabbit r1 device just got a major upgrade, and I need to show you all why this is really, really cool, along with some of the things you didn't know.

Because there are some exclusive videos that show us exactly what's going on with this device, so let's take a look at some of the things you'll want to know if you're buying the rabbit or looking forward to it.


So, number one is that these things are selling fast, like really fast, okay?


So, the fifth batch of 10,000 rabbit r1 devices has sold out.

Pre-orders for the sixth batch, totaling 50,000, are now available at rabbit.tech.


The expected delivery date for the sixth batch is June to July 2024, and for all addresses in the EU and UK, batches 1 to 6 will be shipped by the end of July 2024 on a first come, first served basis.


So, if you're one of the people who ordered this when you first saw it, you're likely to get it earlier than someone who orders it now.


And remember, this is only $200, which is around 170 British pounds, around €70.


Now, the recent news is that there was actually a brand-new announcement, and essentially this was an announcement with Perplexity.


Now, many people actually don't know what Perplexity is, which is why I'm making this video so you guys can understand.


So, essentially, Perplexity is basically like Google Search, but it combines AI to be more effective.


And honestly, you can't knock it until you try it because it is really, really effective.


So, take a look at the Perplexity trailer so you can really understand exactly what's going on, because trust me when I say it's really, really effective.


So, take a look at this, and then I'm going to explain their announcement to you.


Whether you're navigating the maze of headphone options, drowning in news noise, or stalled on your Japan trip plans, Perplexity Copilot is your guided search assistant.

Just turn on Copilot and ask, it takes a deep dive into anything you want to know and delivers tailor-made concise answers.


Forget about diving into a sea of links, Copilot does the leg work by grasping the essence of your question to fine-tune your search.


Copilot engages you with clarifying questions.


This ensures you get what you're actually after.


Once it gets what you're asking, Copilot scours a vast array of sources to ensure relevance and quality.


Want to know more?


Every source is just a click or tap away for deeper exploration.


Let's say you asked a quick question, but the answer wasn't what you were looking for.


Easy, at the bottom of your quick answer, hit rewrite and select Copilot to turn your quick search into a guided search experience.


With Perplexity Copilot, you're not just searching, you're gaining a new window into the internet.

From the simplest questions to your deepest inquiries, this is where knowledge begins.


So, Perplexity is really, really effective at what it does.


And for those of you who know what Perplexity is and for those of you who use it, you're going to know exactly what I'm saying is so true.


That's why this announcement is so cool, because rabbit actually partnered with Perplexity to give everyone who buys their rabbit device a full year of this.


Now, usually, I think this is around $10 or $20 a month, but currently, they're going to give you guys a year completely free.


And trust me when I say this is going to make the rabbit device so much better.


That's why I said that this has been completely supercharged.


Now, later on in the video, you're going to see some videos of the actual rabbit device because the founder actually shared some videos on Twitter, things like size references, some other cool stuff like that.


And there was actually a mention of rabbit's device by the CEO of Microsoft.


But then we have the founders of both companies here on a Twitter Space talking about this announcement, and I think you guys should listen to this because it is really, really interesting.


So I'm pretty excited to share that Perplexity and rabbit are partnering together. We are excited to power real-time, precise answers for rabbit r1 using our Perplexity online LLM APIs, which have no knowledge cutoff and are always plugged into our search index, and the first 100,000 rabbit r1 purchases will also get one year of Perplexity Pro for free.

I didn't know that feature, but yeah, continue everyone.


Okay, yeah, the first 100,000 rabbit r1 purchases are going to get one year of Perplexity Pro for free.

So, basically, one free year of Perplexity Pro is worth 200 bucks.

So, if you pay 200 bucks to purchase a rabbit r1, you're getting twice the value.
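The "twice the value" claim is simple arithmetic, sketched below with the round figures quoted on the call. These are the speakers' numbers, not official pricing, and the variable names are only illustrative.

```python
# Figures quoted on the call, treated as round numbers rather than official pricing.
DEVICE_PRICE_USD = 200      # "if you pay 200 bucks to purchase a rabbit r1"
PRO_YEAR_VALUE_USD = 200    # "one year free is 200 bucks"

# Total value received for the price of the device alone.
bundle_value = DEVICE_PRICE_USD + PRO_YEAR_VALUE_USD
value_ratio = bundle_value / DEVICE_PRICE_USD

# What the quoted yearly value works out to per month.
implied_monthly = PRO_YEAR_VALUE_USD / 12

print(f"Bundle value: ${bundle_value}")
print(f"Value relative to price paid: {value_ratio:.1f}x")
print(f"Implied monthly value of Pro: ${implied_monthly:.2f}")
```

The implied monthly figure sits inside the "$10 or $20 a month" range mentioned earlier in the video.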

Yeah, so we had an interaction on X a couple of days ago, and then over the following couple of days, the teams have been working really hard together to make this happen.


And I think, to me, it's a no-brainer if you think about rabbit r1 priced at $199, actually not $200 but $199, with no subscription, and Perplexity's Aravind is generous enough to offer Perplexity Pro for a whole year, which is actually worth 200 bucks.

There was that announcement that was really cool, but there was also some other stuff, okay?


So, like I said, the Microsoft CEO, Satya Nadella, actually talks about just how good rabbit was.


And I can't imagine how this must feel as the rabbit founder, seeing the CEO of, I think it is now the world's largest company, talk about the product that you've created.


You see, I thought the demo of the rabbit OS and the device was fantastic.

I think I must say, after Jobs's launch of the iPhone, it was probably one of the most impressive presentations I've seen at capturing the vision of what is possible going forward for an agent-centric operating system and interface.


And I think that's what everybody's going to be seeking: which device will make it, and so on.


It's unclear, but I think it's very, very clear that computer, I go back to that, right?


If you have a breakthrough in natural interface, where this idea that you have to go one app at a time and all of the cognitive load is with you, as a human, does seem like there can be a real breakthrough.


Because in the past, when we had the first generation, whether it was Cortana or Alexa or Siri or what have you, it was just too brittle, because we didn't have these transformers, these large language models, whereas now we have, I think, the tech to go and come up with a new app model.

And once you have a new interface and a new app model, I think new hardware is also possible.


And is that an opportunity for Microsoft, or are you moving away from hardware?


I mean, it's always an opportunity.


So, that talk right there was really fascinating because Microsoft seemed to be kind of eyeing up the hardware market.


And I mean, you have to remember, it was a couple of years ago, in fact, not just a couple of years ago, I think it was around 15 years ago, that Microsoft really just pulled the plug on their device, which was the Windows Phone.

Some of you don't even know what that is, and rightly so because it just didn't go well.


And it just goes to show how hard it is to make a consumer hardware device that actually does succeed.

And it will be interesting to see if Microsoft jumps back into this, but I don't think they will, unless OpenAI is going to be working on a device too.


But I think if you watched some of my other videos where I talked about Ray-Ban's AI glasses that are going to be coming in the future, that is going to be an interesting point.


Now, something as well that many people did miss was how rabbit actually works.


And in the original video I made discussing rabbit's amazing tech, I didn't actually show this video from their website where they talk about Large Action Models.

Essentially, it's their new proprietary system for how they use agents to, I guess you could say, interact with the web, because LLMs are good, but they are text-based, and that's essentially their purpose.


They can be repurposed for other things, but that's not what they were made for.


So, they essentially made LAMs, and in this demo, they talk about how Large Action Models are pretty much better than anything we've ever seen, and how it's a new foundation model that understands human intentions on computers.


So, I think this is a really interesting watch.


And then, after this, I want to show you guys some of the videos of rabbit actually being used, so some more in-person demos, because I think it's really, really interesting.

Because I know that everyone who's ordered it probably wants to know how big it is and how certain capabilities work, so I'm going to show you that in a second.


We can teach rabbit OS how to use specific applications.

In this video, I'm teaching rabbit how to book an Airbnb while I'm operating normally as a human on the left screen.

Watch closely on the right as the Large Action Model learns all my inputs and imitates my behavior in real time.

So, I'm trying to plan a trip to Barcelona with my wife and my daughter.


The first thing I'm going to do is navigate to the anywhere option, and I'm going to type Barcelona in the search field.


The system is suggesting Barcelona, Spain, which is exactly where we want to go.


Using the website's calendar tool, I'm going to mark our check-in on the 15th and check out on the 21st.


Now, I'll click "Add guests" and adjust the numbers accordingly.


Now, let's hit the search button and see what pops up.


Since we love the beach, let's make sure to select the beachfront option.


And for a more private experience, I'm going to select entire home, so we have the whole place for ourselves.


For the budget, I'll set a maximum of 400,000 won and a minimum of 100,000, so that all the options are within our price range.


We're going to need at least two bedrooms to make sure we all have our own space.


Finally, with all of our preferences set, we've got plenty of options that fit the bill.


I'll just start browsing for the perfect one.


Each training only takes a few minutes and does not require access to an application programming interface, also known as an API, nor do you need anything installed on your device.


You only need to train each workflow once.


Let's try to use rabbit OS and instead book a room in London.

My extended family is going to London.


It's going to be eight of us and four kids.


We're thinking of December 30th to January 5th.


It's not set in stone yet, so I just want some general options.


Can you look it up for me?


Sure, I can help you with that.


The first option is a home in a Portobello mews house, priced at 1,348,3511 per night, with a rating of 4.8.
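The "train each workflow once, then reuse it" idea from the demo can be pictured with a toy sketch. This is not how rabbit's Large Action Model works internally, which the video does not describe. It only shows a recorded sequence of UI steps being replayed with new inputs. The `Step` type, the step names, and the `replay` helper are all hypothetical.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Step:
    """One recorded UI interaction (hypothetical structure)."""
    action: str      # e.g. "click", "type", "select"
    target: str      # label of the UI element acted on
    value: str = ""  # may contain $placeholders filled in at replay time


# Hypothetical recording of the Barcelona demonstration, with the
# trip-specific inputs replaced by placeholders.
RECORDED_WORKFLOW = [
    Step("click", "Anywhere"),
    Step("type", "search field", "$destination"),
    Step("select", "check-in", "$check_in"),
    Step("select", "check-out", "$check_out"),
    Step("set", "guests", "$guests"),
    Step("click", "Search"),
]


def replay(workflow, **params):
    """Return the recorded steps with placeholders filled from params."""
    return [
        Step(s.action, s.target, Template(s.value).substitute(params))
        for s in workflow
    ]


# Reuse the same recording for the London request from the demo.
london = replay(
    RECORDED_WORKFLOW,
    destination="London",
    check_in="December 30",
    check_out="January 5",
    guests="eight of us and four kids",
)
for step in london:
    print(step.action, step.target, step.value)
```

The point of the sketch is only that the demonstration is captured once and the trip details become parameters. The real system is described as learning from the user interface itself rather than from a hand-written step list.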


The Large Action Model supports mobile apps, web apps, and professional desktop apps.


It learns directly on the user interfaces and acts on them.


We have already started the training process for the most popular apps.


As you're watching this video, rabbit OS is learning fast and adapting to hundreds of applications.

The ultimate goal of rabbit is to define the first natural language operating system that replaces apps on your device.


It's time for the machines to do some serious homework.


So, I think you can understand why this product sold the way it did, because if what they're saying is even remotely true, I mean, training this takes minutes, it needs no API, and, you know, you can do it one time without installing any software.


That's what they said, each workflow you just need to train it once.


I mean, if that is really true, and that's a bold claim, they are definitely, definitely breaking new ground here.


So I would say that that is absolutely incredible.


But that's just some understanding of how it works.


Then of course, we had the benchmarks, which I found to be really, really cool because they actually compared it to GPT-4, GPT-3.5, Flan-T5-XL, and some of the others, and you can see just how good LAM Large 1, their new proprietary neuro-symbolic model, is.

And then, of course, we get to the size references.


So, this is where the founder actually talks about just how big this is, because some people might want to see how it works, what the size is, just some cool stuff like that.


And then he also shows two other videos.


So, I want to show you guys this because I think it's important to see just how big it is.


And I kind of wish he did compare it to, like, an iPhone. Because I feel like this might not replace the iPhone, but it's still a similar handheld device.


But nonetheless, definitely worth a watch.


But the idea is, seven years ago when I designed Raven H, I had this magnetic, detachable, pixelated controller that just sticks onto the main device like that.

But the idea is that you can carry around and you can kind of, like, just hold on, talk.


But r1 is actually smaller than that.


If you put it on top, it's smaller than the footprint.


But it's exactly the footprint.


Width-wise, it's exactly like an iPhone Pro Max model, but 50% of the footprint.

That's kind of like the idea.


So, he said it's pretty much the same size as an iPhone 15 Pro Max, but just half of it.

Then, of course, this is something for, I guess, you could say accessibility.


So, he talks about why you don't need a left-handed version.


Hey, this is Jesse, and here's my r1.


A lot of people on Twitter have been asking, "Hey, can you guys make an L1 for left-handed users?"

Because they think that all these controls, the button and the scroll wheel, are on the right side and are probably specifically designed for the right hand.


Well, that's actually not the case.


I'm actually a left-hander, so this is how I feel most comfortable holding the r1 using my left hand, actually.


But if you look at this, if I hold it like this in my hand, for the push-to-talk button, my middle finger actually just naturally lands right here on the PTT button, and for the scroll wheel, I basically scroll from the back like that without breaking the gesture.


So, I just hold it like this and I press: "Play Get Lucky from Daft Punk."

Okay, so that was really effective.


So, for any of you who are left-handers, this is not going to be a problem for you.


Hey, Jessie. And here's my r1. So here with...


Then, of course, he shows another sneaky demo where he talks about the rotational camera.


So, this is obviously worth a look.


My r1, let's have a close look at a rotational camera.


The camera, by default, points down, which has a physical block for privacy.


But if you are about to use it, you go to Vision, double click.


And then, you just rotate.


Let's try that one more time.


Go back.


It points down and enters the Vision.


It rotates, and obviously, you can flip to the other side as well.




So yeah, I think it was really, really fascinating how they managed to make this device.


From the Space, I did get a few clips, and I did manage to listen to the entire thing.


It was around 48 minutes.


It was definitely some fascinating stuff.


And they actually talked about three things.


So there were three things that I do want to show you from this. Because they talked about the future of AI assistance.


They also talked about how they achieved a 500 millisecond response time.


And they also talked about how they reduced latency.


And those are the three things that I think are most important for the future.


Because reduced latency makes us, I guess you could say, enjoy our AI systems more.


Because it sounds more realistic because they respond quicker.


And of course, the future of AI systems is important because these guys developed a proprietary model which seems to be better than anything currently on the market.


So this talk right here is how they achieved that 500 milliseconds response time, and I think it's an interesting listen.


If you press this button, the microphone starts recording.


You're recording in an audio file, and that audio file needs to be converted into strings.


And those strings get sent to the dictation engine, or speech-to-text engine, and converted to text.

And then that text goes to the OpenAI ChatGPT API or Perplexity API or whatever large language model for intention understanding, and then it starts generating based on their speed.

But we made a streaming model where we basically cut the audio into very, very small timestamped chunks, and we make the entire model streaming.


But we do have a technology to make the sequence into a streaming.


We're not necessarily accelerating GPT or Perplexity speed at the moment.


But with this streaming mechanism, if you ask for something that doesn't require a search for up-to-date information, we're constantly hitting the benchmark, which is 500 milliseconds per response.


Because again, whatever we're going to push, this is going to be the industry standard, because right now, this is what it is.


So that right there is how they talk about what they're going to push.


Then, of course, they additionally dive into some more details.


And where would you consider whatever latency you have today?


How do you compare that with other similar apps like ChatGPT voice to voice?


Like, have you tried looking, comparing the two?


Yeah, so we do have a technology we call kernel that we started working on pretty early, more than two years ago, where we basically established a streaming model.


Because if you think about why there's a latency, so if you press this button, the microphone starts recording, and you're recording in an audio file, and that audio file needs to be converted into strings.


And those strings get sent to the dictation engine, or speech-to-text engine, and converted to text.

And then, that text needs to go to the OpenAI ChatGPT API or Perplexity API or whatever large language model for intention understanding.

And then, it starts generating based on their speed.


And then, it's a round trip, right?


This is a single trip, and everything reversed again.


So, if you add all this together, if you just go and build a voice AI with no optimization based off GPT-4, we know for a fact that for a single dialogue you're looking at probably five to six seconds.


But we made a streaming model where we basically cut the audio into very, very small timestamped chunks.


And we make the entire model streaming.


I think I'm not the best guy to talk about this.


Maybe our C later on can write something about this.


But we do have a technology to make the sequence into a streaming.


We're not necessarily accelerating GPT or Perplexity speed at the moment.

But with this streaming mechanism, if you ask for something that doesn't require a search for up-to-date information, we're constantly hitting the benchmark, which is 500 milliseconds per response.


But I wish everyone, I wish our team, you and me, can do something just on up-to-date information search.


And maybe we can push this a little bit further because, again, whatever we're going to push, this is going to be the industry standard, because right now, this is what it is.
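The difference between the unoptimized round trip and the streaming approach described above can be pictured with a rough latency model. This is a simplified sketch, not rabbit's implementation: the per-stage timings and the chunk count are made-up numbers chosen to land near the "five to six seconds" and "500 milliseconds" figures mentioned in the conversation. It also assumes each stage's time scales linearly with chunk size and ignores network overhead.

```python
# Hypothetical time, in seconds, each stage needs to process one whole utterance.
STAGES = {
    "speech_to_text": 1.5,
    "language_model": 3.0,
    "text_to_speech": 1.0,
}


def sequential_latency(stages):
    """Every stage waits for the previous one to finish the whole utterance."""
    return sum(stages.values())


def streaming_first_response(stages, chunks):
    """Each stage forwards small chunks as soon as they are ready.

    The first output is available once the first chunk has passed through
    every stage, so only 1/chunks of each stage's work sits on the critical path.
    """
    return sum(duration / chunks for duration in stages.values())


NUM_CHUNKS = 11  # illustrative chunk count, not a figure from the talk

full_wait = sequential_latency(STAGES)
first_output = streaming_first_response(STAGES, NUM_CHUNKS)

print(f"Sequential: {full_wait:.2f} s before any response")
print(f"Streaming:  {first_output:.2f} s until the first response chunk")
```

As noted in the conversation, streaming does not make the underlying model faster. The total work is the same, and what shrinks is the wait before the first piece of the response arrives.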


Yeah, absolutely, yeah.


We are certainly at the cutting edge here.


And in fact, the fact that you wanted to do it through streaming already makes the perceived latency a lot better than waiting for the full response.


And I think there are so many more things we can do to speed it up.


Yeah, that's where they talk about how they are compared to ChatGPT, and it seems like it's going to be getting even better.


And this is the final clip where they talk about how the future of assistance is going to transpire.


So maybe I want to lead from there to, like your thoughts on the whole voice to voice form factor, right?


Yeah, because the rabbit device is definitely taking us beyond just consuming screens and text in the form of pixels to just interacting more naturally.


So what are your thoughts on the next stage of how people consume and interact with all these AI chatbots and assistants?


Yeah, so I think, at our age, we unfortunately grew up when the dictation engine had not been invented yet.


And then, it was invented, and it was put in use in a horrible way.


I think our current generation are victims of the early days of the dictation engine, the early days of natural language processing, before large language models, of course, and transformers and all that.


So, I think me personally, I identify myself, probably along with everyone here, as having PTSD from the early versions of the dictation engine.


That's why, I guess it creates such a strong impact on our mind that, okay, maybe voice is not a right way to go.


I'd rather type.


But I think our principle is very simple: what's the most intuitive way to communicate, right?


Like, think about everyone.


If we converted this Twitter Space into a typed Twitter Space or, even worse, a fax Twitter Space, a non-instant-message Twitter Space.


I don't think that we can deliver all this information in a relatively short period of time.


So if you think about how humans communicate with humans, before the Neuralink stuff is put to use, natural language, especially conversation by voice, is still the most efficient way.


Now the problem becomes easy, because we just need to fix the PTSD.


But I think if you look at the past three or four years, especially the past three years, a lot of the fundamental infrastructure around that has been significantly improved.


To where the younger generation, especially, I'm not sure if you know how many of the listeners here got like, probably like 5-year-old, 6-year-old, 7-year-old kid, but the younger generation, that they were born like after, I guess, after 2010, I see among all the kids that they actually prefer the dictation icon on the keyboard rather than start typing.


So, I think user behavior across generations is already starting to shift.


And of course, the fundamental reason is that a lot of the infrastructure is good enough, is redundant enough.


So for us, we are not saying that you can only talk to r1; if you shake the r1, the keyboard will pop up.


But if you think about the most intuitive way, and if you're in a rush, there's nothing better than just finding that analog button, pressing and holding, and starting to talk.


So I guess that's our design principle.


You know, we understand the current challenges of the difficulties, but we want to push just a little bit further because the method is not wrong, right?


The approach is not wrong.


It feels wrong because the technology wasn't ready, but I think, like I said, in the past three to four years, a lot of the infrastructure has been significantly improved.

