【AIモデルの進化：Q-learningの役割】英語解説を日本語で読む【2023年11月24日｜@TheAIGRID】

2023年11月25日 16:16

この動画では、Q-learningの機能と重要性を分かりやすく説明しています。Q-learningは強化学習の一種で、AI進化に大きな影響を与える可能性があります。Q-learningは、大規模な言語モデルの静的な知識やバイアスなどの問題を解決し、動的な学習と特定の目標達成に適しています。GoogleのGeminiモデルもQ-learningに似た手法を使用している可能性があり、これにより、AIの新たな進歩が期待されます。
公開日：2023年11月24日
※動画を再生してから読むのがオススメです。

So, this video will get into the exact specifics of how Q-learning works, and it's going to try and break it down in the easiest way possible so you can gain an understanding of why OpenAI's potential breakthrough could be the next evolution in large language models and AI models.

このビデオでは、Q-learningがどのように機能するのか、その具体的な仕組みに迫り、可能な限り簡単な方法でそれを分解し、なぜOpenAIの潜在的なブレークスルーが大規模な言語モデルやAIモデルの次の進化となり得るのかを理解してもらおうと思う。

So, let's waste no time and jump right in.

それでは、時間を無駄にすることなく、さっそく飛び込んでみよう。

So, what is Q-learning?

では、Q-learningとは何でしょうか？

One of the main things we want to talk about is where the name Q* comes from.

私たちが話したいことの一つは、Q*という名前の由来です。

The name Q* likely comes from two sources.

Q*という名前の由来はおそらく2つある。

Firstly, the Q could be a reference to Q-learning, which we will discuss later, and essentially it's a type of machine learning used in reinforcement learning.

第一に、QはQ-learningを指している可能性があります。Q-learningについては後述しますが、基本的には強化学習に使われる機械学習の一種です。

Okay, so that's where the q is from.

なるほど、これがqの由来か。

I'm guessing they're trying to merge this, and I'm going to talk about the second part.

私は、彼らがこれを統合しようとしていると推測しています。

For the second part, essentially the star comes from A* search.

2つ目として、基本的に「スター」はA*探索から来ています。

There was a research paper, I think written in 2019, and the A* search algorithm is a pathfinding and graph traversal algorithm which is widely used in computer science for a variety of problems, especially in games and AI for finding the shortest path between two points.

2019年に書かれたと思われる研究論文がありましたが、A*探索アルゴリズムとは、経路探索とグラフ探索のアルゴリズムで、コンピュータサイエンスにおいて様々な問題、特にゲームやAIで2点間の最短経路を見つけるために広く使われています。

Okay, so I'm going to do that again, but I'm going to show you guys in simpler terms how exactly that works.

さて、それではもう一度、具体的にどのように機能するのか、より簡単な言葉で皆さんにお見せしよう。

Essentially, a simpler definition of Q-learning is that you can think of the name Q* like a nickname for a super smart robot.

基本的に、Q-learningをより簡単に定義すると、Q*という名前は超賢いロボットのニックネームのようなものだと考えてもらえればいい。

And then the Q part is basically like saying this robot is really good at making decisions and that it learns from its experience, just like you would learn if you played a video game a bunch of times.

そしてQの部分は基本的に、このロボットは意思決定がとても上手で、ビデオゲームを何度もプレイして学ぶのと同じように、経験から学ぶということを表しています。

And of course, the more you play, the better it gets at figuring out how to win.

そしてもちろん、遊べば遊ぶほど、どうすれば勝てるかがわかってくる。

Then, of course, we have the simpler definition for A* search.

そしてもちろん、A*探索にはもっとシンプルな定義がある。

Essentially, you just need to think of it like this.

基本的には、こう考えればいい。

So, imagine you're in a maze and you need to find the quickest way out.

迷路の中にいて、最短の出口を見つける必要があるとする。

There's a classic method in computer science, kind of like a set of instructions that help you find the shortest path in a maze, and that is exactly what we call A* search.

コンピュータ・サイエンスには、迷路の中で最短経路を見つけるのに役立つ一連の命令のような古典的な方法があり、それがまさにA*探索と呼ばれるものです。
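
To make the maze picture concrete, here is a minimal sketch of A* search on a small grid. The maze layout, the `a_star` function name, and the choice of Manhattan distance as the heuristic are all illustrative assumptions, not something shown in the video.

```python
import heapq

def a_star(grid, start, goal):
    """Return the number of steps on a shortest path from start to goal, or None.

    grid is a list of equal-length strings; '#' is a wall, anything else is open.
    Moves are up/down/left/right and each costs 1.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates the true cost on this grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Each queue entry is (estimated total cost, cost so far, cell).
    frontier = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue  # a cheaper route to this cell was already found
        row, col = cell
        for r, c in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue  # off the board
            if grid[r][c] == "#":
                continue  # wall
            new_cost = cost + 1
            if new_cost < best_cost.get((r, c), float("inf")):
                best_cost[(r, c)] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic((r, c)), new_cost, (r, c)))
    return None  # the goal cannot be reached

# Hypothetical maze: S is the start (0, 0), G is the goal (2, 3).
MAZE = [
    "S.#.",
    ".##.",
    "...G",
]
steps = a_star(MAZE, (0, 0), (2, 3))
print(steps)
```

The heuristic is what separates A* from a plain breadth-first search: it steers the search toward the exit first, while still guaranteeing a shortest path as long as the estimate never overshoots.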

And of course, once you mix this with deep learning...

そしてもちろん、これにディープラーニングを組み合わせれば...。

And then, you get the computers to learn and improve from the experience.

そして、コンピュータに学習させ、経験から改善させる。

You get a really, really smart system.

本当に、本当に賢いシステムを手に入れることができる。

And it's not just finding the shortest path in the maze.

迷路の最短経路を見つけるだけでなく、もっと厄介な問題も解決できる。

It can solve much trickier problems by finding the best solutions, just like how you might figure out the best way to beat a video game.

ビデオゲームに勝つための最良の方法を見つけ出すのと同じように、最良の解決策を見つけることで、より厄介な問題を解決することができる。

So now, we're going to look at six steps to actually understanding Q-learning because there are six key parts, and they're really simple once they're broken down into these parts.

では、Q-learningを理解するための6つのステップを紹介しよう。

And overall, Q-learning, before we get into these six parts, it's basically like training a pet.

この6つの部分に入る前に、Q-learningは基本的にペットのしつけのようなものです。

If the pet does something good, like sitting on command, you give it a treat.

ペットが何か良いことをしたら、例えば命令通りに座ったら、おやつを与える。

And if it does something not so good, like chewing on your shoes, you say no or ignore it.

そして、あまり良くないこと、例えば靴を噛むようなことをしたら、ダメと言うか無視する。

So that's how the basic of this reinforcement learning actually does work.

これが強化学習の基本です。

You reward them for the good decisions, and then you penalize them for the bad decisions.

良い判断にはご褒美を与え、悪い判断にはペナルティを与える。

So step one in Q-learning is the environment and the agent.

Q-learningにおけるステップ1は、環境とエージェントです。

In Q-learning, you have an environment, like a video game or potentially a maze, and an agent, which is the AI or computer program that needs to learn how to navigate this environment.

Q-learningでは、ビデオゲームや迷路のような環境と、その環境をどのように進むかを学習する必要があるAIやコンピュータープログラムであるエージェントがあります。

So that's just a basis.

これは単なる基礎にすぎません。

We have the agent, and then we have the environment that the agent is going to be in.

エージェントがいて、エージェントがいる環境がある。

Then, of course, we have the states and actions.

そしてもちろん、ステートとアクションがあります。

So with the states and actions, this is where we have the environment.

ステートとアクションがあり、ここに環境がある。

It's going to be made up of different states and different actions that the agent can take.

さまざまな状態と、エージェントがとることのできるさまざまな行動で構成されます。

So essentially, the agent may be able to move left or right, and of course, the different positions that they can take on the board or in said game, which is fairly simple to understand.

つまり、エージェントは左右に動くことができ、もちろんボード上やゲーム内でさまざまなポジションを取ることができます。
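
As a rough illustration of states and actions, the sketch below models a tiny hallway in which the agent can only step left or right. The `Corridor` class, its reward values (1.0 at the goal, -0.1 per other step) and its method names are assumptions made up for this example.

```python
class Corridor:
    """A one-dimensional hallway: states 0..length-1, goal at the right end."""

    ACTIONS = ("left", "right")

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the left end and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        move = 1 if action == "right" else -1
        # Walls at both ends: the agent cannot leave the hallway.
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small penalty for every wasted step
        return self.state, reward, done
```

Here the state is just the agent's position, and the actions are the two moves it can make. Richer environments follow the same pattern with more states and more actions.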

Then, of course, we have something called the Q table.

それからもちろん、Qテーブルと呼ばれるものもあります。

So the Q table is basically like the big cheat sheet that tells the agent what action is best to take in each state.

Qテーブルは基本的に、各状態でどのような行動を取るのが最善かをエージェントに伝える大きなカンニングペーパーみたいなものです。

And at first, this table is filled with guesses because the agent doesn't know the environment yet.

そして最初は、エージェントがまだ環境を知らないので、この表は推測で埋め尽くされます。

So of course, the table isn't going to have all the correct data yet, because the agent hasn't made any moves in the environment.

もちろん、エージェントはまだ環境の中で動いていないので、この表に正しいデータがすべて入っているわけではありません。
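
A Q table can be sketched as a plain dictionary keyed by (state, action) pairs. The helper names `make_q_table` and `best_action`, and the choice of small random starting values as the "guesses", are assumptions for illustration. Starting from all zeros is another common choice.

```python
import random

def make_q_table(n_states, actions, seed=0):
    """Build a table of initial guesses: one small random value per (state, action)."""
    rng = random.Random(seed)
    return {(s, a): rng.uniform(-0.01, 0.01) for s in range(n_states) for a in actions}

def best_action(q_table, state, actions):
    """Look up the cheat sheet: the action with the highest value in this state."""
    return max(actions, key=lambda a: q_table[(state, a)])

ACTIONS = ("left", "right")
q_table = make_q_table(3, ACTIONS)

# Before any learning, the "best" action is only a guess.
print(best_action(q_table, 0, ACTIONS))

# Once the agent has learned that going right from state 0 pays off,
# the lookup reflects that.
q_table[(0, "right")] = 1.0
print(best_action(q_table, 0, ACTIONS))
```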

Then, of course, we have step four, which is learning by doing.

そしてもちろん、ステップ4があります。「やってみて学ぶ」ことです。

So the agent starts to explore the environment, and every time it takes an action in a state, it gets feedback from the environment.

エージェントは環境を探索し始め、ある状態で行動を起こすたびに、環境からフィードバックを得ます。

You get rewarded for the positive points, and you get penalties for the negative points.

ポジティブなポイントには報酬が与えられ、ネガティブなポイントにはペナルティが与えられます。

So this feedback loop helps the agent update the Q table, essentially learning from the experience.

このフィードバック・ループは、エージェントがQテーブルを更新するのに役立ち、基本的に経験から学習します。
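
The video does not say how the agent decides when to explore. One widely used rule, assumed here purely for illustration, is epsilon-greedy: with a small probability the agent tries a random action, otherwise it takes the action its Q table currently rates highest. The function name `choose_action` is hypothetical.

```python
import random

def choose_action(q_values, epsilon, rng):
    """Pick an action for one state.

    q_values maps each available action to its current estimate.
    With probability epsilon the agent explores, otherwise it exploits.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))    # explore: any action
    return max(q_values, key=q_values.get)     # exploit: best known action

rng = random.Random(0)
estimates = {"left": 0.0, "right": 1.0}
print(choose_action(estimates, epsilon=0.0, rng=rng))  # never explores
```

Setting `epsilon` to 0 makes the agent purely greedy, and setting it to 1 makes it purely random. Values in between trade off trying new moves against using what has already been learned.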

So it goes out, it tries to figure out which way it's going to go.

そのため、エージェントは外に出て、どちらに進むかを判断しようとする。

And then, of course, it updates that.

そしてもちろん、それを更新する。

And, of course, that's what we have at step five, which is where you update the Q table.

そしてもちろん、ステップ5ではQテーブルを更新します。

So, the Q table is going to be updated using a formula that considers the current reward and also the potential future rewards.

Qテーブルは、現在の報酬と将来の可能性を考慮した計算式を使って更新されます。

Make sure you pay attention to this part because the potential future rewards is, of course, one of the key things that separate Q-learning from many of the others.

将来の潜在的な報酬は、もちろん、Q-learningを他の多くの学習と区別する重要な点の1つなので、この部分に注意してください。

Okay, so this way the agent doesn't just learn to maximize the immediate rewards, but also to consider the long-term consequences of its actions.

なるほど、このようにしてエージェントは目先の報酬を最大化することだけを学習するのではなく、その行動の長期的な結果も考慮するようになるのですね。
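
The formula is not shown in the transcript, but the standard Q-learning update rule has exactly this shape. In the notation assumed here, s is the current state, a the action taken, r the reward received, s' the next state, α the learning rate, and γ the discount factor that weights future rewards.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

As a worked example with made-up numbers: if Q(s, a) = 0.5, α = 0.1, r = 1, γ = 0.9 and the best value available in the next state is 2.0, the new estimate is 0.5 + 0.1 × (1 + 0.9 × 2.0 − 0.5) = 0.73. The term γ · max Q(s', a') is the "potential future rewards" part. With γ = 0 the agent would only ever care about the immediate reward.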

Because, think about it like this, if you had an AI system which didn't think about long-term rewards, every time it got a reward for doing something good, it would just keep doing that same good thing over and over again.

もし長期的な報酬を考えないAIシステムがあったとしたら、何か良いことをして報酬を得るたびに、同じ良いことを何度も何度も繰り返すでしょう。

And it would just kind of be like this, you know, spiral that wouldn't lead you to the future and long-term better goals.

そうなると、未来や長期的な目標につながらないスパイラルに陥ってしまう。

So, that's why um, this algorithm is really, really cool because it has long-term consequences planned into it.

だから、このアルゴリズムは本当にクールなんだ。

Then, of course, we have number six, which is that over time, with enough exploration and learning, the Q table gets more and more accurate.

もちろん、6番目もあります。時間の経過とともに、十分な探索と学習によって、Qテーブルはますます正確になっていきます。

The agent becomes better at predicting which actions will yield the highest rewards in different states.

エージェントは、異なる状態においてどの行動が最も高い報酬をもたらすかを予測するのがうまくなる。

And eventually, it can navigate the environment very, very effectively.

そして最終的には、非常に効果的に環境をナビゲートできるようになる。
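
The six steps can be put together in a short training loop. This is a minimal sketch under assumed settings (a five-cell hallway, reward 1.0 at the goal and -0.1 per other step, epsilon-greedy exploration, and hand-picked `alpha`, `gamma` and `epsilon`), not the setup used by any particular system.

```python
import random

def train_corridor(length=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hallway where the goal is the rightmost cell."""
    rng = random.Random(seed)
    actions = (-1, +1)           # step left, step right
    goal = length - 1
    q = {(s, a): 0.0 for s in range(length) for a in actions}  # step 3: initial guesses
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Step 4: learn by doing, sometimes exploring at random.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else -0.1
            # Step 5: update using the reward plus the discounted best future value.
            future = 0.0 if next_state == goal else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = train_corridor()
# Step 6: after enough episodes the table points right (+1) in every non-goal state.
policy = [max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)]
print(policy)
```

The learned values shrink with distance from the goal, which is the discount factor at work: a reward that is several steps away is worth less than one that is a single step away.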

Which is why we have this image of an AI that is pretty much a God and is able to do it in the fastest way possible.

だからこそ、ここにほぼ神のようなAIの画像があるわけで、そのAIは可能な限り最速の方法でそれを実行できるのです。

So, overall, you can think of Q-learning like playing a complex video game where over time, you learn the best moves and strategies to get the highest score.

つまり、Q-learningは複雑なビデオゲームをプレイするようなもので、時間をかけて最高得点を得るための最善の動きや戦略を学んでいくものだと考えることができる。

Initially, you're not going to know the best actions to take, but as you play more and more, you can learn from your experience and get better at the game.

最初のうちは、取るべき最善の行動を知ることはできないが、プレーを重ねるにつれて、経験から学び、ゲームを上手にこなせるようになる。

That's what this AI is doing with Q-learning.

このAIがQ-learningでやっているのはそういうことだ。

It's learning from experiences to make the best decisions in different scenarios.

さまざまなシナリオで最善の決断を下すために、経験から学んでいるのだ。

Then, of course, we do have the most likely future of LLMs.

そしてもちろん、LLMの最も可能性の高い未来もある。

Because one thing that I did want to add was that LLMs do have current limitations.

付け加えておきたかったのは、LLMには現在のところ限界があるということだ。

And that's why I do believe that Q* is currently being explored as a viable option for the future of large language models.

そのため、Q*は将来の大規模言語モデルの有力な選択肢として検討されていると思います。

So, please watch this clip from someone at Google DeepMind who talks about how LLMs have these limitations and why these kinds of styles that we're starting to implement and starting to look into are going to be the future of large language models.

Google DeepMindの人が、LLMには限界があること、そしてなぜ私たちが実装し始め、注目し始めているこのようなスタイルが大規模言語モデルの未来になるのかについて語っているクリップをご覧ください。

These Foundation models are World models of a kind.

これらのファウンデーション・モデルは、ある種のワールド・モデルです。

And to do really creative problem solving, you need to start searching.

そして、本当に創造的な問題解決をするためには、探索を始める必要があります。

So, if I think about something like AlphaGo and move 37, the famous move 37, where did that come from?

AlphaGoの有名な37手目のようなものについて考えてみると、あの手はどこから来たのでしょうか？

Did that come from all its data that it's seen of human games or something like that?

人間の対局のデータとか、そういうものから来たのでしょうか？

No, it didn't.

いや、そうではない。

It came from it identifying a move as being quite unlikely but, you know, possible.

可能性はかなり低いが、可能性はあると判断したんだ。

And then, via the process of search, coming to understand that that was actually a very, very good move.

そして、探索のプロセスを経て、その手が実はとてもいい手だと理解するようになった。

So, to get real creativity, you need to search through spaces of possibilities and find these sort of hidden gems.

だから、本当の創造性を得るためには、可能性の空間を探索して、隠れた宝石のようなものを見つける必要がある。

That's what creativity is, I think.

創造性とはそういうものだと思います。

Current language models, they don't really do that kind of thing.

現在の言語モデルでは、そのようなことはできません。

They really are mimicking the data, they're mimicking all the human ingenuity and everything which they have seen from all this data that's coming from the internet, that's originally derived from humans.

彼らは本当にデータを模倣しているのであって、インターネットから送られてくるデータ、元々は人間から得られたデータから見た人間の創意工夫やあらゆるものを模倣しているのだ。

If you want a system that can go truly beyond that and not just generalize in novel ways, so, you know, these models can blend things, they can do, you know, Harry Potter in the style of a Kanye West rap or something, even though it's never happened, they can blend things together.

斬新な方法で一般化するだけでなく、真にそれを超えるシステムを求めるとしましょう。これらのモデルは物事を融合させることができ、実際には存在しなくても、カニエ・ウエストのラップのスタイルでハリー・ポッターを表現したりできます。

But to do something that's truly creative, that is not just a blending of existing things, that requires searching through a space of possibilities and finding these hidden gems that are sort of hidden away in there somewhere.

しかし、真に創造的なことをするためには、既存のものをただ混ぜ合わせるのではなく、可能性の空間を探索して、どこかに隠れている宝石を見つける必要がある。

And that requires search.

そのためには探索が必要だ。

So, I don't think we'll see systems that truly step beyond their training data until we have powerful search in the process.

ですから、その過程に強力な探索が組み込まれるまでは、学習データを真に超えるシステムは出てこないと思います。

So, in this part of the video, I do want to talk about some of the limitations of large language models because there are quite a few.

このビデオでは、大規模言語モデルの限界についてお話ししたいと思います。

So, one of the biggest things that you may not know about LLMs, and we're going to get into the benefits of Q-learning and how it compares to LLMs, is, of course, the data dependency.

これからQ-learningの利点や、LLMと比較してどうなのかについて説明していきますが、LLMについてあまり知られていない最も大きなことの1つは、もちろんデータ依存性です。

So, traditional LLMs require massive amounts of data for training.

従来のLLMは、トレーニングのために膨大な量のデータを必要とする。

They learn from examples in this data, which means their knowledge and abilities are limited to what's present in the training set.

つまり、彼らの知識や能力はトレーニングセットに含まれるものに限定されるのです。

There was even a recent paper. I can't find it, but if I do find it, I will leave a link in the description, because it's going to be one entire article on the website.

最近の論文もありました。今は見つけられないのですが、もし見つけたら、ウェブサイトの記事全体になるので、説明文にリンクを残すつもりだ。

Essentially, in that paper, they talk about how large language models cannot generalize to information that they haven't seen in their training data, which basically just means that these large language models are only as good as their training data.

基本的に、その論文では、大規模な言語モデルが訓練データで見たことのない情報に対して汎化できないことについて話しています。つまり、大規模言語モデルの良し悪しは訓練データ次第だということです。

And essentially, we've explored this concept before with Microsoft's Phi-1.

基本的に、このコンセプトは以前マイクロソフトのPhi-1で検討したことがある。

Essentially, it was a very, very small model and it was able to do coding much better than some of the large language models, and it was trained on only specific coding stuff, and it was able to excel at that.

基本的に、Phi-1は非常に小さなモデルでしたが、一部の大規模な言語モデルよりもはるかに優れたコーディングを行うことができました。コーディングに特化したデータだけで訓練され、その分野で優れた成果を上げたのです。

And basically, what this means is that if you don't have good data, your LLM is going to do horribly.

基本的に、このことが意味するのは、良いデータがなければ、LLMはひどい結果になるということです。

But if you have good data, it's going to do really good.

しかし、良いデータがあれば、とても良い結果が得られる。

But of course, that comes with some other limitations.

しかしもちろん、これにはいくつかの制限がある。

Okay, of course, we have static knowledge.

もちろん、静的知識もあります。

Okay, so static knowledge is once trained, LLMs have a fixed knowledge base.

静的知識とは、一度訓練されると、LLMは固定された知識ベースを持つということです。

So, they can't learn or update their knowledge after training, which means they can become outdated as the world changes.

つまり、LLMはトレーニング後に知識を学んだり更新したりすることができない。

So, of course, you can see here, knowledge cut off September 2023, and that means that currently it can't get any more data because we're now in November.

もちろん、ここで2023年9月に知識はカットオフされ、現在は11月なので、これ以上データを取得することはできないということです。

Not sure when you're watching this, but if OpenAI doesn't decide to update it, it means you're going to be stuck without that new update.

あなたがこれをいつ見ているかはわかりませんが、もしOpenAIが更新することを決めなければ、あなたは新しい更新がないと立ち往生することになります。

So, static knowledge isn't entirely great because, as you know, things change day by day, every second, every minute, the world is changing.

つまり、静的な知識は必ずしも素晴らしいものではないのです。ご存知のように、物事は日々変化し、毎秒、毎分、世界は変化しています。

And if these AI algorithms are going to be really good, they need to be able to adapt rapidly to that changing world.

AIアルゴリズムが本当に優れたものになるには、変化する世界に迅速に適応できなければならない。

So, that's why, for traditional LLMs, this is, of course, a bottleneck, a limitation.

だから、従来のLLMにとって、これはもちろんボトルネックであり、限界なのです。

Then we have context understanding.

次に、コンテキストの理解だ。

While they're good at understanding and generating human-like text, they sometimes struggle with understanding the deeper context or intent behind the query, especially if it's complex or very specific.

人間のようなテキストを理解し、生成することは得意ですが、クエリの背後にある深い文脈や意図を理解することに苦労することがあります。

And that is something that happens when you're dealing with LLMs.

これは、LLMを扱っているときに起こることです。

In addition, we do have bias and fairness, which is something that is really, really prevalent in AI.

さらに、AIには偏見と公平性があります。

And essentially, the problem, the bottleneck, is the data.

基本的に、問題はデータがボトルネックになっていることです。

So, when you train an LLM on a specific data set, it's going to be geared to that data set.

つまり、LLMを特定のデータセットで訓練すると、そのデータセットに合わせたものになります。

So, for example, if you train it on data that only shows it a certain type of car, and every time it's seen that car, it thinks the car is orange because it's only ever seen that car in the orange color, it's going to be really hard to get that AI model to think of the car in any other color.

例えば、ある種類の車しか見せないデータで訓練した場合、その車を見るたびに、その車はオレンジ色だと思ってしまいます。

So, just think of that in that kind of bias.

つまり、そのようなバイアスがかかっていると考えてください。

So, AI systems can have two kinds of biases: cognitive biases and the lack of complete data.

つまり、AIシステムには2種類のバイアスがあるのです。認知バイアスと、完全なデータの欠如です。

So, if data isn't complete, it's not going to be representative.

つまり、データが完全でなければ、それは代表的なものにはなりません。

So, like I said, if you don't have all the colors of the car, then it's not going to be representative of all the colors that it could potentially have.

つまり、私が言ったように、車の色がすべて揃っていなければ、その車が持つ可能性のあるすべての色を代表することにはならないのです。
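
The orange-car example can be reproduced with a deliberately naive "model" that just remembers the most common colour it saw for each car. The function name `train_colour_model` and the data are invented for this sketch, and real language models are far more complex, but the failure mode is the same in spirit: the model can only reflect what its training data contained.

```python
from collections import Counter

def train_colour_model(examples):
    """examples is a list of (car, colour) pairs; return the most-seen colour per car."""
    counts = {}
    for car, colour in examples:
        counts.setdefault(car, Counter())[colour] += 1
    return {car: seen.most_common(1)[0][0] for car, seen in counts.items()}

# Incomplete data: this car only ever appears in orange.
skewed_data = [("hatchback", "orange")] * 50
skewed_model = train_colour_model(skewed_data)
print(skewed_model["hatchback"])

# More representative data changes what the model believes.
fuller_data = skewed_data + [("hatchback", "blue")] * 80
fuller_model = train_colour_model(fuller_data)
print(fuller_model["hatchback"])
```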

And of course, there are cognitive biases, which are things that could seep into the machine learning algorithms via the designers unknowingly introducing them to the model, or via a training data set that includes those biases.

もちろん、認知バイアスもあります。設計者が知らず知らずのうちにモデルやトレーニングデータセットにバイアスを導入することで、機械学習アルゴリズムにバイアスが入り込む可能性があります。

So, bias is something that is really hard, and, you know, it's really a big problem, and it's something that people are trying to solve by making LLMs as unbiased as possible.

バイアスは本当に難しいものですが、本当に大きな問題で、人々はLLMをできるだけバイアスのないものにすることで解決しようとしています。

But it's not something that is easy to solve.

しかし、簡単に解決できることではありません。

Of course, we have the lack of adaptation, which I've already discussed.

もちろん、適応の欠如はすでに述べたとおりだ。

And of course, now we need to get into the pros of Q-learning or Q*, which could be GPT-5.

そしてもちろん、Q-learningやQ*（GPT-5かもしれない）の長所にも触れる必要がある。

So, of course, we have dynamic learning.

もちろん、ダイナミックな学習があります。

So, Q-learning can continuously learn and adapt based on new data or interactions.

つまり、Q-learningは新しいデータや相互作用に基づいて継続的に学習し、適応することができる。

That means it can update its knowledge and strategies over time, staying more and more relevant, which is, of course, what we talked about before, something that we're going to need to do.

つまり、時間の経過とともに知識や戦略を更新し、より適切な状態を保つことができるのです。

Then, of course, we have the optimization of decisions.

そしてもちろん、意思決定の最適化もある。

Learning is always about finding the best decisions to achieve a goal, which can lead to more effective and efficient decision-making processes in various applications.

学習とは常に、目標を達成するための最良の決断を見つけることであり、それは様々な用途において、より効果的で効率的な意思決定プロセスにつながる。

And with Q-learning, that's clearly what it's going to be able to do over time.

Q-learningを使えば、それが長期にわたって可能になることは明らかだ。

Then, of course, this is the main thing about Q-learning, is the fact that we have specific goal achievement.

それからもちろん、これがQ-learningの最大の特徴なのですが、具体的な目標達成があるということです。

And Q-learning models are goal-oriented, making them suitable for tasks where a clear objective needs to be achieved, unlike the general purpose of traditional LLMs.

そしてQ-learningモデルはゴール志向であるため、汎用的な従来のLLMとは異なり、明確な目的を達成する必要があるタスクに適している。

So, essentially, the reason that this is going to be really good is because when you apply this to other things that require goals, so for example, maybe we could apply it to self-driving, maybe we could apply it to AI agents on computers that are actually going to be able to have a complete end goal.

つまり、本質的に、これが本当に良いものになる理由は、目標を必要とする他のものにこれを適用した場合、例えば、自動運転に適用できるかもしれないし、コンピュータ上のAIエージェントに適用できるかもしれない。

Maybe the end goal is going to be a video, maybe the end goal is going to be an entire article, maybe the end goal is going to be building an entire business.

最終目標は動画かもしれないし、記事全体かもしれないし、ビジネス全体を構築することかもしれない。

That's where we have the specific goal achievement.

そこで具体的な目標が達成される。

That's where you get that next leap up in ability in AI systems, which is why this could be really, really next level.

これがAIシステムの能力を飛躍的に向上させる理由であり、これは本当に、本当に次のレベルになる可能性があります。

Then, of course, we have something about the systems in which companies are already on this.

そしてもちろん、企業がすでに取り組んでいるシステムについてもご紹介します。

So, on the 28th of June 2023, Mr. Hassabis says that the company is working on a system called Gemini.

2023年6月28日、ハサビス氏は、同社がGeminiと呼ばれるシステムに取り組んでいると述べている。

If you haven't heard of Gemini before, it's Google's next huge large language model.

Geminiを聞いたことがない人は、Googleの次の巨大言語モデルだ。

It is predicted to beat GPT-4 across all benchmarks, and it's going to be using a method called tree search, which is going to be able to explore and remember possible scenarios, which is quite similar to Q-learning.

あらゆるベンチマークでGPT-4を打ち負かすと予測されており、ツリーサーチと呼ばれる手法を使う予定だ。これは起こり得るシナリオを探索して記憶できるもので、Q-learningによく似ている。

So, they're moving away from the standard methods, and they're now trying to think about advanced techniques where they can essentially explore and remember multiple different things.

つまり、彼らは標準的な手法から離れ、本質的に複数の異なる事柄を探索し、記憶することができる高度なテクニックを考えようとしているのだ。
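
How Gemini actually searches is not described in the video, so the following is only a generic illustration of "explore and remember" and not a description of Google's method. It searches the game tree of a toy stone-taking game (an assumption chosen for brevity) and uses `functools.lru_cache` to remember every position it has already evaluated.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def can_win(stones):
    """True if the player to move can force a win.

    Toy rules (hypothetical): players alternate taking 1, 2 or 3 stones,
    and whoever takes the last stone wins.
    """
    # Explore every legal move; a position is winning if some move
    # leaves the opponent in a losing position.
    return any(not can_win(stones - take) for take in (1, 2, 3) if take <= stones)

losing_positions = [n for n in range(1, 13) if not can_win(n)]
print(losing_positions)  # [4, 8, 12]
```

Without the cache, the same positions would be re-explored many times. With it, each position is solved once and then simply looked up, which is the "remember" half of the idea.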

Now, if you find that a bit confusing, you should take a look at AlphaGo and how it's going to impact the future of AI, because Go was essentially a game that researchers thought AI couldn't solve.

少しわかりにくいと感じたら、AlphaGoと、それがAIの未来にどのような影響を与えようとしているのかを見ていただきたい。囲碁は本質的に、研究者たちがAIには解けないと考えていたゲームだったからだ。

Because essentially with AI, what the problem was was that the moves in Go were essentially uncomputable.

というのも、本質的にAIで問題だったのは、囲碁の手は本質的に計算し尽くせないということだった。

They couldn't just remember every single move.

すべての手を記憶することはできなかった。

They had to get the system to think.

システムに考えさせる必要があった。

And essentially, the problem was that there are more possible Go positions than there are, I think, atoms in the universe or grains of sand on the beach.

そして本質的な問題は、囲碁の可能な局面の数が、宇宙の原子や浜辺の砂粒の数よりも多いということだった。

It's something absolutely crazy when you look at the statistics.

統計データを見ると、まったくクレイジーなことです。

So, this was something that researchers thought that they were never going to solve.

だから研究者たちは、これは絶対に解けないと考えていた。

But of course, the AI managed to solve it.

しかしもちろん、AIはそれを解決することに成功した。

So, I would say take a look at the quick trailer, which I'm going to show you guys.

これからお見せする予告編をご覧ください。

I don't want to spoil it for you.

ネタバレはしたくない。

It is honestly riveting to see this kind of content.

このような内容を見るのは、正直なところ引き込まれる。

But it was something that happened a while back that people do forget if they aren't particularly plugged into the AI space.

しかし、それは少し前に起こったことで、特にAI分野に精通していなければ、人々は忘れてしまう。

Standing challenge of artificial intelligence.

人工知能への挑戦。

Everything we've ever tried in AI just falls over when you try the game of Go.

囲碁のゲームに挑戦すると、これまでAIで試みられてきたことのすべてがひっくり返る。

The number of possible configurations of the board is more than the number of atoms in the universe.

碁盤の可能な配置の数は、宇宙に存在する原子の数よりも多いのだ。
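
The size of that number is easy to check with a rough upper bound. A 19×19 board has 361 points and each can be empty, black or white, giving at most 3^361 arrangements. Not all of these are legal positions, and the figure of about 10^80 atoms in the observable universe is a commonly quoted estimate used here as an assumption.

```python
BOARD_POINTS = 19 * 19              # 361 intersections on a standard board
upper_bound = 3 ** BOARD_POINTS     # each point is empty, black, or white
ATOMS_ESTIMATE = 10 ** 80           # assumed order-of-magnitude estimate

print(len(str(upper_bound)))        # 173 decimal digits
print(upper_bound > ATOMS_ESTIMATE) # True
```

Even this crude bound is more than ninety orders of magnitude larger than the atom estimate, which is why memorising every position was never an option.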

AlphaGo found a way to learn how to play Go.

アルファ碁は囲碁の打ち方を学ぶ方法を見つけた。

So far, AlphaGo has beaten every challenge we've given it.

これまでのところ、アルファ碁は私たちが与えたすべての挑戦を破ってきた。

But we won't know its true strength until we play somebody who is at the top of the world.

しかし、世界の頂点に立つ誰かと対戦するまで、その真の強さはわからない。

Likely, so.

そうですね。

Then, of course, we had the significance of move 37, which was, I think, a one-in-10,000 move that nobody expected from an AI, where it seemed to exhibit a bit of creativity.

37手目は、AIがちょっとした創造性を発揮した、1万手に1手のような、誰も予想していなかった手だったと思います。

And many people weren't expecting this.

そして、多くの人がこれを予想していませんでした。

So, there's also multiple videos where they talk about move 37, which was something that, of course, we didn't expect.

だから、37手目について話しているビデオも複数ある。

And of course, this brings us to this point.

そしてもちろん、これは私たちにこの点をもたらしている。

Google delays the release of Gemini AI to Q1 of 2024.

GoogleはGemini AIのリリースを2024年の第1四半期に延期する。

So, this might be in response to the fact that it might be harder than they think.

これは、彼らが考えているよりも難しいという事実に対応するためかもしれない。

Maybe they're changing their angle.

もしかしたら、彼らは角度を変えているのかもしれない。

Maybe they just want to perfect it.

もしかしたら、完成度を高めたいだけなのかもしれない。

Currently, we don't know what the reason is for them delaying this model.

現在のところ、彼らがこのモデルを遅らせる理由が何なのかはわからない。

But what we do know is that Gemini is currently delayed.

しかし、わかっているのは、ジェミニが現在遅れているということだ。

And if this model does come out and it does possess these capabilities, it will be interesting to see how it compares to GPT-4 and if it's going to be similar to Q-learning or how different it's going to be.

そして、もしこのモデルが登場し、これらの能力を持っているとしたら、GPT-4と比較してどうなのか、Q-learningと同じようなものになるのか、それともどれほど違うものになるのか、興味深いところだ。

Of course, we have one of the main questions, and that is, will it be in GPT-5?

もちろん、GPT-5に搭載されるかどうかという大きな疑問もある。

Many sources have already shown us that Sam Altman has already started training the next level of LLMs or AI systems.

多くの情報筋が、サム・アルトマンがすでに次のレベルのLLMやAIシステムのトレーニングを始めていることを示している。

So, will GPT-5 contain this Q*, or is it just going to be something that is in future models like GPT-6?

では、GPT-5にはこのQ*が搭載されるのか、それともGPT-6のような将来のモデルに搭載されるものなのか。

Either way, it's going to be interesting to see how this entire thing pans out.

いずれにせよ、この全体像がどうなるかは興味深い。

And if this video did help you understand, don't forget to leave a like, subscribe, all that good stuff.

このビデオが理解の助けになったのなら、「いいね！」やチャンネル登録などをお忘れなく。

Check out the full article in the comment section below.

下のコメント欄で記事全文をチェックする。
