

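An overview of "LIMA: Less Is More for Alignment", a groundbreaking 65 billion parameter model. Through unsupervised pre-training followed by fine-tuning with a standard supervised loss, the model performs remarkably well at understanding and following specific response formats.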

Hey, what is up guys?


Welcome back to another YouTube video at the WorldofAI.


In today's video, I'm going to be showcasing Meta AI's new project, which is called Lima, and it stands for Less Is More for Alignment.

Now, this is quite groundbreaking, and I want to showcase it because it's quite innovative in terms of how they trained the language model.


Now, what they've done is present a detailed analysis of language model alignment with Lima, focusing specifically on large language models.


These models typically undergo two stages of training.


First is the unsupervised pre-training, and second is large-scale instruction tuning and reinforcement learning to better align them with specific tasks and user preferences.


Now, what the authors aim to do in this paper is determine the relative importance of these two stages by training a 65 billion parameter language model, which is called Lima.


And this is coming out of Meta AI, and it's expected to be released very shortly.

Now, Lima is fine-tuned on only a thousand carefully selected prompts and responses, and it's trained without any reinforcement learning or human preference modeling, which is something we haven't really seen before.


Now, if you compare this to other models, you don't see this kind of approach to the training set, and this is why I really wanted to showcase this project, as it's quite remarkable how it innovates on both the model and the dataset.


And this is what we're going to be taking a look at in today's video, as we go over what Lima is and how it's able to achieve this.


So with that thought, guys, before we actually get into the video, I just want to put some emphasis on my donor page.


I just want to say thank you guys so much from the bottom of my heart.


I really, really appreciate the support and the love you guys have been giving me.


It really means so much to me, and I promise you that I'm gonna continuously work hard to make sure that you guys are able to get the best content and the best value.


So I really, really appreciate it from the bottom of my heart.


I promise you guys I'm gonna keep working my hardest to make sure that you guys are able to benefit from this channel.


Now, if you guys haven't followed this Twitter page, please do so as I'm going to be posting the latest content over here so that you can get the latest news on the AI world.


And if you guys aren't subscribed, please do so.


Like this video as it'll definitely help the algorithm out, and if you guys haven't seen my previous videos, I would highly recommend that you do so as there's a lot of content that you'll definitely benefit from.


And with that thought, let's get right into the video.


So, guys, as I talked about, Lima was fine-tuned using only a thousand carefully selected prompts and responses.


Now, the model isn't trained with reinforcement learning or human preference modeling; it's simply trained with the standard supervised loss.
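To make "standard supervised loss" a bit more concrete, here is a minimal sketch, assuming it means the usual next-token cross-entropy averaged over the response tokens. The function names and the choice to mask out prompt tokens are illustrative assumptions; the video does not say how the loss was implemented.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(step_logits, target_ids, loss_mask):
    """Mean negative log-likelihood of the target tokens.

    step_logits: one list of vocabulary scores per position.
    target_ids:  the correct next-token id at each position.
    loss_mask:   1 to count a position, 0 to skip it. Skipping prompt
                 tokens is an assumption made for this sketch.
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(step_logits, target_ids, loss_mask):
        if not keep:
            continue
        probs = softmax(logits)
        total += -math.log(probs[target])
        count += 1
    return total / count if count else 0.0


# Toy example: a 4-token vocabulary with uniform scores, so the model
# is maximally unsure and the loss at the counted position is log(4).
toy_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
toy_loss = supervised_loss(toy_logits, target_ids=[1, 2], loss_mask=[0, 1])
```

In a real training run this would be computed by a deep learning framework over batches of sequences; the sketch only shows what quantity is being minimized.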


Now, surprisingly, Lima demonstrates strong performance, learning to understand and follow specific response formats from only a handful of examples in its training data. That's quite remarkable, as it only has a thousand carefully selected prompts and responses, and it's still able to do such a wide variety of different things.


Now, the training prompts cover a wide range of different tasks, including planning trip itineraries and speculating about alternative history.


These are some examples of what's in the training dataset, and it basically demonstrates that the model is quite versatile in terms of its responses and the different types of text it can generate.


And it shows that the model generalizes well and performs well on unseen tasks that are not present in the training data.


Now, to evaluate Lima's performance, the authors conducted a controlled human study.


And what the study showed is that when Lima's responses were compared with those from GPT-4, Bard, and DaVinci, they were in many cases equivalent, and in some cases actually preferred.


Against GPT-4, Lima's responses were equivalent or preferred in 43% of cases.


Now, when it was compared to Bard, that figure rose to 58%, meaning Lima's response was at least as good as Bard's in 58% of cases.


And against DaVinci, the figure was 65%.


And it basically shows that it's able to be on par with these models, but in certain cases, it's actually even preferred over these models.


Now, based on these findings, what the authors suggest is that the vast majority of a language model's knowledge is acquired during the unsupervised pre-training stage, and that fine-tuning with a limited amount of instruction data is sufficient to teach the model to produce high-quality outputs.


And the fact that it works on unseen tasks that are not present in the training data highlights the importance of pre-training, which lets the model learn general-purpose representations that perform well across various different tasks.


Now, this is quite remarkable, and I just want to give huge props to Meta as well as the researchers on this project for what they've been able to accomplish.


I want to take a look at this actual table over here.


This table in the paper provides a breakdown of the sources of the training prompts.


We see the sources of the training prompts that were used to create the dataset, as well as the test prompts that were used in the study.


Now, the total training data consists of approximately 750k tokens, distributed across a thousand sequences, so roughly 750 tokens per example on average.


Now, what the table provides is a summary of the data that was used to train Lima, including where each part was sourced from and how the tokens are distributed across the thousand sequences.


Now, the authors describe collecting data from three different community question and answer websites.


We can see that in the actual table over here.


It focused on Stack Exchange, wikiHow, and the Pushshift Reddit dataset.

Now, I'm going to explain a little bit more about these different data sources.


Now, the data collected from Stack Exchange and wikiHow was found to be well aligned with the desired behavior of a helpful AI agent.

Now, these websites typically provide informative and helpful answers to user questions.


Now, as a result of this, the researchers were able to mine the data automatically from these sources, meaning that they could extract the prompts and responses without much manual intervention.


Now, on the other hand, you have the Pushshift Reddit dataset, which contains highly upvoted answers from Reddit. Those are often upvoted for their humor, so whatever is funny or trending in a Reddit thread is going to get a lot of upvotes.


Now, these types of responses do not align well with the desired behavior of a helpful AI bot.


So, as a result, they had to curate appropriate responses from that dataset, which required a more manual approach to selection.


And this is what they cover under community question answering, which shows you how they were able to collect their dataset.
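As a rough sketch of what mining question and answer data "automatically" could look like, here is one possible filter. Everything in it is an assumption made for illustration: the record fields (`question`, `answers`, `score`, `text`), the thresholds, and the rule of keeping only the top-scoring answer. The video does not describe the actual filtering rules.

```python
def select_examples(posts, min_score=10, min_chars=200, max_chars=4000):
    """Keep one prompt/response pair per question if it passes simple filters.

    posts: list of dicts shaped like
        {"question": str, "answers": [{"text": str, "score": int}, ...]}
    All field names and thresholds here are hypothetical.
    """
    selected = []
    for post in posts:
        # Drop answers that the community did not rate highly enough.
        good = [a for a in post["answers"] if a["score"] >= min_score]
        if not good:
            continue
        best = max(good, key=lambda a: a["score"])
        # Drop answers that are too short or too long to be useful.
        if min_chars <= len(best["text"]) <= max_chars:
            selected.append({"prompt": post["question"], "response": best["text"]})
    return selected


sample_posts = [
    {"question": "How do I plan a trip?",
     "answers": [{"text": "x" * 300, "score": 25}, {"text": "y" * 300, "score": 3}]},
    {"question": "Low quality thread",
     "answers": [{"text": "z" * 300, "score": 1}]},
]
kept = select_examples(sample_posts)
```

A source like Reddit, where high scores often reward jokes rather than helpfulness, is exactly where a score-based filter like this falls short and manual selection is needed instead.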


Let's focus on the next step and how they were able to actually train Lima.


Now, to train Lima, the researchers followed a specific protocol, and they began with the LLaMA 65 billion parameter model.


And what they did is fine-tune it on their alignment training set, which consisted of only a thousand examples.


Now, in order to distinguish between the different speakers, such as the user and the AI assistant, they created a special token called the end-of-turn (EOT) token.


And this token was placed at the end of each utterance in the training data.


Now, this token serves the same purpose as an end-of-sequence (EOS) token, which signals the end of text generation, but the EOT token was introduced specifically to avoid any confusion with whatever meaning the pre-trained model had already given to the existing EOS token.


Now, by introducing this new EOT token, the researchers ensured that Lima could differentiate between the user's utterances and the assistant's during training.


And that made the alignment and learning process easier during training.


And obviously, this allows the model to understand and respond appropriately to different types of prompts, effectively and efficiently.
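Here is a minimal sketch of the end-of-turn idea, assuming a plain-text marker appended after every utterance. The string `<EOT>` and the helper names are hypothetical; the video does not give the actual token string, and in a real model it would be a dedicated vocabulary entry rather than literal text.

```python
EOT = "<EOT>"  # hypothetical spelling of the end-of-turn token


def format_dialogue(turns):
    """Concatenate (speaker, text) turns, closing each utterance with EOT."""
    return "".join(f"{text}{EOT}" for _speaker, text in turns)


def split_dialogue(formatted):
    """Recover the individual utterances from a formatted dialogue."""
    parts = formatted.split(EOT)
    return parts[:-1]  # the piece after the final EOT is always empty


dialogue = [
    ("user", "Plan a two-day trip to Lima."),
    ("assistant", "Day one: the historic centre. Day two: the coast."),
]
formatted = format_dialogue(dialogue)
utterances = split_dialogue(formatted)
```

Because every utterance ends with the same marker, the model can learn to stop generating at the end of its own turn without reusing the EOS token that already had a meaning during pre-training.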


This figure in the paper presents the results of the human preference evaluation, which compared Lima's performance against five different baselines.


Now, this evaluation was conducted with 300 test prompts, which you can see over here.


And the purpose of this evaluation was to assess how well Lima performed in comparison to these different baselines, such as GPT-4, Claude, and so on.


Now, the participants in the evaluation were presented with a test prompt as well as a response generated by Lima and one by a baseline.


Now, they were asked to indicate which response they preferred, so they were comparing which model generated the better response.


And the figure gives a visual representation of the results, showing the percentage of cases in which Lima's responses were equivalent to or preferred over those of each baseline.
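To show how an "equivalent or preferred" percentage can be computed from pairwise judgments, here is a small sketch. The label names and the per-prompt counts are made up for illustration; they are chosen only so that the result lands on the 43% figure quoted for GPT-4, and they are not the study's actual breakdown.

```python
from collections import Counter


def win_or_tie_rate(labels):
    """Fraction of prompts where Lima's response was preferred or tied.

    labels: one of "lima", "tie", or "baseline" per test prompt
            (hypothetical label names).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to score")
    return (counts["lima"] + counts["tie"]) / total


# Hypothetical judgments over 300 test prompts.
judgments = ["lima"] * 54 + ["tie"] * 75 + ["baseline"] * 171
rate = win_or_tie_rate(judgments)
```

The point is that the headline number combines outright wins and ties, so a model can reach 43% "equivalent or preferred" while still losing more head-to-head comparisons than it wins.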


Now, what the study shows is that despite training on significantly more data, 52 times more in fact, the Alpaca 65 billion parameter model tends to generate less preferable outputs compared to Lima.


And similarly, DaVinci, which is trained with reinforcement learning from human feedback, a supposedly superior alignment method, also produces less preferable outputs than Lima.


And this is quite remarkable, as Lima has been able to give better responses than these well-known models.


Now, obviously, it's not quite on par with GPT-4 or Bard yet, but in a meaningful share of cases its outputs are equivalent or even preferred, which shows that it's slowly but surely closing the gap with these big models.


Now, in the paper, the authors conducted different experiments to explore the impact of data diversity, quality, and quantity on the alignment process.


They investigated the question of why less is more in terms of training language models.


Now, through these experiments, the researchers observed that when it comes to alignment, scaling up the diversity of training data has a significant effect.


They find that increasing the diversity of training prompts, rather than simply increasing the quantity of data, plays a crucial role in improving alignment.


Furthermore, they also examined the effects of data quality in the alignment process.


Now, I won't go into the specific details of their findings here, but they were able to see that higher-quality data tends to give better alignment results.


And this is something that you can see in the research paper and get a better idea of later on if you want to check this out.


Overall, I just wanted to showcase this project, as what Meta has been able to accomplish in terms of training its model is really innovative.


And the analysis presented in this paper emphasizes the effectiveness of unsupervised pre-training in language models, and that's quite innovative as a way of thinking about how these models are trained.


We're definitely going to see more of this in another video in the future.


And I'm definitely going to be showcasing more about Lima in the future as it continues to evolve.


And with that thought, guys, thank you so much for watching.


I hope you found this video quite informative.


I'm definitely going to be posting more videos and going over different types of research papers so we can get a better idea of these different models.


So with that thought, guys, make sure you give this Twitter account a follow so you stay updated.


And if you guys haven't subscribed, please do so, as you will definitely benefit from it.


And if you guys haven't seen any of my previous videos, definitely do so.


Like this video, and I'll definitely see you guys next time.


Have an amazing day, spread positivity.


I'll catch you guys later.


Peace out, fellas.

