LiteLlama [A Lightweight Version of the Open-Source LLM Llama 2]

えんぞう

March 10, 2024, 22:14

Introduction

The other day, belatedly, I tried Llama 2, the free-to-use open-source model comparable to ChatGPT.

I was impressed that an LLM could run on an ordinary PC, but it turns out there is an even lighter model.

It is called "LiteLlama".

LiteLlama is an open-source reproduction of Meta AI's LLaMa 2 with a significantly reduced model size: LiteLlama-460M-1T has 460 million parameters and was trained on 1T tokens.

Its parameter count is smaller than TinyLlama's 1.1B, at just 0.46B.

That is quite small. Can something this small really work?
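
The 0.46B figure can be checked once the model is loaded (as in the code section below). The following is a minimal sketch, not something taken from the model card: `count_parameters` and `format_billions` are helper names introduced here for illustration, and the helper only assumes an object whose `parameters()` yields tensors with a `numel()` method, as PyTorch modules do.

```python
def count_parameters(model):
    """Sum the element counts of every parameter tensor in the model."""
    return sum(p.numel() for p in model.parameters())


def format_billions(n_params):
    """Render a raw parameter count in billions, e.g. 460_000_000 -> '0.46B'."""
    return f"{n_params / 1e9:.2f}B"
```

With the model loaded, `format_billions(count_parameters(model))` should give a value close to 0.46B if the published size is accurate.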

The model was developed by Xiaotian Han of the DATA Lab at Texas A&M University under the supervision of Prof. Xia "Ben" Hu, and it is released under the MIT License.

I tried it right away.

Installation and usage

Usage is simple. Install transformers with pip. The code below also imports torch, so PyTorch needs to be installed as well.

pip install transformers --upgrade
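
Before running the script, it may help to confirm that both `transformers` and `torch` are actually installed. A minimal sketch using only the standard library; `installed_version` is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "torch"):
    print(name, installed_version(name) or "not installed")
```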

Code

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hugging Face Hub ID of the model
model_path = 'ahxt/LiteLlama-460M-1T'

# Download (on first run) and load the weights and the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()  # inference mode

prompt = 'What is the future of AI?'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# max_length limits the total sequence length (prompt + generated tokens)
tokens = model.generate(input_ids, max_length=300)
print(tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True))

Output

What is the future of AI?

\section{Introduction}

The rise of AI has been a major driver of the development of the next generation of technologies. The rise of AI has been accompanied by a proliferation of AI-related publications, which have been published in peer-reviewed journals and conferences. The number of AI-related publications has increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years. The number of AI-related publications has also increased significantly in the last few years.

It worked! The output does read as a response to the question.

Japanese translation

AIの未来はどうなるのか？

\section{はじめに}

AI の台頭は、次世代テクノロジーの開発の主要な推進力となっています。 AI の台頭は、AI 関連の出版物の急増を伴い、査読付きのジャーナルや会議で発表されています。 AI 関連の出版物の数はここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。 AI 関連の出版物の数もここ数年で大幅に増加しました。

It keeps repeating the same sentence, but as an answer it does not look bad.
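
Loops like this are common when a small model is decoded greedily, which is presumably what the `generate` call above does with its default settings. One thing to try is passing `repetition_penalty` and `no_repeat_ngram_size` to `generate`. Both are existing `transformers` generation parameters, but the values below are untested guesses, and `ANTI_REPETITION_KWARGS`, `repeated_sentences`, and `generate_with_penalty` are names introduced for this sketch.

```python
from collections import Counter

# Assumed starting values; tune them for your own prompts.
ANTI_REPETITION_KWARGS = {
    "max_length": 300,          # same limit as the original script
    "repetition_penalty": 1.2,  # values above 1.0 penalize tokens that already appeared
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
}


def repeated_sentences(text):
    """Return {sentence: count} for sentences that occur more than once."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return {s: n for s, n in Counter(sentences).items() if n > 1}


def generate_with_penalty(prompt, model_path="ahxt/LiteLlama-460M-1T"):
    """Generate text with the anti-repetition settings above."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    tokens = model.generate(input_ids, **ANTI_REPETITION_KWARGS)
    return tokenizer.decode(tokens[0], skip_special_tokens=True)
```

Running `repeated_sentences` on the output above would flag the sentence about AI-related publications, and comparing its result before and after the change gives a quick check on whether the settings help.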

Resource usage

top - 22:56:30 up  1:37,  3 users,  load average: 0.85, 0.34, 0.24
Tasks: 238 total,   2 running, 236 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.0 us,  0.7 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 97.0 us,  3.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 99.7 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16154844 total, 10473892 free,  3698700 used,  1982252 buff/cache
KiB Swap:  8191996 total,  8191996 free,        0 used. 12151056 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5981 root      20   0 5387800   2.1g 114112 R 200.0 13.9   0:16.10 python
  293 root      20   0   52372   1944   1380 S   1.3  0.0   1:42.50 plymouthd

Resident memory usage was only about 2 GB (2.1g RES in top).
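
That figure is in line with a back-of-the-envelope estimate. Assuming the weights are held as 32-bit floats (which I believe is the default for `from_pretrained` when no dtype is given, though I did not verify it for this run), 460 million parameters need about 1.84 GB for the weights alone. The remainder would be the Python runtime, libraries, and activations. `weight_bytes` is a helper name made up for this sketch.

```python
def weight_bytes(n_params, bytes_per_param=4):
    """Estimated memory for the weights alone (4 bytes per parameter = float32)."""
    return n_params * bytes_per_param


n_params = 460_000_000            # from the model name LiteLlama-460M
fp32 = weight_bytes(n_params)     # 1_840_000_000 bytes
fp16 = weight_bytes(n_params, 2)  # 920_000_000 bytes, if half precision were used

print(f"float32: {fp32 / 1e9:.2f} GB ({fp32 / 2**30:.2f} GiB)")
print(f"float16: {fp16 / 1e9:.2f} GB ({fp16 / 2**30:.2f} GiB)")

# Cross-check against top: 13.9 %MEM of 16154844 KiB total memory.
res_gib = 0.139 * 16154844 / 2**20
print(f"RES implied by %MEM: {res_gib:.2f} GiB")
```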

True to the "Lite" in its name, it is quite lightweight.

The time to produce the answer was 1m30s, which is also faster than Llama 2.
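
To put a number like this on a firmer footing, the generation call can be timed and converted to tokens per second. This is a rough sketch: `timed` and `tokens_per_second` are helper names introduced here, and the 300-token figure is only an upper bound taken from `max_length=300` (which, as I understand it, also counts the prompt tokens), not a measured count.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def tokens_per_second(n_tokens, seconds):
    """Average throughput for a finished generation."""
    return n_tokens / seconds


# Upper bound for the run above: at most 300 tokens in 1m30s (90 s).
print(f"{tokens_per_second(300, 90):.1f} tokens/s at most")
```

With the script above, `tokens, seconds = timed(model.generate, input_ids, max_length=300)` followed by `tokens_per_second(tokens.shape[1], seconds)` would give the measured rate, prompt tokens included.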

Being lightweight, it seems to run fast as well.

Summary

I tried LiteLlama, a lightweight version of Llama 2.

The CPU version of Llama 2 also ran on a PC with ordinary specs, but this time the model ran on even fewer resources.

The output is disappointing compared with Llama 2's, but given that it runs on so few resources, there seem to be plenty of uses for it.

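I could not find a fine-tuning procedure for LiteLlama, but with a model this light, fine-tuning should be easy to do, and building a domain-specific LLM on my own looks within reach, so I am looking forward to trying it.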
