Trying Llama-2-70B-chat-GPTQ on Google Colab

npaka

September 12, 2023, 14:38

I tried running "Llama-2-70B-chat-GPTQ" on "Google Colab", and this post summarizes the steps.

[Note] This was verified on the A100 in Google Colab Pro/Pro+.

1. Llama-2-70B-chat-GPTQ

This post uses "TheBloke/Llama-2-70B-chat-GPTQ". As of September 12, 2023, 70B is the largest parameter size among the "Llama 2" models.
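
A rough weight-size estimate helps show why a quantized build is used on a single A100. The sketch below assumes a nominal 70 billion parameters and assumes the quantized weights use 4 bits each (neither figure is stated in this post). It counts the weights only, so activations, the KV cache, and quantization metadata would add to the real footprint.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9  # nominal parameter count (assumption)

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized half precision
gptq_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantization (assumption)

print(f"fp16 weights : {fp16_gb:.0f} GB")
print(f"4-bit weights: {gptq_gb:.0f} GB")
```

Under these assumptions the half-precision weights alone would need about 140 GB, while the 4-bit weights need about 35 GB.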

2. Running on Colab

The steps for running it on Colab are as follows.

(1) Open a Colab notebook and select "A100" under "GPU" in the menu "Edit → Notebook settings".

(2) Install the packages.
"auto-gptq" is also installed because the model uses GPTQ.

# Install the packages (the specifiers are quoted so the shell does not treat ">" as a redirect)
!pip install "transformers>=4.32.0" "optimum>=1.12.0"
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
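
After installation it can be worth confirming that the installed versions meet the minimums in the pip command. The following is a minimal sketch using only the standard library; the helper names (`as_tuple`, `meets_minimum`, `check_packages`) are hypothetical, and the comparison only looks at the leading numeric major.minor.patch fields, which is a simplification of full version ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the pip command in step (2).
MINIMUMS = {"transformers": "4.32.0", "optimum": "1.12.0"}


def as_tuple(v: str) -> tuple:
    """Turn '4.32.0' into (4, 32, 0), ignoring suffixes such as 'rc1' or 'dev0'."""
    nums = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:  # pad '4.32' to (4, 32, 0)
        nums.append(0)
    return tuple(nums)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return as_tuple(installed) >= as_tuple(minimum)


def check_packages(minimums: dict = MINIMUMS) -> dict:
    """Report 'ok', 'too old', or 'not installed' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


print(check_packages())
```

If a package is reported as too old, rerunning the install cell and restarting the runtime is the usual remedy.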

(3) Prepare the tokenizer and model.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Prepare the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    "TheBloke/Llama-2-70B-Chat-GPTQ",
    use_fast=True
)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-Chat-GPTQ",
    torch_dtype=torch.float16,
    device_map="auto",
    revision="main"
)

(4) Question answering.
start_time and end_time are recorded so the processing speed can be checked later.

import time

# Prepare the prompt
prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What is Bocchi-chan's personality from BOCCHI THE ROCK?[/INST]"""

# Run inference
with torch.no_grad():
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    start_time = time.time()
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=256,
    )
    end_time = time.time()
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
print(output)

Bocchi-chan, the main character of the anime and manga series "Bocchi the Rock," is a unique and interesting character with a distinct personality. Here are some adjectives that might describe her personality:

1. Naive: Bocchi-chan is a bit naive and innocent, often misunderstanding social cues and situations.
2. Optimistic: Despite her naivety, Bocchi-chan is a very optimistic and positive person, always looking on the bright side of things.
3. Determined: Bocchi-chan is a hard worker and is determined to achieve her goals, even if she doesn't always go about it the right way.
4. Socially awkward: Bocchi-chan has difficulty interacting with others, often due to her lack of understanding of social norms and customs.
5. Talented: Bocchi-chan is a very talented musician, able to play the guitar and sing with ease.
6. Shy: Despite her optimism and determination, Bocchi-chan can be quite shy and introverted, especially when it comes to interacting with people

The answer is correct.
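
The prompt in step (4) wraps a system message and a user question in `[INST]`, `<<SYS>>`, and `<</SYS>>` markers. A small helper makes it easier to try other questions. This is a minimal sketch that reproduces the exact layout used in this post; the function name `build_prompt` is hypothetical, and other Llama 2 examples may place whitespace slightly differently.

```python
def build_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in the same layout as the prompt in step (4)."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{user}[/INST]"


# Example with a shortened system message (the full one from step (4) works the same way).
prompt = build_prompt(
    "You are a helpful, respectful and honest assistant.",
    "What is Bocchi-chan's personality from BOCCHI THE ROCK?",
)
print(prompt)
```

The returned string can be passed to `tokenizer.encode` in place of the hand-written prompt.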

(5) Check the processing speed.
The processing speed is 9.35 tokens per second.

# Measure the processing speed (only the newly generated tokens are counted)
response_len = len(output_ids.tolist()[0][token_ids.size(1) :])
total_time = end_time - start_time
print(format(total_time * 1000 / response_len, ".2f"), "ms per token")
print(format(response_len / total_time, ".2f"), "tokens per second")

106.96 ms per token
9.35 tokens per second
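
The two figures are reciprocals of each other: 1000 / 106.96 ms is about 9.35 tokens per second. The measurement can also be packaged for reuse, as in the sketch below. The names `throughput` and `stopwatch` are hypothetical helpers, and `time.sleep` stands in for the `model.generate` call so the snippet runs without a GPU.

```python
import time
from contextlib import contextmanager
from typing import Tuple


def throughput(num_tokens: int, seconds: float) -> Tuple[float, float]:
    """Return (milliseconds per token, tokens per second)."""
    if num_tokens <= 0 or seconds <= 0:
        raise ValueError("num_tokens and seconds must be positive")
    return seconds * 1000 / num_tokens, num_tokens / seconds


@contextmanager
def stopwatch():
    """Measure wall-clock time of the enclosed block; the result is stored under 'seconds'."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start


with stopwatch() as timer:
    time.sleep(0.05)  # stand-in for model.generate(...)

# 5 is a placeholder token count; in the notebook this would be response_len.
ms_per_token, tokens_per_second = throughput(5, timer["seconds"])
print(format(ms_per_token, ".2f"), "ms per token")
print(format(tokens_per_second, ".2f"), "tokens per second")
```

`time.perf_counter` is used here instead of `time.time` because it is intended for measuring short intervals.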
