Trying Llama-2-70B-chat-GPTQ on Google Colab
I tried "Llama-2-70B-chat-GPTQ" on "Google Colab", so here is a summary.
1. Llama-2-70B-chat-GPTQ
This article uses "TheBloke/Llama-2-70B-chat-GPTQ". As of September 12, 2023, 70B is the largest model in the "Llama 2" family by parameter count.
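2. Running on Colab
The steps for running on Colab are as follows.
(1) Open a Colab notebook and, in the menu "Edit → Notebook settings", select "A100" under "GPU".
(2) Install the packages.
Because the model is GPTQ-quantized, "auto-gptq" is installed as well.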
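# Install the packages
# (the version specifiers are quoted so the shell does not treat ">" as output redirection)
!pip install "transformers>=4.32.0" "optimum>=1.12.0"
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ # Use cu117 if on CUDA 11.7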
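After installation, it can be worth confirming that the installed versions actually satisfy the minimums in the pip command above. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `meets_minimum` are made up for this illustration, and the simple numeric parsing assumes plain release versions such as "4.32.0" (suffixes like "rc1" or ".dev0" are ignored).

```python
import re
from importlib import metadata


def parse_version(version):
    """Turn a version such as '4.32.0' into a tuple of ints (4, 32, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '4.32' equals '4.32.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the pip command in this article.
requirements = {"transformers": "4.32.0", "optimum": "1.12.0"}

for name, minimum in requirements.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    print(f"{name}: {installed} (minimum {minimum}) -> {status}")
```

If a package is reported as too old, re-running the install cell and restarting the Colab runtime is usually the first thing to try.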
(3) Prepare the tokenizer and model.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Prepare the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    "TheBloke/Llama-2-70B-Chat-GPTQ",
    use_fast=True
)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-Chat-GPTQ",
    torch_dtype=torch.float16,
    device_map="auto",  # place the weights on the available GPU automatically
    revision="main"
)
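To see why a GPTQ-quantized checkpoint is used here, a rough back-of-the-envelope estimate of the weight memory helps. The sketch below takes "70B" as 70 billion parameters and assumes 4-bit quantized weights (an assumption for illustration, since the bit width is not stated in this article). It ignores quantization scales, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 70e9  # "70B" taken as 70 billion parameters

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized float16 weights
gptq_gb = weight_memory_gb(NUM_PARAMS, 4)   # assumed 4-bit quantized weights

print(f"float16: {fp16_gb:.0f} GB")
print(f"4-bit:   {gptq_gb:.0f} GB")
```

Under these assumptions the float16 weights alone come to about 140 GB, while 4-bit weights come to about 35 GB, which is the difference between needing several GPUs and possibly fitting on one.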
(4) Question answering.
start_time and end_time are recorded so that the processing speed can be checked afterwards.
import time

# Prepare the prompt
prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What is Bocchi-chan's personality from BOCCHI THE ROCK?[/INST]"""

# Run inference
with torch.no_grad():
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    start_time = time.time()
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=256,
    )
    end_time = time.time()

# Decode only the newly generated tokens (everything after the prompt)
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
print(output)
Bocchi-chan, the main character of the anime and manga series "Bocchi the Rock," is a unique and interesting character with a distinct personality. Here are some adjectives that might describe her personality:
1. Naive: Bocchi-chan is a bit naive and innocent, often misunderstanding social cues and situations.
2. Optimistic: Despite her naivety, Bocchi-chan is a very optimistic and positive person, always looking on the bright side of things.
3. Determined: Bocchi-chan is a hard worker and is determined to achieve her goals, even if she doesn't always go about it the right way.
4. Socially awkward: Bocchi-chan has difficulty interacting with others, often due to her lack of understanding of social norms and customs.
5. Talented: Bocchi-chan is a very talented musician, able to play the guitar and sing with ease.
6. Shy: Despite her optimism and determination, Bocchi-chan can be quite shy and introverted, especially when it comes to interacting with people
The answer is correct.
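The prompt used in this step follows a fixed layout: a system message wrapped in <<SYS>> and <</SYS>>, followed by the user message, all enclosed in [INST] and [/INST]. A small helper can make that layout reusable for other questions. This is a minimal sketch that reproduces the layout used in this article. The function name `build_prompt` is hypothetical, and the system message is shortened for illustration.

```python
def build_prompt(system_message, user_message):
    """Build a single-turn prompt in the layout used in this article."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_message}\n"
        "<</SYS>>\n"
        f"{user_message}[/INST]"
    )


# Shortened system message, for illustration only.
prompt = build_prompt(
    "You are a helpful, respectful and honest assistant.",
    "What is Bocchi-chan's personality from BOCCHI THE ROCK?",
)
print(prompt)
```

The resulting string can be passed to tokenizer.encode in the same way as the hand-written prompt.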
(5) Check the processing speed.
The processing speed is 9.35 tokens per second.
# Measure the processing speed
response_len = len(output_ids.tolist()[0][token_ids.size(1) :])  # number of generated tokens
total_time = end_time - start_time  # generation time in seconds
print(format(total_time * 1000 / response_len, ".2f"), "ms per token")
print(format(response_len / total_time, ".2f"), "tokens per second")
106.96 ms per token
9.35 tokens per second
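The two printed figures express the same measurement in different units: milliseconds per token is 1000 divided by tokens per second. The sketch below wraps the calculation in a function and guards against an empty response. The function name `throughput` and the sample numbers in the usage lines are hypothetical, not measurements from this run.

```python
def throughput(num_tokens, elapsed_seconds):
    """Return (milliseconds per token, tokens per second)."""
    if num_tokens <= 0 or elapsed_seconds <= 0:
        raise ValueError("num_tokens and elapsed_seconds must be positive")
    ms_per_token = elapsed_seconds * 1000 / num_tokens
    tokens_per_second = num_tokens / elapsed_seconds
    return ms_per_token, tokens_per_second


# Hypothetical sample values: 200 tokens generated in 20 seconds.
ms_per_token, tokens_per_second = throughput(200, 20.0)
print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")
```

In the notebook, the same function could be called with response_len and total_time from the measurement code.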