The Rakuten model wouldn't run on my machine → now it does

March 21, 2024 23:38

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from torch.quantization import quantize_dynamic

device = torch.device("cpu")  # dynamically quantized Linear layers only run on CPU

# float32 = 4 bytes per parameter, and the whole model is loaded into RAM before quantizing
model = AutoModelForCausalLM.from_pretrained("Rakuten/RakutenAI-7B-chat", torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained("Rakuten/RakutenAI-7B-chat")

# Quantize the model's Linear layers (torch has no qint4; quantize_dynamic accepts qint8 or float16)
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
quantized_model.to(device)
quantized_model.eval()

requests = [
    "「馬が合う」はどう言う意味ですか",  # "What does 'uma ga au' mean?"
    "How to make an authentic Spanish Omelette?",
]

system_message = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user_input} ASSISTANT:"

for req in requests:
    input_req = system_message.format(user_input=req)
    input_ids = tokenizer.encode(input_req, return_tensors="pt").to(device=device)
    
    with torch.no_grad():
        tokens = quantized_model.generate(
            input_ids,
            max_new_tokens=1024,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    out = tokenizer.decode(tokens[0][len(input_ids[0]):], skip_special_tokens=True)
    
    print("USER:\n" + req)
    print("ASSISTANT:\n" + out)
    print()
    print()

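The script above applies dynamic quantization to the model's Linear layers. Conceptually, each weight tensor is stored as 8-bit integers plus a float scale, so it takes roughly a quarter of the float32 memory. The plain-Python sketch below shows symmetric per-tensor int8 quantization on a few made-up weights. It illustrates the idea only and is not PyTorch's actual kernel; the real scheme may differ in details.

```python
# Toy sketch of symmetric per-tensor int8 quantization (plain Python, no PyTorch).
# Illustrates the idea only; this is NOT PyTorch's quantize_dynamic implementation.

def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round to the nearest code and clamp to the int8 range
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.0, 0.9]  # made-up example weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, scale)
print(restored)
```

Each restored value differs from the original by at most half a scale step, which is the precision traded for 1 byte per weight instead of 4.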
I tried running RakutenAI-7B-chat,
but it ate too much memory and the process got Killed.

I hope a quantized model or a GGML version
gets released.
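A back-of-the-envelope calculation explains the Killed message, which most likely comes from the Linux out-of-memory killer. Weights alone need (number of parameters) × (bytes per parameter), and the script above loads them in float32 before quantize_dynamic ever runs, so quantizing afterwards cannot lower that peak. The sketch assumes a round 7 billion parameters taken from the "7B" in the model name (a hypothetical figure, not the exact count) and ignores activations, the KV cache and framework overhead.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: ~7e9 parameters (from the "7B" in the name); activations,
# KV cache and framework overhead are ignored.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "int8": 1,
    "4-bit": 0.5,
}

def weight_memory_gb(num_params, dtype):
    """Weight memory in GB (1 GB = 10**9 bytes) for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 10**9

NUM_PARAMS = 7_000_000_000  # hypothetical round figure

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Loading directly in float16 halves the float32 figure, while int8 or 4-bit weights are what a quantized or GGML release would bring down further.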

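Update (2024/03/29)
It ran with this code. Instead of float32 plus quantization, it loads the weights in float16 with device_map="auto" (which requires the accelerate package).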

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Rakuten/RakutenAI-7B-chat"
tokenizer = AutoTokenizer.from_pretrained(model_path)
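# float16 halves the weight memory compared with float32;
# device_map="auto" (needs accelerate) fills the GPU first and spills to CPU if needed
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")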
model.eval()

requests = [
    "「馬が合う」はどう言う意味ですか",  # "What does 'uma ga au' mean?"
    "How to make an authentic Spanish Omelette?",
]

system_message = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user_input} ASSISTANT:"

for req in requests:
    input_req = system_message.format(user_input=req)
    input_ids = tokenizer.encode(input_req, return_tensors="pt").to(device=model.device)
    tokens = model.generate(
        input_ids,
        max_new_tokens=1024,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    out = tokenizer.decode(tokens[0][len(input_ids[0]):], skip_special_tokens=True)
    print("USER:\n" + req)
    print("ASSISTANT:\n" + out)
    print()
    print()

USER:
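「馬が合う」はどう言う意味ですか (What does "uma ga au" mean?)
ASSISTANT:
"Uma ga au" (馬が合う) is a phrase meaning that people's personalities and moods match, that they get on well, and that their hearts connect.
The phrase should properly be written 「相合う」, but it is simplified to 「合う」.
Saying 「二人が馬が合う」 ("the two of them get along") describes two people who get along and can open up and talk with each other.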

USER:
How to make an authentic Spanish Omelette?
ASSISTANT:
To make an authentic Spanish Omelette, follow these steps:

1. Heat a pan on medium heat and add a small amount of olive oil
2. Beat 4 eggs in a bowl together with a couple of tablespoons of chopped parsley and a pinch of salt until well mixed
3. Add sliced potatoes (ideally left over from the previous day) into the pan and fry them until slightly brown
4. Add onion and mix well with the potatoes
5. Pour the egg mixture over the potatoes and onion
6. When the underside of the omelette is cooked, place a large flat plate on top of the pan and invert the omelette onto the plate
7. Slide the omelette back into the pan with the uncooked side facing down, and cook until the egg is set
8. Serve hot with a pinch of salt and a couple of tablespoons of mayo

Note that there are two requests.
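Both requests are handled by the same loop: each one is filled into the chat template, and because generate() on a causal LM returns the prompt tokens followed by the newly generated ones, the script slices off len(input_ids[0]) tokens before decoding. A plain-Python sketch of those two steps, using made-up token ids instead of a real tokenizer or model (build_prompt and strip_prompt are hypothetical helper names):

```python
# Sketch of the per-request steps in the loop, with fake token ids
# (no tokenizer or model involved; the ids are made up).

SYSTEM_MESSAGE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {user_input} ASSISTANT:"
)

def build_prompt(user_input):
    """Fill one request into the chat template used by the script."""
    return SYSTEM_MESSAGE.format(user_input=user_input)

def strip_prompt(generated_ids, prompt_ids):
    """generate() returns prompt + continuation; keep only the new tokens."""
    return generated_ids[len(prompt_ids):]

requests = ["「馬が合う」はどう言う意味ですか", "How to make an authentic Spanish Omelette?"]
for req in requests:
    print(build_prompt(req))

prompt_ids = [101, 102, 103]            # hypothetical prompt token ids
generated_ids = prompt_ids + [7, 8, 9]  # hypothetical generate() output
print(strip_prompt(generated_ids, prompt_ids))
```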
