Running Llama 2 on Windows

September 7, 2023, 16:56

This article is a personal memo. It will probably not make much sense to other readers, so you may be better off reading other people's articles.

Pain points

Python module errors came up again and again and cost me a lot of time. Their root cause is hard to pin down, so about all I could do was search the web.
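
One way to narrow down module errors before running the full script is to check which imports can be resolved at all. The following is only a sketch under assumptions: `missing_modules` is a hypothetical helper name, and the mapping from import name to pip package reflects the usual package names, not anything verified in this article.

```python
import importlib.util

# Import name -> pip package that usually provides it (assumed mapping).
REQUIRED = {
    "fitz": "PyMuPDF",
    "langchain": "langchain",
    "faiss": "faiss-cpu",
    "llama_cpp": "llama-cpp-python",
    "sentence_transformers": "sentence-transformers",
}

def missing_modules(required):
    """Return {import_name: pip_name} for top-level modules that cannot be found."""
    missing = {}
    for name, pip_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing[name] = pip_name
    return missing

for name, pip_name in missing_modules(REQUIRED).items():
    print(f"{name} is missing; try: pip install {pip_name}")
```

This only tells you whether a module can be found. It does not catch version mismatches between packages, which may well be the cause of other module errors.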

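Points to watch out for at the moment

Regarding LLM model files: the latest LlamaCpp can no longer load the GGML format. The model has to be converted to the GGUF format before it can be loaded.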
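
Before handing a model file to LlamaCpp, it may help to check whether it is already in GGUF format. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal check can look like the sketch below. `is_gguf` is a hypothetical helper name, and the check looks only at the header, not at whether the rest of the file is valid.

```python
def is_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

For example, `is_gguf("./llama-2-7b-chat.q8_0.gguf")` should return True for the model used in this article. A file that returns False would need converting first; the conversion itself is not shown here.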

Code

import os
import logging
import sys
import fitz
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores.faiss import FAISS
from langchain.llms import LlamaCpp
from langchain.chains import ConversationalRetrievalChain

os.chdir(os.path.dirname(os.path.abspath(__file__)))

doc = fitz.open("./a.pdf")

# Concatenate the text of every page into one string.
text = ""
for page in doc:
    text += page.get_text()

print(text)

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=20,
)
texts = text_splitter.split_text(text)

print(len(texts))
# Show the first 10 characters and the length of each chunk.
for chunk in texts:
    print(chunk[:10].replace("\n", "\\n"), ":", len(chunk))

index = FAISS.from_texts(
    texts=texts,
    embedding=HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large"),
)
index.save_local("storage")

llm = LlamaCpp(
    model_path=r"./llama-2-7b-chat.q8_0.gguf",
    n_ctx=4096,
    temperature=0,
    max_tokens=640,
    verbose=True,
    streaming=True
)

qa = ConversationalRetrievalChain.from_llm(llm, chain_type="stuff", retriever=index.as_retriever(search_kwargs={"k": 4}))

chat_history = []
print("Welcome to the State of the Union chatbot! Type 'exit' to stop.")
while True:
    query = input("Please enter your question: ")

    if query.lower() == 'exit':
        break
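    result = qa({"question": query, "chat_history": chat_history})

    print("Answer:", result['answer'])
    # Record the turn so that follow-up questions see the conversation so far.
    chat_history.append((query, result['answer']))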
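
Conceptually, the retriever in the listing above embeds the question, picks the k most similar chunks (k=4 here), and the "stuff" chain pastes them into the prompt. The toy below sketches only that top-k step. It is not what FAISS or the multilingual-e5-large model do internally: the word-count `embed` function, the `cosine` measure, and the `top_k` helper are stand-ins made up for illustration, and the real index may use a different distance measure.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, chunks, k=4):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "the model file must be in gguf format",
    "faiss stores the embedded chunks",
    "the pdf text is split into chunks",
]
context = top_k("which format must the model file be in", chunks, k=1)
print(context)
```

Only the selected chunks reach the model, which is why the chunk size and the value of k both affect how much of the PDF the answer can draw on.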

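I wrote this code by piecing together information from various sites.
It seems to be an AI that answers questions based on information gathered from the PDF. Strictly speaking, the model is not trained on the PDF: the relevant chunks are retrieved from the FAISS index and passed to the model as context.
It runs on the CPU, so everything takes a long time.
Next, I want to speed it up.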
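
One possible direction for that speed-up, sketched under assumptions: langchain's `LlamaCpp` wrapper accepts `n_threads`, `n_batch`, and `n_gpu_layers`. Offloading layers to a GPU only helps if llama-cpp-python was built with GPU support, which this article has not tried. The helper name `llama_speed_kwargs` and the concrete values are hypothetical starting points, not measured settings.

```python
import os

def llama_speed_kwargs(gpu_layers=0):
    """Build extra keyword arguments for LlamaCpp (assumed starting values)."""
    kwargs = {
        # Use every logical core; fewer threads may be faster on some machines.
        "n_threads": os.cpu_count() or 1,
        # Number of prompt tokens processed per batch (assumed value).
        "n_batch": 512,
    }
    if gpu_layers > 0:
        # Number of model layers to offload to the GPU, if one is available.
        kwargs["n_gpu_layers"] = gpu_layers
    return kwargs

print(llama_speed_kwargs())
```

These would be passed along with the existing arguments, for example `LlamaCpp(model_path=..., n_ctx=4096, **llama_speed_kwargs(gpu_layers=20))`, where 20 is an arbitrary illustrative value.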
