Implementing GraphRAG with a Local LLM to Analyze "A Christmas Carol"

Maki@Sunwood.ai.labs

July 14, 2024, 23:28

What is the true theme of "A Christmas Carol"? Let's dig into that question with a powerful question-answering AI built entirely in a local environment!

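Large language models (LLMs) such as ChatGPT have been attracting a lot of attention recently. However, because LLMs learn general knowledge from huge amounts of text, they often struggle to answer precise questions about a specific document or a specialized domain.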

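In this article, I show how to use a technique called GraphRAG to help an LLM understand the content of "A Christmas Carol" more deeply and answer more sophisticated questions about it. I then actually analyze "A Christmas Carol" with GraphRAG and discuss the results.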

To shorten the experiment, I ran it on a heavily reduced version of the "A Christmas Carol" dataset.

What is GraphRAG?

GraphRAG is an open-source pipeline developed by Microsoft Research for strengthening the knowledge base an LLM draws on.

It helps an LLM generate more accurate and detailed answers to questions that involve complex text or require specialized knowledge.

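Concretely, GraphRAG automatically builds structured data called a knowledge graph from text (entities, the relationships between them, and summaries of clusters of related entities) and feeds it to the LLM at query time, enabling more advanced reasoning over the whole corpus.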
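The core idea can be illustrated with a toy sketch: turn extracted (entity, relation, entity) triples into a graph, then group connected entities into communities that can later be summarized. The triples below are hand-written stand-ins for what an LLM extraction step might return, and simple connected components replace the hierarchical Leiden clustering GraphRAG actually uses; nothing here calls GraphRAG itself.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for the output
# of an LLM entity/relationship extraction step over a few text chunks.
TRIPLES = [
    ("Scrooge", "employs", "Bob Cratchit"),
    ("Bob Cratchit", "father of", "Tiny Tim"),
    ("Marley", "former partner of", "Scrooge"),
    ("Scrooge", "visited by", "Ghost of Christmas Past"),
    ("Scrooge", "visited by", "Ghost of Christmas Yet to Come"),
    ("Fred", "nephew of", "Scrooge"),
    ("Fezziwig", "married to", "Mrs. Fezziwig"),
]


def build_graph(triples):
    """Return an undirected adjacency map: entity -> set of neighbouring entities."""
    graph = defaultdict(set)
    for subj, _relation, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def communities(graph):
    """Group entities into connected components.

    A crude stand-in for the hierarchical Leiden clustering used by GraphRAG.
    """
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    for i, group in enumerate(communities(build_graph(TRIPLES))):
        print(f"community {i}: {', '.join(group)}")
```

In GraphRAG each such community then gets an LLM-written report, and those reports are what the global search later reads.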

Illustrative example:

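Conventional LLM: "What is the theme of A Christmas Carol?" → "Family love, charity, forgiveness..."
With GraphRAG: "What is the theme of A Christmas Carol?" → "It is a story about the redemption of its protagonist, Scrooge, and the importance of social responsibility, and it asks readers what true wealth really is."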

In this way, GraphRAG lets the LLM understand the text more deeply and generate answers that take context into account, much as a human reader would.

https://hamaruki.com/graphrag-beginners-guide/

https://hamaruki.com/building-global-search-graphrag-ai-data-analysis/

Building GraphRAG in a local environment

This time, I use Docker and docker-compose to run GraphRAG in a local environment.

Requirements

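Docker: a containerization technology that packages an application's runtime environment so that it is portable.
docker-compose: a tool for defining and running multi-container Docker applications, which makes it easy to build an application whose containers work together.
Python: a programming language. GraphRAG is implemented in Python.
Git: a version control system, used to fetch source code such as GraphRAG and llama.cpp.
Models to run with llama.cpp (this article uses Llama-3-ELYZA-JP-8B-q4_k_m.gguf and ggml-e5-mistral-7b-instruct-q4_k_m.gguf, placed in the ./models directory)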

https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF

https://huggingface.co/dranger003/e5-mistral-7b-instruct-GGUF
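Because docker-compose.yml mounts ./models into both llama.cpp containers and starts llama-server with fixed file names, a missing or misnamed GGUF file only shows up later as a failing container. A small pre-flight check, assuming the two files are downloaded from the Hugging Face pages above into ./models under exactly the names used in docker-compose.yml:

```python
from pathlib import Path

# File names taken from the llama-server commands in docker-compose.yml;
# adjust them if you download different quantizations.
REQUIRED_MODELS = (
    "Llama-3-ELYZA-JP-8B-q4_k_m.gguf",          # chat model (llama-cpp service)
    "ggml-e5-mistral-7b-instruct-q4_k_m.gguf",  # embedding model (llama-cpp-embedding)
)


def missing_models(models_dir="models"):
    """Return the expected GGUF files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing from ./models:", ", ".join(missing))
    else:
        print("All model files found.")
```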

Installing Docker/docker-compose

For installing Docker, please refer to this guide.

Creating the project

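First, create a directory of your choice and create the following three files in it. The Dockerfile also copies a requirements.txt, and docker-compose.yml reads a .env file, so these two files must exist in the same directory as well.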

Dockerfile.llama
Dockerfile
docker-compose.yml
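The article does not show requirements.txt or .env. A hypothetical scaffolding helper that creates minimal versions of both; the contents are assumptions (graphrag as the only dependency, and a dummy GRAPHRAG_API_KEY, which settings.yaml references for the embedding endpoint). Since newer graphrag releases may have changed the command-line interface used later in this article, pinning the version you tested with in requirements.txt is advisable.

```python
from pathlib import Path

# Hypothetical contents: neither file is shown in the article.
# "graphrag" is assumed to be the only dependency; the key is a dummy value
# because llama-server does not check API keys unless started with --api-key.
FILES = {
    "requirements.txt": "graphrag\n",
    ".env": "GRAPHRAG_API_KEY=dummy-key-for-llama-server\n",
}


def scaffold(project_dir="."):
    """Create requirements.txt and .env if missing; return the names written."""
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in FILES.items():
        path = root / name
        if not path.exists():  # never overwrite files the user already has
            path.write_text(content, encoding="utf-8")
            written.append(name)
    return written
```

Calling scaffold() from the project directory creates the two files; existing files are left untouched.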

Dockerfile.llama

# Base image: CUDA development toolkit on Ubuntu 22.04
ARG CUDA_IMAGE="12.5.0-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE}

# The host must be set to 0.0.0.0 to allow access from outside the container
ENV HOST 0.0.0.0

# Install the required packages and configure OpenCL
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y git build-essential \
    python3 python3-pip gcc wget \
    ocl-icd-opencl-dev opencl-headers clinfo \
    libclblast-dev libopenblas-dev \
    && mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

# Set the application directory
WORKDIR /app

# Clone the llama.cpp repository
RUN git clone https://github.com/ggerganov/llama.cpp.git
WORKDIR /app/llama.cpp

# Build-related environment variables (legacy Makefile-era switches;
# the CMake build below enables CUDA via -DGGML_CUDA=ON instead)
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

# Install dependencies
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings starlette-context

# Run CMake with CUDA enabled for all GPU architectures
RUN mkdir build && cd build && \
    cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=all

# Run make (on failure, retry serially with verbose logging),
# then copy the built binaries (including llama-server) to /app/llama.cpp
RUN cd build && \
    make VERBOSE=1 -j4 || \
    (echo "Make failed. Retrying with more details..." && \
    make VERBOSE=1 && \
    echo "If make succeeded this time, there might be a concurrency issue.") && \
    cp bin/* ..

# Default command when the container starts
CMD ["/bin/bash"]

Explanation of Dockerfile.llama

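FROM nvidia/cuda:${CUDA_IMAGE}: uses an NVIDIA CUDA development image as the base image.
ENV HOST 0.0.0.0: sets the host so the server can accept connections from outside the container (the compose file also passes --host 0.0.0.0 to llama-server explicitly).
RUN apt-get ...: installs the required packages and configures OpenCL.
WORKDIR /app: sets the working directory to /app.
RUN git clone ...: clones the llama.cpp repository.
WORKDIR /app/llama.cpp: changes the working directory to /app/llama.cpp.
ENV CUDA_DOCKER_ARCH=all: targets all CUDA architectures (a variable used by llama.cpp's Makefile-based build).
ENV LLAMA_CUBLAS=1: the legacy switch for enabling cuBLAS; the CMake build in this file enables CUDA through -DGGML_CUDA=ON instead.
RUN python3 -m pip ...: installs the required Python packages.
RUN mkdir build && cd build && cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=all: runs CMake to generate the build files with CUDA enabled for all GPU architectures.
RUN cd build && make ...: runs the build (retrying once with verbose output on failure) and copies the resulting binaries, including llama-server, to /app/llama.cpp.
CMD ["/bin/bash"]: runs bash when the container starts (docker-compose.yml overrides this with the llama-server command).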

Dockerfile

# Dockerfile
FROM python:3.10

WORKDIR /app

RUN apt-get update && apt-get install -y git

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

Explanation of the Dockerfile

FROM python:3.10: uses the Python 3.10 image as the base image.
WORKDIR /app: sets the working directory to /app.
RUN apt-get ...: installs Git.
COPY requirements.txt .: copies requirements.txt into the current directory (/app) in the container.
RUN pip install --no-cache-dir -r requirements.txt: installs the Python packages listed in requirements.txt without keeping pip's download cache, which keeps the image smaller.

docker-compose.yml

version: '3.8'
services:
  app:
    build: .
    volumes:
      - .:/app
    tty: true
    env_file:
      - .env
    extra_hosts:
      - "host.docker.internal:host-gateway"
      
  llama-cpp-embedding:
    build: 
      context: .
      dockerfile: Dockerfile.llama
    volumes:
      - ./models:/models
    ports:
      - "8081:8081"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    tty: true
    command: >
      ./llama-server
      -m /models/ggml-e5-mistral-7b-instruct-q4_k_m.gguf
      --port 8081
      --host 0.0.0.0
      -n -1
      --n-gpu-layers 10
      --embedding

  llama-cpp:
    build: 
      context: .
      dockerfile: Dockerfile.llama
    volumes:
      - ./models:/models
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    tty: true
    command: >
      ./llama-server
      -m /models/Llama-3-ELYZA-JP-8B-q4_k_m.gguf
      --port 8080
      --host 0.0.0.0
      -n -1
      --n-gpu-layers 10

Explanation of docker-compose.yml

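version: '3.8': specifies the Compose file format version (recent Docker Compose v2 releases ignore this key and warn that it is obsolete).
services: defines the containers.
- app: the main application container, where GraphRAG runs.
  - build: .: builds the image from the Dockerfile in the current directory.
  - volumes: - .:/app: mounts the host's current directory at /app in the container.
  - tty: true: allocates a pseudo-terminal, which keeps the container running so you can work in it interactively.
  - env_file: - .env: loads environment variables from the .env file.
  - extra_hosts: - "host.docker.internal:host-gateway": maps the name host.docker.internal to the host machine, so the container can reach services running on the host.
- llama-cpp-embedding: the llama.cpp container that serves embeddings.
  - build: ...: builds the image from Dockerfile.llama.
  - volumes: - ./models:/models: mounts the host's ./models directory at /models in the container.
  - ports: - "8081:8081": publishes container port 8081 on host port 8081.
  - deploy: ...: deployment settings; here, one NVIDIA GPU is reserved for the container.
  - tty: true: allocates a pseudo-terminal.
  - command: ...: the command run when the container starts; here, llama-server is launched with the e5-mistral model in embedding mode (--embedding) on port 8081, offloading 10 layers to the GPU (--n-gpu-layers 10).
- llama-cpp: the main llama.cpp container that serves text generation.
  - build: ...: builds the image from Dockerfile.llama.
  - volumes: - ./models:/models: mounts the host's ./models directory at /models in the container.
  - ports: - "8080:8080": publishes container port 8080 on host port 8080.
  - deploy: ...: deployment settings; here, one NVIDIA GPU is reserved for the container.
  - tty: true: allocates a pseudo-terminal.
  - command: ...: the command run when the container starts; here, llama-server is launched with the Llama-3-ELYZA-JP-8B model on port 8080, also with 10 layers offloaded to the GPU.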
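After the containers are started (see the Running step), loading a model can take a while, and GraphRAG requests fail until both servers are ready. One way to wait for them is to poll llama-server's /health endpoint. A minimal standard-library sketch, assuming the host ports 8080 and 8081 from docker-compose.yml and that /health answers 200 with {"status": "ok"} once the model is loaded (the behaviour I am aware of in recent llama-server builds; older builds may differ). The opener parameter exists only to make the function testable.

```python
import json
import urllib.error
import urllib.request

# Host-side ports published in docker-compose.yml. From inside the "app"
# container, use http://llama-cpp:8080 and http://llama-cpp-embedding:8081.
SERVERS = {
    "llama-cpp": "http://localhost:8080",
    "llama-cpp-embedding": "http://localhost:8081",
}


def is_ready(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/health answers 200 with status "ok"."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return (resp.status == 200
                    and isinstance(body, dict)
                    and body.get("status") == "ok")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP 503 while the model loads, or a non-JSON body.
        return False
```

For example, all(is_ready(url) for url in SERVERS.values()) returns True only once both models have finished loading.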

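settings.yaml

This is GraphRAG's configuration file, which GraphRAG reads from the root directory passed with --root (./ragtest in this article). Both llm.api_base and embeddings.llm.api_base point at the llama.cpp containers by their Compose service names, and the OpenAI-style settings work because llama-server exposes an OpenAI-compatible API. The model names (llama3, mxbai-embed-large) appear to be carried over from an Ollama configuration; as far as I know, a llama-server instance answers with whichever model it was started with.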


encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama #${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://llama-cpp:8080/v1 #https://<instance>.openai.azure.com
  # api_base: http://host.docker.internal:11434/api
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  requests_per_minute: 25 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    # api_key: ollama #${GRAPHRAG_API_KEY}
    # type: openai_embedding # or azure_openai_embedding
    model: mxbai-embed-large #text-embedding-3-small
    # model: text-embedding-3-small
    # api_base: http://host.docker.internal:11434/v1 #https://<instance>.openai.azure.com
    # api_base: http://llama-cpp:8000/v1 #https://<instance>.openai.azure.com
    api_base: http://llama-cpp-embedding:8081/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Running

docker-compose up -d --build

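Running GraphRAG

Run the following commands inside the app container (for example, after opening a shell with docker-compose exec app bash), because the service names llama-cpp and llama-cpp-embedding used in settings.yaml only resolve on the Compose network. The first command builds the index (knowledge graph and community reports) from the text in ./ragtest/input; the second runs a global search with the question "What are the main themes of this story?", asked in Japanese.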

python -m graphrag.index --root ./ragtest
python -m graphrag.query \
--root ./ragtest \
--method global \
"この物語の主要なテーマは何ですか？"
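To ask several questions in a row, or to compare --method global with --method local, it can be handy to drive the same CLI from Python. A hypothetical helper, assuming it runs inside the app container where graphrag is installed and that the installed version still provides the python -m graphrag.query entry point used above:

```python
import subprocess
import sys


def graphrag_query_argv(question, root="./ragtest", method="global"):
    """Build the argv for the same query command used in this article."""
    if method not in ("global", "local"):
        raise ValueError("method must be 'global' or 'local'")
    return [sys.executable, "-m", "graphrag.query",
            "--root", root, "--method", method, question]


def run_query(question, **kwargs):
    """Run one query and return the CLI's standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(graphrag_query_argv(question, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(run_query("この物語の主要なテーマは何ですか？", method="local")) asks the same question with local search for comparison.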

With that, GraphRAG is running in a local environment.

Example answer

Global Search Response: Summary of Key Themes

Based on the reports from multiple analysts, the primary themes in the eBook 'A Christmas Carol' are:

Social Justice and Morality

The eBook explores themes of social justice, morality, and redemption, which are highly relevant to contemporary society, potentially sparking public debate and controversy. These themes are crucial in understanding the characters' motivations and the story's implications. (Importance Score: 80)

Cultural Significance and Community Cohesion

The eBook's cultural significance could lead to a sense of shared identity and community cohesion, contributing to the community's resilience and adaptability. This theme highlights the importance of the story in shaping the community's dynamics and values. (Importance Score: 70)

Influence on Community Dynamics

The eBook's messages and themes could influence the community's dynamics and implications, potentially leading to significant consequences. This theme emphasizes the potential impact of the story on the community's social fabric. (Importance Score: 60)

These themes are interconnected and central to the narrative, making them the primary focus of the eBook. They shall continue to spark public debate and controversy, and may influence the community's dynamics and implications.

Note: The reports do not provide further information on the specific record ids related to these themes.

Results and discussion

GraphRAG's global search answer named three main themes of "A Christmas Carol": social justice and morality, cultural significance and community cohesion, and influence on community dynamics.

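At first glance, these themes look reasonable. However, the analysis is shallow and leans heavily on generic phrases. Expressions such as "potentially sparking public debate and controversy", "contributing to the community's resilience and adaptability", and "influence the community's dynamics and implications" lack specifics and do not accurately capture the themes of "A Christmas Carol".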

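Another major shortcoming is that the answer mentions redemption only in passing (as part of "social justice, morality, and redemption") and never addresses the spirit of Christmas or Scrooge's transformation, which are central to the story.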

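The answer also says it is "based on the reports from multiple analysts", although no human analysts were involved. This phrasing is probably not a hallucination, however: GraphRAG's global search first answers the question separately over batches of community reports (the map step) and then hands those intermediate answers to the LLM framed as reports from multiple analysts (the reduce step), and the final answer simply echoes that framing.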
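For reference, the map-reduce flow of global search can be sketched as follows. This toy version skips the LLM calls entirely: it assumes the map step has already produced scored key points for each batch of community reports, drops points scored 0, ranks the rest by score, and formats them as input from numbered "analysts", loosely modelled on how GraphRAG builds its reduce context. KeyPoint and build_reduce_context are hypothetical names, not GraphRAG's API.

```python
from dataclasses import dataclass


@dataclass
class KeyPoint:
    description: str
    score: int  # importance assigned during the map step (0-100)


def build_reduce_context(map_outputs, max_points=10):
    """Merge the key points from every map call into one prompt section.

    map_outputs holds one list of KeyPoint per batch of community reports.
    Each batch is labelled as a separate "analyst", which is the framing
    the final answer then echoes.
    """
    ranked = []
    for analyst_id, points in enumerate(map_outputs, start=1):
        for point in points:
            if point.score > 0:  # zero-score points are treated as irrelevant
                ranked.append((point.score, analyst_id, point.description))
    ranked.sort(key=lambda item: item[0], reverse=True)
    sections = [f"----Analyst {aid}----\nImportance Score: {score}\n{desc}"
                for score, aid, desc in ranked[:max_points]]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_reduce_context([
        [KeyPoint("Scrooge's redemption", 90), KeyPoint("Unrelated detail", 0)],
        [KeyPoint("Social justice and morality", 80)],
    ]))
```

Because the reduce step only sees these condensed points rather than the original text, how well the community reports capture the story directly limits how specific the final answer can be.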

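In short, the AI's answer looks plausible on the surface, but on closer analysis it falls short in specificity, accuracy, and insight. The heavily reduced dataset used in this experiment may be part of the reason.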