New LangChain feature: cache memory for ChatModel

June 26, 2023, 18:19

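Starting with LangChain 0.0.213, caching is also supported for ChatModels. Two kinds of cache are supported for ChatModels: the InMemory cache and the SQLAlchemy cache (the SQLiteCache used later in this post is built on the latter).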

Let's try it out right away.

How the InMemory cache behaves 🌵

Let's turn on the debug option and trace what happens.

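import langchain
from langchain.chat_models import ChatOpenAI

# Print [llm/start] and [llm/end] traces for every model run.
langchain.debug = True
# temperature=1.0 so that uncached calls can return different answers.
llm = ChatOpenAI(temperature=1.0)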

To use a cache, assign InMemoryCache() to langchain.llm_cache.

from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

# The first call is slow because nothing is in the cache yet.
%time llm.predict("Tell me a joke")

[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "Human: Tell me a joke"
  ]
}
[llm/end] [1:llm:ChatOpenAI] [1.18s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "Why don't scientists trust atoms? \n\nBecause they make up everything.",
        "generation_info": null,
        "message": {
          "content": "Why don't scientists trust atoms? \n\nBecause they make up everything.",
          "additional_kwargs": {},
          "example": false
        }
      }
    ]
  ],
  "llm_output": { 👈👈
    "token_usage": {
      "prompt_tokens": 12,
      "completion_tokens": 14,
      "total_tokens": 26
    },
    "model_name": "gpt-3.5-turbo"
  },
  "run": null
}
CPU times: user 31.2 ms, sys: 3.93 ms, total: 35.1 ms
Wall time: 1.18 s 👈👈

Why don't scientists trust atoms? \n\nBecause they make up everything.

# The second call is fast because the result is already in the cache.
%time llm.predict("Tell me a joke")

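For the same prompt input, no LLM call is made and the cached result is returned instead. In the log below, llm_output is null (no token usage is reported, so no tokens were consumed) and the wall time drops from 1.18 s to 4.42 ms.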

[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "Human: Tell me a joke"
  ]
}
[llm/end] [1:llm:ChatOpenAI] [1.804ms] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "Why don't scientists trust atoms? \n\nBecause they make up everything.",
        "generation_info": null,
        "message": {
          "content": "Why don't scientists trust atoms? \n\nBecause they make up everything.",
          "additional_kwargs": {},
          "example": false
        }
      }
    ]
  ],
  "llm_output": null, 👈👈
  "run": null
}
CPU times: user 4.26 ms, sys: 0 ns, total: 4.26 ms
Wall time: 4.42 ms 👈👈

Why don't scientists trust atoms? \n\nBecause they make up everything.
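To make the behavior in the two logs concrete, here is a minimal sketch of the idea behind an in-memory cache: results are stored in a dict keyed by the prompt plus a string describing the model settings, and the model is only called on a miss. ToyInMemoryCache, cached_predict, and fake_model are hypothetical names invented for this illustration; this is not LangChain's actual implementation, and no OpenAI call is made.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyInMemoryCache:
    """Hypothetical stand-in for an in-memory LLM cache (not LangChain's class)."""

    def __init__(self) -> None:
        # Key: (prompt, description of the model settings). Value: cached answer.
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, value: str) -> None:
        self._store[(prompt, llm_string)] = value


def cached_predict(
    cache: ToyInMemoryCache,
    model: Callable[[str], str],
    llm_string: str,
    prompt: str,
) -> str:
    """Return the cached answer if present; otherwise call the model and store it."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit  # cache hit: the model is never called
    result = model(prompt)
    cache.update(prompt, llm_string, result)
    return result


calls: List[str] = []  # records every prompt that actually reaches the fake model


def fake_model(prompt: str) -> str:
    """Assumed stand-in for a chat model; returns a different string on each call."""
    calls.append(prompt)
    return f"joke #{len(calls)}"


cache = ToyInMemoryCache()
first = cached_predict(cache, fake_model, "temperature=1.0", "Tell me a joke")
second = cached_predict(cache, fake_model, "temperature=1.0", "Tell me a joke")
print(first, second, len(calls))
```

The second call returns the stored answer without reaching fake_model, which mirrors the second log, where the run finishes in milliseconds and llm_output is null.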

How the SQLite cache behaves 🌵

With the SQLite-backed cache, you assign it to langchain.llm_cache just as with the InMemory cache.

from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

%time llm.predict("Tell me a joke")

[llm/start] [1:llm:ChatOpenAI] Entering LLM run with input:
(middle omitted)
  "llm_output": { 👈👈
    "token_usage": {
      "prompt_tokens": 12,
      "completion_tokens": 14,
      "total_tokens": 26
    },
    "model_name": "gpt-3.5-turbo"
  },
  "run": null
}
CPU times: user 19.6 ms, sys: 1.96 ms, total: 21.5 ms
Wall time: 791 ms 👈👈

Why did the tomato turn red? Because it saw the salad dressing!

# The second call is fast
%time llm.predict("Tell me a joke")

(beginning omitted)
  "llm_output": null, 👈👈
  "run": null
}
CPU times: user 5.68 ms, sys: 0 ns, total: 5.68 ms
Wall time: 5.8 ms 👈👈

Why did the tomato turn red? Because it saw the salad dressing!
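One practical difference from the in-memory cache, which the runs above do not demonstrate, is that a file-backed store can survive a restart of the Python process. The sketch below illustrates that idea with the standard-library sqlite3 module. ToySQLiteCache, the toy_cache table, and its columns are hypothetical and chosen only for this example; they are not the schema that LangChain's SQLiteCache uses.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class ToySQLiteCache:
    """Hypothetical file-backed cache; not LangChain's SQLiteCache."""

    def __init__(self, database_path: str) -> None:
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS toy_cache ("
            "prompt TEXT, llm TEXT, response TEXT, "
            "PRIMARY KEY (prompt, llm))"
        )
        self._conn.commit()

    def lookup(self, prompt: str, llm: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM toy_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm: str, response: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO toy_cache VALUES (?, ?, ?)",
            (prompt, llm, response),
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


# A temporary file stands in for a path such as ".langchain.db".
db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

# "Session 1": store an answer, then close the connection.
session1 = ToySQLiteCache(db_path)
session1.update("Tell me a joke", "gpt-3.5-turbo", "Why did the tomato turn red?")
session1.close()

# "Session 2": a fresh connection to the same file still finds the answer.
session2 = ToySQLiteCache(db_path)
restored = session2.lookup("Tell me a joke", "gpt-3.5-turbo")
missing = session2.lookup("Tell me a story", "gpt-3.5-turbo")
session2.close()
print(restored, missing)
```

Because the entries live in a file rather than in process memory, the second connection sees what the first one wrote, while a prompt that was never stored is still a miss.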

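Using a cache means that, by design, the output for a given prompt no longer varies, but cutting the number of API calls can be expected to bring both speedups and cost savings. It is an unglamorous feature, but with some thought about where and how to use it, it looks like it can pay off.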
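The trade-off described above can be sketched with a toy model that answers at random, much like a chat model at temperature=1.0. With the cache on, twenty identical prompts cost one call and always return the same answer; with the cache off, every prompt costs a call. The names llm_cache, fake_chat_model, and predict are hypothetical and exist only in this example; setting the toy llm_cache to None simply mimics having no cache configured.

```python
import random
from typing import Dict, Optional

# Toy global cache; None means "no cache configured" in this sketch.
llm_cache: Optional[Dict[str, str]] = {}

JOKES = ["atoms joke", "tomato joke", "scarecrow joke"]
api_calls = 0  # counts how often the fake model is actually invoked


def fake_chat_model(prompt: str) -> str:
    """Assumed stand-in for a paid API call that returns varying output."""
    global api_calls
    api_calls += 1
    return random.choice(JOKES)


def predict(prompt: str) -> str:
    """Serve from the cache when possible; otherwise call the model."""
    if llm_cache is not None and prompt in llm_cache:
        return llm_cache[prompt]
    answer = fake_chat_model(prompt)
    if llm_cache is not None:
        llm_cache[prompt] = answer
    return answer


# Cache enabled: one real call, and the answer never changes.
cached_answers = {predict("Tell me a joke") for _ in range(20)}
calls_with_cache = api_calls

# Cache disabled: every request reaches the model.
llm_cache = None
api_calls = 0
for _ in range(20):
    predict("Tell me a joke")
calls_without_cache = api_calls

print(len(cached_answers), calls_with_cache, calls_without_cache)
```

Whether that frozen answer is acceptable depends on the use case: it suits repeated, deterministic lookups well, and suits tasks that want a fresh joke every time poorly.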

Thank you for reading to the end. 🦜🔗
Sorry that this turned into a pure "I tried it out" kind of post.
