[DLAI × LangChain Course, Part 5] Evaluation


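Background

I had been curious about LangChain, but I kept putting it off: it looked complicated, and when I tried it briefly I could not get output in Japanese.

DeepLearning.ai released a LangChain course, so I took it a while ago. This series summarizes what it covers.

By the way: I took my time, and in the meantime part 2 of the DLAI × LangChain course was released! I took it right away and was impressed by how much of a step up it is. I'll try to pick up the pace. (But I still want to add some kind of bonus to each post...)

Part 4 is here.

This post covers part 5, Evaluation. It shows how to create Q&A pairs from documents, generate answers to them, and evaluate their accuracy. The official documentation describes a variety of evaluation methods for LLMs, so take a look there as well.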

Evaluation | 🦜️🔗 Langchain

Approach

This post summarizes lesson 5 of the DeepLearning.ai LangChain course.

Sample

  • Example generation

  • Manual evaluation (and debugging)

  • LLM-assisted evaluation

Create our QandA application

First, we build the Q&A application.

import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

As in the previous post, we load the clothing data in CSV format and build a vector index.

file = 'data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

It differs slightly from last time, but we set up a RetrievalQA chain.

llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

Coming up with test datapoints

Next we generate the evaluation data. First, look at a few documents. As before, the metadata holds the source file and the row number.

data[10]
Document(page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", metadata={'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 10})
data[11]
Document(page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', metadata={'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 11})

Hard-coded examples

You can write your own Q&A pairs, as below, and use them for evaluation.

examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
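One thing to watch in the hard-coded examples: a backslash line continuation inside a string literal keeps the next line's leading indentation, which is why the queries in the logs further down contain a run of spaces in the middle. A minimal sketch of an alternative, assuming you want single-spaced queries, is to rely on Python's implicit concatenation of adjacent string literals. The outputs shown in this post were produced with the original strings, not with this variant.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# so the indentation of the second line never ends up in the string.
examples = [
    {
        "query": (
            "Do the Cozy Comfort Pullover Set "
            "have side pockets?"
        ),
        "answer": "Yes",
    },
    {
        "query": (
            "What collection is the Ultra-Lofty "
            "850 Stretch Down Hooded Jacket from?"
        ),
        "answer": "The DownTek collection",
    },
]

print(examples[0]["query"])
```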

LLM-Generated examples

You can also generate Q&A pairs from the documents, using QAGenerateChain.

from langchain.evaluation.qa import QAGenerateChain
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

Here, as in the previous post, an OpenAI model update had broken the code below. For now I work around it by changing "up to the 5th document" to "up to the 4th".

# new_examples = example_gen_chain.apply_and_parse(
#     [{"doc": t} for t in data[:5]]
# )
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:4]]
)
new_examples[0]
{'query': "What are the key features of the Women's Campside Oxfords?",
     'answer': "The key features of the Women's Campside Oxfords include a super-soft canvas material for a broken-in feel and look, a comfortable EVA innersole with Cleansport NXT® antimicrobial odor control, a vintage hunt, fish and camping motif on the innersole, a moderate arch contour of the innersole, an EVA foam midsole for cushioning and support, and a chain-tread-inspired molded rubber outsole with a modified chain-tread pattern."}

This is a Q&A about the features of a product called "Women's Campside Oxfords".

Q&A pairs are generated in this form. Let's look at the document this one was generated from.

data[0]
Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': 'data/OutdoorClothingCatalog_1000.csv', 'row': 0})

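Name: Women's Campside Oxfords
Description: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on.
Size & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size.
Specs: Approx. weight: 1 lb.1 oz. per pair.
Construction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported.
Questions? Please contact us for any inquiries.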

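The question follows the document very directly. By customizing the prompt, as in the previous post's bonus section, you should be able to generate Q&A pairs suited to the application you are building. It might also be worth generating Q&A from several products at once, or from statistics over the whole catalog, not only from single products.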

Combine examples

We evaluate using both the hand-written examples and the LLM-generated ones.

examples += new_examples
qa.run(examples[0]["query"])
Entering new  chain...

Finished chain.

'Yes, the Cozy Comfort Pullover Set does have side pockets.'

Manual Evaluation

From here, we inspect the processing in debug mode.

import langchain; langchain.debug = True
qa.run(examples[0]["query"])
[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input:
{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": ": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.<<<<>>>>>: 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \nRelaxed fit top with raglan sleeves and rounded hem. \nPull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg. \nImported.<<<<>>>>>: 632\nname: Cozy Comfort Fleece Pullover\ndescription: The ultimate sweater fleece – made from superior fabric and offered at an unbeatable price. \n\nSize & Fit\nSlightly Fitted: Softly shapes the body. Falls at hip. \n\nWhy We Love It\nOur customers (and employees) love the rugged construction and heritage-inspired styling of our popular Sweater Fleece Pullover and wear it for absolutely everything. From high-intensity activities to everyday tasks, you'll find yourself reaching for it every time.\n\nFabric & Care\nRugged sweater-knit exterior and soft brushed interior for exceptional warmth and comfort. Made from soft, 100% polyester. 
Machine wash and dry.\n\nAdditional Features\nFeatures our classic Mount Katahdin logo. Snap placket. Front princess seams create a feminine shape. Kangaroo handwarmer pockets. Cuffs and hem reinforced with jersey binding. Imported.\n\n – Official Supplier to the U.S. Ski Team\nTHEIR WILL TO WIN, WOVEN RIGHT IN. LEARN MORE<<<<>>>>>: 151\nname: Cozy Quilted Sweatshirt\ndescription: Our sweatshirt is an instant classic with its great quilted texture and versatile weight that easily transitions between seasons. With a traditional fit that is relaxed through the chest, sleeve, and waist, this pullover is lightweight enough to be worn most months of the year. The cotton blend fabric is super soft and comfortable, making it the perfect casual layer. To make dressing easy, this sweatshirt also features a snap placket and a heritage-inspired Mt. Katahdin logo patch. For care, machine wash and dry. Imported."
}
[llm/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
	"System: Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n: 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.<<<<>>>>>: 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \nRelaxed fit top with raglan sleeves and rounded hem. \nPull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg. \nImported.<<<<>>>>>: 632\nname: Cozy Comfort Fleece Pullover\ndescription: The ultimate sweater fleece – made from superior fabric and offered at an unbeatable price. \n\nSize & Fit\nSlightly Fitted: Softly shapes the body. Falls at hip. \n\nWhy We Love It\nOur customers (and employees) love the rugged construction and heritage-inspired styling of our popular Sweater Fleece Pullover and wear it for absolutely everything. 
From high-intensity activities to everyday tasks, you'll find yourself reaching for it every time.\n\nFabric & Care\nRugged sweater-knit exterior and soft brushed interior for exceptional warmth and comfort. Made from soft, 100% polyester. Machine wash and dry.\n\nAdditional Features\nFeatures our classic Mount Katahdin logo. Snap placket. Front princess seams create a feminine shape. Kangaroo handwarmer pockets. Cuffs and hem reinforced with jersey binding. Imported.\n\n – Official Supplier to the U.S. Ski Team\nTHEIR WILL TO WIN, WOVEN RIGHT IN. LEARN MORE<<<<>>>>>: 151\nname: Cozy Quilted Sweatshirt\ndescription: Our sweatshirt is an instant classic with its great quilted texture and versatile weight that easily transitions between seasons. With a traditional fit that is relaxed through the chest, sleeve, and waist, this pullover is lightweight enough to be worn most months of the year. The cotton blend fabric is super soft and comfortable, making it the perfect casual layer. To make dressing easy, this sweatshirt also features a snap placket and a heritage-inspired Mt. Katahdin logo patch. For care, machine wash and dry. Imported.\nHuman: Do the Cozy Comfort Pullover Set        have side pockets?"
  ]
}
[llm/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] [739.445ms] Exiting LLM run with output:
{
  "generations": [
	[
	  {
		"text": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
		"generation_info": null,
		"message": {
		  "content": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
		  "additional_kwargs": {},
		  "example": false
		}
	  }
	]
  ],
  "llm_output": {
	"token_usage": {
	  "prompt_tokens": 732,
	  "completion_tokens": 14,
	  "total_tokens": 746
	},
	"model_name": "gpt-3.5-turbo"
  },
  "run": null
}
[chain/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [740.312ms] Exiting Chain run with output:
{
  "text": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}
[chain/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] [741.0799999999999ms] Exiting Chain run with output:
{
  "output_text": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}
[chain/end] [1:chain:RetrievalQA] [978.979ms] Exiting Chain run with output:
{
  "result": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}

'Yes, the Cozy Comfort Pullover Set does have side pockets.'

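The log is long, but it shows the chain retrieving four documents related to the query and generating an answer from them. At this point you need to check that the document the Q&A was generated from is among them, and that its content is properly reflected in the answer.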
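The first of those checks can be partly automated. The following is a minimal sketch, not part of the course: it assumes the context string uses the same separator as the chain above and that each document starts with ": <row number>", as the documents in this catalog do. The helper name `retrieved_rows` is made up for illustration, and the context is a shortened stand-in for the one in the debug log.

```python
import re

SEPARATOR = "<<<<>>>>>"  # same value as document_separator in the chain above


def retrieved_rows(context: str, separator: str = SEPARATOR) -> list[int]:
    """Return the row numbers of the documents packed into a context string."""
    rows = []
    for chunk in context.split(separator):
        # Assumption: each chunk from this CSV begins with ": <row number>".
        match = re.match(r"\s*:\s*(\d+)", chunk)
        if match:
            rows.append(int(match.group(1)))
    return rows


# Shortened stand-in for the "context" value shown in the debug log.
context = (
    ": 10\nname: Cozy Comfort Pullover Set, Stripe"
    "<<<<>>>>>"
    ": 73\nname: Cozy Cuddles Knit Pullover Set"
)
rows = retrieved_rows(context)
print(rows, 10 in rows)
```

If the row of the source document (10 for the first hand-written example) is missing from the list, the problem is in retrieval, not in answer generation.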

# Turn off the debug mode
langchain.debug = False

LLM assisted evaluation

Next is having the LLM do the evaluation as well. The code below generates an answer for every Q&A pair.

predictions = qa.apply(examples)
Entering new  chain...

Finished chain.

Entering new  chain...

Finished chain.

Entering new  chain...

Finished chain.

Entering new  chain...

Finished chain.

Entering new  chain...

Finished chain.

Entering new  chain...

Finished chain.

That gives answers for 6 examples: 2 hand-written plus 4 LLM-generated. Next, we grade them automatically with QAEvalChain.

from langchain.evaluation.qa import QAEvalChain
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

As with testing in machine learning, we pass in the reference answers and the generated answers and evaluate.

graded_outputs = eval_chain.evaluate(examples, predictions)
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()
Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set does have side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What are the key features of the Women's Campside Oxfords?
Real Answer: The key features of the Women's Campside Oxfords include a super-soft canvas material for a broken-in feel and look, a comfortable EVA innersole with Cleansport NXT® antimicrobial odor control, a vintage hunt, fish and camping motif on the innersole, a moderate arch contour of the innersole, an EVA foam midsole for cushioning and support, and a chain-tread-inspired molded rubber outsole with a modified chain-tread pattern.
Predicted Answer: The key features of the Women's Campside Oxfords are:
- Soft canvas material for a broken-in feel and look
- Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control
- Vintage hunt, fish, and camping motif on innersole
- Moderate arch contour of innersole
- EVA foam midsole for cushioning and support
- Chain-tread-inspired molded rubber outsole with modified chain-tread pattern
- Approx. weight: 1 lb. 1 oz. per pair
Predicted Grade: CORRECT

Example 3:
Question: What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".
Predicted Answer: The dimensions of the small size of the Recycled Waterhog Dog Mat, Chevron Weave are 18" x 28". The dimensions of the medium size are 22.5" x 34.5".
Predicted Grade: CORRECT

Example 4:
Question: What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?
Real Answer: Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for a secure fit and maximum coverage, and the ability to be machine washed and line dried for best results.
Predicted Answer: Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece are:
- Bright colors and ruffles
- Exclusive whimsical prints
- Four-way-stretch and chlorine-resistant fabric
- UPF 50+ rated fabric for sun protection
- Crossover no-slip straps
- Fully lined bottom for secure fit and maximum coverage
- Machine washable and line dry for best results
- Imported
Predicted Grade: CORRECT

Example 5:
Question: What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?
Real Answer: The Refresh Swimwear, V-Neck Tankini Contrasts is made of 82% recycled nylon with 18% Lycra® spandex for the body, and the lining is composed of 90% recycled nylon with 10% Lycra® spandex.
Predicted Answer: The fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts is 82% recycled nylon with 18% Lycra® spandex for the body, and 90% recycled nylon with 10% Lycra® spandex for the lining.
Predicted Grade: CORRECT

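For the six Q&A examples, this prints the generated answers and the verdict on whether each is correct. Written out to something like Excel, this could be useful for communicating with non-technical people.

It seems the evaluation can be tailored to your purpose by customizing the QAEvalChain prompt.

🦜️🔗 Langchain#Question Answering##Customize Prompt

The documentation has apparently been reorganized, and the link above is now dead. Instructions for building a custom evaluator are under each evaluator on the page below.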
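Exporting the grades for a spreadsheet could look roughly like this. It is a sketch using only the standard library, assuming `predictions` and `graded_outputs` are lists of dicts with the keys used in the loop above (`query`, `answer`, `result`, `text`). The function name `grades_to_csv` and the sample data are made up for illustration.

```python
import csv
import io


def grades_to_csv(predictions, graded_outputs) -> str:
    """Render question, reference answer, generated answer and grade as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "real_answer", "predicted_answer", "grade"])
    for pred, graded in zip(predictions, graded_outputs):
        writer.writerow(
            [pred["query"], pred["answer"], pred["result"], graded["text"]]
        )
    return buffer.getvalue()


# Hand-written stand-ins shaped like the values printed above.
predictions = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes",
        "result": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
    }
]
graded_outputs = [{"text": "CORRECT"}]

csv_text = grades_to_csv(predictions, graded_outputs)
print(csv_text)
```

The text can then be written to a file and opened in a spreadsheet. The `csv` module takes care of quoting answers that contain commas or line breaks.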
Evaluation | 🦜️🔗 Langchain

Bonus

Fix QAGenerateChain

In this bonus section, I track down why part of the sample code stopped working and fix it. First, create the RetrievalQA chain and the QAGenerateChain.

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
file = 'data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()  # needed below: QAGenerateChain is fed data[:5]
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
from langchain.evaluation.qa import QAGenerateChain
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

And here is the problem spot.

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
/home/harukary/workspace/Learn_OpenAI_APIs/.venv/lib/python3.10/site-packages/langchain/chains/llm.py:303: UserWarning: The apply_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
  warnings.warn(

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[9], line 1
----> 1 new_examples = example_gen_chain.apply_and_parse(
	  2     [{"doc": t} for t in data[:5]]
	  3 )


File ~/workspace/Learn_OpenAI_APIs/.venv/lib/python3.10/site-packages/langchain/chains/llm.py:308, in LLMChain.apply_and_parse(self, input_list, callbacks)
	303 warnings.warn(
	304     "The apply_and_parse method is deprecated, "
	305     "instead pass an output parser directly to LLMChain."
	306 )
	307 result = self.apply(input_list, callbacks=callbacks)
--> 308 return self._parse_generation(result)


File ~/workspace/Learn_OpenAI_APIs/.venv/lib/python3.10/site-packages/langchain/chains/llm.py:314, in LLMChain._parse_generation(self, generation)
	310 def _parse_generation(
	311     self, generation: List[Dict[str, str]]
	312 ) -> Sequence[Union[str, List[str], Dict[str, str]]]:
	313     if self.prompt.output_parser is not None:
--> 314         return [
	315             self.prompt.output_parser.parse(res[self.output_key])
	316             for res in generation
	317         ]
	318     else:
	319         return generation


File ~/workspace/Learn_OpenAI_APIs/.venv/lib/python3.10/site-packages/langchain/chains/llm.py:315, in <listcomp>(.0)
	310 def _parse_generation(
	311     self, generation: List[Dict[str, str]]
	312 ) -> Sequence[Union[str, List[str], Dict[str, str]]]:
	313     if self.prompt.output_parser is not None:
	314         return [
--> 315             self.prompt.output_parser.parse(res[self.output_key])
	316             for res in generation
	317         ]
	318     else:
	319         return generation


File ~/workspace/Learn_OpenAI_APIs/.venv/lib/python3.10/site-packages/langchain/output_parsers/regex.py:28, in RegexParser.parse(self, text)
	 26 else:
	 27     if self.default_output_key is None:
---> 28         raise ValueError(f"Could not parse output: {text}")
	 29     else:
	 30         return {
	 31             key: text if key == self.default_output_key else ""
	 32             for key in self.output_keys
	 33         }


ValueError: Could not parse output: QUESTION: What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?

ANSWER: The fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts is 82% recycled nylon with 18% Lycra® spandex for the body, and 90% recycled nylon with 10% Lycra® spandex for the lining.

Reading the error message, we can see that the error occurs in the output parser.

ValueError: Could not parse output: QUESTION: What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?

ANSWER: The fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts is 82% recycled nylon with 18% Lycra® spandex for the body, and 90% recycled nylon with 10% Lycra® spandex for the lining.

Checking the output parser shows that it only handles the case where QUESTION and ANSWER are separated by exactly one newline.

# langchain > evaluation > qa > generate_prompt.py
output_parser = RegexParser(
    regex=r"QUESTION: (.*?)\nANSWER: (.*)", output_keys=["query", "answer"]
)

So I create an output parser with the regular expression modified to also handle two newlines.

from langchain.output_parsers import RegexParser
output_parser = RegexParser(
    regex=r"QUESTION: (.*?)\n{1,2}ANSWER: (.*)", output_keys=["query", "answer"]
)

output_parser.parse("""\
QUESTION: AAA?

ANSWER: BBB.""")
{'query': 'AAA?', 'answer': 'BBB.'}
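The `\n{1,2}` pattern still fails if the model emits three blank lines or trailing spaces. As a rough sketch of a more tolerant pattern, here is a version using only Python's `re` module. I have not checked how RegexParser applies regex flags, so this deliberately does not go through it. The names `QA_PATTERN` and `parse_qa` are made up for illustration.

```python
import re

# \s* accepts any amount of whitespace (including blank lines) before ANSWER;
# re.DOTALL lets an answer that spans several lines be captured in full.
QA_PATTERN = re.compile(r"QUESTION:\s*(.*?)\s*ANSWER:\s*(.*)", re.DOTALL)


def parse_qa(text: str) -> dict:
    """Split a 'QUESTION: ... ANSWER: ...' string into a query/answer dict."""
    match = QA_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Could not parse output: {text}")
    return {"query": match.group(1).strip(), "answer": match.group(2).strip()}


print(parse_qa("QUESTION: AAA?\n\n\nANSWER: BBB."))
```

The trade-off is that a looser pattern also accepts malformed output that a strict one would reject, so it is worth spot-checking the parsed pairs.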

It parses the two-newline case as well. Assign this to the QAGenerateChain prompt's output parser and run it again.

example_gen_chain.prompt.output_parser = output_parser
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
import json
for ex in new_examples:
    print(json.dumps(ex,indent=2))
{
  "query": "What is the weight of one pair of Women's Campside Oxfords?",
  "answer": "The weight of one pair of Women's Campside Oxfords is approximately 1 lb. 1 oz."
}
{
  "query": "What are the dimensions of the small size of the Recycled Waterhog Dog Mat, Chevron Weave?",
  "answer": "The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18\" x 28\"."
}
{
  "query": "What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?",
  "answer": "The Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece has bright colors, ruffles, and exclusive whimsical prints. It is made of four-way-stretch and chlorine-resistant fabric, which helps it maintain its shape and resist snags. The swimsuit also has a UPF 50+ rated fabric that provides the highest rated sun protection, blocking 98% of the sun's harmful rays. The crossover no-slip straps and fully lined bottom ensure a secure fit and maximum coverage."
}
{
  "query": "What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?",
  "answer": "The body of the tankini top is made of 82% recycled nylon and 18% Lycra\u00ae spandex, while the lining is made of 90% recycled nylon and 10% Lycra\u00ae spandex."
}
{
  "query": "What is the fabric composition of the EcoFlex 3L Storm Pants?",
  "answer": "The EcoFlex 3L Storm Pants are made of 100% nylon, exclusive of trim."
}

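Everything parsed. But wait, wasn't it the 5th item that caused the error? Wondering if that was a coincidence, I ran it several times, and it turns out the model sometimes does return output in a parseable form. In other words, the sample code passing after I changed it to stop at 4 was also just luck... Anyway, I now know a bit more about dealing with output parser errors, so let's call it a win.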

examples += new_examples

The rest is the same as the sample code, so I skip ahead and generate the answers.

predictions = qa.apply(examples)
Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.


Entering new  chain...

Finished chain.

We got all 7 outputs. Finally, the evaluation.

from langchain.evaluation.qa import QAEvalChain
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(examples, predictions)
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()
Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set does have side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What is the weight of one pair of Women's Campside Oxfords?
Real Answer: The weight of one pair of Women's Campside Oxfords is approximately 1 lb. 1 oz.
Predicted Answer: The weight of one pair of Women's Campside Oxfords is approximately 1 lb. 1 oz.
Predicted Grade: CORRECT

Example 3:
Question: What are the dimensions of the small size of the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28".
Predicted Answer: The dimensions of the small size of the Recycled Waterhog Dog Mat, Chevron Weave are 18" x 28".
Predicted Grade: CORRECT

Example 4:
Question: What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?
Real Answer: The Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece has bright colors, ruffles, and exclusive whimsical prints. It is made of four-way-stretch and chlorine-resistant fabric, which helps it maintain its shape and resist snags. The swimsuit also has a UPF 50+ rated fabric that provides the highest rated sun protection, blocking 98% of the sun's harmful rays. The crossover no-slip straps and fully lined bottom ensure a secure fit and maximum coverage.
Predicted Answer: Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece are:
- Bright colors and ruffles
- Exclusive whimsical prints
- Four-way-stretch and chlorine-resistant fabric
- UPF 50+ rated fabric for sun protection
- Crossover no-slip straps
- Fully lined bottom for secure fit and maximum coverage
- Machine washable and line dry for best results
- Imported
Predicted Grade: CORRECT

Example 5:
Question: What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?
Real Answer: The body of the tankini top is made of 82% recycled nylon and 18% Lycra® spandex, while the lining is made of 90% recycled nylon and 10% Lycra® spandex.
Predicted Answer: The fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts is 82% recycled nylon with 18% Lycra® spandex for the body, and 90% recycled nylon with 10% Lycra® spandex for the lining.
Predicted Grade: CORRECT

Example 6:
Question: What is the fabric composition of the EcoFlex 3L Storm Pants?
Real Answer: The EcoFlex 3L Storm Pants are made of 100% nylon, exclusive of trim.
Predicted Answer: The fabric composition of the EcoFlex 3L Storm Pants is 100% nylon, exclusive of trim.
Predicted Grade: CORRECT

It ran to the end. The 7th item (Example 6) was sometimes graded INCORRECT, so it looks necessary to run the evaluation several times and take an average.
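The repeat-and-average idea can be sketched as follows. This is a minimal sketch under assumptions: the grade strings are taken to end in CORRECT or INCORRECT as in the output above, the helper names `is_correct` and `correct_rate_per_example` are made up for illustration, and the `runs` data is hand-written, not real grader output.

```python
def is_correct(grade_text: str) -> bool:
    """True only when the final token of the grade is exactly CORRECT."""
    # A substring test would be wrong here: "INCORRECT" contains "CORRECT".
    tokens = grade_text.strip().upper().split()
    return bool(tokens) and tokens[-1] == "CORRECT"


def correct_rate_per_example(runs):
    """runs: one list of grade strings per evaluation run, same example order."""
    n_runs = len(runs)
    n_examples = len(runs[0])
    return [
        sum(is_correct(run[i]) for run in runs) / n_runs
        for i in range(n_examples)
    ]


# Hypothetical grades from four evaluation runs over two examples.
runs = [
    ["CORRECT", "CORRECT"],
    ["CORRECT", "INCORRECT"],
    ["CORRECT", "CORRECT"],
    ["CORRECT", "INCORRECT"],
]
rates = correct_rate_per_example(runs)
print(rates)
```

Examples whose rate sits well below 1.0 are the ones worth inspecting by hand, since either the generated answer or the grader is unstable on them.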

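Summary

This post summarized lesson 5 of the DeepLearning.ai LangChain course.

This time the topics were generating evaluation samples for an LLM application (QAGenerateChain) and evaluating accuracy (QAEvalChain). For use in a real service, this kind of evaluation is important.

Since LLMs can automate all sorts of language processing, it is natural that evaluation itself gets automated too, but it still strikes me as a remarkably self-contained technology. Once self-hosted (local) LLMs become widespread, a pattern where GPT-4 handles only the evaluation while the self-hosted LLM does the actual work may become mainstream. LangChain can be used with other LLMs as-is, so that is where it is likely to show its real value. After all, the library existed before ChatGPT.

The bonus is light this time because I want to move on to DLAI × LangChain 2 soon, but thank you for reading.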

References

[DLAI × LangChain Course, Part 4] Question and Answer | harukary
Evaluation | 🦜️🔗 Langchain
🦜️🔗 Langchain#Question Answering##Customize Prompt

Sample code

https://github.com/harukary/llm_samples/blob/main/LangChain/LangChain_DLAI_1/L5-Evaluation.ipynb
https://github.com/harukary/llm_samples/blob/main/LangChain/LangChain_DLAI_1/plus_L5.ipynb
