[arXiv paper] Gorilla: A specialized LLM that connects to a wide range of APIs

はまち

June 1, 2023, 21:24

An arXiv paper proposes Gorilla, an LLM specialized for making use of the many APIs and machine learning models available online.
I skimmed the paper for an overview and then ran the sample code.

Features of the Gorilla-7b-hf-v0 model

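It is LLaMA-7B fine-tuned on training data generated with the Self-Instruct method from the APIs and API documentation of external tools: TorchHub (94 API calls), TensorHub (696 API calls), and HuggingFace (925 models, selected starting from the 20 most-downloaded models per domain).
Given a user request, it selects an appropriate tool and outputs the API call, including the interface to that API, as correctly formed program code.
According to the paper, specializing in this one function lets it substantially outperform existing models such as GPT-4 in both accuracy and hallucination.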

Running the sample code

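A Colab notebook is also available, but this time I tried the CLI interface.
I only realized it afterwards, but if you just want to see it run, the Colab notebook is much easier because it needs no model conversion or environment setup, so I recommend that route. 😓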

That said, the CLI version also worked without problems when I followed the documented steps: https://github.com/ShishirPatil/gorilla/blob/main/inference/README.md

Task: translating from English

$ python3 ./serve/gorilla_cli.py --model-path ./gorilla-7b-hf-v0/
USER: I would like to translate from English to Chinese.

ASSISTANT: <<<domain>>>: Natural Language Processing Translation
<<<api_call>>>: pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the pipeline function from the transformers library provided by Hugging Face.2. Use the pipeline function to create a translation model for English to Chinese. Specify the model 'Helsinki-NLP/opus-mt-en-zh' to be loaded. This model has been trained on a large corpus of text to perform translations between English and Chinese.
3. To translate a given text, pass the text as an input to the translation pipeline. The output will be the translated Chinese text.
<<<code>>>: from transformers import pipeline
translation = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')
translated_text = translation('Hello, world!')"

It produced code that appears to translate using a pipeline, so I pasted it into another console and ran it.

[{'translation_text': '哈罗,世界!'}]

It picked the right model, and the code runs correctly. The "hello" part of the Chinese output appears to be a phonetic transliteration.
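The response above uses a fixed layout of `<<<field>>>:` markers, so it is easy to split into parts before copying the code out. Here is a minimal sketch of that, assuming only the marker layout seen in the transcript. `parse_gorilla_response` is a hypothetical helper I made up for illustration, not something from the Gorilla repository, and the sample omits the `<<<explanation>>>` field for brevity.

```python
import re

# Marker layout assumed from the CLI transcript: "<<<name>>>: value"
FIELD_PATTERN = re.compile(r"<<<(\w+)>>>:\s*")


def parse_gorilla_response(text: str) -> dict:
    """Split a Gorilla-style response into {field_name: value} (hypothetical helper)."""
    # re.split with one capture group yields
    # [prefix, name1, value1, name2, value2, ...]
    parts = FIELD_PATTERN.split(text)
    fields = {}
    for name, value in zip(parts[1::2], parts[2::2]):
        fields[name] = value.strip()
    return fields


sample = """<<<domain>>>: Natural Language Processing Translation
<<<api_call>>>: pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')
<<<api_provider>>>: Hugging Face Transformers
<<<code>>>: from transformers import pipeline
translation = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')"""

fields = parse_gorilla_response(sample)
print(fields["api_call"])
print(fields["code"])
```

The `code` field keeps its line breaks, so it can be saved to a file or reviewed before running. Since this is model-generated code, reading it first seems wiser than executing it blindly.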

Building an image recognition model

USER: I want to build a robot that detects objects in images and outputs their names and locations.

ASSISTANT: <<<domain>>>: Computer Vision Object Detection
<<<api_call>>>: DetrForObjectDetection.from_pretrained('facebook/detr-resnet-101')
<<<api_provider>>>: Hugging Face Transformers
<<<explanation>>>: 1. Import the necessary libraries, which are 'DetrImageProcessor', 'DetrForObjectDetection' from transformers, 'torch', 'Image' from PIL, and 'requests'.
2. Use the 'DetrForObjectDetection.from_pretrained()' function to load the pre-trained model 'facebook/detr-resnet-101'. This model is trained for object detection tasks using the DEtection TRansformer (DETR) architecture.
3. Load the image to be analyzed by the robot using the 'Image.open()' method from the PIL library.
4. Prepare the image for analysis by creating an instance of the 'DetrImageProcessor' class and processing the image using the 'processor()' method.
5. Pass the processed image to the model by calling 'model(**inputs)' method. The model will analyze the image and return the detected objects.
6. Extract the detected objects' coordinates, classes, and other relevant information using 'results.pred_boxes', 'results.pred_scores', and 'results.pred_labels' attributes.
<<<code>>>: from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image
import requests
image_url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(image_url, stream=True).raw)
processor = DetrImageProcessor.from_pretrained('facebook/detr-resnet-101')
model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-101')
inputs = processor(images=image, return_tensors='pt')
outputs = model(**inputs)
detected_objects = processor.post_process_object_detection(outputs, threshold=0.9)

This code also looks like it will run, so I ran it the same way.

[{'scores': tensor([0.9980, 0.9975, 0.9966, 0.9982, 0.9951],
 grad_fn=<IndexBackward0>),
'labels': tensor([17, 75, 75, 17, 63]),
'boxes': tensor([
[ 5.3760e-01, 5.1780e-02, 1.0005e+00, 7.7863e-01],
[ 5.1270e-01, 1.5820e-01, 5.8252e-01, 3.9096e-01],
[ 6.1465e-02, 1.4611e-01, 2.7431e-01, 2.4747e-01],
[ 2.3993e-02, 1.0780e-01, 4.9513e-01, 9.8158e-01],
[-2.9066e-04, 1.4812e-03, 9.9957e-01, 9.8785e-01]],
grad_fn=<IndexBackward0>)}]

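Using DETR, it returned a list of labels and boxes for the sample image. Since the goal this time was only to check that things run, I'll stop here without implementing things like mapping the label ids to names.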
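For reference, the label-name mapping left out above could look roughly like the sketch below. It works on plain Python lists (for example after calling `.tolist()` on the tensors) and rests on a few assumptions: the label ids follow the COCO categories (17 = cat, 63 = couch, 75 = remote), the boxes are normalized (x0, y0, x1, y1) corners because no `target_sizes` was passed to `post_process_object_detection`, and the sample image is 640x480. `describe_detections` is a hypothetical helper name. With the real model, the names would normally come from `model.config.id2label`.

```python
# Subset of COCO category ids that appear in the output above (assumption);
# with the real model, use model.config.id2label instead.
ID2LABEL = {17: "cat", 63: "couch", 75: "remote"}


def describe_detections(result, image_size, id2label=ID2LABEL):
    """Attach names and pixel coordinates to one detection result (hypothetical helper).

    result: dict with 'scores', 'labels', 'boxes' as plain lists,
            boxes given as normalized (x0, y0, x1, y1) corners.
    image_size: (width, height) of the original image in pixels.
    """
    width, height = image_size
    described = []
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        x0, y0, x1, y1 = (float(v) for v in box)
        described.append({
            "name": id2label.get(int(label), f"unknown({int(label)})"),
            "score": float(score),
            # Scale normalized corners to pixels
            "box": [x0 * width, y0 * height, x1 * width, y1 * height],
        })
    return described


# First two detections copied from the output above
result = {
    "scores": [0.9980, 0.9975],
    "labels": [17, 75],
    "boxes": [
        [5.3760e-01, 5.1780e-02, 1.0005e+00, 7.7863e-01],
        [5.1270e-01, 1.5820e-01, 5.8252e-01, 3.9096e-01],
    ],
}

for det in describe_detections(result, image_size=(640, 480)):
    print(det["name"], round(det["score"], 3), [round(v, 1) for v in det["box"]])
```

Alternatively, passing `target_sizes` to `post_process_object_detection` should return pixel coordinates directly, which would make the scaling step here unnecessary.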

Summary

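Running the samples and looking through the training dataset helped me better understand what the paper is proposing.
It is yet another confirmation that how the training data is built is the key factor in a model's performance. The purpose is different, but the dataset for learning arithmetic that OpenAI released the other day tells the same story.
Going forward, will one direction for development be to build a single large system by combining specialized LLMs, such as the tool-making LLM I tried the other day and a tool-using LLM like this one?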

Being able to try the latest research at home makes this an interesting time to be alive.

The end
