シュッと AutoGen と戯れる

bbz

2023年10月9日 15:42

ちゃっす(/・ω・)/

今日はマイクロソフト、ペンシルベニア州立大学、ワシントン大学の共同研究によって開発された？ AutoGen というフレームワークと戯れてみるわよ(/・ω・)/

なんかご機嫌な文章は ChatGPT ちゃんにまとめてもらってるぞ☆

✨ AutoGenフレームワーク:

LLMアプリケーションを作るためのすごいフレームワーク🛠️
いろんなエージェントたちとおしゃべりしながらタスクを解決🗣️

✨ AutoGenエージェントの特徴:
1️⃣ カスタマイズ自由自在🎨
2️⃣ 会話上手で仲間と連携🤝
3️⃣ 人間とシームレスに協力できる！👫

🚀 どんなモードで動くの？

LLMだけじゃなくて、人の手を借りたり、いろんなツールを使ったりして動作します🔧👩‍💻

AutoGenで、エージェントたちと一緒に、もっと楽しく、もっとスムーズにアプリを作ろう！🌼🚀

主な機能

1️⃣ 複雑なLLMワークフローをシンプルに🌀
2️⃣ LLMの性能アップ&弱点克服💪
3️⃣ いろんな会話パターンに対応🗣️
4️⃣ 会話の自律性やエージェントの数、トポロジーまでカスタマイズOK🎨

🌐 システムのバラエティ:

たくさんの作業システムを持っていて、色んなアプリのニーズに答えられる💼
どんな会話スタイルもサポートできるって証明してる！🌟

🔗 APIの魅力:

openai.Completionやopenai.ChatCompletionに代わる新しいAPIを持ってる🌐
パフォーマンスの調整やAPIのまとめ、エラー処理、高度な使用が簡単にできるよ！🚀

ゆーすけーす

ざっくり二つほどユースケースとして挙げられておりますな(/・ω・)/

Multi-agent Conversation Framework
Enhanced Inference

まぁ能書きはいいのでシュッとお試しするざんす(/・ω・)/

Installation

docker We strongly recommend using docker for code execution or running AutoGen in a docker container (e.g., when developing in GitHub codespace, the autogen runs in a docker container).

docker を強く勧められておる

ので、シュッと使える Google Colab で試しますかな

!pip install pyautogen docker

何個か提供されてる Example を試すざんす(/・ω・)/

API Endpoint を設定するぞ☆

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        # "model": ["gpt4", "gpt-4-32k", "gpt-4-32k-0314", "gpt-4-32k-v0314"],
        "model": ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"],
    },
)

llm_config={
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
}

お高いから GPT4 は使わせないぞ☆

OAI_CONFIG_LIST ファイルを作成して中身を定義しておくのだ
拡張子はつけちゃだめだぞ☆

[
    {
        "model": "gpt-3.5-turbo",
        "api_key": "sk-"
    },
    {
        "model": "gpt-3.5-turbo-16k",
        "api_key": "sk-"
    }
]

Agent 設定するーの

# create an AssistantAgent instance named "assistant"
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={"work_dir": "web"},
    llm_config=llm_config,
    system_message="""そのタスクが完全に満足のいく形で解決された場合は、TERMINATEと答える。
そうでなければ、CONTINUEか、タスクがまだ解決していない理由を答えなさい。"""
)

論文について分析してもらうぞ☆

# the assistant receives a message from the user, which contains the task description
user_proxy.initiate_chat(
    assistant,
    message="""
この論文を読むべき人: https://arxiv.org/abs/2308.08155
""",
)

いざ(/・ω・)/

user_proxy (to assistant):


この論文を読むべき人: https://arxiv.org/abs/2308.08155


--------------------------------------------------------------------------------
assistant (to user_proxy):

この論文を読むべき人についての情報を提供します。論文のタイトルや著者名などの情報があれば、それも教えてください。

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
user_proxy (to assistant):

申し訳ありませんが、私は情報を提供することができません。私はAIアシスタントであり、インターネット上の情報にアクセスすることはできません。論文のタイトルや著者名などの情報を提供していただければ、それに基づいてお手伝いすることはできます。

--------------------------------------------------------------------------------
assistant (to user_proxy):

了解しました。論文の情報が提供できない場合、私は具体的なアドバイスを提供することができません。論文の情報が手に入った場合は、再度お知らせください。他に何かお手伝いできることがあればお知らせください。

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
user_proxy (to assistant):

了解しました。論文の情報が手に入った場合は、再度お知らせください。他に何かお手伝いできることがあれば、お気軽にお知らせください。

--------------------------------------------------------------------------------
assistant (to user_proxy):

承知しました。お手伝いできることがあれば、いつでもお知らせください。お待ちしております。

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
user_proxy (to assistant):

ありがとうございます。お手伝いできることがあれば、いつでもお知らせください。お待ちしております。

--------------------------------------------------------------------------------
assistant (to user_proxy):

ありがとうございます。お手伝いできることがあれば、いつでもお知らせください。お待ちしております。

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
user_proxy (to assistant):

ありがとうございます。お手伝いできることがあれば、いつでもお知らせください。お待ちしております。

だめじゃん(　・ω・)

GPT4 使ってみるか。

失敗してくれるなよ、、、

user_proxy (to assistant):


この論文を読むべき人: https://arxiv.org/abs/2308.08155


--------------------------------------------------------------------------------
assistant (to user_proxy):

この論文の要約を取得するために、PythonのBeautifulSoupとrequestsライブラリを使用します。これにより、論文の要約を抽出し、それに基づいて読むべき人を推測します。

以下のPythonコードを実行してください。

```python
import requests
from bs4 import BeautifulSoup

def get_paper_abstract(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    abstract = soup.find('blockquote', attrs={'class': 'abstract mathjax'}).text.strip()
    return abstract

url = "https://arxiv.org/abs/2308.08155"
abstract = get_paper_abstract(url)
print(abstract)
```

このコードは、指定されたURLから論文の要約を取得し、それを出力します。この要約に基づいて、論文を読むべき人を推測します。

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):

exitcode: 0 (execution succeeded)
Code output: 
Abstract:AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.


--------------------------------------------------------------------------------
assistant (to user_proxy):

この論文の要約によれば、AutoGenは開発者が複数のエージェントを使ってLLMアプリケーションを構築するためのオープンソースフレームワークで、これらのエージェントは会話を通じてタスクを達成することができます。AutoGenエージェントはカスタマイズ可能で、会話可能であり、LLM、人間の入力、ツールの組み合わせを使用したさまざまなモードで動作することができます。また、開発者はAutoGenを使用して、エージェントの相互作用の振る舞いを柔軟に定義することもできます。自然言語とコンピュータコードの両方が、さまざまなアプリケーションのための柔軟な会話パターンをプログラムするために使用できます。AutoGenは、さまざまな複雑さとLLM容量のさまざまなアプリケーションを構築するための一般的なインフラとして機能します。実証研究は、数学、コーディング、質問応答、オペレーションズリサーチ、オンライン決定作成、エンターテイメントなどの領域にわたる多くの例示アプリケーションでフレームワークの有効性を示しています。

したがって、この論文は以下のような人々にとって有益である可能性があります：

1. LLM（Language Models）やAIエージェントを使用してアプリケーションを開発することに興味がある開発者。
2. 自然言語処理や機械学習に関心がある研究者。
3. オープンソースフレームワークを使用して、複雑な問題を解決するための新しい方法を探している技術者。

TERMINATE

--------------------------------------------------------------------------------
Please give feedback to assistant. Press enter or type 'exit' to stop the conversation: exit

しゅ、、、しゅごーーーーーい(/・ω・)/

必要なパッケージインストールして動いてくれますやん☆

逆にいうと GPT4 使わないときびちいのね(　・ω・)

Let's 円卓会議☆

さて、ではお待ちかね複数のエージェントによる円卓会議を開始しよう（〇-〇ヽ）

サンプル通りエンジニア、科学者、計画者、実行者、批評家を召喚！！

gpt4_config = {
    "seed": 42,  # change the seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "request_timeout": 120,
}
user_proxy = autogen.UserProxyAgent(
   name="Admin",
   system_message="A human admin. Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.",
   code_execution_config=False,
)
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=gpt4_config,
    system_message='''Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
''',
)
scientist = autogen.AssistantAgent(
    name="Scientist",
    llm_config=gpt4_config,
    system_message="""Scientist. You follow an approved plan. You are able to categorize papers after seeing their abstracts printed. You don't write code."""
)
planner = autogen.AssistantAgent(
    name="Planner",
    system_message='''Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
The plan may involve an engineer who can write code and a scientist who doesn't write code.
Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist.
''',
    llm_config=gpt4_config,
)
executor = autogen.UserProxyAgent(
    name="Executor",
    system_message="Executor. Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "paper"},
)
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.",
    llm_config=gpt4_config,
)
groupchat = autogen.GroupChat(agents=[user_proxy, engineer, scientist, planner, executor, critic], messages=[], max_round=50)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_config)

Just do it

user_proxy.initiate_chat(
    manager,
    message="""
find papers on LLM applications from arxiv in the last week, create a markdown table of different domains. 
""",
)

最近の LLM 関連の論文をまとめたまえ（〇-〇ヽ）

Admin (to chat_manager):


find papers on LLM applications from arxiv in the last week, create a markdown table of different domains.

計画を立てるぜ☆

Planner (to chat_manager):

Plan:

1. Engineer: Write a script to scrape arXiv for papers on LLM (Language Model) applications published in the last week. The script should extract the title, authors, abstract, and link to the paper. 

2. Scientist: Review the scraped data, read the abstracts of the papers, and categorize them into different domains based on the content.

3. Engineer: Write a script to convert the categorized data into a markdown table. The table should include columns for the title, authors, domain, and link to the paper.

4. Scientist: Review the markdown table to ensure the information is accurate and the domains are correctly identified.

5. Engineer: Make any necessary revisions to the markdown table based on the scientist's feedback.

6. Scientist: Give final approval of the markdown table.

7. Engineer: Submit the final markdown table to the admin for approval.

8. Admin: Review the markdown table and provide feedback.

9. Engineer and Scientist: Make any necessary revisions to the markdown table based on the admin's feedback.

10. Repeat steps 8 and 9 until the admin approves the markdown table.

修正案を提案するぜ☆

Critic (to chat_manager):

The plan is well-structured and covers all the necessary steps to achieve the goal. However, it lacks the inclusion of verifiable information such as source URLs. The source URL is crucial for transparency and verification of the information. 

Here's a revised plan:

1. Engineer: Write a script to scrape arXiv for papers on LLM (Language Model) applications published in the last week. The script should extract the title, authors, abstract, and link to the paper. 

2. Scientist: Review the scraped data, read the abstracts of the papers, and categorize them into different domains based on the content.

3. Engineer: Write a script to convert the categorized data into a markdown table. The table should include columns for the title, authors, domain, link to the paper, and source URL.

4. Scientist: Review the markdown table to ensure the information is accurate, the domains are correctly identified, and the source URLs are correct.

5. Engineer: Make any necessary revisions to the markdown table based on the scientist's feedback.

6. Scientist: Give final approval of the markdown table.

7. Engineer: Submit the final markdown table to the admin for approval.

8. Admin: Review the markdown table and provide feedback.

9. Engineer and Scientist: Make any necessary revisions to the markdown table based on the admin's feedback.

10. Repeat steps 8 and 9 until the admin approves the markdown table.

This revised plan ensures that the source URL is included in the markdown table, which adds a layer of transparency and allows for the verification of the information.

ワタクシによる承認☆

Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: accept revised plan
Admin (to chat_manager):

accept revised plan

こんなコードでいくぜ☆

Engineer (to chat_manager):

Alright, let's start with the first step. I will write a script to scrape arXiv for papers on LLM (Language Model) applications published in the last week. The script will extract the title, authors, abstract, and link to the paper.

--------------------------------------------------------------------------------
Executor (to chat_manager):



--------------------------------------------------------------------------------
Engineer (to chat_manager):

```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
import pandas as pd

# Define the URL of the arXiv API
url = "http://export.arxiv.org/api/query?"

# Define the search parameters
search_params = {
    "search_query": "all:LLM",
    "start": 0,
    "max_results": 1000,
}

# Send a GET request to the arXiv API
response = requests.get(url, params=search_params)

# Parse the response using BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

# Get the current date and time
now = datetime.now()

# Define a week ago
week_ago = now - timedelta(days=7)

# Initialize an empty list to hold the papers
papers = []

# Loop through each entry in the response
for entry in soup.find_all("entry"):
    # Get the published date of the entry
    published = datetime.strptime(entry.published.string, "%Y-%m-%dT%H:%M:%SZ")

    # If the entry was published in the last week
    if published > week_ago:
        # Extract the title, authors, abstract, and link
        title = entry.title.string
        authors = [author.find("name").string for author in entry.find_all("author")]
        abstract = entry.summary.string
        link = entry.id.string

        # Add the paper to the list
        papers.append({
            "title": title,
            "authors": authors,
            "abstract": abstract,
            "link": link,
            "published": published,
        })

# Convert the list of papers to a DataFrame
df = pd.DataFrame(papers)

# Print the DataFrame
print(df)

論文取得してきたぜ☆

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Executor (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
                                                title  ...           published
0   Compressing LLMs: The Truth is Rarely Pure and...  ... 2023-10-02 17:42:37
1   Towards Efficient and Effective Adaptation of ...  ... 2023-10-02 20:03:42
2   EcoAssistant: Using LLM Assistant More Afforda...  ... 2023-10-03 22:16:13
3   Large Language Model Cascades with Mixture of ...  ... 2023-10-04 18:21:17
4   Driving with LLMs: Fusing Object-Level Vector ...  ... 2023-10-03 11:05:14
5   Enhancing Financial Sentiment Analysis via Ret...  ... 2023-10-06 05:40:23
6   Evaluating Multi-Agent Coordination Abilities ...  ... 2023-10-05 21:18:15
7   A Comprehensive Evaluation of Large Language M...  ... 2023-10-06 14:16:28
8   LanguageMPC: Large Language Models as Decision...  ... 2023-10-04 17:59:49
9   Improving web element localization by using a ...  ... 2023-10-03 13:39:22
10  A New Dialogue Response Generation Agent for L...  ... 2023-10-05 03:45:54
11  Resolving Knowledge Conflicts in Large Languag...  ... 2023-10-02 06:57:45
12  Time-LLM: Time Series Forecasting by Reprogram...  ... 2023-10-03 01:31:25
13       LLMParser: A LLM-based Log Parsing Framework  ... 2023-10-03 04:46:59
14  Reasoning on Graphs: Faithful and Interpretabl...  ... 2023-10-02 10:14:43
15    SPELL: Semantic Prompt Evolution based on a LLM  ... 2023-10-02 14:51:16
16  Who is ChatGPT? Benchmarking LLMs' Psychologic...  ... 2023-10-02 17:46:09
17  SmartPlay : A Benchmark for LLMs as Intelligen...  ... 2023-10-02 18:52:11
18  Deciphering Diagnoses: How Large Language Mode...  ... 2023-10-03 00:08:23
19  Novice Learner and Expert Tutor: Evaluating Ma...  ... 2023-10-03 21:19:50
20  A Formalism and Approach for Improving Robustn...  ... 2023-10-05 03:20:41
21  SmoothLLM: Defending Large Language Models Aga...  ... 2023-10-05 17:01:53
22  HeaP: Hierarchical Policies for Web Actions us...  ... 2023-10-05 17:40:09
23  Avalon's Game of Thoughts: Battle Against Dece...  ... 2023-10-02 16:27:36
24  MetaTool Benchmark: Deciding Whether to Use To...  ... 2023-10-04 19:39:26
25           GraphText: Graph Reasoning in Text Space  ... 2023-10-02 11:03:57
26  Exploring Collaboration Mechanisms for LLM Age...  ... 2023-10-03 15:05:52
27  Can Large Language Models Provide Security & P...  ... 2023-10-03 20:54:29
28  CITING: Large Language Models Create Curriculu...  ... 2023-10-04 01:58:34
29  From Words to Watts: Benchmarking the Energy C...  ... 2023-10-04 17:41:59
30  Retrieval meets Long Context Large Language Mo...  ... 2023-10-04 17:59:41
31  Balancing Autonomy and Alignment: A Multi-Dime...  ... 2023-10-05 16:37:29
32  Fine-tuning Aligned Language Models Compromise...  ... 2023-10-05 17:12:17
33                  Label Supervised LLaMA Finetuning  ... 2023-10-02 13:53:03
34  Evaluating the Decency and Consistency of Data...  ... 2023-10-02 17:58:52
35  LLM Lies: Hallucinations are not Bugs, but Fea...  ... 2023-10-02 17:01:56
36  Unveiling the Pitfalls of Knowledge Editing fo...  ... 2023-10-03 15:10:46
37  Jailbreaker in Jail: Moving Target Defense for...  ... 2023-10-03 20:32:04
38  A UMLS-Augmented Framework for Improving Factu...  ... 2023-10-04 12:50:26
39  Misusing Tools in Large Language Models With V...  ... 2023-10-04 22:10:01
40   On the Performance of Multimodal Language Models  ... 2023-10-04 23:33:36
41  Can Large Language Models be Good Path Planner...  ... 2023-10-05 01:42:16
42  Adapting Large Language Models for Content Mod...  ... 2023-10-05 09:09:44
43  Large Language Models for Software Engineering...  ... 2023-10-05 13:33:26
44  Redefining Digital Health Interfaces with Larg...  ... 2023-10-05 14:18:40
45  Knowledge Crosswords: Geometric Reasoning over...  ... 2023-10-02 15:43:53
46  Large Language Models for Test-Free Fault Loca...  ... 2023-10-03 01:26:39
47  Instance Needs More Care: Rewriting Prompts fo...  ... 2023-10-03 14:51:34
48  Dynamic LLM-Agent Network: An LLM-agent Collab...  ... 2023-10-03 16:05:48
49  Automated Evaluation of Classroom Instructiona...  ... 2023-10-02 12:11:17
50  VAL: Interactive Task Learning with GPT Dialog...  ... 2023-10-02 20:45:41
51  Large Language Models Cannot Self-Correct Reas...  ... 2023-10-03 04:56:12
52  DeepDecipher: Accessing and Investigating Neur...  ... 2023-10-03 08:15:20
53  Language Models as Knowledge Bases for Visual ...  ... 2023-10-03 11:11:55
54  Ask Again, Then Fail: Large Language Models' V...  ... 2023-10-03 16:08:41
55  Chain of Natural Language Inference for Reduci...  ... 2023-10-06 00:10:46
56  Amortizing intractable inference in large lang...  ... 2023-10-06 16:36:08

[57 rows x 5 columns]

まとめるぜ☆

Scientist (to chat_manager):

Now that we have the data, I will review the abstracts of the papers and categorize them into different domains based on the content.

--------------------------------------------------------------------------------
Scientist (to chat_manager):

After reviewing the abstracts, I have categorized the papers into the following domains:

マークダウンにするぜ☆

Engineer (to chat_manager):

Alright, I will now write a script to convert the categorized data into a markdown table. The table will include columns for the title, authors, domain, link to the paper, and source URL.

失敗したぜ☆

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Executor (to chat_manager):

exitcode: 1 (execution failed)
Code output: 
Traceback (most recent call last):
  File "", line 2, in <module>
    df["domain"] = [
NameError: name 'df' is not defined

修正するぜ☆

Engineer (to chat_manager):

It seems like there was an error because the DataFrame 'df' is not defined in this context. I apologize for the oversight. Let me correct this. I will re-run the scraping script and then add the domain information to the DataFrame. After that, I will convert the DataFrame to a markdown table. Let's try this again

再実行するぜ☆

Executor (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
| title                                                                    | authors                                                                                                                                                                     | abstract                                                                         | link                              | published           | domain                      |
|:-------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------|:--------------------|:----------------------------|
| Compressing LLMs: The Truth is Rarely Pure and Never Simple              | ['Ajay Jaiswal', 'Zhe Gan', 'Xianzhi Du', 'Bowen Zhang', 'Zhangyang Wang', 'Yinfei Yang']                                                                                   | Despite their remarkable achievements, modern Large Language Models (LLMs)       | http://arxiv.org/abs/2310.01382v1 | 2023-10-02 17:42:37 | Machine Learning            |
| Towards Efficient and Effective Adaptation of Large Language Models for  | ['Bo Peng', 'Ben Burns', 'Ziqi Chen', 'Srinivasan Parthasarathy', 'Xia Ning']                                                                                               | In recent years, with large language models (LLMs) achieving state-of-the-art    | http://arxiv.org/abs/2310.01612v1 | 2023-10-02 20:03:42 | Machine Learning            |
|   Sequential Recommendation                                              |                                                                                                                                                                             | performance in context understanding, increasing efforts have been dedicated to  |                                   |                     |                             |
| EcoAssistant: Using LLM Assistant More Affordably and Accurately         | ['Jieyu Zhang', 'Ranjay Krishna', 'Ahmed H. Awadallah', 'Chi Wang']                                                                                                         | Today, users ask Large language models (LLMs) as assistants to answer queries    | http://arxiv.org/abs/2310.03046v1 | 2023-10-03 22:16:13 | Natural Language Processing |
| Large Language Model Cascades with Mixture of Thoughts Representations   | ['Murong Yue', 'Jie Zhao', 'Min Zhang', 'Liang Du', 'Ziyu Yao']                                                                                                             | Large language models (LLMs) such as GPT-4 have exhibited remarkable             | http://arxiv.org/abs/2310.03094v1 | 2023-10-04 18:21:17 | Machine Learning            |
|   for Cost-efficient Reasoning                                           |                                                                                                                                                                             | performance in a variety of tasks, but this strong performance often comes with  |                                   |                     |                             |
| Driving with LLMs: Fusing Object-Level Vector Modality for Explainable   | ['Long Chen', 'Oleg Sinavski', 'Jan Hünermann', 'Alice Karnsund', 'Andrew James Willmott', 'Danny Birch', 'Daniel Maund', 'Jamie Shotton']                                  | Large Language Models (LLMs) have shown promise in the autonomous driving        | http://arxiv.org/abs/2310.01957v1 | 2023-10-03 11:05:14 | Robotics                    |
|   Autonomous Driving                                                     |                                                                                                                                                                             | sector, particularly in generalization and interpretability. We introduce a      |                                   |                     |                             |
| Enhancing Financial Sentiment Analysis via Retrieval Augmented Large     | ['Boyu Zhang', 'Hongyang Yang', 'Tianyu Zhou', 'Ali Babar', 'Xiao-Yang Liu']                                                                                                | Financial sentiment analysis is critical for valuation and investment            | http://arxiv.org/abs/2310.04027v1 | 2023-10-06 05:40:23 | Finance                     |
|   Language Models                                                        |                                                                                                                                                                             | decision-making. Traditional NLP models, however, are limited by their           |                                   |                     |                             |
| Evaluating Multi-Agent Coordination Abilities in Large Language Models   | ['Saaket Agashe', 'Yue Fan', 'Xin Eric Wang']                                                                                                                               | A pivotal aim in contemporary AI research is to develop agents proficient in     | http://arxiv.org/abs/2310.03903v1 | 2023-10-05 21:18:15 | Multi-Agent Systems         |
| A Comprehensive Evaluation of Large Language Models on Benchmark         | ['Israt Jahan', 'Md Tahmid Rahman Laskar', 'Chun Peng', 'Jimmy Huang']                                                                                                      | Recently, Large Language Models (LLM) have demonstrated impressive capability    | http://arxiv.org/abs/2310.04270v1 | 2023-10-06 14:16:28 | Natural Language Processing |
|   Biomedical Text Processing Tasks                                       |                                                                                                                                                                             | to solve a wide range of tasks. However, despite their success across various    |                                   |                     |                             |
| LanguageMPC: Large Language Models as Decision Makers for Autonomous     | ['Hao Sha', 'Yao Mu', 'Yuxuan Jiang', 'Li Chen', 'Chenfeng Xu', 'Ping Luo', 'Shengbo Eben Li', 'Masayoshi Tomizuka', 'Wei Zhan', 'Mingyu Ding']                             | Existing learning-based autonomous driving (AD) systems face challenges in       | http://arxiv.org/abs/2310.03026v1 | 2023-10-04 17:59:49 | Decision Making             |
|   Driving                                                                |                                                                                                                                                                             | comprehending high-level information, generalizing to rare events, and           |                                   |                     |                             |
| Improving web element localization by using a large language model       | ['Michel Nass', 'Emil Alegroth', 'Robert Feldt']                                                                                                                            | Web-based test automation heavily relies on accurately finding web elements.     | http://arxiv.org/abs/2310.02046v1 | 2023-10-03 13:39:22 | Web Development             |
| A New Dialogue Response Generation Agent for Large Language Models by    | ['Siwei Wu', 'Xiangqing Shen', 'Rui Xia']                                                                                                                                   | Large Language Models (LLMs), such as ChatGPT, have recently been applied to     | http://arxiv.org/abs/2310.03293v1 | 2023-10-05 03:45:54 | Natural Language Processing |
|   Asking Questions to Detect User's Intentions                           |                                                                                                                                                                             | various NLP tasks due to its open-domain generation capabilities. However,       |                                   |                     |                             |
| Resolving Knowledge Conflicts in Large Language Models                   | ['Yike Wang', 'Shangbin Feng', 'Heng Wang', 'Weijia Shi', 'Vidhisha Balachandran', 'Tianxing He', 'Yulia Tsvetkov']                                                         | Large language models (LLMs) often encounter knowledge conflicts, scenarios      | http://arxiv.org/abs/2310.00935v1 | 2023-10-02 06:57:45 | Knowledge Management        |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | ['Ming Jin', 'Shiyu Wang', 'Lintao Ma', 'Zhixuan Chu', 'James Y. Zhang', 'Xiaoming Shi', 'Pin-Yu Chen', 'Yuxuan Liang', 'Yuan-Fang Li', 'Shirui Pan', 'Qingsong Wen']       | Time series forecasting holds significant importance in many real-world          | http://arxiv.org/abs/2310.01728v1 | 2023-10-03 01:31:25 | Time Series Analysis        |
| LLMParser: A LLM-based Log Parsing Framework                             | ['Zhihan Jiang', 'Jinyang Liu', 'Zhuangbin Chen', 'Yichen Li', 'Junjie Huang', 'Yintong Huo', 'Pinjia He', 'Jiazhen Gu', 'Michael R. Lyu']                                  | The process of log parsing, which converts log messages into structured          | http://arxiv.org/abs/2310.01796v1 | 2023-10-03 04:46:59 | Log Parsing                 |
| Reasoning on Graphs: Faithful and Interpretable Large Language Model     | ['Linhao Luo', 'Yuan-Fang Li', 'Gholamreza Haffari', 'Shirui Pan']                                                                                                          | Large language models (LLMs) have demonstrated impressive reasoning abilities    | http://arxiv.org/abs/2310.01061v1 | 2023-10-02 10:14:43 | Graph Theory                |
|   Reasoning                                                              |                                                                                                                                                                             | in complex tasks. However, they lack up-to-date knowledge and experience         |                                   |                     |                             ||                                                                          |                                                                                                                                                                             | reasoning results.                                                               |                                   |                     |                             |
| SPELL: Semantic Prompt Evolution based on a LLM                          | ['Yujian Betterest Li', 'Kai Wu']                                                                                                                                           | Prompt engineering is a new paradigm for enhancing the performance of trained    | http://arxiv.org/abs/2310.01260v1 | 2023-10-02 14:51:16 | Natural Language Processing |
| Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using         | ['Jen-tse Huang', 'Wenxuan Wang', 'Eric John Li', 'Man Ho Lam', 'Shujie Ren', 'Youliang Yuan', 'Wenxiang Jiao', 'Zhaopeng Tu', 'Michael R. Lyu']                            | Large Language Models (LLMs) have recently showcased their remarkable            | http://arxiv.org/abs/2310.01386v1 | 2023-10-02 17:46:09 | Psychology                  |
|   PsychoBench                                                            |                                                                                                                                                                             | capacities, not only in natural language processing tasks but also across        |                                   |                     |                             |
| SmartPlay : A Benchmark for LLMs as Intelligent Agents                   | ['Yue Wu', 'Xuan Tang', 'Tom M. Mitchell', 'Yuanzhi Li']                                                                                                                    | Recent large language models (LLMs) have demonstrated great potential toward     | http://arxiv.org/abs/2310.01557v2 | 2023-10-02 18:52:11 | Gaming                      |
| Deciphering Diagnoses: How Large Language Models Explanations Influence  | ['D. Umerenkov', 'G. Zubkova', 'A. Nesterov']                                                                                                                               | Clinical Decision Support Systems (CDSS) utilize evidence-based knowledge and    | http://arxiv.org/abs/2310.01708v1 | 2023-10-03 00:08:23 | Healthcare                  |
|   Clinical Decision Making                                               |                                                                                                                                                                             | patient data to offer real-time recommendations, with Large Language Models      |                                   |                     |                             |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of  | ['Naiming Liu', 'Shashank Sonkar', 'Zichao Wang', 'Simon Woodhead', 'Richard G. Baraniuk']                                                                                  | We propose novel evaluations for mathematical reasoning capabilities of Large    | http://arxiv.org/abs/2310.02439v1 | 2023-10-03 21:19:50 | Education                   |
|   Large Language Models with Misconceptions                              |                                                                                                                                                                             | Language Models (LLMs) based on mathematical misconceptions. Our primary         |                                   |                     |                             |
| A Formalism and Approach for Improving Robustness of Large Language      | ['Ke Shen', 'Mayank Kejriwal']                                                                                                                                              | Large Language Models (LLMs), such as ChatGPT, have achieved impressive          | http://arxiv.org/abs/2310.03283v1 | 2023-10-05 03:20:41 | Machine Learning            |
|   Models Using Risk-Adjusted Confidence Scores                           |                                                                                                                                                                             | milestones in natural language processing (NLP). Despite their impressive        |                                   |                     |                             |
| SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks  | ['Alexander Robey', 'Eric Wong', 'Hamed Hassani', 'George J. Pappas']                                                                                                       | Despite efforts to align large language models (LLMs) with human values,         | http://arxiv.org/abs/2310.03684v1 | 2023-10-05 17:01:53 | Cybersecurity               |
| HeaP: Hierarchical Policies for Web Actions using LLMs                   | ['Paloma Sodhi', 'S. R. K. Branavan', 'Ryan McDonald']                                                                                                                      | Large language models (LLMs) have demonstrated remarkable capabilities in        | http://arxiv.org/abs/2310.03720v1 | 2023-10-05 17:40:09 | Web Development             |
| Avalon's Game of Thoughts: Battle Against Deception through Recursive    | ['Shenzhi Wang', 'Chang Liu', 'Zilong Zheng', 'Siyuan Qi', 'Shuo Chen', 'Qisen Yang', 'Andrew Zhao', 'Chaofei Wang', 'Shiji Song', 'Gao Huang']                             | Recent breakthroughs in large language models (LLMs) have brought remarkable     | http://arxiv.org/abs/2310.01320v2 | 2023-10-02 16:27:36 | Gaming                      |
|   Contemplation                                                          |                                                                                                                                                                             | success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is    |                                   |                     |                             |
| MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use       | ['Yue Huang', 'Jiawen Shi', 'Yuan Li', 'Chenrui Fan', 'Siyuan Wu', 'Qihui Zhang', 'Yixin Liu', 'Pan Zhou', 'Yao Wan', 'Neil Zhenqiang Gong', 'Lichao Sun']                  | Large language models (LLMs) have garnered significant attention due to their    | http://arxiv.org/abs/2310.03128v1 | 2023-10-04 19:39:26 | Decision Making             |
| GraphText: Graph Reasoning in Text Space                                 | ['Jianan Zhao', 'Le Zhuo', 'Yikang Shen', 'Meng Qu', 'Kai Liu', 'Michael Bronstein', 'Zhaocheng Zhu', 'Jian Tang']                                                          | Large Language Models (LLMs) have gained the ability to assimilate human         | http://arxiv.org/abs/2310.01089v1 | 2023-10-02 11:03:57 | Graph Theory                |
| Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology   | ['Jintian Zhang', 'Xin Xu', 'Shumin Deng']                                                                                                                                  | As Natural Language Processing (NLP) systems are increasingly employed in        | http://arxiv.org/abs/2310.02124v1 | 2023-10-03 15:05:52 | Multi-Agent Systems         |
|   View                                                                   |                                                                                                                                                                             | intricate social environments, a pressing query emerges: Can these NLP systems   |                                   |                     |                             |
| Can Large Language Models Provide Security & Privacy Advice? Measuring   | ['Yufan Chen', 'Arjun Arunasalam', 'Z. Berkay Celik']                                                                                                                       | Users seek security & privacy (S&P) advice from online resources, including      | http://arxiv.org/abs/2310.02431v1 | 2023-10-03 20:54:29 | Cybersecurity               |
|   the Ability of LLMs to Refute Misconceptions                           |                                                                                                                                                                             | trusted websites and content-sharing platforms. These resources help users       |                                   |                     |                             |
| CITING: Large Language Models Create Curriculum for Instruction Tuning   | ['Tao Feng', 'Zifeng Wang', 'Jimeng Sun']                                                                                                                                   | The recent advancement of large language models (LLMs) has been achieved         | http://arxiv.org/abs/2310.02527v1 | 2023-10-04 01:58:34 | Education                   |



以下略

そして訪れる RateLimit による突然の死(　・ω・)

[autogen.oai.completion: 10-09 06:21:13] {237} INFO - retrying in 10 seconds...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/autogen/oai/completion.py", line 207, in _get_response
    response = openai_completion.create(**config)
  File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/abstract/engine_api_resource.py", line 155, in create
    response, _, api_key = requestor.request(
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 299, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 710, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 775, in _interpret_response_line
    raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for default-gpt-4 in organization org- on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.

あぁ、、、無情、、、

これ以上やるとワタクシのお財布が悲鳴をあげるのでお試しはおしまい(/・ω・)/

あとはドキュメントを適当にまとめたやつをシュッと書いておくぞ☆

ちなみに FAQ に Rate Limit の対策がかかれているのだ(/・ω・)/

Handle Rate Limit Error and Timeout Error

You can set retry_wait_time and max_retry_period to handle rate limit error. And you can set request_timeout to handle timeout error. They can all be specified in llm_config for an agent, which will be used in the create function for LLM inference.

retry_wait_time (int): the time interval to wait (in seconds) before retrying a failed request.
max_retry_period (int): the total timeout (in seconds) allowed for retrying failed requests.
request_timeout (int): the timeout (in seconds) sent with a single request.

Please refer to the documentation for more info.

以下は 2023/10/9 時点のドキュメントをシュッとまとめりんぐ☆

Multi-agent Conversation Framework

AutoGenは、基礎モデルを使用する高レベルの抽象化として、統一されたマルチエージェント会話フレームワークを提供します。自動化されたエージェントチャットを介して、LLM、ツール、人間を統合する、有能でカスタマイズ可能な会話可能なエージェントを特徴としています。複数の有能なエージェント間のチャットを自動化することで、コードによるツールの利用を必要とするタスクを含め、自律的に、あるいは人間からのフィードバックを受けながら、エージェントにタスクを実行させることができます。

このフレームワークは、複雑なLLMワークフローのオーケストレーション、自動化、最適化を簡素化します。LLMモデルのパフォーマンスを最大化し、弱点を克服します。最小限の労力でマルチエージェント会話に基づく次世代LLMアプリケーションを構築することができます。

DeepL 翻訳

✨ ConversableAgent クラス:

いろんなエージェントたちとおしゃべりしながらタスクを完成させるクラスです！🗣️

✨ サブクラスちゃんたち:
1️⃣ AssistantAgent:
- 🤖 AIアシスタントの役目を持っています。
- LLM（GPT-4など）を使って、Pythonのコードを書くことが得意！🐍
- 修正や提案もバッチリ！✍️
- llm_configで設定できるよ！

2️⃣ UserProxyAgent:
- 🙋‍♂️ 人間の代わりになるエージェント！
- 人からの入力を待ったり、コードを走らせたりします。💼
- コード実行を無効にしたいときは、code_execution_configをFalseにするだけ！❌
- LLMのレスポンスは最初はオフ。llm_configでオンにできるよ！🔛

🤝 タスクの完成:

このシステムは人の手を借りながらも、ちゃんと自立してタスクを完成させることができます！🚀

💡 コードの中身:

"assistant"という名前のAssistantAgentさん🤖と、"user_proxy"という名前のUserProxyAgentさん🙋‍♂️を使って、タスクを一緒に解決するよ！

以上、たくさんのエージェントたちが協力してタスクを完成させる素敵なクラスの紹介でした！🎉

シンプルな二人のエージェントのやりとりは以下のように表せますぞ(/・ω・)/

✨ AutoGenのすごさ:
1️⃣ 初期化ステップ後は、完全自律でおしゃべりができる！🤖💬
2️⃣ でも、人間との協力も大切に💁‍♂️: `human_input_mode`をALWAYSに設定することで、人間と一緒に問題を解決するモードも可能👫🔧

AutoGenは、全自動でも、人と協力しても、しっかりタスクをこなせる頼れるフレームワークだよ！🌟🌼

✨ AutoGenの特徴:

プログラミング言語💻 + 自然言語🗣 = すごく動的な会話✨

🤔 動的 vs 静的会話:

静的: 会話の流れがあらかじめ決まっている📜
動的: 入力や会話の流れに合わせて、エージェントのやりとりが変わる🌀

🔥 動的会話の利点:

事前にどんな会話になるかわからない複雑なアプリにピッタリ🎯

✨ AutoGenが提供する動的会話のアプローチ:
1️⃣ 登録された自動返信:
- プラグイン式の自動返信機能🔌
- メッセージの内容や状況に合わせて、他のエージェントとのやり取りを選べる🔄
- 例: グループチャットで、誰が次に話すかをLLMに決めさせる👥💬

2️⃣ LLMベースの関数呼び出し:
- LLMが推論を元に、特定の関数を呼ぶかどうかを決める🤔💡
- LLMが他のエージェントにメッセージを送ることで、動的な会話が進行📩
- 例: 学生アシスタントが問題を自動でエキスパートに相談するシナリオ🎓🔍

AutoGenで、複雑なタスクもしっかりとした会話で解決できるようになるよ！🌼🚀

色々とサンプルがありますぞ(/・ω・)/

Enhanced Inference

autogen.Completionはopenai.Completionとopenai.ChatCompletionを置き換える拡張推論APIです。パフォーマンスチューニング、APIの統一、キャッシュ、エラーハンドリング、マルチコンフィグ推論、結果のフィルタリング、テンプレート化などです。

DeepL 翻訳

✨ テキスト生成のコスト:

基礎モデルを使ってのテキスト生成のコストは、入力と出力のトークン数の合計で測定される🔢

🏗️ アプリケーションビルダーの視点:

目標は、推論の予算制約内で生成テキストの有用性を最大化すること💡 (例: コーディングの問題を解くのに必要な平均ドルコストで測定🔍💰)

🛠️ ハイパーパラメータの最適化:

生成テキストの有用性とコストに大きな影響を与える📊
推論のハイパーパラメータを最適化することで、この目標を達成できる🚀

基礎モデルを使ったテキスト生成で、最適なパフォーマンスとコストのバランスを得るための戦略が必要だよ！🌼🎯

いじれるハイパーパラメータはこんな感じですな(/・ω・)/

model
prompt/messages
max_tokens
temperature
top_p
n
stop
presence_penalty, frequency_penalty
best_of

✨ 検証データ:

多様なインスタンスを集めることが大切📁
これらのインスタンスは、辞書の形のイテラブル（繰り返し可能なもの）に保存できる🔍

📝 例:

各インスタンスは辞書で表される📔
1️⃣ "problem"というキーに、数学の問題の説明を文字列として保存📏🤔
2️⃣ "solution"というキーに、その解答を文字列として保存🖋️✅

検証データをしっかりと整理して、問題と解答をきちんと対応させることで、データの分析や利用が効果的になるよ！🌼📊

✨ 評価関数:

評価関数は、応答のリストと検証データのキーに対応する他のキーワード引数を入力として受け取り、メトリクスの辞書を出力するべき📊

📝 例:

def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
    # レスポンスのリストから1つの回答を選択
    answer = voted_answer(responses)
    # 回答が正しいかチェック
    return {"success": is_equivalent(answer, solution)}

🔍 ユーティリティ:

`autogen.code_utils` や `autogen.math_utils` には、コード生成や数学の問題解決のためのサンプル評価関数が提供されている🔧

評価関数は、生成された応答の品質や正確さを評価するための重要なツールとして活用されるよ！🌼📝

✨ 最適化するメトリクス:

通常、最適化するメトリクスは、すべてのチューニングデータインスタンスにわたる集約されたメトリクスです📊

📝 例:

ユーザーは "success"（成功）というメトリクスと "max"（最大）という最適化モードを指定できる🎯

💡 集約関数:

デフォルトでは、集約関数は平均を取るものとされる🧮
必要に応じて、ユーザーはカスタマイズされた集約関数を提供することができる🔧

最適化するメトリクスの選択と集約方法は、チューニングプロセスの品質と結果を大きく

✨ 探索空間:

ユーザーは、各ハイパーパラメータの検索範囲を指定することができる🔍

🔧 各ハイパーパラメータの詳細:

1️⃣ model:

定数の文字列や、`flaml.tune.choice` で指定された複数の選択肢。

2️⃣ prompt/messages:

プロンプトは文字列か文字列のリスト、メッセージは辞書のリストかリストのリスト。
データインスタンスごとにフォーマットされる。

3️⃣ max_tokens, n, best_of:

定数や、`flaml.tune.randint` などで指定。
デフォルト: `max_tokens` は [50, 1000)、`n` は [1, 100)、`best_of` は 1。

4️⃣ stop:

文字列、文字列のリスト、リストのリスト、またはNone。デフォルトはNone。

5️⃣ temperature or top_p:

定数や、`flaml.tune.uniform` などで指定。
デフォルトでは、`temperature` と `top_p` のいずれかが [0, 1] の範囲で均一に選ばれる。

6️⃣ presence_penalty, frequency_penalty:

定数や、`flaml.tune.uniform` などで指定。
デフォルトではチューニングされない。

探索空間の指定により、ハイパーパラメータのチューニングが効果的に行われ、最適な結果を得ることができるよ！🌼🎯

✨ 予算:

推論予算と最適化予算を指定することができる💰

1️⃣ 推論予算:

データインスタンスごとの平均推論コストを指します🔍

2️⃣ 最適化予算:

チューニングプロセスで許される総予算を指します🔄

📏 計測方法:

両方の予算は、1000トークンあたりの価格に従ってドルで計測される💵

きちんとした予算の設定は、効果的なハイパーパラメータチューニングを行いながらコストを管理するために重要だよ！🌼💹

✨ チューニングの実行:

`autogen.Completion.tune` を使用してチューニングを行うことができる🎛️

📝 例:

import autogen

config, analysis = autogen.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)

🔍 詳細:

`num_samples` はサンプリングする設定の数を指します。-1 は無制限を意味します（最適化予算が使い果たされるまで）。
`config` には最適化された設定が含まれ、`analysis` には試みたすべての設定と結果のための `ExperimentAnalysis` オブジェクトが含まれる。

🌼 結果の活用:

チューニングされた `config` は、推論を実行するために使用できる🚀

チューニングツールを使うことで、最適なハイパーパラメータ設定を見つけ、効果的なテキスト生成を行うことができるよ！🌼🎯

✨ APIの統一:

`autogen.Completion.create` は `openai.Completion.create`、`openai.ChatCompletion.create`、OpenAI API、およびAzure OpenAI APIと互換性があります🔗
このため、"text-davinci-003"、"gpt-3.5-turbo"、"gpt-4" などのモデルが共通のAPIを共有できる✨

🔄 自動変換:

チャットモデルを使用し、`autogen.Completion.create` への入力としてプロンプトが与えられた場合、プロンプトはチャット完了APIの要件に合わせてメッセージに自動変換されます📩

🌟 利点:

統一されたAPIで、同じプロンプトに対してチャットモデルと非チャットモデルの両方を試すことができる🔍

💡 ローカルLLM:

FastChatのようなパッケージを使用してエンドポイントを起動し、同じAPIを使用してリクエストを送信することができる🔌
ローカルLLMでの推論方法の例については、参考リンクを参照📖 (※実際のリンクは提供されていません)

📥 チャットベースのモデル専用:

`autogen.ChatCompletion` が使用できる🗨️
メッセージの代わりにプロンプトが提供された場合、プロンプトからメッセージへの自動変換も行います🔄

APIの統一を通じて、さまざまなモデルやAPIにアクセスする際の複雑さを軽減し、効率的に実験やテストを行うことができるよ！🌼👩‍💻

✨ キャッシング:

APIの呼び出し結果はローカルにキャッシュされ、同じリクエストが発行されたときに再利用されます🔒

🌟 利点:

再現性：実験を繰り返すか続ける際に有用です🔄
コスト節約：以前のリクエストの結果を再利用することで、API呼び出しのコストを節約することができます💰

🎲 制御されたランダムネス:

"seed" の設定、`set_cache` の使用、または `create()` 内での指定を使用して、制御されたランダムネスを許可します🌱

キャッシングを活用することで、効率的に実験を行いつつ、リソースとコストを節約することができるよ！🌼📊

✨ ランタイムエラー:

OpenAI APIを呼び出す際に、接続、レート制限、タイムアウトなどの理由でエラーが発生することがよくあります⚠️
いくつかのエラーは一時的です💨

🔄 自動リトライ:

`autogen.Completion.create`は一時的なエラーに対処し、自動でリトライします🔄
リクエストのタイムアウト、最大再試行期間、再試行の待機時間は、`request_timeout`、`max_retry_period`、`retry_wait_time`を通じて設定可能です⏱️

🌐 レート制限の軽減:

レート制限を緩和するために、異なるモデル/エンドポイントの設定のリストを渡すことができます📋
レート制限がボトルネックとなる開発プロセスを加速するために便利です🚀
最後の選択肢で失敗するとエラーが発生するので、リストの最後の選択肢が最も利用可能であることを確認してください🔒

🔧 ユーティリティ関数:

便宜上、設定リストをロードするためのユーティリティ関数を提供しています、例として `config_list_from_json` などがあります📜

これらの機能を利用することで、APIの呼び出しエラーをスムーズに処理し、開発プロセスを効果的に加速することができるよ！🌼🛠️

✨ ロジックエラー:

もう一つのエラータイプは、返されたレスポンスが要件を満たさない場合です🚫
例えば、レスポンスが有効なJSON文字列であることが求められる場合、有効でないレスポンスをフィルタリングしたいと思うでしょう🔍

🛡️ フィルタ関数:

def valid_json_filter(context, config, response):
    for text in autogen.Completion.extract_text(response):
        try:
            json.loads(text)
            return True
        except ValueError:
            pass
    return False

response = autogen.Completion.create(
    config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}, {"model": "text-davinci-003"}],
    prompt="How to construct a json request to Bing API to search for 'latest AI news'? Return the JSON request.",
    filter_func=valid_json_filter,
)

設定のリストとフィルタ関数を提供することで、要件を満たさないレスポンスをフィルタリングできます🖐️
上記の例では、`text-ada-001`、`gpt-3.5-turbo`、`text-davinci-003`のモデルを順番に使用し、有効なJSON文字列が返されるか、最後の設定が使用されるまで試行します🔄

🔄 複数回の試行:

最終的なレスポンスのロバスト性を高めるために、リスト内で同じモデルを複数回繰り返して使用して、一つのモデルを複数回試すこともできます💪

🌟 高度な使用ケース:

GPT-4のコーディング性能を68%から90%に向上させ、推論コストを削減する方法については、このブログ投稿をチェックしてください📊🚀

ロジックエラーに対処するためのこれらのツールを使用することで、APIの呼び出しを更に強化し、信頼性を向上させることができます！🌼🔧

✨ テンプレーティング:

与えられたコンテキストで提供されたプロンプトやメッセージがテンプレートの場合、自動的にマテリアル化されます💬

🎈 例1:

`{problem}` を具体的な問題文に置き換えてプロンプトを作成📝

response = autogen.Completion.create(
    context={"problem": "..."},
    prompt="{problem} Solve the problem carefully.",
    allow_format_str_template=True,
    **config
)

📔 例2:

テンプレートは、上記の例のようなフォーマット文字列または、入力フィールドから文字列を生成する関数のいずれかです🔧

def content(turn, context):
    return "\n".join([context[f"user_message_{turn}"], context[f"external_info_{turn}"]])

`messages` リストは、ユーザーやアシスタントの会話を追跡するために使用されます🗨️
`context` は各ターンのメッセージや外部情報を保存するために使用されます📦

🗨️ 会話ベースのAPIを使用して、問題の説明や解答を効果的に生成し、疑問を解決するためにこれらのテンプレートを活用することができます🙌

色々とサンプルがありますぞ(/・ω・)/

はい、というわけで今回はちょっと長いけどシュッとやったのでおしまい。

この記事が気に入ったらサポートをしてみませんか？

シュッと AutoGen と戯れる

主な機能

ゆーすけーす

Installation

論文について分析してもらうぞ☆

Let's 円卓会議☆

Handle Rate Limit Error and Timeout Error​

Multi-agent Conversation Framework

Enhanced Inference

Handle Rate Limit Error and Timeout Error