Building a command that uses the ChatGPT API to write an introduction for the page at a given URL

mah_lab / 西見公宏

March 16, 2023, 18:00

Since I've been posting an article on note every day lately, I thought I might as well announce each new article on the Facebook page I used to update.

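That said, a post that shows only the title and URL, like note's share button produces, feels a bit meh, and I doubt I'd keep it up if I had to write an introduction by hand every time.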

I know! I'll have ChatGPT write them!

Writing the code

ChatGPT itself can't fetch information from a URL, though, so I need code that pulls the content out of the page at the URL in a sensible way.

The number of tokens you can pass to ChatGPT is limited, so I want to strip out code snippets and program output from the page as much as possible.
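Stripping code helps, but a long article can still exceed the limit. One simple safeguard is to cap the extracted text before it goes into the prompt. The sketch below is a hypothetical helper, `truncate_for_prompt`, that is not part of the script in this post. It uses a rough characters-per-token ratio instead of a real tokenizer: about 4 characters per token is a common rule of thumb for English, while Japanese usually fits far fewer characters into each token, so a ratio around 1 is a safer assumption there.

```python
def truncate_for_prompt(text: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Cut text to roughly max_tokens tokens.

    Assumption: tokens are estimated as len(text) / chars_per_token.
    This is a heuristic, not a tokenizer, so leave some headroom.
    """
    max_chars = int(max_tokens * chars_per_token)
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Prefer to stop at the last line break so a paragraph isn't split mid-sentence
    last_break = cut.rfind("\n")
    if last_break > 0:
        cut = cut[:last_break]
    return cut


if __name__ == "__main__":
    sample = "First paragraph.\nSecond paragraph.\nThird paragraph."
    print(truncate_for_prompt(sample, max_tokens=8))
```

In a script like this one, you would apply it to the extracted article text just before building the user prompt, with `max_tokens` set well below the model's context size so the instructions and the reply still fit.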

With those requirements in mind, I asked ChatGPT whether there was a good library for this. It suggested BeautifulSoup, so I pip-installed it right away.

pip install requests beautifulsoup4

Then I wrote this function:

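import re
import sys

import requests
from bs4 import BeautifulSoup

def extract_main_text_from_url(url):
    try:
        # Give up after 10 seconds instead of hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # Covers HTTP errors as well as connection errors and timeouts
        print(f"Request failed: {e}", file=sys.stderr)
        sys.exit(1)

    soup = BeautifulSoup(response.content, "html.parser")

    # Drop the page title, source code, and scripts/styles
    for tag in soup(["title", "pre", "code", "script", "style"]):
        tag.extract()

    # Convert to text, one text node per line
    text = soup.get_text(separator="\n", strip=True)

    # Remove anything that still looks like source code
    text = re.sub(r"(```.*?```)|(`.*?`)|(\{.*?\})", "", text, flags=re.DOTALL)

    # Keep only the non-empty lines (the body text)
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    main_text = "\n".join(paragraphs)

    return main_text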

The function cuts out just the parts of the page that seem necessary for summarizing it and returns them as text. ChatGPT wrote the comments for me.
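If you'd rather not add a dependency, the same idea can be sketched with Python's standard-library `html.parser`. This is a hypothetical alternative, not the code used in this post: `MainTextParser` and `extract_main_text` are illustrative names, and the function takes an HTML string rather than a URL so the fetching step stays separate. It skips the same tags the function above removes (`title`, `pre`, `code`) plus `script` and `style`, and emits one line per text node.

```python
from html.parser import HTMLParser


class MainTextParser(HTMLParser):
    """Collect visible text, skipping title/pre/code/script/style contents."""

    SKIP_TAGS = {"title", "pre", "code", "script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag (handles nesting)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a skipped tag
        if self.skip_depth == 0:
            stripped = data.strip()
            if stripped:
                self.chunks.append(stripped)


def extract_main_text(html: str) -> str:
    parser = MainTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = "<title>T</title><p>Hello</p><pre>print('x')</pre><p>World</p>"
    print(extract_main_text(page))
```

Note that an unclosed `<code>` or `<pre>` in malformed HTML would leave `skip_depth` above zero and hide the rest of the page; BeautifulSoup is more forgiving here, which is one reason to prefer it.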

Here is the script I wrote using that function:

import re
import requests
import argparse
import sys
from langchain.llms import OpenAIChat
from bs4 import BeautifulSoup

def extract_main_text_from_url(url):
    try:
        # Give up after 10 seconds instead of hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # Covers HTTP errors as well as connection errors and timeouts
        print(f"Request failed: {e}", file=sys.stderr)
        sys.exit(1)

    soup = BeautifulSoup(response.content, "html.parser")

    # Drop the page title, source code, and scripts/styles
    for tag in soup(["title", "pre", "code", "script", "style"]):
        tag.extract()

    # Convert to text, one text node per line
    text = soup.get_text(separator="\n", strip=True)

    # Remove anything that still looks like source code
    text = re.sub(r"(```.*?```)|(`.*?`)|(\{.*?\})", "", text, flags=re.DOTALL)

    # Keep only the non-empty lines (the body text)
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    main_text = "\n".join(paragraphs)

    return main_text

# Set up the LLM (OpenAIChat reads the API key from the OPENAI_API_KEY environment variable)
system_prompt = """
You are an excellent social media marketer.
From the information in the article, you are responsible for presenting it in an engaging way.
"""
prefix_messages = [{ "role": "system", "content": system_prompt}]
llm = OpenAIChat(temperature=0.5, prefix_messages=prefix_messages)

# Command-line arguments
parser = argparse.ArgumentParser(prog="urlsm.py",
                                 description="Summarize article from URL")
parser.add_argument("url", help="URL to extract text from")
args = parser.parse_args()

# Build the prompt
url = args.url
user_prompt = f"""
Summarize the following text so that it can be included as an introduction to the article on the Facebook page.

# conditions:
- The text should be in Japanese.
- Add appropriate line breaks.
- Make it easy for elementary school students to understand.
- Don't include a sentence that says "who is introducing this".

# text:
{extract_main_text_from_url(url)}
"""

result = llm(user_prompt)
print(result)

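To save tokens, the prompts are written in English, since Japanese generally takes more tokens than English to say the same thing.
In plain terms, they say the following: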

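Prefix message
- You are an excellent social media marketer
- Your job is to take the information in an article and present it in an engaging way
User message
- Summarize the following text so that it can be posted on the Facebook page as an introduction to the article
- Conditions:
  - Write it in Japanese
  - Insert appropriate line breaks
  - Make it easy for elementary school students to understand
  - Don't include sentences like "This is an article by X"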
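With `OpenAIChat`, the `prefix_messages` are sent first and the prompt string is appended as the final user message. The sketch below assembles that same two-message list as plain dictionaries in the Chat Completions `role`/`content` format, so you can inspect or test the exact prompt without calling the API. `build_messages`, `SYSTEM_PROMPT`, and `CONDITIONS` are hypothetical names; the prompt wording is copied from the script.

```python
# Prompt text copied from the script; build_messages is a hypothetical helper.
SYSTEM_PROMPT = (
    "You are an excellent social media marketer.\n"
    "From the information in the article, you are responsible for presenting it "
    "in an engaging way."
)

CONDITIONS = [
    "The text should be in Japanese.",
    "Add appropriate line breaks.",
    "Make it easy for elementary school students to understand.",
    "Don't include a sentence that says \"who is introducing this\".",
]


def build_messages(article_text: str) -> list[dict[str, str]]:
    """Return the system + user messages in the role/content chat format."""
    conditions = "\n".join(f"- {c}" for c in CONDITIONS)
    user_prompt = (
        "Summarize the following text so that it can be included as an "
        "introduction to the article on the Facebook page.\n\n"
        f"# conditions:\n{conditions}\n\n"
        f"# text:\n{article_text}\n"
    )
    # The system message comes first, the user prompt last
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    for message in build_messages("(article text goes here)"):
        print(f"[{message['role']}]\n{message['content']}")
```

Keeping the conditions in a list makes it easy to add or drop one without editing a large f-string.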

Results

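I had it summarize this article.

The article "Fact-checking text with LangChain and ChatGPT" introduces a way to check whether a piece of text is true or false. It fact-checks by using LangChain to generate answers to questions and then judging whether the assumptions are correct. It also walks through the steps of actually running a fact-check with code. If you're curious, give it a read!

Summary by ChatGPT (translated from the Japanese output)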

Now sharing will be a breeze!
