Analyzing with Natural Language Processing: Making Use of Economic News (Preparation)

March 4, 2024, 14:40

Trying out NewsAPI.

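Goal

The goal is to predict future stock price movements from economic news. I also think that once the broad trend is clear, aligning the stocks I buy with that trend to some degree should grow my assets.

Method

The first step is fetching news with Python, so I will use NewsAPI to retrieve it. I have no programming background, so I will lean heavily on ChatGPT.
After that, I will apply machine learning. The approach is to extract keywords with natural language processing and compute a sentiment score for each article. I will then look for companies related to the articles with especially high and especially low scores and check how they fared afterwards.
That requires fetching past articles, but as a first trial I wrote the code below.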

Test code

import nltk
import requests
from textblob import TextBlob
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from nltk.corpus import stopwords

# Initial NLTK setup: download the tokenizer model and the stop word list
# (recent NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('punkt')
nltk.download('stopwords')

# Get the English stop word list and add punctuation tokens
stop_words = set(stopwords.words('english'))
stop_words.update(['.', ',', ':', ';', '(', ')', '[', ']', '{', '}', '``', "''", '`'])

# Add custom stop words
custom_stop_words = ['a', 'the', 'to']
stop_words.update(custom_stop_words)

def fetch_news(api_key, country, language='en'):
    """Fetch top business headlines for one country from NewsAPI."""
    url = "https://newsapi.org/v2/top-headlines"
    params = {
        'apiKey': api_key,
        'category': 'business',
        'country': country,
        'language': language,
        'pageSize': 100,
    }
    # A timeout keeps the script from hanging if the server does not respond
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 200:
        return response.json()['articles']
    else:
        print(f"An error occurred while fetching news (HTTP {response.status_code}).")
        return []

def analyze_and_display_news(articles):
    """Print the top keywords and a sentiment score for each article."""
    for article in articles:
        print(f"Title: {article['title']}")
        # The description can be missing or None, so fall back to an empty string
        description = article.get('description') or ''
        
        # Extract keywords: lowercase, tokenize, drop stop words and non-alphabetic tokens
        tokens = [word for word in word_tokenize(description.lower()) if word not in stop_words and word.isalpha()]
        fdist = FreqDist(tokens)
        print("Keywords (most frequent words):")
        for word, frequency in fdist.most_common(5):
            print(f"{word}: {frequency}")
        
        # Sentiment analysis: polarity ranges from -1.0 (negative) to 1.0 (positive)
        blob = TextBlob(description)
        sentiment = blob.sentiment.polarity
        print("Sentiment score:", sentiment)
        if sentiment > 0:
            print("Sentiment: positive")
        elif sentiment < 0:
            print("Sentiment: negative")
        else:
            print("Sentiment: neutral")
        
        print("-" * 50)

# API key and country codes (keep the API key properly protected)
api_key = "Your NewsAPI key"
country_codes = ['us', 'jp']

for country_code in country_codes:
    print(f"--- {country_code.upper()} economic news ---")
    articles = fetch_news(api_key, country_code)
    analyze_and_display_news(articles)

Results (first three articles)

--- US economic news ---
Title: Souplantation-style restaurant opens in Southern California - KTLA Los Angeles
Keywords (most frequent words):
restaurant: 2
new: 1
southern: 1
california: 1
aims: 1
Sentiment score: 0.03939393939393939
Sentiment: positive
--------------------------------------------------
Title: Boeing, Alaska Airlines hit with $1B lawsuit filed by 3 Flight 1282 passengers - Fox Business
Keywords (most frequent words):
alaska: 2
airlines: 2
three: 1
passengers: 1
max: 1
Sentiment score: 0.0
Sentiment: neutral
--------------------------------------------------
Title: With Flovent inhaler off the market, some parents face challenges in getting generics for kids with asthma - Fox Business
Keywords (most frequent words):
flovent: 1
one: 1
popular: 1
inhalers: 1
treating: 1
Sentiment score: 0.3666666666666667
Sentiment: positive
--------------------------------------------------

Discussion of results and open issues

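This code can only fetch English articles, so I also want to work out how to fetch Japanese ones. However, I have confirmed that the sentiment score does not currently work well on Japanese text. For Japanese articles I will therefore need either to run the machine learning on translated text or to add a step that translates the fetched articles into English.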
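The translate-then-score idea could be wired up roughly as follows. This is a minimal sketch under assumptions, not a tested pipeline: `score_with_translation`, `translate`, and `score` are hypothetical names, and both the translator and the scorer are passed in as plain functions so that any translation service and any sentiment model can be plugged in. The two stand-in functions at the bottom exist only so the example runs without external services.

```python
from typing import Callable, Dict, List


def score_with_translation(
    articles: List[Dict],
    translate: Callable[[str], str],
    score: Callable[[str], float],
) -> List[Dict]:
    """Translate each article description to English, then score it."""
    results = []
    for article in articles:
        description = article.get("description") or ""
        # Skip the translation call entirely for empty descriptions
        english = translate(description) if description else ""
        results.append({
            "title": article.get("title"),
            "english": english,
            "sentiment": score(english) if english else 0.0,
        })
    return results


# Stand-in translator and scorer (assumptions, for illustration only)
def fake_translate(text: str) -> str:
    return {"株価が上昇": "stock prices rise"}.get(text, text)


def fake_score(text: str) -> float:
    return 0.5 if "rise" in text else 0.0


sample = [
    {"title": "A", "description": "株価が上昇"},
    {"title": "B", "description": None},
]
scored = score_with_translation(sample, fake_translate, fake_score)
```

With TextBlob installed, the scorer used in the test code could be passed in as `score=lambda text: TextBlob(text).sentiment.polarity`, which leaves only the translator to choose.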
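The articles themselves are being fetched, but I am currently on the free NewsAPI plan, which caps the number of articles I can retrieve, so I also need to be more efficient about which articles I fetch.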
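One way to avoid spending the request quota on data that has already been retrieved is to cache each response on disk. The sketch below assumes a hypothetical helper `cached_fetch` that wraps any fetch function; the cache layout (one JSON file per set of query parameters) is my own choice, and the counting stand-in replaces the real API call so the example runs offline.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def cached_fetch(
    cache_dir: Path,
    params: Dict,
    fetch: Callable[[Dict], List[Dict]],
) -> List[Dict]:
    """Return cached articles for these params; call fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Stable file name derived from the sorted query parameters
    raw_key = json.dumps(params, sort_keys=True).encode("utf-8")
    cache_file = cache_dir / f"{hashlib.sha256(raw_key).hexdigest()}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    articles = fetch(params)
    # Do not cache empty results, so a failed request can be retried later
    if articles:
        cache_file.write_text(json.dumps(articles, ensure_ascii=False), encoding="utf-8")
    return articles


# Demo with a counting stand-in instead of a real API call
calls = []


def fake_fetch(params: Dict) -> List[Dict]:
    calls.append(params)
    return [{"title": "example", "description": "example text"}]


with tempfile.TemporaryDirectory() as tmp:
    query = {"country": "us", "category": "business"}
    first = cached_fetch(Path(tmp), query, fake_fetch)
    second = cached_fetch(Path(tmp), query, fake_fetch)  # served from disk
```

The API key is deliberately left out of `params` here so that it never ends up in a cache file name or on disk; the real fetch function would add it when building the request.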

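Next time: what I plan to add for the actual analysis

・Fetch articles from several years ago and pick out the companies that appear in high-sentiment articles and those that appear in low-sentiment articles. Alternatively, select the companies that best match the keywords.
・Check how their stock prices have moved from then until today.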
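The "pick out the high and low articles" step can be sketched as a simple sort. This is only an illustration: `pick_extremes` is a hypothetical helper, the input is assumed to be a list of dictionaries with `title` and `sentiment` keys, and the sample data reuses the three scores from the test run above.

```python
from typing import Dict, List, Tuple


def pick_extremes(scored: List[Dict], n: int = 3) -> Tuple[List[Dict], List[Dict]]:
    """Return the n highest-scoring and n lowest-scoring articles."""
    ranked = sorted(scored, key=lambda a: a["sentiment"], reverse=True)
    # Cap n so the two groups never overlap on short lists
    n = min(n, len(ranked) // 2)
    highest = ranked[:n]
    # Reverse the tail so the lowest score comes first
    lowest = ranked[-n:][::-1] if n else []
    return highest, lowest


sample = [
    {"title": "Souplantation-style restaurant opens in Southern California",
     "sentiment": 0.03939393939393939},
    {"title": "Boeing, Alaska Airlines hit with $1B lawsuit",
     "sentiment": 0.0},
    {"title": "With Flovent inhaler off the market, some parents face challenges",
     "sentiment": 0.3666666666666667},
]
highest, lowest = pick_extremes(sample, n=1)
```

Mapping the selected articles to companies and then to stock prices is a separate step that this sketch does not cover.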
