
Super-Easy Stock Price Prediction in Python (with Optuna and LightGBM): Automatic Hyperparameter Optimization

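Use Optuna in Python to automatically tune the hyperparameters of a LightGBM model that predicts whether the next day's stock price will go up or down. It is super easy.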
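To make concrete what "automatic optimization" means, here is a minimal sketch of the trial loop using only the standard library. It uses plain random search, which is not the sampler Optuna uses by default, and `toy_score` is a hypothetical stand-in for "train a model and return its accuracy". The two parameter ranges mirror the ones used in pred.py; the helper names are made up for illustration.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def toy_score(params):
    """Hypothetical stand-in for 'train a model, return accuracy'.

    Peaks at lambda_l1 = 1e-3 and num_leaves = 31; higher is better.
    """
    l1_penalty = (math.log10(params["lambda_l1"]) + 3.0) ** 2
    leaves_penalty = ((params["num_leaves"] - 31) / 100.0) ** 2
    return 1.0 - 0.01 * l1_penalty - leaves_penalty


def random_search(score_fn, n_trials, seed=0):
    """Try n_trials random parameter sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = {
            "lambda_l1": sample_log_uniform(rng, 1e-8, 10.0),
            "num_leaves": rng.randint(2, 256),
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_score, n_trials=100)
print(best_params, best_score)
```

Optuna follows the same outline (suggest parameters, score the trial, remember the best), but it chooses each new candidate based on the results of earlier trials instead of sampling blindly.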

For the baseline model before automatic optimization, see the earlier post below.

1. Install the tools

$ pip install scikit-learn lightgbm pandas-datareader optuna

2. Create the file

pred.py

import pandas_datareader as pdr
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from sklearn.metrics import accuracy_score
import numpy as np
import optuna

def objective(trial):
 X_train, X_test, y_train, y_test = train_test_split(
   X,
   y,
   test_size=0.2,
   shuffle=False,
 )
 dtrain = lgb.Dataset(X_train, label=y_train)
 param = {
   "objective": "binary",
   "metric": "binary_logloss",
   "verbosity": -1,
   "boosting_type": "gbdt",
   "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
   "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
   "num_leaves": trial.suggest_int("num_leaves", 2, 256),
   "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
   "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
   "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
   "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
 }

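 # Train with the sampled parameters and score this trial by accuracy on X_test.
 gbm = lgb.train(param, dtrain)
 preds = gbm.predict(X_test)  # predicted probability of an up day
 pred_labels = np.rint(preds)  # round probabilities to 0/1 labels
 accuracy = accuracy_score(y_test, pred_labels)
 return accuracy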

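# Download ten years of daily AAPL prices.
df = pdr.get_data_yahoo("AAPL", "2010-11-01", "2020-11-01")
df["Diff"] = df.Close.diff()  # day-over-day change in the close
df["SMA_2"] = df.Close.rolling(2).mean()  # 2-day simple moving average
df["Force_Index"] = df["Close"] * df["Volume"]  # close times volume, as defined in this script
# Label: 1 if the NEXT day's close rises, else 0. shift(-1) moves tomorrow's move onto today's row.
df["y"] = df["Diff"].apply(lambda x: 1 if x > 0 else 0).shift(-1)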
df = df.drop(
 ["Open", "High", "Low", "Close", "Volume", "Diff", "Adj Close"],
 axis=1,
).dropna()
# print(df)
X = df.drop(["y"], axis=1).values
y = df["y"].values
X_train, X_test, y_train, y_test = train_test_split(
 X,
 y,
 test_size=0.2,
 shuffle=False,
)
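study = optuna.create_study(direction="maximize")  # maximize accuracy
study.optimize(objective, n_trials=100)
# Refit with the best parameters found. `trial` is not defined at module level,
# so read them from the study instead.
clf = lgb.LGBMRegressor(**study.best_params)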
clf.fit(
 X_train,
 y_train,
)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred > 0.5))
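One thing to keep in mind about pred.py: `objective` scores every trial on the same `X_test` that produces the final number, so the reported accuracy may be optimistic. A hedged sketch of one common alternative is a three-way chronological split, where trials are scored on a validation slice and the test slice is used only once at the end. The function name `chronological_split` and the 60/20/20 proportions are assumptions for illustration and are not part of the original script.

```python
def chronological_split(rows, valid_size=0.2, test_size=0.2):
    """Split time-ordered rows into train/valid/test without shuffling."""
    n = len(rows)
    n_test = int(n * test_size)
    n_valid = int(n * valid_size)
    n_train = n - n_valid - n_test
    if n_train <= 0:
        raise ValueError("not enough rows to split")
    train = rows[:n_train]                    # oldest rows
    valid = rows[n_train:n_train + n_valid]   # used to score each trial
    test = rows[n_train + n_valid:]           # newest rows, used once at the end
    return train, valid, test


# Ten consecutive "days" stand in for the rows of the feature table.
days = list(range(10))
train, valid, test = chronological_split(days)
print(train, valid, test)
```

With such a split, `objective` would predict on the validation slice, and only the final refit model would be evaluated on the test slice. The slicing works the same way on NumPy arrays such as `X` and `y`.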

3. Run

$ python pred.py

0.5515873015873016

That's it. Super easy!

4. Results

Before automatic optimization: 0.5456349206349206

After automatic optimization: 0.5515873015873016

Accuracy improved by about 0.6 percentage points (roughly 1% in relative terms).
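The size of the improvement can be checked with a few lines of arithmetic. The two accuracy values are taken from the results above. The test-set size of 504 rows is an assumption: it is inferred from the fact that both accuracies are exact multiples of 1/504, and it is not stated in the original post.

```python
before = 0.5456349206349206  # accuracy before tuning (from the results above)
after = 0.5515873015873016   # accuracy after tuning (from the results above)

abs_gain = after - before     # difference in accuracy
rel_gain = abs_gain / before  # gain relative to the baseline

print(f"absolute: {abs_gain * 100:.2f} percentage points")
print(f"relative: {rel_gain * 100:.2f} %")

# Assumption: 504 test rows, inferred from the accuracy values themselves.
n_test = 504
print(round(before * n_test), round(after * n_test))  # correct predictions before / after
```

Under that assumption the tuned model gets 278 test predictions right instead of 275, so the difference is small enough that it may not hold on other data.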


