Trying out MLflow

August 22, 2020, 18:46

Background

MLflow(https://mlflow.org/)
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components:

MLflow Tracking
Record and query experiments: code, data, config, and results.

MLflow Projects
Package data science code in a format to reproduce runs on any platform.

MLflow Models
Deploy machine learning models in diverse serving environments.

Model Registry
Store, annotate, discover, and manage models in a central repository.

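My own notes

- The immediate goal is to use Tracking.
- Apparently it starts an HTTP server where you can browse past training parameters and results.
- It would be better if the client and server could run on separate PCs, with team development in mind.
- Both the server and the client run on Windows.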

Reference URLs
- https://future-architect.github.io/articles/20200626/
- https://qiita.com/fam_taro/items/155912068ff475a53e44

Environment setup

Anaconda environment setup
Create an environment named 202008_mlflow.
I picked Python 3.8 for no particular reason.

conda create -n 202008_mlflow python=3.8

That worked (as expected).
Install mlflow with pip.

pip install mlflow
(... omitted ...)
Could not install packages due to an EnvironmentError: [WinError 5] アクセスが拒否されました。: 'c:\\programdata\\anaconda3\\lib\\site-packages\\__pycache__\\pythoncom.cpython-37.pyc'
Consider using the `--user` option or check the permissions.

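Hmm, no idea.
Why is it trying to touch Python 3.7 files (the access-denied error is on a cpython-37 .pyc under the base Anaconda install)?
Delete the environment.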

conda remove -n 202008_mlflow --all

Anaconda environment setup (second attempt)
This time with python=3.7.

conda create -n 202008_mlflow python=3.7
conda activate 202008_mlflow
pip install mlflow

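Actually, the same error came up again, but I realized I had forgotten to activate the environment, and once I activated it the install went through. I had forgotten to activate it the first time too, so that was probably the cause: without activation, pip targets the base install. That alone does not explain why installing into base fails (possibly because base lives under C:\ProgramData, which usually requires administrator rights to write to), but it is probably the reason. Whether 3.8 itself was a problem, I don't know.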
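One way to catch a forgotten `conda activate` before running pip is to ask Python which interpreter and environment it is actually using. The following is only a minimal sketch: the helper name `describe_env` is made up for illustration, and it relies on `CONDA_DEFAULT_ENV`, the variable conda sets when an environment is activated.

```python
import os
import sys


def describe_env():
    """Return where the running interpreter lives and which conda env is active."""
    return {
        "executable": sys.executable,  # full path of the running python
        "prefix": sys.prefix,          # root folder of the environment in use
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_env().items():
    print(f"{key}: {value}")
```

If `prefix` points at the base install rather than the 202008_mlflow environment, pip would install there as well. Running `python -m pip install mlflow` instead of bare `pip` also ties the install to whichever interpreter is first on the path.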

MLflow setup

Once the HTTP server is up, it apparently saves its management files and results in the current directory, so move to a suitable folder before starting it.

cd D:\60_storage\MLFlow\server
mkdir mlruns
mlflow server --backend-store-uri ./mlruns --host 0.0.0.0 --port 8886

It started. Ctrl+C stops it.
I could reach it at http://localhost:8886/. That is it for the server side for now.
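Instead of opening a browser, the same reachability check can be scripted. This is a rough sketch using only the standard library; `is_server_up` is a hypothetical helper, and it simply requests the root URL that was opened in the browser above.

```python
import http.client
import urllib.error
import urllib.request


def is_server_up(url, timeout=3.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False


print(is_server_up("http://localhost:8886/"))
```

Because the server was started with `--host 0.0.0.0`, the same check from another PC should work with the server's host name or IP address in place of `localhost`, assuming nothing in between blocks port 8886.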

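Client side (the training script)

The training script is apparently where you call the mlflow functions and specify where results are saved.

Machine learning preparation
Use sklearn's iris dataset: three-class classification. As an example where I want to play with recording parameter tuning, I'll use a neural network.
It runs on the same PC as the server, but for testing purposes I deliberately use a different folder and a different Anaconda environment.

Creating the environment
To be deliberately difficult, use Python 3.8.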

conda create -n 202008_mlflow_cli python=3.8
conda activate 202008_mlflow_cli
pip install mlflow scikit-learn

mlflow installed fine after all.
Apparently the server has to be pointed to through an environment variable.

set MLFLOW_TRACKING_URI=http://localhost:8886

I will definitely forget to run set every time, so for real use I'll need to find a better way.
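One possible way around the forgotten `set` is to give the variable a default from inside the training script, before any MLflow call is made. A minimal sketch, assuming the server address used above; `ensure_tracking_uri` and `DEFAULT_TRACKING_URI` are names invented for this example.

```python
import os

# Assumption: the tracking server started earlier on this machine
DEFAULT_TRACKING_URI = "http://localhost:8886"


def ensure_tracking_uri(default=DEFAULT_TRACKING_URI):
    """Set MLFLOW_TRACKING_URI unless it is already set; return the value in effect."""
    return os.environ.setdefault("MLFLOW_TRACKING_URI", default)


print(ensure_tracking_uri())
```

A value set with `set MLFLOW_TRACKING_URI=...` in the shell still wins, so the default only applies when nothing was configured. Calling `mlflow.set_tracking_uri("http://localhost:8886")` at the top of the script is another option that does not involve the environment at all.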

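Original source
iris scores high even with default settings, so the idea is to keep max_iter low and deliberately stop training partway through (note that the 200 used in the listing is also scikit-learn's default).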

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Prepare the data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Parameters to tune
params = {
    "solver": "sgd",
    "hidden_layer_sizes": (100,),
    "alpha": 0.0001,
    "beta_1": 0.9,    # only used when solver="adam"
    "beta_2": 0.999,  # only used when solver="adam"
    "random_state": 0,
    "max_iter": 200
}

# Training and prediction
clf = MLPClassifier(**params)
clf.fit(X_train, y_train)

# Result
acc_score = clf.score(X_test, y_test)
print(acc_score)

MLflow version of the source

It is a little hard to spot, but the parts marked "#### added ####" are the additions.

# MLflow setup       #### added ####
import mlflow
import mlflow.sklearn  # explicit import for mlflow.sklearn.log_model below
mlflow.start_run()

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Prepare the data
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Parameters to tune
params = {
    "solver": "sgd",
    "hidden_layer_sizes": (100,),
    "alpha": 0.0001,
    "beta_1": 0.9,    # only used when solver="adam"
    "beta_2": 0.999,  # only used when solver="adam"
    "random_state": 0,
    "max_iter": 200
}

# Log the parameters to MLflow       #### added ####
for k, v in params.items():
    mlflow.log_param(k, v)

# Training and prediction
clf = MLPClassifier(**params)
clf.fit(X_train, y_train)

# Result
acc_score = clf.score(X_test, y_test)
print(acc_score)

# Log the result to MLflow       #### added ####
mlflow.log_metric('acc', acc_score)
# Save the model
mlflow.sklearn.log_model(clf, 'model')
# Close the run explicitly
mlflow.end_run()
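
Since the point of the exercise is recording parameter tuning, the same logging calls can be wrapped in a loop that creates one run per parameter combination. The following is only a sketch of that idea: `expand_grid` and `run_sweep` are hypothetical helpers, and `train_and_score` stands for any function you supply that trains with the given parameters and returns the accuracy.

```python
from itertools import product


def expand_grid(grid):
    """Turn {"name": [values, ...]} into a list with one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def run_sweep(grid, train_and_score):
    """Log one MLflow run per combination; train_and_score(params) returns accuracy."""
    import mlflow  # imported here so expand_grid can be used without MLflow installed

    for params in expand_grid(grid):
        with mlflow.start_run():       # the run is closed when the block exits
            mlflow.log_params(params)  # log all parameters in one call
            mlflow.log_metric("acc", train_and_score(params))


print(expand_grid({"max_iter": [50, 200], "alpha": [0.0001, 0.01]}))
```

With the iris data from the listing above, `train_and_score` could be something like `lambda p: MLPClassifier(solver="sgd", random_state=0, **p).fit(X_train, y_train).score(X_test, y_test)`. Using `with mlflow.start_run():` rather than a bare `mlflow.start_run()` keeps each combination in its own run.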

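- It takes noticeably longer than running without MLflow, probably because various things are sent to the server. The overhead may be a roughly fixed few seconds (around +5 s, perhaps) whether the script is this small or a large training job.
- The results showed up on the server!
- This makes me want to send images too! (Creating one is too much hassle right now, so I'll skip it.) Note to self: `mlflow.log_artifact()`
- Both the client and the server have something saved under their own mlruns folders (presumably because the artifact location defaults to the relative path ./mlruns, so the client writes artifacts into its own working directory).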
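As a follow-up to the `mlflow.log_artifact()` note above, here is a rough sketch of attaching a file to a run. It uses a plain-text report instead of an image to stay dependency-free; an image file saved to disk would be logged the same way. `write_report` and `log_report` are hypothetical helpers, and the accuracy passed in the demo is a placeholder, not a measured result.

```python
import tempfile
from pathlib import Path


def write_report(path, acc_score, params):
    """Write a small plain-text summary of one training run and return its path."""
    path = Path(path)
    lines = [f"acc: {acc_score:.4f}"] + [f"{k}: {v}" for k, v in sorted(params.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def log_report(path):
    """Attach the file at `path` to the active MLflow run as an artifact."""
    import mlflow  # lazy import: only needed when actually logging

    mlflow.log_artifact(str(path))


# Demo of the file-writing half only (placeholder accuracy value)
with tempfile.TemporaryDirectory() as tmp:
    report = write_report(Path(tmp) / "report.txt", 0.5, {"max_iter": 200})
    print(report.read_text(encoding="utf-8"))
```

In the training script, calling `log_report(write_report("report.txt", acc_score, params))` after `mlflow.log_metric(...)` should make the file appear under the run's artifacts in the server UI.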

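Tweaks and other notes
- On the server UI every run lands in the Default experiment, so I want to separate them. The Experiment ID is shown near the top of the experiment's page (Default, for example), so you can pass it like `mlflow.start_run(experiment_id=0)`.
Alternatively, calling `mlflow.set_experiment(name)` and then `mlflow.start_run()` apparently works too.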

- Experiment names in Japanese worked fine. What you specify is the ID anyway.

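- The Artifact Location you set when creating an Experiment is where artifacts such as saved models are stored; changing it seems to make things complicated, so it is probably better to leave it alone.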
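The `mlflow.set_experiment(name)` route mentioned above can be sketched as follows. This is an illustration only: `log_metric_in_experiment` is a made-up helper, and its `mlflow_module` parameter exists purely so the function can be exercised with a stand-in object when MLflow is not installed.

```python
def log_metric_in_experiment(experiment_name, key, value, mlflow_module=None):
    """Record one metric in a new run under the named experiment."""
    if mlflow_module is None:
        import mlflow as mlflow_module  # the real library, imported lazily

    # Selects the experiment by name; MLflow creates it if it does not exist yet
    mlflow_module.set_experiment(experiment_name)
    with mlflow_module.start_run():  # the run is closed when the block exits
        mlflow_module.log_metric(key, value)
```

In the training script this would be called as, for example, `log_metric_in_experiment("iris-mlp", "acc", acc_score)`, where "iris-mlp" is an arbitrary experiment name chosen for this example. Selecting by name avoids having to look up the Experiment ID in the UI first.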

That's all from the field!
