Converting a Local LLM to a Core ML Model: How to Use huggingface/exporters

March 3, 2024, 16:22

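What is exporters?

Hugging Face publishes a tool called exporters.

It exports Transformers models to Core ML. It is a wrapper around coremltools, but the tool absorbs many of the problems that come up during conversion. In short, it makes converting a Transformers model to Core ML easier than using coremltools directly.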

Motivation

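exporters was introduced in an official blog post, and LLMs such as Llama 2 were apparently converted with it (or rather, the lessons learned during that conversion were folded into the tool).

I tried the published pre-converted models, but they were still too large for mobile devices, so I figured that converting a smaller model myself might work and decided to try the exporters tool.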

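Note that the "transformers-to-coreml conversion Space" introduced in the same post returns a 500 error as of March 3, 2024 and cannot be used. (I could not try it, so the details are unclear, but it appears to be exporters turned into a web service that can be operated from a GUI.)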

Installation

Clone the repository with git clone, then run

$ cd exporters
$ pip install -e .

and that's all. (I did create a virtual environment with venv first.)

A first conversion

I tried running the command shown in the README:

python -m exporters.coreml --model=distilbert-base-uncased exported/

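The command breaks down as follows:

Run the `exporters.coreml` package
- This package converts a model checkpoint to a Core ML model

Pass the checkpoint of the source model to the `--model` argument
- This can be a checkpoint on the Hugging Face Hub or one saved locally

The converted Core ML file is saved to the `exported` directory
- The default file name is `Model.mlpackage`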
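The same invocation can be scripted, which is handy when trying several checkpoints in a row. The following is a minimal sketch that only assembles and launches the command line described above; the helper names (`build_export_command`, `expected_package_path`, `run_export`) are made up for illustration and are not part of exporters. It does not assume the package is written when validation fails, so the caller should check whether the path exists.

```python
import subprocess
import sys
from pathlib import Path


def build_export_command(model: str, output_dir: str) -> list[str]:
    """Build argv for: python -m exporters.coreml --model=<model> <output_dir>."""
    return [sys.executable, "-m", "exporters.coreml", f"--model={model}", output_dir]


def expected_package_path(output_dir: str) -> Path:
    """Default output location; the file name is the one mentioned in the article."""
    return Path(output_dir) / "Model.mlpackage"


def run_export(model: str, output_dir: str) -> Path:
    """Launch the exporter as a subprocess and return the expected package path.

    check=False: a non-zero exit code (for example from a failed validation)
    does not raise here, so inspect the returned path afterwards.
    """
    subprocess.run(build_export_command(model, output_dir), check=False)
    return expected_package_path(output_dir)


# Usage (requires exporters to be installed in the current environment):
#   package = run_export("distilbert-base-uncased", "exported/")
#   print(package, "exists:", package.exists())
```

Using `sys.executable` keeps the subprocess inside the same virtual environment as the calling script.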

The conversion got this far:

Torch version 2.2.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.1.0 is the most recent version that has been tested.
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 53.4kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 483/483 [00:00<00:00, 1.17MB/s]
vocab.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 662kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.50MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 268M/268M [00:40<00:00, 6.68MB/s]
Using framework PyTorch: 2.2.1
(omitted)
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[x] values not close enough (atol: 0.0001)

Then this error appeared:

Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
(omitted)
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.004475116729736328

The model was generated successfully, but validation failed.

According to the README, the log should normally look like this:

Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/Model.mlpackage

However, another part of the README says the following, so this final validation failure appears to be within an acceptable range.

If validation fails with an error such as the following, it doesn't necessarily mean the model is broken:

> ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.12345

The comparison is done using an absolute difference value, which in this example is 0.12345. That is much larger than the default tolerance value of 1e-4, hence the reported error. However, the magnitude of the activations also matters. For a model whose activations are on the order of 1e+3, a maximum absolute difference of 0.12345 would usually be acceptable.

If validation fails with this error and you're not entirely sure if this is a true problem, call `mlmodel.predict()` on a dummy input tensor and look at the largest absolute magnitude in the output tensor.

I'll treat this as a success for now.

Converting an LLM
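To make the README's point concrete, here is a small sketch of an absolute-tolerance check next to the same difference expressed relative to the largest activation. It is a simplified illustration, not the tool's actual validation code, and the numbers are hypothetical values chosen so that the maximum difference is about 0.0045, similar in size to the error above. They are not real DistilBERT outputs. In practice the exported values would come from `mlmodel.predict()` as the README suggests.

```python
def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two equally long sequences."""
    if len(reference) != len(exported):
        raise ValueError("outputs must have the same length")
    return max(abs(r - e) for r, e in zip(reference, exported))


def peak_magnitude(values):
    """Largest absolute value in the sequence."""
    return max(abs(v) for v in values)


def check_outputs(reference, exported, atol=1e-4):
    """Report the absolute check and the difference relative to the peak activation."""
    diff = max_abs_diff(reference, exported)
    peak = peak_magnitude(reference)
    return {
        "max_abs_diff": diff,
        "passes_atol": diff <= atol,
        "diff_relative_to_peak": diff / peak if peak else float("inf"),
    }


# Hypothetical flattened activations (assumption, for illustration only).
reference = [0.52, -1.75, 3.10, -0.08]
exported = [0.523, -1.7545, 3.101, -0.0795]

report = check_outputs(reference, exported)
print(report)
```

With these made-up values the absolute check fails against the default tolerance of 1e-4, while the difference is only a small fraction of the largest activation. How small is small enough depends on the model and the task.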


Thank you for reading to the end! If any part of this was useful, a like would be encouraging. I'd also appreciate a follow on Twitter: https://twitter.com/shu223/