How to Use the TFLite Model Benchmark Tool

October 10, 2019 20:16

1. TFLite Model Benchmark Tool

The TFLite Model Benchmark Tool is a simple C++ binary for benchmarking TFLite models and their individual operators, on both desktop machines and Android. For iOS, the iOS benchmark app is provided instead.

The binary takes a TFLite model, generates random inputs, runs the model repeatedly for the specified number of runs, and then reports statistics on inference speed.
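The measurement loop can be pictured with the following Python sketch. It is not the tool's C++ implementation: the phase logic (keep running until both a minimum run count and a minimum duration are reached, but stop once 150 seconds are exceeded) is inferred from the log messages shown in section 4, fake_model is a hypothetical stand-in for a TFLite interpreter, and using the population standard deviation is an assumption.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def run_phase(infer: Callable[[List[float]], object],
              input_size: int,
              min_runs: int,
              min_secs: float,
              max_secs: float = 150.0) -> List[float]:
    """Run infer() repeatedly on one random input; return latencies in microseconds.

    Keeps running until at least min_runs runs AND min_secs seconds have passed,
    but stops as soon as max_secs is exceeded (mirrors the log wording).
    """
    x = [random.random() for _ in range(input_size)]  # random input, generated once
    times_us: List[float] = []
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        if (len(times_us) >= min_runs and elapsed >= min_secs) or elapsed > max_secs:
            break
        t0 = time.perf_counter()
        infer(x)
        times_us.append((time.perf_counter() - t0) * 1e6)
    return times_us


def summarize(times_us: List[float]) -> Dict[str, float]:
    """Fields named like the tool's 'count= first= curr= min= max= avg= std=' line."""
    return {
        "count": len(times_us),
        "first": times_us[0],
        "curr": times_us[-1],
        "min": min(times_us),
        "max": max(times_us),
        "avg": statistics.mean(times_us),
        "std": statistics.pstdev(times_us),  # population std: an assumption
    }


if __name__ == "__main__":
    def fake_model(x: List[float]) -> float:
        """Hypothetical stand-in for invoking a TFLite interpreter."""
        return sum(v * v for v in x)

    # Warm-up phase, then the measured phase (1 run / 0.5 s and 50 runs / 1 s in the log).
    warmup = summarize(run_phase(fake_model, 1000, min_runs=1, min_secs=0.5))
    steady = summarize(run_phase(fake_model, 1000, min_runs=50, min_secs=1.0))
    print("warmup:", warmup)
    print("steady:", steady)
```

Separating a warm-up phase from the measured phase is why the logs below show a much larger first= value in the first line than in the second.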

2. Installing the TFLite Model Benchmark Tool

Here, we install it on a Mac.

(1) Installing Bazel
Bazel is a build tool developed by Google, and it is required to build the TFLite Model Benchmark Tool.

Install it with brew.

$ brew tap bazelbuild/tap
$ brew tap-pin bazelbuild/tap
$ brew install bazelbuild/tap/bazel

(2) Cloning the TensorFlow repository
Clone the TensorFlow repository with git.

$ git clone https://github.com/tensorflow/tensorflow.git

(3) Building the Benchmark Tool
Move into the tensorflow folder and build the tool with the following command.

$ cd tensorflow
$ bazel build -c opt tensorflow/lite/tools/benchmark:benchmark_model

3. Preparing TFLite Models

To practice benchmarking, download two models from the page below: mobilenet_v1_1.0_224.tflite, which is likely to be more accurate, and mobilenet_v1_0.25_128.tflite, which is likely to be faster.

・models/mobilenet_v1.md at master · tensorflow/models · GitHub
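The file names encode two MobileNet v1 hyperparameters: the width multiplier (1.0 or 0.25) and the input resolution (224 or 128 pixels). The MobileNet paper notes that computational cost scales roughly with the square of each, so a small Python sketch can estimate how much less work the smaller model does. The function names here are made up for illustration, and the rule is only approximate: it ignores memory bandwidth and per-operator overhead, so measured speedups can differ.

```python
import re
from typing import NamedTuple


class MobileNetV1Variant(NamedTuple):
    alpha: float      # width multiplier, e.g. 1.0 or 0.25
    resolution: int   # input height/width in pixels, e.g. 224 or 128


# Matches names like "mobilenet_v1_0.25_128.tflite" (optionally "_quant").
_NAME = re.compile(r"mobilenet_v1_(\d+(?:\.\d+)?)_(\d+)(?:_quant)?\.tflite$")


def parse_variant(filename: str) -> MobileNetV1Variant:
    """Extract the width multiplier and resolution from a MobileNet v1 file name."""
    m = _NAME.search(filename)
    if m is None:
        raise ValueError(f"not a MobileNet v1 file name: {filename}")
    return MobileNetV1Variant(float(m.group(1)), int(m.group(2)))


def relative_cost(a: MobileNetV1Variant, b: MobileNetV1Variant) -> float:
    """Rough compute ratio a/b using the alpha^2 * resolution^2 scaling rule."""
    return (a.alpha ** 2 * a.resolution ** 2) / (b.alpha ** 2 * b.resolution ** 2)


big = parse_variant("mobilenet_v1_1.0_224.tflite")
small = parse_variant("mobilenet_v1_0.25_128.tflite")
print(relative_cost(big, small))  # → 49.0
```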

4. Benchmarking TFLite Models

The command to benchmark mobilenet_v1_1.0_224.tflite is as follows.

$ bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
    --graph=mobilenet_v1_1.0_224.tflite \
    --num_threads=4

Initialized session in 8.997ms
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=25 first=102628 curr=17619 min=15958 max=102628 avg=20268.9 std=16827

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=59 first=16414 curr=16950 min=15950 max=21357 avg=17098.9 std=789
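Each count=... line reports timings in microseconds. When comparing many runs, a small parser such as this hypothetical Python helper can turn such a line into a dictionary; it assumes only the key=value format visible in the output above.

```python
import re
from typing import Dict

# key=value pairs such as "count=59" or "avg=17098.9"
_PAIR = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_stats_line(line: str) -> Dict[str, float]:
    """Parse a 'count=... first=... curr=... min=... max=... avg=... std=...' line.

    Timing values are in microseconds, as printed by benchmark_model.
    """
    stats = {key: float(value) for key, value in _PAIR.findall(line)}
    if "avg" not in stats:
        raise ValueError(f"no avg= field in {line!r}")
    return stats


stats = parse_stats_line("count=59 first=16414 curr=16950 min=15950 max=21357 avg=17098.9 std=789")
print(f"{int(stats['count'])} runs, avg {stats['avg'] / 1000:.2f} ms")  # → 59 runs, avg 17.10 ms
```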

The command to benchmark mobilenet_v1_0.25_128.tflite is as follows.

$ bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
    --graph=mobilenet_v1_0.25_128.tflite \
    --num_threads=4

Initialized session in 0.543ms
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=478 first=4843 curr=888 min=841 max=4843 avg=1034.9 std=249

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=1047 first=940 curr=895 min=848 max=1987 avg=944.801 std=120

Comparing the steady-state averages (avg=17098.9 µs vs. avg=944.801 µs), mobilenet_v1_0.25_128.tflite is about 18 times faster.
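The ratio comes straight from the avg= fields of the second (steady-state) line of each run. A minimal Python version of that arithmetic, with a hypothetical helper name:

```python
def speedup(slow_avg_us: float, fast_avg_us: float) -> float:
    """Return how many times faster the second model is, from two avg= values."""
    if slow_avg_us <= 0 or fast_avg_us <= 0:
        raise ValueError("average latencies must be positive")
    return slow_avg_us / fast_avg_us


# Steady-state avg= values from the two runs above, in microseconds.
print(f"{speedup(17098.9, 944.801):.1f}x")  # → 18.1x
```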

5. Benchmarking Individual Operators

To benchmark individual operators, add --enable_op_profiling=true to the arguments.

$ bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
    --graph=mobilenet_v1_0.25_128.tflite \
    --num_threads=4 \
    --enable_op_profiling=true

Initialized session in 0.561ms
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=488 first=4747 curr=1056 min=860 max=4747 avg=1014.85 std=245

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=946 first=1024 curr=1152 min=876 max=1760 avg=997.406 std=137

Average inference timings in us: Warmup: 1014.85, Init: 561, no stats: 997.406
============================== Run Order ==============================
	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	                 CONV_2D	            0.000	    0.123	    0.132	 13.289%	 13.289%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_0/Relu6]
	       DEPTHWISE_CONV_2D	            0.132	    0.034	    0.040	  4.009%	 17.298%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6]
	                 CONV_2D	            0.172	    0.073	    0.074	  7.473%	 24.771%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.247	    0.025	    0.026	  2.604%	 27.375%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_2_depthwise/Relu6]
	                 CONV_2D	            0.273	    0.052	    0.052	  5.215%	 32.590%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_2_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.325	    0.045	    0.043	  4.347%	 36.937%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6]
	                 CONV_2D	            0.368	    0.085	    0.055	  5.566%	 42.504%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.424	    0.013	    0.015	  1.477%	 43.981%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_4_depthwise/Relu6]
	                 CONV_2D	            0.438	    0.043	    0.044	  4.445%	 48.426%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_4_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.483	    0.018	    0.019	  1.884%	 50.310%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_5_depthwise/Relu6]
	                 CONV_2D	            0.501	    0.041	    0.048	  4.804%	 55.113%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.549	    0.007	    0.007	  0.748%	 55.862%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_6_depthwise/Relu6]
	                 CONV_2D	            0.557	    0.039	    0.042	  4.200%	 60.062%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_6_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.599	    0.008	    0.009	  0.887%	 60.949%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_7_depthwise/Relu6]
	                 CONV_2D	            0.608	    0.037	    0.044	  4.460%	 65.409%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.652	    0.009	    0.010	  1.040%	 66.450%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_8_depthwise/Relu6]
	                 CONV_2D	            0.662	    0.060	    0.043	  4.307%	 70.757%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_8_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.705	    0.009	    0.011	  1.061%	 71.818%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_9_depthwise/Relu6]
	                 CONV_2D	            0.716	    0.053	    0.043	  4.284%	 76.102%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_9_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.759	    0.009	    0.010	  1.027%	 77.128%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_10_depthwise/Relu6]
	                 CONV_2D	            0.769	    0.058	    0.043	  4.294%	 81.423%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.812	    0.026	    0.010	  1.031%	 82.454%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_11_depthwise/Relu6]
	                 CONV_2D	            0.822	    0.038	    0.043	  4.295%	 86.749%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.865	    0.003	    0.005	  0.453%	 87.202%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_12_depthwise/Relu6]
	                 CONV_2D	            0.869	    0.041	    0.045	  4.485%	 91.687%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_12_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.914	    0.004	    0.004	  0.445%	 92.132%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_13_depthwise/Relu6]
	                 CONV_2D	            0.919	    0.038	    0.044	  4.460%	 96.592%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6]
	         AVERAGE_POOL_2D	            0.963	    0.001	    0.002	  0.176%	 96.768%	     0.000	        1	[MobilenetV1/Logits/AvgPool_1a/AvgPool]
	                 CONV_2D	            0.965	    0.027	    0.029	  2.917%	 99.684%	     0.000	        1	[MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd]
	                 RESHAPE	            0.994	    0.000	    0.000	  0.029%	 99.713%	     0.000	        1	[MobilenetV1/Logits/SpatialSqueeze]
	                 SOFTMAX	            0.994	    0.003	    0.003	  0.287%	100.000%	     0.000	        1	[MobilenetV1/Predictions/Reshape_1]

============================== Top by Computation Time ==============================
	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	                 CONV_2D	            0.000	    0.123	    0.132	 13.289%	 13.289%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_0/Relu6]
	                 CONV_2D	            0.172	    0.073	    0.074	  7.473%	 20.762%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6]
	                 CONV_2D	            0.368	    0.085	    0.055	  5.566%	 26.329%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6]
	                 CONV_2D	            0.273	    0.052	    0.052	  5.215%	 31.544%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_2_pointwise/Relu6]
	                 CONV_2D	            0.501	    0.041	    0.048	  4.804%	 36.347%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6]
	                 CONV_2D	            0.869	    0.041	    0.045	  4.485%	 40.832%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_12_pointwise/Relu6]
	                 CONV_2D	            0.608	    0.037	    0.044	  4.460%	 45.293%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6]
	                 CONV_2D	            0.919	    0.038	    0.044	  4.460%	 49.753%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6]
	                 CONV_2D	            0.438	    0.043	    0.044	  4.445%	 54.197%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_4_pointwise/Relu6]
	       DEPTHWISE_CONV_2D	            0.325	    0.045	    0.043	  4.347%	 58.545%	     0.000	        1	[MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6]

Number of nodes executed: 31
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	                 CONV_2D	       15	     0.773	    79.039%	    79.039%	     0.000	       15
	       DEPTHWISE_CONV_2D	       13	     0.202	    20.654%	    99.693%	     0.000	       13
	                 SOFTMAX	        1	     0.002	     0.204%	    99.898%	     0.000	        1
	         AVERAGE_POOL_2D	        1	     0.001	     0.102%	   100.000%	     0.000	        1
	                 RESHAPE	        1	     0.000	     0.000%	   100.000%	     0.000	        1

Timings (microseconds): count=946 first=1022 curr=1152 min=873 max=1757 avg=995.674 std=137
Memory (bytes): count=0
31 nodes observed
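The Summary by node type table groups the per-node timings from the Run Order table. The Python sketch below reproduces that idea from the Run Order rows to show how [avg %] and [cdf %] relate: node types are sorted by total time, and cdf is the running sum of the percentages. It is not the tool's own aggregation, so totals can differ slightly (the CONV_2D rows above add up to 0.781 ms, while the summary reports 0.773 ms). The parser assumes each data row has exactly nine whitespace-separated fields, as in the output above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float, str]  # (node type, avg ms, node name)


def parse_run_order(text: str) -> List[Row]:
    """Extract data rows from a 'Run Order' table printed by benchmark_model."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: type, start, first, avg ms, %, cdf%, mem KB, times called, [name].
        # Header rows start with "[" and split into more fields, so they are skipped.
        if len(fields) == 9 and not fields[0].startswith("[") and fields[8].startswith("["):
            rows.append((fields[0], float(fields[3]), fields[8].strip("[]")))
    return rows


def summarize_by_type(rows: List[Row]) -> List[Tuple[str, int, float, float, float]]:
    """Return (type, count, total ms, % of total, cumulative %) sorted by total ms."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for node_type, avg_ms, _name in rows:
        totals[node_type] += avg_ms
        counts[node_type] += 1
    grand = sum(totals.values())
    if grand == 0:
        return []
    summary = []
    cdf = 0.0
    for node_type, ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * ms / grand
        cdf += pct
        summary.append((node_type, counts[node_type], ms, pct, cdf))
    return summary


# First three rows of the Run Order table above.
sample = """
CONV_2D 0.000 0.123 0.132 13.289% 13.289% 0.000 1 [MobilenetV1/MobilenetV1/Conv2d_0/Relu6]
DEPTHWISE_CONV_2D 0.132 0.034 0.040 4.009% 17.298% 0.000 1 [MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6]
CONV_2D 0.172 0.073 0.074 7.473% 24.771% 0.000 1 [MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6]
"""
for node_type, n, ms, pct, cdf in summarize_by_type(parse_run_order(sample)):
    print(f"{node_type:>20} {n:3d} {ms:7.3f} ms {pct:8.3f}% {cdf:8.3f}%")
```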

6. Parameters

The required parameter is as follows.

◎graph: string
Path to the TFLite model file.

The optional parameters are as follows.

◎num_threads: int (default=1)
Number of threads.

◎warmup_runs: int (default=1)
Number of warm-up runs to perform before the benchmark starts.

◎num_runs: int (default=50)
Number of benchmark runs.

◎run_delay: float (default=-1.0)
Delay in seconds between consecutive benchmark runs.
A non-positive value means no delay.

◎use_nnapi: bool (default=false)
Whether to use Android NNAPI.
On some Android P devices, NNAPI cannot be used for some models.

◎nnapi_accelerator_name: str (default="")
Name of the NNAPI accelerator to use (requires Android Q or later).
If left empty, an accelerator is selected automatically.

◎nnapi_execution_preference: string (default="")
NNAPI execution preference; one of the following:
fast_single_answer/sustained_speed/low_power/undefined

◎use_legacy_nnapi: bool (default=false)
Whether to use the legacy Android NNAPI TFLite path.

◎use_gpu: bool (default=false)
Whether to use the GPU accelerator delegate.
Available only on Android devices.

◎enable_op_profiling: bool (default=false)
Whether to enable per-operator profiling.
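When scripting many benchmarks, it can help to build the command line programmatically. This Python sketch uses only flags listed above; the helper name build_command and the default binary path (taken from the bazel build in section 2) are assumptions for illustration.

```python
import shlex
from typing import List

# Binary produced by the bazel build in section 2 (relative to the tensorflow checkout).
BINARY = "bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model"

# Values listed for --nnapi_execution_preference ("" keeps the tool's default).
NNAPI_PREFERENCES = {"", "fast_single_answer", "sustained_speed", "low_power", "undefined"}


def build_command(graph: str,
                  num_threads: int = 1,
                  warmup_runs: int = 1,
                  num_runs: int = 50,
                  enable_op_profiling: bool = False,
                  nnapi_execution_preference: str = "",
                  binary: str = BINARY) -> List[str]:
    """Hypothetical helper: build an argv list for benchmark_model."""
    if not graph:
        raise ValueError("--graph is required")
    if nnapi_execution_preference not in NNAPI_PREFERENCES:
        raise ValueError(f"unknown NNAPI execution preference: {nnapi_execution_preference!r}")
    argv = [
        binary,
        f"--graph={graph}",
        f"--num_threads={num_threads}",
        f"--warmup_runs={warmup_runs}",
        f"--num_runs={num_runs}",
    ]
    if enable_op_profiling:
        argv.append("--enable_op_profiling=true")
    if nnapi_execution_preference:
        argv.append(f"--nnapi_execution_preference={nnapi_execution_preference}")
    return argv


cmd = build_command("mobilenet_v1_0.25_128.tflite", num_threads=4, enable_op_profiling=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with model paths.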
