A Roundup of Pretrained TensorFlow.js Models

npaka

June 9, 2020 06:59

This post summarizes the pretrained models available for TensorFlow.js.

1. MobileNet

Classifies images using ImageNet labels.

2. PoseNet

Estimates human poses.

◎ Keypoints

0: nose
1: leftEye
2: rightEye
3: leftEar
4: rightEar
5: leftShoulder
6: rightShoulder
7: leftElbow
8: rightElbow
9: leftWrist
10: rightWrist
11: leftHip
12: rightHip
13: leftKnee
14: rightKnee
15: leftAnkle
16: rightAnkle

◎ Single-pose estimation

{
  "score": 0.32371445304906,
  "keypoints": [
    {
      "position": {
        "y": 76.291801452637,
        "x": 253.36747741699
      },
      "part": "nose",
      "score": 0.99539834260941
    },
    {
      "position": {
        "y": 71.10383605957,
        "x": 253.54365539551
      },
      "part": "leftEye",
      "score": 0.98781454563141
    },
    ...
  ]
}
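
Not every keypoint in a result like the one above is reliable, so a common first step is to drop the low-scoring ones. The following is a minimal sketch in plain JavaScript; the `confidentKeypoints` helper, the 0.5 threshold, and the extra `leftAnkle` entry are assumptions made for illustration and are not part of PoseNet.

```javascript
// Sample shaped like the single-pose output above (values shortened).
const pose = {
  score: 0.32,
  keypoints: [
    { position: { x: 253.37, y: 76.29 }, part: "nose", score: 0.995 },
    { position: { x: 253.54, y: 71.1 }, part: "leftEye", score: 0.988 },
    // Made-up low-confidence keypoint, added only to exercise the filter.
    { position: { x: 10.0, y: 20.0 }, part: "leftAnkle", score: 0.12 },
  ],
};

// Build a { part: { x, y } } lookup that keeps only keypoints whose
// score is at least minScore (0.5 is an arbitrary example threshold).
function confidentKeypoints(pose, minScore = 0.5) {
  const result = {};
  for (const kp of pose.keypoints) {
    if (kp.score >= minScore) {
      result[kp.part] = kp.position;
    }
  }
  return result;
}

// Only "nose" and "leftEye" pass the threshold here.
console.log(Object.keys(confidentKeypoints(pose)));
```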

◎ Multi-pose estimation

[
  // Pose 1
  {
    "score": 0.42985695206067,
    "keypoints": [
      {
        "position": {
          "x": 126.09371757507,
          "y": 97.861720561981
        },
        "part": "nose",
        "score": 0.99710708856583
      },
      {
        "position": {
          "x": 132.53466176987,
          "y": 86.429876804352
        },
        "part": "leftEye",
        "score": 0.99919074773788
      },
      ...
    ]
  },
  // Pose 2
  {
    "score": 0.42985695206067,
    "keypoints": [
      {
        "position": {
          "x": 126.09371757507,
          "y": 97.861720561981
        },
        "part": "nose",
        "score": 0.99710708856583
      },
      {
        "position": {
          "x": 132.53466176987,
          "y": 86.429876804352
        },
        "part": "leftEye",
        "score": 0.99919074773788
      },
      ...
    ]
  }
]
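
When several poses come back, an application often wants only the most confident one. Here is a small sketch of that selection; `bestPose` is a hypothetical helper and the scores in the sample are invented.

```javascript
// Sample shaped like the multi-pose output above; scores are made up
// and the keypoint arrays are left empty for brevity.
const poses = [
  { score: 0.43, keypoints: [] },
  { score: 0.71, keypoints: [] },
  { score: 0.18, keypoints: [] },
];

// Return the pose with the highest overall score, or null when no pose
// reaches minScore (or when the array is empty).
function bestPose(poses, minScore = 0) {
  const candidates = poses.filter((p) => p.score >= minScore);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, p) => (p.score > best.score ? p : best));
}

console.log(bestPose(poses).score);
```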

3. Coco SSD

Detects objects in an image.

[{
  bbox: [x, y, width, height],
  class: "person",
  score: 0.8380282521247864
}, {
  bbox: [x, y, width, height],
  class: "kite",
  score: 0.74644153267145157
}]
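
Because `bbox` is given as `[x, y, width, height]`, drawing or intersecting boxes usually needs the corner coordinates instead. The sketch below filters detections by score and converts a box to corners; the helper names and the concrete numbers in `bbox` are assumptions for illustration.

```javascript
// Sample shaped like the output above, with made-up pixel values for bbox.
const detections = [
  { bbox: [10, 20, 100, 200], class: "person", score: 0.838 },
  { bbox: [150, 30, 40, 40], class: "kite", score: 0.746 },
];

// Convert [x, y, width, height] into left/top/right/bottom edges.
function toCorners(detection) {
  const [x, y, width, height] = detection.bbox;
  return { left: x, top: y, right: x + width, bottom: y + height };
}

// Keep detections at or above minScore, optionally restricted to one class.
function filterDetections(detections, minScore, className) {
  return detections.filter(
    (d) => d.score >= minScore && (className === undefined || d.class === className)
  );
}

const people = filterDetections(detections, 0.8, "person");
console.log(people.map(toCorners));
```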

4. BodyPix

Performs segmentation of people and their body parts.

◎ Person segmentation

{
  width: 640,
  height: 480,
  data: Uint8Array(307200) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, …],
  allPoses: [{"score": 0.4, "keypoints": […]}, …]
}
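
The `data` array holds one value per pixel (`width * height` entries), with 1 marking a person pixel and 0 marking background. As a rough sketch under that reading, the fraction of the frame covered by people can be computed as follows; `personCoverage` and the tiny 4x2 sample are hypothetical.

```javascript
// Tiny 4x2 sample shaped like the person segmentation output above.
const segmentation = {
  width: 4,
  height: 2,
  data: Uint8Array.from([0, 1, 1, 0, 0, 1, 0, 0]),
};

// Fraction of pixels labelled as person (value 1).
function personCoverage(seg) {
  let personPixels = 0;
  for (const value of seg.data) {
    if (value === 1) personPixels++;
  }
  return personPixels / (seg.width * seg.height);
}

console.log(personCoverage(segmentation)); // 3 of 8 pixels -> 0.375
```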

◎ Body part segmentation

{
  width: 640,
  height: 480,
  data: Int32Array(307200) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 1, 0, 0, …],
  allPoses: [{"score": 0.4, "keypoints": […]}, …]
}

◎ Body parts

0: left_face
1: right_face
2: left_upper_arm_front
3: left_upper_arm_back
4: right_upper_arm_front
5: right_upper_arm_back
6: left_lower_arm_front
7: left_lower_arm_back
8: right_lower_arm_front
9: right_lower_arm_back
10: left_hand
11: right_hand
12: torso_front
13: torso_back
14: left_upper_leg_front
15: left_upper_leg_back
16: right_upper_leg_front
17: right_upper_leg_back
18: left_lower_leg_front
19: left_lower_leg_back
20: right_lower_leg_front
21: right_lower_leg_back
22: left_foot
23: right_foot
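
The part IDs listed above can be used to summarize a part segmentation. The sketch below counts pixels per part name; it assumes that any value outside 0 to 23 (for example a negative value) means "no body part", and `countParts` is a hypothetical helper.

```javascript
// Part names indexed by the IDs in the list above.
const PART_NAMES = [
  "left_face", "right_face",
  "left_upper_arm_front", "left_upper_arm_back",
  "right_upper_arm_front", "right_upper_arm_back",
  "left_lower_arm_front", "left_lower_arm_back",
  "right_lower_arm_front", "right_lower_arm_back",
  "left_hand", "right_hand",
  "torso_front", "torso_back",
  "left_upper_leg_front", "left_upper_leg_back",
  "right_upper_leg_front", "right_upper_leg_back",
  "left_lower_leg_front", "left_lower_leg_back",
  "right_lower_leg_front", "right_lower_leg_back",
  "left_foot", "right_foot",
];

// Count how many pixels belong to each named part.
function countParts(data) {
  const counts = {};
  for (const id of data) {
    // Assumption: IDs outside the 0..23 range are treated as background.
    if (id < 0 || id >= PART_NAMES.length) continue;
    const name = PART_NAMES[id];
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

// Made-up sample data: two face pixels, one arm pixel, three torso pixels.
const partData = Int32Array.from([-1, 0, 0, 2, 12, 12, 12, -1]);
console.log(countParts(partData));
```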

5. DeepLab v3

Performs semantic segmentation.

6. Blazeface

A lightweight model for face detection.

[
  {
  topLeft: [232.28, 145.26],
  bottomRight: [449.75, 308.36],
  probability: [0.998],
  landmarks: [
     [295.13, 177.64], // right eye
     [382.32, 175.56], // left eye
     [341.18, 205.03], // nose
     [345.12, 250.61], // mouth
     [252.76, 211.37], // right ear
     [431.20, 204.93] // left ear
  ]
  }
]
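
Since the face box is described by its `topLeft` and `bottomRight` corners, its size and center follow from simple arithmetic. A minimal sketch, using the values from the output above and a hypothetical `faceBox` helper:

```javascript
// One face entry shaped like the output above (landmarks omitted).
const face = {
  topLeft: [232.28, 145.26],
  bottomRight: [449.75, 308.36],
  probability: [0.998],
};

// Derive width, height and center point from the two corners.
function faceBox(face) {
  const [x1, y1] = face.topLeft;
  const [x2, y2] = face.bottomRight;
  return {
    width: x2 - x1,
    height: y2 - y1,
    center: [(x1 + x2) / 2, (y1 + y2) / 2],
  };
}

console.log(faceBox(face));
```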

7. MediaPipe Facemesh

Detects the 3D coordinates of 468 facial landmarks.

[
  {
  faceInViewConfidence: 1, // Probability that a face is present
  boundingBox: { // Bounding box surrounding the face
    topLeft: [232.28, 145.26],
    bottomRight: [449.75, 308.36],
  },
  mesh: [ // 3D coordinates of each facial landmark
    [92.07, 119.49, -17.54],
    [91.97, 102.52, -30.54],
    ...
  ],
  scaledMesh: [ // Normalized 3D coordinates of each facial landmark
    [322.32, 297.58, -17.54],
    [322.18, 263.95, -30.54],
    ...
  ],
  annotations: { // Semantic groupings of the scaledMesh coordinates
    silhouette: [
    [326.19, 124.72, -3.82],
    [351.06, 126.30, -3.00],
    ...
    ],
    ...
  }
  }
]
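
Each group in `annotations` is just a list of `[x, y, z]` points, so it can be reduced to a single representative point. The sketch below averages a group into its centroid; `centroid` is a hypothetical helper, and the sample uses only the two `silhouette` points shown above.

```javascript
// The two silhouette points visible in the output above.
const silhouette = [
  [326.19, 124.72, -3.82],
  [351.06, 126.3, -3.0],
];

// Average a list of [x, y, z] points; returns null for an empty list.
function centroid(points) {
  if (points.length === 0) return null;
  const sum = [0, 0, 0];
  for (const [x, y, z] of points) {
    sum[0] += x;
    sum[1] += y;
    sum[2] += z;
  }
  return sum.map((v) => v / points.length);
}

console.log(centroid(silhouette));
```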

8. MediaPipe Handpose

Detects the 3D coordinates of 21 hand keypoints. At present only a single hand can be detected, but multi-hand detection is planned for a future release.

[
  {
  handInViewConfidence: 1, // Probability that a hand is present
  boundingBox: { // Bounding box surrounding the hand
    topLeft: [162.91, -17.42],
    bottomRight: [548.56, 368.23],
  },
  landmarks: [ // 3D coordinates of each hand landmark
    [472.52, 298.59, 0.00],
    [412.80, 315.64, -6.18],
    ...
  ],
  annotations: { // Semantic groupings of the landmark coordinates
    thumb: [
    [412.80, 315.64, -6.18],
    [350.02, 298.38, -7.14],
    ...
    ],
    ...
  }
  }
]
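
One simple use of the 3D landmarks is measuring how far apart two fingertips are, for example to detect a pinch. The sketch below assumes that `annotations` also contains an `indexFinger` group alongside `thumb`, and that the last point of each group is the fingertip; both are assumptions, as are the helper name and the sample coordinates.

```javascript
// Made-up hand result; only the fields used below are filled in.
const hand = {
  annotations: {
    thumb: [[0, 0, 0], [1, 1, 0], [2, 2, 0], [3, 3, 0]],
    // Assumption: an indexFinger group exists with the same layout.
    indexFinger: [[5, 0, 0], [5, 3, 0], [5, 5, 0], [6, 7, 0]],
  },
};

// Euclidean distance between the last points of the two groups,
// which this sketch treats as the fingertips.
function pinchDistance(hand) {
  const thumbTip = hand.annotations.thumb.at(-1);
  const indexTip = hand.annotations.indexFinger.at(-1);
  return Math.hypot(
    thumbTip[0] - indexTip[0],
    thumbTip[1] - indexTip[1],
    thumbTip[2] - indexTip[2]
  );
}

console.log(pinchDistance(hand)); // distance between [3,3,0] and [6,7,0] -> 5
```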

9. Speech Commands

Recognizes spoken commands from audio. It also uses the Web Audio API.

◎ Default speech commands

zero through nine
up
down
left
right
go
stop
yes
no
(unknown word)
(background noise)
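
Recognition of this kind typically produces one score per label, and the recognized command is the label with the highest score. The following sketch assumes the scores arrive as an array in the same order as the label list; `topCommand`, the shortened label list, and the scores are all made up for illustration.

```javascript
// A shortened, hypothetical label list and matching scores.
const labels = ["up", "down", "left", "right"];
const scores = Float32Array.from([0.05, 0.1, 0.8, 0.05]);

// Pick the label whose score is largest (simple argmax).
function topCommand(labels, scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { label: labels[best], score: scores[best] };
}

console.log(topCommand(labels, scores).label); // "left"
```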

10. Universal Sentence Encoder

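A model that encodes text into 512-dimensional embeddings. The embeddings serve as input to natural language processing tasks such as sentiment classification and text similarity analysis.

The original post illustrates this with an example that embeds six sentences and renders their self-similarity scores as a matrix.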
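
A self-similarity matrix of that kind can be sketched in plain JavaScript once the embeddings are available as arrays of numbers. The example uses cosine similarity as the score, which is an assumption on my part, and three made-up 3-dimensional vectors in place of real 512-dimensional embeddings.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pairwise similarity of every embedding with every other embedding.
function similarityMatrix(embeddings) {
  return embeddings.map((a) => embeddings.map((b) => cosineSimilarity(a, b)));
}

// Stand-in vectors; real sentence embeddings would have 512 entries each.
const embeddings = [
  [1, 0, 0],
  [0, 1, 0],
  [1, 1, 0],
];

console.log(similarityMatrix(embeddings));
```

The diagonal of the matrix compares each vector with itself, so similar sentences show up as high off-diagonal values.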

11. Text Toxicity

Detects toxic content in text.

{
 "label": "identity_attack",
 "results": [{
   "probabilities": [0.9659664034843445, 0.03403361141681671],
   "match": false
 }]
},
{
 "label": "insult",
 "results": [{
   "probabilities": [0.08124706149101257, 0.9187529683113098],
   "match": true
 }]
},
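
In the output above, each label carries a `match` flag and a pair of probabilities, where the second value is high when the label applies (compare `insult`). Under that reading, the labels that fired can be collected as follows; `flaggedLabels` is a hypothetical helper and the array wrapper around the two entries is assumed.

```javascript
// The two entries from the output above, wrapped in an array.
const predictions = [
  {
    label: "identity_attack",
    results: [{ probabilities: [0.9659664034843445, 0.03403361141681671], match: false }],
  },
  {
    label: "insult",
    results: [{ probabilities: [0.08124706149101257, 0.9187529683113098], match: true }],
  },
];

// Keep only labels whose first result is flagged as a match, and report
// the second probability (read here as the "label applies" probability).
function flaggedLabels(predictions) {
  return predictions
    .filter((p) => p.results[0].match === true)
    .map((p) => ({ label: p.label, probability: p.results[0].probabilities[1] }));
}

console.log(flaggedLabels(predictions));
```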

12. Mobile BERT

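BERT is a method of pre-training language representations that achieves state-of-the-art results on natural language processing tasks. This model was trained on a reading comprehension dataset made up of Wikipedia articles and a set of question-answer pairs for each article.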

13. KNN Classifier

Provides a utility for building a classifier with the K-Nearest Neighbors algorithm. It can be used for transfer learning.
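
To make the idea concrete, here is a from-scratch sketch of K-Nearest Neighbors in plain JavaScript. It is not the library's implementation: it uses Euclidean distance on small made-up vectors, whereas in a transfer-learning setup the vectors would typically be activations taken from another model. The `knnPredict` name and the sample data are hypothetical.

```javascript
// Labelled example vectors (made up for illustration).
const examples = [
  { label: "cat", vector: [0, 0] },
  { label: "cat", vector: [1, 0] },
  { label: "cat", vector: [0, 1] },
  { label: "dog", vector: [10, 10] },
  { label: "dog", vector: [9, 10] },
];

// Classify a query vector by majority vote among its k nearest examples.
function knnPredict(examples, query, k = 3) {
  const nearest = examples
    .map((ex) => ({
      label: ex.label,
      // Euclidean distance between the example and the query.
      dist: Math.hypot(...ex.vector.map((v, i) => v - query[i])),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  const votes = {};
  for (const n of nearest) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }
  // Return the label with the most votes.
  return Object.entries(votes).sort((a, b) => b[1] - a[1])[0][0];
}

console.log(knnPredict(examples, [1, 1])); // "cat"
console.log(knnPredict(examples, [9, 9])); // "dog"
```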

14. face-api.js

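Performs face detection and face recognition.

◎ ssdMobilenetv1
High-accuracy face detection. Returns bounding boxes and confidence scores.

◎ tinyFaceDetector
Face detection with fast inference. Returns bounding boxes and confidence scores.

◎ faceLandmark68Net
Detects face landmarks. Returns 68 facial points.

◎ faceRecognitionNet
Judges the similarity of two faces.

◎ faceExpressionNet
Classifies facial expressions.

◎ ageGenderNet
Estimates age and gender.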

15. Handtrack.js

Detects bounding boxes of hands.

[{
  bbox: [x, y, width, height],
  class: "hand",
  score: 0.8380282521247864
}, {
  bbox: [x, y, width, height],
  class: "hand",
  score: 0.74644153267145157
}]
