Research on Fingers in Image-Generation AI #6 - Considering an Object Detection + Shape Detection Approach

とーふのかけら

November 15, 2023, 15:57

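Articles in this research series

Research on Fingers in Image-Generation AI #1
"Why is AI bad at fingers?"

Research on Fingers in Image-Generation AI #2
"Can the tokens produced by CLIP be unraveled?"

Research on Fingers in Image-Generation AI #3
"Can feature point extraction plus statistical analysis lead to a way of improving how fingers are drawn?"

Research on Fingers in Image-Generation AI #4
"Considering an approach that uses object detection confidence scores"

Research on Fingers in Image-Generation AI #5
"Putting the object detection confidence score approach into practice"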

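Recap of the previous article

This is the sixth installment of my research on fingers in image-generation AI.

Last time, I used the script I had written for "statistical analysis of object detection confidence scores" to test some actual sample patterns.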

python .\object_detection.py -d "D:\predict_img\" "D:\yolo_outputs\"

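Although the sample size was small, I was able to confirm that the results differ depending on the prompt that is added.

When I ran the verification again, another problem came to light, so this time I consider an approach for improving the method.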

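Problems discovered

Last time, I used an object detection model to collect confidence score data.

However, when I spot-checked (sampling inspection) some of the images with high confidence scores, the following problems stood out.

[Problems]

Images with too many or too few fingers still received high confidence scores
Fingers with distorted shapes do return low confidence scores, but joints bent in unnatural directions do not appear to be reflected in the confidence score

Here are a few examples.

e.g.) A right hand is detected as a left hand with a high confidence score

e.g.) Too many fingers (six), yet a high confidence score

e.g.) The hand shape is distorted, yet a high confidence score

In other words, object detection works by scoring and detecting "could this be X?". It judges color, rough shape, and object boundaries, but it does not go as far as judging whether the shape is normal for a human hand. That is the problem I confirmed.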

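Bringing in a shape detection approach

What is shape detection?

Shape detection judges and detects what kind of shape an object has.
For hands, it detects landmarks (feature points) and uses them to decide whether something is a hand.
The shape detection model used this time is MediaPipe.

What is MediaPipe?

MediaPipe is a framework published by Google for building perception pipelines such as image processing.

It performs object detection and detection tracking, determines what the detected thing is, and displays the result.

Source: MediaPipe: A Framework for Building Perception Pipelines
https://arxiv.org/pdf/1906.08172.pdf

What can it do?

Let's actually try it.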

import mediapipe as mp
import cv2
import matplotlib.pyplot as plt

# Shape detection settings
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
        static_image_mode=True,
        max_num_hands=2,
        min_detection_confidence=0.5)

# Shape detection (OpenCV loads BGR, MediaPipe expects RGB)
images = cv2.imread('./predict_img/sampleimg.png')
if images is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError('./predict_img/sampleimg.png could not be read')
images = cv2.cvtColor(images, cv2.COLOR_BGR2RGB)
results = hands.process(images)
shaping_image = images.copy()

# Draw the landmarks of every detected hand, then save the annotated image
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        mp.solutions.drawing_utils.draw_landmarks(
            shaping_image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp.solutions.drawing_styles.get_default_hand_landmarks_style(),
            mp.solutions.drawing_styles.get_default_hand_connections_style()
        )
    plt.imsave('./yolo_outputs/sampleshape.png', shaping_image)

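MediaPipe represents a hand with 21 landmarks in total: the wrist plus the joints and tips of the fingers.
By detecting these points, it can display the hand's landmarks.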
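To make the count of 21 concrete, here is a small sketch that rebuilds the landmark names and indices. The names follow MediaPipe's HandLandmark naming (WRIST, THUMB_CMC, ..., PINKY_TIP), but the helper itself (hand_landmark_names) is only an illustration and does not import MediaPipe.

```python
# Illustrative only: rebuild the 21 landmark names used for a hand
# (1 wrist + 4 points on the thumb + 4 points on each of the other four fingers).
THUMB_POINTS = ["CMC", "MCP", "IP", "TIP"]
FINGER_POINTS = ["MCP", "PIP", "DIP", "TIP"]
FINGERS = ["INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY"]


def hand_landmark_names():
    """Return the landmark names in index order (0 = WRIST)."""
    names = ["WRIST"]
    names += [f"THUMB_{point}" for point in THUMB_POINTS]
    for finger in FINGERS:
        names += [f"{finger}_{point}" for point in FINGER_POINTS]
    return names


LANDMARK_NAMES = hand_landmark_names()
for index, name in enumerate(LANDMARK_NAMES):
    print(index, name)
```

Note that the fingertips count as landmarks too, so a landmark is not always a joint.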

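Naturally, this inference also outputs a confidence score. In the script below, the value recorded is the score found in results.multi_handedness, that is, the score of the left/right classification for the detected hand.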

sampleshape.png, 0.8703588247299194

Shape detection confidence score
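As a rough sketch of how such a score can be read per hand: in MediaPipe's Hands solution, multi_handedness holds one entry per detected hand, and each entry carries a classification list with a label and a score. The helper name hand_scores and the stand-in result object below are assumptions made for illustration, so that the snippet runs without MediaPipe or an image.

```python
from types import SimpleNamespace


def hand_scores(results):
    """Return one (label, score) pair per detected hand."""
    if not results.multi_hand_landmarks:
        return []
    pairs = []
    for handedness in results.multi_handedness:
        top = handedness.classification[0]
        pairs.append((top.label, top.score))
    return pairs


def _fake_hand(label, score):
    # Stand-in with the same attribute layout as one multi_handedness entry.
    return SimpleNamespace(
        classification=[SimpleNamespace(label=label, score=score)]
    )


# Made-up values, used only to show the shape of the data.
fake_results = SimpleNamespace(
    multi_hand_landmarks=[object(), object()],
    multi_handedness=[_fake_hand("Left", 0.87), _fake_hand("Right", 0.95)],
)
print(hand_scores(fake_results))
```

With a real image, the object returned by hands.process(...) can be passed in place of fake_results.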

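The script used this time

I improved the Python script used last time.
The code should also be a bit easier to read now.

Running the script performs object detection, shape detection, and statistical analysis in one go.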
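As a rough sketch of what the statistical step reads: the script below asks YOLO to save labels with save_txt=True and save_conf=True, and names the six columns Num, X Axis, Y Axis, Width, Height, and Confidence Score. Assuming each label line is whitespace-separated in that order, the confidence values can be pulled out like this. The function names and the sample lines are made up for illustration.

```python
def parse_label_line(line):
    """Split one label line into (class_id, x, y, w, h, confidence)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    x, y, w, h, conf = (float(value) for value in fields[1:])
    return class_id, x, y, w, h, conf


def confidences(text):
    """Return the confidence of every detection in a label file's text."""
    return [parse_label_line(line)[5] for line in text.splitlines() if line.strip()]


# Two made-up detections in one label file.
sample = "0 0.512 0.430 0.210 0.305 0.91\n0 0.250 0.700 0.120 0.180 0.62\n"
print(confidences(sample))
```

One label file can contain several lines, so a single image may contribute more than one confidence score to the statistics.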

For the licenses of this script and of the libraries and modules it uses, please check here.

LICENSE AND NOTICE - konapieces's gist

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ------------------------------------------------------------------------
# Script : obj_analysis.py
# ------------------------------------------------------------------------
# Description : Confidence Score Statistics Analysis Tool
# Date : 2023-11-15
# Author : konapieces [https://twitter.com/konapieces]
# License : AGPL 3.0 License
# ------------------------------------------------------------------------
# * License
# * This software is available under the AGPL 3.0 License and terms described in the LICENSE AND NOTICE.
# * Please refer to the LICENSE AND NOTICE for the license terms of the Library and module you are using.
# ------------------------------------------------------------------------

from ultralytics import YOLO
from argparse import ArgumentParser, RawTextHelpFormatter, RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter
import logging
import sys
import glob
import os
import pandas as pd
import datetime
import datetime as dt
import mediapipe as mp
import cv2
import matplotlib.pyplot as plt

# Parser class
class DescriptionFormatter(RawTextHelpFormatter, RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter):
        pass

# Init function
def initialize():
        parser = ArgumentParser(
                prog='obj_analysis.py',
                usage=('obj_analysis.py <-detect arg> [<-arg> value] <target_directory> <output_directory>'),
                description="""
-------------------------------------------------------------------------

Script : obj_analysis.py

Description : Confidence Score Statistics Analysis Tool.

-------------------------------------------------------------------------

Notice :

If no detection argument is specified, --object is specified.
Specify the path of the image file for which you want to detect objects.

-------------------------------------------------------------------------
                """,
                formatter_class=DescriptionFormatter,
                add_help=True,
                )

        parser.add_argument('target', help='Specifies the file path of the source image')
        parser.add_argument('output', help='Specifies the output directory')
        parser.add_argument('-d', '--hand', action='store_true', help='Perform detect hands')
        parser.add_argument('-o', '--object', action='store_true', help='Perform detect objects')
        parser.add_argument('-i', '--imgsz', type=int, default=512, help='Images size')
        parser.add_argument('-ns', '--noshow', default=True, action='store_false', help='Not showing images')
        parser.add_argument('-c', '--conf', type=float, default=0.5, help='Confidence threshold')

        # Variant
        args = parser.parse_args()
        target_path = args.target
        output_path = args.output
        argimgsz = args.imgsz
        argsshow = args.noshow
        argsconf = args.conf
        target_file_name = os.path.basename(target_path)
        
        return (args,target_path,output_path,target_file_name,argimgsz,argsshow,argsconf)


# Select models
def modelsel():
        # Variant
        init=initialize()

        # Select model
        if init[0].hand is True:
                init[0].object = False
                model = YOLO("./models/hand_yolov8n.pt")
                models = "hand_yolov8n.pt"
        elif init[0].object is True:
                model = YOLO("./models/yolov8n.pt")
                models = "yolov8n.pt"
        elif init[0].hand is False and init[0].object is False:
                model = YOLO("./models/yolov8n.pt")
                models = "yolov8n.pt"
        return (model,models)


# Folder name
def processtime():
        # Variant
        init=initialize()

        # Init format
        f_format = datetime.datetime.now()
        output_dir = "{0:%Y%m%d_%H%M%S}".format(f_format)
        os.mkdir(init[2]+"/"+output_dir)

        output = str(init[2] +  "/" + output_dir)
        return output


# Terminal Print
def termPrint():
        # Variant
        models=modelsel()
        init=initialize()

        # Print
        print("------- Process Parameters -------")
        print("Target Directory = "+init[1])
        print("Model = "+models[1])
        print("Images Size = "+str(init[4])+" px")
        print("Confidence Threshold = "+str(init[6]))
        print("Show Images = "+str(init[5]))
        print("----------------------------------")


# Object Detection
def obj_Detection():
        # Variant
        models=modelsel()
        p_time=processtime()
        init=initialize()
        i=1
        j=1

        # Get Image
        pattern = '%s/*.png'
        comparing_files = glob.glob(pattern % (init[1]))
        if len(comparing_files) == 0:
                logging.error(': no files.')
                sys.exit(1)

        # Array Image
        for comparing_file in comparing_files:
                comparing_file_name = os.path.basename(comparing_file)
                if comparing_file_name == init[3]:
                        continue

                comparing_img_path = os.path.join(
                os.path.abspath(os.path.dirname(__file__)) + '/../',
                comparing_file,
                )

                # Predict
                models[0].predict(
                        comparing_img_path,
                        save=True,
                        imgsz=int(init[4]),
                        conf=float(init[6]),
                        show=bool(init[5]),
                        save_conf=True,
                        save_txt=True,
                        exist_ok=True,
                        project=p_time)

        # Statistics Process
        # Get confidence text
        pattern = '%s/*.txt'
        txtOutput=p_time+"/predict/labels/"
        target_file_name = os.path.basename(txtOutput)
        comparing_files = glob.glob(pattern % (txtOutput))
        if len(comparing_files) == 0:
                logging.error(': no files.')
                sys.exit(1)

        # Text trim
        df = pd.DataFrame(columns=['Data','Confidence Score'])
        for comparing_file in comparing_files:
                comparing_file_name = os.path.basename(comparing_file)
                if comparing_file_name == target_file_name:
                        continue

                comparing_txt_path = os.path.join(
                os.path.abspath(os.path.dirname(__file__)) + '/../',
                comparing_file,
                )

                # Read the confidence column of the label file once (one row per detection)
                ref_ = pd.read_csv(comparing_txt_path,
                                        engine='python',
                                        sep="[,;/ :\t]",
                                        names=['Num','X Axis','Y Axis','Width','Height','Confidence Score'],
                                        usecols=['Confidence Score'])

                for ref in ref_['Confidence Score'].astype(float):
                        df.loc[str(i)]=[comparing_file_name,ref]
                        i=i+1

                        # *If you want to output the inference results to the terminal, please remove the comment out.
                        #logging.info('%s: %f.' % (comparing_file_name,ref))

        # Statistics
        # Compute every statistic from the raw scores first, so that summary rows
        # added earlier (Max, Min, ...) do not leak into the later statistics.
        scores = df['Confidence Score'].astype(float)
        stats = {
                'Max': scores.max(),
                'Min': scores.min(),
                'Average': scores.mean(),
                'Median': scores.median(),
                'Standard Deviation': scores.std(),
                'Variance': scores.var(),
        }
        for name, value in stats.items():
                df.loc[name] = ['', value]
        print(df)

        # Output CSV (os.path.join avoids the invalid '\O' escape and works on any OS)
        now = dt.datetime.now()
        time = now.strftime('%Y%m%d-%H%M%S')
        df.to_csv(os.path.join(p_time, 'Object_Analysis_Test_{}.csv'.format(time)))

        # Shape detection
        if init[0].hand is True:
                shp_Detection(p_time)


# Shape detection
def shp_Detection(p_time):
        # Logging
        logging.info('%s : Processing...' % (__file__))
        logging.info('%s : Shape Analysis Process Start.' % (__file__))

        # Variant
        init=initialize()
        i=0
        j=1

        # Shape detection
        mp_hands = mp.solutions.hands
        hands = mp_hands.Hands(
        static_image_mode=True,
        max_num_hands=2,
        min_detection_confidence=0.5)

        # Get Image
        pattern = '%s/*.png'
        comparing_files = glob.glob(pattern % (init[1]))
        if len(comparing_files) == 0:
                logging.error('no files.')
                sys.exit(1)
        
        # Make dataframe
        df = pd.DataFrame(columns=['Data','Shape Confidence Score'])

        make_dir = "shape"
        output=p_time + "/" + make_dir + "/"

        # Directory exist check
        if os.path.isdir(output):
                logging.info('%s : Output directory already exists.' % (__file__))
        else:
                logging.info('%s : Target directory does not exist.' % (__file__))

        # Make Directory
        if not os.path.exists(output):
                logging.info('%s : Create a new target directory.' % (__file__))
                os.makedirs(output)

        # Array Image
        for comparing_file in comparing_files:
                comparing_file_name = os.path.basename(comparing_file)
                if comparing_file_name == init[3]:
                        continue

                comparing_img_path = os.path.join(
                os.path.abspath(os.path.dirname(__file__)) + '/../',
                comparing_file,
                )

                output_img=output+comparing_file_name
                
                # Shape detection
                images = cv2.imread(comparing_img_path)
                images = cv2.cvtColor(images, cv2.COLOR_BGR2RGB)
                results=hands.process(images)
                shaping_image = images.copy()

                # *If you want to output the inference results to the terminal, please remove the comment out.
                # print(comparing_img_path)
                # print(results.multi_handedness)

                # Get Predict
                if results.multi_hand_landmarks:
                        # Pair each detected hand with its own handedness entry,
                        # instead of always reading the score of the first hand.
                        for hand_landmarks, handedness in zip(results.multi_hand_landmarks, results.multi_handedness):
                                mp.solutions.drawing_utils.draw_landmarks(
                                        shaping_image,
                                        hand_landmarks,
                                        mp_hands.HAND_CONNECTIONS,
                                        mp.solutions.drawing_styles.get_default_hand_landmarks_style(),
                                        mp.solutions.drawing_styles.get_default_hand_connections_style()
                                )
                                scores_dict = handedness.classification[0].score
                                df.loc[str(j)] = [comparing_file_name,scores_dict]
                                i=i+1
                                j=j+1
                        plt.imsave(output_img,shaping_image)

        # Statistics
        # Compute every statistic from the raw scores first, so that summary rows
        # added earlier (Max, Min, ...) do not leak into the later statistics.
        scores = df['Shape Confidence Score'].astype(float)
        stats = {
                'Max': scores.max(),
                'Min': scores.min(),
                'Average': scores.mean(),
                'Median': scores.median(),
                'Standard Deviation': scores.std(),
                'Variance': scores.var(),
        }
        for name, value in stats.items():
                df.loc[name] = ['', value]
        print(df)

        # Output CSV (os.path.join avoids the invalid '\O' escape and works on any OS)
        now = dt.datetime.now()
        time = now.strftime('%Y%m%d-%H%M%S')
        df.to_csv(os.path.join(p_time, 'Object_ShapeDetect_Test_{}.csv'.format(time)))


# MAIN
if __name__ == '__main__':
        # Debuglog
        logging.basicConfig(
                level=logging.DEBUG,
                format='%(asctime)s %(levelname)s: %(message)s',
        )
        logging.info('%s : Object Analysis Process Start.' % (__file__))

        # Terminal Print
        termPrint()

        # Detection Process
        obj_Detection()

        # Debug end
        logging.info('%s : Object Analysis Process End.' % (__file__))
        sys.exit(0)

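The script's specifications are as follows.

I created a parser so that arguments can be specified. If no detection argument is given, the script runs with --object.
・-d , --hand : hand detection (Default : False)
・-o , --object : object detection (Default : False)
・-i , --imgsz : image size (Default : 512 px)
・-ns , --noshow : do not display results (Default : False, i.e. results are displayed)
・-c , --conf : confidence threshold (Default : 0.5)
Every PNG image in target_directory is checked in turn, and the results are written to output_directory.
The hand detection model "hand_yolov8n.pt" is not fetched automatically, but the object detection model "yolov8n.pt" is downloaded automatically if it is not found in the specified directory.
If you use the script, rewrite the model locations (the three YOLO("./models/...") calls in modelsel(); lines 88, 91, and 94 in the author's numbering) to suit your environment.
The values computed with the statistical functions are output in CSV format.
The maximum number of detections for shape detection is "2". To change it, edit max_num_hands in shp_Detection() (line 244 in the author's numbering).
The confidence threshold for shape detection is "0.5". To change it, edit min_detection_confidence in shp_Detection() (line 245 in the author's numbering).
The commented-out parts shown below (line 204 and lines 287-288 in the author's numbering) are debug items.
They make the terminal noisy, but if you want to see them, remove the comment markers.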

# *If you want to output the inference results to the terminal, please remove the comment out.
# logging.info('%s: %f.' % (comparing_file_name,ref))

# *If you want to output the inference results to the terminal, please remove the comment out.
# print(comparing_img_path)
# print(results.multi_handedness)

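You can pick and choose which statistics to compute. It is fine to delete the ones you do not need.
Please install the required libraries and modules yourself.
(* I only made this for personal use, so there is no setup.py or the like!)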

ultralytics
opencv-python
pandas
mediapipe
matplotlib

Required libraries
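The flag behavior in the specification can be tried in isolation. The following is a cut-down sketch of the parser, keeping only the two positional arguments and three of the flags. The summarize helper is hypothetical and exists only to show the result. Note that --noshow is declared with default=True and action='store_false', so the parsed value is True (show images) unless -ns is passed.

```python
from argparse import ArgumentParser


def build_parser():
    """Cut-down version of the script's parser (subset of its arguments)."""
    parser = ArgumentParser(prog="obj_analysis.py")
    parser.add_argument("target")
    parser.add_argument("output")
    parser.add_argument("-d", "--hand", action="store_true")
    parser.add_argument("-o", "--object", action="store_true")
    parser.add_argument("-ns", "--noshow", default=True, action="store_false")
    return parser


def summarize(argv):
    """Return (detector, show_images) for a given argument list."""
    args = build_parser().parse_args(argv)
    # As in the script, hand detection wins; otherwise object detection is used.
    detector = "hand" if args.hand else "object"
    return detector, args.noshow


print(summarize(["imgs", "out"]))
print(summarize(["-d", "-ns", "imgs", "out"]))
```

The first call falls back to object detection with images shown. The second selects hand detection and suppresses the image window.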

Trying it out

(omitted)
00088-2023-11-07_LastpieceReVive_A1615.fp16_DPM++ 2M Karras_1260558576.txt, 0.887286
Max,,0.922119
Min,,0.780997
Average,,0.873373
Median,,0.888451
Standard Deviation,,0.034858
Variance,,0.019703

Object detection confidence

(omitted)
00088-2023-11-07_LastpieceReVive_A1615.fp16_DPM++ 2M Karras_1260558576.png, 0.98122215270996
Max,,0.990292
Min,,0.744703
Average,,0.923606
Median,,0.959574
Standard Deviation,,0.079389
Variance,,0.024931

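Shape detection confidence

Note that in both tables Variance is not the square of Standard Deviation. In the run that produced these figures, each summary row was added to the DataFrame before the next statistic was computed, so the later statistics were calculated over the scores plus the earlier summary rows. This is most visible in Variance, which includes the Standard Deviation row.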
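For reference, here is a minimal sketch of the six summary rows computed from the raw scores only. It uses Python's statistics module instead of pandas. Both use the sample (n - 1) definition for standard deviation and variance by default. The scores are made-up values, not data from this article.

```python
import statistics


def summarize_scores(scores):
    """Compute the six summary statistics from raw confidence scores."""
    return {
        "Max": max(scores),
        "Min": min(scores),
        "Average": statistics.mean(scores),
        "Median": statistics.median(scores),
        "Standard Deviation": statistics.stdev(scores),
        "Variance": statistics.variance(scores),
    }


# Made-up confidence scores, for illustration only.
sample_scores = [0.78, 0.85, 0.88, 0.90, 0.92]
summary = summarize_scores(sample_scores)
for name, value in summary.items():
    # Same "name,,value" layout as the CSV rows shown above.
    print(f"{name},,{value:.6f}")
```

When the statistics are computed this way, Variance equals the square of Standard Deviation, which is a quick sanity check on any summary table.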

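Closing

"Research on Fingers in Image-Generation AI #6 - Considering an Object Detection + Shape Detection Approach"
How did you like it?

Things are getting seriously confusing, aren't they!
Where is this research heading, I wonder...
I will do my best to make it meaningful research...!

I would be happy if you keep reading along with a "hmm, I see, I don't get it" attitude!

I also publish articles from odd angles like this one for membership subscribers, so I hope you look forward to the series.

See you next time!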
