Introduction to iOS App Development (2) - SFSpeechRecognizer

npaka

June 23, 2021, 20:11

This article summarizes how to implement speech recognition in an iOS app using SFSpeechRecognizer.

・iOS 14

1. SFSpeechRecognizer

To implement speech recognition in an iOS app, use SFSpeechRecognizer.

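2. Differences between the server-based and on-device versions

The server-based and on-device versions differ in several respects. In short, choose the server-based version for accuracy, and the on-device version to avoid usage limits or to protect privacy.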

3. UI

This time, place one UILabel and one UIButton.

4. Info.plist

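Set the following items in Info.plist.

・Privacy - Speech Recognition Usage Description (NSSpeechRecognitionUsageDescription) : Explanation of why the app uses speech recognition.
・Privacy - Microphone Usage Description (NSMicrophoneUsageDescription) : Explanation of why the app uses the microphone.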
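
If either key is missing, the app is terminated as soon as it requests the corresponding permission, so it can help to verify the keys early in debug builds. The following is a minimal sketch; `requiredUsageKeys`, `missingUsageKeys`, and `checkUsageDescriptions` are hypothetical names introduced here, not part of any Apple framework. The two raw key names are the ones behind the Xcode labels above.

```swift
import Foundation

// Raw Info.plist keys needed by the sample in this article.
let requiredUsageKeys = [
    "NSSpeechRecognitionUsageDescription",
    "NSMicrophoneUsageDescription",
]

/// Returns the required keys that are missing or blank in the given Info.plist dictionary.
/// (Hypothetical helper for illustration.)
func missingUsageKeys(in info: [String: Any]) -> [String] {
    return requiredUsageKeys.filter { key in
        guard let text = info[key] as? String else { return true }
        return text.trimmingCharacters(in: .whitespaces).isEmpty
    }
}

/// Prints a warning when the running app's Info.plist lacks a required key.
func checkUsageDescriptions() {
    let missing = missingUsageKeys(in: Bundle.main.infoDictionary ?? [:])
    if !missing.isEmpty {
        print("Missing Info.plist keys: \(missing)")
    }
}
```

`checkUsageDescriptions()` could be called once at launch, for example from `viewDidLoad`.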

5. Code

The code is as follows.

import UIKit
import Speech
import AVFoundation

// ViewController
class ViewController: UIViewController {
    // UI
    @IBOutlet weak var label: UILabel!
    @IBOutlet weak var button: UIButton!

    // Speech recognition
    let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja_JP"))!
    let audioEngine = AVAudioEngine()
    let recognitionReq = SFSpeechAudioBufferRecognitionRequest()
    var recognitionTask: SFSpeechRecognitionTask!
    var recording: Bool = false

    // Called when the view is loaded
    override func viewDidLoad() {
        super.viewDidLoad()

        // UI
        self.recording = false
        self.label.text = ""
        self.button.isEnabled = false
        self.button.setTitle("Start speech recognition", for: .normal)
    }
   
    // Called when the view appears
    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)

        // Request authorization to use speech recognition
        SFSpeechRecognizer.requestAuthorization { authStatus in
            DispatchQueue.main.async {
                // Only when authorized and the recognizer is available
                if authStatus == .authorized && self.recognizer.isAvailable {
                    // Prepare the audio session
                    self.setupAudioSession()

                    // UI
                    self.button.isEnabled = true
                }
            }
        }
    }
   
    // Prepare the audio session
    func setupAudioSession() {
        do {
            // Configure the speech recognition request
            if self.recognizer.supportsOnDeviceRecognition {
                self.recognitionReq.requiresOnDeviceRecognition = true // On-device speech recognition
            }
            self.recognitionReq.shouldReportPartialResults = true // Report partial results

            // Prepare the audio session
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print(error.localizedDescription)
        }
    }
  
    // Called when the button is tapped
    @IBAction func onButtonClick(_ sender: UIButton) {
        // Start speech recognition
        if !self.recording {
            // UI
            self.recording = true
            self.button.setTitle("Stop speech recognition", for: .normal)
            self.label.text = ""

            // Start speech recognition
            self.startSpeechRecognition()
        }
        // Stop speech recognition
        else {
            // UI
            self.recording = false
            self.button.setTitle("Start speech recognition", for: .normal)

            // Stop speech recognition
            self.stopSpeechRecognition()
        }
    }
  
    // Start speech recognition
    func startSpeechRecognition() {
        do {
            // Cancel any speech recognition task that is still running
            if self.recognitionTask != nil {
                self.recognitionTask.cancel()
                self.recognitionTask = nil
            }

            // Set up the input node and feed its audio buffers to the request
            let inputNode = self.audioEngine.inputNode
            let recordingFormat = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 2048, format: recordingFormat) { (buffer, _) in
                self.recognitionReq.append(buffer)
            }

            // Start speech recognition
            self.audioEngine.prepare()
            try self.audioEngine.start()
            self.recognitionTask = self.recognizer.recognitionTask(
                with: self.recognitionReq, resultHandler: { (result, error) in
                if let error = error {
                    print("\(error)")
                } else if let result = result {
                    // Show the latest transcription without force-unwrapping the result
                    DispatchQueue.main.async {
                        self.label.text = result.bestTranscription.formattedString
                    }
                }
            })
        } catch {
            print(error.localizedDescription)
        }
    }
 
    // Stop speech recognition
    func stopSpeechRecognition() {
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
        self.recognitionReq.endAudio()
    }
}

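6. Checking whether speech recognition is available

Speech recognition generally relies on the network, so it is not always available.

◎ Checking whether speech recognition is available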

var isAvailable: Bool

◎ Receiving changes in speech recognition availability

weak var delegate: SFSpeechRecognizerDelegate?

Implement SFSpeechRecognizerDelegate on the receiving side.

func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
    // Handle the change in speech recognition availability
}
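
One way to wire this up is sketched below. `AvailabilityObserver` and `lastKnownAvailability` are hypothetical names introduced here, not part of the Speech framework; the class simply forwards availability changes to a closure. Because the recognizer holds its delegate weakly, the observer has to be kept alive elsewhere, for example in a stored property.

```swift
import Foundation
#if canImport(Speech)
import Speech

/// Hypothetical helper that forwards availability changes to a closure.
final class AvailabilityObserver: NSObject, SFSpeechRecognizerDelegate {
    /// Last value reported by the recognizer (nil until the first change).
    private(set) var lastKnownAvailability: Bool?
    private let onChange: (Bool) -> Void

    init(onChange: @escaping (Bool) -> Void) {
        self.onChange = onChange
        super.init()
    }

    // SFSpeechRecognizerDelegate
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        lastKnownAvailability = available
        // Hop to the main queue, since the closure typically updates the UI.
        DispatchQueue.main.async { self.onChange(available) }
    }
}
#endif
```

In the ViewController above, an instance could be stored as a property and assigned with `recognizer.delegate = observer` in `viewDidLoad`, with the closure enabling or disabling the button.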

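7. Configuring the speech recognition request

These options are set on SFSpeechAudioBufferRecognitionRequest.

◎ Partial results
Specifies whether partial results are returned while recognition is in progress.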

self.recognitionReq.shouldReportPartialResults = true

◎ On-device speech recognition
Performs speech recognition on the device only, without using the network.

if self.recognizer.supportsOnDeviceRecognition {
    self.recognitionReq.requiresOnDeviceRecognition = true
}

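◎ Adding vocabulary
Adding vocabulary to the speech recognition request makes it easier to recognize words that are not in the system vocabulary (in practice, it did not seem to make much difference).

self.recognitionReq.contextualStrings = ["ヒカキン"] // Add vocabulary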

8. Response information

The SFSpeechRecognitionResult passed to the callback of self.recognizer.recognitionTask() contains the following information.

◎ Transcription with the highest confidence

var bestTranscription: SFTranscription

SFTranscription holds the following information.

var formattedString: String - The transcribed text
var segments: [SFTranscriptionSegment] - Segment information (confidence, etc.)
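
For example, the segments can be used to flag words the recognizer was unsure about. The following is a minimal sketch; `WordConfidence`, `lowConfidenceWords`, and `wordConfidences` are hypothetical names introduced here, and the 0.5 threshold is an arbitrary assumption. Only `segments`, `substring`, and `confidence` come from the Speech framework.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical value type holding the parts of a segment this sketch needs.
struct WordConfidence: Equatable {
    let text: String
    let confidence: Float   // 0.0 (no confidence) to 1.0 (high confidence)
}

/// Returns the words whose confidence is below `threshold` (0.5 is an assumed default).
func lowConfidenceWords(_ words: [WordConfidence], below threshold: Float = 0.5) -> [String] {
    return words.filter { $0.confidence < threshold }.map { $0.text }
}

#if canImport(Speech)
/// Converts the segments of a transcription into the value type above.
func wordConfidences(from transcription: SFTranscription) -> [WordConfidence] {
    return transcription.segments.map {
        WordConfidence(text: $0.substring, confidence: $0.confidence)
    }
}
#endif
```

Inside the result handler, this could be called as `lowConfidenceWords(wordConfidences(from: result.bestTranscription))`. Confidence values may still be 0 on partial results, so the check is probably most useful once `isFinal` is true.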

◎ Array of transcriptions sorted in descending order of confidence

var transcriptions: [SFTranscription]

◎ Whether speech recognition has finished

var isFinal: Bool

◎ Metadata

var speechRecognitionMetadata: SFSpeechRecognitionMetadata?

SFSpeechRecognitionMetadata holds the following information.

var averagePauseDuration: TimeInterval
var speakingRate: Double
var speechDuration: TimeInterval
var speechStartTimestamp: TimeInterval
var voiceAnalytics: SFVoiceAnalytics?
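
These values can be turned into a short summary line for logging. The sketch below assumes iOS 14.5 or later (to my knowledge the first version that exposes `speechRecognitionMetadata`) and uses hypothetical names (`SpeechStats`, `summary(of:)`, `stats(from:)`). `speakingRate` is in words per minute and the durations are in seconds.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical value type for the metadata fields used in this sketch.
struct SpeechStats {
    let speakingRate: Double                // words per minute
    let averagePauseDuration: TimeInterval  // seconds
    let speechDuration: TimeInterval        // seconds
}

/// Formats the stats as a one-line summary for logging.
func summary(of stats: SpeechStats) -> String {
    return String(format: "%.0f wpm, avg pause %.2f s, speech %.1f s",
                  stats.speakingRate, stats.averagePauseDuration, stats.speechDuration)
}

#if canImport(Speech)
/// Extracts the stats from a recognition result, or nil when no metadata is attached.
/// The availability annotation reflects the assumed minimum OS versions.
@available(iOS 14.5, macOS 11.3, *)
func stats(from result: SFSpeechRecognitionResult) -> SpeechStats? {
    guard let metadata = result.speechRecognitionMetadata else { return nil }
    return SpeechStats(speakingRate: metadata.speakingRate,
                       averagePauseDuration: metadata.averagePauseDuration,
                       speechDuration: metadata.speechDuration)
}
#endif
```

Since the property is optional, the result handler should be prepared for `stats(from:)` to return nil.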
