How to Use the Web Speech API

npaka

April 23, 2021, 20:36

I tried out the Web Speech API.

・Chrome 89.0.4389.128

1. Web Speech API

The Web Speech API is a browser-native API for performing speech recognition and speech synthesis in a web browser. It is available in Chrome.
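
Because availability depends on the browser, it can help to check for both parts of the API before using them. The following is a minimal sketch, not a definitive check: `detectSpeechSupport` is a hypothetical helper name, and it assumes the recognizer is exposed either as `SpeechRecognition` or under the `webkit` prefix.

```javascript
// Report which parts of the Web Speech API a global object exposes.
// `detectSpeechSupport` is a hypothetical helper, not part of the API.
function detectSpeechSupport(globalObject) {
  // Chrome has exposed the recognizer as webkitSpeechRecognition.
  const Recognition =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition
  return {
    recognition: typeof Recognition === "function",
    synthesis:
      typeof globalObject.speechSynthesis === "object" &&
      globalObject.speechSynthesis !== null &&
      typeof globalObject.SpeechSynthesisUtterance === "function",
  }
}

// In a browser, pass the real global object.
if (typeof window !== "undefined") {
  console.log(detectSpeechSupport(window))
}
```

Taking the global object as a parameter keeps the check easy to try with a stand-in object outside the browser.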

2. Speech Recognition

The code for speech recognition is as follows.

・index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <div id="output"></div>
    <script type="text/javascript" src="main.js"></script>
  </body>
</html>

・main.js

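// Prepare SpeechRecognition (fall back to the webkit-prefixed name used by Chrome)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition()
recognition.lang = 'ja-JP' // Language code

// Element that displays the transcript
const output = document.getElementById('output')

// Called when speech has been recognized
recognition.onresult = (event) => {
  const utterance = event.results[0][0].transcript
  output.innerHTML = output.innerHTML + utterance + "<br>"
}

// Called when recognition ends
recognition.onend = (event) => {
  // Restart speech recognition
  recognition.start()
}

// Start speech recognition
output.innerHTML = "Transcription started...<br>"
recognition.start()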

Speech picked up by the microphone is transcribed and displayed on the page.
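
The code above restarts recognition from "onend" unconditionally. If recognition ends because of an error that will not go away by itself (for example, microphone access was denied), that restart would likely repeat forever. The following is a minimal sketch of a guarded restart. The error codes are values of the "error" property on the event passed to "onerror"; treating exactly these four as fatal is an assumption, and `shouldRestart` and `enableGuardedRestart` are hypothetical helper names.

```javascript
// Error codes after which restarting would most likely fail again.
// Treating exactly these codes as fatal is an assumption of this sketch.
const FATAL_ERRORS = new Set([
  "not-allowed",
  "service-not-allowed",
  "audio-capture",
  "language-not-supported",
])

// Hypothetical helper: decide whether onend should call start() again.
function shouldRestart(lastError) {
  return !FATAL_ERRORS.has(lastError)
}

// Hypothetical helper: install onerror/onend handlers with the guard.
function enableGuardedRestart(recognition) {
  let lastError = null
  recognition.onerror = (event) => {
    lastError = event.error // remember why recognition stopped
  }
  recognition.onend = () => {
    if (shouldRestart(lastError)) {
      lastError = null
      recognition.start()
    }
  }
}
```

In the page, this would replace the "onend" assignment: call `enableGuardedRestart(recognition)` before `recognition.start()`.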

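3. Speech Recognition - Showing Interim Results

The code for speech recognition that also displays results while recognition is still in progress is as follows.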

・main.js

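// Prepare SpeechRecognition (fall back to the webkit-prefixed name used by Chrome)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition()
recognition.lang = 'ja-JP' // Language code
recognition.interimResults = true // Also deliver interim results

// Element that displays the transcript
const output = document.getElementById('output')

// Finalized text
let text = ""

// Called when speech has been recognized
recognition.onresult = (event) => {
  const utterance = event.results[0][0].transcript
  output.innerHTML = text + utterance + "<br>" // Show the interim result
  if (event.results[0].isFinal) text = output.innerHTML // Recognition is final
}

// Called when recognition ends
recognition.onend = (event) => {
  // Restart speech recognition
  recognition.start()
}

// Start speech recognition
output.innerHTML = "Transcription started...<br>"
text = output.innerHTML
recognition.start()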

Setting "recognition.interimResults = true" causes interim results to be delivered to "onresult" as well. Whether a result is the final one for an utterance can be checked with "event.results[0].isFinal".
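
Reading only "event.results[0]" is enough for the code above, where each recognition session returns a single result. If "recognition.continuous = true" were also set, the result list could hold several entries, so a more general handler would walk the list starting at "event.resultIndex". The following is a minimal sketch under that assumption; `splitTranscripts` is a hypothetical helper name.

```javascript
// Hypothetical helper: split an onresult event into final and interim text.
function splitTranscripts(event) {
  let finalText = ""
  let interimText = ""
  // resultIndex is the lowest index in results that changed in this event.
  const first = event.resultIndex || 0
  for (let i = first; i < event.results.length; i++) {
    const result = event.results[i]
    const transcript = result[0].transcript // first (best) alternative
    if (result.isFinal) {
      finalText += transcript
    } else {
      interimText += transcript
    }
  }
  return { finalText, interimText }
}
```

Inside "onresult", the final part would be appended to the stored text and the interim part shown after it, much like the "text" variable is used above.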

4. Speech Synthesis

The code for speech synthesis is as follows.

・index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <div id="button">Speech synthesis</div>
    <script type="text/javascript" src="main.js"></script>
  </body>
</html>

・main.js

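// Element that triggers speech synthesis
const button = document.getElementById("button")

// Called when the button is clicked
button.addEventListener("click", function() {
    const utterance = new SpeechSynthesisUtterance()
    utterance.text = "こんにちは" // Text to speak
    utterance.lang = "ja-JP" // Language code
    utterance.rate = 1.5 // Speed (0.1 to 10, default: 1)
    utterance.pitch = 0.75 // Pitch (0 to 2, default: 1)
    utterance.volume = 1 // Volume (0 to 1, default: 1)
    speechSynthesis.speak(utterance)
})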
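
The ranges noted in the comments above (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1) can be enforced before the values are assigned, which is handy when they come from user input. The following is a minimal sketch; `normalizeVoiceOptions` and `buildUtterance` are hypothetical helper names.

```javascript
// Valid ranges and defaults as noted in the comments of the code above.
const RANGES = {
  rate: { min: 0.1, max: 10, fallback: 1 },
  pitch: { min: 0, max: 2, fallback: 1 },
  volume: { min: 0, max: 1, fallback: 1 },
}

// Hypothetical helper: clamp rate, pitch and volume into their ranges.
function normalizeVoiceOptions(options = {}) {
  const normalized = {}
  for (const [name, { min, max, fallback }] of Object.entries(RANGES)) {
    const raw = options[name]
    // Missing or non-numeric values fall back to the default.
    const value = typeof raw === "number" && Number.isFinite(raw) ? raw : fallback
    normalized[name] = Math.min(max, Math.max(min, value))
  }
  return normalized
}

// Hypothetical helper (browser only): build an utterance from clamped options.
function buildUtterance(text, options) {
  const utterance = new SpeechSynthesisUtterance(text)
  utterance.lang = "ja-JP"
  Object.assign(utterance, normalizeVoiceOptions(options))
  return utterance
}
```

In the click handler, `speechSynthesis.speak(buildUtterance("こんにちは", { rate: 1.5, pitch: 0.75 }))` would then behave like the code above.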

Calling speechSynthesis.speak() without a user action produces the following error.

speechSynthesis.speak() without user activation is no longer allowed since M71, around December 2018.
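
One way to work with this restriction is to hold speech requests until the user has interacted with the page. The following is a minimal sketch, assuming that calls made from the first click onward are accepted by the browser; `createSpeechGate` is a hypothetical helper name.

```javascript
// Hypothetical helper: hold speak requests until a user gesture has happened.
function createSpeechGate(speak) {
  const pending = []
  let unlocked = false
  return {
    // Queue the text, or speak immediately once unlocked.
    say(text) {
      if (unlocked) speak(text)
      else pending.push(text)
    },
    // Call this from a click (or similar) handler.
    unlock() {
      unlocked = true
      while (pending.length > 0) speak(pending.shift())
    },
  }
}

// Browser wiring, assuming the first click counts as user activation.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const gate = createSpeechGate((text) => {
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.lang = "ja-JP"
    speechSynthesis.speak(utterance)
  })
  gate.say("こんにちは") // queued: no user action yet
  document.addEventListener("click", () => gate.unlock(), { once: true })
}
```

Passing the speak function in as a parameter keeps the queueing logic separate from the browser API.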

[Bonus] SpeechRecognition Events

SpeechRecognition events are fired in the following order.

(1) onstart : recognition started
(2) onaudiostart : audio capture started
(3) onsoundstart : sound detected
(4) onspeechstart : speech detected
(5) onresult : result returned
(6) onspeechend : speech ended
(7) onsoundend : sound ended
(8) onaudioend : audio capture ended
(9) onend : recognition ended

In addition, there are the following two events.

・onnomatch : no sufficiently confident match was found
・onerror : a recognition error occurred
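
The order of the events listed above can be observed by recording each one as it fires. The following is a minimal sketch; `logRecognitionEvents` is a hypothetical helper name, and it relies on the recognition object accepting `addEventListener` with the event names minus the "on" prefix.

```javascript
// Event names from the lists above, without the "on" prefix.
const RECOGNITION_EVENTS = [
  "start", "audiostart", "soundstart", "speechstart", "result",
  "speechend", "soundend", "audioend", "end", "nomatch", "error",
]

// Hypothetical helper: record the name of every event fired on `target`.
function logRecognitionEvents(target, log = []) {
  for (const name of RECOGNITION_EVENTS) {
    target.addEventListener(name, () => log.push(name))
  }
  return log
}
```

In the page, `const log = logRecognitionEvents(recognition)` before `recognition.start()` collects the names, and `console.log(log)` inside an "end" listener prints the order seen in that session.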
