I Wanted to Talk with ChatGPT in WebGL

June 30, 2023, 13:51

What I want to do

Develop for WebGL in Unity.
Talk to ChatGPT by voice input.
Turn ChatGPT's replies into speech.
ChatGPT's replies are slow, so split them at punctuation marks.

Voice input

I use the large-v2 model of faster-whisper.

from faster_whisper import WhisperModel
whisper_model = WhisperModel("large-v2",download_root="pretrained_models",compute_type="int8",device="cpu")
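As a rough sketch of how the loaded model might then be used: faster-whisper's transcribe() returns a lazy iterator of segments plus an info object, so the text only becomes available once the segments are consumed. The helper name transcribe_to_text, the language="ja" default, and the file name in the usage line are my assumptions for illustration, not part of the original code.

```python
def transcribe_to_text(model, audio_path, language="ja"):
    """Run a faster-whisper style model and join the segment texts.

    `model` is expected to behave like faster_whisper.WhisperModel:
    transcribe() returns (segments, info), where segments is a lazy iterator.
    """
    segments, _info = model.transcribe(audio_path, language=language)
    # Iterating over the segments is what actually runs the decoding.
    return "".join(segment.text for segment in segments).strip()
```

With the model above, this would be called as transcribe_to_text(whisper_model, "input.wav"), where "input.wav" is a placeholder path.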

Communicating with ChatGPT

import openai

def getSentenceOfOpenAIStream(chat_data: list, transcription: str):
    # Streaming call of the openai package (pre-1.0 ChatCompletion interface).
    response_stream = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=chat_data + [{"role": "user", "content": transcription}],
        stream=True,
    )
    accumulatedResponse = ""
    for item in response_stream:
        choice = item["choices"][0]
        if choice["finish_reason"] is None:
            # The first delta carries only the role; later ones carry text.
            accumulatedResponse += choice["delta"].get("content", "")
            if "。" in accumulatedResponse or "、" in accumulatedResponse:
                yield accumulatedResponse
                accumulatedResponse = ""
        else:
            # Flush whatever is left so the end of the reply is not dropped.
            if accumulatedResponse:
                yield accumulatedResponse
            print(choice["finish_reason"])

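I keep appending the deltas from response_stream and yield as soon as a punctuation mark (「。」 or 「、」) shows up.
This lets ChatGPT's text generation overlap with speech synthesis, transfer, and playback, which shortens the overall wait.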
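The splitting idea can be tried without calling the API at all. The following is a minimal sketch that takes any iterable of text deltas; the function name split_stream_on_punctuation and the PUNCTUATION constant are made up for this example. It cuts exactly at each mark and flushes any text left over when the stream ends.

```python
PUNCTUATION = "。、"  # assumed delimiters: Japanese full stop and comma

def split_stream_on_punctuation(deltas):
    """Yield chunks that end at a punctuation mark as text deltas arrive."""
    buffer = ""
    for delta in deltas:
        for char in delta:
            buffer += char
            if char in PUNCTUATION:
                yield buffer
                buffer = ""
    if buffer:
        # Text that never reached a punctuation mark is sent at the end.
        yield buffer
```

Because it is a generator, each chunk can be handed to speech synthesis while later deltas are still arriving.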

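About the communication

The models run in Python, so I need to build communication between Unity and Python.

I was completely feeling my way in the dark here.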

from django.http import StreamingHttpResponse
from django.http.multipartparser import MultiPartParser
from rest_framework.views import APIView

I forced ChatGPT's streaming through StreamingHttpResponse.

        def response_generator():
            # Synthesize and send each punctuation-delimited slice as soon as it is ready.
            for slicedResponse in getSentenceOfOpenAIStream(chat_data=chat_data,transcription=transcription):
                print(slicedResponse)
                response_text = slicedResponse
                response_audio_data, _ = audioInferer.infer_audio(response_text,42)
                print("yielding response slice")
                # Only the raw audio samples are sent; the text itself is not.
                yielding_component = response_audio_data.tobytes()
                yield yielding_component

        response = StreamingHttpResponse(response_generator())

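I would like to send the response text from OpenAI, the binary audio data, the sampling rate, and the transcription together, but I am not sure how best to do that...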
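One possible approach, offered only as a sketch and not something I have tested against Unity: put the text fields and the sampling rate into a small JSON header, prefix it with its length, and append the raw audio bytes. Each yielded chunk then describes itself. The names pack_chunk and unpack_chunks, the header keys, and the little-endian 4-byte length prefix are all assumptions made up for this example.

```python
import json
import struct

def pack_chunk(text, audio_bytes, sample_rate, transcription=""):
    """Frame one chunk as: 4-byte header length, JSON header, raw audio."""
    header = json.dumps({
        "text": text,
        "transcription": transcription,
        "sample_rate": sample_rate,
        "audio_bytes": len(audio_bytes),
    }, ensure_ascii=False).encode("utf-8")
    # "<I" is a little-endian unsigned 32-bit integer.
    return struct.pack("<I", len(header)) + header + audio_bytes

def unpack_chunks(data):
    """Inverse of pack_chunk for a buffer that holds only whole chunks."""
    offset = 0
    while offset < len(data):
        (header_len,) = struct.unpack_from("<I", data, offset)
        offset += 4
        header = json.loads(data[offset:offset + header_len].decode("utf-8"))
        offset += header_len
        audio = data[offset:offset + header["audio_bytes"]]
        offset += header["audio_bytes"]
        yield header, audio
```

In the generator, the yield would then become something like pack_chunk(response_text, response_audio_data.tobytes(), sample_rate, transcription), where sample_rate has to come from the speech model. The Unity side would read 4 bytes, then the header, then the number of audio bytes the header announces. Since HTTP streaming does not preserve chunk boundaries, the receiver has to buffer until a whole chunk has arrived.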
