Multimodal Tasks

Popular Articles

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

2 months ago

ViLaM: A Vision-Language Model with Enhanced Visual Grounding and Generalization Capability

7 months ago