Popular Articles
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
Agent AI: Surveying the Horizons of Multimodal Interaction
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild