Popular articles

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

1 month ago

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

2 months ago

C3LLM: Conditional Multimodal Content Generation Using Large Language Models

2 months ago

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment

2 months ago

Agent AI: Surveying the Horizons of Multimodal Interaction

3 months ago

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

6 months ago

FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild

7 months ago