Poster
Improving LLM Video Understanding with 16 Frames Per Second
Yixuan Li · Changli Tang · Jimin Zhuang · Yudong Yang · Guangzhi Sun · Wei Li · Zejun MA · Chao Zhang
West Exhibition Hall B2-B3 #W-123
Human vision naturally processes continuous motion, but most AI video models analyze only a few still frames per second, so they miss important visual details. To address this, we developed F-16, a new AI model that understands video at a much higher frame rate of 16 frames per second. F-16 compresses the visual information from each second of video, which lets it capture motion and key details more effectively without requiring much more computing power. In our tests, F-16 outperforms previous models on a range of video understanding tasks, including general and fine-grained benchmarks as well as complex activities such as sports. It even surpasses leading commercial models such as GPT-4o and Gemini 1.5 Pro at analyzing fast-paced sports like basketball and diving.
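The summary does not describe how frames are sampled. As a rough illustration only, the sketch below shows one way to pick frames at 16 fps from a video with a different native frame rate and group them into one-second windows, the unit that F-16 is described as compressing. The function name `sample_frame_indices` and the choice of taking the nearest earlier source frame are assumptions made for this example, and the per-second compression step itself is not shown.

```python
def sample_frame_indices(num_frames: int, native_fps: float, target_fps: int = 16) -> list[list[int]]:
    """Hypothetical helper: return source-frame indices sampled at target_fps,
    grouped into one-second windows (the last window may be partial)."""
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames and fps values must be positive")

    duration = num_frames / native_fps  # clip length in seconds
    groups: list[list[int]] = []
    second = 0
    while second < duration:
        window: list[int] = []
        for k in range(target_fps):
            t = second + k / target_fps  # timestamp of the k-th sample in this second
            if t >= duration:
                break  # stop at the end of the clip
            # Assumption: use the nearest earlier source frame, clamped to the last frame.
            window.append(min(int(t * native_fps), num_frames - 1))
        groups.append(window)
        second += 1
    return groups
```

For a 2-second clip recorded at 30 fps (60 frames), this yields two windows of 16 indices each, with the first window starting at frame 0 and the second at frame 30. In a model like the one described, each window would then be passed to a compression step before reaching the language model.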