
Apple Develops LLM for Enhanced Long-Form Video Comprehension

Apple trained an LLM to efficiently understand long-form video

Apple researchers have unveiled SlowFast-LLaVA-1.5 (SF-LLaVA-1.5), an upgraded version of their SlowFast-LLaVA model that excels at long-form video analysis. The model pairs video perception with pre-trained large language models (LLMs) and processes video efficiently by focusing on the most informative frames while preserving the LLM's language abilities. Traditional methods feed every frame to the model, quickly overwhelming the LLM's context window. Apple's approach instead uses a dual-stream framework: a slow stream examines a small number of frames in high spatial detail, while a fast stream scans many more frames coarsely to capture motion.

As a result, SF-LLaVA-1.5 outperforms larger models across video tasks and remains strong on image tasks such as OCR and reasoning, overcoming limitations of earlier models. Its main constraint is a maximum input of 128 frames, which can occasionally miss crucial frames in very long videos; at the same time, the fact that it was trained on public datasets underscores its versatility. Now available on GitHub and Hugging Face, the model sets state-of-the-art benchmarks in long-form video and image comprehension.
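To make the dual-stream idea concrete, here is a minimal sketch in NumPy. The function names, tensor shapes, frame counts, and pooling factor are hypothetical illustrations rather than Apple's actual implementation: the point is only that the slow stream keeps a few frames at full spatial resolution while the fast stream keeps many frames with aggressively pooled features, so the combined token count stays within an LLM's context window.

```python
# Hypothetical sketch of a SlowFast-style token budget; not SF-LLaVA-1.5's real code.
import numpy as np

def uniform_sample(frames: np.ndarray, n: int) -> np.ndarray:
    """Pick n frames evenly spaced across the clip."""
    idx = np.linspace(0, len(frames) - 1, n).round().astype(int)
    return frames[idx]

def spatial_pool(tokens: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (frames, H, W, dim) token grid by `factor` per side."""
    f, h, w, d = tokens.shape
    tokens = tokens[:, : h - h % factor, : w - w % factor, :]
    tokens = tokens.reshape(f, h // factor, factor, w // factor, factor, d)
    return tokens.mean(axis=(2, 4))

def slowfast_tokens(vision_tokens: np.ndarray,
                    slow_frames: int = 8,
                    fast_frames: int = 64,
                    fast_pool: int = 4) -> np.ndarray:
    """Combine a detailed slow stream with a heavily pooled fast stream.

    vision_tokens: (num_frames, H, W, dim) per-frame features from a
    frozen vision encoder (shape assumed for this sketch).
    """
    # Slow stream: few frames, full spatial detail.
    slow = uniform_sample(vision_tokens, slow_frames)
    # Fast stream: many frames, coarse spatial detail (captures motion cheaply).
    fast = spatial_pool(uniform_sample(vision_tokens, fast_frames), fast_pool)
    flat = lambda t: t.reshape(-1, t.shape[-1])  # flatten to a token sequence
    # The concatenated sequence is what would be handed to the LLM.
    return np.concatenate([flat(slow), flat(fast)], axis=0)

# Example: 128 sampled frames of 24x24 patch features with dim 512.
tokens = slowfast_tokens(np.random.rand(128, 24, 24, 512))
print(tokens.shape)  # ~6.9k tokens instead of 128 * 24 * 24 = ~73.7k
```

With these illustrative numbers, the slow stream contributes 8 × 576 detailed tokens and the fast stream 64 × 36 pooled tokens, roughly a tenth of what feeding all 128 frames at full resolution would cost.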
