Model | Easy | Medium | Hard | Average |
---|---|---|---|---|
GPT-4o | 45.7% | 41.5% | 38.0% | 41.7% |
Gemini-1.5-Pro | 30.3% | 30.5% | 24.1% | 28.3% |
Claude-3.5-Sonnet | 37.8% | 34.6% | 34.3% | 35.6% |
LLaVA-Video-7B | 7.8% | 9.0% | 8.5% | 8.4% |
Qwen2VL-7B | 11.2% | 8.8% | 1.6% | 7.2% |
VidDiff (ours) | 49.9% | 37.9% | 38.5% | 42.1% |