Model | Easy | Medium | Hard | Average |
---|---|---|---|---|
GPT-4o | 58.3% | 53.2% | 48.9% | 53.5% |
Gemini-1.5-Pro | 67.8% | 53.6% | 51.7% | 57.7% |
Claude-3.5-Sonnet | 57.1% | 50.5% | 52.5% | 53.4% |
LLaVA-Video-7B | 56.6% | 52.0% | 48.3% | 52.3% |
Qwen2VL-7B | 49.0% | 52.6% | 49.6% | 50.4% |
VidDiff (ours) | 62.7% | 56.2% | 50.0% | 56.3% |
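Each row's Average value matches the unweighted mean of its Easy, Medium, and Hard accuracies to one decimal place. The Python sketch below recomputes that mean from the table's numbers. Treating Average as an unweighted mean, rather than one weighted by the number of samples in each split, is an assumption. The names `results`, `reported`, and `unweighted_mean` are illustrative only.

```python
# Accuracy (%) per difficulty split (Easy, Medium, Hard), copied from the table.
results = {
    "GPT-4o": (58.3, 53.2, 48.9),
    "Gemini-1.5-Pro": (67.8, 53.6, 51.7),
    "Claude-3.5-Sonnet": (57.1, 50.5, 52.5),
    "LLaVA-Video-7B": (56.6, 52.0, 48.3),
    "Qwen2VL-7B": (49.0, 52.6, 49.6),
    "VidDiff (ours)": (62.7, 56.2, 50.0),
}

# Average column as reported in the table.
reported = {
    "GPT-4o": 53.5,
    "Gemini-1.5-Pro": 57.7,
    "Claude-3.5-Sonnet": 53.4,
    "LLaVA-Video-7B": 52.3,
    "Qwen2VL-7B": 50.4,
    "VidDiff (ours)": 56.3,
}


def unweighted_mean(scores):
    """Assumed definition of Average: each split counts equally."""
    return sum(scores) / len(scores)


for model, scores in results.items():
    mean = unweighted_mean(scores)
    # Reported values are rounded to one decimal, so allow a half-unit tolerance.
    agrees = abs(mean - reported[model]) < 0.05
    print(f"{model}: recomputed {mean:.1f}%, reported {reported[model]}%, agrees={agrees}")
```

If the benchmark's splits differ in size, a sample-weighted average could differ from these values. The agreement shown here only indicates that the reported numbers are consistent with an unweighted mean.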