| Model | Easy | Medium | Hard | Average |
|---|---|---|---|---|---|
| GPT-4o | 58.3% | 53.2% | 48.9% | 53.5% |
| Gemini-1.5-Pro | 67.8% | 53.6% | 51.7% | 57.7% |
| Claude-3.5-Sonnet | 57.1% | 50.5% | 52.5% | 53.4% |
| LLaVA-Video-7B | 56.6% | 52.0% | 48.3% | 52.3% |
| Qwen2VL-7B | 49.0% | 52.6% | 49.6% | 50.4% |
| VidDiff (ours) | 62.7% | 56.2% | 50.0% | 56.3% |