Random | - | 22.0 | 21.9 | 21.8 | 21.9 |
Llama-3.2-11b-Vision-Instruct | Small | 30.3 | 32.4 | 29.3 | 28.7 |
LLaVA-Mistral-7B | Medical | 39.8 | 31.6 | 43.1 | 37.1 |
VILA1.5-13b | Small | 41.8 | 41.8 | 47.5 | 40.9 |
Llama-3.2-90b-Vision-Instruct | Large | 42.4 | 44.9 | 42.1 | 38.7 |
LLaVA-Med-Mistral-7B | Medical | 43.0 | 37.3 | 47.1 | 41.6 |
Llama-3.1-Nemotron-70b-Instruct | Large | 44.2 | 44.9 | 43.3 | 44.8 |
*GPT-4o | Large | 45.6 | 48.7 | 43.1 | 44.8 |
Pixtral-12b | Small | 45.6 | 46.9 | 44.8 | 44.8 |
GPT-4o-mini | Small | 46.2 | 48.5 | 43.6 | 47.0 |
Gemini-Flash-1.5-8b | Small | 46.7 | 48.7 | 43.6 | 49.1 |
Claude-3.5-Haiku | Small | 47.1 | 48.0 | 43.8 | 51.7 |
Qwen-2-VL-72b-Instruct | Large | 47.5 | 49.2 | 45.7 | 47.8 |
VILA1.5-40b | Large | 47.5 | 47.2 | 47.9 | 47.4 |
Grok-2-Vision | Large | 48.4 | 50.3 | 46.4 | 48.7 |
Qwen-2-VL-7b | Small | 48.8 | 54.1 | 43.3 | 49.6 |
Pixtral-Large | Large | 49.8 | 50.8 | 49.5 | 48.7 |
Human | - | 50.3 | 52.7 | 47.5 | 51.4 |
Gemini-Pro-1.5 | Large | 51.1 | 52.0 | 50.2 | 50.9 |
*Claude-3.5-Sonnet | Large | 51.7 | 54.1 | 50.2 | 50.4 |
o1 | Reasoning | 52.8 | 55.4 | 50.2 | 53.0 |