Introduction
Recently, I came across the MathVista benchmark, which evaluates how well LLMs can solve math problems presented as pictures. Remarkably, Claude 3.5 Sonnet, Gemini 1.5 Pro (May 2024), and GPT-4o have all outperformed the average human on it.
The newly released Claude 3.5 Sonnet reached a score of 67, noticeably higher than Gemini 1.5 Pro (May 2024) at 63.9 and GPT-4o at 63.8.
What is MathVista?
Here is the introduction of MathVista quoted from its homepage:
To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. It consists of 6,141 examples, derived from 28 existing multimodal datasets involving mathematics and 3 newly created datasets (i.e., IQTest, FunctionQA, and PaperQA). Completing these tasks requires fine-grained, deep visual understanding and compositional reasoning, which all state-of-the-art foundation models find challenging.
This is the published leaderboard:
My Experiments
Data source
I found a website called Math-Exercises that categorizes math problems. It offers pictures of the problems and their answers.
I picked 5 problems from this page and tested them on Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o. Let me show you each problem and the models' answers. Let's start.
Round 1 - Set Operations
Problem:
Find the intersection A∩B, union A∪B and differences A-B, B-A of sets A, B if :
Claude 3.5 Sonnet:
Gemini 1.5 Pro:
GPT-4o:
Right Answer:
Result:
🌟 Awesome! All three models are correct! Maybe it’s too simple for AI.
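Since the problem and answer screenshots aren't reproduced here, here is a small made-up example of the same type (the sets A and B below are hypothetical, not the ones from the actual screenshot), just to show what the models were asked to compute:

```latex
% Hypothetical sets, not the ones from the actual screenshot:
% A = {1, 2, 3, 4}, B = {3, 4, 5, 6}
\begin{align*}
A \cap B &= \{3, 4\}             && \text{elements in both sets} \\
A \cup B &= \{1, 2, 3, 4, 5, 6\} && \text{elements in either set} \\
A - B    &= \{1, 2\}             && \text{in } A \text{ but not in } B \\
B - A    &= \{5, 6\}             && \text{in } B \text{ but not in } A
\end{align*}
```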
Round 2 - Set Operations Again!
This time, I am using a slightly more abstract set problem.
Problem:
Find the intersection A∩B, union A∪B and differences A-B, B-A of sets A, B if :
Claude 3.5 Sonnet:
Gemini 1.5 Pro:
GPT-4o:
Right Answer:
Result:
🌟 Incredible, they are all right again! Is AI really good at set operations?
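Again, the screenshots aren't shown here, so here is a hypothetical version of such a problem using intervals instead of finite sets (my own numbers, not the ones from the screenshot):

```latex
% Hypothetical intervals, not the ones from the actual screenshot:
% A = (-2, 5], B = [1, 8)
\begin{align*}
A \cap B &= [1, 5] \\
A \cup B &= (-2, 8) \\
A - B    &= (-2, 1) \\
B - A    &= (5, 8)
\end{align*}
```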
Round 3 - Algebraic Expressions
I chose a different type of problem this time because I wanted to test the AIs' algebraic skills.
Problem:
By grouping the terms factor the polynomials and algebraic expressions :
Claude 3.5 Sonnet:
Gemini 1.5 Pro:
GPT-4o:
Right Answer:
Result:
😱 What? They are all wrong this time. I can't understand how they can know the method for solving this problem and still make mistakes at the end.
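For reference, factoring by grouping means pairing terms so that a common factor can be pulled out of each pair. Here is a made-up example of the technique (not the expression from the screenshot):

```latex
% Hypothetical expression, not the one from the actual screenshot:
\begin{align*}
ax + ay + bx + by
  &= a(x + y) + b(x + y) \\ % factor a common term out of each pair
  &= (a + b)(x + y)         % then factor out the common binomial (x + y)
\end{align*}
```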
Round 4 - Linear Equations
Problem:
Solve the linear equations and check the solution :
Claude 3.5 Sonnet:
Gemini 1.5 Pro:
GPT-4o:
It printed out too much, so I only took a screenshot of the final answer.
Right Answer:
Result:
🌟 They all found the right answer again! However, Claude failed its own answer check, even though its answer is correct. It's so odd.
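As before, the screenshots aren't included, so here is a made-up linear equation solved and checked the same way (not the actual equation from the screenshot):

```latex
% Hypothetical equation, not the one from the actual screenshot:
\begin{align*}
3(x - 2) &= x + 4 \\
3x - 6   &= x + 4 \\
2x       &= 10 \\
x        &= 5
\end{align*}
% Check: substituting x = 5 gives 3(5 - 2) = 9 on the left and 5 + 4 = 9 on the right.
```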
Round 5 - Inequalities
Maybe linear equations are too simple for the AIs, so this time I used inequalities!
Problem:
Solve the linear inequalities with absolute value :
Claude 3.5 Sonnet:
Gemini 1.5 Pro:
It printed out too much, so I only took a screenshot of the final answer.
GPT-4o:
It printed out too much, so I only took a screenshot of the final answer.
Right Answer:
Result:
😢 Maybe it's too difficult for AI. They all tried hard, but every answer is wrong.
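To give a sense of what this round involved, here is a made-up absolute-value inequality worked the standard way (not the actual inequality from the screenshot):

```latex
% Hypothetical inequality, not the one from the actual screenshot:
\begin{align*}
|2x - 3|    &< 5 \\
-5 < 2x - 3 &< 5 \\
-2 < 2x     &< 8 \\
-1 < x      &< 4
\end{align*}
% Solution set: x \in (-1, 4)
```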
Conclusion
AI can solve many math problems. Although it may fail in some cases, it can still provide useful hints.
In the end, Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o earned the same score in this little math match! Based on this small sample, state-of-the-art AIs seem to perform similarly on math.
Links:
- I used Poe to run this test, and it's really awesome. It offers many advanced models and community models for users.
- I am building a web application to help solve math problems. Feel free to try it out: AI Math Solve.