Tech
Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time
Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should approach their answers with caution. Researchers from the Omni Research on Calculation in AI (ORCA) found that when tested on 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.
The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
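The 54.5 percent figure follows directly from the five per-model scores quoted above; a quick check:

```python
# Overall accuracy scores (percent) as reported in the ORCA benchmark, October 2025
scores = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
    "Claude 4.5 Sonnet": 45.2,
}

# Simple arithmetic mean across the five models
average = sum(scores.values()) / len(scores)
print(round(average, 1))  # 54.5
```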
“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.
Performance varied across categories. AI models performed best in basic math and conversions, where Gemini achieved 83 percent accuracy, Grok 76.9 percent, and ChatGPT-5 66.7 percent; the five-model average of 72.1 percent was the highest across the seven tested categories. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.
Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok reached 76.7 percent, while the other three models scored below 50 percent.
The study also categorized the types of mistakes AI makes. “Sloppy math” errors, including miscalculations or rounding issues, accounted for 68 percent of mistakes. Faulty logic errors represented 26 percent, reflecting incorrect formulas or assumptions. Misreading instructions accounted for 5 percent, and in the remaining cases the models simply refused to answer. Siuda noted that multi-step calculations with rounding were particularly prone to error.
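The rounding failure mode Siuda describes is easy to reproduce by hand. In the illustrative sketch below (the figures are hypothetical, not taken from the study), rounding an intermediate result instead of rounding once at the end shifts the final total:

```python
# Illustrative only: how rounding an intermediate result in a multi-step
# calculation changes the final answer -- the "sloppy math" failure mode
# the study describes. All figures here are hypothetical.

price = 19.99      # per-item price
tax_rate = 0.0825  # sales tax
quantity = 7

# Careful: carry full precision through, round only the final total
exact = round(price * (1 + tax_rate) * quantity, 2)

# Sloppy: round the per-item total to cents first, then multiply
per_item = round(price * (1 + tax_rate), 2)  # 21.639175 -> 21.64
sloppy = round(per_item * quantity, 2)

print(exact, sloppy)  # 151.47 151.48 -- off by a cent after one early rounding
```

Each extra step in a chain of calculations gives such errors another chance to creep in and compound, which is consistent with the study's finding that multi-step problems were the most error-prone.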
The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.
All 500 prompts used in the study had one correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it is unreliable for precise numerical work, and users should verify important results independently.
Study Finds Chatbots Can Mirror Hostility in Heated Exchanges
A new academic study has found that ChatGPT can produce abusive language when exposed to escalating human conflict, raising fresh concerns about how artificial intelligence behaves in tense interactions.
The research, published in the Journal of Pragmatics, examined how the chatbot responded to arguments that gradually became more hostile. Researchers presented the system with a sequence of five increasingly heated exchanges and asked it to generate what it considered the most plausible reply.
According to the findings, the AI’s tone shifted as the conversations intensified. While early responses remained measured, later replies began to mirror the aggression in the prompts. In some cases, the chatbot produced insults, profanity and even threats.
Examples cited in the study included statements such as “you should be ashamed of yourself” and more explicit language involving personal threats. The researchers said this pattern suggests that prolonged exposure to hostile input can push the system beyond its usual safeguards.
The study was co-authored by Vittorio Tantucci and Jonathan Culpeper at Lancaster University. Tantucci said the results show that AI can “escalate” alongside human users, potentially overriding built-in mechanisms designed to limit harmful responses.
“When humans escalate, AI can escalate too,” he said, noting that this behavior raises questions about how such systems should be deployed in sensitive environments.
Despite the concerning examples, the researchers found that the chatbot was generally less aggressive than human participants in similar scenarios. In some cases, it attempted to defuse tension through sarcasm or indirect responses rather than direct confrontation.
For instance, when faced with a threat during a simulated dispute, the AI responded with a sarcastic remark rather than escalating the situation further. This suggests that while the system can adopt hostile language, it may also attempt to manage conflict in less direct ways.
The findings add to ongoing debates about the role of artificial intelligence in areas such as mediation, customer service and online communication, where systems may encounter emotionally charged interactions.
Experts say the research highlights the importance of continued testing and refinement of AI safety measures, particularly as such tools are increasingly used in real-world settings involving human conflict.
OpenAI, the developer of ChatGPT, had not issued a public response to the study at the time of publication.
