Tech
Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time
Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should approach their answers with caution. Researchers from the Omni Research on Calculation in AI (ORCA) found that when tested on 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.
The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
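The reported average can be checked directly from the five overall scores quoted above. The following is a minimal Python sketch; the dictionary and variable names are illustrative and do not come from the study.

```python
# Overall accuracy (percent) per model, as quoted in the article.
# The names below are illustrative, not part of the ORCA benchmark.
overall_accuracy = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
    "Claude 4.5 Sonnet": 45.2,
}

# Unweighted mean across the five models.
mean_accuracy = sum(overall_accuracy.values()) / len(overall_accuracy)

print(f"mean accuracy: {mean_accuracy:.1f}%")
```

The unweighted mean works out to 54.48, which matches the 54.5 percent figure after rounding to one decimal place.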
“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.
Performance varied across categories. AI models performed best in basic math and conversions, where Gemini achieved 83 percent accuracy, Grok 76.9 percent, and ChatGPT-5 66.7 percent. The category's overall average was 72.1 percent, the highest of the seven tested categories. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.
Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok each reached 76.7 percent, while the other three models scored below 50 percent.
The study also categorized the types of mistakes the models made. “Sloppy math” errors, including miscalculations and rounding issues, accounted for 68 percent of mistakes. Faulty logic, meaning incorrect formulas or assumptions, represented 26 percent. Misreading instructions accounted for 5 percent, and in some cases the models simply refused to answer. Siuda noted that multi-step calculations with rounding were particularly prone to error.
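To see why multi-step calculations with rounding are easy to get wrong, consider a hypothetical example (not one of the ORCA prompts): splitting a 100.00 bill three ways and then adding the shares back up. The sketch below, which assumes rounding to cents with Python's `decimal` module, compares rounding at the intermediate step with rounding only once at the end.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_cents(amount: Decimal) -> Decimal:
    """Round a Decimal amount to two decimal places (half up)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical inputs chosen for illustration only.
total = Decimal("100.00")
people = 3

# Rounding early: round each share first, then multiply back.
share_rounded = round_cents(total / people)
early = share_rounded * people

# Rounding late: carry full precision and round only the final result.
late = round_cents((total / people) * people)

print("rounded at each step:", early)
print("rounded at the end:  ", late)
```

Rounding the share first gives 33.33 per person and a reconstructed total of 99.99, while carrying full precision and rounding once recovers 100.00. The one-cent gap is small here, but the same effect compounds over longer chains of steps.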
The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.
All 500 prompts used in the study had one correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it remains unreliable for precise numerical work, and users should be cautious when relying on these tools.
Tech
Study Says EU Regulations Are Slowing Rollout of Advanced AI Models
A new study by Governance.AI has found that European Union regulations are delaying the rollout of advanced artificial intelligence models, with technology companies increasingly pointing to the bloc’s regulatory framework as a key obstacle to launching new AI products in Europe.
The report examined 375 large language models (LLMs) released between June 2018 and May 2026, comparing their availability across the United States, the European Union and the United Kingdom. According to the findings, at least 11 percent of advanced AI model releases were either delayed or never launched in the EU compared with the United States. In the UK, the figure stood at 7 percent.
Researchers said they identified 68 cases in which AI models experienced delays or were withheld from specific markets. Regulatory factors were cited as the primary reason in 56 of those cases, making them the most common cause of restricted availability.
The study reviewed releases from major AI developers, including Meta, Google, OpenAI and Anthropic. Meta recorded the highest proportion of delayed or unavailable releases, with 26 percent of its AI models delayed or withheld in the EU and 15 percent in the UK. Anthropic’s Claude 3 Opus was highlighted as one example, with its web application arriving in the EU 71 days later than in the United States.
According to the report, data protection rules have emerged as the biggest regulatory hurdle, particularly for AI systems capable of processing images, audio and real-time video rather than text alone.
The researchers argued that uncertainty surrounding the application of the General Data Protection Regulation (GDPR) to AI model training and deployment has created additional challenges for developers. They also said enforcement of data protection rules has generally been stricter within the EU than in the UK, despite both jurisdictions sharing similar legal foundations following the adoption of the GDPR before Britain’s exit from the bloc.
The report noted that the impact of newer legislation, including the Digital Markets Act, which began taking effect in 2023, and the Artificial Intelligence Act, adopted in 2024, has yet to be fully reflected in the data.
At the same time, the European Union is reviewing proposals aimed at making data rules more practical for AI development through its Digital Omnibus initiative. Lawmakers are also considering changes to copyright legislation and the AI Act’s copyright provisions to strengthen protections for creators, measures that researchers say could affect future AI model availability if implemented too strictly.
John Lidiard, a UK AI policy researcher and one of the report’s authors, said policymakers should consider the impact that regulatory barriers can have on businesses and consumers seeking access to the latest AI technologies. He said balancing innovation with effective oversight would remain a key challenge as governments continue to develop AI regulations.
