Tech

Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time

Published December 30, 2025

Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should treat their answers with caution. Researchers behind the Omni Research on Calculation in AI (ORCA) benchmark found that, across 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.

The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
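The reported average can be reproduced from the five overall scores. The short Python sketch below assumes a simple unweighted mean, which the article does not state explicitly; the variable and function names are illustrative only.

```python
# Overall accuracy (percent) for each model, as reported in the article.
scores = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
    "Claude 4.5 Sonnet": 45.2,
}


def mean_accuracy(values):
    """Unweighted mean of the accuracy figures (an assumption about the method)."""
    return sum(values) / len(values)


average = mean_accuracy(scores.values())
error_rate = 100 - average  # share of answers that were wrong, on average

print(f"Average accuracy: {average:.1f} percent")
print(f"Implied average error rate: {error_rate:.1f} percent")
```

Under that assumption the mean works out to 54.48 percent, which rounds to the 54.5 percent quoted above.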

“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.

Performance varied across categories. The models performed best in basic math and conversions, where Gemini reached 83 percent accuracy, Grok 76.9 percent and ChatGPT-5 66.7 percent. The average across models in this category was 72.1 percent, the highest of the seven categories tested. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.

Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok reached 76.7 percent, while the other three models scored below 50 percent.

The study also categorized the types of mistakes the models made. “Sloppy math” errors, including miscalculations and rounding issues, accounted for 68 percent of mistakes. Faulty logic, such as incorrect formulas or assumptions, represented 26 percent. Misread instructions accounted for 5 percent, and in some cases a model simply refused to answer. Siuda noted that multi-step calculations with rounding were particularly prone to error.
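To show why multi-step calculations with rounding are easy to get wrong, the Python sketch below compares rounding at every step with rounding once at the end. The price, tax rate and quantity are hypothetical values chosen for illustration; they do not come from the ORCA prompts.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def round_cents(amount):
    """Round a Decimal amount to two places, with halves rounded up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical inputs, not taken from the study.
price = Decimal("10.05")
tax_rate = Decimal("0.07")
quantity = 3

# Route 1: round the tax on a single item, then multiply by the quantity.
tax_rounded_per_item = round_cents(price * tax_rate) * quantity

# Route 2: compute the tax on the full order and round only once.
tax_rounded_once = round_cents(price * quantity * tax_rate)

print(tax_rounded_per_item, tax_rounded_once)
```

The two routes differ by one cent (2.10 versus 2.11) even though every arithmetic step is exact, so the answer counted as correct depends on where the rounding is meant to happen.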

The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.

All 500 prompts used in the study had one correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it is not yet reliable for precise numerical work, and users should check its results before relying on them.

Tech

EU Accuses Meta of Failing to Keep Under-13s Off Facebook and Instagram

Published April 29, 2026

Web Reporter

European Union regulators have issued preliminary findings against Meta Platforms, saying the company has failed to effectively prevent children under the age of 13 from using Facebook and Instagram.

The European Commission said its investigation found that Meta’s current safeguards do not meet the requirements of the Digital Services Act, the bloc’s landmark online safety law.

Although Meta’s terms of service require users to be at least 13 years old, regulators said the company’s age-verification systems are insufficient. Children can reportedly create accounts simply by entering a false date of birth, with no effective mechanism in place to confirm their real age.

According to the Commission, between 10% and 12% of children under 13 in the European Union are using Facebook or Instagram. That figure is significantly higher than Meta’s own internal estimates.

Regulators also said Meta failed to adequately consider established scientific research showing that younger children are particularly vulnerable to potential harms associated with social media use, including exposure to inappropriate content and risks to mental well-being.

Meta has rejected the Commission’s preliminary conclusions. In a statement, the company said both Facebook and Instagram are intended only for users aged 13 and older and that it already has systems in place to identify and remove underage accounts.

The company added that it continues to invest in technologies designed to detect younger users and indicated that additional safety measures will be announced in the coming days.

Meta also argued that determining a user’s true age remains a challenge across the technology industry and said a broader, industry-wide solution is needed. The company pledged to continue working with European regulators on the issue.

The findings come as several EU member states consider introducing wider restrictions on children’s access to social media, including proposals to ban use by those under 15.

To address the problem, the European Union is preparing to launch its own age-verification app. European Commission President Ursula von der Leyen said earlier this month that the technology is ready for rollout, although no official launch date has been announced.

Meta now has the opportunity to review the Commission’s findings and submit a formal response.

If the preliminary conclusions are upheld, the Commission could issue a binding non-compliance ruling. Under the Digital Services Act, penalties can reach up to 6% of a company’s global annual revenue, potentially exposing Meta to fines worth billions of euros.

Tech

Europe Emerges as Rising Hub in Global Race for AI Talent

Published April 29, 2026

Web Reporter

Europe is strengthening its position in the global competition for artificial intelligence talent, as stricter U.S. immigration rules and shifting international workforce trends encourage more professionals to consider careers across the continent.

A new study by the Germany-based think tank Interface found that countries including Ireland, Germany and the Netherlands are increasingly attracting AI specialists, helping Europe establish itself as a major global market for skilled technology workers.

The research, based on data from workforce intelligence firm Revelio Labs, analysed 1.6 million AI professionals worldwide. It found that while the United States and India remain the dominant players, Europe is emerging as a strong third centre for AI expertise.

The United States continues to lead in advanced AI engineering and research roles, while India remains particularly competitive in software development and non-technical positions. Both countries have close to one million AI professionals.

Within Europe, the United Kingdom ranks as the world’s third-largest AI labour market, with around 145,000 professionals. Germany has become one of the continent’s standout performers, boasting approximately 17,000 AI engineers, the fourth-highest total globally.

Several other European nations, including Italy, France and the Netherlands, also rank among the world’s top 10 markets by total AI workforce.

On a per-capita basis, however, smaller countries are proving especially competitive. Ireland ranks second globally behind Singapore, with 4.19 AI professionals for every 1,000 residents. Switzerland, Luxembourg, the Netherlands and Denmark also rank among the world’s leading markets relative to population.

The Netherlands has become an increasingly attractive destination for American AI professionals relocating to Europe. It now has the highest number of AI engineers within the European Union, although investment in Dutch AI start-ups remains below the European average.

European cities are also gaining prominence. Munich, Amsterdam and Berlin are the only cities in Europe to rank among the world’s top 25 for concentration of AI professionals.

The study also highlighted the growing importance of Indian talent to Europe’s AI ambitions. Indians now account for more than 16% of the global AI workforce, with an increasing number choosing Europe for education and employment.

Across the European Union, the share of Indian AI professionals rose from 7.7% in 2024 to 8.3% in 2025. Ireland has seen particularly strong growth, with Indian professionals now making up nearly 30% of its AI workforce.

Researchers said Europe’s ability to develop domestic talent while continuing to attract skilled workers from abroad will be critical to maintaining its growing role in the rapidly evolving AI sector.

Tech

Study Finds Chatbots Can Mirror Hostility in Heated Exchanges

Published April 23, 2026

Web Reporter

A new academic study has found that ChatGPT can produce abusive language when exposed to escalating human conflict, raising fresh concerns about how artificial intelligence behaves in tense interactions.

The research, published in the Journal of Pragmatics, examined how the chatbot responded to arguments that gradually became more hostile. Researchers presented the system with a sequence of five increasingly heated exchanges and asked it to generate what it considered the most plausible reply.

According to the findings, the AI’s tone shifted as the conversations intensified. While early responses remained measured, later replies began to mirror the aggression in the prompts. In some cases, the chatbot produced insults, profanity and even threats.

Examples cited in the study included statements such as “you should be ashamed of yourself” and more explicit language involving personal threats. The researchers said this pattern suggests that prolonged exposure to hostile input can push the system beyond its usual safeguards.

The study was co-authored by Vittorio Tantucci and Jonathan Culpeper at Lancaster University. Tantucci said the results show that AI can “escalate” alongside human users, potentially overriding built-in mechanisms designed to limit harmful responses.

“When humans escalate, AI can escalate too,” he said, noting that this behavior raises questions about how such systems should be deployed in sensitive environments.

Despite the concerning examples, the researchers found that the chatbot was generally less aggressive than human participants in similar scenarios. In some cases, it attempted to defuse tension through sarcasm or indirect responses rather than direct confrontation.

For instance, when faced with a threat during a simulated dispute, the AI responded with a sarcastic remark rather than escalating the situation further. This suggests that while the system can adopt hostile language, it may also attempt to manage conflict in less direct ways.

The findings add to ongoing debates about the role of artificial intelligence in areas such as mediation, customer service and online communication, where systems may encounter emotionally charged interactions.

Experts say the research highlights the importance of continued testing and refinement of AI safety measures, particularly as such tools are increasingly used in real-world settings involving human conflict.

OpenAI, the developer of ChatGPT, had not issued a public response to the study at the time of publication.