Tech

Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time

Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should approach their answers with caution. Researchers behind the Omni Research on Calculation in AI (ORCA) Benchmark found that when tested on 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.

The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
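The reported average can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the 54.5 percent figure is a simple unweighted mean of the five overall scores quoted above; the variable names are illustrative and do not come from the study.

```python
# Overall accuracy (percent) per model, as quoted in the article.
overall_accuracy = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
    "Claude 4.5 Sonnet": 45.2,
}

# Assumption: the reported average is an unweighted mean of the five scores.
mean_accuracy = sum(overall_accuracy.values()) / len(overall_accuracy)
mean_error = 100 - mean_accuracy

# Best-scoring model and its implied error rate.
best_model = max(overall_accuracy, key=overall_accuracy.get)
best_error = 100 - overall_accuracy[best_model]

print(f"mean accuracy:   {mean_accuracy:.2f}%")
print(f"mean error rate: {mean_error:.2f}%")
print(f"{best_model} error rate: {best_error:.2f}%")
```

Under that assumption the mean works out to 54.48 percent, which rounds to the reported 54.5 percent. The implied error rates run from about 37 percent for the best model to about 45.5 percent on average.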

“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.

Performance varied across categories. The models performed best in basic math and conversions, where Gemini achieved 83 percent accuracy, Grok 76.9 percent, and ChatGPT-5 66.7 percent. The category’s combined average was 72.1 percent, the highest of the seven tested categories. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.

Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok reached 76.7 percent, while the other three models scored below 50 percent.

The study also categorized the types of mistakes the models made. “Sloppy math” errors, including miscalculations and rounding issues, accounted for 68 percent of mistakes. Faulty-logic errors, reflecting incorrect formulas or assumptions, represented 26 percent. Misread instructions accounted for 5 percent, and in some cases a model simply refused to answer. Siuda noted that multi-step calculations with rounding were particularly prone to error.
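To see why rounding inside a multi-step calculation is risky, consider a small worked example. The figures below (a 250,000 balance at a 5 percent annual rate) are hypothetical and are not taken from the ORCA prompts; the sketch only shows how rounding an intermediate value too early can shift the final answer.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical inputs chosen for illustration only.
principal = Decimal("250000")
annual_rate = Decimal("0.05")

CENTS = Decimal("0.01")

# Step 1: derive the monthly rate from the annual rate.
monthly_rate = annual_rate / Decimal(12)

# Path A: round the intermediate monthly rate to four decimal places first.
monthly_rate_rounded = monthly_rate.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
interest_rounded_early = (principal * monthly_rate_rounded).quantize(
    CENTS, rounding=ROUND_HALF_UP
)

# Path B: keep full precision and round only the final result.
interest_rounded_late = (principal * monthly_rate).quantize(
    CENTS, rounding=ROUND_HALF_UP
)

gap = interest_rounded_early - interest_rounded_late

print("rounded early:", interest_rounded_early)
print("rounded late: ", interest_rounded_late)
print("difference:   ", gap)
```

In this example the early-rounded path gives 1050.00 in monthly interest, while rounding only at the end gives 1041.67, a gap of 8.33 from a single premature rounding step.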

The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.

All 500 prompts used in the study had one correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it remains unreliable for precise numerical work and users should remain cautious when relying on these tools.

European Governments Move to Cut Dependence on Palantir Amid Rising Security and Privacy Concerns

European governments are increasingly seeking to reduce their reliance on US data analytics firm Palantir, as political scrutiny grows over the company’s role in defence, policing and intelligence operations across the continent. Officials in several countries have raised concerns about digital sovereignty, privacy risks and the long-term dependence on foreign technology providers in sensitive state systems.

In the Netherlands, State Secretary for Defence Derk Boswijk told parliament this week that the country aims to have a “fully fledged alternative” to Palantir within two years. He said the company has been used in a “very limited, compartmentalised, and small scale” capacity since 2010, but added that the government is now pursuing a “two-track policy” to reduce dependency and ensure independent capability in data-driven defence systems.

Dutch lawmakers are also examining broader concerns about the company’s role in government infrastructure. One politician, Michelle Jagtenberg, questioned Palantir’s suitability for public contracts, citing allegations of “racist and anti-democratic ideology.” The discussion follows a parliamentary motion passed in 2025 calling for reduced reliance on the firm and greater use of European-developed alternatives.

Similar concerns are emerging across Europe. A UK parliamentary report described Palantir’s systems as creating an “unacceptable point of weakness” in national infrastructure, while Switzerland has reportedly rejected at least nine bids from the company due to security considerations. Denmark is also working to identify domestic alternatives to replace its existing systems.

Palantir’s platforms are widely used for analysing large datasets in defence and intelligence operations. Its Gotham software has been linked to military targeting systems, which the company describes as supporting an “AI-powered kill chain” for decision-making. Critics, however, argue that such tools raise serious ethical and transparency concerns, particularly when deployed in conflict zones or law enforcement.

The company’s leadership has also drawn controversy. Co-founder Peter Thiel and CEO Alex Karp have both faced criticism for remarks about the use of military technology. Karp has previously described Palantir’s software as a tool intended to “disrupt” and, when necessary, to be used in lethal operations. Those comments have intensified public debate over the company’s role in modern warfare.

Human rights organisations, including Amnesty International, have raised concerns about Palantir’s handling of sensitive data, including healthcare information processed through contracts with the UK’s National Health Service. Amnesty has warned about risks related to privacy, transparency and the scale of data access granted under government agreements.

Despite growing opposition, Palantir continues to hold contracts across Europe. The United Kingdom remains a major client, with defence agreements worth hundreds of millions of pounds. Germany, Spain and Denmark also continue to use the company’s systems in varying capacities, although several governments are now actively exploring alternatives from European technology providers.

As debates intensify over digital sovereignty and national security, European policymakers appear increasingly divided between maintaining existing systems and building independent data infrastructure free from reliance on US defence technology firms.

Microsoft Unveils In-House AI Models and Quantum Breakthrough as Tech Giant Moves to Reduce External Dependence

Microsoft has taken a major step toward reducing its reliance on external artificial intelligence partners, unveiling seven in-house AI models at its Build 2026 developer conference in San Francisco. The move signals a strategic shift as the company seeks greater control over its AI stack while its key investee firms prepare for high-profile public listings.

Satya Nadella, Microsoft’s chief executive, told attendees that the industry is entering a new phase in which companies must do more than simply consume frontier AI systems. “We believe the time has come for every company to move from consuming a frontier model to fully participating at the frontier,” he said.

At the centre of the announcement is MAI-Thinking-1, Microsoft’s first reasoning model built entirely from scratch using commercially licensed data and without distillation from external systems. The model includes 35 billion active parameters and a 256,000-token context window, designed for complex reasoning tasks, coding, and long-form instruction handling.

Microsoft also introduced MAI-Code-1-Flash, a coding-focused model integrated into GitHub Copilot and Visual Studio Code, aimed at converting natural language prompts into functional software code. The company said these tools will run on Azure infrastructure, allowing it to reduce costs currently paid to external model providers and potentially offer cheaper services to developers.

Mustafa Suleyman, chief executive of Microsoft AI, said internal testing suggested strong performance gains. He said that, after optimisation for consulting firm McKinsey, the new models outperformed OpenAI’s GPT-5.5 in quality while offering what Microsoft estimates as up to ten times better cost efficiency, based on scaled public pricing comparisons.

In independent evaluations conducted by Surge, Microsoft’s third-party rating partner, MAI-Thinking-1 was reportedly preferred over Anthropic’s Claude Sonnet 4.6, while matching Claude Opus 4.6 on coding benchmarks.

Alongside its AI announcements, Microsoft revealed progress in quantum computing. The company’s new Majorana 2 chip is said to be 1,000 times more stable than its predecessor, extending qubit lifespan from milliseconds to an average of 20 seconds. While still far from practical deployment, Microsoft believes this marks a meaningful step toward scalable quantum machines.

Zulfi Alam, corporate vice president of Microsoft Quantum, said the company aims to deliver a commercially useful quantum system by 2029, though current prototypes contain only 12 qubits, far short of the millions required for full-scale systems.

The announcements come as Microsoft’s AI partners move toward public markets. Anthropic has filed confidentially for an IPO following a major funding round valuing it at $965 billion, while OpenAI is also preparing a filing. Microsoft has invested heavily in both companies, committing billions of dollars while integrating their models into Azure.

The new direction suggests Microsoft is positioning itself to compete directly with its own partners, as the race for dominance in advanced AI and next-generation computing intensifies.

Estonia’s AI Education Model Draws Attention as Europe Debates Digital Learning

As European governments weigh how to integrate artificial intelligence into classrooms and allocate funding for digital literacy, Estonia’s approach to AI education is gaining attention as a practical and structured model.

The Baltic nation’s AI Leap programme is designed not only to teach students how to use artificial intelligence tools but also to strengthen critical thinking and teacher involvement at a time when AI is becoming deeply embedded in everyday learning.

Concerns have grown across Europe that while students are increasingly comfortable using AI tools, many struggle to evaluate or question the information these systems generate. Educators and employers warn that overreliance on chatbots and automated tools could weaken analytical thinking and increase vulnerability to misinformation.

Estonia has chosen to address this challenge directly rather than attempting to limit student exposure to AI.

According to the AI Leap programme, between 64% and 90% of Estonian students were already using AI tools before the initiative began. Programme organisers argued that ignoring this reality could undermine learning and reasoning skills.

The initiative aims to train 48,000 students and 6,700 teachers over two years in a country with a population of just 1.36 million.

The programme has two primary goals: helping teachers adapt to AI-assisted education and encouraging students to develop responsible, thoughtful AI habits.

To support this effort, Estonia has introduced several key measures. Teachers participate in study circles that meet monthly to develop teaching methods and exchange experiences. A central online platform provides educational resources, videos, self-assessment tools and discussion forums.

More than 4,000 teachers are also receiving premium access to advanced AI platforms such as ChatGPT and Gemini to support lesson planning and classroom preparation.

One of the programme’s most distinctive features is a Socratic-style chatbot designed to guide students rather than provide direct answers. The chatbot encourages questioning, self-management and contextual thinking, helping students assess AI-generated information instead of accepting it automatically.

The programme also includes debate leagues, creative arts projects and student-led initiatives aimed at encouraging discussion and experimentation with AI beyond formal classroom settings.

Estonia has placed strong emphasis on management and implementation. School principals oversee local delivery, while nine regional managers coordinate activities across seven educational regions. The initiative operates through a public-private partnership, with the government providing half of the funding and private partners contributing the remainder.

Technology companies, educators and researchers are involved in designing and testing tools tailored to Estonia’s education system.

Education analysts say Estonia’s strategy highlights a broader lesson for Europe: AI literacy may depend less on limiting technology and more on teaching students how to use it thoughtfully, critically and responsibly.
