Tech
Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time
Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should treat their answers with caution. Researchers behind the Omni Research on Calculation in AI (ORCA) benchmark found that, when tested on 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.
The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.
Performance varied across categories. The models performed best in basic math and conversions, where Gemini achieved 83 percent accuracy, Grok 76.9 percent and ChatGPT-5 66.7 percent. Averaged across all five models, the category reached 72.1 percent, the highest of the seven tested categories. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.
Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok reached 76.7 percent, while the other three models scored below 50 percent.
The study also categorized the types of mistakes the models made. “Sloppy math” errors, such as miscalculations or rounding issues, accounted for 68 percent of mistakes. Faulty logic errors, reflecting incorrect formulas or assumptions, represented 26 percent. Misreading instructions accounted for 5 percent, and in a small share of cases the models simply refused to answer. Siuda noted that multi-step calculations involving rounding were particularly prone to error.
The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.
All 500 prompts used in the study had a single correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it remains unreliable for precise numerical work, and users should be cautious when relying on these tools.
Microsoft Unveils In-House AI Models and Quantum Breakthrough as Tech Giant Moves to Reduce External Dependence
Microsoft has taken a major step toward reducing its reliance on external artificial intelligence partners, unveiling seven in-house AI models at its Build 2026 developer conference in San Francisco. The move signals a strategic shift as the company seeks greater control over its AI stack while AI companies it has invested in prepare for high-profile public listings.
Satya Nadella, Microsoft’s chief executive, told attendees that the industry is entering a new phase in which companies must do more than simply consume frontier AI systems. “We believe the time has come for every company to move from consuming a frontier model to fully participating at the frontier,” he said.
At the centre of the announcement is MAI-Thinking-1, Microsoft’s first reasoning model built entirely from scratch using commercially licensed data and without distillation from external systems. The model has 35 billion active parameters and a 256,000-token context window and is designed for complex reasoning, coding, and long-form instruction handling.
Microsoft also introduced MAI-Code-1-Flash, a coding-focused model integrated into GitHub Copilot and Visual Studio Code, aimed at converting natural language prompts into functional software code. The company said these tools will run on Azure infrastructure, allowing it to reduce costs currently paid to external model providers and potentially offer cheaper services to developers.
Mustafa Suleyman, chief executive of Microsoft AI, said internal testing suggested strong performance gains. He said that, after being optimised for consulting firm McKinsey, the new models outperformed OpenAI’s GPT-5.5 in quality while offering what Microsoft estimates is up to ten times better cost efficiency, based on scaled public pricing comparisons.
In independent evaluations conducted by Surge, Microsoft’s third-party rating partner, MAI-Thinking-1 was reportedly preferred over Anthropic’s Claude Sonnet 4.6, while matching Claude Opus 4.6 on coding benchmarks.
Alongside its AI announcements, Microsoft revealed progress in quantum computing. The company’s new Majorana 2 chip is said to be 1,000 times more stable than its predecessor, extending qubit lifespan from milliseconds to an average of 20 seconds. While still far from practical deployment, Microsoft believes this marks a meaningful step toward scalable quantum machines.
Zulfi Alam, corporate vice president of Microsoft Quantum, said the company aims to deliver a commercially useful quantum system by 2029, though current prototypes contain only 12 qubits, far short of the millions required for full-scale systems.
The announcements come as Microsoft’s AI partners move toward public markets. Anthropic has filed confidentially for an IPO following a major funding round valuing it at $965 billion, while OpenAI is also preparing a filing. Microsoft has invested heavily in both companies, committing billions of dollars while integrating their models into Azure.
The new direction suggests Microsoft is positioning itself to compete directly with its own partners, as the race for dominance in advanced AI and next-generation computing intensifies.
Estonia’s AI Education Model Draws Attention as Europe Debates Digital Learning
As European governments weigh how to integrate artificial intelligence into classrooms and allocate funding for digital literacy, Estonia’s approach to AI education is gaining attention as a practical and structured model.
The Baltic nation’s AI Leap programme is designed not only to teach students how to use artificial intelligence tools but also to strengthen critical thinking and teacher involvement at a time when AI is becoming deeply embedded in everyday learning.
Concerns have grown across Europe that while students are increasingly comfortable using AI tools, many struggle to evaluate or question the information these systems generate. Educators and employers have warned that overreliance on chatbots and automated tools could weaken analytical thinking and increase vulnerability to misinformation.
Estonia has chosen to address this challenge directly rather than attempting to limit student exposure to AI.
According to the AI Leap programme, between 64% and 90% of Estonian students were already using AI tools before the initiative began. Programme organisers argued that ignoring this reality could undermine learning and reasoning skills.
The initiative aims to train 48,000 students and 6,700 teachers over two years in a country with a population of just 1.36 million.
The programme has two primary goals: helping teachers adapt to AI-assisted education and encouraging students to develop responsible, thoughtful AI habits.
To support this effort, Estonia has introduced several key measures. Teachers participate in study circles that meet monthly to develop teaching methods and exchange experiences. A central online platform provides educational resources, videos, self-assessment tools and discussion forums.
More than 4,000 teachers are also receiving premium access to advanced AI platforms such as ChatGPT and Gemini to support lesson planning and classroom preparation.
One of the programme’s most distinctive features is a Socratic-style chatbot designed to guide students rather than provide direct answers. The chatbot encourages questioning, self-management and contextual thinking, helping students assess AI-generated information instead of accepting it automatically.
The programme also includes debate leagues, creative arts projects and student-led initiatives aimed at encouraging discussion and experimentation with AI beyond formal classroom settings.
Estonia has placed strong emphasis on management and implementation. School principals oversee local delivery, while nine regional managers coordinate activities across seven educational regions. The initiative operates through a public-private partnership, with the government providing half of the funding and private partners contributing the remainder.
Technology companies, educators and researchers are involved in designing and testing tools tailored to Estonia’s education system.
Education analysts say Estonia’s strategy highlights a broader lesson for Europe: AI literacy may depend less on limiting technology and more on teaching students how to use it thoughtfully, critically and responsibly.