Tech

Study Finds AI Models Get Basic Math Wrong Around 40 Percent of the Time

Artificial intelligence (AI) tools are increasingly used for everyday calculations, but a new study suggests users should approach their answers with caution. Researchers behind the Omni Research on Calculation in AI (ORCA) benchmark found that, when tested on 500 real-world math prompts, AI models had roughly a 40 percent chance of producing an incorrect result.

The study evaluated five widely used AI systems in October 2025: ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), Claude 4.5 Sonnet (Anthropic), DeepSeek V3.2 (DeepSeek AI), and Grok-4 (xAI). None of the models scored above 63 percent overall, with Gemini leading at 63 percent, Grok close behind at 62.8 percent, and DeepSeek at 52 percent. ChatGPT-5 scored 49.4 percent, while Claude trailed at 45.2 percent. The average accuracy across all five models was 54.5 percent.
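The reported average can be checked directly from the five overall scores quoted above. The following is a minimal Python sketch; the variable names are illustrative and not part of the study.

```python
# Overall accuracy (percent) per model, as quoted in the article.
overall_accuracy = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
    "Claude 4.5 Sonnet": 45.2,
}

# Unweighted mean across the five models.
average = sum(overall_accuracy.values()) / len(overall_accuracy)

print(f"Average accuracy: {average:.1f}%")  # prints "Average accuracy: 54.5%"
```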

“Although the exact rankings might shift if we repeated the benchmark today, the broader conclusion would likely remain the same: numerical reliability remains a weak spot across current AI models,” said Dawid Siuda, co-author of the ORCA Benchmark.

Performance varied across categories. The models performed best in basic math and conversions, where Gemini achieved 83 percent accuracy, Grok 76.9 percent, and ChatGPT-5 66.7 percent. The combined average for this category was 72.1 percent, the highest of the seven categories tested. Physics proved the most challenging, with overall accuracy dropping to 35.8 percent. Grok led this category at 43.8 percent, while Claude scored just 26.6 percent.

Some AI systems struggled more than others in specific fields. DeepSeek recorded only 10.6 percent accuracy in biology and chemistry, meaning it failed nearly nine out of ten questions. In finance and economics, Gemini and Grok reached 76.7 percent, while the other three models scored below 50 percent.

The study also categorized the types of mistakes the models make. “Sloppy math” errors, such as miscalculations or rounding issues, accounted for 68 percent of mistakes. Faulty logic, meaning incorrect formulas or assumptions, represented 26 percent. Misread instructions accounted for 5 percent, while some models simply refused to answer. Siuda noted that multi-step calculations with rounding were particularly prone to error.
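To illustrate why rounding in multi-step calculations is error-prone, the Python sketch below compares rounding at an intermediate step with rounding only once at the end. The unit price and quantity are hypothetical values chosen for illustration; they do not come from the ORCA prompts.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical inputs, not taken from the study.
unit_price = Decimal("2.675")
quantity = 3

# Approach 1: round the unit price to cents first, then multiply.
rounded_early = unit_price.quantize(CENT, rounding=ROUND_HALF_UP) * quantity

# Approach 2: multiply at full precision, then round once at the end.
rounded_late = (unit_price * quantity).quantize(CENT, rounding=ROUND_HALF_UP)

print(rounded_early)  # prints 8.04
print(rounded_late)   # prints 8.03
```

The two results differ by one cent even though each step is individually valid, which is the kind of discrepancy a calculator check would catch.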

The research highlights the importance of verifying AI-generated calculations. “If the task is critical, use calculators or proven sources, or at least double-check with another AI,” Siuda advised.

All 500 prompts used in the study had one correct answer and were designed to reflect everyday math tasks, including statistics, finance, physics, and basic arithmetic. The findings indicate that while AI can assist with calculations, it is not yet reliable for precise numerical work, and users should verify results before relying on them.

Meta Launches Muse Spark, Its First Major AI Model in Nine Months

Meta has unveiled its first major AI model in nine months, following a $14.3 billion (€12.24 billion) investment spree and executive hiring push to rival OpenAI and Google. The American tech company introduced the model, called Muse Spark, on Wednesday, claiming it is faster and smarter than its previous technologies.

The company, founded by Mark Zuckerberg, invested $14.3 billion in Scale AI in June 2025 and recruited its CEO and co-founder, Alexandr Wang, to oversee Meta Superintelligence Labs, which houses teams working on foundational AI models. Zuckerberg also embarked on a hiring campaign, bringing in executives from competitors including OpenAI, Anthropic, and Google.

In a blog post, Meta said, “Over the last nine months, Meta Superintelligence Labs rebuilt our AI stack from the ground up, moving faster than any development cycle we have run before. This initial model is small and fast by design, yet capable enough to reason through complex questions in science, math, and health. It is a powerful foundation, and the next generation is already in development.”

Muse Spark is positioned as a significant upgrade over Meta’s last major release, Llama 4, launched in April 2025. The company highlighted that the model excels in advanced reasoning, particularly in scientific, mathematical, and medical queries. To improve its health advice capabilities, Meta worked with over 1,000 physicians to curate training data, aiming for more accurate and comprehensive responses.

The AI model will power the company’s digital assistant in the Meta AI app and website, with planned integration across Facebook, Instagram, WhatsApp, Messenger, and the Ray-Ban Meta AI glasses. A “contemplating mode” will gradually roll out, allowing multiple AI agents to reason in parallel on complex tasks. Meta’s technical blog noted this feature is designed to compete with high-level reasoning in models such as Gemini Deep Think and GPT Pro.

Zuckerberg emphasized on social media that Meta aims to build AI products that “don’t just answer your questions but act as agents that do things for you.” Unlike conventional chatbots, these AI agents operate autonomously, gathering information based on user preferences to assist without direct human commands.

One notable shift for Meta is the move away from open-source AI models. Unlike earlier releases, Muse Spark is not available for public download, meaning access to the technology is currently restricted. The company said the model is initially available only in the United States.

Muse Spark underscores Meta’s aggressive push into the competitive AI market, combining extensive investment, executive recruitment, and technical innovation to challenge the dominance of established players like OpenAI and Google.

OpenAI Urges Governments to Rethink Economy as AI Growth Accelerates

OpenAI has called on governments to rethink the foundations of the economy, warning that artificial intelligence (AI) could soon surpass human intelligence and drastically change how people work, live, and pay taxes. The company outlined its initial policy ideas on Monday, aimed at mitigating the economic disruption caused by rapid AI adoption in the United States and worldwide.

One key proposal is the creation of a public wealth fund that would give citizens a direct stake in AI-driven economic growth. According to the policy document, the fund could invest in diversified, long-term assets, including AI companies and broader firms adopting AI technologies, with returns distributed to all citizens.

The company also suggested that governments encourage businesses to launch four-day workweek pilot programs without any reduction in pay. This approach aims to balance the productivity gains provided by AI with the well-being of workers. Lawmakers are also urged to modernize tax systems by increasing taxation on corporate income and capital gains instead of labor income, which could be affected by AI-related job losses. The report proposes additional measures, such as taxing companies that replace human labor with automation.

OpenAI recommends that social benefits, including retirement pensions and healthcare, be provided through portable accounts that follow individuals across different jobs, industries, and entrepreneurial ventures. This model would help ensure continuity of support in a labor market increasingly influenced by AI.

These recommendations echo broader discussions among AI leaders about the future of work. OpenAI CEO Sam Altman and xAI’s Elon Musk have previously highlighted universal basic income as a potential necessity as traditional employment declines. Other tech leaders, including Nvidia’s Jensen Huang and Zoom’s Eric Yuan, have advocated shorter workweeks to distribute productivity gains from AI more evenly.

Concerns about AI’s long-term impact extend beyond economics. In January, Anthropic CEO Dario Amodei warned that superintelligent AI, capable of outpacing human decision-making, poses “existential danger.” He suggested tighter controls on the export of key technologies, such as semiconductor chips used to train large language models, as one way to manage the risk. Amodei also called for transparency laws requiring AI companies to disclose how they guide their models’ behavior.

OpenAI’s policy document represents an early step in urging governments to address the structural changes AI may bring. The proposals highlight the need to rethink traditional concepts of work, taxation, and social support as the technology continues to advance rapidly.

As AI continues to reshape global economies, policymakers and industry leaders face increasing pressure to develop strategies that protect citizens while fostering innovation and sustainable growth.

Uzbekistan to Produce Humanoid Robots in Partnership with South Korea

Uzbekistan has signed an agreement with South Korea’s ROBOTIS to launch humanoid robot production, marking a major step in its high-tech ambitions. At the same time, students across the country are learning robotics and programming, gaining skills that could prepare them for careers in the emerging industry.

The agreement, signed between the UzElTechSanoat Association and ROBOTIS, sets out plans to establish humanoid robot production within Uzbekistan, develop manufacturing infrastructure, and train specialists for the growing robotics sector. ROBOTIS, known for its humanoid platforms and smart robotic actuators, will support the creation of technological foundations and help prepare a workforce capable of designing and operating advanced robotic systems.

The initiative forms part of Uzbekistan’s broader push to build a domestic innovation ecosystem, combining industrial cooperation with education. Early exposure to robotics and programming is at the heart of this strategy.

In a robotics classroom, 12-year-old Mirkomil Shodiev demonstrates the impact of these programs. Using an EVO-3 educational robotics kit, he assembles and programs his own robot, controlling its movements through lines of code. “This was created by me,” he says. “You connect it to a computer, write code, and it performs tasks using the motor.”

Mirkomil began IT classes four months ago, starting with Scratch and now studying Python, a programming language widely used in web development, automation, and robotics. He hopes to build websites and earn money in the future, reflecting the growing importance of digital skills in Uzbekistan’s economy.

The government’s Digital Uzbekistan-2030 strategy is expanding nationwide training in programming and digital skills. IT education centres and specialised academies are growing to meet rising demand for technology careers. At the Robot Academy, where Mirkomil studies, students aged eight to fifteen gain hands-on experience in programming, robotics, and engineering. “Our students create scientific projects, develop games, and build Telegram bots,” says teacher Navruz Shaydullayev. “Programming helps develop their thinking, logic, and intellectual abilities.”

Classroom projects emphasize translating digital commands into physical movement, a key principle behind robotics and industrial automation. Students learn to design, assemble, and control machines independently, building skills that can directly feed into the country’s industrial ambitions.

The partnership with ROBOTIS will extend these educational initiatives into the workforce, providing training for engineers, programmers, and technicians in humanoid robotics. Officials hope the program will strengthen Uzbekistan’s technological competitiveness and create highly skilled jobs in a fast-growing global sector.

For students like Mirkomil, the future is already taking shape. “In the future, I want to continue in this field,” he says. “After finishing the courses, I would like to study in Tashkent as well.” As Uzbekistan prepares to manufacture humanoid robots, classrooms across the country are quietly training the people who may one day build them.
