Benchmark Model - Search News

Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks

Real environments can't inject edge cases on demand. Alibaba's Qwen-AgentWorld simulates them — and outperformed ...

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A team of Abacus.AI, New York University, ...

1mon

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI ...

Tech Times

OpenAI Life Science Benchmark Reveals AI Passes Only 1 in 3 Scientific Research Tasks

AI life science benchmark LifeSciBench, published June 17 by OpenAI with 173 PhD scientists, shows frontier models clear only ...

Business Wire

MLCommons and AVCC Release Automotive Benchmark Proof-of-Concept

SAN FRANCISCO--(BUSINESS WIRE)--MLCommons ® and the Autonomous Vehicle Computing Consortium (AVCC) have achieved the first step toward a comprehensive MLPerf ® Automotive Benchmark Suite for AI ...

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

11d

ACE ROBOTICS’ Kairos World Model Leads Multiple Global Embodied-Intelligence Benchmarks

SHANGHAI, CHINA - Media OutReach Newswire - 15 June 2026 - ACE ROBOTICS today announced that its open-source Kairos ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results