Large Language Models Benchmarks

Small Language Models Outperform Frontier AI On Cost, Speed And Accuracy

Bigger has defined AI from day one. New data says task-specific small models beat frontier LLMs on accuracy, cost and speed.

Geeky Gadgets

AI Benchmarks Are Broken: The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

Becker's Hospital Review

ChatGPT, Gemini, Claude outperform clinical AI tools: Study

ChatGPT, Gemini, and Claude beat clinical AI tools on medical benchmarks, outperforming OpenEvidence and UpToDate in accuracy and clinician alignment.

13d

AI has passed the test but not the exam: Why ‘Humanity’s Last Exam’ matters

There is a temptation, when AI systems begin to outperform human baselines on established tests, to interpret this as a sign ...

SiliconANGLE

Elon Musk’s xAI sets AI benchmark records with new reasoning-optimized Grok 4 model

Elon Musk’s xAI Holdings Corp. has debuted a new large language model, Grok 4, that’s optimized for reasoning tasks such as generating code. The LLM’s late Wednesday launch followed a turbulent week ...

Sakana Ai

Find Sakana Ai Latest News, Videos & Pictures on Sakana Ai and see latest updates, news, information from NDTV.COM. Explore more on Sakana Ai.

10d

Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

It allows engineering teams to host frontier-level AI on their own sovereign infrastructure, entirely eliminating vendor lock ...

20h

Fable 5 is Expected to Return Soon With New Enterprise Features

Fable 5 is expected to return with enterprise features, while OpenAI introduces its Jalapeno inference chip to solve 2026 ...

9d on MSN

China's Z.ai GLM-5.2 tops OpenAI’s GPT 5.5 model on key benchmarks

Chinese startup Z.ai has launched GLM-5.2, a powerful AI model for complex coding projects. This new large language model ...
