Arena Blog

Latest

Agent Arena: Causal Evaluation of Agents in the Real World

Agents are increasingly doing real work, and the distribution of tasks they handle has expanded greatly. We want an agent evaluation that scales with usage and capability.

Empowering Users to Get More Done With Agent Mode

New Categories for Web Development in Code Arena

Multimodal Max

Arena Leaderboard Dataset

March 2026: Arena Updates across Product, Leaderboard Rankings & Research

Inside BullshitBench: AI Models and Nonsense Detection

Research

New Categories for Web Development in Code Arena

AI coding models are increasingly used to build web apps, but aggregated leaderboards obscure key performance differences. After analyzing 250k+ Code Arena prompts, we identified major front-end task categories and built new leaderboard views to compare model strengths and weaknesses.

Multimodal Max

Supporting Independent Research in AI Evaluation

Introducing Max

Studying the Frontier: Arena Expert

Arena's Ranking Method

Arena Expert and Occupational Categories

News

Supporting Independent Research in AI Evaluation

Arena’s Academic Partnerships Program provides funding and support for independent research advancing the scientific foundations of AI evaluation.

Introducing Max

LMArena is now Arena

Video Arena Is Live on Web

Fueling the World’s Most Trusted AI Evaluation Platform

Arena's Ranking Method

The Next Stage of AI Coding Evaluation Is Here
