Forum AI — NewsBench

Our methodology

Forum AI convenes leading experts to evaluate AI on the topics that matter.

Experts identify the test cases that matter most: the prompts where issues are likeliest to surface. Our judges are then calibrated to high agreement with expert consensus before any model is scored.
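One way to picture the calibration step is as an agreement check between a judge's labels and the experts' majority label on the same test cases. The sketch below is only an illustration under stated assumptions: the majority-vote consensus rule, the "pass"/"fail" labels, the function names, and the 0.9 threshold are all hypothetical and are not taken from Forum AI's published method.

```python
from collections import Counter
from typing import List


def expert_consensus(votes: List[str]) -> str:
    """Return the most common expert label (assumed consensus rule).

    Ties resolve to the label seen first; a real pipeline would need an
    explicit tie-breaking policy.
    """
    return Counter(votes).most_common(1)[0][0]


def judge_agreement(judge_labels: List[str], expert_votes: List[List[str]]) -> float:
    """Share of test cases where the judge matches the expert consensus."""
    consensus = [expert_consensus(votes) for votes in expert_votes]
    matches = sum(j == c for j, c in zip(judge_labels, consensus))
    return matches / len(consensus)


# Hypothetical data: four test cases, three expert votes each.
expert_votes = [
    ["pass", "pass", "fail"],
    ["fail", "fail", "fail"],
    ["pass", "fail", "pass"],
    ["fail", "fail", "pass"],
]
judge_labels = ["pass", "fail", "fail", "fail"]

AGREEMENT_THRESHOLD = 0.9  # hypothetical bar for "high agreement"

agreement = judge_agreement(judge_labels, expert_votes)
is_calibrated = agreement >= AGREEMENT_THRESHOLD
```

In this made-up example the judge matches the consensus on three of four cases, so it would fall below the assumed threshold and would need further calibration before scoring any model.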

Fareed Zakaria

CNN Host & Author

Tony Blinken

Former Sec. of State

Kevin McCarthy

Former Speaker of the House

Niall Ferguson

Historian, Hoover Institution

Scott Jennings

Political Commentator

Jackie Reses

CEO, Lead Bank

Anne Neuberger

Former Deputy NSA

Sebastian Mallaby

Author, Senior Fellow CFR

Bethany McLean

Journalist & Author

Reihan Salam

President, Manhattan Institute

Ian Bremmer

Founder, Eurasia Group

Sebastian Kurz

Former Chancellor of Austria

Subscribe to our newsletter to stay on top of the latest from Forum AI.

v1.0 · Updated May 2026

Neutrality Leaderboard

Do AI systems present all sides of the story?

Political and social debates rarely have a single correct answer, yet AI systems are increasingly asked to navigate them. We evaluate whether models present relevant perspectives fairly, without favoring one side, using loaded language, or embedding assumptions in the framing.

Overall Neutrality score

Ideological lean

When models fail Neutrality, they often lean left or right politically. We assess whether those non-neutral responses use language, framing, or conclusions that align with U.S. left-leaning views, U.S. right-leaning views, or other ideological perspectives.
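The lean assessment can be read as a breakdown over only the responses that failed Neutrality. The sketch below shows that arithmetic under assumptions: the label names ("left", "right", "other"), the use of None for neutral responses, and the function name are hypothetical, not Forum AI's actual schema.

```python
from collections import Counter
from typing import Dict, List, Optional

LEANS = ("left", "right", "other")  # assumed label set


def lean_breakdown(labels: List[Optional[str]]) -> Dict[str, float]:
    """Share of non-neutral responses falling under each lean.

    `labels` holds one entry per response: None when the response was
    judged neutral, otherwise one of LEANS.
    """
    non_neutral = [label for label in labels if label is not None]
    if not non_neutral:
        return {lean: 0.0 for lean in LEANS}
    counts = Counter(non_neutral)
    return {lean: counts[lean] / len(non_neutral) for lean in LEANS}


# Hypothetical data: six responses, two of them neutral.
sample_labels = [None, "left", "left", "right", None, "other"]
breakdown = lean_breakdown(sample_labels)
```

Neutral responses are excluded from the denominator here, so the shares describe how failures lean rather than how often a model fails.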

Major findings

v1.0 · Updated May 2026

Source Quality Leaderboard

Are AI systems using reliable sources?

The credibility of an AI model's answer is only as good as the sources it draws from. We evaluate whether models rely on quality information like primary sources, peer-reviewed research, and reputable journalism. We also flag paid content and government-controlled media.

Average Source Quality score

Source tier breakdown

Major findings

v1.0 · Updated May 2026

Accuracy Leaderboard

Are AI systems covering the news accurately?

Factual errors in news contexts can mislead voters, spread misinformation, and undermine trust. We evaluate how accurately models represent verifiable claims, whether they hallucinate information, and how well they distinguish established facts from contested assertions.

Overall Accuracy score

Claim Accuracy Breakdown

Responses with at least one false claim

Share of model responses that contained one or more false claims — the breadth of factual errors across answers.

False-claim rate

Share of individual claims (across all responses) that were false — the density of factual errors within answers.
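The two measures above answer different questions, and a small worked example makes the difference concrete. This is a minimal sketch assuming each response has already been split into claims and each claim marked true or false; the data layout and function name are hypothetical.

```python
from typing import Dict, List


def accuracy_breakdown(responses: List[List[bool]]) -> Dict[str, float]:
    """Compute both error measures from per-claim verdicts.

    Each inner list is one response; each entry is True when that claim
    was judged false.
    """
    total_responses = len(responses)
    total_claims = sum(len(claims) for claims in responses)
    # Breadth: how many responses contain at least one false claim.
    responses_with_false = sum(1 for claims in responses if any(claims))
    # Density: how many individual claims are false, pooled across responses.
    false_claims = sum(sum(claims) for claims in responses)
    return {
        "responses_with_false_claim": (
            responses_with_false / total_responses if total_responses else 0.0
        ),
        "false_claim_rate": false_claims / total_claims if total_claims else 0.0,
    }


# Hypothetical data: four responses containing ten claims in total.
sample = [
    [False, False, True],         # one false claim out of three
    [False, False],               # no false claims
    [True, True, False, False],   # two false claims out of four
    [False],                      # no false claims
]
result = accuracy_breakdown(sample)
```

Here half of the responses contain a false claim, while three of the ten pooled claims are false, which shows how a model can look different on breadth and on density.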

Major findings

v1.0 · Updated May 2026

Judge Health

This page is for internal use only and is not visible to the public.

Full Benchmark

By Criteria

Neutrality

Factuality

Source Quality

By Topic

Bias Assessment

Lean Detection Rate