AI has conquered technical and creative challenges remarkably well. It writes code, generates images, and summarizes data in ways we never could have imagined. But as these systems move beyond knowledge into domains requiring wisdom, we face a question that will define the future of this technology: how do we teach AI to have good judgment?
Consider citizens turning to AI for advice on how to vote in an upcoming election, or a teenager looking to AI for support during a mental health crisis. These moments can't be resolved with facts and data alone; they require judgment.
As we look ahead, the question isn't whether AI will shape our political discourse, guide our children, or influence our most important decisions. It’s whether we'll prepare it with the judgment these inevitable responsibilities demand.
Having worked on high-stakes AI systems across Facebook News, Instagram Youth Safety, and Meta's AI lab, I've seen how tricky it can be to train AI to have good judgment. Get this right, and AI could be a transformative force for good. Get it wrong, and we risk embedding mediocrity and harmful biases into systems that may shape humanity for generations.
But today's approach to training AI systems isn’t built for this.
Most AI development relies on "data labeling at scale": mobilizing thousands of contract workers to evaluate model outputs quickly. This works well for technical applications. But for subjective domains that require human judgment, we face a stark choice: do we want AI trained by anyone available for contract work, or by the people humanity trusts as leading experts in their fields?
Having seen both approaches first-hand, I can tell you that the difference here is not just theoretical.
A junior psychologist working a side gig will evaluate mental health guidance differently than a researcher who has spent decades studying adolescent development. A graduate student labeling political content for cash brings a different perspective than a journalist who has covered conflicts across continents. In technical domains, these differences may not matter. In matters of judgment, they mean everything.
To bring high-caliber experts into the AI training loop, we need a new approach. This is what we’re building at Forum AI.
First, we need selectivity over scale in our networks. Instead of massive networks of annotators, we need smaller, curated networks of peer-reviewed experts: academics, industry veterans, and thought leaders whose judgment is trusted not just by their colleagues but by society at large.
Second, we need systems designed to scale the judgment of these elite experts for repetitive tasks like annotation. These experts' time is limited and expensive. By working closely with leading experts and institutions like Stanford Human-Centered AI, we've developed "expert-in-the-loop" AI systems that mirror expert annotation with high precision at scale: automating the repetitive while preserving the nuanced. Over time, we'll have a large family of fine-tuned 'expert judgment' models at our disposal.
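To make that division of labor concrete, here is a minimal sketch of how an expert-in-the-loop router might work, assuming a fine-tuned judgment model that reports a confidence score. The names, threshold, and stub model below are illustrative assumptions, not Forum AI's production system.

```python
# Illustrative sketch (hypothetical names and thresholds): a fine-tuned
# "expert judgment" model labels routine items automatically and escalates
# uncertain ones to a human expert queue.
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; tuned against expert agreement in practice

@dataclass
class Annotation:
    item_id: str
    label: str
    source: str  # "model" or "expert"

def model_judgment(text: str) -> tuple[str, float]:
    """Stand-in for a fine-tuned judgment model; returns (label, confidence)."""
    # A real pipeline would call the fine-tuned model here.
    return ("acceptable", 0.75 if "crisis" in text.lower() else 0.95)

def route(item_id: str, text: str, expert_queue: list) -> Annotation | None:
    label, confidence = model_judgment(text)
    if confidence >= CONFIDENCE_THRESHOLD:
        return Annotation(item_id, label, source="model")  # automate the repetitive
    expert_queue.append((item_id, text))                    # preserve the nuanced
    return None

if __name__ == "__main__":
    queue: list = []
    print(route("a1", "General study tips for exams", queue))
    print(route("a2", "Reply to a teenager describing a mental health crisis", queue))
    print("Escalated to experts:", queue)
```

The design choice is the threshold: set it against measured agreement with the experts themselves, so the model only automates where it demonstrably matches their judgment.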
Third, experts need a seat at the table to inform critical decisions. By automating repetitive tasks like annotation, we free our experts' time for higher-leverage work like defining benchmarks and discussing recommendations. Experts shouldn't be labeling thousands of responses in back rooms; they should sit at the table where decisions are made, alongside technical leaders.
At Forum AI, we bring world-leading experts, including renowned academics, seasoned journalists, and industry leaders, into the AI development process alongside technical teams. We help produce benchmarks, human evaluations, LLM judges, and targeted training data that bring expert-level intelligence to AI systems when it matters most.
I believe that the future of AI will be defined not by the systems that can do the most things, but by those that can do the most important things well. This future demands we choose expertise over expedience.