Expert-in-the-Loop: Strategies for Scaling the World's Best Human Knowledge

As AI grows more sophisticated, we need more sophisticated data to improve it.

Already today, we’re seeing declining demand for ‘sweatshop data’ and an increased need for data from skilled experts. Code samples from senior engineers, expert-annotated medical cases showing diagnostic thinking, and curated creative writing samples are now more valuable than bulk scraped content.

As this trend continues, AI will demand ever higher levels of human expertise. This creates a fundamental bottleneck in which increasingly specialized knowledge, not computing power or raw data, becomes the scarcest resource constraining progress.

We’ve found that this is especially true for subjective domains, such as news or social issues, where careful, nuanced reasoning is required. 

Unlike more technical fields where expertise can be more easily validated, these areas require human judgment calls about bias, context, cultural sensitivity, and ethical implications—the kind of sophisticated reasoning that should only come from seasoned domain specialists who understand not just what to think, but how to think responsibly about complex, contested issues.

For example, we've found that annotating for bias requires a nuanced and constantly evolving understanding of the political landscape. Within the conservative movement alone, there are diverse opinions on many topics, and understanding how to label data for these varying perspectives requires deep expertise in both politics and the specific subject matter.

The Path Forward: Scaling Expertise

As AI demands ever more human expertise, we need to get very good at scaling experts. Today's approach of hiring thousands of annotators for millions of hours doesn't carry over, because highly skilled experts are scarce, expensive, and busy.

At Forum AI, we work with some of the world's most reputable experts to create data for the most challenging, sensitive topics. Throughout this work, we've identified several tactics for effectively scaling expert involvement.

Tactic 1 - Scenario Selection: Identifying the Optimal Scenarios for Generalization

We've found that AI can effectively scale expertise to adjacent scenarios when those scenarios are carefully chosen.

For example, in building our expert-in-the-loop annotation system, we created a hierarchy of news topics, from broad verticals like geopolitics → subverticals like global conflicts → specific narratives like the Russia-Ukraine conflict → individual stories. We also mapped out the different label types, such as bias, missing context, editorial significance, and source credibility.

This mapping of our domain allows us to systematically test where expert input generalizes well versus where it requires more specificity.

We discovered that expert input generalizes differently across levels. For example, bias judgments work well at the subvertical level, while editorial significance requires narrative-level specificity. This allows us to strategically optimize expert usage—we can gather bias labels more broadly at the subvertical level with fewer experts, but focus greater expert attention on editorial significance at more granular levels.

Over time, this structure enables surgical deployment of human expertise while leveraging AI generalization wherever possible.
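The level-dependent reuse of expert labels can be sketched in Python. This is a minimal illustration under stated assumptions, not Forum AI's actual system: `StoryPath`, `GENERALIZATION_LEVEL`, `label_key`, and `expert_tasks` are hypothetical names, and only the two generalization levels stated in this post (bias at the subvertical level, editorial significance at the narrative level) are filled in. Other label types fall back to story level, which is an assumption.

```python
from dataclasses import dataclass

# Levels of the news topic hierarchy, broadest first.
LEVELS = ("vertical", "subvertical", "narrative", "story")

# Level at which an expert label of each type is reused across stories.
# Only these two entries come from the post; unlisted label types default
# to story level (an assumption: no reuse without evidence it generalizes).
GENERALIZATION_LEVEL = {
    "bias": "subvertical",
    "editorial_significance": "narrative",
}


@dataclass(frozen=True)
class StoryPath:
    vertical: str
    subvertical: str
    narrative: str
    story: str

    def prefix(self, level: str) -> tuple:
        """Return the path truncated at `level`, inclusive."""
        parts = (self.vertical, self.subvertical, self.narrative, self.story)
        return parts[: LEVELS.index(level) + 1]


def label_key(path: StoryPath, label_type: str) -> tuple:
    """Hierarchy node whose expert label is shared by this story."""
    level = GENERALIZATION_LEVEL.get(label_type, "story")
    return path.prefix(level)


def expert_tasks(paths, label_type: str) -> set:
    """Distinct hierarchy nodes that need an expert label for this label type."""
    return {label_key(p, label_type) for p in paths}


stories = [
    StoryPath("geopolitics", "global conflicts", "Russia-Ukraine conflict", "story-1"),
    StoryPath("geopolitics", "global conflicts", "hypothetical narrative B", "story-2"),
]
print(len(expert_tasks(stories, "bias")))                    # 1
print(len(expert_tasks(stories, "editorial_significance")))  # 2
```

With two stories in the same subvertical but different narratives, bias needs one expert task while editorial significance needs two. That difference is the saving this tactic relies on.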

Tactic 2 - Expert Selection: Surgically Involving the Right Experts at the Right Time

The second tactic is focused on using the right experts for the right scenarios. 

Starting with our domain structures from above, we can map experts to different specialties based on where they uniquely demonstrate the greatest accuracy and knowledge. For instance, if one expert excels at annotating the accuracy of data covering global conflicts, we focus their efforts there rather than on areas where they're less effective.
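One simple way to sketch this accuracy-based mapping in Python assumes each expert label is later checked against an adjudicated reference. Under that assumption, you compute per-specialty accuracy and route work only to experts who clear a threshold over enough reviews. The post does not say how accuracy is measured, so the review-log format, the `min_accuracy` and `min_reviews` thresholds, and the "trade policy" subvertical below are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_specialty(reviews):
    """Map (expert, subvertical) -> (accuracy, number_of_reviews).

    `reviews` is a hypothetical log of (expert, subvertical, correct) tuples,
    where `correct` means the label matched an adjudicated reference.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expert, subvertical, correct in reviews:
        totals[(expert, subvertical)] += 1
        hits[(expert, subvertical)] += int(correct)
    return {key: (hits[key] / totals[key], totals[key]) for key in totals}


def route(subvertical, stats, min_accuracy=0.75, min_reviews=5):
    """Experts qualified for a subvertical, most accurate first (ties by name)."""
    qualified = [
        (acc, expert)
        for (expert, sv), (acc, n) in stats.items()
        if sv == subvertical and acc >= min_accuracy and n >= min_reviews
    ]
    qualified.sort(key=lambda pair: (-pair[0], pair[1]))
    return [expert for _, expert in qualified]


reviews = (
    [("expert_a", "global conflicts", True)] * 4
    + [("expert_a", "global conflicts", False)]
    + [("expert_b", "global conflicts", True)] * 2
    + [("expert_b", "global conflicts", False)] * 3
    + [("expert_b", "trade policy", True)] * 5
)
stats = accuracy_by_specialty(reviews)
print(route("global conflicts", stats))  # ['expert_a']
print(route("trade policy", stats))      # ['expert_b']
```

The `min_reviews` floor keeps a single lucky label from making someone look like a specialist. Sensible values would depend on how labels are adjudicated in practice.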

But surgical involvement goes further than basic mapping. We build detailed internal profiles of our experts including work history, published topics, life experiences, and affiliations. For example, when we're looking to gather insights from experts to generate net new content as part of our retrieval or training data offerings, we use LLMs to search through these profiles and identify where each expert can provide uniquely valuable input. 

Interestingly, this approach has let us use experts in more ways than we expected. For example, our detailed profiles revealed that economics experts could also analyze trade implications in geopolitical stories like the Trump/Putin meeting.

Tactic 3 - Codifying Rationale: Capturing How Experts Think, Not Just What They Say

In complex subjective domains, an expert's reasoning process is often more valuable than their final judgment. Rather than simply collecting annotations, we invest in systematically capturing and codifying how experts apply their judgment, enabling AI systems to replicate this sophisticated reasoning at scale.

For example, when assessing bias in news data, experts don't just render a verdict. They follow a structured thought process: evaluating source credibility and potential conflicts of interest, mapping the broader narrative context to identify what's been omitted, and decomposing content into distinct factual claims that can be assessed independently. Each of these steps represents a learnable reasoning pattern.

Capturing this level of nuance requires deep, iterative collaboration with experts. Through structured feedback loops, we document not just their conclusions but their decision frameworks: the heuristics, edge cases, and contextual factors that inform their judgment. This transforms expert knowledge from individual assessments into replicable cognitive processes that AI can learn and apply consistently across thousands of scenarios.
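A captured rationale might be stored as a structured record rather than a single label. The Python sketch below is hypothetical. Its fields mirror the three steps described earlier (source credibility and conflicts of interest, omitted context, independently assessed claims) plus the expert's heuristics. `missing_steps` is one assumed way a feedback loop could flag incomplete reasoning so it can be sent back to the expert.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceAssessment:
    outlet: str
    credibility_notes: str
    conflicts_of_interest: list[str] = field(default_factory=list)


@dataclass
class ClaimAssessment:
    claim: str
    verdict: str    # hypothetical scale, e.g. "supported" / "disputed" / "unsupported"
    reasoning: str


@dataclass
class BiasRationale:
    """One expert's structured reasoning behind a single bias judgment."""
    story_id: str
    source: SourceAssessment
    omitted_context: list[str]       # what the broader narrative shows is missing
    claims: list[ClaimAssessment]    # each factual claim judged independently
    heuristics: list[str]            # rules of thumb the expert reports applying
    verdict: str                     # final judgment, recorded after the steps above

    def missing_steps(self) -> list[str]:
        """Reasoning steps left empty. Assumption: experts write "none identified"
        explicitly, so an empty list means the step was skipped."""
        missing = []
        if not self.source.credibility_notes.strip():
            missing.append("source")
        if not self.omitted_context:
            missing.append("omitted_context")
        if not self.claims:
            missing.append("claims")
        return missing


rationale = BiasRationale(
    story_id="story-1",
    source=SourceAssessment("Example Outlet", "Placeholder credibility notes."),
    omitted_context=["none identified"],
    claims=[ClaimAssessment("Placeholder claim.", "supported", "Placeholder reasoning.")],
    heuristics=["Placeholder heuristic."],
    verdict="placeholder verdict",
)
print(rationale.missing_steps())                # []
print(json.dumps(asdict(rationale), indent=2))  # nested dict, serializable for storage
```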

The future of AI progress increasingly depends on our ability to capture and scale the world's best human expertise. As models become more sophisticated, the bottleneck shifts from raw computational power to accessing the nuanced judgment and specialized knowledge that only true experts possess.

The tactics we've outlined (strategic scenario selection, surgical expert involvement, and codified expert rationale) are likely just the beginning of what's possible. At Forum AI, we're continuing to refine these approaches as we work with leading experts across domains.

If you're building AI systems that need expert-level training data, or if you're interested in implementing these expert-scaling tactics in your own work, we'd love to hear from you. Reach out to discuss how we can help unlock the expertise your AI systems need to reach their full potential.
