How the AI Review Works

When It Triggers
The moment a user submits a flag on a news article, the fact-check-ai bot is called and runs in the background — the user sees a spinner while it processes.

Step 1 — Scope Enforcement
The very first thing the function does is check content_type. If it's not news_article or article, the flag is immediately rejected with the message "Fact-checking only applies to news articles." This is enforced at both the Edge Function level AND the DB level via the submit_fact_check_flag RPC.
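
A minimal sketch of the Edge Function side of this check, for illustration only. The two allowed content types and the rejection message come from the description above; the function name and the result shape are assumptions, and the DB-level check inside submit_fact_check_flag is not shown.

```typescript
// Content types eligible for fact-checking, as named in the text above.
const FACT_CHECKABLE_TYPES: readonly string[] = ["news_article", "article"];

// Hypothetical result shape; the real function's return type is not documented here.
type ScopeResult =
  | { allowed: true }
  | { allowed: false; message: string };

// Reject anything that is not a news article before any other work is done.
function enforceScope(contentType: string): ScopeResult {
  const inScope = FACT_CHECKABLE_TYPES.some((t) => t === contentType);
  if (!inScope) {
    return { allowed: false, message: "Fact-checking only applies to news articles." };
  }
  return { allowed: true };
}
```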

Step 2 — Context Assembly
Before calling Claude, the function assembles a rich context package:
| Data | Source | Purpose |
| --- | --- | --- |
| Full article text | news_articles.raw_content (up to 5000 chars) | Core content for Claude to analyze |
| Article headline, summary, category, sentiment | news_articles | Supporting context |
| Source name + trust score | news_sources | How credible is the outlet publishing this? |
| Coin prices at ingestion | price_snapshots | Verify price-related claims |
| Flagger's ViewCred tier + score | user_viewcred_scores | Oracle flaggers get more weight than New users |
| Prior flags on same article | fact_check_flags | Has this been reviewed before? Was it cleared? |
| 3 recent verdicts (confirmed/cleared) | fact_check_flags | Few-shot examples to calibrate Claude's judgment |
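
One way the assembled package might be flattened into prompt text is sketched below. The 5000-character cap and the limit of 3 recent verdicts come from the description above; the input field names, the function name, and the output layout are all hypothetical, and the database queries that would fill the input are omitted.

```typescript
// Hypothetical input shape: field names are illustrative, not the real schema.
interface ReviewContextInput {
  rawContent: string;                       // from news_articles.raw_content
  headline: string;
  summary: string;
  category: string;
  sentiment: string;
  sourceName: string;                       // from news_sources
  sourceTrustScore: number;
  coinPrices: { [symbol: string]: number }; // from price_snapshots at ingestion
  flaggerTier: string;                      // from user_viewcred_scores
  flaggerScore: number;
  priorFlags: string[];                     // short summaries from fact_check_flags
  recentVerdicts: string[];                 // few-shot examples from fact_check_flags
}

const MAX_ARTICLE_CHARS = 5000;  // cap stated in the description above
const MAX_RECENT_VERDICTS = 3;   // number of few-shot verdicts stated above

// Flatten the gathered data into one text block for the prompt.
function buildReviewContext(input: ReviewContextInput): string {
  const prices = Object.keys(input.coinPrices)
    .map((symbol) => `${symbol}=${input.coinPrices[symbol]}`)
    .join(", ");
  const verdicts = input.recentVerdicts.slice(0, MAX_RECENT_VERDICTS).join("; ");
  const lines = [
    `Headline: ${input.headline}`,
    `Summary: ${input.summary}`,
    `Category: ${input.category} | Sentiment: ${input.sentiment}`,
    `Source: ${input.sourceName} (trust score ${input.sourceTrustScore})`,
    `Prices at ingestion: ${prices || "none"}`,
    `Flagger: ${input.flaggerTier} tier, ViewCred score ${input.flaggerScore}`,
    `Prior flags: ${input.priorFlags.join("; ") || "none"}`,
    `Recent verdicts: ${verdicts || "none"}`,
    "Article text:",
    input.rawContent.slice(0, MAX_ARTICLE_CHARS), // truncate long articles
  ];
  return lines.join("\n");
}
```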

Step 3 — Category-Specific Prompt
Claude gets a tailored focus instruction based on the flag category:

Misleading headline → Compare headline vs article body for exaggeration
Out of context → Check for missing timeframes or cherry-picked data
Price manipulation → Look for pump/dump language patterns
Misinformation → Identify the specific claim and verify it
Manipulated data → Check for implausible statistics
Unverified claim → Look for speculation presented as fact
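
The mapping above lends itself to a simple lookup table. In the sketch below the instruction strings are taken from the list, while the snake_case category keys and the function name are assumptions; the identifiers actually stored with each flag may differ.

```typescript
// Hypothetical category identifiers; the real stored values are not shown in this document.
type FlagCategory =
  | "misleading_headline"
  | "out_of_context"
  | "price_manipulation"
  | "misinformation"
  | "manipulated_data"
  | "unverified_claim";

// One focus instruction per category, copied from the list above.
const CATEGORY_FOCUS: Record<FlagCategory, string> = {
  misleading_headline: "Compare headline vs article body for exaggeration",
  out_of_context: "Check for missing timeframes or cherry-picked data",
  price_manipulation: "Look for pump/dump language patterns",
  misinformation: "Identify the specific claim and verify it",
  manipulated_data: "Check for implausible statistics",
  unverified_claim: "Look for speculation presented as fact",
};

// Return the tailored focus line to include in the prompt.
function focusInstruction(category: FlagCategory): string {
  return CATEGORY_FOCUS[category];
}
```

Typing the table as Record<FlagCategory, string> means the compiler reports any category that is added to the union without a matching instruction.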

Step 4 — Claude Sonnet with Web Search
The function calls Claude Sonnet (upgraded from Haiku) with the web search tool enabled. This means Claude can actually go look things up in real time — if an article claims "Bitcoin hit $200K today", Claude can search to verify rather than just reasoning from training data.
Claude is instructed to reason through 6 explicit steps before deciding:

1. What specific factual claim is being challenged?
2. Is this claim verifiable (vs opinion, prediction, satire)?
3. Does the flagger's evidence support their claim?
4. What does the source's credibility score tell us?
5. What red flags exist in the language, framing, or missing context?
6. Confidence score (0.0–1.0)

Step 5 — Confidence-Based Routing
Claude returns a structured JSON verdict. The confidence score determines what happens next:
| Confidence | Decision | Status |
| --- | --- | --- |
| ≥ 0.80 | Pass (high confidence) | ai_passed_high → board review |
| 0.40–0.79 | Pass (low confidence) | ai_passed_low → board review with caution note |
| < 0.40 | Reject | ai_rejected → flag dismissed |
Low-confidence passes go to the board with a warning like: [LOW CONFIDENCE 61% — review carefully] so board members know to scrutinize them more closely.
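
The routing rule can be sketched as a small pure function. The thresholds and status names come from the description above; the function name, the return shape, and the range check are assumptions. A score between 0.79 and 0.80 is treated here as a low-confidence pass, which the source does not specify. The wording of the caution note is left to the caller, which only receives the rounded percentage.

```typescript
type AiStatus = "ai_passed_high" | "ai_passed_low" | "ai_rejected";

// Hypothetical return shape for illustration.
interface RoutingDecision {
  status: AiStatus;
  sendToBoard: boolean;
  cautionPercent: number | null; // set only for low-confidence passes
}

function routeByConfidence(confidence: number): RoutingDecision {
  // Also rejects NaN, since every comparison with NaN is false.
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= 0.8) {
    return { status: "ai_passed_high", sendToBoard: true, cautionPercent: null };
  }
  if (confidence >= 0.4) {
    // Low-confidence pass: the board still reviews it, with a caution note.
    return {
      status: "ai_passed_low",
      sendToBoard: true,
      cautionPercent: Math.round(confidence * 100),
    };
  }
  return { status: "ai_rejected", sendToBoard: false, cautionPercent: null };
}
```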

Step 6 — Graceful Failure
If the Anthropic API fails (expired credits, timeout, etc.), the function doesn't block — it marks the flag as pending_board with a note saying "AI review failed — escalated to board" so no flag gets silently lost.
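
A sketch of this fallback pattern, assuming the AI call is wrapped in a single async function. The pending_board status comes from the description above; the wrapper name, the outcome shape, and the runAiReview and failureNote parameters are hypothetical.

```typescript
// Hypothetical outcome shape for illustration.
interface ReviewOutcome {
  status: string;
  note: string | null;
}

// Run the AI review, but never let a failure drop the flag:
// on any error, escalate straight to the board with an explanatory note.
async function escalateOnFailure(
  runAiReview: () => Promise<ReviewOutcome>,
  failureNote: string,
): Promise<ReviewOutcome> {
  try {
    return await runAiReview();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    console.error(`AI review failed: ${reason}`); // keep the cause for debugging
    return { status: "pending_board", note: failureNote };
  }
}
```

A caller would pass the function that performs the Claude request as runAiReview and the escalation message quoted above as failureNote, then persist whatever outcome comes back.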

What Happens Next

Rejected flags → Flagger is notified, 50 pts forfeited
Passed flags → Advance to Stage 2 (Board Review) where Oracle-tier board members see the full AI reasoning trail and decide whether to approve or dismiss

The AI never makes the final call — it's a filter and signal for humans, carrying 30% weight in the overall verdict.