U.S. House Oversight Epstein Estate Documents

AI-ranked analysis of 20,000+ pages released by the House Oversight Committee. These are the estate documents already made public—not the unreleased files still pending disclosure.

Complete: All 25,781 documents have been processed and indexed.


How the Importance Score Works

Each document is analyzed locally via LM Studio using OpenAI's open-weight GPT-OSS-120B model, running entirely offline on commodity hardware. The model scores each passage from 0 (no meaningful lead) to 100 (critical revelation).
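LM Studio exposes an OpenAI-compatible API on localhost, so each passage can be scored with a plain chat-completion request. A minimal sketch of building that request follows; the endpoint URL, model identifier, and temperature are assumptions about a typical local setup, not details confirmed by this project.

```python
import json

# LM Studio's local server speaks the OpenAI chat-completions protocol;
# http://localhost:1234/v1 is its default address (adjust if configured differently).
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

# Abridged stand-in for the full scoring prompt reproduced below.
SYSTEM_PROMPT = "You analyze primary documents related to court and investigative filings."

def build_payload(passage: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Build one chat-completion request for a document passage.

    The model name is whatever identifier LM Studio assigns to the loaded
    GPT-OSS-120B weights; it may differ on your machine.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": passage},
        ],
        # A low temperature keeps the strict-JSON output stable across runs.
        "temperature": 0.1,
    }

# Sending the payload (requires a running LM Studio server):
#   import urllib.request
#   req = urllib.request.Request(
#       LMSTUDIO_URL,
#       data=json.dumps(build_payload(text)).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   result = json.loads(urllib.request.urlopen(req).read())
```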

Scoring criteria
  • Investigative usefulness: Are there concrete follow-ups such as names, transactions, or dates?
  • Controversy & sensitivity: Would confirmation trigger public scrutiny or legal risk?
  • Novelty: Does this surface new information versus repeating well-known facts?
  • Power linkage: Does the lead implicate heads of state, major financiers, intelligence services, or similar actors?

Rows above ~70 typically warrant deeper review—they combine unusual claims with explicit power connections.
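Pulling out that high-priority slice is a one-liner over the scored rows. A minimal sketch, assuming each row is a dict carrying the `importance_score` field the prompt produces:

```python
def review_queue(rows, threshold=70):
    """Return rows at or above the review threshold, highest score first.

    Assumes each row is a dict with an 'importance_score' field (0-100),
    as emitted by the scoring prompt; rows missing the field are skipped.
    """
    flagged = [r for r in rows if r.get("importance_score", 0) >= threshold]
    return sorted(flagged, key=lambda r: r["importance_score"], reverse=True)
```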

LLM prompt (full text)
You analyze primary documents related to court and investigative filings.
Focus on whether the passage offers potential leads—even if unverified—that connect influential actors (presidents, cabinet officials, foreign leaders, billionaires, intelligence agencies) to controversial actions, financial flows, or possible misconduct.
Score each passage on:
  1. Investigative usefulness: Does it suggest concrete follow-up steps, names, transactions, dates, or relationships worth pursuing?
  2. Controversy / sensitivity: Would the lead cause public outcry or legal risk if validated?
  3. Novelty: Is this information new or rarely reported, versus already known?
  4. Power linkage: Does it implicate high-ranking officials or major power centers? Leads tying unknown individuals only to minor issues should score lower.
Assign an importance_score from 0 (no meaningful lead) to 100 (blockbuster lead linking powerful actors to fresh controversy). Reserve 70+ for claims that, if true, would represent major revelations or next-step investigations.
Use the scale consistently:
  • 0–10  : noise, duplicates, previously published facts, or gossip with no actors.
  • 10–30 : low-value context; speculative or weak leads lacking specifics.
  • 30–50 : moderate leads with partial details or missing novelty.
  • 50–70 : strong leads with actionable info or notable controversy.
  • 70–85 : high-impact, new revelations tying powerful actors to clear misconduct.
  • 85–100: blockbuster revelations demanding immediate follow-up.
Return strict JSON with the following fields:
  - headline (string)
  - importance_score (0-100 number)
  - reason (string explaining score)
  - key_insights (array of short bullet strings)
  - tags (array of topical strings)
  - power_mentions (array listing high-profile people or institutions mentioned; include titles or roles if possible)
  - agency_involvement (array naming government, intelligence, or law-enforcement bodies involved or implicated)
  - lead_types (array describing lead categories such as 'financial flow', 'legal exposure', 'foreign influence', 'sexual misconduct', etc.)
If a category has no data, return an empty array for it.
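Local models occasionally return malformed or incomplete JSON despite a "strict JSON" instruction, so each response is worth validating against the field list above before indexing. A hedged sketch (the validation logic here is illustrative, not the project's actual code):

```python
import json

# Field names and types as specified in the prompt above.
REQUIRED_FIELDS = {
    "headline": str,
    "importance_score": (int, float),
    "reason": str,
    "key_insights": list,
    "tags": list,
    "power_mentions": list,
    "agency_involvement": list,
    "lead_types": list,
}

def parse_response(raw: str) -> dict:
    """Parse one model response and validate it against the schema.

    Raises ValueError on malformed JSON, missing fields, wrong types,
    or an importance_score outside 0-100.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned malformed JSON: {exc}") from exc
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    if not 0 <= data["importance_score"] <= 100:
        raise ValueError("importance_score out of range 0-100")
    return data
```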

Source & Credits

Raw text was sourced from the 20,000 Epstein Files dataset prepared by tensonaut, who OCR'd ~25,000 pages released by the House Oversight Committee. The processed CSV mirrors the Hugging Face release: tensonaut/EPSTEIN_FILES_20K. This project builds atop that foundational OCR work to provide ranking and analysis layers.