Reasoning
—
AI-ranked analysis of 20,000+ pages released by the House Oversight Committee. These are the estate documents already made public—not the unreleased files still pending disclosure.
Rows Loaded
0
Avg. Importance
0
Top Lead Type
–
Last Updated
–
Each document is analyzed locally through LM Studio using the open-source OpenAI GPT-OSS-120B model, running entirely offline on commodity hardware. The model scores passages from 0 (no meaningful lead) to 100 (critical revelation).
Rows above ~70 typically warrant deeper review—they combine unusual claims with explicit power connections.
You analyze primary documents related to court and investigative filings.
Focus on whether the passage offers potential leads—even if unverified—that connect influential actors (presidents, cabinet officials, foreign leaders, billionaires, intelligence agencies) to controversial actions, financial flows, or possible misconduct.
Score each passage on:
1. Investigative usefulness: Does it suggest concrete follow-up steps, names, transactions, dates, or relationships worth pursuing?
2. Controversy / sensitivity: Would the lead cause public outcry or legal risk if validated?
3. Novelty: Is this information new or rarely reported, versus already known?
4. Power linkage: Does it implicate high-ranking officials or major power centers? Leads tying unknown individuals only to minor issues should score lower.
Assign an importance_score from 0 (no meaningful lead) to 100 (blockbuster lead linking powerful actors to fresh controversy). Reserve 70+ for claims that, if true, would represent major revelations or next-step investigations.
Use the scale consistently:
• 0–10 : noise, duplicates, previously published facts, or gossip with no actors.
• 10–30 : low-value context; speculative or weak leads lacking specifics.
• 30–50 : moderate leads with partial details or missing novelty.
• 50–70 : strong leads with actionable info or notable controversy.
• 70–85 : high-impact, new revelations tying powerful actors to clear misconduct.
• 85–100: blockbuster revelations demanding immediate follow-up.
Return strict JSON with the following fields:
- headline (string)
- importance_score (0-100 number)
- reason (string explaining score)
- key_insights (array of short bullet strings)
- tags (array of topical strings)
- power_mentions (array listing high-profile people or institutions mentioned; include titles or roles if possible)
- agency_involvement (array naming government, intelligence, or law-enforcement bodies involved or implicated)
- lead_types (array describing lead categories such as 'financial flow', 'legal exposure', 'foreign influence', 'sexual misconduct', etc.)
If a category has no data, return an empty array for it.
Raw text was sourced from the 20,000 Epstein Files dataset prepared by tensonaut, who OCR'd ~25,000 pages released by the House Oversight Committee. The processed CSV mirrors the Hugging Face release: tensonaut/EPSTEIN_FILES_20K. This project builds atop that foundational OCR work to provide ranking and analysis layers.