AI-ranked analysis of the DOJ File Transparency Act corpus, prioritized as the most current public release and processed independently from the House Oversight corpus.
Scoring uses an investigative triage prompt. Inputs can be OCR text or rendered page images, and outputs follow a strict JSON schema consumed by this viewer.
Each output record contains the fields headline, importance_score, reason, key_insights, tags, power_mentions, agency_involvement, and lead_types. Of these, agency_involvement and lead_types are constrained to controlled vocabularies.
The rubric is integer-based from 0 to 100. Practical interpretation: 0-20 low signal, 21-50 limited to moderate leads, 51-70 strong leads, 71-85 high-impact leads, and 86-100 exceptional to blockbuster leads.
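The field list and rubric bands above can be sketched as a small checker. This is an illustrative sketch only, not the viewer's actual validation code: the field names and band boundaries come from the description above, while the field types, the function names `score_band` and `validate_record`, and the vocabulary values passed in are assumptions made for the example.

```python
from typing import Any

# Field names are from the description above; the Python types are assumptions.
EXPECTED_FIELDS: dict[str, type] = {
    "headline": str,
    "importance_score": int,
    "reason": str,
    "key_insights": list,
    "tags": list,
    "power_mentions": list,
    "agency_involvement": list,
    "lead_types": list,
}

# Rubric bands as (inclusive upper bound, label), in ascending order.
SCORE_BANDS = [
    (20, "low signal"),
    (50, "limited to moderate lead"),
    (70, "strong lead"),
    (85, "high-impact lead"),
    (100, "exceptional to blockbuster lead"),
]


def score_band(score: int) -> str:
    """Map an integer importance score (0 to 100) to its rubric band."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("importance_score must be an integer")
    if not 0 <= score <= 100:
        raise ValueError("importance_score must be between 0 and 100")
    for upper, label in SCORE_BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: bands cover 0 to 100")


def validate_record(
    record: dict[str, Any],
    agency_vocab: set[str],      # hypothetical vocabulary, supplied by the caller
    lead_type_vocab: set[str],   # hypothetical vocabulary, supplied by the caller
) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems: list[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")

    score = record.get("importance_score")
    if isinstance(score, int) and not isinstance(score, bool):
        if not 0 <= score <= 100:
            problems.append("importance_score: outside 0-100")

    constrained = (
        ("agency_involvement", agency_vocab),
        ("lead_types", lead_type_vocab),
    )
    for name, vocab in constrained:
        values = record.get(name)
        if isinstance(values, list):
            for item in values:
                if not isinstance(item, str) or item not in vocab:
                    problems.append(f"{name}: {item!r} not in controlled vocabulary")
    return problems
```

A caller would pass each parsed JSON record together with the two vocabularies and treat an empty result as a valid record; `score_band` can then label the record's importance_score for display.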
Raw text was sourced from the 20,000 Epstein Files dataset prepared by tensonaut, who OCR'd ~25,000 pages released by the House Oversight Committee. The processed CSV mirrors the Hugging Face release: tensonaut/EPSTEIN_FILES_20K. This project builds atop that foundational OCR work to provide ranking and analysis layers.