Drugs@FDA reviews, indexed for pharmacometrics.
Every Drugs@FDA application and every review document, in one searchable index, with an explicit lane for the population PK and PK/PD subset useful for regulatory pharmacometrics and for the nlmixr2lib model library.
What we have, right now
Live counts from the Drugs@FDA bulk export, loaded into postgres on this server. Refreshed nightly from download.open.fda.gov/drug/drugsfda/.
Every NDA / BLA / ANDA in Drugs@FDA, including supplements.
Unique review-class PDFs indexed (deduplicated by URL).
Multidiscipline 'Review' + Pediatric Clinical Pharmacology — the popPK-likely subset before regex filtering.
Reviews tagged explicitly as Pediatric Clinical Pharmacology — the cleanest popPK-only subset to mirror first.
Document-type breakdown
Most Drugs@FDA traffic is letters and labels. The popPK signal lives in three doc-types: Review (multidiscipline reviews — the modern bundle since ~2017), Pediatric Clinical Pharmacology Review (cleanest popPK-only subset), and Summary Review (executive-summary view).
| Document type | Count |
|---|---|
| Letter | 29,499 |
| Label | 24,241 |
| Review | 7,094 |
| Medication Guide | 1,100 |
| Summary Review | 724 |
| Pediatric Medical Review | 522 |
| Pediatric Clinical Pharmacology Review | 447 |
| Pediatric Statistical Review | 352 |
| Pediatric Written Request | 181 |
| Other | 162 |
| Other Important Information from FDA | 159 |
| Pediatric Amendment 1 | 121 |
Pipeline status
Search is only as good as what we've ingested. Each step below runs once per document; the live counters update as we go. The popPK filter on the search page reads poppk_verified from the database; anything below Step 4 has not been LLM-verified yet.
Schedule: Step 2 (mirror) runs daily at 04:00 UTC via scripts/fda_mirror_pdfs.sh; Step 3 (extract + regex score) runs daily at 07:00 UTC via scripts/fda_extract_score.sh. Both are idempotent and skip already-processed rows, so re-running is safe.
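The skip-already-processed behaviour can be pictured with a minimal sketch. It is not the actual contents of scripts/fda_mirror_pdfs.sh: it assumes a hypothetical mirrored_at column on fda_documents, uses an in-memory SQLite database as a stand-in for postgres, and takes the download step as a caller-supplied function.

```python
import sqlite3

def mirror_pending(conn, fetch):
    """Handle only rows that have not been mirrored yet; return how many ran."""
    # mirrored_at is a hypothetical marker column, not confirmed by the page.
    rows = conn.execute(
        "SELECT id, url FROM fda_documents WHERE mirrored_at IS NULL"
    ).fetchall()
    for doc_id, url in rows:
        fetch(url)  # hypothetical download-and-upload step
        conn.execute(
            "UPDATE fda_documents SET mirrored_at = CURRENT_TIMESTAMP WHERE id = ?",
            (doc_id,),
        )
    conn.commit()
    return len(rows)

# Stand-in database with two unprocessed documents (example URLs only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fda_documents (id INTEGER PRIMARY KEY, url TEXT, mirrored_at TEXT)"
)
conn.executemany(
    "INSERT INTO fda_documents (url) VALUES (?)",
    [("https://example.org/a.pdf",), ("https://example.org/b.pdf",)],
)

fetched = []
first_run = mirror_pending(conn, fetched.append)
second_run = mirror_pending(conn, fetched.append)  # nothing left to do
print(first_run, second_run)
```

Because the marker is written in the same pass that does the work, a second run finds no pending rows, which is what makes re-running safe.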
Step 1: Drugs@FDA bulk JSON loaded: 29,033 applications, 65,075 documents in postgres.
Step 2 (mirror): 2,759 of 7,554 candidate review PDFs mirrored to s3://ndexr/fda/. Browser-UA fetch confirmed working; FDA blocks bare data-center clients.
Step 3 (extract + regex score): 2,721 documents have pdftotext output and a popPK regex score. The score counts hits across 'population PK', 'NONMEM', 'FOCEI', 'two-compartment', etc.
Step 4: Haiku confirms whether each candidate document contains a population PK or PK/PD model with parameter estimates. 2 verified positive so far.
Step 5: On verified docs, pull compartments, absorption type, covariates, IIV, residual error, and population size into structured columns, which feed the search facets and the nlmixr2lib stub.
Step 6: Search filters on drug, sponsor, year, doc-type, and 'has popPK'. Reads from the columns Step 5 fills. Also offers 'show me 2-cmt models with WT and CRCL' faceted browsing.
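As a rough illustration of the Step 3 regex score, the sketch below counts pattern hits in extracted text. It is an assumption-laden example, not the scoring code behind scripts/fda_extract_score.sh: the pattern list contains only the four terms named above (the real list is longer, per the "etc."), and matching is assumed to be case-insensitive.

```python
import re

# Only the four terms named on this page; the production list is not shown here.
POPPK_PATTERNS = [
    r"population\s+PK",   # \s+ tolerates line breaks left by pdftotext
    r"NONMEM",
    r"FOCEI",
    r"two-compartment",
]

# Case-insensitive matching is an assumption, not confirmed by the page.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in POPPK_PATTERNS]

def poppk_regex_score(text: str) -> int:
    """Return the total number of pattern hits across all terms."""
    return sum(len(rx.findall(text)) for rx in _COMPILED)

sample = (
    "A two-compartment population PK model was fit in NONMEM using FOCEI. "
    "The population PK dataset had 120 subjects."
)
score = poppk_regex_score(sample)
print(score)
```

A raw hit count like this only ranks candidates; it cannot tell a fitted model from a passing mention, which is why the LLM verification step follows.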
Where this connects
fda.ndexr.io is the search index. Two sister sites consume it:
- nlmixr2.ndexr.io surfaces the popPK-verified subset as model stubs in the nlmixr2lib contribution format (Bill Denney's drive to get every published popPK model into a free, parameter-validated R library).
- research.ndexr.io is the subject-and-people view of the broader Bioconductor / pharmacometrics frontier; fda.ndexr.io is the regulatory-document layer underneath it.
Source: open.fda.gov bulk download. Mirror: s3://ndexr/fda/. Schema: fda_applications + fda_documents in the same postgres that powers stats.ndexr.io and repo.ndexr.io.
Reviews per year
Coverage of Review-class documents (Multidiscipline Review, Pediatric Clinical Pharmacology, Summary Review) by submission status year. Useful for spotting where the corpus thins out — pre-2015 multidiscipline reviews are sparse because they hadn't been bundled yet.