fda.ndexr.io

Drugs@FDA reviews, indexed for pharmacometrics.

Every Drugs@FDA application and every review document, in one searchable index — with an explicit lane for the population PK and PK/PD subset useful for regulatory pharmacometrics and for the nlmixr2lib model library.

What we have, right now

Live counts from the Drugs@FDA bulk export, loaded into Postgres on this server and refreshed nightly from download.open.fda.gov/drug/drugsfda/.
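
For readers who want to reproduce headline counts like these from a local copy of the export, here is a minimal sketch. It assumes the openFDA drugsfda record layout (a top-level results list, each entry with submissions that carry application_docs entries holding type and url). The function name and the sample payload are illustrative only and are not the loader this site runs.

```python
def count_applications_and_documents(bulk):
    """Return (number of applications, number of unique document URLs)."""
    applications = bulk.get("results", [])
    urls = set()
    for app in applications:
        for submission in app.get("submissions", []):
            for doc in submission.get("application_docs", []):
                if doc.get("url"):
                    urls.add(doc["url"])  # a set deduplicates by URL
    return len(applications), len(urls)


# Tiny made-up payload in the assumed layout; a.pdf appears twice on purpose.
sample = {
    "results": [
        {
            "application_number": "NDA000001",
            "submissions": [
                {"application_docs": [
                    {"type": "Review", "url": "https://example.org/a.pdf"},
                    {"type": "Label", "url": "https://example.org/b.pdf"},
                ]},
                {"application_docs": [
                    {"type": "Review", "url": "https://example.org/a.pdf"},
                ]},
            ],
        },
        {"application_number": "ANDA000002", "submissions": []},
    ]
}

n_apps, n_docs = count_applications_and_documents(sample)
```

In the sample, the repeated URL is counted once, which is the same deduplication rule the Documents counter describes.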

29,033
Applications

Every NDA / BLA / ANDA in Drugs@FDA, including supplements.

65,075
Documents

Unique Drugs@FDA document PDFs indexed across all document types (deduplicated by URL).

7,554
popPK candidates

Multidiscipline 'Review' + Pediatric Clinical Pharmacology — the popPK-likely subset before regex filtering.

447
Pediatric ClinPharm

Reviews tagged explicitly as Pediatric Clinical Pharmacology — the cleanest popPK-only subset to mirror first.
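
The candidate subset described in the cards above can be sketched as a simple type filter. This is an illustration under assumptions: each document is taken to be a dict with type and url keys, and the type strings are copied from the document-type labels used on this page. The real selection presumably happens in SQL against fda_documents.

```python
# Document types treated as popPK-likely, per the cards above.
CANDIDATE_TYPES = {"Review", "Pediatric Clinical Pharmacology Review"}


def poppk_candidates(documents):
    """Keep candidate-type documents, one per URL, in input order."""
    seen = set()
    selected = []
    for doc in documents:
        if doc["type"] in CANDIDATE_TYPES and doc["url"] not in seen:
            seen.add(doc["url"])
            selected.append(doc)
    return selected


docs = [
    {"type": "Review", "url": "https://example.org/r1.pdf"},
    {"type": "Letter", "url": "https://example.org/l1.pdf"},
    {"type": "Pediatric Clinical Pharmacology Review", "url": "https://example.org/p1.pdf"},
    {"type": "Review", "url": "https://example.org/r1.pdf"},  # duplicate URL
]
candidates = poppk_candidates(docs)
```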

Document-type breakdown

Most Drugs@FDA documents are letters and labels. The popPK signal lives in three document types: Review (multidiscipline reviews, the modern bundle since about 2017), Pediatric Clinical Pharmacology Review (the cleanest popPK-only subset), and Summary Review (the executive-summary view).

Document type                            Count
Letter                                  29,499
Label                                   24,241
Review                                   7,094
Medication Guide                         1,100
Summary Review                             724
Pediatric Medical Review                   522
Pediatric Clinical Pharmacology Review     447
Pediatric Statistical Review               352
Pediatric Written Request                  181
Other                                      162
Other Important Information from FDA       159
Pediatric Amendment 1                      121

Pipeline status

Search is only as good as what we've ingested. Each step below runs once per document, and the live counters update as documents move through the pipeline. The popPK filter on the search page reads poppk_verified from the database, so any document that has not yet passed Step 4 has not been LLM-verified.

Schedule: Step 2 (mirror) runs daily at 04:00 UTC via scripts/fda_mirror_pdfs.sh; Step 3 (extract + regex score) runs daily at 07:00 UTC via scripts/fda_extract_score.sh. Both are idempotent and skip already-processed rows, so re-running is safe.
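
The idempotency claim comes down to one pattern: check a per-row completion marker before doing any work. The sketch below shows that pattern in isolation. The row dicts, the mirrored_key column name, and the run_step helper are all hypothetical; the actual shell scripts are not reproduced here.

```python
def run_step(rows, done_column, work):
    """Apply work(row) to rows lacking done_column; return how many were processed."""
    processed = 0
    for row in rows:
        if row.get(done_column):
            continue  # already handled on an earlier run, so skip it
        row[done_column] = work(row)
        processed += 1
    return processed


rows = [
    {"id": 1, "mirrored_key": "fda/1.pdf"},  # done on a previous run
    {"id": 2, "mirrored_key": None},
    {"id": 3, "mirrored_key": None},
]

first_run = run_step(rows, "mirrored_key", lambda r: f"fda/{r['id']}.pdf")
second_run = run_step(rows, "mirrored_key", lambda r: f"fda/{r['id']}.pdf")
```

The second call finds nothing left to do, which is what makes a repeated or overlapping cron run harmless.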

Step 1 done
Metadata ingest

Drugs@FDA bulk JSON loaded — 29,033 applications, 65,075 documents in postgres.

Step 2 ready
PDF mirror

2,759 of 7,554 candidate review PDFs mirrored to s3://ndexr/fda/. Fetching with a browser User-Agent is confirmed working; FDA blocks bare data-center clients.
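
A minimal sketch of a fetch that sends a browser-style User-Agent, using only the Python standard library. The User-Agent string, the function names, and the timeout are assumptions chosen for illustration; the mirror script's actual client and headers are not documented on this page.

```python
import urllib.request

# Hypothetical browser-style User-Agent; any realistic browser string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_pdf(url, timeout=30):
    """Download the body of url as bytes (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


# Building the request does not touch the network.
request = build_request("https://example.org/review.pdf")
```

Keeping request construction separate from the download makes the header logic checkable without hitting FDA servers.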

Step 3 ready
Text extraction + regex score

2,721 documents have pdftotext output and a popPK regex score. The score counts hits across 'population PK', 'NONMEM', 'FOCEI', 'two-compartment', etc.
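
As a rough illustration of a hit-count score, the sketch below sums case-insensitive matches over the four terms named above. The real pattern list is longer (the "etc." above), and the exact patterns, weighting, and function name here are assumptions.

```python
import re

# Only the terms named in the text; the production list is not shown here.
POPPK_PATTERNS = [
    r"population\s+PK",
    r"NONMEM",
    r"FOCEI",
    r"two-compartment",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in POPPK_PATTERNS]


def poppk_score(text):
    """Total number of pattern hits in the extracted text."""
    return sum(len(rx.findall(text)) for rx in COMPILED)


example = "A population PK model was fit in NONMEM; NONMEM used FOCEI."
score = poppk_score(example)
```

A raw count favors long documents, so a score like this is best read as a ranking signal for Step 4 rather than as a verdict.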

Step 4 ready
LLM popPK verify

Haiku confirms whether each candidate document contains a population PK or PK/PD model with parameter estimates. 2 verified positive so far.
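
The verification step can be pictured as a yes/no question put to the model, with the reply parsed into a boolean that would back poppk_verified. The sketch below deliberately leaves the model call as an injected function (ask_model), so no specific client library or API is assumed. The prompt wording, the character cap, and both function names are hypothetical.

```python
def build_prompt(text, max_chars=20000):
    """Compose a yes/no question over a truncated excerpt of the review text."""
    excerpt = text[:max_chars]
    return (
        "Does the following FDA review excerpt contain a population PK or "
        "PK/PD model with parameter estimates? Answer YES or NO on the "
        "first line.\n\n" + excerpt
    )


def verify_poppk(text, ask_model):
    """Return True only if the model's first reply line starts with YES."""
    reply = ask_model(build_prompt(text)).strip()
    if not reply:
        return False  # treat an empty reply as not verified
    return reply.splitlines()[0].strip().upper().startswith("YES")


# Stand-in for a real model call, used here only to exercise the parsing.
verified = verify_poppk("CL/F = 12.3 L/h ...", lambda prompt: "YES\nTable 4 lists estimates.")
```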

Step 5 pending
Structured extraction

On verified docs, pull compartments, absorption type, covariates, IIV, residual error, population size into structured columns — feeds the search facets and the nlmixr2lib stub.
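
One way to picture the structured columns is as a record type with one field per item listed above. Step 5 is still pending, so this is a sketch only: the class name, field names, and types are assumptions, and the eventual database columns may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PopPKModelRecord:
    """Hypothetical structured view of one verified popPK model."""
    document_url: str
    n_compartments: Optional[int] = None       # e.g. 1, 2, 3
    absorption: Optional[str] = None           # e.g. "first-order"
    covariates: List[str] = field(default_factory=list)      # e.g. ["WT", "CRCL"]
    iiv_parameters: List[str] = field(default_factory=list)  # parameters with IIV
    residual_error: Optional[str] = None       # e.g. "proportional"
    n_subjects: Optional[int] = None           # population size


record = PopPKModelRecord(
    document_url="https://example.org/review.pdf",
    n_compartments=2,
    covariates=["WT", "CRCL"],
)
```

Optional fields default to None so that a partially extracted document can still be stored and filled in later.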

Step 6 pending
Search UI

Drug / sponsor / year / doc-type / 'has popPK' filter. Reads from the columns Step 5 fills. Also offers 'show me 2-cmt models with WT and CRCL' faceted browsing.
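
The faceted query quoted above ("2-cmt models with WT and CRCL") amounts to an equality filter plus a subset test on covariates. Here is a minimal in-memory sketch, assuming records are dicts with n_compartments and covariates keys. Those names are hypothetical, and the planned UI would presumably run the equivalent query in the database.

```python
def facet_filter(records, n_compartments=None, required_covariates=()):
    """Keep records matching the compartment count and containing all covariates."""
    wanted = {c.upper() for c in required_covariates}
    hits = []
    for rec in records:
        if n_compartments is not None and rec.get("n_compartments") != n_compartments:
            continue
        have = {c.upper() for c in rec.get("covariates", [])}
        if wanted <= have:  # every required covariate is present
            hits.append(rec)
    return hits


models = [
    {"drug": "A", "n_compartments": 2, "covariates": ["WT", "CRCL", "AGE"]},
    {"drug": "B", "n_compartments": 2, "covariates": ["WT"]},
    {"drug": "C", "n_compartments": 1, "covariates": ["wt", "crcl"]},
]
two_cmt_wt_crcl = facet_filter(models, n_compartments=2, required_covariates=["WT", "CRCL"])
```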

Where this connects

fda.ndexr.io is the search index. Two sister sites consume it:

  • nlmixr2.ndexr.io surfaces the popPK-verified subset as model stubs in the nlmixr2lib contribution format — Bill Denney's drive to get every published popPK model into a free, parameter-validated R library.
  • research.ndexr.io is the subject-and-people view of the broader Bioconductor / pharmacometrics frontier — fda.ndexr.io is the regulatory-document layer underneath it.

Source: open.fda.gov bulk download. Mirror: s3://ndexr/fda/. Schema: fda_applications + fda_documents in the same Postgres that powers stats.ndexr.io and repo.ndexr.io.

Reviews per year

Coverage of Review-class documents (Multidiscipline Review, Pediatric Clinical Pharmacology, Summary Review) by submission status year. Useful for spotting where the corpus thins out — pre-2015 multidiscipline reviews are sparse because they hadn't been bundled yet.
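
A per-year tally like the one charted here can be sketched as a grouped count. The snippet assumes each document dict has a type and a status_date string that begins with a four-digit year (for example "20190315"); the key names are hypothetical stand-ins for whatever fda_documents actually stores.

```python
from collections import Counter

# The three Review-class document types named above.
REVIEW_TYPES = {"Review", "Pediatric Clinical Pharmacology Review", "Summary Review"}


def reviews_per_year(documents):
    """Count Review-class documents by the year prefix of status_date."""
    counts = Counter()
    for doc in documents:
        date = doc.get("status_date")
        if doc["type"] in REVIEW_TYPES and date:
            counts[date[:4]] += 1
    return dict(sorted(counts.items()))


docs = [
    {"type": "Review", "status_date": "20190315"},
    {"type": "Summary Review", "status_date": "20191101"},
    {"type": "Letter", "status_date": "20190102"},   # not Review-class
    {"type": "Review", "status_date": "20120620"},
    {"type": "Review", "status_date": None},         # undated, skipped
]
per_year = reviews_per_year(docs)
```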