PMID Audit Status
Public, ongoing transparency about citation integrity across the ASH registries.
ASH commits to public transparency about its own citation integrity. On 2026-05-27 an internal audit (Quality Cycle 06) found that approximately 80% of PMIDs in hand-built registries did not resolve to the papers they claimed. This page documents the finding and the public cleanup plan.
This is the kind of disclosure most institutions hide. ASH publishes it because trust is built from transparency about errors, not from concealing them.
Audit findings
- ~80% PMID mismatch rate in hand-built registries
- 9,902 / 9,903 imported PMIDs verified: research-articles-imported.ts, full corpus checked against live NCBI (2026-06-04)
- 1 fabricated PMID caught: 39152982, which does not resolve in NCBI; quarantined
What the audit found
- Michalsen 2003 (the landmark hirudotherapy KOA RCT) has three different fabricated PMIDs across the codebase. None resolves to the actual paper.
- All Tier A FDA-cleared device-indication citations in condition-registry have wrong PMIDs that resolve to unrelated specialty papers (bifid acetabular labrum, diabetic retinopathy biomarkers, etc.).
- Diagnostic pattern: PMIDs for famous trials (RE-LY, ROCKET AF, ARISTOTLE, ENGAGE AF-TIMI 48) verify correctly, while PMIDs for obscure hirudotherapy literature are invented. This is consistent with a model-generated hallucination pattern.
- research-articles-imported.ts: the full imported corpus (9,903 unique PMIDs) was re-checked against the live NCBI/PubMed efetch API on 2026-06-04. 9,902 resolve (99.99%) and now carry a verified citation-integrity status. Exactly one fabricated identifier (PMID 39152982) was caught and quarantined. The verification engine works, and the import corpus is now fully audited, not sampled.
Full audit report: docs/pmid-audit-cycle-06-2026-05-27.md in the public GitHub repository.
Why this matters
A wrong PMID is worse than no PMID. It looks like provenance: it suggests "this claim is backed by published research" when in fact the cited paper has nothing to do with the claim.
For ASH specifically, this risk concentrates in Tier B (RCT-supported off-label) and Tier C (investigational), where readers (patients, clinicians, and researchers) rely on the citations to verify ASH's framing for themselves. If the PMID is wrong, the verification chain is broken.
What ASH is doing about it
1. Public disclosure (this page)
Audit findings published openly. No quiet cleanup.
2. Editorial policy: no new fabricated PMIDs
Effective 2026-05-27: every new registry entry must either cite a PubMed-verified PMID or explicitly omit the PMID field. Mandatory verification step in editorial workflow.
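The policy above can be expressed as a small check on each entry. This is a minimal sketch, not ASH's actual schema: the `KeyTrial` shape, the `label` and `pmid` field names, and the `policyViolations` helper are assumptions made for illustration; only `pmidVerified` is named on this page.

```typescript
// Hypothetical entry shape. Only `pmidVerified` appears on this page;
// `label` and `pmid` are assumed names for illustration.
interface KeyTrial {
  label: string;
  pmid?: string;         // omitted entirely when no verified PMID exists
  pmidVerified?: string; // date of the PubMed check, YYYY-MM-DD
}

const PMID_PATTERN = /^[1-9]\d*$/;          // PMIDs are positive integers
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// Returns a list of policy problems; an empty list means the entry passes.
function policyViolations(entry: KeyTrial): string[] {
  const problems: string[] = [];
  if (entry.pmid === undefined) {
    // Omitting the PMID is allowed, but a verification date without a PMID is not.
    if (entry.pmidVerified !== undefined) {
      problems.push("pmidVerified is set but pmid is omitted");
    }
    return problems;
  }
  if (!PMID_PATTERN.test(entry.pmid)) {
    problems.push(`pmid "${entry.pmid}" is not a plausible PubMed ID`);
  }
  if (entry.pmidVerified === undefined) {
    problems.push("pmid is present but has no pmidVerified date");
  } else if (!ISO_DATE_PATTERN.test(entry.pmidVerified)) {
    problems.push(`pmidVerified "${entry.pmidVerified}" is not a YYYY-MM-DD date`);
  }
  return problems;
}
```

A check like this only enforces the shape of the rule (verified or omitted). Whether the PMID actually resolves to the claimed paper still has to be confirmed against PubMed.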
3. PMID Integrity Sprint (queued)
Dedicated cleanup of all 4 hand-built registries:
- Extract every PMID across registries
- Verify each via PubMed E-utilities API (efetch/esummary)
- Replace incorrect PMIDs with verified citations
- Remove unverifiable PMIDs (DOI fallback or no identifier)
- Add a pmidVerified: "2026-MM-DD" field per KeyTrial
- Re-run Cycle 06 audit; target 0% mismatch
Sprint queued as a separate task. Tracking: this page will be updated as cleanup progresses.
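The verify step of the sprint might look roughly like the sketch below, which uses the E-utilities esummary endpoint with JSON output. Several things here are assumptions rather than ASH's implementation: the `Claim` shape, the helper names, the title-matching rule, and the exact shape of the JSON response (a `result` object keyed by PMID, with an `error` field on records that do not resolve). The HTTP call is injected as `fetchJson` so the classification logic can be tested without network access.

```typescript
const ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi";

type PmidStatus = "verified" | "title-mismatch" | "unresolved";

// Hypothetical shape: a PMID plus the title the registry claims it points to.
interface Claim {
  pmid: string;
  expectedTitle: string;
}

// Caller-supplied HTTP function, e.g. a thin wrapper around fetch.
type FetchJson = (url: string) => Promise<unknown>;

// Builds one esummary request covering a batch of PMIDs.
function esummaryUrl(pmids: string[]): string {
  return `${ESUMMARY_URL}?db=pubmed&retmode=json&id=${pmids.join(",")}`;
}

// Lowercases and strips punctuation so trailing periods or casing do not matter.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Classifies one claim against a parsed esummary response body.
// Assumed response shape: { result: { "<pmid>": { title } | { error } } }.
function classify(claim: Claim, body: unknown): PmidStatus {
  const result = (body as { result?: Record<string, unknown> } | null)?.result;
  const record = result?.[claim.pmid] as { title?: unknown; error?: unknown } | undefined;
  if (!record || record.error !== undefined || typeof record.title !== "string") {
    return "unresolved";
  }
  return normalizeTitle(record.title) === normalizeTitle(claim.expectedTitle)
    ? "verified"
    : "title-mismatch";
}

// Fetches one batch and returns a status per PMID.
async function verifyClaims(
  claims: Claim[],
  fetchJson: FetchJson
): Promise<Record<string, PmidStatus>> {
  const body = await fetchJson(esummaryUrl(claims.map((c) => c.pmid)));
  const statuses: Record<string, PmidStatus> = {};
  for (const claim of claims) {
    statuses[claim.pmid] = classify(claim, body);
  }
  return statuses;
}
```

Exact title equality after normalization is deliberately strict. A "title-mismatch" result is a prompt for human review, not proof of fabrication, since registries may abbreviate titles. NCBI rate-limits E-utilities requests, so a real run would batch PMIDs and throttle calls.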
4. CI guard for future regressions
After cleanup, add an automated PMID verification check to CI. Any PR adding a new PMID must show that PMID resolves to a real PubMed paper before merge can proceed.
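One way such a guard could work is to scan the PR's unified diff for newly added PMIDs and block the merge if any of them lacks a "verified" status. The sketch below is illustrative only: it assumes PMIDs appear in registry source as a `pmid: "..."` field (a hypothetical convention), and `addedPmids` and `blockingPmids` are made-up helper names.

```typescript
// Collects PMIDs that appear on added lines ("+") of a unified diff.
// Assumes a hypothetical `pmid: "12345678"` or `pmid: 12345678` field syntax.
function addedPmids(unifiedDiff: string): string[] {
  const found: string[] = [];
  for (const line of unifiedDiff.split("\n")) {
    // Added lines start with "+"; "+++" is the file header, not content.
    const isAdded = line.charAt(0) === "+" && line.slice(0, 3) !== "+++";
    if (!isAdded) continue;
    // The colon must follow "pmid" directly, so `pmidVerified:` does not match.
    const fieldPattern = /\bpmid\s*:\s*["']?(\d+)/gi;
    let match: RegExpExecArray | null;
    while ((match = fieldPattern.exec(line)) !== null) {
      const pmid = match[1];
      if (pmid && found.indexOf(pmid) === -1) found.push(pmid);
    }
  }
  return found.sort();
}

// Returns the PMIDs that should block the merge: anything not marked "verified".
function blockingPmids(added: string[], statuses: Record<string, string>): string[] {
  return added.filter((pmid) => statuses[pmid] !== "verified");
}
```

In CI, the diff would come from the PR and the statuses from a PubMed lookup. A non-empty `blockingPmids` result would fail the job.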
Per-registry cleanup status
| Registry | PMIDs (approx) | Cleanup status | Notes |
|---|---|---|---|
| research-articles-imported.ts | 9,903 | Fully verified | Full corpus re-checked vs live NCBI 2026-06-04 — 9,902 verified, 1 fabricated (PMID 39152982) quarantined |
| rct-registry.ts | ~60 | Awaiting sprint | Highest priority: clinical evidence anchor |
| condition-registry.ts | ~30 | Awaiting sprint | Tier A device-indication citations urgent |
| compound-registry.ts | ~50 | Mixed | Wave 1 additions (Lukas 2018) verified; older entries pending |
| biography-registry.ts | ~45 | Awaiting sprint | Lower clinical impact but still must verify |
As a reader, what should you do?
- Verify any PMID you rely on. Click through to PubMed and confirm the article title matches the claim ASH is making. Standard practice in evidence-based medicine.
- Treat PMIDs in pre-2026-05-27 hand-built entries as unverified. The verification work hasn't been completed yet for those entries.
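- Trust the research-articles-imported.ts entries. They were auto-generated directly from PubMed metadata, and the full corpus was re-checked against live NCBI on 2026-06-04, with the one fabricated PMID (39152982) quarantined.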
- Report mismatches. If you find a wrong PMID we haven't caught, file a GitHub issue or use the corrections page.