From: Next generation phenotyping using narrative reports in a rare disease clinical data warehouse
Patient set | RETT | DOCK8 deficiency | LOWE | SILVER RUSSELL | BARDET BIEDL | APDS 1 and 2 |
---|---|---|---|---|---|---|
Median age at visit (years) | 8.2 [4.8–12.6] | 11.4 [9.3–14.1] | 12.8 [5.8–20.3] | 2.4 [0.8–5.4] | 15.7 [10.1–41.5] | 12.8 [7.7–18.6] |
Median follow-up (years) | 2.6 [0–4.9] | 3.1 [0.3–9] | 6.6 [3–10.3] | 2 [0.8–4.7] | 2 [0.1–6.6] | 7.5 [4.8–8.6] |
# Patients | 209 | 15 | 23 | 50 | 53 | 23 |
# Documents | 5034 | 3296 | 1325 | 1133 | 1317 | 2337 |
Phenotypes extracted (not negated, in patient context) | | | | | | |
# Phenotypes | 18,538 | 6886 | 5281 | 6563 | 6345 | 9716 |
# Distinct phenotypes | 1022 | 706 | 577 | 738 | 801 | 710 |
Expert evaluation of the top 50 phenotypes | | | | | | |
Medical expert | NBB | CP | RS | JA | RS | NM |
# Phenotypes ranked by frequency | 31 | 36 | 36 | 16 | 17 | 39 |
# Phenotypes ranked by TF-IDF | 38 | 37 | 41 | 11 | 12 | 37 |
# Phenotypes, union of frequency and TF-IDF | 42 | 52 | 50 | 16 | 19 | 52 |
# Phenotypes, intersection of frequency and TF-IDF | 28 | 22 | 28 | 11 | 11 | 25 |
Average precision, ranked by frequency | 0.86 | 0.91 | 0.88 | 0.55 | 0.66 | 0.83 |
Average precision, ranked by TF-IDF | 0.91 | 0.84 | 0.90 | 0.49 | 0.52 | 0.83 |
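The table compares two ways of ranking the phenotypes extracted for each patient set: by raw frequency and by TF-IDF. The sketch below illustrates both rankings and the union/intersection of their top-k lists. It is a minimal illustration under stated assumptions, not the authors' pipeline: each patient set is treated as one "document", term frequency is the raw count of phenotype mentions in that set, and IDF is `log(N / df)` over the N sets. The function names `tfidf_scores` and `top_k` and the toy phenotype lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(sets: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score every phenotype in every patient set by TF-IDF.

    Assumption (hypothetical scheme): each patient set is one 'document';
    tf is the raw mention count in that set and idf = log(N / df), where
    N is the number of sets and df the number of sets mentioning the phenotype.
    """
    counts = {name: Counter(mentions) for name, mentions in sets.items()}
    n_sets = len(sets)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # a set contributes at most 1 to df per phenotype
    return {
        name: {p: tf * math.log(n_sets / df[p]) for p, tf in c.items()}
        for name, c in counts.items()
    }


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring phenotypes; ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]


# Toy, illustrative data only (not from the study).
sets = {
    "A": ["seizures", "seizures", "seizures", "fever", "fever", "scoliosis"],
    "B": ["fever", "fever", "fever", "eczema", "eczema"],
}
scores = tfidf_scores(sets)
freq_top = top_k(Counter(sets["A"]), 2)   # ranking by frequency
tfidf_top = top_k(scores["A"], 2)         # ranking by TF-IDF
print(freq_top, tfidf_top)
print(len(set(freq_top) | set(tfidf_top)), len(set(freq_top) & set(tfidf_top)))
```

In this toy example "fever" is frequent in set A but also common in set B, so TF-IDF pushes it down in favor of the rarer, set-specific "scoliosis". This illustrates why the two rankings can disagree and why their union and intersection are reported separately.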
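The average precision rows summarize how high the expert-validated phenotypes sit in each top-50 ranking. The sketch below assumes the common definition AP = (1/R) Σ_k P(k)·rel(k): precision@k is summed over the ranks k of relevant items and divided by R, the number of relevant items in the ranked list. The study may normalize differently, and the function name `average_precision` is hypothetical.

```python
def average_precision(relevance: list[bool]) -> float:
    """Average precision of one ranked list given binary expert judgments.

    Assumption: AP = (1/R) * sum of precision@k over the ranks k of relevant
    items, where R is the number of relevant items in the list (0.0 if none).
    """
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant rank
    return precision_sum / hits if hits else 0.0


# Ranks 1, 3 and 4 judged relevant: (1/1 + 2/3 + 3/4) / 3
print(round(average_precision([True, False, True, True, False]), 3))  # 0.806
```

Under this definition a list whose relevant phenotypes cluster at the top scores close to 1, even if only a minority of the 50 items are relevant.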