From: The use of machine learning in rare diseases: a scoping review
Variable | Categories | Definition | Example(s) |
---|---|---|---|
Rare disease | All rare diseases described at least once in the studies (studies investigating more than one rare disease were categorized as “Diverse”) | Orphanet disorder name | Cystic fibrosis, Sickle cell anemia, Gaucher disease |
Disease group | All disease groups of the 381 specific diseases included in the search as well as disease groups of other diseases identified in the studies | Orphanet disease group as defined by the preferential parent in the classification hierarchy | Rare neurologic disease, Rare respiratory disease, Rare endocrine disease |
Publication year | Years from 2010 to 2019 | Year of the publication date of the article | |
Country of study | All countries that published at least one article | Country of institution of senior (i.e. last) author of the study | |
Medical application | Diagnosis | Studies aiming to correctly diagnose patients | Classification of cases and controls or different disease subtypes, Identification of biomarkers, Deep phenotyping, Decision support |
Medical application | Treatment | Studies aiming to improve treatment or develop new therapies | Detection of therapeutic targets, Identification of binding proteins |
Medical application | Prognosis | Prediction of a patient-relevant endpoint | Prediction of complication, disease onset, survival, disease progression, Risk estimation |
Medical application | Basic research | Other basic research not classified into one of the categories above | Exploration of molecular disease mechanisms |
Patient number | “< 20”, “20–99”, “100–1000”, “> 1000”, “not applicable / no information” | Number of patients included in the study | |
Input dataᵃ | Clinical test score | Data from a clinical test score | Glasgow Coma Scale, ALS Functional Rating Scale |
Input data | Demographic data | General patient characteristics | Age, Sex, Ethnicity |
Input data | Functional test data | Data from physiological tests | ECG, EEG, EMG, gait pattern, pulse, blood pressure, eye movements |
Input data | Images | Data from medical imaging | MRI, PET, CT, retinal images, face photographs |
Input data | Laboratory data | Data from laboratory test | Blood glucose, platelet counts, creatinine |
Input data | Literature | Data extracted from scientific texts | Published literature, NCBI disease corpus |
Input data | Medication data | Data about medication | Use of antibiotics, medication plan |
Input data | Omics data | Molecular data | Genomics, Proteomics, Metabolomics, Epigenomics |
Input data | Patient / Family history | Data from patients’ or relatives’ past medical history | Pre-existing conditions, parental data |
Input data | Other EHR data | Other data from electronic health records | Diagnoses, procedures, other medical records |
Input data | Other | Other types of input data | Questionnaire or interview data, donors’ characteristics in HSCT |
Type of algorithmᵃ | Artificial Neural Network | | Convolutional neural network, Recurrent neural network, Multi-layer perceptron |
Type of algorithm | Bayesian Methods | | Naïve Bayes |
Type of algorithm | Clustering | | k-means clustering, Hierarchical clustering |
Type of algorithm | Decision Tree | | Decision tree |
Type of algorithm | Discriminant Analysis | | Linear discriminant analysis |
Type of algorithm | Ensemble Methods | | AdaBoost, Random forest |
Type of algorithm | Instance-based Learning | | k-nearest neighbor |
Type of algorithm | Regression (logistic) | | Logistic regression |
Type of algorithm | Regression (other) | | Linear regression |
Type of algorithm | Support Vector Machine | | Support vector machine |
Type of algorithm | Other | Algorithms not classified into one of the categories above | Reinforcement learning, Graphical models |
External validation | yes / no | Performance of algorithm tested on external data or against a human expert | Comparing automated scoring of chest radiographs with scoring by radiologists |
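
Two of the variables above are defined by explicit rules: the "Patient number" bins and the "Diverse" label for studies covering more than one rare disease. The following is a minimal sketch of how such rules could be applied when coding a study. The function names are hypothetical and are not taken from the review, and the handling of missing or invalid input is an assumption.

```python
from typing import Optional, Sequence


def patient_number_category(n: Optional[int]) -> str:
    """Assign a patient count to one of the 'Patient number' categories.

    Assumption: a missing count (None) maps to the
    'not applicable / no information' category.
    """
    if n is None:
        return "not applicable / no information"
    if n < 0:
        # Assumption: negative counts are treated as data-entry errors.
        raise ValueError("patient count cannot be negative")
    if n < 20:
        return "< 20"
    if n <= 99:
        return "20–99"
    if n <= 1000:
        return "100–1000"
    return "> 1000"


def rare_disease_label(diseases: Sequence[str]) -> str:
    """Return the disease name, or 'Diverse' if more than one disease is studied."""
    unique = set(diseases)
    if not unique:
        # Assumption: every included study names at least one rare disease.
        raise ValueError("at least one rare disease is required")
    if len(unique) > 1:
        return "Diverse"
    return next(iter(unique))
```

Under these bins a study with exactly 1000 patients falls into "100–1000", and only counts above 1000 fall into "> 1000".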