Table 1 Data extracted from the studies

From: The use of machine learning in rare diseases: a scoping review

| Variable | Categories | Definition | Example(s) |
| --- | --- | --- | --- |
| Rare disease | All rare diseases described at least once in the studies (studies investigating more than one rare disease were categorized as “Diverse”) | Orphanet disorder name | Cystic fibrosis, Sickle cell anemia, Gaucher disease |
| Disease group | All disease groups of the 381 specific diseases included in the search as well as disease groups of other diseases identified in the studies | Orphanet disease group as defined by the preferential parent in the classification hierarchy | Rare neurologic disease, Rare respiratory disease, Rare endocrine disease |
| Publication year | Years from 2010 to 2019 | Year of the publication date of the article | |
| Country of study | All countries that published at least one article | Country of institution of senior (i.e. last) author of the study | |
| Medical application | Diagnosis | Studies aiming to correctly diagnose patients | Classification of cases and controls or different disease subtypes, Identification of biomarkers, Deep phenotyping, Decision support |
| | Treatment | Studies aiming to improve treatment or develop new therapies | Detection of therapeutic targets, Identification of binding proteins |
| | Prognosis | Prediction of a patient-relevant endpoint | Prediction of complication, disease onset, survival, disease progression, Risk estimation |
| | Basic research | Other basic research not classified into one of the categories above | Exploration of molecular disease mechanisms |
| Patient number | “< 20”, “20–99”, “100–1000”, “> 1000”, “not applicable / no information” | Number of patients included in the study | |
| Input data^a | Clinical test score | Data from a clinical test score | Glasgow Coma Scale, ALS Functional Rating Scale |
| | Demographic data | General patient characteristics | Age, Sex, Ethnicity |
| | Functional test data | Data from physiological tests | ECG, EEG, EMG, gait pattern, pulse, blood pressure, eye movements |
| | Images | Data from medical imaging | MRI, PET, CT, retinal images, face photographs |
| | Laboratory data | Data from laboratory test | Blood glucose, platelet counts, creatinine |
| | Literature | Data extracted from scientific texts | Published literature, NCBI disease corpus |
| | Medication data | Data about medication | Use of antibiotics, medication plan |
| | Omics data | Molecular data | Genomics, Proteomics, Metabolomics, Epigenomics |
| | Patient / Family history | Data from patients’ or relatives’ past medical history | Pre-existing conditions, parental data |
| | Other EHR data | Other data from electronic health records | Diagnoses, procedures, other medical records |
| | Other | Other types of input data | Questionnaire or interview data, donors’ characteristics in HSCT |
| Type of algorithm^a | Artificial Neural Network | | Convolutional neural network, Recurrent neural network, Multi-layer perceptron |
| | Bayesian Methods | | Naïve Bayes |
| | Clustering | | k-means clustering, Hierarchical clustering |
| | Decision Tree | | Decision tree |
| | Discriminant Analysis | | Linear discriminant analysis |
| | Ensemble Methods | | AdaBoost, Random forest |
| | Instance-based Learning | | k-nearest neighbor |
| | Regression (logistic) | | Logistic regression |
| | Regression (other) | | Linear regression |
| | Support Vector Machine | | Support vector machine |
| | Other | Algorithms not classified into one of the categories above | Reinforcement learning, Graphical models |
| External validation | yes / no | Performance of algorithm tested on external data or against a human expert | Comparing automated scoring of chest radiographs with scoring by radiologists |
^a For these variables, a study could be assigned to more than one category.