
Table 1 Data extracted from the studies

From: The use of machine learning in rare diseases: a scoping review

| Variable | Categories | Definition | Example(s) |
|---|---|---|---|
| Rare disease | All rare diseases described at least once in the studies (studies investigating more than one rare disease were categorized as “Diverse”) | Orphanet disorder name | Cystic fibrosis, Sickle cell anemia, Gaucher disease |
| Disease group | All disease groups of the 381 specific diseases included in the search, as well as disease groups of other diseases identified in the studies | Orphanet disease group as defined by the preferential parent in the classification hierarchy | Rare neurologic disease, Rare respiratory disease, Rare endocrine disease |
| Publication year | Years from 2010 to 2019 | Year of the publication date of the article | |
| Country of study | All countries that published at least one article | Country of the institution of the senior (i.e., last) author of the study | |
| Medical application | Diagnosis | Studies aiming to correctly diagnose patients | Classification of cases and controls or of different disease subtypes, Identification of biomarkers, Deep phenotyping, Decision support |
| | Treatment | Studies aiming to improve treatment or develop new therapies | Detection of therapeutic targets, Identification of binding proteins |
| | Prognosis | Prediction of a patient-relevant endpoint | Prediction of complications, disease onset, survival, or disease progression; Risk estimation |
| | Basic research | Other basic research not classified into one of the categories above | Exploration of molecular disease mechanisms |
| Patient number | “< 20”, “20–99”, “100–1000”, “> 1000”, “not applicable / no information” | Number of patients included in the study | |
| Input dataᵃ | Clinical test score | Data from a clinical test score | Glasgow Coma Scale, ALS Functional Rating Scale |
| | Demographic data | General patient characteristics | Age, Sex, Ethnicity |
| | Functional test data | Data from physiological tests | ECG, EEG, EMG, gait pattern, pulse, blood pressure, eye movements |
| | Images | Data from medical imaging | MRI, PET, CT, retinal images, face photographs |
| | Laboratory data | Data from laboratory tests | Blood glucose, platelet counts, creatinine |
| | Literature | Data extracted from scientific texts | Published literature, NCBI disease corpus |
| | Medication data | Data about medication | Use of antibiotics, medication plan |
| | Omics data | Molecular data | Genomics, Proteomics, Metabolomics, Epigenomics |
| | Patient / Family history | Data from patients’ or relatives’ past medical history | Pre-existing conditions, parental data |
| | Other EHR data | Other data from electronic health records | Diagnoses, procedures, other medical records |
| | Other | Other types of input data | Questionnaire or interview data, donors’ characteristics in HSCT |
| Type of algorithmᵃ | Artificial Neural Network | | Convolutional neural network, Recurrent neural network, Multi-layer perceptron |
| | Bayesian Methods | | Naïve Bayes |
| | Clustering | | k-means clustering, Hierarchical clustering |
| | Decision Tree | | Decision tree |
| | Discriminant Analysis | | Linear discriminant analysis |
| | Ensemble Methods | | AdaBoost, Random forest |
| | Instance-based Learning | | k-nearest neighbor |
| | Regression (logistic) | | Logistic regression |
| | Regression (other) | | Linear regression |
| | Support Vector Machine | | Support vector machine |
| | Other | Algorithms not classified into one of the categories above | Reinforcement learning, Graphical models |
| External validation | yes / no | Performance of the algorithm tested on external data or against a human expert | Comparing automated scoring of chest radiographs with scoring by radiologists |

ᵃ For these variables, a study could be assigned to more than one category.
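The "Patient number" variable bins each study's reported sample size into fixed ranges. Below is a minimal Python sketch of that binning. It assumes the ranges are inclusive as written, so 99 falls in "20–99" and 1000 in "100–1000". It also assumes that a study with no reported or applicable patient count is passed as `None`. The helper name `patient_number_category` is hypothetical and is not taken from the review.

```python
from typing import Optional

NOT_APPLICABLE = "not applicable / no information"


def patient_number_category(n: Optional[int]) -> str:
    """Map a reported patient count to a "Patient number" category of Table 1.

    Hypothetical helper. Assumes `None` means no patient count was reported
    or none applies (e.g. a literature-only study).
    """
    if n is None:
        return NOT_APPLICABLE
    if n < 0:
        raise ValueError("patient count cannot be negative")
    if n < 20:
        return "< 20"
    if n <= 99:
        return "20–99"
    if n <= 1000:
        return "100–1000"
    return "> 1000"


# Boundary checks: each range edge lands in the category the table implies.
for count in (None, 19, 20, 99, 100, 1000, 1001):
    print(count, "->", patient_number_category(count))
```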
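Per the table footnote, a study could be assigned to more than one category of "Input data" and "Type of algorithm". Category counts for these variables can therefore add up to more than the number of studies. The sketch below illustrates this kind of multi-label tallying. It assumes each study's coding is stored as a set of category names. The study records and the helper names `tally_categories` and `category_shares` are hypothetical. Computing shares relative to the number of studies is also an assumption, not necessarily how the review reported its figures.

```python
from collections import Counter
from typing import Dict, List, Set


def tally_categories(assignments: List[Set[str]]) -> Counter:
    """Count studies per category for a multi-category variable.

    Each set holds the categories coded for one study; because it is a set,
    a study contributes at most once to any given category.
    """
    counts: Counter = Counter()
    for categories in assignments:
        counts.update(categories)
    return counts


def category_shares(assignments: List[Set[str]]) -> Dict[str, float]:
    """Share of studies in each category (assumed denominator: number of studies).

    Shares can sum to more than 1 because a study may carry several categories.
    """
    n_studies = len(assignments)
    if n_studies == 0:
        return {}
    return {cat: c / n_studies for cat, c in tally_categories(assignments).items()}


# Hypothetical "Input data" codings for three studies.
studies = [
    {"Images", "Demographic data"},
    {"Omics data"},
    {"Images", "Laboratory data", "Demographic data"},
]
counts = tally_categories(studies)
print(counts["Images"], sum(counts.values()), len(studies))  # 2 6 3
```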