In this scoping review, we explored the scientific literature about machine learning methods used in the context of rare diseases. In particular, we investigated in which rare diseases and disease groups machine learning was typically applied, which types of algorithms and input data were used and which medical applications were studied.
Considering the large number of known rare diseases, the number of diseases investigated in the machine learning studies identified in this review was relatively small. The majority of diseases were in the highest prevalence class (1–5 / 10,000 patients), despite the search string including more diseases in the lower prevalence class (1–9 / 100,000 patients). Moreover, a large proportion of studies investigated a few relatively “common” or well-known rare diseases, such as amyotrophic lateral sclerosis, lupus or cystic fibrosis. This suggests that the pattern that applies to rare diseases in general also applies within the group of rare diseases: diseases with a comparatively high prevalence are investigated more frequently, whereas diseases with a lower prevalence are “orphans” that receive less attention. (However, note that our literature search might have missed some studies about diseases with a very low prevalence of 1–9 / 1,000,000 or lower, because these diseases were not explicitly included in the search string and could only be identified via the generic rare disease search strings.)
Our review also revealed some rare disease groups that were investigated more frequently than would be expected from their occurrence in the search string. For example, the number of studies investigating rare neurologic diseases, rare systemic or rheumatologic diseases, rare respiratory diseases, rare cardiac diseases and rare gastroenterologic diseases was higher than expected. This observation can partly be explained by the prevalence of the diseases within a disease group, i.e., disease groups containing more diseases with higher prevalence were investigated more frequently in the studies (as described in the previous paragraph). However, there were also disease groups – for example neurologic diseases – that were overrepresented in the studies despite containing more diseases with a lower prevalence. For these disease groups, the availability of data may play an important role: studies in many of the overrepresented disease groups work with imaging data (e.g., MRI data for neurologic diseases), which lend themselves particularly well to machine learning. Some disease groups may also appear more frequently because they are part of large medical disciplines (e.g., neurology, rheumatology, cardiology), which are not limited to rare conditions and can therefore draw on a large pool of existing research and methods.
There were also disease groups underrepresented in the studies. Most interestingly, our review did not identify any machine learning studies about rare skin diseases. This is surprising, as the diagnosis of skin conditions is often cited as one of the prime examples of successful machine learning applications in medicine [13, 27]. Developing machine learning applications for the diagnosis of rare skin conditions could therefore be a highly promising field of research. Similarly, rare inborn errors of metabolism and rare developmental defects during embryogenesis were also underrepresented in the studies and could possibly benefit from machine learning research – in particular because they constitute two of the most common groups of rare diseases.
Investigating typical algorithms, we identified ensemble methods, support vector machines and artificial neural networks as the algorithms most frequently used in the studies. Again, the choice of algorithms in the studies could be partly due to the data available to the algorithms. Images were identified as the most common type of input data, and the algorithms typically used in the studies (e.g., artificial neural networks) work well with this type of data. Moreover, image data (such as MRI, PET or CET) are acquired in large quantities in medical practice and can be processed in a relatively standardized way, thus providing a good data source for machine learning. The barrier to applying machine learning to other types of data, such as unstructured text data in medical records, is higher because these data are often not standardized and therefore more difficult to process. This highlights the importance of international health IT standards and medical terminologies that can improve interoperability and help to make medical data more accessible to machine learning. In the context of rare diseases, standard vocabularies such as SNOMED CT, the Orphanet rare disease nomenclature or the Human Phenotype Ontology (HPO) [31, 32] could particularly facilitate data interoperability.
Only a relatively small proportion of the studies in this review tested their algorithms on an external validation data set or validated performance against human experts. However, to facilitate translation of machine learning methods into clinical practice, appropriate validation is crucial. Machine learning studies should therefore aim to evaluate their performance on external data so that their potential for real-world application can be more easily assessed (of course, this applies to machine learning in general, not only in the context of rare diseases). Note that our review did not evaluate the performance of the machine learning algorithms, since the studies identified in this scoping review were too heterogeneous to perform meaningful comparisons across studies. To investigate algorithm performance, more specific systematic literature reviews and meta-analyses are needed (for example, focusing on specific diseases, input data or outcome variables).
Most studies identified in this review focused on diagnosis and prognosis of rare diseases. Considering that these are typical applications of machine learning (i.e., classification and prediction), this is not surprising. However, machine learning can also play an important role in improving the treatment of rare diseases, and future studies could focus more on this aspect, for example by using machine learning to accelerate drug development.
As would be expected in the context of rare diseases, the number of patients included in the studies was relatively small. Studies covered by comparable reviews of machine learning in more common diseases, for example in diabetes mellitus, cancer or coronary artery disease, have access to larger pools of patient data. This is important, as the performance of machine learning algorithms largely depends on the amount of data available for training the algorithms. The lack of sufficient training data could also explain why rare diseases with a higher prevalence were investigated more often than lower prevalence diseases. It is therefore important to further promote cross-institutional and international collaboration to create data sets sufficiently large for machine learning research.