  • Poster presentation
  • Open access
  • Published:

Characterization and classification of Rare Disease Registries by using exploratory data analyses

The European Commission and patients' associations identify registries as strategic instruments for improving knowledge in the field of Rare Diseases [1, 2]. Interoperability between Rare Disease Patient Registries (RDPR) is especially needed to support research activities, to validate therapeutic treatments and to plan public health actions. Because RDPR are extremely varied, a uniform and standardized way of collecting data is needed, as is the identification of specific levels of connection between RDPR with similar aims.

In this study, exploratory data analyses were applied to the EPIRARE (European Platform for Rare Diseases Registries) Registry Survey in order to produce a macro-classification and characterization of RDPR and to explore their different informational needs.

First, a Multiple Correspondence Analysis (MCA) suggested associations between selected variables characterizing the structure of RDPR (Figure 1). Then, a Cluster Analysis (CA) was performed on the declared "Aims" of each RDPR. The CA confirmed the variable associations that emerged from the MCA and identified three groups: Public Health Registries (PHR), Clinical-Genetic Research Registries (CGRR), and Treatment Registries (TR). Finally, the random forest (RF) method was applied to the survey data, yielding six classification models with good predictive power and thus supporting the division of RDPR into three groups. RF also identified several informative variables that characterize the three categories of RDPR, which differ in the nature of their data and in their level of diffusion (Table 1).
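The cluster analysis step can be illustrated with a minimal sketch. The abstract does not state which distance measure or linkage method was used, so this example assumes Jaccard distance between the sets of declared aims and average-linkage agglomerative clustering, stopped at three groups. The registry names, the aim labels and the helper functions (`jaccard_distance`, `average_linkage`, `agglomerative`) are hypothetical toy illustrations, not EPIRARE survey data or the authors' code.

```python
from itertools import combinations

# Hypothetical toy data: each registry is described by the set of aims it declares.
# These registries and aim labels are illustrative only, not EPIRARE survey data.
REGISTRIES = {
    "reg_A": {"epidemiology", "health_planning"},
    "reg_B": {"epidemiology", "health_planning", "surveillance"},
    "reg_C": {"genetic_research", "natural_history"},
    "reg_D": {"genetic_research", "natural_history", "clinical_research"},
    "reg_E": {"treatment_monitoring", "drug_safety"},
    "reg_F": {"treatment_monitoring", "drug_safety", "clinical_research"},
}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|: 0 for identical aim sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, data) -> float:
    """Mean pairwise Jaccard distance between the members of two clusters."""
    dists = [jaccard_distance(data[x], data[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def agglomerative(data, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([name]) for name in sorted(data)]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    for group in agglomerative(REGISTRIES, n_clusters=3):
        print(sorted(group))
```

With this toy input, registries that share most of their aims are merged first, so each of the three pairs ends up in its own group. In a real analysis the number of groups would be chosen from the dendrogram or a validity index rather than fixed in advance.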

Figure 1. Factorial plan by MCA.

Table 1. Main characteristics of Clinical-Genetic Research, Treatment, and Public Health Registries according to the most informative variables identified by the random forest method. The variables reported in the table characterize most of the registries in each class.

These results, which identify different RDPR profiles and specific informational needs, can inform the design of a European platform for Rare Diseases. Identifying informative cores could guide the activities of a platform able to enhance information sharing between RDPR with common aims and also to facilitate a coherent dialogue between RDPR with different profiles.

Guide to interpretation of Figure 1: the arrows indicate the directions of association among the aims; the size of each circle represents the frequency of the variable. The higher the coordinate and the frequency of a variable, the more it contributes to the interpretation of the factorial axis; variables placed in the same direction are correlated.
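The interpretation rule above, where a category's weight in reading an axis grows with both its coordinate and its frequency, matches the standard correspondence-analysis definition of a category's contribution to an axis, ctr_k = m_k f_k² / λ, with mass m_k = n_k / (n·Q). The abstract does not say which contribution formula was used, so the sketch below assumes this standard definition. The `Category` class, the `contributions` helper and the two binary variables with ten registries are hypothetical toy values, not values read from Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One category (modality) of an active categorical variable in an MCA."""
    label: str
    count: int         # n_k: number of registries in this category
    coordinate: float  # f_k: principal coordinate on the axis being interpreted


def contributions(categories, n_respondents, n_variables):
    """Return each category's share of the axis inertia: m_k * f_k**2 / lambda.

    The mass is m_k = n_k / (n * Q). The axis inertia lambda is computed as
    sum_k m_k * f_k**2, an identity that holds only when every category of
    every active variable is passed in.
    """
    masses = [c.count / (n_respondents * n_variables) for c in categories]
    weighted = [m * c.coordinate ** 2 for m, c in zip(masses, categories)]
    inertia = sum(weighted)
    return {c.label: w / inertia for c, w in zip(categories, weighted)}


# Hypothetical toy input: 10 registries, 2 binary variables (Q = 2).
# Coordinates are centred within each variable (sum of n_k * f_k is 0),
# as MCA principal coordinates are; the values are illustrative only.
TOY = [
    Category("aim_X=yes", 4, 1.5),
    Category("aim_X=no", 6, -1.0),
    Category("aim_Y=yes", 2, 2.0),
    Category("aim_Y=no", 8, -0.5),
]

if __name__ == "__main__":
    for label, share in contributions(TOY, n_respondents=10, n_variables=2).items():
        print(f"{label:10s} {share:.2f}")
```

With these toy values the shares are roughly 0.36, 0.24, 0.32 and 0.08: the rare but extreme category aim_Y=yes contributes nearly as much as the more frequent aim_X=yes, which is why coordinate and frequency have to be read together.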


  1. Commission of the European Communities: Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions on Rare Diseases: Europe's challenges. 2008, Brussels, COM(2008) 679 final. Available at:


  2. Council recommendation of 8 June 2009 on an action in the field of rare diseases. Official Journal of the European Union. 2009/C 151/02. Available at:



This work is part of the activities of EPIRARE, a 3-year project that started on April 15, 2011 (grant 2010 12 02) and is co-funded by the European Commission within the EU Programme on Health.

Author information



Corresponding author

Correspondence to Alessio Coi.

Additional information

Alessio Coi, Michele Santoro contributed equally to this work.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Coi, A., Santoro, M., Lipucci, M. et al. Characterization and classification of Rare Disease Registries by using exploratory data analyses. Orphanet J Rare Dis 9 (Suppl 1), P4 (2014).

