

  • Poster presentation
  • Open Access

Characterization and classification of Rare Disease Registries by using exploratory data analyses

Orphanet Journal of Rare Diseases 2014, 9(Suppl 1):P4



  • Random Forest
  • Rare Disease
  • Treatment Registry
  • Informative Support
  • Exploratory Data Analysis

The European Commission and patient associations identify registries as strategic instruments for improving knowledge in the field of Rare Diseases [1, 2]. Interoperability between Rare Diseases Patient Registries (RDPR) is especially needed to support research activities, to validate therapeutic treatments and to plan public health actions. Because RDPR are extremely varied, a uniform and standardized way of collecting data is needed, as is the identification of specific levels of connection between RDPR with similar aims.

In this study, exploratory data analyses were applied to the EPIRARE (European Platform for Rare Diseases Registries) Registry Survey in order to generate a macro-classification and characterization of RDPR and to examine their different information needs.

First, a Multiple Correspondence Analysis (MCA) suggested associations between selected variables characterizing the structure of RDPR (Figure 1). Then, a cluster analysis (CA) was performed on the declared “Aims” of each RDPR. CA confirmed the variable associations that emerged from the MCA and identified three groups: Public Health Registries (PHR), Clinical-Genetic Research Registries (CGRR), and Treatment Registries (TR). Finally, the random forest (RF) method was applied to the Survey data, yielding six classification models with good predictive power and thus supporting the division of RDPR into three groups. RF also identified several informative variables that characterize the three categories of RDPR, which are defined by data of different nature and by different levels of diffusion (Table 1).
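The cluster-then-classify part of this workflow can be illustrated with a minimal sketch. It does not reproduce the authors' analysis: the data are synthetic, the variable names and probabilities are hypothetical, Ward hierarchical clustering is an assumed choice (the abstract does not name the clustering algorithm), and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier


def classify_registries(n_registries=150, seed=0):
    """Cluster synthetic registries on declared aims, then fit a random forest."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 3, size=n_registries)  # hidden synthetic profile

    # Hypothetical probabilities that a registry of each profile declares an aim
    # (columns: epidemiology, surveillance, treatment evaluation,
    #  treatment monitoring, clinical research, natural history).
    aim_probs = np.array([
        [0.9, 0.9, 0.1, 0.1, 0.2, 0.2],  # public-health-like profile
        [0.2, 0.1, 0.9, 0.9, 0.3, 0.2],  # treatment-like profile
        [0.2, 0.1, 0.1, 0.2, 0.9, 0.8],  # clinical-genetic-like profile
    ])
    aims = (rng.random((n_registries, 6)) < aim_probs[group]).astype(int)

    # Step 1: group registries using only their declared aims.
    clusters = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(aims)

    # Step 2: other (synthetic) survey variables; two are pure noise.
    names = ["genetic_data", "medications", "death_date", "noise_1", "noise_2"]
    var_probs = np.array([
        [0.2, 0.3, 0.3, 0.5, 0.5],
        [0.2, 0.9, 0.8, 0.5, 0.5],
        [0.9, 0.2, 0.2, 0.5, 0.5],
    ])
    other = (rng.random((n_registries, 5)) < var_probs[group]).astype(int)

    # Step 3: predict the cluster label from the other variables and
    # rank the variables by impurity-based importance.
    forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=seed)
    forest.fit(other, clusters)
    ranking = sorted(zip(names, forest.feature_importances_), key=lambda pair: -pair[1])
    return clusters, forest, ranking


clusters, forest, ranking = classify_registries()
print("out-of-bag accuracy:", round(forest.oob_score_, 2))
for name, importance in ranking:
    print(f"{name:>14s}  {importance:.3f}")
```

The out-of-bag accuracy plays the role of the "predictive power" mentioned above, and the importance ranking corresponds to the "informative variables". The abstract reports six classification models; this sketch fits only one.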
Figure 1. Factorial plane from the MCA.

Table 1. Main characteristics of Clinical-Genetic Research, Treatment, and Public Health Registries according to the most informative variables identified by the random forest method. The variables reported in the table characterize most of the registries in each class.


Public Health Registries

Treatment Registries

Clinical-Genetic Research Registries


- epidemiologic research

- disease surveillance

- healthcare services planning

- treatment evaluation

- treatment monitoring

- clinical research

- genetic

- natural history of the disease

Collected data


- clinical

- medications, devices and health services

- genetic

- family history

- date of the patient death

- patient-reported outcomes

- anthropometric info

- clinical

- genetic

- family history

Coding system


No coding system or own code

No coding system or own code

Services requested from an EU platform

“Quality control systems”

“Facilitated access to useful data sources”

“Model documents”

These results identify different profiles of RDPR and their specific information needs, and provide informative support for the design of a European platform for Rare Diseases. Identifying informative cores could guide the activities of a platform able to enhance the sharing of information between RDPR with common aims, and also to facilitate a coherent dialogue between RDPR with different profiles.

Guide to interpretation of Figure 1: the arrows indicate the directions of association among the aims; the size of each circle represents the frequency of the variable. The higher the coordinate and the frequency of a variable, the more it contributes to the interpretation of the factorial axis; variables lying in the same direction are correlated.
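The link between coordinate, frequency and contribution can be made concrete with a small numerical sketch. It assumes the standard formulation of MCA as a correspondence analysis of the indicator matrix, in which the contribution of a category to an axis is its mass (relative frequency) times its squared coordinate, divided by the eigenvalue of that axis. The data and variable names are synthetic and hypothetical, and the abstract does not state which software or MCA variant was used.

```python
import numpy as np


def mca_column_contributions(indicator, n_axes=2):
    """Category coordinates and contributions from an indicator matrix."""
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                     # correspondence matrix
    r = P.sum(axis=1)                   # row masses
    c = P.sum(axis=0)                   # column masses (relative frequencies)
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sing, Vt = np.linalg.svd(S, full_matrices=False)
    sing = sing[:n_axes]
    V = Vt[:n_axes].T
    coords = V * sing / np.sqrt(c)[:, None]        # principal coordinates
    contrib = c[:, None] * coords**2 / sing**2     # mass * coord^2 / eigenvalue
    return coords, contrib, c


# Synthetic yes/no answers for three hypothetical aims of 40 registries.
rng = np.random.default_rng(1)
a = rng.random(40) < 0.5
b = a ^ (rng.random(40) < 0.1)          # mostly agrees with a
d = rng.random(40) < 0.3                # unrelated to a and b
Z = np.column_stack([a, ~a, b, ~b, d, ~d]).astype(int)
labels = ["a_yes", "a_no", "b_yes", "b_no", "d_yes", "d_no"]

coords, contrib, mass = mca_column_contributions(Z)
for label, m, xy, ctr in zip(labels, mass, coords, contrib):
    print(f"{label:>6s}  mass={m:.3f}  axis1={xy[0]:+.2f}  contribution1={ctr[0]:.3f}")
```

Within each axis the contributions sum to 1, so a category far from the origin and with a large mass accounts for a larger share of that axis.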




This work is part of the activities of EPIRARE, a 3-year project started on April 15, 2011 (grant 2010 12 02) and co-funded by the European Commission within the EU Programme on Health.

Authors’ Affiliations

Fondazione Toscana Gabriele Monasterio, Pisa, Italy
Institute of Clinical Physiology, National Council of Research, Pisa, Italy
European Organisation for Rare Diseases (EURORDIS), Paris, France
Department of Pharmacy, University of Pisa, Italy
National Centre for Rare Diseases, National Institute for Health, Rome, Italy


  1. Commission of the European Communities: Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions on Rare Diseases: Europe's challenges. 2008, Brussels, COM(2008) 679 final.
  2. Council recommendation of 8 June 2009 on an action in the field of rare diseases. Official Journal of the European Union. 2009/C 151/02.


© Coi et al; licensee BioMed Central Ltd. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.