Identification of risk features for complication in Gaucher’s disease patients: a machine learning analysis of the Spanish registry of Gaucher disease
Orphanet Journal of Rare Diseases volume 15, Article number: 256 (2020)
Since enzyme replacement therapy for Gaucher disease (MIM#230800) became available, both awareness of the disease and its natural history have changed. However, there remain unmet needs such as the identification of patients at risk of developing bone crisis during therapy and late complications such as cancer or parkinsonism. The Spanish Gaucher Disease Registry has worked since 1993 to compile demographic, clinical, genetic, analytical, imaging and follow-up data from more than 400 patients. The aims of this study were to discover correlations between patients’ characteristics at diagnosis and to identify risk features for the development of late complications; to this end, a machine learning approach involving correlation network and decision tree analyses was applied.
A total of 358 patients, 340 type 1 Gaucher disease and 18 type 3 cases, were selected. 18% were splenectomized and 39% had advanced bone disease. 81% of cases carried a heterozygous genotype. 47% of them were diagnosed before the year 2000. Mean ages at diagnosis and at start of therapy were 28 and 31.5 years old (y.o.), respectively. 4% developed monoclonal gammopathy of undetermined significance or Parkinson disease, 6% cancer, and 10% died before this study. Previous splenectomy correlates with the development of skeletal complications and severe bone disease (p = 0.005); serum levels of IgA and delayed age at start of therapy (> 9.5 y.o. since diagnosis) also correlate with severe bone disease at diagnosis and with the incidence of bone crisis during therapy. High IgG levels (> 1750 mg/dL) and age over 60 y.o. at diagnosis were found to be related to the development of cancer. When modelling the decision tree, patients with a delayed diagnosis and therapy were the most severe cases and had a higher risk of complications.
Our work confirms previous observations, highlights the importance of early diagnosis and therapy and identifies new risk features such as high IgA and IgG levels for long-term complications.
Gaucher Disease (GD) (MIM#230800; MIM#231000; MIM#230900) is the most common lysosomal storage disorder (LSD) [1, 2]; some of the most common problems for GD patients are difficulty in diagnosis, appearance of complications, variability in the intensity of symptoms and absence of curative treatments, with decreased quality of life [4, 5]. The clinical characteristics of GD are well established, but there remains a lack of information due to the singularity of the cases. Also, it has been impossible to define a complete phenotype-genotype correlation [7,8,9] or to create a prognosis model for complications. For these reasons, different institutions, research groups, and pharmaceutical companies have created registries, allowing a continuous improvement in the knowledge of the disease [10,11,12].
GD has a pan-ethnic distribution with cases described worldwide. Outside the Ashkenazi Jewish population, incidence ranges from 1 in 40,000 to 1 in 100,000 inhabitants; however, in the Ashkenazi population a higher incidence has been found (1 in 2500) and GD is not considered a rare disease. Three types have been described: type-1 GD (GD1; MIM#230800), or the non-neuronopathic GD form, is the most common in western countries and is characterized by the absence of primary involvement of the central nervous system; type-2 GD (GD2; MIM#230900) is the acute neuronopathic form, with very severe cases, all of them with a short lifespan of less than 2 years; and type-3 GD (GD3; MIM#231000), or the juvenile/adult neuronopathic form, first described in 1959, is characterized by neurological involvement and also by involvement of other organs such as the lungs and cardiac valves, and by kyphosis, among other manifestations [15,16,17,18].
The application of Enzymatic Replacement Therapy (ERT), which began in 1991, has significantly improved awareness of the disease, and has changed the characteristics and expectations of patients as well as the experience of everyone involved in GD management. Nowadays, ERT offers a safe therapy for GD patients, with 3 different enzymes available worldwide, two of them in Europe (imiglucerase, Sanofi-Genzyme, and velaglucerase alfa, Takeda Pharmaceuticals); taliglucerase alfa, a plant cell-expressed enzyme, has not so far been approved in the EU [19,20,21,22]. Since 2004, Substrate Reduction Therapy (SRT) has been developed for GD treatment, first with an iminosugar (miglustat, Actelion Pharmaceuticals) and more recently with a ceramide mimetic (eliglustat tartrate, Sanofi-Genzyme) [23,24,25], expanding the therapeutic options for GD patients. However, there is still a need to develop means of identifying the small number of patients who are at risk of bone crisis while receiving ERT, as well as those who are at risk of developing late complications such as cancer or parkinsonism.
The Spanish Gaucher Disease Registry (SGDR) has worked since 1993 to compile demographic, clinical, genetic, analytical and imaging data of Spanish GD patients (currently numbering 361 GD1, 36 GD2, and 21 GD3). The Registry has allowed us to calculate GD prevalence in Spain (about 1/100,000 inhabitants) and to identify the distribution of GBA (MIM*606463) variants in the population [12, 18].
In recent decades, the explosion of all kinds of data has driven the use of different big data and machine learning techniques for many applications in the healthcare and bioinformatics fields (several reviews can be found in references [26,27,28]). In particular, the application of computational tools and correlation network techniques to the analysis of data can provide new insights into the relationships among different variables and between those variables and the disease, as well as informative and descriptive visualizations [28, 29]. The main objective of this project is to identify new correlations among the patient characteristics and to make a first approximation to the development of prediction models for the risk of late complications.
Patients and methods
Since the establishment of the SGDR, coordinated by the Fundación Española para el Estudio y Terapéutica de la Enfermedad de Gaucher y otras lisosomales (FEETEG), a total of 418 GD patients have been reported in Spain. All patients included in the SGDR provided informed consent for the collection and use of the information and biological samples for research projects, in accordance with the Helsinki declaration of 1964, revised in October 2013, and with European Regulation 2016/679 on the protection of personal data and the free movement of such data. For this study, the FEETEG ethics and scientific boards gave their approval.
All the registered patients were included except those diagnosed with GD2 and those who had less than 70% of baseline data available (Table 1). Of 418 patients in the SGDR, 358 (85.6%) were analysed.
In collaboration with Kampal Data Solutions, demographic, clinical, analytical and imaging data at diagnosis and comorbidities during the follow-up were evaluated (Table 1).
Variables: birthdate, age at diagnosis, gender, concomitant diseases, family history of Parkinson disease (PD), death date, severity category of disease according to the Gaucher Disease Severity Score System category (GD-DS3) (mild, moderate, severe), liver size, spleen size, spleen removal, previous bone crisis and degree of bone disease according to the Spanish magnetic resonance image score (S-MRI) (mild: 0–4; moderate: 5–8; severe: > 9), bone mineral density (DEXA), GD biomarkers (chitotriosidase activity (ChT), CCL18/PARC and glucosylsphingosine (GluSph) concentrations), vitamin B12 level, iron concentration, serum ferritin, cholesterol, triglycerides, high density lipoprotein cholesterol (HDL), low density lipoprotein cholesterol (LDL), aspartate transaminase (AST), alanine transaminase (ALT), gamma-glutamyl transferase (GGT), acid phosphatase, bilirubin, hemoglobin concentration, white blood cell (WBC) count, platelet count, serum gammaglobulin fraction, serum immunoglobulin (IgG, IgA, IgM) concentrations, glucocerebrosidase (GCase) activity, GBA genotype (NM_000157), presence or absence of the variant NM_0003465:c.1049_1072dup24 in CHIT1, age at start of therapy, type of therapy (enzyme replacement therapy (ERT), substrate reduction therapy (SRT) or no therapy), new bone crisis or joint replacement, and development of malignancies or PD, collected over a follow-up period of 5 to 25 years.
The target conditions for which the analysis sought correlations were the presence of severe bone disease at diagnosis, the development of bone crisis during follow-up, and the development of neoplasia or PD.
The statistical analysis of the data was performed in two parts.
Baseline data analysis
A descriptive analysis was performed by splitting the variables between numerical and categorical. To establish correlation, Pearson, Chi-Square, Mann-Whitney and Mann-Whitney normalized tests were used.
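The text does not define the "normalized" Mann-Whitney statistic reported later as Un. A minimal sketch, assuming the common convention of dividing U by the product of the two group sizes (so that 0.5 means no separation between groups and values near 0 or 1 mean strong separation), could look as follows. The function name and the IgA values are hypothetical and are not taken from the registry.

```python
def normalized_mann_whitney_u(group_a, group_b):
    """Return U_a / (n_a * n_b), counting ties as 0.5.

    Assumption: this is one common way to normalize U; the paper does not
    state which normalization was used.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u / (len(group_a) * len(group_b))


# Hypothetical IgA concentrations (mg/dL), for illustration only.
with_condition = [410.0, 380.0, 295.0, 450.0]
without_condition = [210.0, 250.0, 300.0, 190.0, 230.0]
u_n = normalized_mann_whitney_u(with_condition, without_condition)
print(round(u_n, 2))
```

Under this convention the statistic can be read as the estimated probability that a randomly chosen patient with the condition has a higher value than a randomly chosen patient without it.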
Based on the results of the first step and the correlation between the different variables, we proceeded to the development of a predictive model using decision trees.
To implement the models, a training and a validation cohort were used, with application of the cross-validation technique. This allowed us to offer an estimate of errors. The models were built only with GD1 patients; GD3 patients were excluded because they can die prematurely due to the severity of their disease. Standard quality metrics such as test sample size, accuracy, sensitivity, specificity, odds ratio (OR), positive predictive value (PPV), true positives (TP), true negatives (TN), false positives (FP), false negatives (FN) and area under the receiver operating characteristic curve (AUC) were calculated. Preprocessing, data analysis and modelling were carried out in the R programming language (version 3.6.2), using, among others, the following packages: car, ggplot2, vcd, GGally, plyr, igraph, rpart, dplyr [32,33,34].
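As an illustration of how the quality metrics listed above follow from the four confusion-matrix counts, and of how a k-fold split partitions a cohort, the sketch below uses plain Python rather than the R packages the authors report. The function names, the counts and the fold sizes are hypothetical; the number of folds used in the study is not stated.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute standard metrics from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "odds_ratio": ratio(tp * tn, fp * fn),
    }


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical counts for one validation fold (not taken from the study).
metrics = classification_metrics(tp=30, tn=50, fp=10, fn=10)
folds = list(k_fold_indices(n_samples=10, k=3))
print(metrics)
print([test for _, test in folds])
```

In practice the sample order would usually be shuffled before splitting, and the metrics would be averaged over the folds to obtain the error estimate.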
Most patients were GD1 (337 GD1; 94.4%) and the rest were GD3 (21, 5.6%). The most frequent GBA genotype was complex heterozygosity (290; 81.0%), with the most common variant being NM_000157:c.1226A > G (353/716 alleles; 49.3%). Forty-seven GD1 patients (13.9%) were homozygous for c.1226A > G, and 9 GD3 patients (42.8%) were homozygous for c.1448T > C. Diagnosis was made before the year 2000 for 168 patients (46.9%), and 36 (10.1%) died before this study. Most patients (193, 53.9%) were treated with ERT. At diagnosis, 65 patients (18.2%) were splenectomized, and 139 (38.89%) had advanced bone disease with bone complications. Regarding comorbidities, 14 (4.1%) GD1 patients developed monoclonal gammopathy of undetermined significance (MGUS), another 14 (4.1%) suffered PD, and 20 (5.6%) developed malignant neoplasia (Table 2).
Correlations between numerical variables
A detailed correlation between the numerical variables (Table S1) and categorical variables (Table S2) can be found in the supplemental material. A graph was constructed to provide a representation of how the different variables are related to each other, not only in pairs but in a global way (Fig. 1). In this graph, the nodes are the different variables and a link is established between two of them if the correlation (Pearson’s r) calculated between them is statistically significant (p ≤ 0.05). The weight of the link is equal to the correlation between the two variables. The layout algorithm used to draw the graph tries to place nodes joined by stronger links closer together, while unrelated nodes are placed further apart. The highest correlation was found between the age at diagnosis and the age at the start of treatment. The statistical analysis was performed in order to establish correlations among all variables; however, some correlations need to be interpreted carefully and individually, in particular those involving baseline characteristics and variables such as the age at diagnosis, the time from diagnosis to therapy, and the time on therapy. In this respect, for example, some patients did not start therapy because ERT was not available, and as a consequence their age at therapy correlates with the delay of therapy.
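The construction of the correlation graph described above can be sketched as an edge list: one node per variable, and one weighted edge per pair whose Pearson correlation is significant at p ≤ 0.05. The sketch below is a simplified stand-in, not the authors' igraph code: it estimates the p-value with a permutation test (a swapped-in technique, chosen so that only the Python standard library is needed), and all variable names and values are hypothetical.

```python
import itertools
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r| (an assumption:
    the paper does not state how its p-values were computed)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def correlation_network(columns, alpha=0.05):
    """Return (node_a, node_b, weight) edges for significant correlations."""
    edges = []
    for a, b in itertools.combinations(columns, 2):
        if permutation_p_value(columns[a], columns[b]) <= alpha:
            edges.append((a, b, pearson_r(columns[a], columns[b])))
    return edges


# Hypothetical toy data for eight patients (not registry values).
columns = {
    "age_at_diagnosis": [1, 2, 3, 4, 5, 6, 7, 8],
    "age_at_therapy": [2, 3, 5, 4, 6, 8, 7, 9],
    "unrelated_marker": [5, 1, 4, 8, 8, 4, 1, 5],
}
edges = correlation_network(columns)
print(edges)
```

The resulting edge list can be handed to any graph library; a force-directed layout then places strongly linked variables close together, as in the figure described above.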
Correlation between categorical variables
Table S2 of the supplemental material shows the significance of the correlation between the categorical variables. There was a high correlation between spleen removal and the presence of bone disease (χ2n = 10.87, p < 0.01) and repeated bone crises (χ2n = 15.93, p < 0.01). Almost all of the patients who suffered new bone crises had previous bone lesions (χ2n = 30.47, p < 0.01).
Family history of PD and GBA genotypes other than homozygous NM_000157.4:c.1226A > G were the variables related to the development of PD (χ2n = 4.58, p < 0.01 for the correlation between the presence or absence of PD and the set of 11 different genotypes).
The last statistically significant correlation between categorical variables was that between cancer development (not only hematological) and spleen removal (χ2n = 3.80, p = 0.05) (Fig. 2).
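For two binary variables such as spleen removal and cancer development, a chi-square test of independence can be computed directly from the 2x2 table of counts. The sketch below shows the plain Pearson chi-square statistic (one degree of freedom, no continuity correction); the normalization behind the χ2n values reported above is not specified in the text, so this is an assumption and not a reproduction of the authors' computation. The counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). For one degree of freedom the upper-tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = splenectomy yes/no, columns = condition yes/no.
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
print(round(chi2, 2), round(p_value, 4))
```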
Correlation between numerical variables and conditions
To establish correlations between the numerical variables and the presence of conditions such as severe bone disease, repeated bone crises, spleen removal, Parkinson disease and neoplasia, the normalized Mann-Whitney test was used; for this, two levels were established: level 1, the absence of the condition, and level 2, the presence of the condition (Table S3, supplemental material).
The numerical variables that showed the greatest relevance for severe bone disease at diagnosis were the S-MRI (Un = 0.98, p < 0.01) and IgA levels (Un = 0.93, p = 0.01) (Table S3, supplemental material). Nevertheless, other variables presented relatively high correlations and low p-values, such as high levels of ferritin (Un = 0.85, p = 0.06), triglycerides (Un = 0.75, p < 0.01), delayed age at diagnosis (> 9.5 y.o.) (p < 0.001), time in years between diagnosis and the start of treatment (Un = 0.67, p = 0.01) and delayed age at initiation of ERT (Un = 0.61, p = 0.01) (Fig. 3A1).
The same happens with the appearance of successive bone crises during ERT. The variables that correlate and have greater significance are S-MRI (Un = 0.92, p < 0.01), high IgA levels (Un = 0.91, p = 0.08), and delayed age of initiation of ERT (Un = 0.51, p = 0.07) (Fig. 3B1).
Neoplasia and Parkinson’s disease
High levels of IgG (Un = 0.91, p = 0.01) and time delays before the start of therapy (mean age at diagnosis: 28.1 y.o. (0.5–87); mean age at start of therapy: 31.5 y.o. (1–83); mean delay time: 7.8 years (0–46)) (Un = 0.70, p < 0.01) were related to the development of neoplasia (Fig. 4A).
In relation to the occurrence of PD, the numerical variables that showed a significant correlation were elevated ferritin levels (Un = 0.92, p = 0.04) and age at diagnosis (Un = 0.45, p = 0.01); in this last correlation the age of PD onset probably has more weight (Fig. 4B). The significant correlations are presented in Table S3 of the supplemental material.
Correlation between categorical variables
All correlations observed between categorical variables are shown in Fig. 2.
A high correlation was found between spleen removal and both severe bone disease and repeated bone crises (p = 0.0001). Almost all of the patients who suffered new bone crises had previous bone lesions (p = 0.0005) in spite of long-term ERT exposure.
A family history of PD and GBA genotypes other than homozygous NM_000157.4:c.1226A > G were also found to be statistically associated with PD development (p < 0.01).
The last statistically significant correlation between categorical variables was that between cancer development and spleen removal.
Generation of predictive models for complications by means of decision trees
Decision trees show the best prediction for the development of severe bone disease in patients with an S-MRI > 2.5 who started therapy after 9.5 years; 87% of patients with these characteristics developed severe bone disease (Fig. 5).
For neoplasia, a higher risk was found when IgG > 1725 mg/dL and age of diagnosis > 60 y.o.
In the case of PD, it was not possible to design a tree that improves the prediction of the risk of disease development, because the percentage of affected patients was very small. However, an important correlation was observed with the GBA genotype (p < 0.01) and also with the existence of relatives with PD (p = 0.08), although the latter was not statistically significant. In the supplemental material, Tables S3-S4 show the statistical significance of the correlations among the selected variables used for the algorithms.
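The branches reported above for severe bone disease and neoplasia can be read as simple threshold rules. The sketch below only encodes those published cut-offs as a reading aid; it is not the fitted rpart model. The function and parameter names are hypothetical, the 9.5-year threshold is interpreted here as the delay between diagnosis and therapy (an assumption, since the text is ambiguous on this point), and the IgG cut-off uses the 1725 mg/dL value given in this section.

```python
def severe_bone_disease_flag(s_mri_score, years_from_diagnosis_to_therapy,
                             s_mri_cutoff=2.5, delay_cutoff_years=9.5):
    """True when both reported thresholds for severe bone disease are exceeded."""
    return (s_mri_score > s_mri_cutoff
            and years_from_diagnosis_to_therapy > delay_cutoff_years)


def neoplasia_flag(igg_mg_dl, age_at_diagnosis,
                   igg_cutoff=1725.0, age_cutoff=60.0):
    """True when both reported thresholds for neoplasia risk are exceeded."""
    return igg_mg_dl > igg_cutoff and age_at_diagnosis > age_cutoff


# Hypothetical patients, for illustration only.
print(severe_bone_disease_flag(s_mri_score=3, years_from_diagnosis_to_therapy=12))
print(neoplasia_flag(igg_mg_dl=1800, age_at_diagnosis=40))
```

A flag of this kind marks membership of the higher-risk leaf only; it says nothing about the predicted probability within that leaf.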
Large-scale data (big data) analysis, in our case applied to a rare disease, has been adapted to the number of cases available. This kind of analysis is a new tool that has recently been incorporated into biomedical activity. Machine learning is the study of computer algorithms that improve automatically through experience; it involves a wide range of algorithms, including classification and regression models such as decision trees [26,27,28,29]. This methodology is especially useful for obtaining pooled information on the diversity of outcomes and identifying prognostic factors potentially related to disease complications. In rare disease research, this is of particular interest due to the scarcity of the data and their dispersion among different centers. Various approaches have been applied in the area of rare diseases, especially in looking for genetic associations and making correlations between genotype and phenotype.
Registries have an important role in this kind of analysis, because they include complete information about patients, which is especially important for rare disease research. This collected information helps in diagnosis, patient management, treatment strategy planning, health care planning and follow-up. It enables the acceleration of research and paves new pathways for personalized medicine [39, 40].
This study is the first attempt to establish a correlation network among different biochemical and clinical characteristics in a nationwide cohort. We have aimed to analyse diagnostic data and to relate them to long-term complications such as bone crises and the development of neoplasia or PD, which are the most common and disabling complications [41,42,43,44,45].
Two observations, already accepted in Gaucher research, were also confirmed in this machine-learning study: first, splenectomized patients have a higher risk of presenting more serious and extensive bone disease; second, almost all patients with new bone crises, despite having received long-term ERT, had previous bone lesions, which reminds us that the most feared complications in GD1 are not resolved merely by starting ERT. These two facts confirm previous reports and support the validity of our analysis [42, 45,46,47]. In addition, genotypes other than homozygous NM_000157.4:c.1226A > G are significantly correlated with bone disease (p = 0.05). This last observation is in line with the observation that the c.1226A > G variant confers a mild phenotype [48, 49].
It is a priority to identify accurate risk factors for bone crisis in order to improve treatment dosage and to avoid this complication. The standard biomarkers related to GD (ChT activity, CCL18/PARC and GluSph concentrations) have been discarded as risk factors for bone complications [50, 51], even though their concentrations increase during bone crisis due to the acute inflammatory event [45, 49]. This reminds us of the importance of continuing to search for other biomarkers. Our results confirm the lack of association between these biomarkers and disease outcomes, but other biomarkers, such as high levels of ferritin, show a tendency in patients with advanced bone disease, although it was not statistically significant.
Surprisingly, a high serum IgA concentration correlates with the degree of bone involvement and with the development of bone crisis (p = 0.001). The age at the start of treatment (mean 30.6 y.o.) also shows clear relevance for the occurrence of bone crises (p = 0.01).
In this study, the development of malignancies appears strongly correlated with delayed age at the start of ERT (p < 0.01) and with increased IgG concentration (p = 0.01). Many aspects of the complexity of the immune system remain to be unraveled, but aging is an important factor clearly related to humoral immune dysfunction and the appearance of malignancies. Polyclonal and monoclonal gammopathies are common in GD patients, and we observed a significant correlation between high levels of IgG and the appearance of neoplasia. However, the origin of these alterations is not fully clarified; they are attributed to the chronic inflammatory state and are also related to an increase in levels of inflammatory cytokines such as interleukins (IL-6, IL-10) that could lead to an overproduction of immunoglobulins [53, 54]. Another hypothesis is that B lymphocytes are activated by specific type II natural killer T lymphocytes with a T follicular helper profile, and that the clonal immunoglobulin in GD patients and in mouse models of GD is reactive against GluSph.
The identification of levels of IgA as a risk factor for complication was a surprising finding; it has not been previously reported that IgA levels are related to severe bone disease and the presence of repeated bone crises in GD.
Our data also provide an analysis of the main clinical features of GD patients at diagnosis; in accordance with previous reports [2, 6, 10,11,12], general characteristics such as polyclonal gammopathies, bone pain, bone vascular lesions, hypertriglyceridemia, splenomegaly and a family history of parkinsonism are findings that can help to identify GD patients.
The SGDR only includes GD patients from Spain; thus, the main limitation of the study is the absence of a larger data set. Despite this, the included data reflect the characteristics of the disease in this country. It would be interesting to validate these findings by studying other populations with a greater number of patients; however, taking into account the homogeneity of the series and the single-country origin, the data are solid.
Our work confirms previous observations such as the relationship between bone disease and splenectomy; it highlights the importance of early diagnosis and therapy and identifies new risk features such as high IgA and IgG levels for long-term complications. This is the first attempt in which all the baseline diagnostic data have been included in a study to perform a correlation network analysis. This opens the possibility of moving forward using current technology; it will help us to identify features that can predict the risk of complications or perhaps, if more patients can be included, a better phenotype-genotype correlation.
Availability of data and materials
The data analysed and generated during the current study belong to the SGDR and to the FEETEG and are available on request through the corresponding author.
Abbreviations
AUC: Area under the curve
CCL18/PARC: Pulmonary and activation-regulated chemokine
DEXA: Bone mineral density exam
ERT: Enzymatic Replacement Therapy
FEETEG: Spanish foundation of Gaucher disease and other lysosomal disorders
GD-DS3: Gaucher Disease Severity Score System category
GD1: Type 1 GD
HDL: High density lipoprotein
LDL: Low density lipoprotein
LSD: Lysosomal storage disorder
MGUS: Monoclonal gammopathy of undetermined significance
PPV: Positive predictive value
S-MRI: Spanish magnetic resonance image score
SGDR: Spanish Gaucher Disease Registry
SRT: Substrate Reduction Therapy
WBC: White blood cells
Brady RO. Gaucher’s disease: past, present and future. Baillieres Clin Haematol. 1997;10:621–34.
Cox TM, Schofield JP. Gaucher’s disease: clinical features and natural history. Baillieres Clin Haematol. 1997;10:657–89.
Orphanet / INSERM US14. Orphanet Web Site. [Online]. Available from: https://www.orpha.net. Accessed 10 May 2020.
Hayes RP, Grinzaid KA, Duffey EB, Elsas LJ 2nd. The impact of Gaucher disease and its treatment on quality of life. Qual Life Res. 1998;7:521–34.
Giraldo P, Solano V, Pérez-Calvo JI, Giralt M, Rubio-Félix D. Quality of life related to type 1 Gaucher disease: Spanish experience. Qual Life Res. 2005;14:453–62.
Rosenbloom BE, Weinreb NJ. Gaucher disease: a comprehensive review. Crit Rev Oncog. 2013;18:163–75.
Stirnemann J, Belmatoug N, Camou F, Serratrice C, Froissart R, Caillaud C, Levade T, Astudillo L, Serratrice J, Brassier A, Rose C, Billette de Villemeur T, Berger MG. A review of Gaucher disease pathophysiology, clinical presentation and treatments. Int J Mol Sci. 2017;18:441.
Grabowski GA, Horowitz M. Gaucher’s disease: molecular, genetic and enzymological aspects. Baillieres Clin Haematol. 1997;10:635–56.
Alfonso P, Aznarez S, Giralt M, Pocovi M, Giraldo P. Mutation analysis and genotype/phenotype relationships of Gaucher disease patients in Spain. J Hum Genet. 2007;52:391–6.
Grabowski GA, Zimran A, Ida H. Gaucher disease types 1 and 3: phenotypic characterization of large populations from the ICGG Gaucher registry. Am J Hematol. 2015;90(Suppl 1):S12–8.
Zimran A, Belmatoug N, Bembi B, Deegan P, Elstein D, Fernandez-Sasso D, Giraldo P, Goker-Alpan O, Lau H, Lukina E, Panahloo Z, Schwartz IVD, GOS Study group. Demographics and patient characteristics of 1209 patients with Gaucher disease: descriptive analysis from the Gaucher outcome survey (GOS). Am J Hematol. 2018;93(2):205–12.
Giraldo P, Pocoví M, Pérez-Calvo J, Rubio-Félix D, Giralt M. Report of the Spanish Gaucher’s disease registry: clinical and genetic characteristics. Haematologica. 2000;85:792–9.
Fried K. Gaucher’s disease among the Jews of Israel. Bull Res Council Isr. 1958;7B:213.
Hillborg PO. Morbus Gaucher: Norbotten. Nord Med. 1959;61:303.
Horowitz M, Wilder S, Horowitz Z, Reiner O, Gelbart T, Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics. 1989;4:87–96.
Grabowski GA. Phenotype, diagnosis, and treatment of Gaucher’s disease. Lancet. 2008;372:1263–71.
Stirnemann J, Vigan M, Hamroun D, Heraoui D, Rossi-Semerano L, Berger MG, Rose C, Camou F, de Roux-Serratrice C, Grosbois B, Kaminsky P, Robert A, Caillaud C, Froissart R, Levade T, Masseau A, Mignot C, Sedel F, et al. The French Gaucher’s disease registry: clinical characteristics, complications and treatment of 562 patients. Orphanet J Rare Dis. 2012;7:77.
Giraldo P, Alfonso P, Irún P, Gort L, Chabás A, Vilageliu L, Grinberg D, Sá Miranda CM, Pocovi M. Mapping the genetic and clinical characteristics of Gaucher disease in the Iberian Peninsula. Orphanet J Rare Dis. 2012;7:17.
Barton NW, Brady RO, Dambrosia JM, Di Bisceglie AM, Doppelt SH, Hill SC, Mankin HJ, Murray GJ, Parker RI, Argoff CE, et al. Replacement therapy for inherited enzyme deficiency--macrophage-targeted glucocerebrosidase for Gaucher’s disease. N Engl J Med. 1991;324:1464–70.
Beutler E, Dale GL, Guinto DE, Kuhl W. Enzyme replacement therapy in Gaucher's disease: preliminary clinical trial of a new enzyme preparation. Proc Natl Acad Sci U S A. 1977;74:4620–3.
Zimran A, Altarescu G, Philips M, Attias D, Jmoudiak M, Deeb M, Wang N, Bhirangi K, Cohn GM, Elstein D. Phase 1/2 and extension study of velaglucerase alfa replacement therapy in adults with type 1 Gaucher disease: 48-month experience. Blood. 2010;115:4651–6.
Zimran A, Brill-Almon E, Chertkoff R, Petakov M, Blanco-Favela F, Muñoz ET, Solorio-Meza SE, Amato D, Duran G, Giona F, Heitner R, Rosenbaum H, Giraldo P, Mehta A, Park G, Phillips M, Elstein D, Altarescu G, Szleifer M, Hashmueli S, Aviezer D. Pivotal trial with plant cell-expressed recombinant glucocerebrosidase, taliglucerase alfa, a novel enzyme replacement therapy for Gaucher disease. Blood. 2011;118:5767–73.
Cox T, Lachmann R, Hollak C, Aerts J, van Weely S, Hrebícek M, Platt F, Butters T, Dwek R, Moyses C, Gow I, Elstein D, Zimran A. Novel oral treatment of Gaucher’s disease with N-butyldeoxynojirimycin (OGT 918) to decrease substrate biosynthesis. Lancet. 2000;355:1481–5.
Lukina E, Watman N, Arreguin EA, Banikazemi M, Dragosky M, Iastrebner M, Rosenbaum H, Phillips M, Pastores GM, Rosenthal DI, Kaper M, Singh T, Puga AC, Bonate PL, Peterschmitt MJ. A phase 2 study of eliglustat tartrate (Genz-112638), an oral substrate reduction therapy for Gaucher disease type 1. Blood. 2010;116:893.
Mistry PK, Balwani M, Baris HN, Turkia HB, Burrow TA, Charrow J, Cox GF, Danda S, Dragosky M, Drelichman G, El-Beshlawy A, Fraga C, Freisens S, Gaemers S, Hadjiev E, Kishnani PS, Lukina E, Maison-Blanche P, Martins AM, Pastores G, Petakov M, et al. Addendum to letter to the editor: safety, efficacy, and authorization of eliglustat as a first-line therapy in Gaucher disease type 1. Blood Cells Mol Dis. 2019;77:101–2.
Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Heal Inf Sci Syst. 2014;2(1):3.
Gandomi A, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag. 2015;35:137–44.
Rai BK, Meshram AA, Gunasekaran A. Big data in healthcare management: a review of literature. Am J Theor Appl Bus. 2018;4:57–69.
Yu L, Chao H, Lizhong D, Zhonxia L, Yijie P, Xin G. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods. 2019;166:4–21.
Bishop CM. Pattern recognition and machine learning. New York: Springer-Verlag; 2006.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI. 1995;14:1137–45.
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2018. Available from https://www.R-project.org/.
Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer-Verlag; 2016.
Therneau T, Atkinson B. rpart: Recursive Partitioning and Regression Trees. R package version 4.1–15. 2019. Available from https://CRAN.R-project.org/package=rpart.
Kwon JM, Kim KH, Jeon KH, Lee SE, Lee HY, Cho HJ, Choi JO, Jeon ES, Kim MS, Kim JJ, Hwang KK, Chae SC, Baek SH, Kang SM, Choi DJ, Yoo BS, Kim KH, Park HY, Cho MC, Oh BH. Artificial intelligence algorithm for predicting mortality of patients with acute heart failure. PLoS One. 2019;14(7):e0219302.
Fu Y, Jia J, Yue L, Yang R, Guo Y, Ni X, Shi T. Systematically Analyzing the Pathogenic Variations for Acute Intermittent Porphyria. Front Pharmacol. 2019;10:1018.
Garcelon N, Burgun A, Salomon R, Neuraz A. Electronic health records for the diagnosis of rare diseases. Kidney Int. 2020;97:676.
Brasil S, Pascoal C, Francisco R, Dos Reis FV, Videira PA, Valadão AG. Artificial Intelligence (AI) in rare diseases: is the future brighter? Genes (Basel). 2019;10(12):978.
Rappaport N, Fishilevich S, Nudel R, Twik M, Belinky F, Plaschkes I, Stein TI, Cohen D, Oz-Levi D, Safran M, Lancet D. Rational confederation of genes and diseases: NGS interpretation via GeneCards, MalaCards and VarElect. Biomed Eng Online. 2017;16(Suppl 1):72.
Andrade-Campos M, Alfonso P, Irun P, Armstrong J, Calvo C, Dalmau J, Domingo MR, Barbera JL, Cano H, Fernandez-Galán MA, Franco R, Gracia I, Gracia-Antequera M, Ibañez A, Lendinez F, Madruga M, Martin-Hernández E, O'Callaghan MDM, Del Soto AP, Del Prado YR, Sancho-Val I, Sanjurjo P, Pocovi M, Giraldo P. Diagnosis features of pediatric Gaucher disease patients in the era of enzymatic therapy, a national-base study from the Spanish Registry of Gaucher Disease. Orphanet J Rare Dis. 2017;12(1):84.
Marcucci G, Zimran A, Bembi B, Kanis J, Reginster JY, Rizzoli R, Cooper C, Brandi ML. Gaucher disease and bone manifestations. Calcif Tissue Int. 2014;95(6):477–94.
van Dussen L, Lips P, van Essen HW, Hollak CE, Bravenboer N. Heterogeneous pattern of bone disease in adult type 1 Gaucher disease: clinical and pathological correlates. Blood Cells Mol Dis. 2014;53(3):118–23.
Astudillo L, Therville N, Colacios C, Ségui B, Andrieu-Abadie N, Levade T. Glucosylceramidases and malignancies in mammals. Biochimie. 2016;125:267–80.
Indellicato R, Trinchera M. The link between Gaucher disease and Parkinson’s disease sheds light on old and novel disorders of sphingolipid metabolism. Int J Mol Sci. 2019;20(13):3304.
Hughes D, Mikosch P, Belmatoug N, Carubbi F, Cox T, Goker-Alpan O, Kindmark A, Mistry P, Poll L, Weinreb N, Deegan P. Gaucher disease in bone: from pathophysiology to practice. J Bone Miner Res. 2019;34(6):996–1013.
Andrade-Campos M, Valero E, Roca M, Giraldo P, Spanish group on Gaucher Disease. The utility of magnetic resonance imaging for bone involvement in Gaucher disease. Assessing more than bone crises. Blood Cells Mol Dis. 2018;68:126–34.
Mistry PK, Batista JL, Andersson HC, Balwani M, Burrow TA, Charrow J, Kaplan P, Khan A, Kishnani PS, Kolodny EH, Rosenbloom B, Scott CR, Weinreb N. Transformation in pretreatment manifestations of Gaucher disease type 1 during two decades of alglucerase/imiglucerase enzyme replacement therapy in the International Collaborative Gaucher Group (ICGG) Gaucher registry. Am J Hematol. 2017;92(9):929–39.
Hruska KS, LaMarca ME, Scott CR, Sidransky E. Gaucher disease: mutation and polymorphism spectrum in the glucocerebrosidase gene (GBA). Hum Mutat. 2008;29(5):567–83.
Gervas-Arruga J, Cebolla JJ, de Blas I, Roca M, Pocovi M, Giraldo P. The influence of genetic variability and proinflammatory status on the development of bone disease in patients with Gaucher disease. PLoS One. 2015;10(5):e0126153 Published 2015 May 15.
Raskovalova T, Deegan PB, Mistry PK, Pavlova E, Yang R, Zimran A, Berger J, Bourgne C, Pereira B, Labarère J, Berger MG. Accuracy of chitotriosidase activity and CCL18 concentration in assessing type I Gaucher disease severity. A systematic review with meta-analysis of individual participant data. Haematologica. 2020.
Irún P, Cebolla JJ, López de Frutos L, De Castro-Orós I, Roca-Espiau M, Giraldo P. LC MS/MS analysis of plasma glucosylsphingosine as a biomarker for diagnosis and follow-up monitoring in Gaucher disease in the Spanish population. Clin Chem Lab Med. 2020;58:798–809.
Pawelec G. Immunity and ageing in man. Exp Gerontol. 2006;41(12):1239–42.
de Fost M, Out TA, de Wilde FA, et al. Immunoglobulin and free light chain abnormalities in Gaucher disease type I: data from an adult cohort of 63 patients and review of the literature. Ann Hematol. 2008;87:439–49.
Nguyen Y, Stirnemann J, Lautredoux F, Cador B, Bengherbia M, Yousfi K, Hamroun D, Astudillo L, Billette de Villemeur T, Brassier A, Camou F, Dalbies F, Dobbelaere D, Gaches F, Leguy-Seguin V, Masseau A, Pers YM, Pichard S, Serratrice C, Berger MG, Fantin B, Belmatoug N, on behalf of the French Evaluation of Gaucher Disease. Treatment Committee † Immunoglobulin Abnormalities in Gaucher disease: an analysis of 278 patients included in the French Gaucher Disease Registry. Int J Mol Sci. 2020;21:1247. https://doi.org/10.3390/ijms21041247.
Nair S, Branagan AR, Liu J, Boddupalli CS, Mistry PK, Dhodapkar MV. Clonal immunoglobulin against Lysolipids in the origin of myeloma. N Engl J Med. 2016;374:555–61.
We are indebted to all the members of the Grupo Español de Enfermedades de Depósito Lisosomal (GEEDL), to all of the treating physicians and collaborators of the FEETEG and the Association of Patients with GD (AEEFEG), and especially to the patients and their families.
This work was supported by FEETEG.
All patients included in the SRDG have signed an informed consent form for the use of their data for research purposes. The scientific and ethics committees of the FEETEG foundation approved this study.
Jorge J. Cebolla, PhD, is an employee of Takeda Pharmaceutical, outside the submitted work; all other authors have indicated that they have no financial relationships relevant to this article to disclose.
Table S1 (supplementary material). Correlation between numerical variables. Table S2 (supplementary material). Correlations between categorical variables. Table S3 (supplementary material). Correlations of the conditions with numerical variables. Table S4 (supplementary material). Correlations of the conditions with categorical variables. Bone disease.
Andrade-Campos, M.M., de Frutos, L.L., Cebolla, J.J. et al. Identification of risk features for complication in Gaucher’s disease patients: a machine learning analysis of the Spanish registry of Gaucher disease. Orphanet J Rare Dis 15, 256 (2020). https://doi.org/10.1186/s13023-020-01520-7
- Gaucher disease
- Machine learning
- Bone crisis