Estimated prevalence of mucopolysaccharidoses from population-based exomes and genomes



In this study, the prevalence of different types of mucopolysaccharidoses (MPS) was estimated based on data from the Exome Aggregation Consortium (ExAC) and the Genome Aggregation Database (gnomAD). The population-based allele frequencies were used to identify potential disease-causing variants in each gene related to MPS I to IX (except MPS II).


We evaluated the canonical transcripts and excluded homozygous, intronic, 3′, and 5′ UTR variants. Frameshift and in-frame insertions and deletions were evaluated using the SIFT Indel tool. Splice variants were evaluated using SpliceAI and Human Splice Finder 3.0 (HSF). Loss-of-function single nucleotide variants in coding regions were classified as potentially pathogenic, while synonymous variants outside the exon–intron boundaries were deemed non-pathogenic. Missense variants were evaluated by five in silico prediction tools, and only those predicted to be damaging by at least three different algorithms were considered disease-causing.


The combined frequencies of the selected variants (whose number ranged from 127 in GNS to 259 in IDUA) were used to calculate prevalence based on the Hardy–Weinberg equilibrium. The maximum estimated prevalence ranged from 0.46 per 100,000 for MPS IIID to 7.1 per 100,000 for MPS I. Overall, the estimated prevalence of all types of MPS was higher than what has been published in the literature. This difference may be due to misdiagnoses and/or underdiagnoses, especially of the attenuated forms of MPS. However, overestimation of the number of disease-causing variants by in silico predictors cannot be ruled out. Even so, the disease prevalences are similar to those reported in diagnosis-based prevalence studies.


We report on an approach to estimate the prevalence of different types of MPS based on publicly available population-based genomic data, which may help health systems to be better prepared to deal with these conditions and provide support to initiatives on diagnosis and management of MPS.


The mucopolysaccharidoses (MPS) are a group of lysosomal diseases characterized by the deficiency of one of eleven enzymes involved in the breakdown of glycosaminoglycans (GAGs), which are constituents of the extracellular matrix. Impaired enzyme activity leads to downstream consequences at the cellular level that affect multiple organs and systems. The MPS may be divided into different types according to the enzyme deficiency and the accumulated substrate (type I, II, IIIA, IIIB, IIIC, IIID, IVA, IVB, VI, VII, and IX). Affected individuals usually have coarse facial features, cardiac and pulmonary problems, and, depending on the MPS type, bone dysplasia (dysostosis multiplex) and/or neurological impairment such as behavioural problems and developmental delay [1,2,3]. The severity of the diseases is variable, and individuals with MPS I, II, IVA, VI, and VII may benefit from market-approved enzyme replacement therapy, while novel therapies such as fusion proteins, gene therapy, and genome editing are under investigation for several MPS [4].

Incidence and prevalence data are important to support health system decisions and are necessary to calculate the cost–benefit of new therapies and treatments. Although the genes that encode the enzymes involved in these diseases have been extensively characterized at the molecular level, with over 2,109 pathogenic variants reported in the Human Gene Mutation Database (HGMD®) [5], there is still a lack of specific epidemiological data on MPS. Newborn screening programs that include lysosomal diseases have arisen worldwide and may bring valuable information. However, such programs are still restricted to very few countries, and most types of MPS are not included in the list of screened diseases [6, 7]. Population-based genomic data can help narrow the information gap, since it is now possible to rely on carrier frequency instead of the incidence of a disease among live births. However, care must be taken when using in silico predictors to classify genetic variants in order to obtain the most reliable data possible.

Herein, we used the frequency of potential disease-causing variants present in population-based genomic databases, namely the Exome Aggregation Consortium (ExAC) [8] and the Genome Aggregation Database (gnomAD) [9], to estimate the prevalence of the different types of MPS after applying Hardy–Weinberg principles [10].


Table 1 shows the number of variants present in each database and after the merger, which ranged from 961 (IDS) to 2988 (GALNS). After subsequent filtering steps, these numbers were reduced, ranging from 31 (IDS) to 259 (IDUA) (Table 2). A detailed description of the excluded variants can be found in Additional file 1: Table S1.

Table 1 Number of variants in each gene present in ExAC and gnomAD
Table 2 Number of variants considered deleterious per category for each gene

The number of variants excluded due to homozygosis ranged from 3 in GNS and GUSB to 113 in IDS (in homozygosis or hemizygosis); none of them was a stop gain, stop loss, or start loss variant. The overall number of heterozygous canonical and non-canonical splice site variants across all genes was 452, of which 224 were considered deleterious by the in silico algorithms. One splice site variant could not be analysed by either HSF or SpliceAI (Additional file 3: Table S3). In addition, 213 out of 218 frameshift and 188 in-frame insertions and deletions were considered deleterious. Variants that could not be analysed by SIFT Indel were excluded from further analysis. All variants considered deleterious by only one splice program, as well as frameshift and nonsense variants in the last exon or located < 50 nucleotides upstream of the 3′-most splice-generated exon–exon junction, were excluded from the calculations of minimum frequency. The number of variants considered deleterious in each category is shown in Table 2.

All 3,111 missense variants were analysed by five different in silico tools. A consensus on pathogenicity was reached for 588 variants, while 548 variants were classified as pathogenic by four tools and 382 variants by three.

The allele frequencies of the retained variants for a given gene were added together and taken as the minimum and maximum frequency of the deleterious recessive allele. These numbers were then used to calculate the minimum and maximum prevalence of disease based on the Hardy–Weinberg equilibrium (Table 3). As the number of variants retained for IDS was very low (31 variants), the estimated frequency of MPS II must be viewed with caution. It is worth noting that variants in GLB1 can be associated with either MPS IVB or GM1 gangliosidosis.
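
As a rough illustration of how a prevalence figure relates to the underlying allele and carrier frequencies, the arithmetic below works backwards from the maximum estimate for MPS I quoted in the abstract (7.1 per 100,000). It is only a worked example under the Hardy–Weinberg assumption used in this study; the rounded intermediate values are ours and do not appear in the tables.

```latex
\begin{align*}
q^2 &= \frac{7.1}{100{,}000} = 7.1 \times 10^{-5} \\
q   &= \sqrt{7.1 \times 10^{-5}} \approx 0.00843 \\
2pq &= 2\,(1 - q)\,q \approx 2 \times 0.9916 \times 0.00843 \approx 0.0167 \approx \tfrac{1}{60}
\end{align*}
```

Under these assumptions, a summed deleterious allele frequency of about 0.8% would correspond to roughly one carrier in every 60 individuals.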

Table 3 Estimated disease prevalence based on allele frequencies of potentially disease-causing variants for each gene

Only two of the 2,061 retained variants had frequencies over 0.001: p.(His356Pro) in NAGLU (0.007993) and p.(Asp152Asn) in GUSB (0.001153). After all five tiers of variant selection, the maximum and minimum estimated disease prevalence were calculated based on global allele frequency (Table 3).

In addition to the estimated overall disease prevalence, the prevalence of MPS was calculated for eight ethnic groups present in the databases (Figs. 1, 2 and Additional file 4: Table S4).

Fig. 1

Schematic example showing all steps of maximum (a) and minimum (b) variant selection for the IDUA gene (MPS I)

Fig. 2

Estimated maximum (a) and minimum (b) prevalence of the MPS types per 100,000 individuals in different ethnic groups. Data for MPS II not included (see discussion)


In this study, we used public data from WES and WGS to estimate the prevalence of different types of MPS. As MPS symptoms usually show up in the first decade of life, it is unlikely that severely affected individuals would be part of such databases. However, the possibility that undiagnosed individuals with milder phenotypes are included in them cannot be ruled out. Importantly, individuals homozygous for rare variants present in any MPS gene (Additional file 2: Table S2), who could represent individuals with attenuated forms of the disease, were filtered out in the second-tier variant selection.

The estimated global frequency for all types of MPS except type VI found in this study was either above or at the upper limit of the frequencies of MPS reported in different countries based on the number of diagnosed cases in reference centres [20] (Table 4). It is worth noting that the maximum prevalence reported by Khan et al., 2017 is for a limited number of countries, whereas our data were calculated collectively for the different ethnic backgrounds present in the databases. This means that we may have overestimated the prevalence of diseases in the general population. A recent study estimated the prevalence of MPS in Brazil based on 600 affected individuals with all types of MPS included in a national network database [21]. For example, the researchers found a discrepancy between the estimated prevalence based on diagnosis (0.24/100,000) and the estimated prevalence based on genetic screening for the most common pathogenic variant in IDUA among healthy volunteers (0.95/100,000). Furthermore, the estimated prevalence of MPS VI in Brazil was the second highest in the world, with prevalence similar to that found in the present study (1.02/100,000 compared with 1.12/100,000).

Table 4 Estimated prevalence in the present study compared to the incidence (in 100,000) as reported by Khan et al., 2017 for each MPS type

Several measures were taken to reduce the chance of prevalence overestimation. For example, variants were filtered in sequential steps in order to obtain the most specific data possible. Also, both homozygotes and variants with frequency higher than 0.001 were excluded. Additional filtering based on functional predictions was also performed in order to include only variants more likely to affect protein function. After that, all variants remaining for analysis had allele frequencies below 0.001, and most of them have not been previously reported as disease-causing. This was expected, since variants of uncertain significance (VUS) under the standards and guidelines of the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) [10] are known to account for a substantial part of disease-causing variants for MPS and have a significant impact on incidence estimates. For example, Clark et al. [22] showed that 25% of VUS analysed in MPS IIIB were potentially disease-causing and cause reduced enzyme activity.

It is worth noting that sequential filtering steps and the use of consensus scores do not guarantee that only pathogenic variants are selected or that only non-pathogenic variants are discarded, and the estimation error is not directly measurable. Furthermore, the high-frequency filter is necessary to exclude variants with frequencies incompatible with MPS disease. Although this may lead to underascertainment, frequencies like 0.007993 and 0.001153, observed for the variants c.1067A > C; p.(His356Pro) in NAGLU and c.454G > A; p.(Asp152Asn) in GUSB, are not found in clinical practice. These were the only two variants excluded because of high frequency. We considered using only curated variants reported in either ClinVar or the Human Gene Mutation Database (HGMD); however, this would have significantly reduced the number of retained variants (for instance, from 259 to 47 for IDUA, data not shown). Different in silico tools were used to estimate the likelihood of a variant being disease-causing. However, as no data on the sensitivity and specificity of such software are available for MPS genes, it is impossible to estimate the number of false-positive results. For instance, several well-characterized pathogenic variants reported in HGMD had low deleteriousness scores as evaluated by Combined Annotation-Dependent Depletion (CADD) [23], which has an overall higher performance than other predictors (data not shown).

The existence of compound heterozygotes cannot be ruled out. In fact, most individuals with MPS who are not the offspring of a consanguineous marriage are compound heterozygotes. However, due to the structure of both databases used in this study, it is impossible to rule out the occurrence of variants in cis, which would contribute to the overestimation of disease prevalence.

Despite these limitations, a similar approach was used by Appadurai et al., 2015 [10] to estimate the prevalence of cerebrotendinous xanthomatosis (CTX). As in the present study, the authors suggested an apparent underdiagnosis of CTX based on the allele frequency of potentially disease-causing variants present in ExAC. Interestingly, the discrepancy between genomic data and the diagnosis-based incidence is more pronounced for the rarest MPS diseases, such as MPS IIIC, IIID, IVB, VII, and IX. For some forms of MPS I, II, VI, and IX, it is possible that variants leading to deficient enzyme activity are not clinically recognized due to attenuated phenotypes [24,25,26]. On the other hand, severe cases of MPS VII may lead to premature death before the diagnosis is reached or even sought [27].

Notably, data emerging from large datasets of WES and WGS are disclosing novel phenotypes for well-known diseases, especially intermediate phenotypes [28,29,30]. This may also be the case for MPS and could help explain the higher prevalence predicted by our work, with patients not being recognized clinically due to an unusual presentation.

In the case of MPS IVB, there is an additional complexity since the same gene is involved in another lysosomal disorder with different accumulated substrate and clinical features, called GM1 gangliosidosis [31]. In this study, variants of GLB1 were considered disease-causing regardless of the associated phenotype. Therefore, the overall frequency of alleles was used to estimate the prevalence of MPS IVB, whereas in fact only about 13.3% of curated disease-causing variants in this gene are associated with MPS IVB, the rest leading to the three types of GM1 gangliosidosis [32].

After the filtering steps, IDS had a limited number of retained disease-causing variants (29 variants), and therefore the estimated prevalence for MPS II was lower than what has been previously reported [20]. The higher prevalence observed in studies based on reference centres and diagnostic laboratories may be related to the proportion of patients having de novo variants. Pollard et al. [33] showed that this happens in 22.5% of MPS II cases. In addition, recombination events between IDS and its pseudogene IDS2 are a common cause of the disease, with structural variants such as gross rearrangements and complete or partial deletions seen in 10 to 28% of affected individuals [34,35,36,37,38,39,40]. Those types of variants could not be taken into account in our estimates because of the structure of the population databases used. As a result, the estimated prevalence of MPS II is not as reliable as it is for the other types of MPS. It is worth mentioning that another study that used a similar method for two X-linked diseases (Menkes disease and ATP7A-related disorders) [41] also found a very low number of variants, which could suggest that this strategy is not the best approach for X-linked disorders.


In summary, we report on an approach to estimate the prevalence of the different types of MPS based on publicly available population-based genomic data that may help to better tailor screening and diagnostic programs for these diseases, to prepare the health systems to deal with a more precise estimated number of patients, and may serve as a starting point for other rare-disease initiatives.



Genetic variants (GRCh37/hg19) from ExAC v0.3.1 and gnomAD v2.0.2 [8, 9] were used to estimate the prevalence of different types of MPS. These public data aggregate information from 125,748 WES and 15,708 WGS collected from unrelated individuals and 1,756 parent–offspring trios with no known rare disease. The genetic data were collected from case–control studies of adult-onset common diseases, spanning six global and eight sub-continental ancestries, determined by ancestry-informative markers [9]. Although related individuals can influence the frequency of variants, the size of the database, which has a total of 141,456 individuals, makes the influence of 1,756 trios negligible.

The data were retrieved separately for each gene and then merged to create a single unified database. When variants were common to both databases, the allele frequencies from gnomAD were used for further analysis, as it includes ExAC data.
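
The merge rule described above can be sketched in a few lines of Python. This is only an illustration, assuming each database export has been reduced to a mapping from a variant key to an (allele count, allele number) pair; the function name `merge_variants`, the key format, and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple

# Hypothetical record layout: (allele_count, allele_number) per variant key.
Record = Tuple[int, int]


def merge_variants(exac: Dict[str, Record], gnomad: Dict[str, Record]) -> Dict[str, Record]:
    """Return the union of both variant sets.

    When a variant is present in both sources, the gnomAD record is kept,
    mirroring the rule stated in the text.
    """
    merged = dict(exac)    # start with every ExAC variant
    merged.update(gnomad)  # shared keys are overwritten by gnomAD values
    return merged


# Toy data, made up for illustration only.
exac = {"4-980000-A-G": (3, 120000), "4-981000-C-T": (1, 121000)}
gnomad = {"4-981000-C-T": (2, 250000), "4-982000-G-A": (5, 248000)}

merged = merge_variants(exac, gnomad)
```

With the toy input, the variant shared by both sources keeps its gnomAD counts, while variants unique to either source are carried over unchanged.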

First-tier variant selection

Variants of the gene located in 5′ and 3′ UTR, upstream and downstream, as well as intronic and non-coding transcript exons, were excluded assuming that no disease-causing variant has been described in such positions for any MPS. In addition, synonymous variants outside the exon–intron boundaries were also excluded, as well as variants in non-canonical transcripts.
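
A first-tier filter of this kind might look like the sketch below. It assumes that each variant is a dictionary annotated with a Sequence Ontology style consequence term, a canonical-transcript flag, and a flag marking proximity to an exon–intron boundary; these field names and the choice of consequence vocabulary are assumptions made for illustration, since the study does not state how the annotations were encoded.

```python
# Consequence terms assumed to follow Sequence Ontology naming (hypothetical
# choice; the annotation vocabulary is not specified in the text).
EXCLUDED_CONSEQUENCES = {
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intron_variant",
    "non_coding_transcript_exon_variant",
}


def passes_first_tier(variant: dict) -> bool:
    """Keep canonical-transcript variants outside the excluded categories."""
    if not variant["canonical"]:
        return False
    if variant["consequence"] in EXCLUDED_CONSEQUENCES:
        return False
    # Synonymous variants survive only when they lie at an exon-intron boundary.
    if variant["consequence"] == "synonymous_variant" and not variant["near_splice_boundary"]:
        return False
    return True


# Toy variants, made up for illustration only.
toy_variants = [
    {"id": "v1", "canonical": True, "consequence": "missense_variant", "near_splice_boundary": False},
    {"id": "v2", "canonical": True, "consequence": "intron_variant", "near_splice_boundary": False},
    {"id": "v3", "canonical": True, "consequence": "synonymous_variant", "near_splice_boundary": True},
    {"id": "v4", "canonical": False, "consequence": "missense_variant", "near_splice_boundary": False},
    {"id": "v5", "canonical": True, "consequence": "synonymous_variant", "near_splice_boundary": False},
]

kept_ids = [v["id"] for v in toy_variants if passes_first_tier(v)]
```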

Second-tier variant selection

In the second-tier analysis, missense, nonsense (stop gain), stop-loss, frameshift, and splice site variants present in homozygosis (and hemizygosis for IDS) were excluded based on the assumption that neither ExAC nor gnomAD includes MPS-affected individuals, as they exclude samples from patients with severe pediatric diseases and their relatives [8]. Therefore, no variant observed in homozygosis should be pathogenic. Heterozygous loss-of-function variants such as stop gain, stop loss, and start loss were considered as potentially disease-causing, considering the impact on protein function and strong evidence of pathogenicity as per the ACMG/AMP guidelines [10].
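
The second-tier rule can be expressed as two small predicates, sketched below. The field names `n_homozygotes` and `n_hemizygotes` and the consequence labels are hypothetical; the sketch only assumes that per-variant homozygote and hemizygote counts are available from the database export.

```python
# Consequence labels treated as loss-of-function in the text
# (Sequence Ontology style names are an assumption).
LOSS_OF_FUNCTION = {"stop_gained", "stop_lost", "start_lost"}


def passes_second_tier(variant: dict) -> bool:
    """Drop any variant seen in the homozygous (or hemizygous) state."""
    no_homozygotes = variant["n_homozygotes"] == 0
    no_hemizygotes = variant.get("n_hemizygotes", 0) == 0  # relevant for X-linked IDS
    return no_homozygotes and no_hemizygotes


def is_presumed_disease_causing_lof(variant: dict) -> bool:
    """Heterozygous-only loss-of-function variants are retained directly."""
    return passes_second_tier(variant) and variant["consequence"] in LOSS_OF_FUNCTION


# Toy variants, made up for illustration only.
het_stop = {"consequence": "stop_gained", "n_homozygotes": 0}
hom_missense = {"consequence": "missense_variant", "n_homozygotes": 2}
hemi_stop = {"consequence": "stop_gained", "n_homozygotes": 0, "n_hemizygotes": 1}
```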

Third-tier variant selection

Heterozygous alterations in canonical or non-canonical splice sites were analysed using Human Splice Finder [11] and SpliceAI [12]. In-frame insertions, deletions and frameshift variants outside the last exon were analysed using SIFT Indel [13]. Variants were classified based on each algorithm's default parameters for deleteriousness.

Fourth-tier variant selection

The analysis of missense variants was made using five in silico algorithms: MutPred [14], PolyPhen2 [15], PROVEAN [16], SIFT [17], and REVEL [18]. Since PolyPhen2 provides more than two categories, results were transformed into binary data considering "possibly damaging" and "probably damaging" as deleterious. For REVEL, an ensemble algorithm, a rank score over 0.75 was considered deleterious. To calculate the maximum prevalence of the disease, a variant was considered deleterious when at least three software packages agreed on pathogenicity. For the minimum prevalence, we included only missense variants for which all in silico tools agreed on pathogenicity.
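
The voting scheme for missense variants can be sketched as follows. The sketch assumes that each tool's output has already been reduced to a boolean "deleterious" call; the helper names, the PolyPhen2 label strings, and the example calls are hypothetical, while the REVEL cut-off of 0.75 and the three-of-five and five-of-five rules come from the text.

```python
from typing import Dict

REVEL_CUTOFF = 0.75  # threshold stated in the text


def polyphen_is_deleterious(label: str) -> bool:
    """Collapse PolyPhen2's categories into a binary call (label strings assumed)."""
    return label in {"possibly_damaging", "probably_damaging"}


def revel_is_deleterious(score: float) -> bool:
    """A REVEL score above the cut-off counts as deleterious."""
    return score > REVEL_CUTOFF


def classify_missense(calls: Dict[str, bool]) -> Dict[str, bool]:
    """Decide membership in the maximum and minimum variant sets.

    calls maps tool name -> True when that tool predicts the variant deleterious.
    """
    n_deleterious = sum(calls.values())
    return {
        "in_maximum_set": n_deleterious >= 3,           # at least three tools agree
        "in_minimum_set": n_deleterious == len(calls),  # all tools agree
    }


# Toy example, made up for illustration only.
calls = {
    "MutPred": True,
    "PolyPhen2": polyphen_is_deleterious("possibly_damaging"),
    "PROVEAN": False,
    "SIFT": True,
    "REVEL": revel_is_deleterious(0.62),
}
result = classify_missense(calls)
```

In this toy case three of the five tools flag the variant, so it would count towards the maximum estimate but not towards the minimum one.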

Fifth-tier variant selection

The remaining variants were analysed to make sure that only rare alleles were retained. Therefore any variant with a frequency greater than 0.001 was excluded, as no variants associated with low enzymatic activity (≤ 15% wild type) were found with higher allele frequencies [19].
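
A minimal sketch of this rarity filter is given below, using the two allele frequencies quoted elsewhere in the article for the NAGLU and GUSB variants that exceeded the cut-off. The function name and the third, rare frequency are hypothetical.

```python
MAX_ALLELE_FREQUENCY = 0.001  # cut-off stated in the text


def passes_fifth_tier(allele_frequency: float) -> bool:
    """Retain a variant only if its allele frequency does not exceed the cut-off."""
    return allele_frequency <= MAX_ALLELE_FREQUENCY


# The first two frequencies are quoted in the article; the third is made up.
frequencies = {
    "NAGLU p.(His356Pro)": 0.007993,
    "GUSB p.(Asp152Asn)": 0.001153,
    "hypothetical rare variant": 0.00004,
}

retained = [name for name, af in frequencies.items() if passes_fifth_tier(af)]
```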

Calculation of disease prevalence using Hardy–Weinberg principles

The frequency of a given variant retained as disease-causing was calculated by dividing the number of chromosomes bearing the genetic change by the total number of chromosomes analysed at that position. The sum of all variant frequencies for each gene was then used as the frequency of the recessive allele ($q$). The prevalence was calculated as $q^2$, from the Hardy–Weinberg formula $p^2 + 2pq + q^2 = 1$. The prevalence for each specific population was calculated using the population-specific frequencies.
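
The calculation described above can be sketched as follows. The function names and the toy allele counts are hypothetical; the sketch simply sums per-variant frequencies (allele count divided by allele number) into a single recessive allele frequency and squares it, as the text describes.

```python
from typing import Iterable, Tuple


def summed_allele_frequency(variants: Iterable[Tuple[int, int]]) -> float:
    """Sum allele_count / allele_number over all retained variants of one gene."""
    return sum(allele_count / allele_number for allele_count, allele_number in variants)


def prevalence_per_100k(variants: Iterable[Tuple[int, int]]) -> float:
    """Hardy-Weinberg prevalence of homozygous or compound heterozygous genotypes."""
    q = summed_allele_frequency(variants)
    return q ** 2 * 100_000


# Toy (allele_count, allele_number) pairs, made up for illustration only.
toy_gene_variants = [(10, 250_000), (5, 250_000), (10, 250_000)]

q = summed_allele_frequency(toy_gene_variants)
prevalence = prevalence_per_100k(toy_gene_variants)
```

For the toy input, the summed allele frequency is about 1 in 10,000, which corresponds to an expected prevalence of about 0.001 per 100,000. Population-specific estimates would use the same functions with population-specific counts.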

Calculation of confidence interval

A script in R was used to estimate the confidence interval. The variances in the frequency of variants and in the prevalence estimate were calculated as in equations 5 and 13 of Clark et al. [22]. The confidence intervals were adapted to consider the sum of allele frequencies instead of probability, as suggested by Clark et al. [22].
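
For readers who want a feel for how such an interval can be obtained, the sketch below uses a generic delta-method approximation in Python. It is not the authors' R script and does not claim to reproduce the equations of Clark et al. [22]; it assumes independent binomial sampling for each variant, and the function name and toy counts are hypothetical.

```python
import math
from typing import Iterable, Tuple


def prevalence_with_ci(variants: Iterable[Tuple[int, int]], z: float = 1.96):
    """Return (prevalence, lower, upper) for q**2 using a delta-method interval.

    Assumptions (ours, for illustration): each variant frequency is an
    independent binomial proportion, and Var(q**2) ~ (2q)**2 * Var(q).
    """
    q = 0.0
    var_q = 0.0
    for allele_count, allele_number in variants:
        f = allele_count / allele_number
        q += f
        var_q += f * (1.0 - f) / allele_number  # binomial sampling variance
    prevalence = q ** 2
    standard_error = 2.0 * q * math.sqrt(var_q)  # derivative of q**2 is 2q
    lower = max(0.0, prevalence - z * standard_error)  # prevalence cannot be negative
    upper = prevalence + z * standard_error
    return prevalence, lower, upper


# Toy (allele_count, allele_number) pairs, made up for illustration only.
toy_gene_variants = [(10, 250_000), (5, 250_000), (10, 250_000)]
point, low, high = prevalence_with_ci(toy_gene_variants)
```

The default `z = 1.96` gives an approximate 95% interval; other coverage levels would need a different normal quantile.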

Availability of data and materials

The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.







HGMD: Human Gene Mutation Database
ExAC: Exome Aggregation Consortium
gnomAD: Genome Aggregation Database
VUS: Variants of uncertain significance
CADD: Combined Annotation-Dependent Depletion
CTX: Cerebrotendinous xanthomatosis
WES: Whole exome sequencing
WGS: Whole genome sequencing


  1. Muenzer J. Overview of the mucopolysaccharidoses. Rheumatology (Oxford). 2011;50(5):v4–12.
  2. Giugliani R. Mucopolysaccharidoses: From understanding to treatment, a century of discoveries. Genet Mol Biol. 2012;35(Suppl 4):924–31.
  3. Sun A. Lysosomal storage disease overview. Ann Transl Med. 2018;6(24):476.
  4. Giugliani R, Federhen A, Vairo F, et al. Emerging drugs for the treatment of mucopolysaccharidoses. Expert Opin Emerg Drugs. 2016;21(1):9–26.
  5. Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, Cooper DN. The human gene mutation database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet. 2014;133(1):1–9.
  6. Robinson BH, Gelb MH. The importance of assay imprecision near the screen cutoff for newborn screening of lysosomal storage diseases. Int J Neonatal Screen. 2019;5(2):17.
  7. Schielen PCJI, Kemper EA, Gelb MH. Newborn screening for lysosomal storage diseases: a concise review of the literature on screening methods, therapeutic possibilities and regional programs. Int J Neonatal Screen. 2017;3(2):6.
  8. Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91.
  9. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. 2019;531210. Available from:
  10. Appadurai V, DeBarber A, Chiang PW, et al. Apparent underdiagnosis of cerebrotendinous xanthomatosis revealed by analysis of ~60,000 human exomes. Mol Genet Metab. 2015;116(4):298–304.
  11. Desmet FO, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C. Human splicing finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 2009;37(9):e67.
  12. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535-548.e24.
  13. Hu J, Ng PC. SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One. 2013;8(10):e77940. Published 2013 Oct 23; doi:
  14. Li B, Krishnan VG, Mort ME, et al. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics. 2009;25(21):2744–50.
  15. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9.
  16. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS ONE. 2012;7(10):e46688.
  17. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073–81.
  18. Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: an Ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet. 2016;99(4):877–85.
  19. Clarke LA, Giugliani R, Guffon N, et al. Genotype-phenotype relationships in mucopolysaccharidosis type I (MPS I): Insights from the International MPS I registry. Clin Genet. 2019;96(4):281–9.
  20. Khan SA, Peracha H, Ballhausen D, et al. Epidemiology of mucopolysaccharidoses. Mol Genet Metab. 2017;121(3):227–40.
  21. Federhen A, Pasqualim G, de Freitas TF, et al. Estimated birth prevalence of mucopolysaccharidoses in Brazil. Am J Med Genet A. 2020;182(3):469–83.
  22. Clark WT, Yu GK, Aoyagi-Scharber M, LeBowitz JH. Utilizing ExAC to assess the hidden contribution of variants of unknown significance to Sanfilippo Type B incidence. PLoS One. 2018;13(7):e0200008.
  23. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886–94.
  24. Kiykim E, Barut K, Cansever MS, et al. Screening mucopolysaccharidosis Type IX in patients with juvenile idiopathic arthritis. JIMD Rep. 2016;25:21–4.
  25. Pinto E, Vairo F, Conboy E, de Souza CFM, et al. Diagnosis of attenuated mucopolysaccharidosis VI: clinical, biochemical, and genetic pitfalls. Pediatrics. 2018;142(6):e20180658.
  26. Rigoldi M, Verrecchia E, Manna R, Mascia MT. Clinical hints to diagnosis of attenuated forms of Mucopolysaccharidoses. Ital J Pediatr. 2018;44(Suppl 2):132.
  27. Sands MS. Mucopolysaccharidosis type VII: a powerful experimental system and therapeutic challenge. Pediatr Endocrinol Rev. 2014;12(Suppl 1):159–65.
  28. Bonafé L, Kariminejad A, Li J, et al. Brief report: peripheral osteolysis in adults linked to ASAH1 (Acid Ceramidase) mutations: a new presentation of farber’s disease. Arthritis Rheumatol. 2016;68(9):2323–7.
  29. Kim SY, Choi SA, Lee S, et al. Atypical presentation of infantile-onset farber disease with novel ASAH1 mutations. Am J Med Genet A. 2016;170(11):3023–7.
  30. Yu FPS, Amintas S, Levade T, Medin JA. Acid ceramidase deficiency: farber disease and SMA-PME. Orphanet J Rare Dis. 2018;13(1):121.
  31. Lee JS, Choi JM, Lee M, et al. Diagnostic challenge for the rare lysosomal storage disease: late infantile GM1 gangliosidosis. Brain Dev. 2018;40(5):383–90.
  32. Caciotti A, Garman SC, Rivera-Colón Y, et al. GM1 gangliosidosis and Morquio B disease: an update on genetic alterations and clinical findings. Biochim Biophys Acta. 2011;1812(7):782–90.
  33. Pollard LM, Jones JR, Wood TC. Molecular characterization of 355 mucopolysaccharidosis patients reveals 104 novel mutations. J Inherit Metab Dis. 2013;36(2):179–87.
  34. Bunge S, Rathmann M, Steglich C, et al. Homologous nonallelic recombinations between the iduronate-sulfatase gene and pseudogene cause various intragenic deletions and inversions in patients with mucopolysaccharidosis type II. Eur J Hum Genet. 1998;6(5):492–500.
  35. Brusius-Facchin AC, Schwartz IV, Zimmer C, et al. Mucopolysaccharidosis type II: identification of 30 novel mutations among Latin American patients. Mol Genet Metab. 2014;111(2):133–8.
  36. Kosuga M, Mashima R, Hirakiyama A, et al. Molecular diagnosis of 65 families with mucopolysaccharidosis type II (Hunter syndrome) characterized by 16 novel mutations in the IDS gene: Genetic, pathological, and structural studies on iduronate-2-sulfatase. Mol Genet Metab. 2016;118(3):190–7.
  37. Chiong MA, Canson DM, Abacan MA, Baluyot MM, Cordero CP, Silao CL. Clinical, biochemical and molecular characteristics of Filipino patients with mucopolysaccharidosis type II - Hunter syndrome. Orphanet J Rare Dis. 2017;12(1):7.
  38. Dvorakova L, Vlaskova H, Sarajlija A, et al. Genotype-phenotype correlation in 44 Czech, Slovak, Croatian and Serbian patients with mucopolysaccharidosis type II. Clin Genet. 2017;91(5):787–96.
  39. Zanetti A, D’Avanzo F, Rigon L, et al. Molecular diagnosis of patients affected by mucopolysaccharidosis: a multicenter study. Eur J Pediatr. 2019;178(5):739–53.
  40. Zhang W, Xie T, Sheng H, et al. Genetic analysis of 63 Chinese patients with mucopolysaccharidosis type II: Functional characterization of seven novel IDS variants. Clin Chim Acta. 2019;491:114–20.
  41. Kaler SG, Ferreira CR, Yam LS. Estimated birth prevalence of Menkes disease and ATP7A-related disorders based on the Genome Aggregation Database (gnomAD). Mol Genet Metab Rep. 2020;5(24):100602.


The authors would like to thank the Research Incentive Fund of the Clinicas Hospital in Porto Alegre (Fundo de Incentivo à Pesquisa do Hospital de Clínicas de Porto Alegre, FIPE/HCPA).


This work was supported by the Brazilian National Council for Technological and Scientific Development (CNPq) and the Research Incentive Fund of the Clinicas Hospital in Porto Alegre (FIPE/HCPA).

Author information




UM conceived the study, PB and GP collected the data; PB and FV carried out the analysis and interpretation of data; PB, UM, and FV wrote the manuscript; UM, RG, FV and GP revised the manuscript. All authors read and approved the submitted version of the manuscript.

Corresponding author

Correspondence to Filippo Vairo.

Ethics declarations

Ethics approval and informed consent to participate

No ethical approval was required.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

The number of variants excluded at each category for each MPS gene at the calculated maximum frequency. Bold numbers identify retained variants.

Additional file 2.

The total number of variants excluded for homozygosis for each MPS gene and the number of homozygous variants with frequency less than 0.001.

Additional file 3.

The number of variants excluded from the analysis for each MPS gene.

Additional file 4.

The number of variants excluded from the analysis for each MPS gene.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Borges, P., Pasqualim, G., Giugliani, R. et al. Estimated prevalence of mucopolysaccharidoses from population-based exomes and genomes. Orphanet J Rare Dis 15, 324 (2020).


  • Mucopolysaccharidoses (MPS)
  • Estimated prevalence
  • Exome aggregation consortium (ExAC)
  • Genome aggregation database (gnomAD)
  • In silico analysis