There is scientific consensus that CTX is clinically underrecognized, is frequently diagnosed late, and has a prevalence in the general population that is likely underestimated [20, 39], with fewer than 600 cases reported worldwide [12, 28]. To facilitate early diagnosis of this disease, we sought to calculate updated, conservative, and accurate estimates of CTX prevalence. We followed the approach of previous studies for estimating how common a monogenic autosomal recessive disease may be [20, 25, 33, 40, 41], and we leveraged a large population genetic dataset, gnomAD. We developed a highly curated list of CYP27A1 alleles that have been reported to be pathogenic and used their bioinformatic characteristics to identify additional CYP27A1 missense alleles that are predicted to be pathogenic for CTX. We then applied the Hardy–Weinberg principle to estimate CTX risk in global populations. These analyses show that CTX is more common in global populations than previously appreciated. For the first time, we also attempted to map CTX clinical activity worldwide, using an emerging tool for sharing genomic variation data.
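The Hardy–Weinberg step described above can be sketched in a few lines. This is a minimal sketch, assuming that the allele frequencies of the variants retained in the final model are pooled into a single cumulative pathogenic allele frequency q, that any two such alleles in trans cause disease with full penetrance, and that the population is at Hardy–Weinberg equilibrium, so the expected prevalence is q² and the carrier frequency is 2pq with p = 1 − q. The function names and the input frequencies are hypothetical placeholders, not the gnomAD values used in this study.

```python
def hw_prevalence(pathogenic_afs):
    """Return (q, prevalence, carrier frequency) for an autosomal recessive disease.

    Assumes Hardy-Weinberg equilibrium, full penetrance, and that any two
    alleles listed in `pathogenic_afs`, in trans, cause disease.
    """
    q = sum(pathogenic_afs)  # cumulative pathogenic allele frequency
    if not 0.0 <= q <= 1.0:
        raise ValueError("cumulative allele frequency must lie in [0, 1]")
    p = 1.0 - q
    prevalence = q ** 2  # homozygotes plus compound heterozygotes
    carrier_freq = 2.0 * p * q  # heterozygous carriers
    return q, prevalence, carrier_freq


def one_in(freq):
    """Express a frequency as N in the '1 per N' convention used in the text."""
    return round(1.0 / freq)


# Hypothetical per-variant allele frequencies (not values from this study).
example_afs = [0.0010, 0.0015, 0.0005]
q, prevalence, carriers = hw_prevalence(example_afs)
print(f"q = {q:.4f}; prevalence = 1 per {one_in(prevalence):,}; "
      f"carriers = 1 per {one_in(carriers):,}")
```

With these placeholder inputs, q = 0.003, so the expected prevalence is 1 per 111,111 and roughly 1 in 167 individuals is a carrier. Because prevalence depends on the square of q, the choice of which variants enter the sum dominates the result.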
The perception that a disease is exceedingly rare, as with CTX, is common in the field of rare diseases. For CTX, this perception typically stems from the nature of the disease: its clinical and laboratory features are pleomorphic and overlap with those of other diseases, its clinical course and age at symptom onset are variable, and it lacks clear genotype–phenotype correlations [3, 42]. Establishing an accurate estimate of how common CTX is in the population becomes even more challenging given the existence of a subpopulation of patients with a milder phenotype. Interestingly, such individuals may carry the same variants (eg, p.Arg395Cys) as patients with neurological involvement, suggesting that additional damaging mechanisms or genetic modifiers beyond CYP27A1 loss of function may be at play. Reinforcing this enigmatic picture is the report of two siblings who carried the same pathogenic variants but were at opposite ends of the clinical spectrum: one developed rare spinal xanthomatosis, and the other developed a mild form with minor tendon xanthomas. In contrast, some clinical manifestations are common and well documented. Bilateral cataracts are found in about 80% of CTX patients [16, 44], and an interim analysis of patients recruited solely on the basis of juvenile-onset idiopathic bilateral cataracts showed a molecularly confirmed CTX diagnosis in 1.5%-1.8% of patients, representing a 500-fold increase in CTX prevalence in this subset [45, 46].
Our findings largely support, expand upon, and refine previous genetic estimates based on ExAC data and a more limited set of variants. We identified three tiers of CTX prevalence. The highest estimates were found in the Asian populations, at 1 per 44,407–93,084, in line with previous investigations (1:36,072–75,601). In both studies, the top variant was the missense c.1415G > C (p.Gly472Ala), but with a higher AF in gnomAD than in ExAC (0.0014 vs 0.0010). To our knowledge, p.Gly472Ala has been reported in two individuals: a patient of Asian origin who was homozygous for the variant and an individual identified through newborn screening. In the latter study, one sample was positive for CTX biochemical biomarkers and was compound heterozygous for two pathogenic CYP27A1 variants, c.1214G > A (p.Arg405Gln) and c.1415G > C (p.Gly472Ala). Although we cannot confirm that this sample came from an individual of EAS ancestry, we speculate that it did, because p.Gly472Ala is unique to EAS and both alleles have a higher AF in EAS than in the other gnomAD ancestries. If so, the incidence of 1 per 32,000 reported by Hong et al. may provide independent, orthogonal validation of our findings in the EAS population.
An intermediate level of CTX prevalence was found in the AMR, AFR, and EUR populations. In AMR, although our estimates (1 per 70,795–157,878) were very close to those of the ExAC study (1:71,677–148,914), the two studies differ in the number, type, and AF of the variants included. For instance, we identified twice as many alleles, and the top variant, p.Met383Lys, had a much higher AF (0.00124 vs 0.00078). We provided two prevalence estimates for AMR because we could not confidently include or exclude this variant from our analysis. Following ACMG/AMP guidelines, we could attempt to apply the BS1 criterion, which is used to classify a variant as “likely benign” when its AF is greater than expected for the disorder [30, 47, 48]. However, using the maximum credible population AF as a filter would require reliable estimates of population prevalence and allelic heterogeneity. A plausible strategy for applying this filter is to evaluate a range of prevalence values under two allelic heterogeneity estimates, for instance, 10% (conservative) and 30%, the latter based on the most frequent allele found in CTX patients (c.1183C > T, p.Arg395Cys). Using this approach, we found that at 10% allelic heterogeneity, p.Met383Lys was retained only at a disease prevalence of 1:10,000–20,000 (penetrance 100%-50%), which is not currently supported by any clinical, epidemiological, or genetic study in the AMR population. At 30% allelic heterogeneity, p.Met383Lys was retained at a disease prevalence of 1:100,000–180,000 (penetrance 100%-50%), which may be supported by genetic estimates from the previous CTX study (~1:70,000–150,000). However, it could be argued that if p.Met383Lys were as frequent as p.Arg395Cys, we would expect at least a few CTX patients carrying this allele to have been described in the literature.
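The maximum credible population AF used in this exercise can be sketched as follows. This is a minimal sketch assuming one common formulation for biallelic disorders: the cumulative disease-allele frequency implied by a prevalence P and penetrance π is √(P × G / π), where G is the fraction of cases explained by the gene (set to 1 here on the assumption that CYP27A1 explains all CTX cases), and no single variant may account for more than the allelic heterogeneity fraction of that total. The function names are hypothetical, and the exact formulation behind the analysis above may differ.

```python
import math


def max_credible_af_recessive(prevalence, allelic_het, penetrance=1.0, genetic_het=1.0):
    """Maximum credible population AF for one variant in a biallelic disorder.

    Cumulative disease-allele frequency for the gene: sqrt(prevalence * genetic_het / penetrance).
    A single variant may contribute at most `allelic_het` of that total.
    """
    for value in (prevalence, allelic_het, penetrance, genetic_het):
        if not 0.0 < value <= 1.0:
            raise ValueError("all inputs must be proportions in (0, 1]")
    return allelic_het * math.sqrt(prevalence * genetic_het / penetrance)


def passes_filter(observed_af, prevalence, allelic_het, penetrance=1.0):
    """True if the observed AF does not exceed the maximum credible AF (point estimate)."""
    return observed_af <= max_credible_af_recessive(prevalence, allelic_het, penetrance)


# (allelic heterogeneity, prevalence, penetrance); the prevalence/penetrance ranges
# in the text are read here as 1:10,000 at 100% and 1:20,000 at 50%, etc.
scenarios = [
    (0.10, 1 / 10_000, 1.0),
    (0.10, 1 / 20_000, 0.5),
    (0.30, 1 / 100_000, 1.0),
    (0.30, 1 / 180_000, 0.5),
]
for het, prev, pen in scenarios:
    cutoff = max_credible_af_recessive(prev, het, pen)
    print(f"allelic het {het:.0%}, prevalence 1/{round(1 / prev):,}, "
          f"penetrance {pen:.0%}: max credible AF = {cutoff:.6f}")
```

Under this formulation, the four parameter pairs give cutoffs of roughly 0.95 × 10⁻³ to 1.0 × 10⁻³. The sketch compares the observed AF directly with the cutoff; frequency-filtering implementations often compare a lower confidence bound on the observed AF instead (gnomAD's filtering allele frequency is based on such a bound), so it need not reproduce the exact retention thresholds reported in the text.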
Consistent with strong evidence in the literature, we found p.Arg395Cys to be most frequent in the EUR, FIN, and AMR populations. This variant also had a higher AF than in ExAC (0.00051 vs 0.00017). In contrast to AMR, our estimates show a substantial increase in prevalence for the AFR and EUR populations, from 1:468,624 to 1:166,440 and from 1:461,358 to 1:233,597, respectively. Again, the two studies identified the same top drivers, but with higher AFs in gnomAD (p.Arg448His: 0.00041 vs 0.00031 in AFR; p.Arg395Cys: 0.00038 vs 0.00021 in EUR). As discussed above, such differences are not surprising, especially given the much larger number of individuals in gnomAD than in ExAC. Improvements in the performance of in silico predictors and the evolving body of clinical and functional evidence, both critical to variant classification, must also be taken into account.
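Because prevalence under the Hardy–Weinberg model scales with the square of the cumulative pathogenic AF, moderate AF increases produce proportionally larger prevalence shifts. The sketch below converts the AFR and EUR figures quoted above back into implied cumulative AFs; it assumes prevalence = q² with full penetrance, and the helper name is hypothetical.

```python
import math


def implied_cumulative_af(one_in_n):
    """Cumulative pathogenic AF implied by a prevalence of 1 per N,
    assuming Hardy-Weinberg equilibrium and full penetrance (prevalence = q**2)."""
    return 1.0 / math.sqrt(one_in_n)


# Prevalence figures (1 per N) quoted in the text: previous ExAC-based study -> this study.
changes = {"AFR": (468_624, 166_440), "EUR": (461_358, 233_597)}

for pop, (before, after) in changes.items():
    q_before = implied_cumulative_af(before)
    q_after = implied_cumulative_af(after)
    print(f"{pop}: implied q {q_before:.5f} -> {q_after:.5f} "
          f"(x{q_after / q_before:.2f}); prevalence x{before / after:.2f}")
```

Under these assumptions, the roughly 2.8-fold (AFR) and 2.0-fold (EUR) prevalence increases correspond to only about 1.7-fold and 1.4-fold increases in cumulative pathogenic AF.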
As with p.Met383Lys, we retained in our model the start_lost variant c.2T > C (NM_000784.4:p.?), which was found only in the SAS population. Although this type of variant may be considered invariably pathogenic because it is expected to produce no protein, its clinical impact can be heterogeneous in practice, and alternative mechanisms of translation initiation should be considered. Notably, this variant was found in the heterozygous state in four siblings from a South African family with a mild CTX phenotype without neurological involvement (see family #8). Future functional studies may clarify the role of the c.2T > C allele in CTX pathogenesis or rule out its involvement, with direct implications for our SAS estimate. The lowest level of prevalence was found in the FIN population, in line with findings from the previous CTX study.
Lastly, to increase awareness, promote proactive patient identification, and support early treatment intervention, we leveraged the VarSome platform to create a clinically relevant geographic map based on recent variant query activity. We found that most variant queries came from clinical geneticists and genetic counselors and that the most searched variants were consistent with our findings from the gnomAD AF analysis. We recognize that the proposed map is only a proxy for the possible clinical distribution of prospective CTX patients and that differences in counts across countries may also reflect factors such as the accessibility and popularity of the VarSome search engine, country size, access to and cost of genetic testing, and socioeconomic or political efforts to advance scientific progress against rare diseases. Nonetheless, our findings are consistent with patient reports from around the world (USA, Israel, Italy, Japan, the Netherlands, Belgium, Brazil, Canada, France, Iran, Norway, Tunisia, Spain, China, and Sweden; see CTX at https://www.ncbi.nlm.nih.gov/books/NBK1409/ and https://rarediseases.org/).
There are limitations and methodological assumptions underlying the approach in this study. We and others have applied the Hardy–Weinberg principle in genetic risk calculations; this principle assumes that heterozygous individuals are not subject to selection, that populations are at equilibrium with respect to allele and genotype frequencies, and that mating is random. Also, AF estimates, and therefore how rare an allele appears to be, depend strongly on the size of the population studied; as larger population-based datasets become available, estimates will change and become more accurate. In addition, although filtering criteria and in silico tools for predicting the pathogenicity of missense variants of unknown or uncertain clinical significance are helpful and commonly used, there is currently no gold standard, and these strategies could lead to over- or underestimation of disease frequency. We attempted to mitigate overestimation by cross-referencing pathogenic labels across different databases and sources and by applying strict filtering criteria, with the goal of providing conservative estimates. We mitigated underestimation by manually curating missense variants that were filtered out by our bioinformatic workflow. Additional pathogenic variation not accounted for in our calculation may include in-frame deletions or insertions, intronic and noncanonical splice-site variants, and structural variants. Finally, our approach assumed that all variants in the final model contribute to disease risk with 100% penetrance. Although we are not aware of reports of CYP27A1 pathogenic variants with reduced penetrance, we cannot exclude their existence.
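The sample-size point above can be made concrete with a binomial confidence interval for an allele frequency estimated as AC alternate alleles out of AN sampled alleles. Note also that the smallest nonzero AF a dataset can observe is 1/AN, so very rare alleles are resolved only in large samples. This sketch uses the Wilson score interval, one standard choice among several; the allele counts are hypothetical and chosen to keep the observed AF fixed at 0.001 while AN grows.

```python
import math


def wilson_interval(allele_count, allele_number, z=1.96):
    """Wilson score confidence interval (default ~95%) for an AF of AC/AN."""
    if allele_number <= 0 or not 0 <= allele_count <= allele_number:
        raise ValueError("need allele_number > 0 and 0 <= allele_count <= allele_number")
    n = allele_number
    p = allele_count / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Same observed AF (0.001) at increasing numbers of sampled alleles (hypothetical counts).
for an in (10_000, 100_000, 1_000_000):
    ac = an // 1000
    lo, hi = wilson_interval(ac, an)
    print(f"AN={an:>9,}  AC={ac:>5,}  AF={ac / an:.5f}  "
          f"95% CI {lo:.5f}-{hi:.5f}  width {hi - lo:.5f}")
```

The interval narrows roughly in proportion to 1/√AN, which is why AF-based prevalence estimates can be expected to tighten as population databases grow.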
In conclusion, our study, which includes additional variants, new informatics tools, and newer, expanded databases, supports and refines previous estimates of CTX disease risk at the population level and provides a novel prospective geographic map of CTX clinical activity worldwide. This larger, more comprehensive study confirms that CTX is more common than the number of patients currently reported worldwide would suggest, and it highlights the most common pathogenic variants. We underscore the value of leveraging large and diverse population-based genetic databases to assess risk for inherited diseases. Such efforts, cross-referenced with other large-scale programs such as newborn screening and retrospective administrative claims studies, may provide the most accurate strategy for assessing disease presence in a population. In turn, this will translate into greater awareness, better recognition, and earlier treatment intervention, which will directly benefit patients and caregivers.