
The IDeaS initiative: pilot study to assess the impact of rare diseases on patients and healthcare systems



Rare diseases (RD) are a diverse collection of an estimated 7,000–10,000 different disorders, most of which affect a small number of people per disease. Because of their rarity and the fragmentation of patients across thousands of different disorders, the medical needs of RD patients are not well recognized or quantified in healthcare systems (HCS).


We performed a pilot IDeaS study, where we attempted to quantify the number of RD patients and the direct medical costs of 14 representative RD within 4 different HCS databases and performed a preliminary analysis of the diagnostic journey for selected RD patients.


The overall findings were notable for: (1) RD patients are difficult to quantify in HCS using ICD coding search criteria, which likely results in under-counting and under-estimation of their true impact on HCS; (2) per patient direct medical costs of RD are high, estimated to be around three- to fivefold higher than age-matched controls; and (3) preliminary evidence shows that diagnostic journeys are likely prolonged in many patients, and may result in progressive, irreversible, and costly complications of their disease.


The results of this small pilot suggest that RD impose a high medical burden on patients and HCS, and collectively have a major impact on public health. Machine-learning strategies applied to HCS databases and medical records using sentinel disease and patient characteristics may hold promise for faster and more accurate diagnosis for many RD patients and should be explored to help address the high unmet medical needs of RD patients.


When combined, rare diseases are not actually rare, as they collectively affect around 25–30 million people in the United States (US) and more than 300 million people worldwide [1,2,3,4]. RD represent a diverse spectrum of an estimated 7,000–10,000 different disorders, most of which affect only a few hundred to a few thousand people per disease [5,6,7,8]. It is estimated that around 85% of RD are genetic diseases [6], the majority of which are serious or life-threatening conditions that carry substantial morbidity and early mortality, and present considerable medical and financial burdens to RD patients and the families who care for them [9,10,11]. Given the large number of different rare diseases, each of which affects only a small number of patients, assessing the true impact of rare diseases on healthcare systems (HCS) is challenging. RD are generally difficult to diagnose, with many patients undergoing prolonged diagnostic journeys, termed the diagnostic odyssey, in order to obtain an accurate diagnosis [12, 13]. Even when RD are accurately diagnosed, fewer than half map to an International Classification of Diseases (ICD) 10 code, and far fewer (< 20%) have a specific ICD 10 code [14]. As a result, most RD are under-recognized and under-counted within HCS databases (such as payor/insurance databases) [15,16,17], with myriad downstream effects, such as imprecise coding of RD patients and poor tracking and understanding of both RD patients and the diseases themselves. Further, without a diagnosis, it is often the case that a set of labs, notes, and other features (e.g., a computable phenotype) cannot be reliably or consistently used to identify RD patients. Hence, the true impact of RD on HCS is not well described, and RD remain largely invisible to the HCS.

There are some estimates in the medical literature that medical care for RD patients may account for more than 10% of overall costs in some HCS, [18] and a few small studies, mainly case series, have shown high direct medical costs of RD at single centers in individual diseases or narrow clusters of related diseases (e.g., severe/refractory seizures) [19, 20]. Recently, a patient-reported survey on direct and indirect costs of rare diseases in the U.S. was reported, which showed high direct and indirect medical cost burdens to patients and HCS, with total costs estimated to be about $1 trillion (US) in 2019 [21]. Another recent study examined pediatric and adult hospital discharges in patients with rare and common conditions, which showed substantially higher healthcare utilization in rare versus common diagnoses, with RD accounting for nearly half of the US national healthcare costs [22].

In order to better understand RD medical costs, more accurately identify RD patients, and shorten the diagnostic odyssey for RD patients, additional work needs to be done to develop generalizable methodologies and tools (e.g., clinical decision support tools) that can be used across different HCS to adequately and consistently identify RD patients within HCS and to objectively quantify direct medical costs associated with RD by disease and overall. Similarly, the impact of delayed diagnosis or misdiagnosis of RD on patients and HCS has not been well quantified [13]. While delayed diagnosis or misdiagnosis is an issue for both rare and common diseases, delays in diagnosis disproportionately impact RD patients given the often years-long diagnostic odyssey most patients undergo. Misdiagnosis and lack of diagnosis can result in inappropriate care, lack of targeted or, when available, disease-modifying treatment, and missed opportunities for intervention that may ameliorate or prevent disease progression, which in some cases is irreversible, or for treatments that must be administered within certain time windows (e.g., neurodegenerative or metabolic disorders) [23, 24].

IDeaS (Impact of Rare Diseases on Patients and Healthcare Systems) is a collaboration among the Office of Rare Diseases Research (ORDR) within the National Institutes of Health (NIH) National Center for Advancing Translational Sciences (NCATS); Eversana™, a commercial life sciences company; the Oregon Health & Science University (OHSU), Oregon’s public academic health center; Sanford Health (Sanford), a large integrated healthcare system operating predominantly in the northern Midwestern states; and a health insurer in Australia. IDeaS is intended to be a small preliminary pilot study whose overall purpose is to explore the feasibility of identifying and describing RD patients in a limited set of 14 representative RD within different and diverse HCS. The 3 main aims are to: (1) explore whether methodologies could be developed to quantify patients with RD and provide estimates of disease prevalence in different HCS; (2) quantify the direct medical costs of a representative set of 14 RD in order to identify additional areas for study into RD direct costs and health burdens that may help identify gaps in RD research; and (3) perform a preliminary assessment of the diagnostic journey for selected patients in 2 RD (Batten disease [BD] late infantile neuronal ceroid lipofuscinosis type 2 [CLN2] and cystic fibrosis [CF]) to start to identify disease-course characteristics that might be used to inform the development of strategies that could accelerate RD diagnosis using graphical representation of the disease course in patient “journey maps” (Figs. 5, 6).
While the IDeaS pilot study is limited in scope, it is hoped that the results of these explorations will contribute to further development of methods and approaches that can help us better understand the complex issues currently impeding our understanding of cost and utilization drivers for RD that could be applicable to the thousands of known RD, as well as to inform larger research questions, such as the relationships between costs and cost savings, patient outcomes and disease rarity. However, these lines of inquiry will require additional and iterative development of analytical tools and approaches that are beyond the scope of this study.


We conducted the IDeaS study, a retrospective, descriptive pilot study, to explore the feasibility of quantifying patients and direct medical costs for 14 representative RD (Table 1). The 14 RD (or disease groups) were selected by the study authors to represent a diverse set of disorders that differed in prevalence, organ systems affected, age of onset, clinical course, and availability of an approved treatment or specific ICD code, and were intended to be representative of many RD beyond the 14 used in this pilot. The pilot IDeaS study includes 3 main Aims for exploration.

Table 1 Rare diseases included in the IDeaS study, and affiliated ICD and CPT codes

Aim 1: Estimation of disease prevalence in different HCS databases

We initially attempted to identify patients with the 14 pilot RD within the 5 different HCS databases using diagnostic (ICD) codes (see Table 1, Additional file 1: Table S1); however, due to the substantially different billing methods used by the Australian healthcare system (see below), we were not able to reliably connect the Australian HCS data to the 14 RD used in the pilot. Thus, exploration and comparison of the Australian data could not be performed and was dropped from further consideration.

For the remaining 4 HCS, a patient was considered diagnosed with the RD when there were at least two instances of any one of the corresponding diagnosis codes in the patient's chart or medical claims data, occurring at least 3 months apart. Two diseases, pheochromocytoma (Pheo) and Charcot Marie Tooth (CMT), did not have specific ICD codes, and additional analyses were attempted by adding specific Current Procedural Terminology (CPT) codes to the search criteria (see Results section).
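The case definition above can be sketched in code. This is a minimal illustration rather than the study's actual query: the function name, the record layout (patient ID, ICD code, service date), the sample claims, and the reading of "3 months" as 90 days are all assumptions made for the example. It also assumes that the two qualifying instances may carry different codes from the disease's code list.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_GAP = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def diagnosed_patients(claims, disease_codes):
    """Return IDs of patients with >= 2 qualifying codes at least MIN_GAP apart.

    claims: iterable of (patient_id, icd_code, service_date) tuples
            (hypothetical layout, not the study's schema).
    disease_codes: set of ICD codes mapped to the disease.
    """
    dates = defaultdict(list)
    for patient_id, icd_code, service_date in claims:
        if icd_code in disease_codes:
            dates[patient_id].append(service_date)
    # Two instances at least MIN_GAP apart exist exactly when the earliest
    # and latest qualifying dates are at least MIN_GAP apart.
    return {pid for pid, ds in dates.items() if max(ds) - min(ds) >= MIN_GAP}


# Hypothetical claims for illustration only.
claims = [
    ("p1", "277.02", date(2010, 1, 5)),
    ("p1", "277.02", date(2010, 6, 1)),  # 147 days later: qualifies
    ("p2", "277.02", date(2010, 1, 5)),
    ("p2", "277.02", date(2010, 2, 1)),  # only 27 days later: does not qualify
    ("p3", "493.90", date(2010, 1, 5)),  # code not on the disease list
]
print(diagnosed_patients(claims, {"277.02"}))  # {'p1'}
```

Requiring two temporally separated codes trades some sensitivity for specificity: a single "rule-out" code entered during a work-up does not count as a diagnosis.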

Percentage of patients

The percentage of patients with an RD was estimated by dividing the number of patients with the disease diagnosis by the total number of patients within the HCS database during the specified time period, using the source data and HCS approaches summarized in Table 2. For 12 of the 14 RDs that either had specific ICD codes or mapped to 1 or more ICD codes, the 4 HCS databases were searched using the ICD codes listed in Table 1. However, given differences between some of the databases in how patient data is categorized, some customization by system was necessary, including: (1) the NCATS analysis included only data obtained prior to 2015, so only ICD-9 codes were used; and (2) the Eversana database was predominantly organized around billing, and certain non-billable ICD codes could not be used in the analyses [for example ICD-9 code 277.0 (Cystic Fibrosis, nonbillable)].

Table 2 Aim 1 healthcare system database characteristics and approaches

The Australia HCS data assessment was performed using the Australian Refined Diagnosis Related Groups (AR-DRG) system, which is an Australian admitted patient classification system that relates the number and type of patients treated in a hospital to the resources required by the hospital, in a clinically meaningful way [25, 26]. AR-DRGs group patients with similar diagnoses requiring similar hospital services. Episodes of admitted hospital acute care are assigned with disease and intervention codes, including Australian Modification ICD-10 (ICD-10-AM) and other coding standards.

The medical literature and public health sources were searched to provide a prevalence estimate comparator for each of the diseases.

Aim 2: Average direct cost estimates by disease

Direct medical costs were estimated for patients with each of 13 of the 14 RDs identified in Aim 1 using HCS data from 2 of the collaborating institutions, NCATS and Eversana. For one disease, CMT, which lacked a specific ICD code, patients could not be reliably identified in Aim 1, and this disease was dropped from further analysis. Direct medical costs were estimated using the U.S. dollar amount paid to the HCS, extracted from the database’s billing records. As per Aim 1, a patient was considered diagnosed with the disease when there were at least two instances of any one of the corresponding diagnosis codes in the patient's medical claims, occurring at least 3 months apart. The first occurrence of the diagnosis satisfying these criteria was defined as the date of diagnosis of the disease for the patient. For the NCATS database, patients were first identified using the RD ICD codes (Aim 1), then direct costs were calculated by disease using billing codes that represent what was paid by the State of Florida’s Medicare/Medicaid program for the time period 2007–2012. For Eversana, direct costs were extracted from the payment information in the IBM® Marketscan® Research Database in years 2006–2020, which includes gross payment made to a provider. For a given RD, the total cost was calculated in each of the years for the set of patients with costs in the database during that given year, independent of the stage of diagnosis (both pre-diagnosis and post-diagnosis).

Total cost of care

For NCATS, the total cost of care was calculated by summing the total costs of all visits for each patient in the defined population during the specified time period. For Eversana, the total cost of care for each disease was computed as the sum of cost of care of all patients over all the years.

Average cost per patient (PP)

For NCATS, the average cost per patient (PP) in the 5-year time period was derived by calculating the total cost of all visits for each patient in the defined population and then averaging across patients. For Eversana, the total PP cost was calculated in each year for each disease separately by dividing the total cost of care for all patients in that disease cohort in that year by the number of patients in that disease cohort in that year.

Per patient per year (PPPY) cost

For NCATS, the PPPY cost was calculated for each RD population by dividing the average PP cost by the number of years in the 5-year time period. For Eversana, the PPPY cost for each disease was calculated by averaging the PP cost over the 15-year time period.
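The Eversana-style calculation described above (a PP cost per year, then an average over years) can be sketched as follows. This is an illustrative sketch only: the function names, the payment record layout, and the sample amounts are hypothetical and are not taken from the study's data or code.

```python
from collections import defaultdict


def yearly_pp_cost(payments):
    """Per-patient (PP) cost for each year: total paid / distinct patients that year.

    payments: iterable of (patient_id, year, amount_paid) tuples
              (hypothetical layout).
    """
    total = defaultdict(float)
    patients = defaultdict(set)
    for patient_id, year, amount in payments:
        total[year] += amount
        patients[year].add(patient_id)
    return {year: total[year] / len(patients[year]) for year in total}


def pppy_cost(payments):
    """Per-patient-per-year (PPPY) cost: mean of the yearly PP costs."""
    pp = yearly_pp_cost(payments)
    return sum(pp.values()) / len(pp)


# Hypothetical payments for one disease cohort.
payments = [
    ("p1", 2019, 1000.0),
    ("p2", 2019, 3000.0),
    ("p1", 2020, 5000.0),
]
print(yearly_pp_cost(payments))  # {2019: 2000.0, 2020: 5000.0}
print(pppy_cost(payments))       # 3500.0
```

Note that averaging the yearly PP values gives each year equal weight regardless of how many patients had costs in that year, which is one reason this figure can differ from a simple total-cost-over-total-patient-years ratio.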

Weighted average (wtavg) costs PP for the 13 representative RDs were then calculated using the formula shown in Additional file 2: Figure S1.

Control population: average cost of age-matched patients without the rare disease

For NCATS, a control population was created by querying the system for patients that had a general wellness visit within the specified time period. This resulted in patients being pulled with the CPT codes listed as “initial history and examination related to the healthy individual” in adult, adolescent, childhood, and infant age groups (CPT 90750, 90751, 90752, 90754). For Eversana, the average costs for all age-matched patients without the RD within the same HCS database and time period were used as a control using the same methodology.

Aim 3: Creation of patient journey maps in selected diseases

Using patient-level data in the Eversana (IBM® Marketscan®) database, patient “journey maps” were created, which charted the patient’s clinical course for two diseases, BD CLN2 and CF, for two patients per disease who were identified as having the highest total direct medical costs (Figs. 5, 6). For each patient, key clinical features and major medical milestones, patient characteristics, disease-modifying therapy, and billing costs were extracted from the individual patient records and mapped over the available time period.


Aim 1: Estimation of disease prevalence in different HCS databases

Disease percentage within the HCS databases was estimated by identifying RD patients by ICD codes as a percentage of total patients within the HCS (Fig. 1). The main findings are described below.

Fig. 1

Estimated rare disease percentages by healthcare system database and in the medical literature/published data sources. Percentage of patients with each of the 13 of the 14 representative rare diseases for which a percentage could be calculated (excludes CMT) in the 4 healthcare system databases, and disease percentage extrapolated to the US population from the medical literature/public data sources. SCD sickle cell disease, MD muscular dystrophy, CF cystic fibrosis, HHT hereditary hemorrhagic telangiectasia, BD Batten disease, LGS Lennox Gastaut syndrome, FSGS focal segmental glomerulosclerosis, EOE eosinophilic esophagitis, OI osteogenesis imperfecta, MNGIE mitochondrial neurogastrointestinal encephalomyopathy, Pheo pheochromocytoma, TA Takayasu’s arteritis, CMT Charcot Marie Tooth disease, NCATS National Center for Advancing Translational Sciences, OHSU Oregon Health and Science University, Med Lit medical literature/public data sources

First, two of the 14 RDs, Pheo and CMT, do not have specific ICD codes, and patients with these diseases could not be identified using ICD codes alone. With the aim of more specifically identifying only the patients with these 2 RD of interest, additional analyses were attempted by adding specific CPT codes to the search criteria. For Pheo, which is included under the non-specific ICD code “benign neoplasm of adrenal gland” (ICD-9 227.0), a code that covers several unrelated diseases and conditions, the CPT codes for labs more specific to Pheo (e.g., catecholamines) were added to the search criteria as a more sensitive indicator of Pheo vs other benign adrenal tumors (see Table 1). This combined search for Pheo could be performed within the 4 remaining HCS databases (NCATS, Eversana, Sanford, OHSU), resulting in a more targeted identification of Pheo patients. A similar strategy for CMT was attempted using the CPT codes thought to be more specific to CMT [e.g., PMP22 (peripheral myelin protein 22)] (see Table 1); however, this approach yielded 0 patients in 3 of the 4 HCS databases and could not provide estimates of the percentage of patients across the different HCS databases. Thus, CMT was dropped from further analysis.
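The combined ICD-plus-CPT search described for Pheo amounts to an intersection of two patient sets. The sketch below illustrates that idea under stated assumptions: the record layout and function name are hypothetical, the string `CATECHOLAMINE_LAB` is a placeholder standing in for the CPT codes listed in Table 1 (which are not reproduced here), and the two-instances-3-months-apart rule is omitted for brevity.

```python
def patients_with_icd_and_cpt(records, icd_codes, cpt_codes):
    """Return patients having at least one listed ICD code AND one listed CPT code.

    records: iterable of (patient_id, code_system, code) tuples, where
             code_system is "ICD" or "CPT" (hypothetical layout).
    """
    has_icd, has_cpt = set(), set()
    for patient_id, system, code in records:
        if system == "ICD" and code in icd_codes:
            has_icd.add(patient_id)
        elif system == "CPT" and code in cpt_codes:
            has_cpt.add(patient_id)
    return has_icd & has_cpt


# Hypothetical records; "CATECHOLAMINE_LAB" is a placeholder, not a real CPT code.
records = [
    ("p1", "ICD", "227.0"),
    ("p1", "CPT", "CATECHOLAMINE_LAB"),  # adrenal neoplasm code plus lab: kept
    ("p2", "ICD", "227.0"),              # adrenal neoplasm code only: excluded
    ("p3", "CPT", "CATECHOLAMINE_LAB"),  # lab only: excluded
]
print(patients_with_icd_and_cpt(records, {"227.0"}, {"CATECHOLAMINE_LAB"}))  # {'p1'}
```

Because the result is an intersection, it can only shrink the ICD-only cohort; a database that rarely records the chosen procedure codes will return few or no patients, as happened for CMT.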

Second, overall the percentage estimates for the remaining 13 diseases were found to vary widely by HCS (Table 3, Fig. 1). Consistent with the medical literature, Sickle Cell Disease (SCD), Muscular Dystrophy (MD), CF, and Eosinophilic Esophagitis (EoE) had the highest percentages of patients, and Takayasu’s arteritis (TA), Pheo, and Mitochondrial NeuroGastroIntestinal Encephalomyopathy (MNGIE) had the lowest. The percentages within a disease were quite variable across the different HCS data analyses, and for many of the diseases, the NCATS analysis showed higher percentages of patients with the selected diseases. These findings may be partially explained by the different populations represented in each of the databases. Many RD, especially genetically-based RD, are known to cluster within certain populations, and the variable findings may merely show clustering of populations within certain geographic areas or HCS. For example, many RD are highly debilitating with substantial morbidity that may limit a patient or caregiver’s ability to work or attend school. Thus, RD patients may be disproportionately reliant upon public insurance programs for their healthcare, which may partially explain the higher percentages for some RD in the NCATS findings. The estimates from the medical literature also showed that, in many cases, disease percentages by HCS were not consistent with generally reported literature estimates, in that the literature-cited prevalence rates tended to be lower for most of the diseases than the percentages calculated from the HCS databases.

Fig. 2

PPPY cost of care of 13 RD versus control. Average per patient per year costs calculated within 2 different healthcare systems databases A Eversana and B NCATS, versus an age-matched control. SCD sickle cell disease, MD muscular dystrophy, CF cystic fibrosis, HHT hereditary hemorrhagic telangiectasia, BD Batten disease, LGS Lennox Gastaut syndrome, FSGS focal segmental glomerulosclerosis, EOE eosinophilic esophagitis, OI osteogenesis imperfecta, MNGIE mitochondrial neurogastrointestinal encephalomyopathy, Pheo pheochromocytoma, TA Takayasu’s arteritis

Aim 2: Average direct cost estimates by disease

Cost per patient per year (PPPY)

Direct medical costs by disease were estimated independently for the NCATS and Eversana HCS data sources and compared to an age-matched control without the RD. Direct medical costs to payors from HCS billing records were estimated by averaging per patient (PP) cost by disease, and total direct costs vs control were estimated by adding the average cost PP by disease over the respective time periods. The results show that average RD costs ranged from 1.5- to 23.9-fold higher versus control (Fig. 2). The Eversana HCS database estimates (Fig. 2a), which were extracted from a mix of commercial and public insurance/payors over an almost 15-year time period (2006–2020), showed that per patient per year (PPPY) costs ranged from $8812 to $140,044 for RD patients vs $5862 for the control. The highest PPPY costs for RDs in the Eversana analysis were for Urea Cycle Disorders (UCD), Lennox Gastaut Syndrome (LGS), and BD, and the lowest for EoE, Hereditary Hemorrhagic Telangiectasia (HHT), and SCD. The NCATS estimates (Fig. 2b), which were extracted from an almost exclusively Medicaid data source for the 5-year period 2007–2012, showed that PPPY costs ranged from $4859 to $18,994 for RD patients versus $2211 for the control. The highest PPPY costs were for MNGIE, UCD, and MD, and the lowest for EoE, HHT and Pheo. While the NCATS and Eversana cost estimates differed by PPPY and by cost per disease, in every case the PPPY cost for RD patients exceeded that of the control.

A PPPY cost averaged across the RD was estimated using a weighted average (wtavg). The wtavg for the Eversana analysis was $16,644 for an average RD patient versus $5862 for the control (2.8-fold higher for RD vs control), and for the NCATS analysis was $10,695 for an RD patient versus $2211 for the control (4.8-fold higher).
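The fold differences quoted above follow directly from the reported wtavg and control values, and can be recomputed as a check. The weighted-average helper below is only a sketch: the study's actual formula is given in Additional file 2: Figure S1 and is not reproduced here, so weighting by per-disease patient counts is an assumption, and the two-disease input is hypothetical.

```python
def weighted_average(costs, weights):
    """Weighted mean of per-disease costs.

    Assumption: weights are per-disease patient counts; the study's own
    formula (Additional file 2: Figure S1) may differ.
    """
    return sum(c * w for c, w in zip(costs, weights)) / sum(weights)


def fold_vs_control(rd_cost, control_cost):
    """Ratio of RD cost to control cost."""
    return rd_cost / control_cost


# Hypothetical two-disease example: the more common, cheaper RD dominates.
print(weighted_average([10_000.0, 100_000.0], [900, 100]))  # 19000.0

# Fold differences recomputed from the reported wtavg and control PPPY values.
print(round(fold_vs_control(16_644, 5_862), 1))  # 2.8 (Eversana)
print(round(fold_vs_control(10_695, 2_211), 1))  # 4.8 (NCATS)
```

A patient-weighted average sits closer to the costs of the more prevalent diseases, which is why it can fall well below the highest per-disease PPPY values.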

Total cost within time period

Total costs by RD within the time period, averaged by year, were then calculated by multiplying the number of patients with the disease (or control) by the average cost of the disease (Figs. 3, 4, Table 3). For the Eversana analysis (Fig. 3), the results show that the total costs were higher for the control population than for any individual RD. For NCATS (Fig. 4), 3 RD (LGS, MD and SCD) exceeded the control in average total costs per year, and the total costs per disease and control differed from the Eversana data. The generally lower total costs per disease vs control are likely due to the small number of patients per disease, despite the high average costs PP for RD. The high total costs for the 3 RD in the NCATS analysis vs control are likely due to LGS, MD and SCD being relatively prevalent for an RD, and to the possible enrichment of patients with RD in a public insurance database.
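The interplay between cohort size and per-patient cost can be made concrete with a small calculation. In the sketch below the cohort sizes are hypothetical, chosen only to illustrate the effect; the two PPPY values are the highest Eversana RD cost and the Eversana control cost reported above.

```python
def total_cost(n_patients, avg_cost_per_patient):
    """Total cost for a cohort: number of patients times average cost per patient."""
    return n_patients * avg_cost_per_patient


# Cohort sizes are hypothetical; PPPY values are those reported for Eversana.
rd_total = total_cost(50, 140_044)          # small RD cohort, very high PPPY
control_total = total_cost(100_000, 5_862)  # large control cohort, low PPPY
print(rd_total, control_total, rd_total < control_total)  # 7002200 586200000 True
```

Even at roughly 24 times the per-patient cost, a cohort of this size contributes only a small fraction of the control total, which is how a single RD can remain inconspicuous in aggregate spending.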

Fig. 3

Eversana RD versus control total costs of 13 RD over 15-year time period. Total costs within the 15-year time period 2005–2020 calculated from the Eversana HCS database for 13 representative RD. Costs were calculated by taking the average PPPY cost by disease (Fig. 2a) and multiplying by the number of patients with the disease (Table 3). SCD sickle cell disease, MD muscular dystrophy, CF cystic fibrosis, HHT hereditary hemorrhagic telangiectasia, BD Batten disease, LGS Lennox Gastaut syndrome, FSGS focal segmental glomerulosclerosis, EOE eosinophilic esophagitis, OI osteogenesis imperfecta, MNGIE mitochondrial neurogastrointestinal encephalomyopathy, Pheo pheochromocytoma, TA Takayasu’s arteritis

Fig. 4

NCATS RD versus control total costs of 13 RD over 5-year time period. Total costs within the 5-year time period 2002–2007 calculated from the NCATS HCS database for 13 representative RD. Costs were calculated by taking the average PPPY cost by disease (Fig. 2b) and multiplying by the number of patients with the disease (Table 3). SCD sickle cell disease, MD muscular dystrophy, CF cystic fibrosis, HHT hereditary hemorrhagic telangiectasia, BD Batten disease, LGS Lennox Gastaut syndrome, FSGS focal segmental glomerulosclerosis, EOE eosinophilic esophagitis, OI osteogenesis imperfecta, MNGIE mitochondrial neurogastrointestinal encephalomyopathy, Pheo pheochromocytoma, TA Takayasu’s arteritis

Table 3 Unique patient counts and calculated disease percentages by HCS, and estimates from the medical literature

Aim 3: Creation of patient journey maps in selected diseases

In order to better understand the disease course leading to diagnosis for RD, with the hope of identifying and diagnosing patients with RD sooner after clinical presentation, individual patient journeys were plotted on journey maps in an exploratory analysis. The maps document key medical events, diagnoses and treatments in 2 RD areas, BD and CF. For this pilot analysis, the 2 highest-cost patients with each disease were mapped and compared with each other. BD and CF were selected because they have an available disease-modifying therapy, which allowed for a preliminary description of clinical course pre- and post-therapy.

For CF (Fig. 5), the journeys of the 2 highest-cost patients were overlaid, with the date of diagnosis used as time 0 for each patient. The results show the overall clinical course of Patient 1 (red), who experienced 2 upper respiratory tract illnesses approximately 10 and 20 months prior to diagnosis, was later diagnosed with CF at age 5 years, and was started on disease-modifying therapy (ivacaftor) at approximately 2 years post-diagnosis. Patient 1’s course post-diagnosis shows costs predominantly for prescription drugs, with almost no subsequent clinical events in the post-diagnosis time period. Patient 2 (blue) experienced primary pulmonary hypertension, congestive heart failure, major depressive disorder, and substance abuse disorder clinical events in the approximately 30 months prior to diagnosis, with a CF diagnosis at age 20 years. He subsequently underwent prolonged home infusion therapy and a heart-double lung transplant, accounting for much of the high direct medical costs for this patient.
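Overlaying patients on a common axis, with the date of diagnosis as time 0, requires converting each event date into an offset from that patient's diagnosis date. The sketch below shows one way this could be done; it is not the study's implementation, and the function names, the average month length of 30.44 days, and the sample events are assumptions for illustration.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumption: average calendar month length


def months_from_diagnosis(event_date, diagnosis_date):
    """Signed offset in months; negative values are pre-diagnosis events."""
    return (event_date - diagnosis_date).days / DAYS_PER_MONTH


def align_events(events, diagnosis_date):
    """Return (offset_in_months, label) pairs sorted by time relative to diagnosis.

    events: iterable of (event_date, label) tuples (hypothetical layout).
    """
    return sorted(
        (round(months_from_diagnosis(d, diagnosis_date), 1), label)
        for d, label in events
    )


# Hypothetical patient timeline for illustration only.
diagnosis = date(2015, 6, 1)
events = [
    (date(2017, 6, 1), "therapy start"),
    (date(2014, 8, 1), "respiratory illness"),
]
for offset, label in align_events(events, diagnosis):
    print(f"{offset:+.1f} months: {label}")
```

Once every patient's events are expressed as offsets, timelines from patients diagnosed in different calendar years and at different ages can be drawn on the same axis and compared directly.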

Fig. 5

Diagnostic journey maps in 2 high-cost cystic fibrosis patients. BDP MDI beclomethasone dipropionate metered dose inhaler, CC complication/comorbidity, ICU intensive care unit, PERT pancreatic enzyme replacement therapy

For BD (Fig. 6), the 2 highest-cost patients were evaluated: one with CLN2, for which there is an approved disease-modifying therapy, and one unspecified BD patient who did not receive disease-modifying therapy. The results show that Patient 1 (CLN2, red), whose HCS data begins at approximately 12 years of age, had neurodegenerative complications of the disease from the start of his known clinical course and was diagnosed at age 14 years. Disease-modifying therapy (cerliponase) was initiated approximately 4 months after diagnosis, and the patient’s course post-diagnosis reflects costs predominantly for prescription drugs, with two clinical events for BD-related complications (shunt removal) in the post-diagnosis time period. Patient 2 (BD, blue) had premature birth and numerous ICU and other hospitalizations for convulsions, respiratory failure, nervous system procedures, and other complications of BD, with subsequent diagnosis at age 2 years. Post-diagnosis events included ICU stays and hospitalizations relating to neurodegenerative and respiratory complications of the disease, and eventual transition to home nursing care.

Fig. 6

Diagnostic journey maps in 2 high-cost Batten disease patients. CLN2 late infantile neuronal ceroid lipofuscinosis type 2


In this pilot study, we explored the feasibility of quantifying the number of RD patients within different HCS and the direct medical costs for their care, and performed a preliminary analysis of the diagnostic journey for individual RD patients. The results are notable for three major findings.

First, straightforward ICD code search strategies did not provide reliable or consistent RD patient identification or disease percentage estimates within and across different databases and HCS. We saw wide variability in percentage estimates for 14 representative RD, which may, in part, be due to differences in patient populations within the different HCS, the different types of HCS, and the type of data being queried (EHR data vs medical claims data). Given that many RD are genetic, clustering of patients in geographic areas or different payor systems with specialized expertise is not unexpected; however, from our preliminary analysis of the diagnostic journey, and as reported by others [12, 13], we know that many RD patients undergo prolonged periods of time where they are undiagnosed or misdiagnosed, which also may contribute to small percentages and variability across HCS. Furthermore, the lack of infrastructure for sharing RD knowledge and tools for diagnosis in HCS could lead to disparities in diagnostic rate and time to diagnosis. For the 2 RD in our sample that did not have a specific ICD code (Pheo, CMT), identifying patients with these conditions was even more difficult. Pheo patients were relatively consistently identified across the different HCS by developing customized search criteria, in this case using specific CPT codes, but CMT patients could not be reliably identified using a similar approach. Given that at least half of RD do not currently map to a specific ICD code, consistently and reliably quantifying the estimated 25–30 million RD patients in the US with the thousands of different RD is a daunting task that would require individualized and computable phenotyping criteria for most RD.

Identifying and quantifying RD patients internationally was shown to be even more difficult. Different countries use different approaches for patient classification and payment, which may not be readily applied in other HCS, and our attempts to incorporate the AR-DRG system into the study were unsuccessful. Interoperability and data/knowledge sharing are crucial to improve the ability of HCS to diagnose and care for patients. Because RD are rare, knowledge and data from around the world need to be usable in local HCS; our attempts to identify and profile RD patients in Australia highlight this persisting need. Many open science initiatives exist to overcome these issues; however, coding systems, classification strategies, and tools for sharing RD case information have yet to be implemented in most HCS. Further, important data and knowledge are needed directly from patients, such as from registries, natural history studies and biobanks; however, the data sources that could provide this knowledge [27] have not to date been integrated into HCS. A call to action to make data and knowledge openly shareable and interoperable with HCS was recently published [28].

Second, RD direct medical costs are high, with RD average PPPY costs estimated to be approximately three- to fivefold higher than age-matched controls. While there were differences in total direct costs PP depending on the payors and HCS used, the PP costs were still consistently higher across the RD in this sample when compared to non-RD patients. This result is not unexpected: high direct medical costs and healthcare utilization are surrogates for poor health. Patients with complex conditions and serious illnesses, regardless of type or rarity, are generally heavily reliant upon healthcare services to sustain life and relieve pain and suffering, with resultant high costs to patients and their families, HCS, and society writ large. Most RD are genetic disorders that interrupt or affect fundamental biological processes (e.g., enzyme deficiencies), are overwhelmingly serious and life-threatening conditions, and often affect more than one organ system, resulting in substantial impacts on the patient’s overall health and activities of daily living. Unlike most other illnesses, however, RD disproportionately (but not exclusively) affect younger patients (children, adolescents, and young adults), with impacts, on average, showing substantially higher costs versus age-matched non-RD patients.

We additionally note that the total cost of an individual RD was generally lower than that of the control overall. Because small numbers of RD patients are fragmented across thousands of different disorders, many RD are likely to have a relatively low total cost (PP cost times the number of patients) despite their relatively high PP costs. Such RD may not stand out within HCS, and thus may not call sufficient attention to the seriousness and high clinical needs of many RD.

Third, preliminary assessment of high-cost patients with two RD (CF, BD) showed that these patients had long diagnostic journeys (ranging from ~1.5 to 20 years) between first clinical presentation and a definitive diagnosis. For 3 of the 4 patients described, this delay resulted in irreversible complications of the disease and ongoing high costs and HCS utilization related to disease progression.

Mapping of the clinical course also showed that there is potential for identifying and diagnosing suspected RD patients sooner. These patients showed recurrent engagement with the HCS, persistent and progressive symptoms often falling into more general “basket” terms (e.g., convulsions, developmental delay, recurrent infections), and high utilization relative to age-matched controls. These patterns could be leveraged to escalate patients for definitive diagnosis and intervention sooner in order to slow disease progression or avoid catastrophic presentations and hospital admissions (e.g., organ transplant, ICU stays) [29]. We saw candidate diagnoses within the problem lists; although such diagnoses are often found in clinical notes, they may not be documented as diagnoses until later time points. The administration of disease-modifying treatments was accompanied by changes in clinical course in the two patients in this study. While high costs continued after diagnosis and treatment administration, the costs for the treated patients clustered almost entirely into outpatient treatment administration, whereas the costs for patients without a disease-modifying therapy reflected continuing hospital care. This signal in individual patients offers hope that earlier diagnosis and intervention, where available, could have beneficial effects and alter the clinical course in some RD.
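The patterns described above (recurrent engagement, non-specific “basket” terms, and high utilization relative to age-matched controls) could in principle be combined into a simple screening rule. The following Python sketch is purely illustrative and is not the method used in this study: the `PatientSummary` fields, the thresholds, and the two-signal rule are all hypothetical assumptions, and only the three basket terms are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Non-specific "basket" terms named in the text; a real list would be longer.
BASKET_TERMS = {"convulsions", "developmental delay", "recurrent infections"}


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary derived from HCS data."""
    patient_id: str
    encounters_per_year: float      # HCS encounters of any kind
    cost_ratio_vs_controls: float   # patient cost / age-matched control cost
    problem_list: Set[str] = field(default_factory=set)


def escalation_signals(
    p: PatientSummary,
    min_encounters: float = 12.0,   # hypothetical threshold
    min_cost_ratio: float = 3.0,    # hypothetical threshold
) -> List[str]:
    """Return the names of the sentinel patterns this patient shows."""
    signals = []
    if p.encounters_per_year >= min_encounters:
        signals.append("recurrent engagement")
    if p.problem_list & BASKET_TERMS:
        signals.append("basket-term symptoms")
    if p.cost_ratio_vs_controls >= min_cost_ratio:
        signals.append("high utilization")
    return signals


def should_escalate(p: PatientSummary, min_signals: int = 2) -> bool:
    """Flag a patient for diagnostic review when enough patterns co-occur."""
    return len(escalation_signals(p)) >= min_signals


# Usage with two made-up patients.
frequent = PatientSummary("p1", 20, 4.0, {"convulsions", "asthma"})
occasional = PatientSummary("p2", 2, 1.0, set())
print(should_escalate(frequent), should_escalate(occasional))
```

Any rule of this kind would need validation against confirmed RD diagnoses before use; the point of the sketch is only that the signals named above are computable from routinely collected data.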

Study limitations

There were several limitations to this study. The study was intended to be a pilot/exploratory study to assess the feasibility of identifying and quantifying costs and utilization in a select sample of 14 RD. Although the sample was chosen to reflect the diversity of RD, with widely varying presentations, clinical courses, ages, and populations affected, the 14 RD admittedly represent only a small sample of the estimated 7–10,000 different RD and 25–30 million patients in the US with RD, and it is not known whether these RD are truly representative of the RD population generally. This study was also intended as a preliminary feasibility pilot to begin to address the large problem of identifying, describing, and quantifying RD patient data within the US healthcare system, which could then be used to answer larger research questions currently beyond the scope of the IDeaS analyses, such as relationships between costs and cost savings, patient outcomes, and disease rarity. We nonetheless see the current analyses as important first steps in what is intended to be an iterative process of developing methodologies that can progressively and deliberately address these larger research goals over time. Additionally, the percentages of these diseases vary widely across HCS and versus commonly cited literature sources, which makes it difficult to understand the true prevalence of RD in HCS in the US.

The information sources presented additional limitations. Data included in the EHR but not placed in structured data fields are not available for simple extraction, which limits the ability to identify RD diagnoses. While this may occur with both rare and common disease diagnoses, it disproportionately affects RD for two reasons: only about half of RD can be mapped to a more specific ICD code or cluster, and the prolonged timelines between symptom/disease onset and accurate diagnosis and coding make RD patients especially difficult to identify within HCS. Additionally, US patients frequently change their HCS plans, and the lack of continuity of data from one EHR or HCS to another makes it difficult to identify original diagnosis dates or sentinel signs/terms that may facilitate RD recognition [30]. Thus, taken together, our study suggests that RD patients have long diagnostic journeys compounded by lack of HCS continuity, and tend to be classified under broader non-specific terms, at least early in their disease course, so that percentage estimates are likely to underestimate the true prevalence and impact of RD on HCS.

Direct costs are also based on the costs to payors, which are known to differ substantially by type of insurance (or no insurance) for individual patients. PP and total costs in the two HCS presented in this study varied widely, and likely reflect differences in payor status (e.g., commercial vs public) between the two HCS. However, in either case, PP costs for RD were still notably higher than for matched controls. Direct medical costs also account for only a portion of the total medical costs borne by patients, families, and HCS. We were not able to assess out-of-pocket costs and indirect costs (such as social and support services) that patients and societies incur for RD patient care and treatment.


Overall, these preliminary findings suggest several major considerations for RD that should form the basis for additional study.

  • RD patients are likely to be under-recognized and under-estimated in HCS databases and in cost estimates for their medical care. This under-estimation results in the lack of recognition of the true scope of the public health impact of RD on HCS, as well as the vast unmet and ongoing medical needs for RD patients.

  • PP costs on average in this study were around three- to fivefold higher than a matched control. Gross extrapolation of the average cost estimate from a large HCS database (Eversana, approximately $17 K per RD patient per year vs ~ $6 K for the control) to an estimated 25 million RD patients in the US would result in total yearly direct medical costs for RD in the range of $400 billion, making the cost burden similar to that of other high-cost diseases, such as cancer [31] and heart failure [32], and exceeding that of Alzheimer’s disease [33]. Additionally, the large variance in the cost of care among patients with the same RD could have several causes. Using HCS and insurance claims databases to stratify patient cohorts within a given RD and surface diagnostic, therapeutic, and utilization patterns will be valuable in the quest to better understand disease course and uncover ideal disease management interventions.

  • Machine-assisted strategies for early identification and diagnosis of likely RD patients may be feasible. Journey maps in selected RD patients revealed potential characteristics, such as young age, high utilization, recurrent hospitalizations, and severe clinical presentations, that may assist with early identification and escalation for definitive diagnosis. Genetic diagnosis as part of the early diagnosis strategy has been shown to be beneficial in other analyses and, importantly, to affect clinical course and patient management, especially if implemented earlier [34,35,36,37].
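The gross extrapolation in the second consideration above is simple arithmetic and can be checked directly. The short Python sketch below uses only the approximate figures quoted in the text (about $17 K and $6 K per patient per year, and 25 million US RD patients); the constant and function names are illustrative choices, not part of the study.

```python
# Approximate figures quoted in the text (Eversana database estimate).
RD_COST_PPPY_USD = 17_000        # ~ $17 K per RD patient per year
CONTROL_COST_PPPY_USD = 6_000    # ~ $6 K per control patient per year
US_RD_PATIENTS = 25_000_000      # estimated 25 million RD patients in the US


def total_annual_cost(cost_pppy: float, patients: int) -> float:
    """Gross extrapolation: per-patient-per-year cost times patient count."""
    return cost_pppy * patients


def fold_difference(rd_cost: float, control_cost: float) -> float:
    """Ratio of RD to control per-patient cost."""
    return rd_cost / control_cost


total = total_annual_cost(RD_COST_PPPY_USD, US_RD_PATIENTS)
ratio = fold_difference(RD_COST_PPPY_USD, CONTROL_COST_PPPY_USD)

print(f"Extrapolated total: ${total / 1e9:.0f} billion per year")
print(f"RD vs control: {ratio:.1f}-fold")
```

With these inputs the product is $425 billion per year, consistent with the stated range of $400 billion, and the cost ratio in this one database is a little under threefold. Because both inputs are rounded estimates, the result should be read as an order-of-magnitude figure only.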

Thus, we conclude that the 14 RD included in this small pilot study impose high medical burdens on patients and HCS, likely in a similar range to the burdens experienced by patients with other serious diseases, such as cancer, heart failure, and Alzheimer’s disease; however, these results will need to be confirmed in a larger cohort of RD. The findings suggest that RD represent a major impact to public health and have high unmet medical needs, and that there is an urgent and considerable need for earlier and accurate RD diagnosis and intervention to address medical management for RD patients. This conclusion is further supported by the similarly high cost burdens reported in two other recent cost-burden studies [21, 22].

Finally, with the information and data gathered from this small pilot study, we have sought to bring attention to key considerations (such as limitations in coding) that have been recognized for many years in the RD community and that continue to limit our ability to better understand RD and their impacts on patients and public health. This is an important line of inquiry, and we hope that efforts such as this study will begin to open new areas of research that can improve our ability to identify RD patients more accurately, and to assess and mitigate the impacts (utilization and cost) of RD by leveraging available HCS data.

Availability of data and materials

The datasets are not publicly available because the information was extracted from healthcare systems databases, but pooled/summary datasets are available from the corresponding author on reasonable request.



Abbreviations

  • Also known as
  • Australian Refined Diagnosis Related Groups (AR-DRG)
  • Batten disease (BD)
  • Beclomethasone dipropionate metered dose inhaler
  • Cystic fibrosis (CF)
  • Late infantile neuronal ceroid lipofuscinosis type 2
  • Charcot Marie Tooth disease
  • Current Procedural Terminology (CPT)
  • Electronic health record (EHR)
  • Eosinophilic esophagitis
  • Focal segmental glomerulosclerosis
  • Hereditary hemorrhagic telangiectasis
  • Healthcare system (HCS)
  • International Classification of Disease (ICD)
  • Lennox Gastaut syndrome
  • Muscular dystrophy
  • Mitochondrial neurogastrointestinal encephalopathy
  • National Center for Advancing Translational Sciences (NCATS)
  • Neuronal ceroid lipofuscinosis
  • National Institutes of Health (NIH)
  • Osteogenesis imperfecta
  • Oregon Health and Science University
  • Office of Rare Diseases Research
  • Pancreatic enzyme replacement therapy
  • Per patient (PP)
  • Per patient per year (PPPY)
  • Rare diseases (RD)
  • Sickle cell disease
  • Takayasu’s arteritis
  • Urea cycle disorders
  • United States (US)
  • Weighted average (WTavg)

  1. Institute of Medicine (IOM). Chapter 2. Profile of rare diseases. IOM (US) committee on accelerating rare diseases research and orphan product development. In: Field MJ, Boat TF (eds). National Academies Press (US), Washington, DC; 2010.
  2. NCOD (National Commission on Orphan Diseases). Report of the national commission on orphan diseases. Public Health Service, U.S. Department of Health and Human Services, Rockville, MD; 1989.
  3. NIH NCATS. Genetics and rare diseases information center. FAQs about rare diseases. Accessed 14 April 2021.
  4. Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, Murphy D, Le Cam Y, Rath A. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28:165–73.
  5. Haendel M, Vasilevsky N, Unni D, Bologa C, Harris N, Rehm H, Hamosh A, Baynam G, Groza T, McMurry J, Dawkins J, Rath A, Thaxon C, Bocci G, Joachimiak MP, Kohler S, Robinson PN, Mungall C, Oprea RI. How many rare diseases are there? Nat Rev Drug Discov. 2020;19(2):77–8.
  6. Online Mendelian Inheritance in Man®: an online catalog of human genes and genetic disorders. OMIM gene map statistics. Accessed 14 April 2021.
  7. Orphanet. Orphanet Report Series. Prevalence and incidence of rare diseases: Bibliographic data. January 2021. Accessed 14 April 2021.
  8. Hartley T, Balci TB, Rojas SK, Eaton A, Care4Rare Canada, Dyment DA, Boycott KM. The unsolved rare genetic disease atlas? An analysis of the unexplained phenotypic descriptions in OMIM®. Am J Med Genet. 2018;178C:458–63.
  9. Klimova B, Storek M, Valis M, Kuca K. Global view on rare diseases: a mini review. Curr Med Chem. 2017;24:3153–8.
  10. Angelis A, Tordrup D, Kanavos P. Socio-economic burden of rare diseases: a systematic review of cost of illness evidence. Health Policy. 2015;119:964–79.
  11. Ryder S, Leadley RM, Armstrong N, Westwood M, de Kock S, Butt T, Jain M, Kleijnen J. The burden, epidemiology, costs and treatment for Duchenne muscular dystrophy: an evidence review. Orphanet J Rare Dis. 2017;12:79.
  12. EURORDIS. Survey of the delay in diagnosis for 8 rare diseases in Europe (EurordisCare2). Fact Sheet EurordisCare2. 2007. Accessed 14 April 2021.
  13. Molster C, Urwin D, Di Pietro L, Fookes M, Petrie D, van der Laan S, Dawkins H. Survey of healthcare experiences of Australian adults living with rare diseases. Orphanet J Rare Dis. 2016;11:30.
  14. Zhu Q, Nguyen DT, Grishagin I, Southall N, Sid E, Pariser A. An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD). J Biomed Semant. 2020;11(1):13.
  15. Boycott KM, Hartley T, Biesecker LG, Gibbs RA, Innes AM, Riess O, Belmont J, Dunwoodie SL, Jojic N, Lassmann T, Mackay D, Temple IK, Visel Z, Baynam G. A diagnosis for all rare genetic diseases: the horizon and the next frontiers. Cell. 2019;1:32–7.
  16. Cohen AM, Chamberlin S, Deloughery T, Nguyen M, Bedrick S, Meninger S, Ko JJ, Amin JJ, Wei AJ, Hersh W. Detecting rare diseases in electronic health records using machine learning and knowledge engineering: case study of acute hepatic porphyria. PLoS ONE. 2020;15:e0235574.
  17. Gunne E, McGarvey C, Hamilton K, Treacy E, Lambert DM, Lynch SA. A retrospective review of the contribution of rare diseases to paediatric mortality in Ireland. Orphanet J Rare Dis. 2020;15:311.
  18. Walker CE, Mahede T, Davis G, Miller LJ, Girschik J, Brameld K, Sun W, Rath A, Aymé S, Zubrick SR, Baynam GS, Molster C, Dawkins HJS, Weeramanthri TS. The collective impact of rare diseases in Western Australia: an estimate using a population-based cohort. Genet Med. 2017;19:546–52.
  19. Reaven NL, Funk SE, Lyons PD, Story TJ. The direct cost of seizure events in severe childhood-onset epilepsies: a retrospective claims-based analysis. Epilepsy Behav. 2019;93:65–72.
  20. Fernandez IS, Amengual-Gual M, Aguilar CB, Loddenkemper T. Estimating the cost of status epilepticus admissions in the United States of America using ICD-10 codes. Seizure. 2019;17:295–303.
  21. Everylife Foundation. The national burden of rare disease study. 2021. Accessed 15 April 2021.
  22. Navarrete-Opazo AA, Singh M, Tisdale A, Cutillo CM, Garrison SR. Can you hear us now? The impact of health-care utilization by rare disease patients in the United States. Genet Med. 2021.
  23. Biffi A, Montini E, Lorioli L, Cesani M, Fumagalli F, Plati T, Baldoli C, Martino S, Calabria A, Canale S, Benedicenti F, Vallanti G, Biasco L, Leo S, Kabbara N, Zanetti G, Rizzo WB, Mehta NAL, Cicalese MP, Casiraghi M, Boelens JJ, Del Carro U, Dow DJ, Schmidt M, Assanelli A, Neduva V, Di Serio C, Stupka E, Gardner J, von Kalle C, Bordignon C, Ciceri F, Rovelli A, Roncarolo MG, Aiuti A, Sessa M, Naldini L. Lentiviral hematopoietic stem cell gene therapy benefits metachromatic leukodystrophy. Science. 2013;341:1233158.
  24. Finkel RS, Mercuri E, Darras BT, Connolly AM, Kuntz NL, Kirschner J, Chiriboga CA, Saito K, Servais L, Tizzano E, Topaloglu H, Tulinius M, Montes J, Glanzman AM, Bishop K, Zhong ZJ, Gheuens S, Bennett CF, Schneider E, Farwell W, De Vivo DC, for the ENDEAR Study Group. Nusinersen versus sham control in infantile-onset spinal muscular atrophy. N Engl J Med. 2017;377:1723–32.
  25. Australian Government. Australian Institute of Health and Welfare (AIHW). Australian refined diagnosis-related groups (AR-DRG) data cubes. Updated 07-Dec-2020. Accessed 27 April 2021.
  26. Dimitropoulos V, Yeend T, Zhou Q, McAlister S, Navakatikyan M, Hoyle P, Pilla J, Loggie C, Elsworthy A, Marshall R, Madden R. A new clinical complexity model for the Australian refined diagnosis related groups. Health Policy. 2019;123:1049–52.
  27. Garcia M, Downs J, Russel A, Wang W. Impact of biobanks on research outcomes in rare diseases: a systematic review. Orphanet J Rare Dis. 2018;13:202.
  28. Rubinstein YR, Robinson PN, Gahl WA, Avillach P, Baynam G, Cederroth H, Goodwin RM, Groft SC, Hansson MG, Harris NL, Huser V, Mascalzoni D, McMurry JA, Might M, Nellaker C, Mons B, Paltoo DN, Pevsner J, Posada M, Rockett-Frase AP, Roos M, Rubinstein TB, Taruscio D, van Enckevort E, Haendel MA. The case for open science: rare diseases. JAMIA Open. 2020;3:4720486.
  29. Feng LB, Grosse SD, Green RF, Fink AK, Sawicki GS. Precision medicine in action: the impact of ivacaftor on cystic fibrosis-related hospitalizations. Health Aff. 2017;37:773–9.
  30. PR Newswire. Finn Partners national survey reveals how fragmented health system places greater burden on patients. Accessed 27 April 2021.
  31. Park J, Look KA. Health care expenditure burden of cancer care in the United States. Inquiry. 2019;56:1–9.
  32. Urbich M, Globe G, Pantiri K, Heisen M, Bennison C, Wirtz HS, Di Tanna GL. A systematic review of medical costs associated with heart failure in the USA (2014–2020). Pharmacoeconomics. 2020;38:1219–36.
  33. Wong W. Economic burden of Alzheimer disease and managed care considerations. Am J Manag Care. 2020;26:S177–83.
  34. Costain G, Walker S, Marano M, Veenma D, Snell M, Curtis M, Luca S, Buera J, Arje D, Reuter MS, Thiruvahindrapuram B, Trost B, Sung WWL, Yuen RK, Chirayat D, Mendoza-Londono R, Stavropoulos J, Scherer SW, Marshall CR, Cohn RD, Cohen E, Orkin J, Meyn MS, Hayeems RZ. Genome sequencing as a diagnostic test in children with unexplained medical complexity. JAMA Netw Open. 2020;3:e2018109.
  35. Farnaes L, Hildreth A, Sweeney NM, Clark MM, Chowdhury S, Nahas S, Cakici JA, Benson W, Kaplan RH, Kronick R, Bainbridge MN, Friedman J, Gold JJ, Ding Y, Veeraraghavan N, Dimmock D, Kingsmore SF. Rapid whole-genome sequencing decreases infant morbidity and cost of hospitalization. NPJ Genom Med. 2018;3:10.
  36. Grosse SD, Farnaes L. Genomic sequencing in acutely ill infants: what will it take to demonstrate clinical value? Genet Med. 2018.
  37. Reuter CM, Kohler JN, Bonner D, Zastrow D, Fernandez L, Dries A, Marwaha S, Davidson J, Brokamp E, Herzog M, Hong J, Macnamara E, Rosenfeld JA, Schoch K, Spillmann R, Undiagnosed Diseases Network, Loscalzo J, Krier J, Stoler J, Sweetser D, Palmer CGS, Phillips JA, Shashi V, Adams DA, Yang Y, Ashley EA, Fisher PG, Mulvihill JJ, Bernstein JA, Wheeler MT. Yield of whole exome sequencing in undiagnosed patients facing insurance coverage barriers to genetic testing. J Genet Couns. 2019;28:1107–18.
  38. National Organization for Rare Disorders. Rare Disease Database. 2021. Accessed 15 April 2021.
  39. Summar ML, Koelker S, Freedenberg D, Le Mons C, Haberle J, Lee HS, Kirmse B. The European Registry and network for intoxication type metabolic diseases (E-IMD), and the members of the urea cycle disorders consortium (UCDC). The incidence of urea cycle disorders. Mol Genet Metab. 2013;110(1–2):179–80.
  40. ASCO®. Pheochromocytoma and Paraganglioma: Statistics. 2021. Accessed 15 April 2021.
  41. Orphanet. The portal for rare diseases and orphan drugs. Mitochondrial neurogastrointestinal encephalomyopathy. 2021. Accessed 15 April 2021.


Not applicable.


Open Access funding provided by the National Institutes of Health (NIH). AT, CC, JR, AP are full-time employees of the NIH. Otherwise, no grants or other funding were used in support of this manuscript.

Author information




All of the authors contributed to the writing of this manuscript and have consented to its submission for publication. Data analyses were performed by AT, CC, RN, BL, DN, CH, CHC, EG. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anne R. Pariser.

Ethics declarations

Ethics approval and consent to participate

Not applicable. Retrospective review of pooled anonymized data.

Consent for publication

Not applicable.

Competing interests

AT, AP, CC, JR, BL, DP, CH, DN, CHC, EG, HD, RN, PR and OS declare that they have no competing interests relevant to this manuscript. MH declares that she is the co-founder of Pryzm Health.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Table S1: ICD and CPT Codes and Descriptions.

Additional file 2. Figure S1. Weighted Average Cost Formula. Weighted average (wtavg) costs were calculated by taking the sum of the number of patients in each RD cohort (#ptRD1-13) and dividing it by the sum of all RD patient cohorts (sum pt), then multiplying by the sum of the PPPY costs of all RD patient cohorts (sumPPPY). Then the individual RD weighted averages were combined to create a weighted average for our total 13 RD population (wtavg RDpop). Abbreviations: WTavg, weighted average; RDPOP, total population for 13 representative RD; #pt, number of patients; sumPPPY, sum of the PPPY costs of all RD patient cohorts.
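The exact formula in Figure S1 is not reproduced in the text, so the following Python sketch shows only one plausible reading of the description: each cohort's PPPY cost is weighted by that cohort's share of all RD patients, and the weighted terms are then summed. This is the standard patient-weighted mean and is an assumption, not a confirmed restatement of the authors' calculation. The function name and the cohort sizes and costs are hypothetical.

```python
from typing import Sequence


def weighted_average_pppy(
    patients: Sequence[int], pppy_costs: Sequence[float]
) -> float:
    """Patient-weighted mean PPPY cost across cohorts.

    Each cohort's weight is its patient count divided by the total
    patient count (the "#pt / sum pt" term in the description).
    """
    if len(patients) != len(pppy_costs):
        raise ValueError("patients and pppy_costs must have the same length")
    total_patients = sum(patients)
    if total_patients <= 0:
        raise ValueError("total patient count must be positive")
    return sum(
        (n / total_patients) * cost for n, cost in zip(patients, pppy_costs)
    )


# Hypothetical cohorts: a small, costly RD and two larger, less costly ones.
cohort_sizes = [100, 300, 600]
cohort_costs = [50_000.0, 20_000.0, 10_000.0]

wtavg = weighted_average_pppy(cohort_sizes, cohort_costs)
simple_mean = sum(cohort_costs) / len(cohort_costs)
print(f"Weighted average PPPY: ${wtavg:,.0f}")
print(f"Unweighted mean PPPY:  ${simple_mean:,.0f}")
```

In this made-up example the weighted average is well below the unweighted mean, because the most expensive cohort contains the fewest patients. Weighting by cohort size keeps a small, very costly RD from dominating the pooled estimate.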

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from Creative Commons. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Tisdale, A., Cutillo, C.M., Nathan, R. et al. The IDeaS initiative: pilot study to assess the impact of rare diseases on patients and healthcare systems. Orphanet J Rare Dis 16, 429 (2021).


  • Rare diseases
  • Health care costs
  • Diagnosis
  • Utilization