Patient and observer reported outcome measures to evaluate health-related quality of life in inherited metabolic diseases: a scoping review

Background Health-related Quality of Life (HrQoL) is a multidimensional measure, which has gained clinical and social relevance. Implementation of a patient-centred approach in both clinical research and care settings has increased the recognition of patient and/or observer reported outcome measures (PROMs or ObsROMs) as informative and reliable tools for HrQoL assessment. Inherited Metabolic Diseases (IMDs) are a group of heterogeneous conditions with phenotypes ranging from mild to severe and mostly lacking effective therapies. Consequently, HrQoL evaluation is particularly relevant. Objectives We aimed to: (1) identify patient and/or caregiver-reported HrQoL instruments used among IMDs; (2) identify the main results of the application of each HrQoL tool; and (3) evaluate the main limitations of HrQoL instruments and study design/methodology in IMDs. Methods A scoping review was conducted using methods outlined by Arksey and O’Malley. Additionally, we critically analysed each article to identify the HrQoL study drawbacks. Results Of the 1954 studies identified, 131 addressed HrQoL of IMDs patients using PROMs and/or ObsROMs, in both observational and interventional studies. In total, we identified 32 HrQoL instruments intended for self- or proxy-completion; only 2% were disease-specific. Multiple tools (both generic and disease-specific) proved to be responsive to changes in HrQoL; the SF-36 and PedsQL questionnaires were the most frequently used in the adult and pediatric populations, respectively. Furthermore, proxy data often proved to be a reliable complement to self-reported HrQoL scores. Nevertheless, numerous limitations were identified, mainly due to the rarity of these conditions. Conclusions HrQoL is still not frequently assessed in IMDs. However, our results show successful examples of the use of patient-reported HrQoL instruments in this field.
The importance of HrQoL measurement for clinical research and therapy development calls for further research into the creation and validation of HrQoL PROMs and ObsROMs in IMDs. Electronic supplementary material The online version of this article (10.1186/s13023-018-0953-9) contains supplementary material, which is available to authorized users.


Background
Rare diseases are usually characterized by striking heterogeneity and complexity associated with a lack of evidence-based data and access to clinical experts. This makes it difficult for patients and caregivers to guide their decisions about disease management [1]. These factors have a tremendous impact on patients' health-related quality of life (HrQoL) [2]. HrQoL is defined as a multidimensional concept referring to the subjective evaluation of the impact of the health status in domains related to physical, mental, emotional, and social functioning. It goes beyond direct measures of population health, life expectancy, and causes of death [3]. Patient-centred care is a recent approach in research and clinical practice [4]. One of the ways in which this concept has been advanced is through the development of patient or observer reported outcome measures (PROMs or ObsROMs, respectively). PROMs are direct reports from patients regarding their health condition registered via validated questionnaires with robust psychometric properties [5]. ObsROMs are reports made by a proxy who is in direct contact with the patient when it is not possible to obtain self-reports [6].
PROMs have numerous applications in diverse settings, including research, policy decision-making or treatment effectiveness assessment [7][8][9]. Besides facilitating communication between patients and clinicians (which improves healthcare provision), they can also improve patient outcomes [10]. Importantly, the use of PROMs is highly supported by regulatory agencies, such as the European Medicines Agency (EMA) and the Food and Drug Administration (FDA) [5,11]. They are frequently used in clinical trials, mainly as surrogate endpoints [12,13]. Among other parameters and variables, PROMs can be used to evaluate HrQoL [13][14][15]. While PROMs and ObsROMs are often used in common chronic conditions [16,17], these tools have not often been applied or even developed in rare diseases, due to their specificities [6,18,19]. However, there are a few examples of the successful use of PROMs in rare diseases, particularly in academia, by PROM methodologists, and in collaboration with patient organizations [6,20].
There is a need for PROMs in rare inherited metabolic disorders (IMDs), a group of more than 1000 heterogeneous and often life-threatening diseases [21]. The introduction of newborn screening programs, which currently detect more than 50 IMDs, and the development of new therapies have increased survival and prevalence and improved patient outcomes [22,23]. However, living with an IMD affects the patients' HrQoL due to the wide range of diverse and debilitating symptoms and their chronic restrictive diets [24]. Moreover, the natural history of many IMDs is not well defined. This hampers the establishment of care guidelines. Most importantly, a curative treatment is rarely available. Consequently, patients are generally confronted with a low HrQoL. Without adequate PROMs it is difficult to robustly monitor both symptoms and therapies in IMDs.
PROMs that specifically assess HrQoL are a particularly relevant outcome in chronic and debilitating conditions, especially because biomarkers are not always associated with meaningful benefits to the patients [25]. Until 2013, fewer than 30% of clinical studies for orphan drugs included QoL-related outcomes [26]. We reviewed the literature with three aims: (1) to identify patient and/or caregiver-reported HrQoL instruments, both generic and disease-specific, used among IMDs; (2) to identify the main results of the application of each HrQoL tool; and (3) to evaluate the main limitations of HrQoL instruments and study design/methodology in IMDs to guide future studies.

Data source and search strategy
Our scoping review strategy followed the methodological framework outlined by Arksey and O'Malley [27]. We searched the PubMed database with pre-defined search terms from inception until 26th February 2018. No other sources were included because we are a non-profit organization without external funding and, consequently, have no access to databases that require a subscription. Two groups of search terms (Additional file 1) were employed: 1) QoL-related and 2) IMD-related terms. Free text terms were generated through a pilot PubMed search. We applied every combination of search terms from both groups connected by the Boolean operator "AND". Articles resulting from our search were exported to Mendeley Desktop and duplicates were removed. References of relevant articles were screened, and additional articles were included by author referral.
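The pairing of search terms described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the term lists are hypothetical placeholders (the real terms are listed in Additional file 1), and `build_queries` is an illustrative helper name.

```python
from itertools import product

# Hypothetical placeholder terms; the actual lists are in Additional file 1.
QOL_TERMS = ["quality of life", "HrQoL"]
IMD_TERMS = ["phenylketonuria", "Fabry disease", "inherited metabolic disease"]


def build_queries(group_a, group_b):
    """Return one query per (a, b) pair, joined by the Boolean operator AND."""
    return [f'"{a}" AND "{b}"' for a, b in product(group_a, group_b)]


queries = build_queries(QOL_TERMS, IMD_TERMS)
for query in queries:
    print(query)
```

With two groups of sizes m and n, this yields m × n queries, so the union of their results contains duplicates that must be removed afterwards, as was done here in Mendeley Desktop.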

Study selection and data extraction
The review was conducted by two researchers. Inclusion criteria were as follows: studies had to be written in English and measure HrQoL using a PROM/ObsROM in an IMD context. Clinician-reported outcomes, interviews and reviews were excluded in order to focus on evidence generated by empirical research. Titles and abstracts were screened and studies that did not meet the criteria were excluded. The full texts of the remaining articles were then read and included or excluded according to the pre-defined criteria. Any disagreements were settled by discussion.

Critical appraisal strategy
The methodological quality of included studies was assessed using a checklist (Additional file 2) with criteria adapted from a published HrQoL assessment study [28]. The main limitations of the included studies were identified using this checklist.

Results
The initial search of PubMed identified 1954 articles, 13 of which were duplicates. The title and abstract-based selection excluded 1744 articles. One hundred ninety-seven studies moved to the second round of selection. Based on the full-text review of each article, 91 were excluded and 108 were included. Reference screening or author referral led to the inclusion of 23 additional articles. Finally, 131 articles met the inclusion criteria. These were published between 1999 and 2017. The selection workflow is presented in Fig. 1.

Generic and specific HrQoL instruments in IMDs pediatric and adult populations
In total, we identified 32 HrQoL assessment instruments used across IMDs (Table 1). While 84% of the manuscripts reported the use of only one HrQoL instrument, 13% applied two and another 3% used three or more instruments. The 36-Item Short Form Health Survey (SF-36) was the most frequently used measure to evaluate HrQoL across IMDs, applied in 53% of all included studies, followed by the EuroQoL-Five Dimension (EQ-5D) questionnaire (17%). Concerning the pediatric population, the Pediatric Quality of Life Inventory (PedsQL) and the Child Health Questionnaire (CHQ) were found to be the most utilized, either as self- or proxy-reports, present in 12 and 8% of the studies, respectively. Of note, the total number of instruments applied to the pediatric population is similar to that of the adult population (17 and 15 tools, respectively). However, the number of studies designed to assess HrQoL in adults is far greater than the number directed at children. Only 2% of the studies used disease-specific instruments. Remarkably, only one of these tools is specific for adults, whereas three target children. Among the IMDs in which HrQoL was assessed, only phenylketonuria (PKU) has disease-specific questionnaires; however, there are still studies that use generic instruments to evaluate HrQoL in PKU patients. Importantly, among the identified tools there is a broader instrument targeting metabolic diseases, namely the QoL Scale for Metabolic Diseases-Parent Form. This is a validated, author-built questionnaire specifically developed to evaluate the HrQoL of children with restrictive diets [24]. Despite clear efforts, only about 7% of the more than 1000 identified IMDs have registered patient or observer-reported HrQoL assessments.

Proxy-reports to complement self-reports
Most metabolic disorders affect not only adults but also the pediatric population. Tools adjusted to obtain information on this population, such as PedsQL and CHQ, have been developed including proxy-versions. It is a common methodological assumption that self-reports are the best method for collecting information [29], because they reflect the individual's own perspective. In contrast, proxy-reports tend to reflect the point of view of another person. However, young age and/or disease impact raise the need for proxy-reports [6]. In fact, only 6% of the analysed manuscripts registered the sole use of proxy-reports [30][31][32][33][34][35][36][37].
In a set of cases, the employment of HrQoL instruments did not show substantial impairments compared to control populations. Specifically, the administration of SF-36, PLC, TAAQOL, WHOQOL-100 and KINDL (the last used in 4% of the studies) found normal HrQoL scores in PKU patients [30,43,45,93,94]. Also, in familial hypercholesterolemia patients, SF-12, TACQOL and 15D (applied in one study) did not show differences in HrQoL compared to healthy peers [91,95,96]. No differences in HrQoL were found in patients with mevalonate kinase deficiency when the cognition scale of TAAQOL was administered alone. Importantly, KINDL application in propionic acidemia revealed normal HrQoL scores despite poor neurological and psychosocial outcomes [97].

HrQoL tools detect changes upon treatment in IMD
Besides clinical parameters and disease-specific biomarkers, HrQoL is now recognized as an essential instrument to determine therapeutic effects. Multiple studies included the assessment of changes in HrQoL upon treatment initiation to better evaluate the therapeutic benefits (Table 2). The SF-36 was predominantly utilized (67% of the interventional studies). Generally, this tool was capable of detecting alterations in HrQoL scores across several IMDs [60,64,98-114], with the exception of MPS IV and Wilson disease [115,116]. Interestingly, whilst no changes were observed using SF-36 in late-onset Pompe disease, NHP was responsive in this patient group [117][118][119]. Regarding EQ-5D, detectable variations were observed in Fabry disease patients [120][121][122]. However, HUI and CHQ administration in children with Fabry disease did not detect changes in HrQoL [40]. Changes in HrQoL were captured by SF-12 in LC-FAOD. On the other hand, in this pediatric population, administration of parent-reported SF-10 did not reveal any differences [38]. In MPS IV, PODCI only detected differences in 1 out of 4 children [115]. PedsQL was able to find fluctuations in HrQoL of Fabry disease and nephropathic cystinosis patients [83,123]. In MPS VI, the application of TAPQOL/TACQOL also detected modifications in HrQoL [49]. When HrQoL was measured in PKU using general instruments, namely TAAQOL, KINDL and PedsQL, the HrQoL of BH4-responsive patients did not differ from that of non-responsive ones 24 months after treatment initiation [44,46]. In contrast, HrQoL assessment with the specific PKU-QOLQ instrument detected a significant improvement in the responsive patients' quality of life for up to 1 year [124].
Attesting the importance of HrQoL PROMs and ObsROMs instruments, the continuous increase of HrQoL assessment reports in IMDs is accompanied by an increase in the number of approved therapies for these disorders (Fig. 2).

Critical appraisal and main limitations of HrQoL studies
We critically assessed the quality of the included studies and identified some concerns due to an increased risk of bias (see Additional file 2). Although most studies used a standardized generic tool, validation is rarely obtained for the population in which the instrument is being used. Therefore, the results of the use of a non-specifically validated tool for a certain population should be interpreted with caution. Mostly, the problems faced by clinical researchers dealing with IMDs are inherent to the rarity of these conditions. The biggest limitation is the small sample size [34,37,41,53,56,65,77,82,85,100,102,105,112,115,[125][126][127][128][129][130]. This makes it difficult to reach statistically significant conclusions [35,39,43,44,46,75,79,83,86,90,93,99,[131][132][133] and also precludes the use of adequate control samples as placebo-control groups [40,58,70,76,103,108,109,119,121,134,135]. To counteract the problem of small samples, multinational studies are being performed. However, this can raise other limitations, particularly cross-cultural differences [42,70,89] as well as variations in the range of investigations, protocol and rigor between centres [92,121]. Still, pooling data from multinational studies might be adequate if the study design, the methods, linguistic and cognitive equivalence of the concepts being measured are achieved [136]. Thus, validation should be a never-ending process and one should always look for the psychometric and cross-cultural validity, reliability and acceptability of the measure in the context of each study. Nevertheless, small sample sizes might be considered representative due to the rarity of the condition and the study objectives [39,52,55,78,89,95]. Some countries may not have normative data available and comparison of results with foreign data or with other chronic conditions may be a source of bias [52,54,70,83,128,131]. 
Moreover, 55% of the manuscripts do not justify the selection of the HrQoL instrument used. Since a variety of general instruments are available, this selection process should be clearly defined. Additionally, some answer option systems might not be capable of detecting small changes in health (e.g. EQ-5D) [74]. Some HrQoL studies are performed to assess the effects of a certain therapy. Consequently, the instruments are administered before and after the therapy to provide baseline data and to register any possible improvement, respectively. In these cases, loss to follow-up due to noncompliance or abandonment is a recurrent issue [46]. Furthermore, missing data decreases the statistical validity/power of the study. Imputation of missing values can be made by researchers, but this is not always possible [137]. Finally, unreported clinical history and/or genotype data impair possible correlations with HrQoL scores [83].

Table 2 (excerpt): BH4 treatment in PKU
[46]: Unchanged HrQoL and similar between responsive and non-responsive patients
[44]: Children and adolescents (6.6-18.7); BH4; 6 months; KINDL; unchanged HrQoL and similar between responsive and non-responsive patients
[124]: Children and adults (10-49); BH4; 1 year; PKU-QOLQ; ↑ HrQoL in responders, provisional responders and non-responders in terms of impact, satisfaction

Disease heterogeneity is very common in IMDs [31,40,75,89,99,103,115]. On the one hand, in cross-sectional studies severe forms of a disease can counterbalance the higher HrQoL scores reached by patients with less severe presentations and vice-versa [43,61,71,87,138]. On the other hand, in order to avoid erroneous conclusions drawn from differently affected patient samples, careful individual inclusion and sampling should be performed [42]. In line with this, longitudinal designs are a better approach to establish the natural history of the disease, pinpoint predictive factors of impaired HrQoL as well as to identify pre-/post-treatment alterations [41,43,52,70,139]. Nevertheless, variability in treatment or therapy duration in long-term follow-up studies [37,41,112,140] as well as previous therapies and symptom management solutions can constrain long-term analysis [44,53,89,105,110,129,141].
Observation of changes in HrQoL over time can be difficult due to ceiling effects (i.e. patients with scores near the upper scale limit have less room for improvement) [109,124]. In this case, high HrQoL values have been proposed to be due to (i) the disability paradox, e.g., disabled individuals reporting good HrQoL because they focus on their coping strategies and positive emotions [46,50,116]; (ii) adaptation of patients' expectations throughout their illness experience; (iii) adaptation through repeated use of the same instrument [74]; and (iv) variation of symptom severity [53]. Importantly, studies focusing on adult patients may often underrate the overall disease impact on HrQoL since severely affected patients may die during infancy. The spectrum of disease severity is thus not fully represented [80].
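The ceiling effect mentioned above can be quantified as the share of respondents who score at the top of the scale. The sketch below is illustrative only: the scores are invented, `ceiling_share` is a hypothetical helper name, and the 15% cut-off is a commonly cited rule of thumb that the reviewed studies did not necessarily apply.

```python
def ceiling_share(scores, scale_max):
    """Return the fraction of respondents whose score equals the scale maximum."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_max = sum(1 for s in scores if s == scale_max)
    return at_max / len(scores)


# Invented example data on a 0-100 scale (not taken from any reviewed study).
baseline_scores = [100, 95, 100, 80, 100, 70, 100, 90]
share = ceiling_share(baseline_scores, scale_max=100)

# Assumed rule-of-thumb threshold; adjust to the instrument's own guidance.
CEILING_THRESHOLD = 0.15
has_ceiling_effect = share > CEILING_THRESHOLD
print(f"{share:.0%} at ceiling; ceiling effect flagged: {has_ceiling_effect}")
```

Checking this share at baseline, before a treatment study starts, indicates whether the chosen instrument leaves enough room to register an improvement.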
Selection bias may arise from recruitment through patient advocacy groups, health care institutions and patient registries (convenience sampling) [142]. This can cause a shift towards the inclusion of patients with either severe or milder forms of a disease who can be more or less likely to seek medical care and community support [54, 57-59, 64, 73, 76, 78, 80, 82, 92, 93, 96, 99, 143-146]. Volunteer participation and the patient-reported character of the questionnaires may indirectly exclude patients with poor literacy or cognition [79]. Further inherent drawbacks to patient-reported data include non-verification of diagnosis and symptomatology in medical records [31,58,63,80,143,144,146], recall bias [31,34,70,147] and social desirability [70].

Discussion
Nowadays, regulatory agencies such as the FDA and EMA, as well as health technology assessment bodies, are turning to HrQoL PROMs data to support decision making. However, PROMs are not yet used routinely in clinical practice. These tools can provide natural history data, but also clinical endpoints for therapeutic trials. Consequently, there is a rise in HrQoL assessment studies using PROMs or ObsROMs in rare IMDs to fill this gap. This reinforces the importance of and need for these instruments to accelerate research and effective clinical solutions.
We identified 32 instruments used either to assess patients' HrQoL or to evaluate the risk or benefit of a specific treatment. Most of the instruments found are generic since, for most IMDs, specific instruments do not yet exist. The predominant use of SF-36 is probably due to its validity in an extensive group of populations and languages, and the fact that it covers a wide age range (14 years and older) [148]. The second most used tool was the EQ-5D, mainly within adult populations. Recently, the EuroQoL group developed the EQ-5D-Y, the child-friendly version of EQ-5D, which might increase the use of this instrument. However, until today, no population norms using EQ-5D-Y have been published. There are considerably fewer studies focused on assessing HrQoL in children. The lack of natural history studies in this population, the perception of better health, the inability to respond for themselves, and the fact that there are far fewer pediatric clinical trials might be contributing factors. Nevertheless, the PedsQL and the CHQ were used in 12 and 8% of the studies, respectively. The fact that these instruments can be used as PROMs and/or ObsROMs might contribute to their frequent use compared to other pediatric tools. Frequently, studies with a combined approach of self- and proxy-reports are carried out in this population. This is even more common in disorders presenting cognitive developmental disability. Also important is the reduced ability of young children to identify issues in emotional functioning [83]. Proxy reports can accurately reflect the same aspects as observed in patients. However, they can also reflect the parent's state of mind, fears and doubts regarding their children, which results in lower HrQoL scores [46]. Therefore, self and proxy HrQoL data should be acquired and compared.
Due to the great symptom heterogeneity found among IMDs, we cannot, however, grade the 32 instruments found according to their appropriateness for use in these disorders. Therefore, the research team should always examine the conceptual design of each instrument and analyse whether it is suitable for the features of the patient population under assessment.
Generic instruments present some advantages. They can be applied to every disease or clinical manifestation and allow comparisons between different patient groups or between patients and healthy populations. However, they are not directed to the features of each condition. This may omit meaningful clinical outcomes limiting the study power [36,46,47,50,62,64,77,147]. Disease-specific instruments include relevant questions related to a particular disease and thus are more responsive and sensitive [149]. However, they only confer the capability of making comparisons within the same patient group. Only two studies applied disease-specific tools, namely PKU-QOL and PKU-QOLQ [89,124]. This fact further highlights the lack of specific HrQoL PROMs in the field of IMDs. As disease-specific and generic instruments assess different aspects of HrQoL, the use of both instruments in a complementary way has been suggested [150,151]. A new group of HrQoL instruments is arising, namely disease group-specific instruments. In the metabolic field, we identified the QoL scale for Metabolic Diseases-Parent Form. More recently, a new promising but still not validated tool was developed for pediatric patients with intoxication type inherited errors of metabolism [152]. These tools focus on common aspects of different diseases, thus allowing comparisons across related but distinct patient populations. Furthermore, they are particularly important in rare diseases since they can overcome limitations associated with small sample size.
We identified instruments capable of detecting changes in HrQoL compared to normative data or following treatment/therapy initiation, while others were not responsive. Additionally, in the case of adult Fabry disease patients, the results of the administration of SF-36 in three different studies evaluating the same therapeutic intervention diverged [109,110,141]. Thus, conclusions should be drawn with caution since other variables besides the quality of the intervention may influence the results. In fact, study design, sampling methods, suitability of the HrQoL instrument for a specific population and its selection according to the study characteristics are important factors to consider in order to obtain robust and reliable results. Additionally, the illness burden in several IMDs is not always easy to prove. For example, in propionic acidemia patients, no significant changes in HrQoL were found in comparison with normal individuals in spite of their poor neurocognitive and psychosocial outcomes [97]. However, we cannot exclude the possibility that some IMDs affect patients' HrQoL to a lesser degree. This has been observed in PKU [30,43,45,93,94] and familial hypercholesterolemia [91,95,96] after efficient treatment following early diagnosis. Nevertheless, although existing tools are not responsive in these subgroups, the measurement of the impact of a highly restrictive diet on patients' QoL is extremely relevant and needed. In fact, PROMs are being developed on this topic to correctly evaluate the HrQoL of these patients [24].
Promising strategies to develop specific HrQoL PROMs that efficiently capture the patient's perspective, prognosis and impactful clinical manifestations, and that establish the natural history of the disease, include: (i) qualitative interviews with patients, their families and caregivers; (ii) patient registries, which also motivate patient enrolment in research projects and clinical trials [10]; and (iii) clinical trial networks that facilitate data sharing and collaborations, ultimately improving access to the available information [153]. The fact that Fabry disease is the condition in which HrQoL is most often assessed is likely to be a direct consequence of the successful establishment of two patient registries, the Fabry Outcome Survey (NCT03289065) and the Fabry Registry (NCT00196742). Both include periodic HrQoL evaluations as a clinical outcome.

Conclusion and future directions
Patient-centred approaches based on patients' HrQoL are expanding through the implementation of PROMs or ObsROMs in clinical practice and research settings. However, this review makes it clear that they are still poorly utilized in the field of IMDs. There is a huge gap in the development of responsive disease-specific HrQoL measurement instruments that could be useful endpoints in clinical trials. To overcome the limitations inherent to the rarity of these conditions, efforts should be made not only to develop but also to adequately validate these tools. The successful establishment of international patient registry platforms might be the path with the greatest potential to upgrade HrQoL studies across IMDs. They facilitate patient recruitment and uniform data collection worldwide. In line with this, the European Commission health program included a project that consists of a novel registry platform for all known IMDs, the Unified European Registry for Inherited Metabolic Disorders (https://u-imd.org/). Although there is still a long way to go as far as the proper implementation of patient-centred care is concerned, these studies and instruments are important efforts in the right direction. HrQoL assessment through PROMs and ObsROMs is an efficient way of prioritizing the patient perspective. It drives research and more rapidly creates therapeutic solutions that meet the patients' needs and expectations.

Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.

Authors' contributions
VRF and LB developed the concept and design of this study. CP and SB performed all the search methodology, results analysis and the writing of this manuscript. RF contributed to the result analysis, writing and to the revision of the manuscript. LB, JJ, PV, AR and DMS critically revised the manuscript for important intellectual content. All authors gave final approval of the version to be published.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.