This consensus procedure was designed to develop a numerical scale for assessing MPS I phenotypic severity at diagnosis to facilitate treatment decisions and patient communication.
Consensus was reached on a list of six items that were considered to be most important for phenotypically classifying MPS I patients at diagnosis (Table 2, 'final consensus list'). Our consensus procedure also identified several items generally considered to be major hallmarks of MPS I that were nevertheless excluded from the list of key items for the following reasons. First, the signs and symptoms important for establishing an MPS I diagnosis often do not differentiate between mild and severe phenotypes, e.g., umbilical or inguinal hernias, coarse facial features, and the presence or severity of corneal clouding and hepatosplenomegaly. Second, although some symptoms (e.g., hydrocephalus) are frequently encountered in young patients with severe phenotypes, the experts did not consider their absence to be an indicator of a mild phenotype. The 'diagnosis at a young age' item was also excluded because it can be influenced by several factors, such as diagnostic difficulties or delays in seeking medical attention. The age of onset of any of the six key signs and symptoms was considered to be a more valuable indicator of phenotypic severity. Finally, cognitive decline was not included as a key item because this item can only be assessed during follow-up (in contrast to developmental delay, which can usually be ascertained at the time of diagnosis).
After item generation, selection and validation, we concluded that constructing a reliable numerical scale for assessing phenotypic severity in MPS I patients was not feasible because of the large inter-observer variability in the expert assessments (Figures 1 and 2). Consequently, the expert score that served as the 'gold standard' proved to be unreliable. Even for patients with high median expert scores, indicating a severe phenotype, the individual expert scores varied considerably, with some experts assigning these cases intermediate scores (Figures 1 and 2). Because of this variability, there would always be patients (even those with the most severe MPS I-H phenotype) whose severity scores from a system based on the six selected items would differ considerably from the median expert score and from the scores given by several of the experts.
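The kind of spread described above can be illustrated with a minimal sketch. It assumes each expert assigns every case a single numerical severity score; the 0 to 10 scale, the case labels and all score values below are hypothetical and are not data from this study.

```python
from statistics import median

# Hypothetical expert scores on an assumed 0-10 scale (10 = most severe).
# These values are illustrative only and are NOT taken from the study.
scores_by_case = {
    "case_A": [9, 10, 8, 6, 10, 9],
    "case_B": [2, 5, 3, 7, 4, 3],
}


def summarize(scores):
    """Return the median score, the range, and the largest deviation from the median."""
    med = median(scores)
    spread = max(scores) - min(scores)
    max_dev = max(abs(s - med) for s in scores)
    return {"median": med, "range": spread, "max_deviation": max_dev}


# One summary per case: a wide range around a high median mirrors the
# situation in which some experts rate a 'severe' case as intermediate.
summary = {case: summarize(scores) for case, scores in scores_by_case.items()}

for case, stats in summary.items():
    print(case, stats)
```

In this illustration, a case can have a high median score while an individual expert still deviates from that median by several points, which is the pattern that makes the median an unstable reference for validating a scale.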
Comparably large variability in expert severity assessment is likely to occur for other rare diseases with pleiotropic and progressive manifestations, including other lysosomal storage diseases. Although the methods applied in our study may be used to identify key factors for assessing disease severity, comparable to the six key items that we obtained for MPS I (Table 2), constructing a reliable severity scale based on clinical signs and symptoms may often be impossible.
The need for reliable and early prediction of MPS I phenotypic severity has become even more pressing with the development of high-throughput newborn screening (NBS) techniques based on measuring IDUA activity and/or immunoquantification of the IDUA protein in dried blood spots [26–28]. A severity score based on clinical signs and symptoms will certainly not be useful in this context, because a number of signs and symptoms are not yet present in the neonatal period.
Because clinical signs and symptoms appear to be insufficiently reliable for assessing phenotypic severity at diagnosis, other methods should be vigorously investigated. Genotyping combined with biomarker analysis in plasma and/or urine, such as the recently reported plasma heparin cofactor II-thrombin (HCII-T) complex and the urinary dermatan sulfate:chondroitin sulfate (DS:CS) ratio, appears to be a promising strategy for determining disease severity in newly diagnosed MPS I patients [2, 3, 29, 30].
Our study has several limitations. First, the experts differed with respect to the age, phenotype and ethnic background of the patients with whom they had experience. These differences may have influenced their opinions of disease severity. Second, the patient information used to write the case descriptions was gathered retrospectively. Thus, some follow-up results were known for most of the patients, which may have biased the data retrieval and the description of the cases. Moreover, the information that had been recorded in the patient files may have been influenced by knowledge of which interventions had (or had not) been performed. Finally, assessing phenotypic severity is hampered by the subjective rating of certain items, e.g., the presence or absence of kyphosis and frontal bossing on clinical examination, the parents' report of the age of symptom onset, and the effect of a decreased range of motion due to joint disease on the performance of activities of daily living.