A nomenclature and classification for the congenital myasthenic syndromes: preparing for FAIR data in the genomic era

Background: Congenital myasthenic syndromes (CMS) are a heterogeneous group of inherited neuromuscular disorders sharing the common feature of fatigable weakness due to defective neuromuscular transmission. Despite rapidly increasing knowledge about the genetic origins, specific features and potential treatments for the known CMS entities, the lack of standardized classification at the most granular level has hindered the implementation of computer-based systems for knowledge capture and reuse. Where individual clinical or genetic entities do not exist in disease coding systems, they are often invisible in clinical records and inadequately annotated in information systems, and features that apply to one disease but not another cannot be adequately differentiated.
Results: We created a detailed classification of all CMS disease entities suitable for use in clinical and genetic databases and decision support systems. To avoid conflict with existing coding systems as well as with expert-defined group-level classifications, we developed a collaboration with the Orphanet nomenclature for rare diseases, creating a clinically understandable name for each entity and placing it within a logical hierarchy that paves the way towards computer-aided clinical systems and improved knowledge bases for CMS that can adequately differentiate between types and ascribe relevant expert knowledge to each.
Conclusions: We suggest that data science approaches can be used effectively in the clinical domain in a way that does not disrupt preexisting expert classification and that enhances the utility of existing coding systems. Our classification provides a comprehensive view of the individual CMS entities in a manner that supports differential diagnosis and understanding of the range and heterogeneity of the disease but that also enables robust computational coding and hierarchy for machine-readability. It can be extended as required in the light of future scientific advances, but already provides the starting point for the creation of FAIR (Findable, Accessible, Interoperable and Reusable) knowledge bases of data on the congenital myasthenic syndromes.


Background
Congenital myasthenic syndromes (CMS) are rare inherited neuromuscular disorders characterized by fatigable weakness of skeletal muscle owing to compromised function of the neuromuscular junction (NMJ). CMS were first described in the 1940s [1] as a potential rare "familial" form of infantile myasthenia contrasting with the more common autoimmune-mediated myasthenia gravis, and the first genetic defects associated with the condition were reported in the 1990s [2]. With the advent of next-generation sequencing (NGS), the number of genetic defects reported as causative of a CMS phenotype has increased dramatically, with as many as 31 genes now implicated [3]. The known types of CMS range in frequency from more than 1000 individuals to single sporadic reported cases, and around 20 to 40% of cases remain without a genetic diagnosis after exome sequencing [3]. Although all CMS share the common features of NMJ pathology and fatigable weakness, the severity of the disease, its course of progression, specific phenotypic manifestations and even effective treatments are highly variable between the different types. Furthermore, different pathogenic changes within the same gene may result in different pathological processes and therefore markedly different disease manifestations and therapeutic options [4].
Within this complex environment, it is clear not only that precision in diagnosis is important in order to correctly define the disease and institute appropriate treatment, but that precision in coding or classification of this diagnosis is a prerequisite for any attempt at systematizing knowledge and linking it to a specific CMS type. Yet coding and classification have long been a vexed issue in the rare disease field as a whole, going far beyond CMS [5]. Where clinical or genetic entities do not have a named entry in disease coding systems, they are often invisible in clinical records and inadequately annotated in information systems, since features that apply to one disease but not another cannot be adequately differentiated [6].
Coinciding with the dramatic increase in genomic data and computational approaches to diagnosis, recent years have seen the emergence of new data science approaches and their application to clinical problems to allow the systematization of existing and newly generated clinical knowledge in a way that is more accessible to computational analysis. This has been termed the FAIR data approach, an acronym that stands for Findable, Accessible, Interoperable and Reusable and represents the concept that the utility of clinical and research data is dramatically increased if it can be made accessible to reuse by others [7]. Precision in nomenclature terms is just one aspect of making a dataset FAIR, but nevertheless a crucial one in order to attach the right knowledge to the right disease. Our present study aimed to create a comprehensive classification for all CMS disease entities as a starting point that will then allow generation of FAIR-compliant datasets of knowledge about each type.

Methods
We began by defining the CMS disease entities to be considered in the classification. We adopted a broad definition of CMS as any genetic neuromuscular condition manifesting with fatigable weakness of skeletal muscle and apparent NMJ involvement. We defined individual CMS "unique entities" at (a) gene level in cases where the presumed pathomechanism is identical for defects anywhere in a given gene, or (b) sub-gene level in cases where different defects in different regions of the same gene result in different disease manifestations due to differing pathomechanisms (e.g. to differentiate slow-channel from fast-channel syndromes within the same acetylcholine receptor gene). We did not split the classification to account for variable severity, age of onset or incomplete penetrance of phenotypic features where the underlying pathomechanism is the same, and we excluded non-CMS presentations of disorders caused by defects in the same genes that may also cause CMS presentation (e.g. kidney presentations of LAMB2 defects). In the case of genetic entities affecting ubiquitous metabolic pathways (glycosylation defects, mitochondrial defects), some specific mutations cause a primary neuromuscular transmission defect, and these are included in our classification, while other mutations cause wider organ involvement, where the neuromuscular transmission defect may become irrelevant or not detectable (e.g. syndromic congenital disorders of glycosylation, encephalomyopathy), and these are then classified elsewhere.
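The entity-definition rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the study: the gene names `GENE_A` and `GENE_B`, the record fields and the function `unique_entities` are hypothetical placeholders. Entities are keyed by gene alone unless the gene is flagged as having distinct pathomechanisms, in which case they are keyed by gene and mechanism; severity never creates a new entity.

```python
def unique_entities(cases, split_genes):
    """Derive unique entities from case records.

    Genes listed in split_genes are split at sub-gene level by pathomechanism;
    all other genes yield a single gene-level entity. Severity is ignored.
    """
    entities = set()
    for case in cases:
        gene = case["gene"]
        mechanism = case["mechanism"] if gene in split_genes else None
        entities.add((gene, mechanism))
    return entities


# Hypothetical records: GENE_A stands in for a gene with both slow-channel
# and fast-channel syndromes; GENE_B has a single presumed pathomechanism.
cases = [
    {"gene": "GENE_A", "mechanism": "slow-channel", "severity": "mild"},
    {"gene": "GENE_A", "mechanism": "fast-channel", "severity": "severe"},
    {"gene": "GENE_A", "mechanism": "slow-channel", "severity": "severe"},
    {"gene": "GENE_B", "mechanism": "loss of function", "severity": "mild"},
    {"gene": "GENE_B", "mechanism": "loss of function", "severity": "severe"},
]

entities = unique_entities(cases, split_genes={"GENE_A"})
```

Under these assumptions the five records collapse to three entities: two for `GENE_A` (one per mechanism) and one for `GENE_B`.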
The entities thus defined therefore aim to be those that from a data science perspective are sufficiently granular to allow the mapping of disease to feature and extend the range of knowledge about that specific disease entity. Based on this framework, through a literature review we developed a comprehensive listing of all unique CMS clinical and genetic entities described to date that met our criteria for inclusion. We captured the range of terminology used in the literature to describe or name each entity or group of entities. Using their publicly available online browsers, we then reviewed the most widely used medical and genetic coding systems to establish their coverage of these published CMS disease entities. Table 1 provides details of the coding systems analyzed, the browsers used and the summary results of the search.
From the results of the initial stage of the research, we concluded that all existing coding systems had major gaps in coverage, in most cases caused by inadequate levels of granularity, with the most granular entities either completely or partially absent. Given the pressing need to define a fully granular classification for the "data science" purposes described above, we initiated a collaboration with Orphanet to extend the Orphanet nomenclature [8] to include our unique CMS disease entities. We aimed to avoid creation of a competing classification given the multiplicity of systems already in existence, and Orphanet was selected as the most suitable system for this collaboration because it aims to be a fully comprehensive coding system specifically designed for rare disease; it makes use of a hierarchical system or tree-like structure in which disease entities can be grouped in different logical ways; it includes mappings to many other coding systems at appropriate levels of granularity thus ensuring interoperability [9]; and it welcomes collaborations with domain experts for the purposes of extending its nomenclature. Orphanet has published a procedural document [10] for rare disease nomenclature in English that provides detailed guidance for naming entities, which states that names should be based on clinical practice, validated by experts in the field, comprehensive, consistent, and as stable as possible with regard to evolution of scientific knowledge. We therefore defined "descriptive names" for each entity in a manner consistent with the Orphanet guidelines, creating a clinically understandable name for each entity that should be stable notwithstanding the rapid advances in understanding the genetics of CMS. 
It is important to note that while the descriptive names are valuable from the perspective of human understanding, the essential point is that the disease entities are assigned unique identifiers within the coding system, which enables computer-readability and interoperability with other systems.
At the initial stage, the full listing of unique clinical entities that are classed as a CMS according to our definition is a non-hierarchical nosology or "flat" table (Table 2) mapped to the existing coding systems as appropriate. However, since Orphanet allows the creation of a hierarchical classification in which individual disorders may be grouped into one or multiple parent groups based on specific features, we also created an additional table in which we grouped all the unique entities from Table 2 based on etiological or other features (Table 3).
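The difference between the flat listing and the hierarchical grouping can be illustrated with a minimal sketch in which each node may have more than one parent. The identifiers (`X:0` to `X:3`) are placeholders and not real Orphanet codes, and the placement of the example entity under two groups is purely for illustration; only the group names are taken from the expert groupings discussed in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    code: str                       # placeholder identifier, not a real Orphanet code
    name: str
    parents: frozenset = frozenset()  # codes of zero, one or several parent groups


def ancestors(nodes, code):
    """Return the codes of every group reachable by following parent links upward."""
    seen = set()
    stack = list(nodes[code].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(nodes[current].parents)
    return seen


nodes = {
    "X:0": Node("X:0", "congenital myasthenic syndrome"),
    "X:1": Node("X:1", "postsynaptic CMS", frozenset({"X:0"})),
    "X:2": Node("X:2", "CMS with glycosylation defect", frozenset({"X:0"})),
    # Hypothetical entity grouped under two parents at once.
    "X:3": Node("X:3", "example CMS entity", frozenset({"X:1", "X:2"})),
}

groups_of_entity = ancestors(nodes, "X:3")
```

Storing parent links on each entity, rather than nesting entities inside a single tree, is one simple way to let the same flat list of entities support several alternative groupings.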

Results
We defined a total of 39 unique clinical/genetic CMS entities and provided descriptive names for each (Table 2). These were mapped to existing OMIM and Orphanet classifications, and existing expert-defined descriptive terms for each were captured from the literature to aid in the definition of group-level classification. Treatment options were obtained from the literature [4,11] and are outlined in Table 2. We then placed the defined entities within the Orphanet classification and hierarchy below the pre-existing entry for congenital myasthenic syndrome, modifying one existing class name and adding 10 group-level phenotypic classes at various levels of the hierarchy and 39 unique disease entities (Table 3).

Discussion
CMS is classed within the European Union as a rare disease (defined as one that affects fewer than 1 in 2000 individuals) and many of the individual CMS entities are ultra-rare. This has substantial implications for knowledge management, since while much highly expert knowledge on CMS does exist, in common with many other rare diseases this knowledge is often "siloed" in individual research or clinical databases in a few expert centers [12]. Academic publishing still largely relies on "non-machine-readable" formats such as PDF and this again provides a barrier to easy access and reuse [13]. This means that not only do fewer clinicians who encounter CMS patients have the relevant experience themselves, but it is also more challenging and time-consuming for them to locate the information they need. Clinical, genetic and scientific experts in CMS have come together periodically to review and update classifications of the disease at workshops hosted by the European Neuromuscular Centre [14][15][16], in NCBI's GeneReviews series [17] and in several comprehensive recent review publications [3,4,18]. Broad classifications of CMS into presynaptic, synaptic and postsynaptic CMS and CMS with glycosylation defect were originally proposed in 2001 [15], but it is only with the very latest update to the International Classification of Diseases (ICD), Revision 11 [19], that these subgroups even receive a mention (without, however, being allocated a classification number). Meanwhile, as the number and variety of CMS disease entities published in the literature have increased, expert-proposed groupings have been extended to include a new group containing defects of endplate development and maintenance [18].
However, the expert reviews have not attempted any standardization of nomenclature in the coding systems, and at the most granular level, individual "atomic" disease entities or subtypes are conspicuous by their absence from all the coding systems except the Online Mendelian Inheritance in Man (OMIM) database [20]. OMIM has good (although not entirely comprehensive) coverage of the individual disease entities, each represented by a "phenotype MIM number" and a sequentially numbered name, and is recognized as the authoritative reference for genetic disorders. It is, however, a catalogue rather than a nosology or ontology, and is thus complementary to (and mappable to) the classification we create here.
To counter the problem of lack of representation of rare disease entities in knowledge systems, bringing data science approaches into the clinical domain has been the focus of a number of recent activities at the European and international level, including the Global Alliance for Genomics and Health (GA4GH) [21], the European Open Science Cloud [22], Big Data to Knowledge (BD2K) [23], the Monarch Initiative [24], GO-FAIR [25], RD-Connect [26] and the new European Joint Programme on Rare Diseases to be launched in 2019. Making use of ontologies and coding systems when capturing clinical information and diagnoses is a key step in preparing data for reanalysis and machine-readability [27], but in order for this to be of benefit, the coding system must be fit for purpose, which means it must contain the relevant items in the correct relative positions and at appropriate levels of granularity. If this is not the case, data cannot be appropriately connected or connections may produce misleading results. For example, to a clinician familiar with CMS, it goes without saying that the connection between "congenital myasthenic syndrome" and "responsive to pyridostigmine" is true for CMS caused by RAPSN defects and false for that caused by defects in DOK7, but a database that only contains an entry for "congenital myasthenic syndrome" has no way of making that distinction. The result of this is that the specific knowledge that is so familiar to the disease experts cannot easily gain wider currency by being made part of online databases or clinical decision support systems, and furthermore the evidence gathered in a clinical setting in support of particular interventions or particular phenotypic associations cannot be fed back into wider practice from medical or prescribing records, for example.
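The pyridostigmine example can be made concrete with a minimal sketch. The entity labels below are informal placeholders rather than the descriptive names from Table 2, and the functions `lookup` and `coarsen` are hypothetical helpers written for this illustration. The sketch shows that once entities are collapsed into a single parent entry, a feature that is true for one entity and false for another can no longer be asserted at all.

```python
from typing import Optional

FEATURE = "responsive to pyridostigmine"
PARENT = "congenital myasthenic syndrome"

# Granular knowledge base: one assertion per (entity, feature) pair.
# Entity labels are informal placeholders, not official nomenclature.
GRANULAR = {
    ("CMS due to RAPSN defect", FEATURE): True,
    ("CMS due to DOK7 defect", FEATURE): False,
}


def lookup(kb, entity, feature) -> Optional[bool]:
    """Return the recorded assertion, or None if the knowledge base cannot say."""
    return kb.get((entity, feature))


def coarsen(kb, parent):
    """Collapse every entity into one parent entry.

    An assertion survives only if all entities agree on it; conflicting
    values cannot be represented on a single coarse entry and are dropped.
    """
    values_by_feature = {}
    for (_, feature), value in kb.items():
        values_by_feature.setdefault(feature, set()).add(value)
    return {
        (parent, feature): next(iter(values))
        for feature, values in values_by_feature.items()
        if len(values) == 1
    }


COARSE = coarsen(GRANULAR, PARENT)
```

Here `lookup(GRANULAR, "CMS due to RAPSN defect", FEATURE)` returns `True` and the DOK7 query returns `False`, whereas `lookup(COARSE, PARENT, FEATURE)` returns `None`: the coarse database has no way of making the distinction.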
Of course, no classification in such a rapidly evolving and heterogeneous field can ever be completely comprehensive, and there are always areas where different decisions could be made, such as about the level of granularity or the range of conditions to include. Our inclusion criteria were based primarily on clinical and phenotypic presentation together with some pathomechanistic insights, while a purely gene-based approach might have produced a classification not exclusively including CMS presentations but also kidney or skin disease presentations caused by different defects in the same genes. In addition, there are other neuromuscular conditions that do have detectable morphological and functional disturbances of the neuromuscular junction, but where these are considered to be secondary to the primary pathology or of minor clinical relevance as compared to the primary clinical manifestation (e.g. spinal muscular atrophy or myotubular myopathy). These conditions are classified in different systems and do not appear in our CMS classification. However, from a data science perspective, the choice of what to include or exclude can indeed be left to expert opinion and is of secondary importance compared to the depth and detail of what is covered, and crucially, its internal logic and relationships with other entities and other classification systems [27]. The CMS entities that we have defined fit perfectly as subclasses within the broader coding systems like ICD and SNOMED-CT and map at a 1:1 level to the phenotype MIM numbers where these exist (see Table 2). They can be grouped into preexisting etiological groups such as pre- and post-synaptic (Table 3), and are amenable to multiple other functional, phenotypic and therapeutic groupings as appropriate ("responsive to acetylcholinesterase inhibitors", "with limb-girdle phenotype", "associated with episodic apnea" or "characterized by tubular aggregates", for example).
To take full advantage of the classification developed here, further steps will be necessary, since the development of classification systems, even with names that aim to have some clinical relevance, is of limited diagnostic or therapeutic value in itself. Rather, the classification should be thought of as the essential foundation onto which more precise clinical and diagnostic pictures of each disease entity can be built, and it is this systematization of knowledge that can then be brought back into the diagnostic and clinical arena to result in improved diagnostic algorithms and clinical information systems. One future development well supported by Orphanet that is a logical extension of the classification to allow improved diagnostic algorithms is the mapping of entities from the classification to their individual phenotypic features using appropriate phenotypic descriptors from ontologies such as the Human Phenotype Ontology [28]. This creates a matrix of detailed information about each disease entity in both computer-accessible and human-readable formats, and is something that can now be achieved for CMS by a similar consensus process. In addition, since many CMSs are treatable, but the treatment varies by type, we can use the classification to differentiate treatments by type as shown in Table 3 and also now have the opportunity to take this further in a machine-readable manner through the development of pharmacogenomic algorithms that give clinicians easier access to specific treatment recommendations once a particular CMS type has been identified. Furthermore, although NGS techniques have still not solved every CMS case, as science advances, we can expect that new genetic defects will be uncovered that account for some of the remaining undiagnosed congenital myasthenic syndromes, and we have thus ensured that this present classification can easily be extended with new entities.
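As a rough sketch of what such an entity-to-feature matrix could enable, the Python example below stores feature annotations per entity, derives feature-based groupings from them, and ranks entities by overlap with a set of observed features. Everything here is assumed for illustration: the entities ("entity A" to "entity C") and their annotations are invented placeholders, the feature labels are free text rather than Human Phenotype Ontology identifiers, and the overlap count is a deliberately naive stand-in for a real diagnostic algorithm.

```python
# Hypothetical entity-to-feature annotations; not real clinical data.
ANNOTATIONS = {
    "entity A": {"fatigable weakness", "limb-girdle phenotype"},
    "entity B": {"fatigable weakness", "episodic apnea"},
    "entity C": {"fatigable weakness", "tubular aggregates", "limb-girdle phenotype"},
}


def group_by_feature(annotations):
    """Invert the matrix: map each feature to the set of entities showing it."""
    groups = {}
    for entity, features in annotations.items():
        for feature in features:
            groups.setdefault(feature, set()).add(entity)
    return groups


def rank_candidates(annotations, observed):
    """Rank entities by number of shared features, dropping those with no overlap.

    Ties are broken alphabetically so the result is deterministic.
    """
    scored = [(len(features & observed), entity) for entity, features in annotations.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [entity for score, entity in ordered if score > 0]


groups = group_by_feature(ANNOTATIONS)
candidates = rank_candidates(ANNOTATIONS, {"limb-girdle phenotype", "tubular aggregates"})
```

Because new entities are simply new keys in the annotation table, a structure of this kind can be extended without changing the grouping or ranking logic.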

Conclusions
Knowledge about the full range, etiology and heterogeneity of the congenital myasthenic syndromes has increased rapidly in the NGS era. These diseases present specific challenges owing to their rarity and heterogeneity but also possess certain features, not the least of which is responsiveness to treatment, that make their unambiguous differentiation worthwhile. The benefits of developing a fully granular classification for this group of conditions are thus not purely academic. Although not designed as a diagnostic tool, the detailed classification in a single system of each individual CMS with a defect of neuromuscular transmission as the primary feature provides clinicians and geneticists with an overview of the currently recognized congenital myasthenic syndromes both as individual entities and as logical groupings, and this can provide guidance towards the differential diagnoses for a patient with a broad CMS phenotypic presentation. Making use of an unambiguous clinically understandable descriptive name assists in the clinical differentiation of the different diseases, particularly by clinicians less familiar with these rare conditions, while attaching the descriptive name to a code within a recognized coding system enables existing knowledge to be better systematized, thus paving the way towards computer-aided clinical systems and machine-learning algorithms suitable for the NGS era. Through this collaboration between clinical experts and data science experts, we have shown that data science approaches can be used effectively in the clinical domain in a way that does not disrupt preexisting classification by experts and that enhances the utility of preexisting coding systems, building on both to create a more comprehensive result.
The classification we have defined can be used in clinical administration systems as an integral part of the Orphanet nomenclature and can be used in scientific publications and clinical case reports to unambiguously define the CMS type in question. It can be extended and modified as required by future scientific advances, but already provides the starting point for the creation of FAIR knowledge bases of data related to the congenital myasthenic syndromes.