The mutational and phenotypic spectrum of TUBA1A-associated tubulinopathy

Background The TUBA1A-associated tubulinopathy is clinically heterogeneous with brain malformations, microcephaly, developmental delay and epilepsy being the main clinical features. It is an autosomal dominant disorder mostly caused by de novo variants in TUBA1A. Results In three individuals with developmental delay we identified heterozygous de novo missense variants in TUBA1A using exome sequencing. While the c.1307G > A, p.(Gly436Asp) variant was novel, the two variants c.518C > T, p.(Pro173Leu) and c.641G > A, p.(Arg214His) were previously described. We compared the variable phenotype observed in these individuals with a carefully conducted review of the current literature and identified 166 individuals, 146 born and 20 fetuses with a TUBA1A variant. In 107 cases with available clinical information we standardized the reported phenotypes according to the Human Phenotype Ontology. The most commonly reported features were developmental delay (98%), anomalies of the corpus callosum (96%), microcephaly (76%) and lissencephaly (agyria-pachygyria) (70%), although reporting was incomplete in the different studies. We identified a total of 121 specific variants, including 15 recurrent ones. Missense variants cluster in the C-terminal region around the most commonly affected amino acid position Arg402 (13.3%). In a three-dimensional protein model, 38.6% of all disease-causing variants including those in the C-terminal region are predicted to affect the binding of microtubule-associated proteins or motor proteins. Genotype-phenotype analysis for recurrent variants showed an overrepresentation of certain clinical features. However, individuals with these variants are often reported in the same publication. Conclusions With 166 individuals, we present the most comprehensive phenotypic and genotypic standardized synopsis for clinical interpretation of TUBA1A variants. Despite this considerable number, a detailed genotype-phenotype characterization is limited by large inter-study variability in reporting. Electronic supplementary material The online version of this article (10.1186/s13023-019-1020-x) contains supplementary material, which is available to authorized users.


Introduction
The superfamily of tubulin genes is composed of alpha-, beta-, gamma-, delta-and epsilon families. The alpha and beta families, consisting of at least 15 alpha and 21 beta-tubulin genes, respectively [1], encode tubulin proteins which form heterodimers as fundamental components of microtubules [2]. Along with microtubule associated proteins (MAPs) and motor proteins on the external surface, tubulin proteins participate in substantial cellular processes of intracellular transport, cell division and neuronal migration [3,4].
Using exome analysis in three unrelated individuals with severe developmental delay we identified three heterozygous de novo missense variants in the TUBA1A gene. We extensively reviewed and systematically reanalyzed available public data to provide a standardized synopsis of described variants together with reported neuroradiological and clinical features of TUBA1A-associated tubulinopathy. We used this comprehensive information to perform a detailed analysis of the genotypic and phenotypic spectrum highlighting a possible genotype-phenotype relationship and probable bias in reporting.

Clinical reports of 3 novel cases
For the purpose of this study the clinical course of three individuals, who presented between 1999 and 2016 at our Center of Developmental Neurology and Social Pediatrics for investigation of the etiology of developmental delay, was retrospectively summarized after pathogenic missense variants in TUBA1A had been identified. In summary, we present novel clinical data for two boys aged 13 years 7 months (individual i084n) and 11 years 6 months (individual i085n) and a 9 years 3 months old girl (individual i086n) with global developmental delay and neuroradiological abnormalities due to TUBA1A-associated tubulinopathy. The identification of the TUBA1A variant in the girl was part of a previous publication without detailed clinical description (reported as ID S_006) [16]. Narrative case reports with representative MRI planes for all three individuals and facial phenotype pictures for i086n (Additional file 1: Figure S1-S3) are provided in the Supplementary notes.

Exome sequencing
Informed written consent was obtained for all participants. The study was approved by the Ethical Committee of the Medical Faculty of the Friedrich-Alexander-Universität Erlangen-Nürnberg. DNA from peripheral blood lymphocytes was extracted using standard methods. Exome sequencing was performed after SureSelect v5 (i085n, i086n) and v6 (i084n) targeted capturing on HiSeq 2500 for i084n and i085n (Trio analysis [17]) and i086n (Exome Pool-Seq [16]). After mapping of sequence reads to the GRCh37/ hg19 reference genome and variant calling using standard methods for the trio analysis [17] or as described by Popp et al. for the exome Pool-Seq [16], variants in coding regions including splice sites were selected based on population frequency (gnomAD) and computational prediction scores, e.g. CADD score [18]. Variants were confirmed, and segregation tested by Sanger sequencing.

Review of reported TUBA1A cases from literature and databases
We identified 112 articles, published between 01/2007 and 06/2018, from PubMed applying the search term "TUBA1A". Of these, 28 provided clinical reports and were thus included in this study. All available clinical data was standardized in accordance to terms of the Human Phenotype Ontology (HPO) [19]. In contrast to a previously established classification combining cortical and subcortical features like "classic lissencephaly", "lissencephaly with cerebellar hypoplasia", "lissencephaly with agenesis of the corpus callosum" and "centrally predominant pachygyria" [12,20], we analyzed the features independently. If only the classification was mentioned, we used the independent underlying features where HPO terms were available (e.g. "microlissencephaly": microcephaly HP:0000252 + agyria HP:0031882). Nevertheless, we kept composite terms typically used together in the literature such as "agyria-pachygyria" (HP:0031882, HP:0001302) if they affected the same brain structure. Data assessment comprised 11 neuroradiological features, including anomalies of cortical gyration, corpus callosum, brainstem, basal ganglia, internal capsule, cerebellum, cerebellar vermis, hippocampus, ventricular dilatation, 4th ventricle dilatation, grey matter heterotopia, and other radiological findings. Clinical features included congenital microcephaly, microcephaly, developmental delay, epilepsy, neuro-ophthalmological findings including strabismus and nystagmus, other neurological symptoms including spasticity and muscular hypotonia, and additional features (HPO terms shown in Tables 1 and 2,  Additional file 2).
We further included available likely pathogenic or pathogenic variants from ClinVar [21], denovo-db [22] and DECIPHER [23]. As phenotype information was insufficient in most of these database cases, only variant information was included.
All variants were harmonized to the NM_006009.3 transcript of the GRCh37/hg19 human reference genome based on Human Genome Variation Society (HGVS) recommendations using the Mutalyzer [24] web services. To ensure consistency in the clinical interpretation we independently applied the American College of Medical Genetics and Genomics (ACMG) criteria [25] to all variants with the WGLAB InterVar-tool [26].
Protein structure analysis of the tubulin alpha-1A variants Using R and ggplot2 [27] we analyzed spatial distribution of all variants in the linear gene model to provide an insight into the variant distribution. Utilizing Pymol Anaconda Inc.) publicly available tertiary protein structure data of TUBA1A (PDB-ID: J5CO [28]) was used to classify variants in different groups of potential functional effects as suggested previously [29]. This classification is based on the interaction of the tubulin monomer with neighboring tubulin proteins within the polymer (heterodimer, protofilament, microtubule), with MAPs, or motor proteins. While functional evidence was present only for a minority of the variants [5,30], most mutational effects are based on localization-dependent predictions. As a template we used 51 already classified TUBA1A variants [31] likely affecting the binding of microtubule associated proteins ("MAP binding") or motor proteins, the tubulin folding ("Tubulin folding"), heterodimer and microtubule stability ("Intradimer interaction" and "Longitudinal interaction") the formation of the hollow tubular structure of the microtubule ("Lateral interaction") [32,33] or microtubule dynamics, protein folding and heterodimer stability ("GTP [Guanosintriphosphat] binding") [29,32]. The specific detrimental effect of variants facing the luminal protein surface ("Lumen facing") is currently unknown.

Computational analyses of TUBA1A missense variant spectrum
We here analyzed the ability of six different computational classifiers (three ensemble scores: CADD, M-CAP, REVEL and the three commonly used scores Polyphen-2, SIFT,  [27] collection were used for plotting and analysis of this variant data provided in the Additional file 2. To analyze possible mutational hotspots, we generated density plots of pathogenic missense variant frequencies reported in the literature and missense variants reported in controls from gnomAD with the "geom_density" function ("adjust" parameter set to 1/4) in ggplot2. To analyze protein regions of higher conservation we plotted all missense variants sorted by amino-acid position with each respective computational score and fitted generalized additive models using the "geom_smooth" function in ggplot2 to produce a smoothed line. Additionally, variants and scores were plotted as scatter and violin plots and two-sided Wilcoxon signed-rank test from the ggsignif package was used to determine whether there was a statistically significant difference between four different missense variant groups ("clinical review", "database", "gnomAD", "simulated"). "Clinical review" included variants from individuals with available phenotype information from our literature review and the three cases reported here, "database" included (likely) pathogenic variants from databases like ClinVar without clinical information, "gnomAD" included all variants present in healthy controls without neurodevelopmental disorders from the gnomAD database, and "simulated" all other possible missense variants in TUBA1A.

Analysis of genotype-phenotype relation
We used the curated set of clinical information and corresponding harmonized variant information to analyze a possible genotype-phenotype relationship by comparing the radiological and clinical features with variant characteristics. We visualized and structured the acquired categorical data into a grid plot using ggplot2 and the tidyverse [27] package for hypothesis formation. Based on this presentation we used the vcd package [38] to analyze the relationship between variant characteristics and clinical data of the individuals by generating mosaic or association plots. As many values in the resulting contingency tables contained values below five, we estimated p-values using a two-sided Fisher's exact test with the "simulate.p.value" setting based on 2000 replicates in R. One-letter amino-acid nomenclature is used in the resulting plots because of space constrains.  [68]. Annotation of variants on the linear gene model revealed that variants were distributed all over the TUBA1A gene with a statistically significant clustering around the Arg402 residue in exon 4 in the C-terminal domain. This cluster correlates with high computational prediction scores for missense variants (Fig. 1a, b and c; Additional file 1: Figure S4). Variants in the linear C-terminal region predominantly affect the binding of MAPs or motor proteins. Strikingly, computational scores for the different missense variant groups ("clinical review", "database", "gnomAD", "simulated") mostly showed no significant difference ( Fig. 1d; Additional file 1: Figure S4). After mapping of the amino acid residues on the 3D protein structure, we observed that most unique variants in "clinical review" (n = 121) are predicted to compromise tubulin folding (34.7%) or possibly affecting the interaction with MAPs or motor proteins, such as kinesins and dyneins (24.8%) (Fig. 2). A minority of variants is predicted to affect longitudinal (8.3%), lateral (8.3%) and intradimer (7.4%) interactions, respectively. Finally, 14% of variants are lumen facing and only 2.5% likely affect GTP binding. Considering all assembled variants including the recurrent ones (n = 166), the majority (38.6%) is predicted to impair the interaction of MAPs or motor proteins. Of these, 22 affect the Arg402 position. Variants identified in the three individuals i084n, i085n, i086n described here are predicted to affect tubulin folding (c.518C > T, p.    1 Distribution and computational scores of TUBA1A variants. a TUBA1A domains and localization of variants (missense variants in red, truncating variants in black). Variants above protein scheme are from published data in PubMed, below from databases (ClinVar, DECIPHER, denovo-db). Variants reported ≥3 times (green) and from the cases reported here (blue). While the size of the circle is proportional to the reported frequency, the height is proportional to the CADD-score. b Density plot of all missense variants (pathogenic in red, present in gnomAD in blue). The dashed highlighted grey box indicates the region around Arg402 with significant clustering of pathogenic variants (see Additional file 1: Figure S5). c Generalized additive models of the CADD, M-CAP and REVEL scores for all possible missense variants (see also Additional file 1: Figure S4 A). d Violin-and scatter-plot comparing the three computational scores for missense variants found in two clinical groups of individuals ("clinical review": 104 cases from literature review and the three cases reported here for a total of 62 distinct variants; "database": 59 individuals from ClinVar, denovo-db and DECIPHER for a total of 59 variants), healthy controls ("gnomAD": 9 variants) and all other possible missense variants ("simulated": 2841 variants) (see also Additional file 1: Figure S4 B) Fig. 2 Mapping of reported variants onto 3D structure of tubulin alpha-1A. a TUBA1A (light blue) monomer in the center surrounded by TUBA1A monomers to the lateral sides and TUBB3 monomers to the longitudinal sides (transparent surfaces). The TUBA1A (light blue) -TUBB3 (grey) heterodimer is highlighted and shown in ribbon representation (based on PDB: 5JCO [28]). Exemplary for a motor protein KIF1A (green; PDB: 2HXF [73]) is shown interacting on the external surface. Mutated residues are shown in spheres and likely affect the binding of MAPs or motor proteins (red), tubulin folding (black), intradimer interactions (yellow), longitudinal interactions (magenta), lateral interactions (green) or GTPbinding pocket (beige). Variants on the luminal side are shown in blue. A cross section and longitudinal view of a microtubule [74] is provided for orientation. b Close-up view of the central TUBA1A monomer and c lateral-view with TUBB3 removed from the dimer. The GTP molecule (beige), required for polymerization, is presented in stick representation. Variants identified in the three individuals i084n (P173L), i085n (G436D), i086n (R214H) described here affect tubulin folding, MAP binding and intradimer interaction, respectively. d Simplified representation of TUBA1A and KIF1A with protein surface and spheres removed. The amino acid residue R402 (red stick representation) of TUBA1A is localized near the KIF1A protein, in particular to the amino acid residue K280 (minimal distance 1.9 Å; green stick representation)

Relation between genotype and phenotype
We used the clinical information of the 104 individuals from the "clinical review" group and the herein described three patients (total n = 107) to analyze a possible relationship between genotype and phenotype. Individuals with recurrent variants, mostly affecting MAP or motor protein binding, show a similar phenotype combination in the matrix plot ( Fig. 3a; see also Additional file 1: Figure S7). Patients with the missense variant p.(Arg402Cys) are mostly described with a cortical-gyration pattern of agyria-pachgyria ("Ag-Pg"), dysplastic corpus callosum ("D"), a cerebellar vermis hypoplasia ("H") and have no information reported for the brainstem.
Because prenatally diagnosed fetal cases show a more severe phenotype than born individuals, we analyzed a possible contribution of variant characteristics to this observation. The missense variants reported in fetuses and in born individuals showed no significant difference in structural classification (Fig. 3b) and the computational scores did not significantly differ in these two groups (Additional file 1: Figure S6). In addition, the structural groups of missense variants were not overrepresented in females or males and the gender was also not associated with prenatal diagnosis (Fig. 3b).
The visual inspection of the matrix plot ( Fig. 3a; Additional file 1: Figure S7) indicated that certain clinical features are enriched in individuals with recurrent variants. Indeed, explorative comparison between missense variants at recurrent and non-recurrent positions confirmed differences in reported clinical features of the individuals carrying these missense variants ( Fig. 3c; Additional file 1: Figure S8). Despite our effort to collect all variants and clinical information described for TUBA1A-associated tubulinopathy, we did not obtain enough data to further analyze the phenotype differences for these variants.
Finally, we observed that individuals with the same variants and similar phenotypes were often reported together (e.g. Fig.3a "+" symbol for Arg402Cys reported 5 times in PMID:20466733 [30]). Regarding this observation, we found a significant difference in the use of clinical descriptions in publications describing multiple individuals. Kumar Figure S9).

Discussion
In this study, we identified three de novo missense variants in TUBA1A in three individuals with global developmental delay and brain malformations. Since the first identification of disease-causing variants in TUBA1A in 2007 in two affected individuals with cortical dysgenesis [5], at least 121 distinct heterozygous variants in a total of at least 166 patients including our 3 affected individuals are now described. Our efforts to systematically reanalyze published data enabled insights into the current state of information about TUBA1A-associated tubulinopathy.
Anomalies of the corpus callosum ranging from partial to complete agenesis or hypoplasia are with 96.2% (102/ 106) the predominantly reported feature of TUBA1Aassociated tubulinopathy. Cortical anomalies are the second leading clinical feature reported in 95/96 individuals (99.0%) followed by dysgenesis of the basal ganglia in 58/59 (98.3%). Two of the herein described individuals (i084n, i085n) also presented these features. Individual i086n had complete agenesis corpus callosum and additionally manifested unilateral optic nerve hypoplasia, a feature linked to TUBA8-associated tubulinopathy [6] but also described in individuals with TUBB2B [69] and TUBB3 [9] variants and present in 7/35 (20.0%) of individuals with TUBA1A-associated tubulinopathy.
Analysis of the type and localization of all possible 2969 missense variants from the simulation showed that the large majority of TUBA1A missense variants are predicted to be deleterious (CADD ≥20: 84.2%, M-CAP ≥0,025: 98.0%, REVEL ≥0,5: 78.8%; Fig. 1d). This is in agreement with an ExAC Z-score [35] of 6.23, confirming that TUBA1A is extremely depleted of missense variants in the general population. This resulted in high computational prediction scores independent of causality. Thus, variants might be reported to be likely pathogenic or pathogenic (ACMG class 4 or 5) despite relatively low computational scores and variants found in healthy controls might have scores above the recommended respective thresholds (Fig. 1d). After analyzing the relation of three ensemble computational prediction scores and expected pathogenicity, we concluded that computational prediction scores are of limited utility for predicting pathogenicity in TUBA1A. We suggest that segregation with the disease in the family or de novo occurrence, two major criteria of the ACMG guidelines for variant interpretation, are more appropriate for variant classification.
Based on the observation of the mutational distribution we analyzed a possible relationship between genotype and phenotype. We observed clustering of disease-causing variants in the region around the amino acid residue Arg402 (Fig. 1a, b and c, Additional file 1: Figure S2). The residue Arg402 is located in the interaction site of various MAPs or motor proteins [30]  Different symbols indicate the PubMed identifier (PMID) of publications describing ≥5 individuals (legend 2). Individuals described here or in the literature are sorted on the x-axis by variant functional class, localization and publication. On the y-axis phenotype categories with at least 60% data availability are presented (see also Additional file 1: Figure S7). Grey highlighted boxes indicate variants at the same amino acid position (also labeled) and boxes with dashed lines indicate related individuals with the same variant. b Mosaic plots showing the relations between individual groups (fetuses, born), variant structural function (MAP_binding = "MB", Tubulin_folding = "TF", Lumen_facing = "LF", Intradimer_interaction = "II", Longitudinal_interaction = "LoI", Lateral_interaction = "LaI", GTP_binding = "GB") and sex of the individual (female = f, male = m). c Association plot showing the relation between recurrently affected amino-acid positions (recurrent_AA) and the neuroradiological feature of cortical gyration (pachygyria = "Pg", polymicrogyria = "PMG", perisylvian polymicrogyria = "PsPMG", cortical gyral simplification = "CgS", agyria = "Ag", other = "O", absent = "a"). This example (see Additional file 1: Figure S8) indicates a possible genotype-phenotype correlation for certain recurrent variants. d Association plot showing the relation between publications describing ≥5 individuals ("pubmed_ID") and the neuroradiological feature of cortical gyration. This example (see also Additional file 1: Figure S9) indicates a probable reporting bias for this clinical feature. Two-sided Fisher's exact test has been used to estimate the presented p-values which are involved in different processes including the polymerization and stabilization of microtubules and intracellular vesicle transport [70]. Defects in some MAPs or motor proteins result in a similar clinical spectrum as observed for specific MAP-associated TUBA1A variants [30,71]. Overall, variants of the Arg402 residue and other specific recurrent variants, which are predominantly MAP or motor protein interacting, were previously associated with overlapping neuro-radiological features [13,30]. Indeed, we could show a non-uniform distribution for reported clinical features and the recurrent variants (Fig. 3c), indicating a possible genotype-phenotype relation. This observation might in part be attributed to detailed structured morphological categorization of brain anomalies used by different authors and individual preferences for certain terms. In addition, difficulties in the interpretation of the radiographic cortical and subcortical anomalies or technical differences in brain imaging could represent a possible confounder. Of note, recurrent variants with similar phenotype combinations were often reported by the same authors indicating a possible observational bias (Fig. 3d), thus limiting the interpretation of these genotype-phenotype relations. Another problem hindering a more detailed investigation is the high degree of missing data we recognized for several phenotypic categories. The directed acyclic graphs structure of HPO allows grouping of specialized terms into less specialized parent terms. Future development of algorithms comparing the phenotypic similarity between groups of individuals with the same or functionally similar pathogenic variants might alleviate some of these problems and allow further characterization of variant specific phenotypes. However, some of these endeavors could be hampered by the difficulty to distinguish between missing information and normal phenotype in published reports. This is especially problematic as HPO describes "phenotype abnormalities" but has no terms for normal phenotypes. We propose standardization in clinical reporting of rare disease cases based on expert recommendations with a minimal scheme covering disease specific phenotypes.
Even though TUBA1A-associated tubulinopathy is the most common tubulinopathy form, our results indicate that more clinical and mutational information is necessary to evaluate a potential genotype-phenotype correlation. This became apparent in fetuses, where we and others observed the most severe phenotypic spectrum compared to born cases [13,20]. This could not be explained by specific properties of the identified variants (Fig. 3b, Additional file 1: Figure S6). We therefore propose that additional variants in other genes or random developmental processes in cellular pathways in the respective individuals are underlying the phenotypic variability. Genome wide and functional studies might help to allow further characterization into specific clinical groups.

Conclusion
Our systematic reanalysis of published clinical data allowed an explorative investigation of a genotype-phenotype relationship. We found an enrichment of specific radiological features in recurrent variants; however, insufficient data availability, data variability and a possible observer bias were limiting factors for possible associations. A thoroughly conducted clinical examination and the standardized reporting of phenotype and genotype information in online databases, e.g. ClinVar [21] and LOVD [72] are fundamental for the systematic analysis of rare diseases such as TUBA1A-associated tubulinopathy.

Additional files
Additional file 1: Figure S1. Cranial MRI planes of individual i084n. Figure S2. Cranial MRI planes of individual i085n. Figure S3. Cranial MRI planes and clinical pictures of individual i086n. Figure S4. Additional computational scores for TUBA1A variants. Figure S5. Analysis of the variant cluster around amino acid position 400. Figure S6. Comparison of computational scores for TUBA1A variants identified in fetuses and born individuals. Figure S7. Matrix plot of all HPO phenotype categories. Figure S8. Association plots for recurrently affected amino-acid positions and all neuroradiological features. Figure S9. Association plots publications describing ≥ 5 individuals and all neuroradiological features. Table S1. Barthel Index of Activities of Daily Living [3]