Use of animal models for exome prioritization of rare disease genes
- Damian Smedley1,
- Sebastian Köhler2, 3,
- William Bone4,
- Anika Oellrich1,
- Jules Jacobsen1,
- Sanger Mouse Genetics Project1,
- Kai Wang5,
- Chris Mungall6,
- Nicole Washington6,
- Sebastian Bauer2, 3,
- Dominic Seelow7,
- Peter Krawitz2, 3, 8,
- Cornelius Boerkoel4,
- Christian Gilissen9,
- Melissa Haendel10,
- Suzanna E Lewis5 and
- Peter N Robinson2, 3, 7
© Smedley et al; licensee BioMed Central Ltd. 2014
Published: 11 November 2014
Over 100 disease-gene associations have been identified by whole-exome sequencing since the first reports in 2010, leading to a revolution in rare disease-gene discovery [1, 2]. However, many cases remain unsolved because ~100-1000 candidate loss-of-function variants remain after removing those deemed common, low quality or non-pathogenic. In some cases it may be possible to narrow these down to the causative variant using multiple affected individuals, linkage data, identity-by-descent inference, identification of de novo heterozygous mutations from trio analysis, or prior knowledge of affected pathways [3]. Where this is not possible or successful, one approach is to use phenotype data to evaluate whether a particular candidate variant is likely to explain the patient's clinical manifestations.
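The filtering step described above (removing common, low-quality and non-pathogenic variants) can be illustrated with a minimal Python sketch. The `Variant` record, its fields and the three thresholds are hypothetical choices made for illustration; they are not Exomiser's data model or its default settings.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float    # population allele frequency, 0.0-1.0
    quality: float        # e.g. a Phred-scaled call quality
    pathogenicity: float  # predicted deleteriousness, rescaled to 0.0-1.0


# Hypothetical thresholds; real analyses tune these per study.
MAX_FREQ = 0.01
MIN_QUALITY = 30.0
MIN_PATHOGENICITY = 0.5


def filter_candidates(variants):
    """Keep variants that are rare, well called and predicted to be damaging."""
    return [
        v for v in variants
        if v.allele_freq <= MAX_FREQ
        and v.quality >= MIN_QUALITY
        and v.pathogenicity >= MIN_PATHOGENICITY
    ]
```

Each criterion removes one of the three classes named in the text; the variants that survive are the candidates that phenotype data must then discriminate between.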
Model organism phenotype data represent a highly pertinent but under-utilised resource for disease-gene discovery. Whilst some 1800 human genes had Human Phenotype Ontology (HPO) annotations at the time of publication, a further 5700 genes have phenotype data available from mouse and zebrafish model organism databases [4]. We have previously developed algorithmic approaches to semantically compare disease phenotypes with mouse and zebrafish phenotypes for disease candidate gene identification [5–8].
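As a simplified illustration of semantic phenotype comparison, the sketch below scores a patient profile against a model profile using the Jaccard overlap of ontology ancestor sets, aggregated by a best-match average. The toy ontology and its term names are invented; the approaches cited above rely on cross-species ontology mappings and information-content-based measures, so this is a stand-in for the general idea rather than their algorithm.

```python
# Toy ontology (hypothetical terms standing in for a merged cross-species
# phenotype ontology): each term maps to its direct parents.
PARENTS = {
    "phenotypic_abnormality": [],
    "abnormal_skeleton": ["phenotypic_abnormality"],
    "abnormal_limb": ["abnormal_skeleton"],
    "short_femur": ["abnormal_limb"],
    "short_humerus": ["abnormal_limb"],
    "abnormal_heart": ["phenotypic_abnormality"],
    "septal_defect": ["abnormal_heart"],
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def jaccard(a, b, parents):
    """Jaccard overlap of the two terms' ancestor sets (1.0 for identical terms)."""
    sa, sb = ancestors(a, parents), ancestors(b, parents)
    return len(sa & sb) / len(sa | sb)


def best_match_average(patient, model, parents):
    """For each patient term take its best-matching model term, then average.

    This is asymmetric: it asks how well the model explains the patient.
    """
    if not patient or not model:
        return 0.0
    best = [max(jaccard(p, m, parents) for m in model) for p in patient]
    return sum(best) / len(best)
```

Because related terms share ancestors, a patient with a short femur still scores well against a model annotated only with a short humerus, which exact term matching would miss.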
We have previously reported that comparison with mouse phenotype data can dramatically improve the performance of exome prioritization [9]. In the work presented here, we compare the patient's phenotypes with those of known diseases as well as with mouse and zebrafish phenotypes for each candidate variant in the exome. Where no phenotype data are available for a candidate, we use its proximity in protein-protein interaction networks to genes with phenotype data to assess its candidacy by guilt-by-association. The output is combined with measures of variant candidacy, such as pathogenicity and allele frequency, and synergistically improves performance: the causative variant is identified as the top hit in up to 96% of exomes for known associations and in 49% of exomes for previously undescribed associations.
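The guilt-by-association step and the final combination of scores can be sketched as follows. Here network proximity is computed by random walk with restart over an undirected protein-protein interaction graph, a common network-propagation technique chosen for illustration. The restart probability, the discount applied to network-derived evidence and the equal weighting of variant and phenotype scores are all hypothetical parameters, not values from this work.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of each node (RWR).

    edges: iterable of (gene, gene) pairs, treated as undirected.
    seeds: gene -> non-negative weight; the walk restarts at these genes.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Only seeds present in the network can act as restart points.
    seeds = {g: w for g, w in seeds.items() if g in neighbours and w > 0}
    total = sum(seeds.values())
    if total == 0:
        return {n: 0.0 for n in neighbours}
    p0 = {n: seeds.get(n, 0.0) / total for n in neighbours}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: return to the seed genes with probability `restart`.
        nxt = {n: restart * p0[n] for n in neighbours}
        # Walk term: spread the remaining mass evenly over each node's neighbours.
        for n, nbrs in neighbours.items():
            share = (1 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in neighbours) < tol
        p = nxt
        if converged:
            break
    return p


def rank_candidates(variant_scores, pheno_scores, edges,
                    network_discount=0.5, weight=0.5):
    """Rank genes by a weighted mix of variant and phenotype evidence.

    variant_scores: gene -> score in [0, 1], e.g. from pathogenicity and rarity.
    pheno_scores: gene -> phenotype similarity in [0, 1], only for genes with data.
    network_discount and weight are hypothetical tuning parameters.
    """
    walk = random_walk_with_restart(edges, pheno_scores)
    # Largest proximity among genes without their own phenotype data, for rescaling.
    top = max((s for g, s in walk.items() if g not in pheno_scores), default=0.0)
    ranked = []
    for gene, v in variant_scores.items():
        if gene in pheno_scores:
            pheno = pheno_scores[gene]  # direct phenotype evidence
        elif top > 0:
            # Guilt-by-association: discounted, rescaled network proximity.
            pheno = network_discount * walk.get(gene, 0.0) / top
        else:
            pheno = 0.0  # no evidence of either kind
        ranked.append((gene, weight * v + (1 - weight) * pheno))
    # Highest combined score first; ties broken alphabetically by gene name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

In a toy chain network A-B-C-D with phenotype evidence on A only, a damaging variant in B (a direct neighbour of A) outranks an equally damaging variant in D, which lies three steps away.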
Our software, Exomiser, is openly available at our website [http://www.sanger.ac.uk/resources/databases/exomiser/query] and for download to perform local analysis. We are currently collaborating with the NIH Undiagnosed Diseases Program to diagnose difficult cases through exome analysis. In conclusion, our results clearly show the value of collecting comprehensive clinical phenotype data for translational bioinformatics; future work will focus on producing a robust solution for clinical diagnostics.
This work was supported by grants from the Deutsche Forschungsgemeinschaft (DFG RO 2005/4-1), the Bundesministerium für Bildung und Forschung (BMBF project number 0313911), core infrastructure funding from the Wellcome Trust, NIH 1R24OD011883-01, and by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under contract no. DE-AC02-05CH11231.
- Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA: Exome sequencing identifies the cause of a Mendelian disorder. Nat Genet. 2010, 42: 30-35. 10.1038/ng.499.
- Rabbani B, Mahdieh N, Hosomichi K, Nakaoka H, Inoue I: Next-generation sequencing: Impact of exome sequencing in characterizing Mendelian disorders. J Hum Genet. 2012, 57: 621-632. 10.1038/jhg.2012.91.
- Robinson PN, Krawitz P, Mundlos S: Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin Genet. 2011, 80: 127-132. 10.1111/j.1399-0004.2011.01713.x.
- Doelken SC, Köhler S, Mungall CJ, Gkoutos GV, Ruef BJ, Smith C, Smedley D, Bauer S, Klopocki E, Schofield PN: Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish. Dis Model Mech. 2013, 6: 358-372. 10.1242/dmm.010322.
- Smedley D, Oellrich A, Köhler S, Ruef B, Sanger Mouse Genetics Project, Westerfield M, Robinson P, Lewis S, Mungall C: PhenoDigm: Analyzing curated annotations to associate animal models with human diseases. Database (Oxford). 2013, bat025.
- Chen CK, Mungall CJ, Gkoutos GV, Doelken SC, Köhler S, Ruef BJ, Smith C, Westerfield M, Robinson PN, Lewis SE, Schofield PN, Smedley D: MouseFinder: Candidate disease genes from mouse phenotype data. Hum Mutat. 2012, 33 (5): 858-866. 10.1002/humu.22051.
- Mungall CJ, Gkoutos GV, Smith CL, Haendel MA, Lewis SE, Ashburner M: Integrating phenotype ontologies across multiple species. Genome Biol. 2010, 11 (1): R2. 10.1186/gb-2010-11-1-r2.
- Washington NL, Haendel MA, Mungall CJ, Ashburner M, Westerfield M, Lewis SE: Linking human diseases to animal models using ontology-based phenotype annotation. PLoS Biol. 2009, 7 (11): e1000247. 10.1371/journal.pbio.1000247.
- Robinson PN, Köhler S, Oellrich A, Sanger Mouse Genetics Project, Wang K, Mungall CJ, Lewis SE, Washington N, Bauer S, Seelow D, Krawitz P, Gilissen C, Haendel M, Smedley D: Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 2014, 24 (2): 340-348. 10.1101/gr.160325.113.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.