Development and application of a next-generation-sequencing (NGS) approach to detect known and novel gene defects underlying retinal diseases

Background Inherited retinal disorders are clinically and genetically heterogeneous with more than 150 gene defects accounting for the diversity of disease phenotypes. So far, mutation detection was mainly performed by APEX technology and direct Sanger sequencing of known genes. However, these methods are time consuming, expensive and unable to provide a result if the patient carries a new gene mutation. In addition, multiplicity of phenotypes associated with the same gene defect may be overlooked. Methods To overcome these challenges, we designed an exon sequencing array to target 254 known and candidate genes using Agilent capture. Subsequently, 20 DNA samples from 17 different families, including four patients with known mutations were sequenced using Illumina Genome Analyzer IIx next-generation-sequencing (NGS) platform. Different filtering approaches were applied to identify the genetic defect. The most likely disease causing variants were analyzed by Sanger sequencing. Co-segregation and sequencing analysis of control samples validated the pathogenicity of the observed variants. Results The phenotype of the patients included retinitis pigmentosa, congenital stationary night blindness, Best disease, early-onset cone dystrophy and Stargardt disease. In three of four control samples with known genotypes NGS detected the expected mutations. Three known and five novel mutations were identified in NR2E3, PRPF3, EYS, PRPF8, CRB1, TRPM1 and CACNA1F. One of the control samples with a known genotype belongs to a family with two clinical phenotypes (Best and CSNB), where a novel mutation was identified for CSNB. In six families the disease associated mutations were not found, indicating that novel gene defects remain to be identified. Conclusions In summary, this unbiased and time-efficient NGS approach allowed mutation detection in 75% of control cases and in 57% of test cases. 
Furthermore, it has the possibility of associating known gene defects with novel phenotypes and mode of inheritance.


Background
Inherited retinal disorders affect approximately 1 in 2000 individuals worldwide [1]. Symptoms and associated phenotypes are variable. In some groups the disease can be mild and stationary, such as in congenital stationary night blindness (CSNB) or achromatopsia (ACHM), whereas other disorders are progressive and lead to severe visual impairment, such as rod-cone dystrophies, also known as retinitis pigmentosa (RP), or cone and cone-rod dystrophies. The heterogeneity of these diseases is reflected in the number of underlying gene defects. To date more than 150 genes have been implicated in different forms of retinal disorders (http://www.sph.uth.tmc.edu/Retnet/home.htm), and yet in a significant proportion of patients the disease causing mutation could not be identified, suggesting that additional novel genes remain to be discovered. Furthermore, recent studies have outlined that distinct phenotypes can be related to the dysfunction of the same gene [2][3][4], and there may be additional phenotype-genotype associations that are still not recognized. State-of-the-art phenotypic characterization, including precise family history and functional as well as structural assessment (i.e. routine ophthalmic examination, perimetry, color vision, full field and multifocal electroretinography (ERG), fundus autofluorescence (FAF) imaging and optical coherence tomography (OCT)), allows targeted mutation analysis for some disorders. However, in most cases of inherited retinal diseases, similar phenotypic features can be due to a large number of different gene defects.
Various methods can be used for the identification of the corresponding genetic defect, each with advantages and disadvantages. Sanger sequencing is still the gold standard for determining the gene defect, but due to the heterogeneity of the disorders it is time consuming and expensive to screen all known genes. Mutation detection by commercially available APEX genotyping microarrays (ASPER Ophthalmics, Estonia) [5,6] allows the detection of only known mutations. In addition, a separate microarray has been designed for each inheritance pattern, which tends to escalate the costs, especially in simplex cases, for which the inheritance pattern cannot be predetermined. Indirect methods with single nucleotide polymorphism (SNP) microarrays for linkage and homozygosity mapping are also powerful tools, which have proven their reliability in identifying novel and known gene defects [7][8][9][10][11][12]. However, homozygosity mapping can only be applied to consanguineous families or inbred populations. To overcome these challenges, we designed a custom sequencing array in collaboration with a company (IntegraGen, Evry, France) to target all exons and part of the flanking sequences of 254 known and candidate retinal genes. This array was subsequently applied through NGS to a cohort of 20 patients from 17 families with different inheritance patterns and clinical diagnoses including RP, CSNB, Best disease, early-onset cone dystrophy and Stargardt disease.

Clinical investigation
The study protocol adhered to the tenets of the Declaration of Helsinki and was approved by the local Ethics Committee (CPP, Ile de France V). Informed written consent was obtained from each study participant. Index patients underwent full ophthalmic examination as described before [13]. Whenever available, blood samples from affected and unaffected family members were collected for co-segregation analysis.

Previous molecular genetic analysis
Total genomic DNA was extracted from peripheral blood leucocytes according to manufacturer's recommendations (Qiagen, Courtaboeuf, France). DNA samples from some patients with a diagnosis of RP were first analyzed and excluded for known mutations by applying commercially available microarray analysis (arRP and adRP ASPER Ophthalmics, Tartu, Estonia). In some cases, pathogenic variants in EYS, C2orf71, RHO, PRPF31, PRPH2 and RP1 were excluded by direct Sanger sequencing of the coding exonic and flanking intronic regions of the respective genes [13][14][15][16][17]. Conditions used to amplify PRPH2 can be provided on request.

Molecular genetic analysis using NGS
A custom-made SureSelect oligonucleotide probe library was designed to capture the exons of 254 genes for different retinal disorders and candidate genes according to Agilent's recommendations (Table 1). These genes include 177 known genes underlying retinal dysfunction (http://www.sph.uth.tmc.edu/retnet/sum-dis.htm, October 2010, Table 1) and 77 candidate genes associated with existing animal models and expression data (Table 2). The eArray web-based probe design tool was used for this purpose (https://earray.chem.agilent.com/earray). The following parameters were chosen for probe design: 120 bp length, 3× probe-tiling frequency, 20 bp overlap in restricted regions, which were identified by the implementation of eArray's RepeatMasker program. A total of 27,430 probes, covering 1.177 Mb, were designed and synthesized by Agilent Technologies (Santa Clara, CA, USA). Sequence capture, enrichment, and elution were performed according to the manufacturer's instructions (SureSelect, Agilent). Briefly, three μg of each genomic DNA were fragmented by sonication and purified to yield fragments of 150-200 bp. Paired-end adaptor oligonucleotides from Illumina were ligated on repaired DNA fragments, which were then purified and enriched by six PCR cycles. 500 ng of the purified libraries were hybridized to the SureSelect oligo probe capture library for 24 h. After hybridization, washing, and elution, the eluted fraction underwent 14 cycles of PCR-amplification. This was followed by purification and quantification by qPCR to obtain sufficient DNA template for downstream applications. Each eluted-enriched DNA sample was then sequenced on an Illumina GAIIx as paired-end 75 bp reads. Image analysis and base calling were performed using Illumina Real Time Analysis (RTA) Pipeline version 1.10 with default parameters. Sequence reads were aligned to the reference human genome (UCSC hg19) using commercially available software (CASAVA1.7, Illumina) and the ELANDv2 alignment algorithm.
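The probe design parameters above can be illustrated with a minimal tiling sketch. It assumes that a 3× tiling frequency means consecutive 120 bp probes are offset by 120 / 3 = 40 bp, so that interior bases of a target are covered by three probes. This is an illustration only and not the eArray algorithm; the function name and coordinates are hypothetical.

```python
def tile_probes(start, end, probe_len=120, tiling=3):
    """Return half-open (start, end) probe intervals tiling the target [start, end).

    Assumption: the offset between consecutive probes is probe_len // tiling,
    which gives `tiling`-fold probe coverage for interior bases of the target.
    """
    step = probe_len // tiling
    probes = []
    pos = start
    while pos < end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes


def probe_depth(probes, position):
    """Count how many probes cover a given position."""
    return sum(1 for s, e in probes if s <= position < e)


# Hypothetical 200 bp exon
probes = tile_probes(1000, 1200)
print(len(probes), probe_depth(probes, 1100))
```

Bases at the edges of a target are covered by fewer probes in this simple scheme, which is one reason real designs usually extend into flanking sequence.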
Sequence variation annotation was performed using the IntegraGen in-house pipeline, which consisted of gene annotation (RefSeq), detection of known polymorphisms (dbSNP 131, 1000 Genome) followed by mutation characterization (exonic, intronic,

Investigation of annotated sequencing data
We received the annotated sequencing data in the form of Excel tables. On average 946 SNPs and 83 insertions and deletions were identified for each sample (Figure 1). By using the filtering system, we first investigated variants (nonsense and missense mutations, intronic variants located within +/-5 nucleotides of an exon) that were absent in dbSNP and NCBI databases (http://ncbi.nlm.nih.gov/). In the absence of known gene defects or putative pathogenic variants (see below) in the first step, we selected known genes that had previously been clinically associated, including variants present in dbSNP and NCBI databases (Figure 1). Each predicted pathogenic variant was confirmed by Sanger sequencing.
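The two-step filtering described above can be sketched as follows. This is a simplified illustration, not the IntegraGen pipeline: the `Variant` fields, the `known_genes` argument and the rule that the second step runs only when the first step returns nothing are assumptions made for the example. In the study, the second step was triggered when no putative pathogenic variant was found.

```python
from dataclasses import dataclass

CODING_EFFECTS = {"nonsense", "missense"}


@dataclass
class Variant:
    gene: str
    effect: str          # e.g. "nonsense", "missense", "intronic"
    intron_offset: int   # distance to the nearest exon boundary, 0 if exonic
    in_dbsnp: bool       # True if the variant is referenced in dbSNP/NCBI


def is_candidate_effect(v):
    """Nonsense/missense variants, or intronic variants within +/-5 of an exon."""
    if v.effect in CODING_EFFECTS:
        return True
    return v.effect == "intronic" and 1 <= abs(v.intron_offset) <= 5


def filter_variants(variants, known_genes):
    # Step 1: candidate variants that are absent from the databases.
    novel = [v for v in variants if is_candidate_effect(v) and not v.in_dbsnp]
    if novel:
        return novel
    # Step 2 (fallback): candidate variants in known disease genes,
    # now including those already referenced in the databases.
    return [v for v in variants if is_candidate_effect(v) and v.gene in known_genes]
```

A referenced variant in a known gene, such as an rs-numbered splice site change, is only reported by this sketch when the first step yields no candidates.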

Assessment of the pathogenicity of variants
The following criteria were applied to evaluate the pathogenic nature of novel variations identified by NGS: 1) stop/frameshift variants were considered as most likely to be disease causing; 2) co-segregation in the family; 3) absence in control samples; 4) for missense mutations, amino acid conservation was studied in the UCSC Genome Browser (http://genome.ucsc.edu/) across species from all different evolutionary branches. If the amino acid residue did not change it was considered as "highly conserved". If a different change was seen in fewer than five species and not in the primates it was considered as "moderately conserved", if a change was present in 5-7 species it was considered as "weakly conserved", and otherwise the amino acid residue was considered as "not conserved"; 5) pathogenicity predictions with bioinformatic tools (
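The conservation categories in criterion 4 can be written as a small decision function. This is a minimal sketch of the stated rules; the function and parameter names are hypothetical, and it assumes that a change seen in fewer than five species that include a primate falls under "otherwise", i.e. "not conserved", which the text does not state explicitly.

```python
def classify_conservation(n_species_changed, changed_in_primates):
    """Classify amino acid conservation from a multi-species alignment.

    n_species_changed: number of species in which the residue differs
    changed_in_primates: True if any of those species is a primate
    """
    if n_species_changed == 0:
        return "highly conserved"
    if n_species_changed < 5 and not changed_in_primates:
        return "moderately conserved"
    if 5 <= n_species_changed <= 7:
        return "weakly conserved"
    # Assumption: everything else, including a change in <5 species
    # that involves a primate, is treated as not conserved.
    return "not conserved"


for args in [(0, False), (3, False), (6, False), (12, False)]:
    print(args, classify_conservation(*args))
```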

Validation of the novel genetic testing tool for retinal disorders
To validate the novel genetic testing tool for retinal disorders, we used four DNA samples from families in which we had previously identified different types of mutations by Sanger sequencing: one 1 bp duplication and one 1 bp deletion in PRPF31 and missense mutations in TRPM1 and BEST1 (Table 3). Three of the four mutations were detectable by NGS, whereas the deletion in PRPF31 was not identified. To determine whether this was due to a technical problem of deletion detection in general or to low coverage at this position, the sequencing depth was investigated in detail. Indeed, the coverage at this position, reflected by the mean depth, was only ~1-6 for all samples. This indicates that although the coverage in general was very good, specific probes used here need to be redesigned to improve the capture for specific exons.

Figure 1 Flow chart of variant analysis. IntegraGen provided the results in the form of Excel tables. For each sample on average 946 SNPs and 83 inDels were detected, of which 11 represent missense, nonsense or putative splice site mutations, which were absent in dbSNP, NCBI and 1000 Genomes databases. Of those, 1-5 variants were predicted to be pathogenic. In cases where none of the variants were predicted to be pathogenic, dbSNP, NCBI and 1000 Genomes databases were included to detect mutations referenced with an rs-number. Co-segregation analysis was performed in families with putative pathogenic variants.
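The depth check that explained the missed PRPF31 deletion can be sketched as a simple screen for poorly covered positions. The threshold of 10 reads, the position labels and the depth values below are hypothetical and chosen only to illustrate the idea; the study reports a mean depth of about 1-6 at the affected position.

```python
from statistics import mean


def low_coverage_positions(depths_by_position, min_mean_depth=10):
    """Return {position: mean depth} for positions whose mean depth
    across samples falls below min_mean_depth (an assumed cut-off)."""
    flagged = {}
    for position, depths in depths_by_position.items():
        avg = mean(depths)
        if avg < min_mean_depth:
            flagged[position] = avg
    return flagged


# Hypothetical per-sample depths at two positions
depths = {
    "PRPF31_deletion_site": [1, 6, 3, 2],
    "well_covered_exon": [80, 120, 95, 105],
}
print(low_coverage_positions(depths))
```

Positions flagged in this way would be candidates for probe redesign or for direct Sanger sequencing.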

Detection of known and novel mutations
Some of the patients from the 14 families with no known gene defect had previously been excluded for known mutations using microarray analysis and by Sanger sequencing of the known genes EYS, C2orf71, RHO, PRPF31, PRPH2 and RP1. Other samples had never been genetically investigated. In four DNA samples from three different families with autosomal dominant (ad) or recessive (ar) RP, known mutations were detected (Table 4). All mutations co-segregated with the phenotype (Figure 2). In seven samples, novel mutations in known genes were identified. These mutations co-segregated with the phenotype in five different families with adCSNB, x-linked incomplete CSNB, adRP, arRP and x-linked RP (Table 5, Figures 3 and 4). One of the cases from these five families was also used as a control for Best disease carrying a known BEST1 mutation (Table 3). In addition to the Best phenotype, ERG-responses of this patient resembled those of complete CSNB, i.e. showing selective ON-bipolar pathway dysfunction. This phenotype was independent of the Best phenotype (Figure 3). The most likely disease causing mutation detected by NGS was a novel heterozygous TRPM1 mutation (Table 4, Figure 3).

Unsolved cases
In six of the 14 families with Stargardt disease, adRP, adCD with postreceptoral defects, arRP, early onset arCD with macrocephaly and mental retardation described in affected sister and x-linked cCSNB, the disease associated mutations remain to be elucidated or validated (Table 6, Figure 5).

Discussion
By using NGS in 254 known and candidate genes we were able to detect known and novel mutations in 57% of families tested. In order to achieve this goal, we applied a rigorous protocol (Figure 1). To our knowledge, this is the first report using NGS to investigate all inherited retinal disorders at once. In a study restricted to adRP, Bowne and co-workers used a similar approach including 46 known and candidate genes for adRP [18]. All their cases had previously been screened and excluded for most of the known genes underlying adRP. The authors were able to identify known or novel mutations in five out of 21 cases in genes not included in a pre-screening [18]. This added five patients to their adRP cohort with known gene defects, indicating that 64% of their patients show known mutations with new genes still to be discovered in the remaining 36%. The current study provides a more exhaustive tool, since it incorporates screening of 254 genes implicated in various retinal disorders of different inheritance patterns and additional candidate genes for these phenotypes. With this approach a cohort of both pre-screened and unscreened samples was investigated. The mutation detection rate of 57% is high and had not been obtained before by high throughput screening methods. Furthermore, this approach is probably less time consuming and expensive than existing methods such as direct sequencing of all known genes or microarray analysis. Of note, however, one of the variants detected with the NGS approach (i.e. the p.V973L exchange in GUCY2D) was not confirmed by direct Sanger sequencing, suggesting the possibility of false positives with high throughput screening. Verification of the most likely pathogenic variants by direct Sanger sequencing is therefore essential to validate NGS data, although the false positive rate is assumed to be low (in our study 1/28 verified sequence variants represented a false positive).
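The percentages quoted in this study follow directly from the counts given in the text: three of four control mutations detected, six of 14 test families unsolved, and one false positive among 28 verified variants. The short calculation below reproduces them; the helper name is hypothetical.

```python
def percent(hits, total, digits=0):
    """Percentage of hits among total, rounded to the given number of digits."""
    return round(100 * hits / total, digits)


control_rate = percent(3, 4)          # known mutations detected in control samples
test_rate = percent(14 - 6, 14)       # families solved among the 14 test families
false_positive = percent(1, 28, 1)    # variants not confirmed by Sanger sequencing

print(control_rate, test_rate, false_positive)  # 75.0 57.0 3.6
```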
Overall, the study of 20 subjects from 17 families by NGS showed that most of the targeted regions are well covered (more than 98%). However, some of the regions showed a lower coverage (GC-rich regions) or were not captured (repetitive regions). This was for instance the case for two genes underlying cCSNB (i.e. NYX and GRM6) and the repetitive region of ORF15 of RPGR. For GC-rich regions the capture design could be improved in the future by modifying NGS chemistry, as it was successfully achieved for Sanger sequencing using different additives, which improved the amplification and subsequent sequencing. If repetitive regions like ORF15 of RPGR remain problematic for sequencing by NGS, direct Sanger sequencing of these targets might be the first screening of choice, in particular for disorders caused by only a few gene defects such as CSNB and xl-RP.

mutations in PRPF3, which co-segregates with the phenotype. Two family members never clinically investigated from the last generation (984 and 1167 carrying a question mark) were reported to be not affected but carried the mutation. They may develop the phenotype at a later stage. In addition variability of the phenotype of this mutation was documented [35]. Two patients, 128 and 943 of family 100 with arRP of Jewish origin revealed the known EYS mutation p.N137VfsX24, which was found in all screened affected family members. The index patient 893 of family 574 showed the previously described NR2E3 p.G56R mutation, which co-segregated with the phenotype.

By applying NGS to our retinal panel, known and novel mutations were detected in different patients. We believe that our diagnostic tool is particularly important for heterogeneous disorders like RP, for which many gene defects with different prevalence have been associated with one phenotype. It also allows the rapid detection of novel mutations in minor genes which are often not screened as a priority by direct Sanger sequencing. This was the case in our study for three individuals from one family with adRP in which NGS detected a novel PRPF8 mutation in both affected and one unaffected family member (Table 4, Figure 4). In this family, the RP phenotype is mild and therefore it is possible that the unaffected member may develop symptoms later in life or alternatively it may be a case of incomplete penetrance as reported for another splicing factor gene, PRPF31, and recently for PRPF8 as well [19][20][21][22]. Interestingly, a novel TRPM1 mutation was identified in a patient with adCSNB, although this gene was previously only associated with arCSNB [23][24][25][26]. This is the first report of a TRPM1 mutation co-segregating with ad Schubert-Bornschein type complete CSNB. Since the location of this mutation is not different compared to other mutations leading to arCSNB, it is not quite clear how TRPM1 mutations might lead to either ad or arCSNB. Functional investigations are needed to validate the pathogenicity of this variant.
Furthermore, this finding suggests that TRPM1 heterozygous mutation carriers from arCSNB families should be investigated by electroretinography to determine whether they display similar retinal dysfunction as the affected members of the presented adCSNB family. Detection of a novel RPGR splice site mutation in family 146 presented a challenge. The actual disease causing change was concealed under a wrongly annotated rs62638633, which had previously been clinically associated with RP by a German group (http://www.ncbi.nlm.nih.gov/sites/varvu?gene=6103&rs=62638633; personal communication, Markus Preising). These observations indicate that the stringent filtering we applied initially can mask such referenced disease causing variants. Bearing this in mind one can still first investigate unknown variants, but should then examine dbSNP for referenced variants that are either described to be disease causing, have a low minor allele frequency or are present in interesting candidate genes. An accurate discrimination of non-pathogenic polymorphisms versus disease causing polymorphisms in SNP databases is warranted to resolve this challenge. In six families from the investigated cohort the disease causing mutations still remain to be identified. In the Stargardt patient with no pathogenic ABCA4 mutations two variants in CFH were detected, one of which (rs1061170) had previously been reported to predispose to age related macular degeneration (AMD) [27][28][29]. The second CFH change is a novel variant, affecting a highly conserved residue, not found in NGS data from the other 19 samples and never associated with a disease. The variants co-segregated in the only available family members, which were the patient's parents. Apart from the association with AMD, CFH mutations have been previously associated with renal diseases, the most common being membranoproliferative glomerulonephritis and hemolytic uremic syndrome, which can also be associated with an eye phenotype [30,31].
No renal dysfunction was present in our patient. To validate whether the two variants identified in CFH are indeed disease causing, DNA samples from other available family members for co-segregation analysis as well as characterization of the functional consequences of the novel variant are needed. One patient with complete CSNB had an affected nephew and thus x-linked inheritance was assumed. However, neither Sanger sequencing nor NGS detected a mutation in the only known x-linked gene causing cCSNB, NYX. To exclude recessive inheritance TRPM1 and GRM6 were investigated in detail. Indeed the patient carried a novel heterozygous TRPM1 variant, which affects a highly conserved amino acid and was not identified in the other 19 samples investigated here (Table 6). However, direct Sanger sequencing of lower covered regions did not identify a second mutation in this gene. Similarly no mutations in GRM6 were identified. These findings outline the need for additional family members to determine, through co-segregation, the pathogenicity of the numerous variants identified by NGS. This was also true for two other families with nonsense mutations in CUBN (Fam795) and RP1L1 (Fam761) (Table 6). The nonsense mutation in CUBN co-segregated with the phenotype in most of the family members (Figure 5). Had we not had access to additional family members, we might have retained this gene defect as the underlying cause for adCD and considered CUBN as a new gene involved in adCD. None of the other putatively pathogenic mutations identified in CUBN, TRPM1 and GUCY2D co-segregated with the phenotype in this family (Table 6, Figure 5). RP1L1 was already a candidate for adRP [32] but was previously associated with occult macular dystrophy [33]. In our study, this variant did not co-segregate with the phenotype in other affected family members (data not shown). This NGS study ended with six genetically unresolved families, which can be further investigated with whole exome sequencing.
Although no clear information exists about the actual percentage of missing gene defects underlying each group of inherited retinal disorders, previous studies have reported that in many cases the genetic cause still needs to be determined [18,34]. Whole exome sequencing approaches allow the detection of both novel and known gene defects, but also generate numerous variants and therefore require the inclusion of more than one DNA sample for each family to rapidly exclude non-pathogenic variants. Due to the higher costs of exome sequencing for one sample compared to targeted sequencing, we propose to initially perform targeted sequencing in the index patient and to proceed to whole exome sequencing only after exclusion of a known gene defect.

Conclusions
In summary, our diagnostic tool is an unbiased time efficient method, which not only allows detecting known and novel mutations in known genes but also potentially associates known gene defects with novel phenotypes. This genetic testing tool can now be applied to large cohorts of inherited retinal disorders and should rapidly deliver the prevalence of known genes and the percentage of cases with missing genetic defect for underlying forms of retinal disorders.

List of abbreviations
ad: autosomal dominant; ar: autosomal recessive; as: asymptomatic; het: heterozygous; homo: homozygous; hemi: hemizygous; -: not noted; consang.: consanguinity was reported; n.a.: not applicable; CSNB: congenital stationary night blindness; RP: retinitis pigmentosa.

x-linked inheritance and phenotype verification Index patients and respective gene defect are highlighted in bold. In some cases also family members were used for NGS.

Figure 5 Detection of novel mutation by using NGS in 254 retinal genes. Family 795 reveals autosomal dominant cone dystrophy with post-receptoral defects. Four putative disease causing mutations were investigated on the basis of co-segregation. However, none of them co-segregated in all affected family members with the phenotype and thus are not considered to be disease causing. Individuals marked with a star were clinically investigated, patients with a question mark are asymptomatic and patients with a plus sign show high myopia.