High-throughput genomic technologies are revolutionizing biology and medicine by producing data of ever higher resolution. At the same time, however, they pose new challenges for the analysis and interpretation of such data. In particular, conventional marker-based analysis of GWAS requires large sample sizes to find significant associations, because some genes may be truly associated with the disease yet fail to reach the stringent genome-wide significance threshold required in a massive multiple-testing scenario. This makes GWAS especially impracticable in the field of rare diseases. To overcome the limitations of conventional single-marker association analysis, alternative approaches for the analysis of GWAS data have been proposed in the last few years. Beyond tests that use multiple markers or additional marker information, such as linkage, the recent proposal of PBA approaches has introduced a new angle in the analysis of GWAS data, closer to the principles of systems biology. Typically, PBA approaches check whether a statistic shows a consistent, yet moderate, deviation from chance across a group of related genes (for example, genes belonging to the same GO category). The rationale for this approach is the accepted notion that genes do not work in isolation, but rather in complex molecular networks and cellular pathways that are often involved in disease susceptibility and progression [23, 41, 46, 47]. Thus, the use of prior biological knowledge about relationships among genes and pathways will increase the possibility of identifying the genes and mechanisms involved in disease pathogenesis. Here we have adopted a widely accepted definition of gene functionality, namely the one represented by the GO. One obvious limitation of this approach is that variants not mapping within, or close to, genomic elements with a functional annotation will be missed. However, it is known that coding genes and functional elements in their neighborhood (e.g. splice acceptor and donor sites, transcription factor binding sites, etc.) harbor 85% of the mutations with large disease-related effects [49, 50].
In this study we have applied a new approach that can be considered a functional version of a two-stage analysis of GWAS data. In our approach, the conventional marker-based discovery stage is replaced by a PBA, which allows us to identify biological processes (represented by GO terms) associated with the disease in the Spanish population. The subsequent validation step allowed us to identify 4 new gene associations with the disease among the genes belonging to the associated GO categories.
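The idea behind a PBA discovery stage, testing whether a group of functionally related genes shows a consistent but moderate deviation from chance, can be illustrated with a minimal sketch. This is not the method used in this study: it assumes a simple permutation test on the mean gene-level score of a gene set, and the function name `gene_set_permutation_p`, the gene identifiers and all scores are hypothetical toy values.

```python
import random


def gene_set_permutation_p(scores, gene_set, n_perm=2000, seed=0):
    """Permutation p-value for the mean score of a gene set.

    scores   : dict mapping gene -> gene-level association score
               (hypothetical; e.g. a -log10 p-value per gene)
    gene_set : iterable of genes annotated to one functional category
    """
    rng = random.Random(seed)
    genes = list(scores)
    wanted = set(gene_set)
    members = [g for g in genes if g in wanted]
    if not members:
        raise ValueError("gene set contains no scored genes")
    size = len(members)
    observed = sum(scores[g] for g in members) / size
    hits = 0
    for _ in range(n_perm):
        # Random gene set of the same size drawn from all scored genes
        sample = rng.sample(genes, size)
        if sum(scores[g] for g in sample) / size >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 200 background genes with scores 0.0 to 1.8,
# plus one category of 5 genes that all score a moderate 1.5.
toy_scores = {f"bg{i}": (i % 10) * 0.2 for i in range(200)}
toy_set = [f"cat{i}" for i in range(5)]
toy_scores.update({g: 1.5 for g in toy_set})

# A category made of ordinary background genes, for comparison
null_set = [f"bg{i}" for i in range(10)]

if __name__ == "__main__":
    print(gene_set_permutation_p(toy_scores, toy_set))
    print(gene_set_permutation_p(toy_scores, null_set))
```

In this toy example no gene in the category has the highest individual score (some background genes reach 1.8), yet the category as a whole stands out because all of its members deviate in the same direction. That is the situation in which a set-level test can succeed where single-marker thresholds fail.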
It is believed that the aganglionosis that characterizes Hirschsprung disease can arise from: (1) reduced size of the stem/progenitor cell pool; (2) loss of stem cell potency leading to premature differentiation of progenitors; (3) reduced cell survival/increased cell death; (4) an intrinsic, cell-autonomous migration defect of the ENCC; (5) an abnormal gut microenvironment; and (6) abnormal interaction of enteric neural crest cells with the gut microenvironment. However, the factors that affect ENCC motility, proliferation and differentiation, and are thus probably implicated in HSCR, are not clearly known, nor is it known how the intrinsic cell migration ability or the NCC-gut microenvironment interaction is controlled at the cellular and molecular level. The PBA conducted here on affected patients not only confirms the most accepted theory about the biological processes implicated in HSCR, but also provides an important initial hypothesis for selecting candidate genes for further evaluation in the context of the disease. In this sense, and reinforcing the power of the proposed approach, a significant association with the disease could be confirmed for 4 genes after the evaluation of 68 candidate genes chosen on the basis of their inclusion in the GO terms associated with the disease.
Moreover, expression of those genes was demonstrated through RT-PCR and immunohistochemistry analyses in human postnatal gut tissue obtained from patients with different clinical conditions. It is now widely accepted that neurogenesis takes place in the human postnatal ENS, a process that mimics the embryonic events occurring during the formation of the ENS. Given the limitations of working with human embryonic gut tissue, the postnatal gut is a powerful context in which to identify new genes implicated in ENS development, and our results therefore provide further support for the implication of the tested genes in Hirschsprung disease.
In fact, some of the validated genes had already been related to HSCR, either directly or indirectly. That is the case of the RASGEF1A gene, whose 3 SNPs selected for validation were all found to be significantly over-transmitted to affected patients. This is not surprising, since RASGEF1A is located around 65.5 kb upstream of RET, and previous TDT studies had already shown statistically significant disease associations spanning a region from immediately 5′ of RET through to this gene, reflecting the high background linkage disequilibrium in this region. RasGEF1A acts as a very specific guanine nucleotide exchange factor for Rap2, a member of the Rap subfamily of Ras-like G-proteins implicated in the regulation of cell adhesion, the establishment of cell morphology, and the modulation of synapses in neurons. Although association tests have excluded the occurrence of a common mutation at RASGEF1A in HSCR, the possibility remains that this gene might carry relevant rare mutations related to HSCR. Importantly, RASGEF1A has been reported to be highly expressed in early embryonic development, at a stage coincident with peak RET expression and colonization of the gut by neural crest-derived neuronal precursors, which would fit with a potential role in enteric neural crest migration. CHRNA7, in turn, encodes a nicotinic acetylcholine receptor implicated in synaptic transmission, which is regulated by NRG1 among others. Very interestingly, NRG1 has been associated with the disease through both common [17, 53] and rare variants [6, 54]. In addition, NRG1 SNPs can significantly modulate CHRNA7 expression, and for this reason it would be necessary to evaluate the possibility that the NRG1 SNPs associated with HSCR act in combination with CHRNA7, leading to susceptibility to the manifestation of the HSCR phenotype.
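For readers less familiar with the notion of over-transmission, the standard TDT statistic can be sketched as follows. This is a minimal illustration and not the analysis pipeline of this study: the function name `tdt` and the transmission counts in the example are hypothetical. For heterozygous parents, if an allele is transmitted b times and not transmitted c times, the statistic (b - c)^2 / (b + c) follows a chi-square distribution with 1 degree of freedom under the null hypothesis of no linkage or association.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT statistic and its p-value (chi-square, 1 df).

    transmitted   : times heterozygous parents passed the allele on (b)
    untransmitted : times heterozygous parents did not pass it on (c)
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with b + c > 0")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: allele transmitted 60 times, not transmitted 30
    chi2, p = tdt(60, 30)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute to the counts, which is why the test is robust to population stratification but also why markers in strong linkage disequilibrium with a causal locus, as in the RET region, tend to show the same signal.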
On the other hand, the remaining significant genes have major functions in early embryogenesis or ENS development, which makes it conceivable that, although not previously related to HSCR, they might play a role in its pathogenesis. DLC1 is essential for embryonic development, affecting neural tube development, modulating the cytoskeleton and producing morphological changes. The function of IQGAP2 is still unclear, although studies with morpholinos in X. laevis showed that it regulates cell-cell adhesion during early development [57, 58]. According to the network analysis, IQGAP2 is strongly connected to DLC1 through five simultaneous connections mediated by the genes RAC1, RAC2, RAC3, RHOG and CDC42. It is also connected to GDNF (already associated with HSCR) through the NFKB1 gene. So, despite its still undefined functional role, the experimental evidence of protein interactions firmly links IQGAP2 to the disease.
In addition, valuable information has also been obtained from the network analysis, since it has allowed us to link the newly identified genes to genes already known to be involved in Hirschsprung disease. This supports the associations found and reinforces the role of the new genes in the disease, even though the real network associated with Hirschsprung disease is probably underestimated because of the lack of information on some gene interactions.
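The kind of connection described above, two genes linked through shared intermediate partners, can be expressed as a simple graph query. The sketch below is illustrative only: it encodes just the connections named in the text (assuming each mediating gene interacts directly with both endpoints), the helper names `build_graph` and `shared_neighbors` are hypothetical, and it does not reproduce the interaction database or software used for the actual network analysis.

```python
from collections import defaultdict

# Connections as described in the text: IQGAP2 reaches DLC1 through five
# mediating genes, and GDNF through NFKB1 (assumed to be direct links).
MEDIATORS_TO_DLC1 = ["RAC1", "RAC2", "RAC3", "RHOG", "CDC42"]
EDGES = [("IQGAP2", m) for m in MEDIATORS_TO_DLC1]
EDGES += [(m, "DLC1") for m in MEDIATORS_TO_DLC1]
EDGES += [("IQGAP2", "NFKB1"), ("NFKB1", "GDNF")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of gene pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shared_neighbors(graph, gene_a, gene_b):
    """Genes that interact with both gene_a and gene_b (two-step paths)."""
    neighbors_a = graph.get(gene_a, set())
    neighbors_b = graph.get(gene_b, set())
    return sorted(neighbors_a & neighbors_b)


network = build_graph(EDGES)

if __name__ == "__main__":
    print(shared_neighbors(network, "IQGAP2", "DLC1"))
    print(shared_neighbors(network, "IQGAP2", "GDNF"))
```

Because missing interactions can only remove paths from such a graph, incomplete interaction data leads to an underestimate of connectivity rather than to spurious links.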
In summary, here we report a number of new candidate genes for HSCR. They need further investigation to elucidate their role in the disease, and must also be validated in other populations to discern whether their effect in HSCR is universal or restricted to the Spanish population. Nevertheless, our most important conclusion is that this comprehensive profile of GO terms has proven to be a useful resource for developmental, biochemical and genetic studies. Our report indicates that this approach can help to identify candidate genes for human disease susceptibility loci. These findings could be of special importance in the field of rare diseases, where large cohorts are often unavailable. In that scenario, the lower sample size requirements make this approach a suitable and efficient alternative to marker-based analyses.
Beyond technical considerations on the advantages of using GO modules in the analysis of genotype data, the biological pathways highlighted by our study provide insights into the complex nature of HSCR, open new opportunities for the validation of new disease genes and may help in the definition of relatively tractable targets for therapeutic intervention.