- Position statement
- Open Access
Increasing African genomic data generation and sharing to resolve rare and undiagnosed diseases in Africa: a call-to-action by the H3Africa rare diseases working group
Orphanet Journal of Rare Diseases volume 17, Article number: 230 (2022)
The rich genomic diversity of African populations is significantly underrepresented in reference and disease-associated databases. This makes interpreting Next Generation Sequencing (NGS) data and reaching a diagnosis more difficult in Africa and for the African diaspora. It also increases the chance of false positives, with variants being misclassified as pathogenic because of their novelty or rarity. We can increase African genomic data by (1) making consent for sharing aggregate frequency data an essential component of the research toolkit; (2) encouraging investigators with African data to share available data through public resources such as gnomAD, AGVD, ClinVar and DECIPHER, and to use MatchMaker Exchange; (3) educating African research participants on the meaning and value of sharing aggregate frequency data; and (4) increasing funding to scale up the production of African genomic data that is more representative of the geographical and ethno-linguistic variation on the continent. The Rare Diseases Working Group (RDWG) of H3Africa hereby issues a call to action, because this underrepresentation accentuates health disparities. Applying NGS to shorten the diagnostic odyssey or to guide therapeutic options for rare diseases will only fully work for Africans when public repositories include sufficient data from African individuals.
On February 28th 2022 the world celebrated the 14th Rare Disease Day, and the sixth Undiagnosed Diseases Day followed on April 29th. These and previous such days have been opportunities for members of the Rare Disease Working Group of the Human Heredity and Health in Africa Consortium (H3Africa) to reflect on the major barriers that cause a high proportion of patients with rare diseases in Africa to remain undiagnosed. The Rare Disease Working Group is composed of delegates from H3Africa-funded projects that work on rare genetic disorders. Their research initiatives aim to identify and fill knowledge gaps on rare diseases in Africa by characterizing the clinical and molecular epidemiology of specific groups of rare diseases, including developmental delay, deafness, and neurodegenerative and neuromuscular diseases. The group meets monthly to explore barriers to the implementation of genomic medicine for rare diseases and to engage with relevant stakeholders to promote rare genomic disease research in Africa.
About 3.5–5.9% of the world population is affected by a rare disease. This corresponds to around 50 million people in Africa, a large community of individuals and families in need of diagnostics and care. About 72% of rare diseases are thought to have a genetic etiology. The completion of the Human Genome Project in April 2003 offered the opportunity to translate this unprecedented scientific accomplishment into tangible diagnostic improvements in human health, especially for patients with rare and undiagnosed diseases. Next Generation Sequencing (NGS), building on the availability of a human reference genome, has driven major diagnostic progress and has significantly accelerated the pace of disease gene and therapeutic discovery for rare diseases. Moreover, NGS is shortening the diagnostic odyssey through its ability to explore multiple types of genomic aberrations at once, even without a clear clinical hypothesis. This has led to the implementation of rapid NGS testing for critically ill patients in Europe, America [5, 6], Australia and the United Kingdom, with turnaround times short enough for NGS-guided therapeutic adjustments to be made.
In Africa, there is a sharp contrast between the implementation of NGS in pathogen genetics and in human genetics. Multiple international funding sources support technology transfer to Africa for pathogen genetics. The Ebola epidemic in West Africa and the COVID-19 response have created a consensus on the utility of NGS on the continent and provided momentum for the broad expansion of short- and long-read sequencing technologies for routine pathogen testing in Africa [9, 10]. For rare diseases, by contrast, genetic service delivery in general, and NGS-based testing in particular, remain limited and extremely fragile in Africa. NGS-based diagnostic tests are not yet routinely offered to patients with rare diseases in Africa. The main barriers include limited existing infrastructure and processes, insufficient funding, lack of political support, and poorly structured health systems. However, research projects such as Deciphering Developmental Disorders in Africa (DDD-Africa, South Africa and the Democratic Republic of Congo), Hearing Impairment Genetics Studies in Africa (HI-GENES Africa, South Africa, Cameroon, Ghana, Mali), Clinical and Genetic Studies of Hereditary Neurological Disorders in Mali, and the genetic study of neuromuscular diseases (South Africa) are ensuring that patients with rare diseases in Africa can receive NGS tests. In addition, international research collaborations and philanthropic initiatives such as the Centers for Mendelian Genomics (CMG) and the iHope foundation make an important contribution by providing free-of-charge NGS tests to African patients with rare diseases. All these efforts have ended the diagnostic odyssey for many African patients, allowed care to be adjusted where possible, and identified new disease genes [13,14,15,16,17,18].
Generating genomic data is only the first step on the path toward resolving and managing undiagnosed genetic diseases. The second step, and for patients the most important one, is interpreting those data with regard to their health conditions and potential management. For unaffected parents and relatives, a diagnosis answers questions about recurrence risk and offers opportunities for prenatal testing where relevant. While allowing a broader exploration of the genome, new sequencing technologies identify an enormous amount of genomic variation, which makes data interpretation more complex. Correlating molecular findings with the clinical phenotype helps overcome this issue, and in-depth clinical phenotyping therefore provides critical morphological information for clinical variant interpretation. However, many rare diseases have atypical facial presentations or remain uncharacterized in understudied populations. As a result, many patients with rare diseases remain undiagnosed despite access to high-resolution sequencing technologies. Over the past years, H3Africa has been contacted by multiple international clinical geneticists and researchers seeking access to African data to better understand candidate variants identified by their laboratories in patients with undiagnosed rare developmental diseases. The limited ability to fully interpret genomic data therefore contributes to a high burden of undiagnosed diseases and represents a bottleneck shared by all patients, irrespective of their background and geographic location.
This manuscript presents the view of the H3Africa Rare Diseases Working Group (H3A-RDWG) on how to improve the resolution of rare and undiagnosed diseases.
Challenges facing Africa for resolution of rare undiagnosed diseases
Clinical interpretation of genomic data for rare diseases is complex and challenging. The process consists of filtering variants based on knowledge gathered from different sources (disease databases and literature, population databases used as reference databases, and existing functional data), phenotypic overlap, segregation data (inheritance) and computational predictions (in silico tools). Variants are then prioritized based on criteria that favor a pathogenic or benign interpretation, and finally classified into five categories according to the guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP). Because functional data are not available for the vast majority of variants, and the phenotypic spectrum of many rare undiagnosed diseases is unknown or poorly characterized in understudied communities, population frequency remains among the strongest filters for NGS data interpretation. The lack of representation in population frequency databases therefore makes clinical interpretation of genomic data significantly more challenging in Africa and in the African diaspora. In 2021, H3Africa was contacted by a team in Canada seeking information on three genomic variants identified in patients of African origin with rare diseases in Canada. A query of the H3Africa reference panel and of the internal database of the Center for Human Genetics of the University of Kinshasa showed that two of these variants were absent and the third had a very low frequency (AF = 0.0078) with no homozygotes. We were thus able to provide further evidence of the novelty or rarity of these variants in an African dataset. This illustrates the complexity of data interpretation, even in developed countries, and supports the value of data sharing.
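The population-frequency filter described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used by H3Africa or any laboratory: the `Variant` record, its field names and the 1% cut-off are hypothetical choices for the example. Allele frequency is computed as alternate allele count divided by allele number (AC/AN), the convention used by aggregate browsers such as gnomAD.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative, not a database schema.
    variant_id: str
    allele_count: int   # AC: alternate alleles observed in the reference panel
    allele_number: int  # AN: total alleles genotyped at this site
    homozygotes: int = 0


def allele_frequency(v: Variant) -> Optional[float]:
    """Return AC/AN, or None when the site was not genotyped (AN == 0)."""
    if v.allele_number == 0:
        return None
    return v.allele_count / v.allele_number


def frequency_filter(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    """Keep variants rare enough to remain candidates for a rare Mendelian disorder.

    The 1% default is an illustrative assumption; in practice the threshold
    depends on the inheritance model, penetrance and disease prevalence.
    """
    kept = []
    for v in variants:
        af = allele_frequency(v)
        # A missing or zero frequency only means "not observed in this panel".
        # In a panel with few African samples this is weak evidence of rarity,
        # which is how novel but benign variants can remain candidates.
        if af is None or af <= max_af:
            kept.append(v)
    return kept


if __name__ == "__main__":
    panel = [
        Variant("var_absent", 0, 2000),
        Variant("var_rare", 3, 2000),
        Variant("var_common", 150, 2000),
    ]
    for v in frequency_filter(panel):
        print(v.variant_id, allele_frequency(v))
```

Variants that pass such a filter still need phenotype, segregation and in silico evidence before an ACMG-AMP classification; the filter only narrows the candidate list, and its reliability depends on how well the reference panel represents the patient's population.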
The concept of underrepresentation encompasses two components: first, the paucity of samples from African individuals; second, the failure to capture African genomic granularity. The African continent is recognized for its great genomic diversity. Beneath this continental diversity, Choudhury et al. (2020) reported large differences in allele frequencies between and within African populations, supporting previous indications that geographical and ethno-linguistic lineages broadly define genetic relationships in sub-Saharan Africa [23, 24]. It is therefore plausible that historical, cultural and geographical contexts have created multiple genetic sub-communities or clusters within Africa, resulting in fine granularity within the continent. Because of such granularity, some variants, as well as some undiagnosed diseases, may be restricted to one or a few African clusters. Although multiple efforts are under way to increase the volume of African samples, the great African diversity and fine granularity may have prevented an even inclusion of samples from people of African ancestry.
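The effect of this granularity on frequency filtering can be illustrated with a small sketch. The cluster names and allele counts below are invented for illustration. Pooling all African samples into one continental frequency can make a variant look rare even when it is common in one cluster; taking the maximum frequency across sufficiently sampled clusters, similar in spirit to the per-population maximum reported by gnomAD, avoids this. The `min_an` threshold is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical per-cluster (allele_count, allele_number) pairs.
Counts = Dict[str, Tuple[int, int]]


def pooled_af(counts: Counts) -> float:
    """Allele frequency after merging all clusters into one pool."""
    ac = sum(ac for ac, _ in counts.values())
    an = sum(an for _, an in counts.values())
    return ac / an if an else 0.0


def max_cluster_af(counts: Counts, min_an: int = 100) -> float:
    """Highest per-cluster allele frequency, ignoring clusters with too few
    genotyped alleles (below min_an) whose estimates would be too noisy."""
    afs = [ac / an for ac, an in counts.values() if an >= min_an]
    return max(afs, default=0.0)


if __name__ == "__main__":
    counts = {
        "cluster_A": (60, 400),   # small cluster in which the variant is common
        "cluster_B": (0, 4000),
        "cluster_C": (2, 5600),
    }
    print(f"pooled AF: {pooled_af(counts):.4f}")            # 62 / 10000
    print(f"max cluster AF: {max_cluster_af(counts):.4f}")  # 60 / 400
```

With a 1% filter, the pooled value would keep this variant as a candidate, while the per-cluster maximum would exclude it. A cluster that is missing from the panel altogether cannot be rescued by either calculation, which is why sampling across the continent matters.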
The first consequence of the underrepresentation of African samples in commonly used databases is the greater difficulty of reaching a molecular diagnosis in African individuals. Data disparity ultimately results in health disparity. There is a risk that NGS will be perceived as deeply flawed, and thus of limited clinical utility, for some populations, which would undermine all efforts to make this technology more accessible on the African continent. The call for global action for rare diseases in Africa that followed the 11th International Conference on Rare Diseases and Orphan Drugs (ICORD) emphasized that the chance of a false-positive result is greater in populations with a paucity of genetic reference data. Another consequence is the misclassification of non-pathogenic variants in existing disease-related databases. Many variants are inaccurately classified as pathogenic or likely pathogenic because of their novelty or rarity in reference databases, a problem compounded by the lack of diverse frequency data. As previously demonstrated, filtering the most reputable disease database, ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), against a good population database revealed that ClinVar contains a significant proportion of wrongly ascertained variants. Re-analysis of NGS data after databases have been populated has also helped both to identify previously discarded plausible variants and to downgrade previously misclassified variants [28,29,30,31]. Such misclassification was found to be more frequent in individuals of African ancestry. Enriching reference databases is recognized to play a key role in improving NGS-based diagnostics.
Opportunities from Africa
Simulations showed that increasing the number of samples from individuals of African ancestry in reference databases would substantially reduce variant misclassification [22, 32]. Enriching databases with diverse African data will therefore be a powerful tool to improve diagnostic yield and accuracy for African and non-African patients with rare undiagnosed diseases. Fortunately, the production of genomic data from African individuals has increased significantly over the last decade. This production has been significantly facilitated by the H3Africa Consortium, funded by the National Institutes of Health (NIH) and the Wellcome Trust. Although most of the funded projects focused on complex and infectious diseases, research participants in those projects would qualify to populate an aggregate reference frequency database for rare diseases. An aggregate reference frequency database populated with African data would be a game changer in resolving undiagnosed diseases, not only in Africa but for the 3.5–5.9% of the world population affected by a rare disease. A good illustration of the power of African data is the recent submission by H3Africa of 41 variants to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). Some of these variants had conflicting interpretations in ClinVar, but all were observed in more than 5% of H3Africa participants. Adding this information from H3Africa data allowed these variants to be reclassified as benign.
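Reclassification of this kind relies on the ACMG-AMP stand-alone benign criterion (BA1), which treats an allele frequency above 5% in a population database as sufficient evidence for a benign classification. The sketch below is a simplified, hypothetical illustration of why adding African frequencies can change a label: the population names and frequencies are invented, the inputs are assumed to be allele frequencies, and real reclassification also involves expert review and documented exceptions to BA1.

```python
from typing import Dict

BA1_THRESHOLD = 0.05  # ACMG-AMP BA1: allele frequency > 5% in a population database


def meets_ba1(population_afs: Dict[str, float]) -> bool:
    """True if any population's allele frequency exceeds the BA1 threshold."""
    return any(af > BA1_THRESHOLD for af in population_afs.values())


def reassess(current_label: str, population_afs: Dict[str, float]) -> str:
    """Return 'Benign' when BA1 is met; otherwise keep the existing label.

    Simplified: only the BA1 rule is modelled here.
    """
    return "Benign" if meets_ba1(population_afs) else current_label


if __name__ == "__main__":
    # Hypothetical variant: rare in non-African data, common in an African panel.
    without_african = {"non_african": 0.001}
    with_african = {"non_african": 0.001, "african_panel": 0.07}
    label = "Conflicting interpretations"
    print(reassess(label, without_african))  # label unchanged
    print(reassess(label, with_african))     # Benign
```

The point of the example is that the same rule gives a different answer only once a population in which the variant is common is represented in the frequency data.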
Another initiative poised to have a significant impact is the Africa Pathogen Genomics Initiative (PGI) developed by the Africa CDC Institute for Pathogen Genomics, which aims to build and host an African-owned data library and real-time data sharing platform, as well as to expand and strengthen laboratory and bioinformatics capacity.
It is clear that African researchers must join forces to address the paucity of African genomic data in the public domain and its impact on diagnostic yield and accuracy for patients with rare undiagnosed diseases, and to promote data sharing. The Genome Aggregation Database (gnomAD) consortium, with its aggregate reference frequency browser (https://gnomad.broadinstitute.org), is recognized as the most reputable source of reference frequencies for interpreting variants in rare diseases. We call here on investigators who have generated genomic data from African participants to help the rare and undiagnosed diseases community by sharing these data through the gnomAD effort. Thus far, the high sensitivity of African data has often been identified as the main barrier to sharing African data in gnomAD. Other investigators report the lack of appropriate consent for large-scale data sharing. Although the legitimacy of these concerns may be debated and should be addressed going forward, sharing preexisting African genomic data requires a customized approach. Such customization is being implemented in the African Genome Variation Database (AGVD, https://agvd-dev.h3abionet.org/). The AGVD, developed by the pan-African bioinformatics network for H3Africa (H3ABioNet), will offer creative solutions with multiple layers of controlled access for sharing aggregate frequencies from African research participants, including the more sensitive data. The AGVD will also host a beacon for data collected in rare diseases research through the African Rare Diseases Initiative (ARDI). We also call on patients, advocacy groups and ethics committees to take a more active part in the discussion about consent for sharing aggregate summary data.
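The idea of layered access to aggregate data can be sketched as follows. This is a hypothetical model written for illustration only, not the AGVD's actual design or API: a public tier answers a Beacon-style "has this allele been observed?" question, a registered tier also receives the allele frequency, and frequencies derived from very few carriers are withheld. The tier names, the `SiteSummary` record and the `min_count` threshold are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessTier(Enum):
    PUBLIC = "public"          # existence only (Beacon-style yes/no)
    REGISTERED = "registered"  # existence plus aggregate allele frequency


@dataclass(frozen=True)
class SiteSummary:
    allele_count: int   # aggregate count only; no individual genotypes are stored
    allele_number: int


def answer_query(summary: Optional[SiteSummary], tier: AccessTier, min_count: int = 5) -> dict:
    """Answer an allele query according to the caller's (hypothetical) access tier."""
    if summary is None or summary.allele_count == 0:
        return {"exists": False}
    response = {"exists": True}
    if tier is AccessTier.REGISTERED:
        if summary.allele_count >= min_count and summary.allele_number > 0:
            response["allele_frequency"] = summary.allele_count / summary.allele_number
        else:
            # Withhold frequencies based on very few carriers to limit re-identification risk.
            response["allele_frequency"] = None
    return response


if __name__ == "__main__":
    site = SiteSummary(allele_count=12, allele_number=3000)
    print(answer_query(site, AccessTier.PUBLIC))
    print(answer_query(site, AccessTier.REGISTERED))
```

Only aggregate counts ever leave such a service, which is the property that makes sharing summary frequencies less sensitive than sharing individual-level genomes.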
Existing H3Africa data contain multiple gaps that prevent the full diversity and fine granularity of the African genome from being captured. An estimated three million African genomes will be needed to reach the expected diversity in reference databases [35, 36]. A strong commitment from funders, working with African researchers, is needed to reach this goal.
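A simple sampling calculation helps explain why large, well-distributed sample sizes matter. Assuming random sampling of diploid individuals from a single cluster and an autosomal allele of population frequency f, the probability of observing it at least once among n individuals (2n chromosomes) is 1 - (1 - f)^(2n). This is an illustration of the principle, not the method behind the three-million-genome estimate cited above.

```python
import math


def p_observed(freq: float, n_individuals: int) -> float:
    """Probability that an allele of population frequency `freq` appears at least
    once among 2 * n_individuals sampled chromosomes (binomial sampling)."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


def individuals_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest number of individuals n such that p_observed(freq, n) >= confidence."""
    if not (0.0 < freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    chromosomes = math.log(1.0 - confidence) / math.log(1.0 - freq)
    return math.ceil(chromosomes / 2)


if __name__ == "__main__":
    f = 1e-4  # an allele carried on 1 in 10,000 chromosomes within one cluster
    for n in (1_000, 10_000, 100_000):
        print(n, round(p_observed(f, n), 3))
    print("individuals for 95% detection:", individuals_needed(f))
```

Under these assumptions, 1,000 individuals from the relevant cluster observe such an allele only about 18% of the time. Because the calculation applies per cluster, samples from one African population do little to reveal a variant restricted to another.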
Therefore, we call on local and international funding agencies to support the production of more African genomic data. Funding strategies that enable smaller, locally relevant genomics studies as well as large-scale precision genomics initiatives and genomic data science are needed to comprehensively address African genomic data shortages and to apply the findings to improve health in all regions of the world. International collaborative research grants that bring together a diverse group of researchers from different countries, continents and research niches will be particularly useful for addressing skills and experience gaps. In addition to calling on the NIH and the Wellcome Trust to expand their funding, we call on the European Union to consider data diversity a major challenge to be addressed by the Beyond One Million Genomes Project. We believe that Africa-Japan collaborative research, the China Africa Research Initiative (SAIS-CARI) and Australia's Partnership & Research Development Fund (PRDF) should specifically consider increasing African genomic data as one of their transversal themes.
While emphasizing the usefulness of sharing data collected from reference individuals, it is worth noting that sharing de-identified patient variants is also needed. Besides increasing diversity in disease-associated databases, such submissions also help data contributors improve diagnosis in their own patients. For instance, sharing variants through ClinVar, DECIPHER and MatchMaker Exchange (MME) provides the submitter with additional evidence for better classification of their variants. The “follow” button in ClinVar keeps the submitter and other users informed of any change in the classification of a variant of interest. The DDD-Africa research project and the Center for Human Genetics of the University of Kinshasa are already taking advantage of DECIPHER and MatchMaker Exchange to increase their diagnostic yield for African patients with developmental disorders and intellectual disability. More African projects, laboratories and clinics are invited to use these resources for the ultimate benefit of African patients.
In summary, to respond to the challenge of improving the resolution and care of undiagnosed rare diseases, there is a need to address data disparity by (1) making consent for sharing aggregate frequency data an essential component of the research toolkit; (2) encouraging investigators with African data to share available data through public resources such as gnomAD, AGVD, ClinVar and DECIPHER, and to use MatchMaker Exchange; (3) educating African research participants on the meaning and value of sharing aggregate frequency data; and (4) increasing funding to scale up the production of African genomic data that is more representative of the geographical and ethno-linguistic variation on the continent. This would ultimately lead to improved diagnosis and management of patients with rare diseases, both on the African continent and in the rest of the world.
Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28:165–73.
Austin CP. The impact of the completed human genome sequence on the development of novel therapeutics for human disease. Annu Rev Med. 2004;55:1–13.
Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14:681–91.
Bamborschke D, Özdemir Ö, Kreutzer M, Motameny S, Thiele H, Kribs A, et al. Ultra-rapid emergency genomic diagnosis of Donahue syndrome in a preterm infant within 17 hours. Am J Med Genet A. 2021;185:90–6.
Sanford EF, Clark MM, Farnaes L, Williams MR, Perry JC, Ingulli EG, et al. Rapid whole genome sequencing has clinical utility in children in the PICU*. Pediatr Crit Care Med. 2019;20:1007–20.
Gorzynski JE, Goenka SD, Shafin K, Jensen TD, Fisk DG, Grove ME, et al. Ultrarapid nanopore genome sequencing in a critical care setting. N Engl J Med. 2022;386:700–2.
Lunke S, Eggers S, Wilson M, Patel C, Barnett CP, Pinner J, et al. Feasibility of ultra-rapid exome sequencing in critically ill infants and children with suspected monogenic conditions in the Australian public health care system. JAMA. 2020;323:2503.
French CE, Delon I, Dolling H, Sanchis-Juan A, Shamardina O, Mégy K, et al. Whole genome sequencing reveals that genetic conditions are frequent in intensively ill children. Intensive Care Med. 2019;45:627–36.
Inzaule SC, Tessema SK, Kebede Y, Ogwell Ouma AE, Nkengasong JN. Genomic-informed pathogen surveillance in Africa: opportunities and challenges. Lancet Infect Dis. 2021;21:e281–9.
Makoni M. Africa’s $100-million pathogen genomics initiative. Lancet Microbe. 2020;1:e318.
Kamp M, Krause A, Ramsay M. Has translational genomics come of age in Africa? Hum Mol Genet. 2021;30:R164–73.
Bamshad MJ, Shendure JA, Valle D, Hamosh A, Lupski JR, Gibbs RA, et al. The Centers for Mendelian Genomics: a new large-scale initiative to identify the genes underlying rare Mendelian conditions. Am J Med Genet A. 2012;158A:1523–5.
Wonkam A, Adadey SM, Schrauwen I, Aboagye ET, Wonkam-Tingang E, Esoh K, et al. Exome sequencing of families from Ghana reveals known and candidate hearing impairment genes. Commun Biol. 2022;5:369.
Flynn K, Feben C, Lamola L, Carstens N, Krause A, Lombard Z, et al. Ending a diagnostic odyssey-the first case of Takenouchi-Kosaki syndrome in an African patient. Clin Case Rep. 2021;9:2144–8.
Landouré G, Dembélé K, Diarra S, Cissé L, Samassékou O, Bocoum A, et al. A novel variant in the spatacsin gene causing SPG11 in a Malian family. J Neurol Sci. 2020;411:116675.
Yalcouyé A, Diallo SH, Cissé L, Karembé M, Diallo S, Coulibaly T, et al. GJB1 variants in Charcot-Marie-Tooth disease X-linked type 1 in Mali. J Peripher Nerv Syst. 2022;27(2):113–119. https://doi.org/10.1111/jns.12486.
Mubungu G, Makay P, Boujemla B, Yanda S, Posey JE, Lupski JR, et al. Clinical presentation and evolution of Xia-Gibbs syndrome due to p.Gly375ArgfsTer3 variant in a patient from DR Congo (Central Africa). Am J Med Genet A. 2021;185:990–4.
Lumaka A, Race V, Peeters H, Corveleyn A, Coban-Akdemir Z, Jhangiani SN, et al. A comprehensive clinical and genetic study in 127 patients with ID in Kinshasa, DR Congo. Am J Med Genet A. 2018;176:1897–909.
Hoffman-Andrews L. The known unknown: the challenges of genetic variants of uncertain significance in clinical practice. J Law Biosci. 2017;4:648–57.
Hurst ACE, Robin NH. Dysmorphology in the era of genomic diagnosis. J Pers Med. 2020;10(1):18. https://doi.org/10.3390/jpm10010018.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–24.
Choudhury A, Aron S, Botigué LR, Sengupta D, Botha G, Bensellak T, et al. High-depth African genomes inform human migration and health. Nature. 2020;586:741–8.
Busby GB, Band G, Le Si Q, Jallow M, Bougama E, Mangano VD, et al. Admixture into and within sub-Saharan Africa. Elife. 2016. https://doi.org/10.7554/eLife.15266.
Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The genetic structure and history of Africans and African Americans. Science. 2009;324:1035–44.
Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature. 2016;538:161–4.
Baynam GS, Groft S, van der Westhuizen FH, Gassman SD, du Plessis K, Coles EP, et al. A call for global action for rare diseases in Africa. Nat Genet. 2020;52:21–6.
Shah N, Hou Y-CC, Yu H-C, Sainger R, Caskey CT, Venter JC, et al. Identification of misclassified ClinVar variants via disease population prevalence. Am J Hum Genet. 2018;102:609–19.
Al-Nabhani M, Al-Rashdi S, Al-Murshedi F, Al-Kindi A, Al-Thihli K, Al-Saegh A, et al. Reanalysis of exome sequencing data of intellectual disability samples: yields and benefits. Clin Genet. 2018;94:495–501.
Jalkh N, Corbani S, Haidar Z, Hamdan N, Farah E, Abou Ghoch J, et al. The added value of WES reanalysis in the field of genetic diagnosis: lessons learned from 200 exomes in the Lebanese population. BMC Med Genomics. 2019;12:11.
Mubungu G, Makay P, Boujemla B, Yanda S, Posey JE, Lupski JR, et al. Clinical presentation and evolution of Xia-Gibbs syndrome due to p.Gly375ArgfsTer3 variant in a patient from DR Congo (Central Africa). Am J Med Genet A. 2021;185:990–4.
Wenger AM, Guturu H, Bernstein JA, Bejerano G. Systematic reanalysis of clinical exome data yields additional diagnoses: implications for providers. Genet Med. 2017;19:209–14.
Manrai AK, Funke BH, Rehm HL, Olesen MS, Maron BA, Szolovits P, et al. Genetic misdiagnoses and the potential for health disparities. N Engl J Med. 2016;375:655–65.
Mulder N, Abimiku A, Adebamowo SN, de Vries J, Matimba A, Olowoyo P, et al. H3Africa: current perspectives. Pharmgenomics Pers Med. 2018;11:59–66.
Mulder NJ, Adebiyi E, Alami R, Benkahla A, Brandful J, Doumbia S, et al. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa. Genome Res. 2016;26:271–7.
Wonkam A. Sequence three million genomes across Africa. Nature. 2021;590:209–11.
Pagán JA, Brown HS, Rowe J, Schneider JE, Veenstra DL, Gupta A, et al. Genetic variant reinterpretation: economic and population health management challenges. Popul Health Manag. 2021;24:310–3.
The authors express their gratitude to Jennifer Troyer for the fruitful discussions, to Rolanda Julius and the H3Africa Coordinating Center for the meeting platform and the assistance.
The authors declare that no funding was provided by any organization.
Ethics approval and consent to participate.
Consent for publication
All authors are researchers supported by various grants from the National Institutes of Health (NIH) and the Wellcome Trust. The views and opinions expressed in this article are those of the authors and do not necessarily state or reflect those of the funders. The authors declare that they have no competing interests with regard to this article.
About this article
Cite this article
Lumaka, A., Carstens, N., Devriendt, K. et al. Increasing African genomic data generation and sharing to resolve rare and undiagnosed diseases in Africa: a call-to-action by the H3Africa rare diseases working group. Orphanet J Rare Dis 17, 230 (2022). https://doi.org/10.1186/s13023-022-02391-w
- Data sharing
- NGS interpretation
- Reference database