Two decades ago, the sequence of the first human genome was published. Since then, advances in genome technologies have resulted in whole-genome sequencing and microarray-based genotyping of millions of human genomes. However, genetic and genomic studies are predominantly based on populations of European ancestry. As a result, the potential benefits of genomic research (including better understanding of disease etiology, early detection and diagnosis, rational drug design and improved clinical care) may elude the many underrepresented populations. Here, we describe factors that have contributed to the imbalance in representation of different populations and, leveraging our experiences in setting up genomic studies in diverse global populations, we propose a roadmap for enhancing inclusion and ensuring equal health benefits from genomics advances. Our Perspective highlights the importance of sincere, concerted global efforts toward genomic equity to ensure the benefits of genomic medicine are accessible to all.
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. Comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies also show that the observed population structure could have major implications for biomedical genomics research in South Africa.
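Fine-scale population structure of this kind is usually explored with principal component analysis (PCA) of genotype data. The dependency-free sketch below shows the idea on a toy individuals × SNPs matrix of 0/1/2 allele dosages: each SNP is standardised, an n × n relatedness-style matrix is formed, and its leading eigenvector (PC1) is found by power iteration. The function names and data are hypothetical, and this is not the pipeline used in the study; real analyses typically run dedicated software on LD-pruned SNPs.

```python
import math
import random


def standardize(genotypes):
    """Centre and scale each SNP in an individuals x SNPs matrix of 0/1/2 dosages.

    Each retained SNP is scaled as (g - 2p) / sqrt(2p(1 - p)), where p is the
    pooled alternate-allele frequency; monomorphic SNPs are dropped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNPs carry no information about structure
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(g - 2 * p) / sd for g in col])
    return [list(r) for r in zip(*columns)]  # back to individuals x SNPs


def first_pc(genotypes, iters=200, seed=0):
    """PC1 scores from power iteration on the n x n matrix K = X X^T / M."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    k = [[sum(a * b for a, b in zip(x[i], x[j])) / m for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]  # random start vector
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v  # unit-length eigenvector; its overall sign is arbitrary


# Toy data: two hypothetical groups with opposite alleles at 10 SNPs.
group_a = [[0] * 5 + [2] * 5 for _ in range(3)]
group_b = [[2] * 5 + [0] * 5 for _ in range(3)]
print([round(s, 3) for s in first_pc(group_a + group_b)])
```

Individuals from the two toy groups receive PC1 scores of opposite sign. Building K explicitly costs O(n²M), so this is only practical for small illustrative inputs.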
Atherosclerosis precedes the onset of clinical manifestations of cardiovascular diseases (CVDs). We used carotid intima-media thickness (cIMT) to investigate genetic susceptibility to atherosclerosis in 7894 unrelated adults (3963 women, 3931 men; 40 to 60 years) resident in four sub-Saharan African countries. cIMT was measured by ultrasound, and genotyping was performed on the H3Africa SNP Array. Two new African-specific genome-wide significant loci for mean-max cIMT, SIRPA (p = 4.7E-08) and FBXL17 (p = 2.5E-08), were identified. Sex-stratified analysis revealed associations with one male-specific locus, SNX29 (p = 6.3E-09), and two female-specific loci, LARP6 (p = 2.4E-09) and PROK1 (p = 1.0E-08). We replicate previous cIMT associations, with different lead SNPs in linkage disequilibrium with SNPs primarily identified in European populations. Our study finds significant enrichment for genes involved in oestrogen response among the female-specific signals. The genes identified show biological relevance to atherosclerosis and/or CVDs, and the results highlight sex differences and the transferability of signals from non-African studies.
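One common way to check whether a sex-stratified signal genuinely differs between men and women is to compare the two sex-specific effect estimates with a z-test, z = (β_m − β_f) / √(SE_m² + SE_f²). The sketch below assumes independent male and female samples. The function name and the summary statistics in the usage line are hypothetical, and the exact test used in the study is not stated here.

```python
import math


def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """Two-sided z-test for a difference between independent male and female effects."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical summary statistics for one SNP (not taken from the study).
z, p = sex_difference_test(beta_m=0.020, se_m=0.004, beta_f=0.002, se_f=0.004)
print(f"z = {z:.2f}, p = {p:.2e}")
```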
The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study, the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10^-…) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.
Smoking is a leading risk factor for many of the top ten causes of death worldwide. Of the 1.3 billion smokers globally, 80% live in low- and middle-income countries, where, according to the World Health Organization, the number of deaths due to tobacco use is expected to double in the next decade. Genetic studies have helped to identify biological pathways for smoking behaviours but have mostly focussed on individuals of European ancestry or living in North America or Europe. We performed a genome-wide association study of two smoking behaviour traits in 10,558 men of African ancestry living in five African countries and the UK. Eight independent variants were associated with either smoking initiation or cessation at P-value < 5 × 10^-8, four of them monomorphic or rare in European populations. A gene prioritisation strategy highlighted five genes, including SEMA6D, previously described as associated with several smoking behaviour traits. These results confirm the importance of analysing underrepresented populations in genetic epidemiology, and the urgent need for larger genomic studies to boost discovery power to better understand smoking behaviours, as well as many other traits.
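The conventional genome-wide significance threshold of 5 × 10^-8 is usually explained as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. Linkage disequilibrium is shorter in African genomes, which implies more independent tests, and some analyses have argued for a stricter cut-off on that basis. The sketch below only reproduces this arithmetic. The larger test count is a hypothetical placeholder, not a figure from this study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate at `alpha`."""
    return alpha / n_tests


# About one million independent tests is the usual rationale for 5e-8.
print(bonferroni_threshold(0.05, 1_000_000))
# Hypothetical: more independent tests (shorter LD) would give a stricter cut-off.
print(bonferroni_threshold(0.05, 2_000_000))
```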
Genetic associations for lipid traits have identified hundreds of variants with clear differences across European, Asian and African studies. Based on a sub-Saharan African GWAS for lipid traits in the population-based, cross-sectional AWI-Gen cohort (N = 10,603), we report a novel LDL-C association in the GATB region (P-value = 1.56 × 10^-…). Meta-analysis with four other African cohorts (N = 23,718) provides supporting evidence for the LDL-C association with the GATB/FHIP1A region and identifies a novel triglyceride association signal close to the FHIT gene (P-value = 2.66 × 10^-…). Our data enable fine-mapping of several well-known lipid-trait loci, including LDLR, PMFBP1 and LPA. The transferability of signals detected in two large global studies (GLGC and PAGE) consistently improves with an increase in the size of the African replication cohort. Polygenic risk score analysis shows increased predictive accuracy for LDL-C levels as the genetic distance between the discovery dataset and our cohort narrows. Novel discovery is enhanced by the inclusion of African data.
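Cross-cohort meta-analysis of GWAS results is commonly done with fixed-effect, inverse-variance weighting. Each cohort's effect estimate is weighted by 1/SE², so larger and more precise cohorts count for more. The model and software used for the meta-analysis above are not specified here. The following is a generic sketch with a hypothetical function name and made-up inputs.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP across cohorts.

    Returns the pooled effect, its standard error, z-score and two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical per-cohort effects and standard errors for a single SNP.
beta, se, z, p = ivw_meta([0.12, 0.08, 0.15], [0.04, 0.05, 0.06])
print(f"beta = {beta:.3f}, se = {se:.3f}, p = {p:.1e}")
```

A fixed-effect model assumes one true effect shared by all cohorts. When effects are expected to differ between populations, a random-effects model is the usual alternative.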
Most hypertension-related genome-wide association studies (GWASs) focus on non-African populations, despite hypertension (a major risk factor for cardiovascular disease) being highly prevalent in Africa. The AWI-Gen study GWAS meta-analysis for blood pressure (BP)-related traits (systolic and diastolic BP, pulse pressure, mean-arterial pressure and hypertension) from three sub-Saharan African geographic regions (N = 10,775), identifies two novel genome-wide significant signals (p < 5E-08): systolic BP near P2RY1 (rs77846204; intergenic variant, p = 4.95E-08) and pulse pressure near LINC01256 (rs80141533; intergenic variant, p = 1.76E-08). No genome-wide signals are detected for the AWI-Gen GWAS meta-analysis with previous African-ancestry GWASs (UK Biobank (African), Uganda Genome Resource). Suggestive signals (p < 5E-06) are observed for all traits, with 29 SNPs associating with more than one trait and several replicating known associations. Polygenic risk scores (PRSs) developed from studies on different ancestries have limited transferability, with multi-ancestry PRS providing better prediction. This study provides insights into the genetics of BP variation in African populations.
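In its simplest form, a polygenic risk score is a weighted sum of an individual's effect-allele dosages, with the weights taken from a discovery GWAS. Transferability is then judged by how well the score predicts the measured trait in the target cohort, for example by the squared correlation R². The sketch below assumes the SNPs are already matched and aligned to the same effect allele in both datasets. The names and data are hypothetical. It also omits the clumping, p-value thresholding or Bayesian shrinkage that real PRS methods apply.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0/1/2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def prediction_r2(scores, trait):
    """Squared Pearson correlation between polygenic scores and the measured trait."""
    ms = sum(scores) / len(scores)
    mt = sum(trait) / len(trait)
    cov = sum((s - ms) * (t - mt) for s, t in zip(scores, trait))
    vs = sum((s - ms) ** 2 for s in scores)
    vt = sum((t - mt) ** 2 for t in trait)
    return cov * cov / (vs * vt)


# Hypothetical GWAS weights for three SNPs and a toy target cohort.
weights = [0.5, 0.2, 0.1]
cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 0], [0, 0, 2]]
measured = [120.0, 135.0, 128.0, 118.0]
scores = [polygenic_score(g, weights) for g in cohort]
print(scores, round(prediction_r2(scores, measured), 3))
```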
Chloroquine/hydroxychloroquine have been proposed as potential treatments for COVID-19. These drugs have warning labels for use in individuals with glucose-6-phosphate dehydrogenase (G6PD) deficiency. Analysis of whole-genome sequence data of 458 individuals from sub-Saharan Africa showed significant G6PD variation across the continent. We identified nine variants, of which four are potentially deleterious to G6PD function and one (rs1050828) is known to cause G6PD deficiency. We supplemented data for the rs1050828 variant with genotype array data from over 11,000 Africans. Although this variant is common in Africans overall, large allele frequency differences exist between sub-populations: African sub-populations in the same country can show significant differences in allele frequency (e.g. 16.0% in Tsonga vs 0.8% in Xhosa, both in South Africa, p = 2.4 × 10^-…). The high prevalence of variants in the G6PD gene found in this analysis suggests that it may be a significant interaction factor in clinical trials of chloroquine and hydroxychloroquine for treatment of COVID-19 in Africans.
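An allele-frequency difference between two sub-populations, such as the Tsonga and Xhosa contrast above, can be tested with a two-proportion z-test on allele counts. The counts below are hypothetical, chosen only to reproduce the reported frequencies, and the study may have used a different test (for example Fisher's exact test). Because G6PD lies on the X chromosome, the totals should count X chromosomes, which means two per woman and one per man.

```python
import math


def allele_freq_z_test(alt1, total1, alt2, total2):
    """Two-proportion z-test comparing allele frequencies between two groups.

    alt*: copies of the allele observed; total*: chromosomes sampled
    (for an X-linked gene such as G6PD: 2 per woman, 1 per man).
    Returns (freq1, freq2, z, two-sided p).
    """
    f1, f2 = alt1 / total1, alt2 / total2
    pooled = (alt1 + alt2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    z = (f1 - f2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return f1, f2, z, p


# Hypothetical counts matching 16.0% vs 0.8%; NOT the study's sample sizes.
f1, f2, z, p = allele_freq_z_test(alt1=160, total1=1000, alt2=8, total2=1000)
print(f"{f1:.1%} vs {f2:.1%}: z = {z:.1f}, p = {p:.1e}")
```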