The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic ...diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10
) differentiation, and F
analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.
South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups ...have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
Pancreatic cancer accounts for 2.8% of new cancer cases worldwide and is projected to become the second leading cause of cancer-related deaths by 2030. Patients of African ancestry appear to be at an ...increased risk for pancreatic ductal adenocarcinoma (PDAC), with more severe disease and outcomes. The purpose of this study was to map the proteomic and genomic landscape of a cohort of PDAC patients of African ancestry. Thirty tissues (15 tumours and 15 normal adjacent tissues) were obtained from consenting South African PDAC patients. Optimisation of the sample preparation method allowed for the simultaneous extraction of high-purity protein and DNA for SWATH-MS and OncoArray SNV analyses. We quantified 3402 proteins with 49 upregulated and 35 downregulated proteins at a minimum 2.1 fold change and FDR adjusted p-value (q-value) ≤ 0.01 when comparing tumour to normal adjacent tissue. Many of the upregulated proteins in the tumour samples are involved in extracellular matrix formation (ECM) and related intracellular pathways. In addition, proteins such as EMIL1, KBTB2, and ZCCHV involved in the regulation of ECM proteins were observed to be dysregulated in pancreatic tumours. Downregulation of pathways involved in oxygen and carbon dioxide transport were observed. Genotype data showed missense mutations in some upregulated proteins, such as MYPN, ESTY2 and SERPINB8. Approximately 11% of the dysregulated proteins, including ISLR, BP1, PTK7 and OLFL3, were predicted to be secretory proteins. These findings help in further elucidating the biology of PDAC and may aid in identifying future plausible markers for the disease.
Dichapetalum cymosum produces the toxic fluorinated metabolite, fluoroacetate, presumably as a defence mechanism. Given the rarity of fluorinated metabolites in nature, the biosynthetic origin and ...function of fluoroacetate have been of particular interest. However, the mechanism for fluorination in D. cymosum was never elucidated. More importantly, there is a severe lack in knowledge on a genetic level for fluorometabolite-producing plants, impeding research on the subject. Here, we report on the first transcriptome for D. cymosum and investigate the wound response for insights into fluorometabolite production. Mechanical wounding studies were performed and libraries of the unwounded (control) and wounded (30 and 60 min post wounding) plant were sequenced using the Illumina HiSeq platform. A combined reference assembly generated 77,845 transcripts. Using the SwissProt, TrEMBL, GO, eggNOG, KEGG, Pfam, EC and PlantTFDB databases, a 69% annotation rate was achieved. Differential expression analysis revealed the regulation of 364 genes in response to wounding. The wound responses in D. cymosum included key mechanisms relating to signalling cascades, phytohormone regulation, transcription factors and defence-related secondary metabolites. However, the role of fluoroacetate in inducible wound responses remains unclear. Bacterial fluorinases were searched against the D. cymosum transcriptome but transcripts with homology were not detected suggesting the presence of a potentially different fluorinating enzyme in plants. Nevertheless, the transcriptome produced in this study significantly increases genetic resources available for D. cymosum and will assist with future research into fluorometabolite-producing plants.
The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an ...African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
With more microbiome studies being conducted by African-based research groups, there is an increasing demand for knowledge and skills in the design and analysis of microbiome studies and data. ...However, high-quality bioinformatics courses are often impeded by differences in computational environments, complicated software stacks, numerous dependencies, and versions of bioinformatics tools along with a lack of local computational infrastructure and expertise. To address this, H3ABioNet developed a 16S rRNA Microbiome Intermediate Bioinformatics Training course, extending its remote classroom model. The course was developed alongside experienced microbiome researchers, bioinformaticians, and systems administrators, who identified key topics to address. Development of containerised workflows has previously been undertaken by H3ABioNet, and Singularity containers were used here to enable the deployment of a standard replicable software stack across different hosting sites. The pilot ran successfully in 2019 across 23 sites registered in 11 African countries, with more than 200 participants formally enrolled and 106 volunteer staff for onsite support. The pulling, running, and testing of the containers, software, and analyses on various clusters were performed prior to the start of the course by hosting classrooms. The containers allowed the replication of analyses and results across all participating classrooms running a cluster and remained available posttraining ensuring analyses could be repeated on real data. Participants thus received the opportunity to analyse their own data, while local staff were trained and supported by experienced experts, increasing local capacity for ongoing research support. This provides a model for delivering topic-specific bioinformatics courses across Africa and other remote/low-resourced regions which overcomes barriers such as inadequate infrastructures, geographical distance, and access to expertise and educational materials.
Africa is not unique in its need for basic bioinformatics training for individuals from a diverse range of academic backgrounds. However, particular logistical challenges in Africa, most notably ...access to bioinformatics expertise and internet stability, must be addressed in order to meet this need on the continent. H3ABioNet (www.h3abionet.org), the Pan African Bioinformatics Network for H3Africa, has therefore developed an innovative, free-of-charge "Introduction to Bioinformatics" course, taking these challenges into account as part of its educational efforts to provide on-site training and develop local expertise inside its network. A multiple-delivery-mode learning model was selected for this 3-month course in order to increase access to (mostly) African, expert bioinformatics trainers. The content of the course was developed to include a range of fundamental bioinformatics topics at the introductory level. For the first iteration of the course (2016), classrooms with a total of 364 enrolled participants were hosted at 20 institutions across 10 African countries. To ensure that classroom success did not depend on stable internet, trainers pre-recorded their lectures, and classrooms downloaded and watched these locally during biweekly contact sessions. The trainers were available via video conferencing to take questions during contact sessions, as well as via online "question and discussion" forums outside of contact session time. This learning model, developed for a resource-limited setting, could easily be adapted to other settings.
Bioinformatics training programs have been developed independently around the world based on the perceived needs of the local and global academic communities. The field of bioinformatics is ...complicated by the need to train audiences from diverse backgrounds in a variety of topics to various levels of competencies. While there have been several attempts to develop standardised approaches to provide bioinformatics training globally, the challenges encountered in resource limited settings hinder the adaptation of these global approaches. H3ABioNet, a Pan-African Bioinformatics Network with 27 nodes in 16 African countries, has realised that there is no single simple solution to this challenge and has rather, over the years, evolved and adapted training approaches to create a sustainable training environment, with several components that allow for the successful dissemination of bioinformatics knowledge to diverse audiences. This has been achieved through the implementation of a combination of training modalities and sharing of high quality training material and experiences. The results highlight the success of implementing this multi-pronged approach to training, to reach audiences from different backgrounds and provide training in a variety of different areas of expertise. While face-to-face training was initially required and successful, the mixed-model teaching approach allowed for an increased reach, providing training in advanced analysis topics to reach large audiences across the continent with minimal teaching resources. The transition to hackathons provided an environment to allow the progression of skills, once basic skills had been developed, together with the development of real-world solutions to bioinformatics problems. Ensuring our training materials are FAIR, and through synergistic collaborations with global training partners, the reach of our training materials extends beyond H3ABioNet. Coupled with the opportunity to develop additional career building soft skills, such as scientific communication, H3ABioNet has created a flexible, sustainable and high quality bioinformatics training environment that has successfully been implemented to train several highly skilled African bioinformaticians on the continent.
FOXP2 is a member of the P subfamily of FOX transcription factors, the DNA-binding domain of which is the winged helix forkhead domain (FHD). In this work we show that the FOXP2 FHD is able to bind ...to various DNA sequences, including a novel sequence identified in this work, with different affinities and rates as detected using surface plasmon resonance. Combining the experimental work with molecular docking, we show that high-affinity sequences remain bound to the protein for longer, form a greater number of interactions with the protein and induce a greater structural change in the protein than low-affinity sequences. We propose a binding model for the FOXP2 FHD that involves three types of binding sequence: low affinity sites which allow for rapid scanning of the genome by the protein in a partially unstructured state; moderate affinity sites which serve to locate the protein near target sites and high-affinity sites which secure the protein to the DNA and induce a conformational change necessary for functional binding and the possible initiation of downstream transcriptional events.