HLA-G is a promiscuous immune checkpoint molecule. The HLA-G gene presents substantial nucleotide variability in its regulatory regions. However, it encodes a limited number of proteins compared to classical HLA class I genes. We characterized the HLA-G genetic variability in 4640 individuals from 88 different population samples across the globe by using a state-of-the-art method to characterize polymorphisms and haplotypes from high-coverage next-generation sequencing data. We also provide insights regarding the HLA-G genetic diversity and a resource for future studies evaluating HLA-G polymorphisms in different populations and association studies. Despite the great haplotype variability, we demonstrated that: (1) most of the HLA-G polymorphisms are in introns and regulatory sequences, and these are the sites with evidence of balancing selection, (2) linkage disequilibrium is high throughout the gene, extending up to HLA-A, (3) there are few proteins frequently observed in worldwide populations, with lack of variation in residues associated with major HLA-G biological properties (dimer formation, interaction with leukocyte receptors). These observations corroborate the role of HLA-G as an immune checkpoint molecule rather than as an antigen-presenting molecule. Understanding HLA-G variability across populations is relevant for disease association and functional studies.
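As a rough illustration of what "high linkage disequilibrium" means quantitatively, the sketch below computes the common r² statistic between two biallelic sites from phased haplotypes. It is a minimal example under stated assumptions, not the method used in the study: the function name `ld_r_squared` and the toy haplotypes are hypothetical, and alleles are assumed to be coded 0/1.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    where a and b are 0/1 allele codes at the first and second site.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    return d * d / denom


# Hypothetical toy data: alleles always co-occur vs. are independent.
perfect_ld = ld_r_squared([(0, 0), (0, 0), (1, 1), (1, 1)])
no_ld = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Values of r² near 1 indicate that the allele at one site almost fully predicts the allele at the other, which is the pattern the abstract reports across HLA-G.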
The human genetic diversity of the Americas has been affected by several events of gene flow that have continued since the colonial era and the Atlantic slave trade. Moreover, multiple waves of migration followed by local admixture occurred in the last two centuries, the impact of which has been largely unexplored. Here, we compiled a genome-wide dataset of ∼12,000 individuals from twelve American countries and ∼6,000 individuals from worldwide populations and applied haplotype-based methods to investigate how historical movements from outside the New World affected (1) the genetic structure, (2) the admixture profile, (3) the demographic history, and (4) sex-biased gene-flow dynamics of the Americas. We revealed a high degree of complexity underlying the genetic contribution of European and African populations in North and South America, from both geographic and temporal perspectives, identifying previously unreported sources related to Italy, the Middle East, and to specific regions of Africa.
• European and African genomic signature in the Americas shows high complexity
• Sex-biased gene flow occurred between European and American mixing groups
• Admixture is geographically and chronologically correlated with historical records
• Source-specific demographic histories reveal the huge impact of recent admixture
The complexity of the admixture dynamics that shaped American populations is unveiled by Ongaro et al., where genetic data for more than 12,000 individuals from the continents are investigated. This study evaluates the dramatic impact of events after the colonial era, revealing a spatial and temporal heterogeneity and mirroring historical records.
The inference of genetic ancestry plays an increasingly prominent role in clinical, population, and forensic genetics studies. Several genotyping strategies and analytical methodologies have been developed over the last few decades to assign individuals to specific biogeographic regions. However, despite these efforts, ancestry inference in populations with a recent history of admixture, such as those in Brazil, remains a challenge. In admixed populations, the proportions and components of genetic ancestry vary on different levels: (i) between populations; (ii) between individuals of the same population; and (iii) throughout the individual's genome. The present study evaluated 1171 admixed Brazilian samples to compare the genetic ancestry inferred by tri-/tetra-hybrid admixture models and evaluated different marker sets, from small panels of ancestry informative markers (AIMs) to high-density SNP (HDSNP) and whole-genome-sequence (WGS) data. Analyses revealed greater variation in the correlation coefficient of ancestry components within and between admixed populations, especially for minority ancestral components. We also observed a positive correlation between the number of markers in the AIMs panel and HDSNP/WGS. Furthermore, the greater the number of markers, the more accurate the tri-/tetra-hybrid admixture models.
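The comparison between marker sets described above rests on correlating per-individual ancestry estimates. The sketch below shows one way this could be done with a plain Pearson correlation; it is an illustration only, and the function name `pearson` and the ancestry proportions are hypothetical values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical European-ancestry proportions for five individuals,
# estimated once from a small AIMs panel and once from WGS data.
aims_european = [0.70, 0.55, 0.90, 0.40, 0.65]
wgs_european = [0.72, 0.50, 0.88, 0.45, 0.60]

r = pearson(aims_european, wgs_european)
```

Running the same calculation separately for each ancestry component would show whether minority components correlate less well between panels, as the abstract reports.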
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive up-to-date genomic analysis of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native-American ancestry (6–8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes.
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly-inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.
Approximately 5% of the human genome shows common structural variation, which is enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly-repeated regions about 120 kb in length and carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A (encoded by GYPA) and glycophorin B (encoded by GYPB) are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They are receptors for the invasion of the protist parasite Plasmodium falciparum, a causative agent of malaria. A particular complex structural variant, called DUP4, creates a GYPB-GYPA fusion gene known to confer resistance to malaria. Many other structural variants exist across the glycophorin gene cluster, and they remain poorly characterised.
Here, we analyse sequences from 3234 diploid genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection of these variants using fibre-FISH and breakpoint mapping at the sequence level. We identify variants predicted to create novel fusion genes and a common inversion duplication variant at appreciable frequencies in West Africans. We show that almost all variants can be explained by non-allelic homologous recombination and, by comparing the structural variant breakpoints with recombination hotspot maps, confirm the importance of a particular meiotic recombination hotspot on structural variant formation in this region.
We identify and validate large structural variants in the human glycophorin A-B-E gene cluster which may be associated with different clinical aspects of malaria.
Research in the field of pharmacogenomics (PGx) aims to identify genetic variants that modulate response to drugs, through alterations in their pharmacokinetics (PK) or pharmacodynamics (PD). The distribution of PGx variants differs considerably among populations, and whole-genome sequencing (WGS) plays a major role as a comprehensive approach to detect both common and rare variants. This study evaluated the frequency of PGx markers in the context of the Brazilian population, using data from a population-based admixed cohort from Sao Paulo, Brazil, which includes variants from WGS of 1,171 unrelated, elderly individuals.
The Stargazer tool was used to call star alleles and structural variants (SVs) from 38 pharmacogenes. Clinically relevant variants were investigated, and the predicted drug response phenotype was analyzed in combination with the medication record to assess individuals potentially at high-risk of gene-drug interaction.
In total, 352 unique star alleles or haplotypes were observed, of which 255 and 199 had a frequency < 0.05 and < 0.01, respectively. For star alleles with frequency > 5% (n = 97), decreased, loss-of-function and unknown function accounted for 13.4%, 8.2% and 27.8% of alleles or haplotypes, respectively. Structural variants (SVs) were identified in 35 genes for at least one individual, and occurred with frequencies >5% for CYP2D6, CYP2A6, GSTM1, and UGT2B17. Overall, 98.0% of the individuals carried at least one high-risk genotype-predicted phenotype in pharmacogenes with PharmGKB level of evidence 1A for drug interaction. The Electronic Health Record (EHR) Priority Result Notation and the cohort medication registry were combined to assess high-risk gene-drug interactions. In general, 42.0% of the cohort used at least one PharmGKB evidence level 1A drug, and 18.9% of individuals who used PharmGKB evidence level 1A drugs had a genotype-predicted phenotype of high-risk gene-drug interaction.
This study describes the applicability of next-generation sequencing (NGS) techniques for translating PGx variants into clinically relevant phenotypes on a large scale in the Brazilian population and explores the feasibility of systematic adoption of PGx testing in Brazil.
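The star-allele frequency thresholds reported in this abstract can be illustrated with a small counting sketch. It assumes diplotype calls are already available as pairs of star-allele labels for a single gene; it does not reproduce the Stargazer tool, and the function names, the toy diplotypes, and the 0.15 cutoff (chosen only because the toy cohort is tiny) are hypothetical.

```python
from collections import Counter


def star_allele_frequencies(diplotypes):
    """Return {allele: frequency} from a list of (allele1, allele2) calls."""
    counts = Counter(allele for pair in diplotypes for allele in pair)
    total = sum(counts.values())  # two alleles per individual
    if total == 0:
        raise ValueError("no diplotypes given")
    return {allele: count / total for allele, count in counts.items()}


def alleles_below(frequencies, threshold):
    """Return the sorted list of alleles whose frequency is below threshold."""
    return sorted(a for a, f in frequencies.items() if f < threshold)


# Hypothetical diplotype calls for one pharmacogene in five individuals.
diplotypes = [
    ("*1", "*1"),
    ("*1", "*2"),
    ("*1", "*4"),
    ("*1", "*1"),
    ("*2", "*1"),
]

frequencies = star_allele_frequencies(diplotypes)
rare = alleles_below(frequencies, 0.15)
```

With real cohort data, calling `alleles_below` with 0.05 and 0.01 would give the kind of counts quoted in the results.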
Western South America was one of the worldwide cradles of civilization. The well-known Inca Empire was the tip of the iceberg of an evolutionary process that started 11,000 to 14,000 years ago. Genetic data from 18 Peruvian populations reveal the following: 1) The between-population homogenization of the central southern Andes and its differentiation with respect to Amazonian populations of similar latitudes do not extend northward. Instead, longitudinal gene flow between the northern coast of Peru, Andes, and Amazonia accompanied cultural and socioeconomic interactions revealed by archeology. This pattern recapitulates the environmental and cultural differentiation between the fertile north, where altitudes are lower, and the arid south, where the Andes are higher, acting as a genetic barrier between the sharply different environments of the Andes and Amazonia. 2) The genetic homogenization between the populations of the arid Andes is not only due to migrations during the Inca Empire or the subsequent colonial period. It started at least during the earlier expansion of the Wari Empire (600 to 1,000 years before present). 3) This demographic history allowed for cases of positive natural selection in the high and arid Andes vs. the low Amazon tropical forest: in the Andes, a putative enhancer in HAND2-AS1 (heart and neural crest derivatives expressed 2 antisense RNA1, a noncoding gene related to cardiovascular function) and rs269868-C/Ser1067 in DUOX2 (dual oxidase 2, related to thyroid function and innate immunity) genes and, in the Amazon, the gene encoding the CD45 protein, essential for antigen recognition by T and B lymphocytes in viral–host interaction.
• Delayed or insufficient humoral immune response to SARS-CoV-2 in patients with Turner syndrome (TS).
• Lower interferon-γ production in volunteers with TS after stimulation with toll-like receptors 7/8 agonists.
• Higher cytotoxic activity by cluster of differentiation 8+ and natural killer cells after phorbol myristate acetate (PMA)/ionomycin stimuli in TS.
The X-chromosome contains the largest number of immune-related genes, which play a major role in COVID-19 symptomatology and susceptibility. Here, we had a unique opportunity to investigate, for the first time, COVID-19 outcomes in six unvaccinated young Brazilian patients with Turner syndrome (TS; 45, X0), including one case of critical illness in a child aged 10 years, to evaluate their immune response according to their genetic profile.
A serological analysis of humoral immune response against SARS-CoV-2, phenotypic characterization of antiviral responses in peripheral blood mononuclear cells after stimuli, and the production of cytotoxic cytokines of T lymphocytes and natural killer cells were performed in blood samples collected from the patients with TS during the convalescence period. Whole exome sequencing was also performed.
Our volunteers with TS showed a delayed or insufficient humoral immune response to SARS-CoV-2 (particularly immunoglobulin G) and a decrease in interferon-γ production by cluster of differentiation (CD)4+ and CD8+ T lymphocytes after stimulation with toll-like receptors 7/8 agonists. In contrast, we observed a higher cytotoxic activity in the volunteers with TS than the volunteers without TS after phorbol myristate acetate/ionomycin stimulation, particularly granzyme B and perforin by CD8+ and natural killer cells. Interestingly, two volunteers with TS carry rare genetic variants in genes that regulate type I and III interferon immunity.
Following previous reports in the literature for other conditions, our data showed that patients with TS may have an impaired immune response against SARS-CoV-2. Furthermore, other medical conditions associated with TS could make them more vulnerable to COVID-19.
Human genomics has quickly evolved, powering genome‐wide association studies (GWASs). SNP‐based GWASs cannot capture the intense polymorphism of HLA genes, highly associated with disease susceptibility. There are methods to statistically impute HLA genotypes from SNP‐genotypes data, but lack of diversity in reference panels hinders their performance. We evaluated the accuracy of the 1000 Genomes data as a reference panel for imputing HLA from admixed individuals of African and European ancestries, focusing on (a) the full dataset, (b) 10 replications from 6 populations, and (c) 19 conditions for the custom reference panels. The full dataset outperformed smaller models, with a good F1‐score of 0.66 for HLA‐B. However, custom models outperformed the multiethnic or population models of similar size (F1‐scores up to 0.53, against up to 0.42). We demonstrated the importance of using genetically specific models for imputing populations, which are currently underrepresented in public datasets, opening the door to HLA imputation for every genetic population.
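The abstract reports imputation accuracy as an F1-score but does not state how it was computed. The sketch below shows one plausible scheme, assumed here for illustration only: a per-allele F1 (each allele treated one-vs-rest, with allele dosage respected), followed by an unweighted macro-average. The function names and the toy HLA-B genotypes are hypothetical.

```python
from collections import Counter


def per_allele_f1(true_genotypes, imputed_genotypes):
    """Return {allele: F1} comparing imputed to true two-allele genotypes."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        t, p = Counter(truth), Counter(imputed)
        for allele in set(t) | set(p):
            matched = min(t[allele], p[allele])   # correctly imputed copies
            tp[allele] += matched
            fp[allele] += p[allele] - matched     # imputed but not present
            fn[allele] += t[allele] - matched     # present but not imputed
    scores = {}
    for allele in tp:
        denom = 2 * tp[allele] + fp[allele] + fn[allele]
        scores[allele] = 2 * tp[allele] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-allele F1 values."""
    return sum(scores.values()) / len(scores)


# Hypothetical HLA-B genotypes for three individuals.
truth = [("B*07:02", "B*08:01"), ("B*07:02", "B*44:02"), ("B*15:01", "B*08:01")]
imputed = [("B*07:02", "B*08:01"), ("B*07:02", "B*07:02"), ("B*15:01", "B*08:01")]

scores = per_allele_f1(truth, imputed)
overall = macro_f1(scores)
```

Macro-averaging gives rare alleles the same weight as common ones, so a reference panel that misses population-specific alleles is penalized more heavily than it would be under a simple per-call accuracy.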