The HLA (Human Leukocyte Antigen) genes are well-documented targets of balancing selection, and variation at these loci is associated with many disease phenotypes. Variation in expression levels also influences disease susceptibility and resistance, but little information exists about the regulation and population-level patterns of expression. This results from the difficulty in mapping short reads originating from these highly polymorphic loci, and in accounting for the existence of several paralogues. We developed a computational pipeline to accurately estimate expression for HLA genes based on RNA-seq, improving both locus-level and allele-level estimates. First, reads are aligned to all known HLA sequences in order to infer HLA genotypes; then, quantification of expression is carried out using a personalized index. We use simulations to show that expression estimates obtained in this way are not biased due to divergence from the reference genome. We applied our pipeline to the GEUVADIS dataset and compared the quantifications to those obtained with a reference transcriptome. Although the personalized pipeline recovers more reads, we found that using the reference transcriptome produces estimates similar to the personalized pipeline (r ≥ 0.87) with the exception of HLA-DQA1. We describe the impact of the HLA-personalized approach on downstream analyses for nine classical HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Although the influence of the HLA-personalized approach is modest for eQTL mapping, the p-values and the causality of the eQTLs obtained are better than when the reference transcriptome is used. We investigate how the eQTLs we identified explain variation in expression among lineages of HLA alleles. Finally, we discuss possible causes underlying differences between expression estimates obtained using RNA-seq, antibody-based approaches and qPCR.
Next generation sequencing (NGS) is currently being adapted by different biotechnological platforms to the standard typing method for HLA polymorphism, the huge diversity of which makes this initiative particularly challenging. Boosting the molecular characterization of the HLA genes through efficient, rapid, and low-cost technologies is expected to amplify the success of tissue transplantation by enabling us to find donor-recipient matching for rare phenotypes. But the application of NGS technologies to the molecular mapping of the MHC region also anticipates essential changes in population genetic studies. Huge amounts of HLA sequence data will be available in the next years for different populations, with the potential to change our understanding of HLA variation in humans. In this review, we first explain how HLA sequencing allows a better assessment of the HLA diversity in human populations, taking also into account the methodological difficulties it introduces at the statistical level; secondly, we show how analyzing HLA sequence variation may improve our comprehension of population genetic relationships by facilitating the identification of demographic events that marked human evolution; finally, we discuss the interest of both HLA and genome-wide sequencing and genotyping in detecting functionally significant SNPs in the MHC region, the latter having also contributed to the makeup of the HLA molecular diversity observed today.
When humans moved from Asia toward the Americas over 18,000 y ago and eventually peopled the New World, they encountered a new environment with extreme climate conditions and distinct dietary resources. These environmental and dietary pressures may have led to instances of genetic adaptation with the potential to influence the phenotypic variation in extant Native American populations. An example of such an event is the evolution of the fatty acid desaturases (FADS) genes, which have been claimed to harbor signals of positive selection in Inuit populations due to adaptation to the cold Greenland Arctic climate and to a protein-rich diet. Because there was evidence of intercontinental variation in this genetic region, with indications of positive selection for its variants, we decided to compare the Inuit findings with other Native American data. Here, we use several lines of evidence to show that the signal of FADS-positive selection is not restricted to the Arctic but instead is broadly observed throughout the Americas. The shared signature of selection among populations living in such a diverse range of environments is likely due to a single and strong instance of local adaptation that took place in the common ancestral population before their entrance into the New World. These first Americans peopled the whole continent and spread this adaptive variant across a diverse set of environments.
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating that mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
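The two quantities this benchmark reports (the share of genotype calls that disagree with the gold standard, and the per-site error in allele frequency) can be illustrated with a minimal sketch. It assumes biallelic SNP genotypes coded as alternate-allele counts (0, 1, 2). The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def genotype_error_rate(calls: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of genotype calls that differ from the gold-standard genotypes."""
    if len(calls) != len(truth) or len(calls) == 0:
        raise ValueError("calls and truth must be non-empty and of equal length")
    mismatches = sum(c != t for c, t in zip(calls, truth))
    return mismatches / len(calls)


def alt_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def frequency_error(calls: Sequence[int], truth: Sequence[int]) -> float:
    """Signed error in alternate-allele frequency (called minus gold standard).

    A negative value means the alternate allele is underestimated, i.e. the
    reference allele frequency is overestimated.
    """
    return alt_allele_frequency(calls) - alt_allele_frequency(truth)


# Hypothetical genotypes at one SNP for eight individuals (illustration only).
ngs_calls = [0, 0, 1, 0, 2, 0, 1, 0]
sanger_truth = [0, 1, 1, 0, 2, 1, 1, 0]

print(genotype_error_rate(ngs_calls, sanger_truth))  # 0.25
print(frequency_error(ngs_calls, sanger_truth))      # -0.125
```

In this toy case two heterozygotes are miscalled as reference homozygotes, so the alternate-allele frequency drops from 0.375 to 0.25, the direction of error expected under mapping bias toward the reference.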
The American continent was the last to be occupied by modern humans, and native populations bear the marks of recent expansions, bottlenecks, natural selection, and population substructure. Here we investigate how this demographic history has shaped genetic variation at the strongly selected HLA loci. In order to disentangle the relative contributions of selection and demographic processes, we assembled a dataset with genome-wide microsatellites and HLA-A, -B, -C, and -DRB1 typing data for a set of 424 Native American individuals. We find that demographic history explains a sizeable fraction of HLA variation, both within and among populations. A striking feature of HLA variation in the Americas is the existence of alleles which are present in the continent but either absent or very rare elsewhere in the world. We show that this feature is consistent with demographic history (i.e., the combination of changes in population size associated with bottlenecks and subsequent population expansions). However, signatures of selection at HLA loci are still visible, with significant evidence of selection at deeper timescales for most loci and populations, as well as population differentiation at HLA loci exceeding that seen at neutral markers.
Despite the high number of individuals infected by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) who develop coronavirus disease 2019 (COVID-19) symptoms worldwide, many exposed individuals remain asymptomatic and/or uninfected and seronegative. This could be explained by a combination of environmental (exposure), immunological (previous infection), epigenetic, and genetic factors. Aiming to identify genetic factors involved in immune response in symptomatic COVID-19 as compared to asymptomatic exposed individuals, we analyzed 83 Brazilian couples where one individual was infected and symptomatic while the partner remained asymptomatic and serum-negative for at least 6 months despite sharing the same bedroom during the infection. We refer to these as "discordant couples". We performed whole-exome sequencing followed by a state-of-the-art method to call genotypes and haplotypes across the highly polymorphic major histocompatibility complex (MHC) region. The discordant partners had comparable ages and genetic ancestry, but women were overrepresented (65%) in the asymptomatic group. In the antigen-presentation pathway, we observed an association between HLA-DRB1 alleles encoding Lys at residue 71 (mostly DRB1*03:01 and DRB1*04:01) and DOB*01:02 with symptomatic infections and alleles encoding 144Q/151R with asymptomatic seronegative women. Among the genes related to immune modulation, we detected variants in MICA and MICB associated with symptomatic infections. These variants are related to higher expression of soluble MICA and low expression of MICB. Thus, quantitative differences in these molecules that modulate natural killer (NK) activity could contribute to susceptibility to COVID-19 by downregulating NK cell cytotoxic activity in infected individuals but not in the asymptomatic partners.
The analysis of genomic data (~400,000 autosomal SNPs) enabled the reliable estimation of inbreeding levels in a sample of 541 individuals from a highly admixed Brazilian population isolate (an African-derived quilombo in the State of São Paulo). To achieve this, different methods were applied to the joint information of two sets of markers (one complete and another excluding loci in patent linkage disequilibrium). This strategy allowed the detection and exclusion of markers that biased the estimation of the average population inbreeding coefficient (Wright's fixation index FIS), whose value was eventually estimated as around 1% using any of the methods we applied. Quilombo demographic inferences were made by analyzing the structure of runs of homozygosity (ROH), which were adapted to cope with a highly admixed population with a complex foundation history. Our results suggest that the amount of ROH <2Mb of admixed populations should be somehow proportional to the genetic contribution from each parental population.
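For readers unfamiliar with Wright's fixation index, a minimal sketch of one common estimator follows: FIS = 1 - Ho/He, where Ho is the observed heterozygosity and He the heterozygosity expected under Hardy-Weinberg proportions, each summed over loci. This is a textbook ratio-of-sums form without small-sample correction. It is not necessarily any of the methods applied in the study, and the function name and toy genotypes are hypothetical.

```python
from typing import Sequence


def fis(loci: Sequence[Sequence[int]]) -> float:
    """Estimate F_IS as 1 - Ho/He, summing both terms over loci.

    Each locus is a sequence of diploid genotypes coded as the count of one
    allele (0, 1 or 2). No small-sample correction is applied.
    """
    h_obs_total = 0.0
    h_exp_total = 0.0
    for genotypes in loci:
        n = len(genotypes)
        if n == 0:
            raise ValueError("every locus needs at least one genotype")
        p = sum(genotypes) / (2 * n)                   # allele frequency
        h_obs_total += sum(1 for g in genotypes if g == 1) / n
        h_exp_total += 2 * p * (1 - p)                 # Hardy-Weinberg expectation
    if h_exp_total == 0:
        raise ValueError("all loci are monomorphic; F_IS is undefined")
    return 1 - h_obs_total / h_exp_total


# Hypothetical genotypes at two SNPs for eight individuals (illustration only).
toy_loci = [
    [0, 1, 2, 1, 0, 1, 2, 1],  # Ho = 0.50, He = 0.50
    [0, 0, 2, 2, 1, 1, 0, 2],  # Ho = 0.25, He = 0.50
]

print(fis(toy_loci))  # 0.25
```

A positive value indicates a deficit of heterozygotes relative to Hardy-Weinberg expectations, which is the pattern inbreeding produces. Markers in strong linkage disequilibrium are not independent, which is one reason a pruned marker set can be compared against the complete one.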
An immunogenetic view of COVID-19
Aguiar, Vitor R C; Augusto, Danillo G; Castelli, Erick C ...
Genetics and Molecular Biology, 01/2021, Volume 44, Issue 1 Suppl 1
Meeting the challenges brought by the COVID-19 pandemic requires an interdisciplinary approach. In this context, integrating knowledge of immune function with an understanding of how genetic variation influences the nature of immunity is a key challenge. Immunogenetics can help explain the heterogeneity of susceptibility and protection to the viral infection and disease progression. Here, we review the knowledge developed so far, discussing fundamental genes for triggering the innate and adaptive immune responses associated with a viral infection, especially with the SARS-CoV-2 mechanisms. We emphasize the role of the HLA and KIR genes, discussing what has been uncovered about their role in COVID-19 and addressing methodological challenges of studying these genes. Finally, we comment on questions that arise when studying admixed populations, highlighting the case of Brazil. We argue that the interplay between immunology and an understanding of genetic associations can provide an important contribution to our knowledge of COVID-19.
The majority of aneuploid fetuses are spontaneously miscarried. Nevertheless, some aneuploid individuals survive despite the strong genetic insult. Here, we investigate if the survival probability of aneuploid fetuses is affected by the genome-wide burden of slightly deleterious variants. We analyzed two cohorts of live-born Down syndrome individuals (388 genotyped samples and 16 fibroblast transcriptomes) and observed a deficit of slightly deleterious variants on Chromosome 21 and decreased transcriptome-wide variation in the expression level of highly constrained genes. We interpret these results as signatures of embryonic selection, and propose a genetic handicap model whereby an individual bearing an extremely severe deleterious variant (such as aneuploidy) could escape embryonic lethality if the genome-wide burden of slightly deleterious variants is sufficiently low. This approach can be used to study the composition and effect of the numerous slightly deleterious variants in humans and model organisms.
Pyruvate kinase (PK), encoded by the PKLR gene, is a key player in glycolysis controlling the integrity of erythrocytes. Due to Plasmodium selection, mutations for PK deficiency, which leads to hemolytic anemia, are associated with resistance to malaria in sub-Saharan Africa and with susceptibility to intracellular pathogens in experimental models. In this case-control study, we enrolled 4,555 individuals and investigated whether PKLR single nucleotide polymorphisms (SNPs) putatively selected for malaria resistance are associated with susceptibility to leprosy across Brazil (Manaus-North; Salvador-Northeast; Rondonópolis-Midwest and Rio de Janeiro-Southeast) and with tuberculosis in Mozambique. Haplotype T/G/G (rs1052176/rs4971072/rs11264359) was associated with leprosy susceptibility in Rio de Janeiro (OR = 2.46, p = 0.00001) and Salvador (OR = 1.57, p = 0.04), and with tuberculosis in Mozambique (OR = 1.52, p = 0.07). This haplotype downregulates PKLR expression in nerve and skin, according to GTEx, and might subtly modulate ferritin and haptoglobin levels in serum. Furthermore, we observed genetic signatures of positive selection in the HCN3 gene (xpEHH > 2, recent selection) in Europe but not in Africa, involving 6 SNPs which are PKLR/HCN3 eQTLs. However, this evidence was not corroborated by the other tests (FST, Tajima's D and iHS). Altogether, we provide evidence that a common PKLR locus in Africans contributes to mycobacterial susceptibility in populations of African descent and also highlight, for the first time, PKLR as a susceptibility gene for leprosy and TB.