Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose an innovative gene prioritization score based on the population of interest. We introduce the concept of the singleton-cohort variant (SC variant), a variant that has an allele count equal to one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should give a hint regarding the level of constraint for that gene in a specific population. This score is negative when there are constraints that allow the presence of SC variants only in the non-coding region; on the contrary, it is positive when there are no constraints. A complementary score is the sum of the normalized counts of SC variants in the coding and non-coding regions, which could be used as a proxy of positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000 Genomes dataset). In contrast, some genes showed a high negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
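The two scores described above can be illustrated with a minimal sketch. The abstract does not state how the counts are normalized, so the per-kilobase normalization by region length used here is an assumption, and the names `GeneCounts`, `count_sc_variants`, and `sc_scores` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GeneCounts:
    """Per-gene inputs; field names are illustrative, not from the paper."""
    coding_sc: int      # SC variants (allele count == 1) in the coding region
    noncoding_sc: int   # SC variants in the non-coding region
    coding_bp: int      # length of the coding region in base pairs
    noncoding_bp: int   # length of the non-coding region in base pairs


def count_sc_variants(allele_counts: Iterable[int]) -> int:
    """Count variants whose allele count in the cohort is exactly one."""
    return sum(1 for ac in allele_counts if ac == 1)


def sc_scores(g: GeneCounts, per_bp: int = 1000) -> Tuple[float, float]:
    """Return (difference score, sum score).

    ASSUMPTION: counts are normalized as SC variants per `per_bp` bases of
    each region; the paper may use a different normalization.
    """
    coding = per_bp * g.coding_sc / g.coding_bp
    noncoding = per_bp * g.noncoding_sc / g.noncoding_bp
    # Difference: negative when SC variants are concentrated in non-coding sequence.
    # Sum: the complementary score over both regions.
    return coding - noncoding, coding + noncoding


if __name__ == "__main__":
    # Toy gene: 2 coding SC variants in 2 kb, 30 non-coding SC variants in 10 kb.
    gene = GeneCounts(coding_sc=2, noncoding_sc=30, coding_bp=2000, noncoding_bp=10000)
    diff, total = sc_scores(gene)
    print(f"difference score = {diff}, sum score = {total}")
```

In this toy example the difference score is negative because the coding region carries fewer SC variants per kilobase than the non-coding region, which is the pattern the abstract associates with constraint.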
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
The ability to taste phenylthiocarbamide (PTC) and 6-n-propylthiouracil (PROP) is a polymorphic trait mediated by the TAS2R38 bitter taste receptor gene. It has long been hypothesized that global genetic diversity at this locus evolved under pervasive pressures from balancing natural selection. However, recent high-resolution population genetic studies of TAS2Rs suggest that demographic events have played a critical role in the evolution of these genes. We here utilized the largest TAS2R38 database yet analyzed, consisting of 5,589 individuals from 105 populations, to examine natural selection, haplotype frequencies and linkage disequilibrium to estimate the effects of both selection and demography on contemporary patterns of variation at this locus. We found signs of ancient balancing selection acting on this gene but no post-Out-of-Africa departures from neutrality, implying that the current observed patterns of variation can be predominantly explained by demographic, rather than selective, events. In addition, we found signatures of ancient selective forces acting on different African TAS2R38 haplotypes. Collectively, our results provide evidence for a relaxation of recent selective forces acting on this gene and a revised hypothesis for the origins of the present-day worldwide distribution of TAS2R38 haplotypes.
A combination of evidence, based on genetic, fossil and archaeological findings, indicates that Homo sapiens spread out of Africa between ~70 and 60 thousand years ago (kya). However, it appears that once outside of Africa, human populations did not expand across all of Eurasia until ~45 kya. The geographic whereabouts of these early settlers in the timeframe between ~70-60 and 45 kya have been difficult to reconcile. Here we combine genetic evidence and palaeoecological models to infer the geographic location that acted as the Hub for our species during the early phases of colonisation of Eurasia. Leveraging available genomic evidence, we show that populations from the Persian Plateau carry an ancestry component that closely matches the population that settled the Hub outside Africa. With the palaeoclimatic data available to date, we built ecological models showing that the Persian Plateau was suitable for human occupation and that it could sustain a larger population compared to other West Asian regions, strengthening this claim.
The genomic variation of the Italian peninsula populations is currently undercharacterised: the only Italian whole-genome reference is represented by the Tuscans from the 1000 Genomes Project. To address this issue, we sequenced a total of 947 Italian samples from three different geographical areas. First, we defined a new Italian Genome Reference Panel (IGRP1.0) for imputation, which improved imputation accuracy, especially for rare variants, and we tested it by GWAS analysis on red blood traits. Furthermore, we extended the catalogue of genetic variation by investigating the level of population structure, the pattern of natural selection, the distribution of deleterious variants and the occurrence of human knockouts (HKOs). Overall, the results demonstrate a high level of genomic differentiation between cohorts, different signatures of natural selection and a distinctive distribution of deleterious variants and HKOs, confirming the necessity of distinct genome references for the Italian population.
Whole genome sequencing (WGS) allows the identification of human knockouts (HKOs), individuals in whom loss of function (LoF) variants disrupt both alleles of a given gene. HKOs are a valuable model for understanding the consequences of loss of gene function. Naturally occurring biallelic LoF variants tend to be significantly enriched in "genetic isolates," making these populations specifically suited for HKO studies. In this work, a meticulous WGS data analysis combined with an in-depth phenotypic assessment of 947 individuals from three Italian genetic isolates led to the identification of ten biallelic LoF variants in ten OMIM genes associated with known autosomal recessive diseases. Notably, only a minority of the identified HKOs (C7, F12, and GPR68 genes) displayed the expected phenotype. For most of the genes (ACADSB, FANCL, GRK1, LGI4, MPO, PGAM2, and RP1L1), instead, the carriers showed none or few of the signs and symptoms typically associated with the related diseases. Of particular interest is a case presenting with a FANCL biallelic LoF variant and a positive diepoxybutane test but lacking a full Fanconi anemia phenotypic spectrum. Identifying KO subjects displaying expected phenotypes suggests that the lack of correct genetic diagnoses may lead to inappropriate and delayed treatment. In contrast, the presence of HKOs with phenotypes deviating from the expected patterns underlines how LoF variants may be responsible for broader phenotypic spectra. Overall, these results highlight the importance of in-depth phenotypical characterization to understand the role of LoF variants and the advantage of studying these variants in genetic isolates.
The Kalash represent an enigmatic isolated population of Indo-European speakers who have been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous Y-chromosomal and mitochondrial DNA markers provided no support for their claimed Greek descent following invasion of this region by Alexander III of Macedon, and analysis of autosomal loci provided evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni 2.5M-8 BeadChip and sequenced one male individual at high coverage on an Illumina Hi-Seq 2000. Comparison with published data from ancient hunter-gatherers and European farmers shows that the Kalash share drift with the Paleolithic Siberian hunter-gatherer and may represent an extremely drifted ancient North Eurasian population which also contributed to European and Near Eastern ancestry. Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319-2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11.8 (10.6-12.6) thousand years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.
We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India, or Tibet at over 500,000 SNPs, and analyzed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and the genetic adaptations to the harsh environment. The Himalayan populations resembled other South and East Asians, but in addition displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identifying specific positively selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 and Egl-9 Family Hypoxia Inducible Factor 1 loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.
Age-related hearing loss (ARHL) is the most common sensory deficit in the elderly. The disease has a multifactorial etiology, and the environmental and genetic factors involved are largely unknown. The SLC7A8/SLC3A2 heterodimer is a neutral amino acid exchanger. Here, we demonstrated that SLC7A8 is expressed in the mouse inner ear and that its ablation resulted in ARHL, due to damage to different cochlear structures. These findings make the SLC7A8 transporter a strong candidate for ARHL in humans. Thus, a screening of a cohort of ARHL patients and controls was carried out, revealing several variants in SLC7A8, whose role was further investigated by in vitro functional studies. Significant decreases in SLC7A8 transport activity were detected for patients' variants (p.Val302Ile, p.Arg418His, p.Thr402Met and p.Val460Glu), further supporting a causative role for SLC7A8 in ARHL. Moreover, our preliminary data suggest that a relevant proportion of ARHL cases could be explained by SLC7A8 mutations.
The genetic features of isolated populations can boost power in complex-trait association studies, and an in-depth understanding of how their genetic variation has been shaped by their demographic history can help leverage these advantageous characteristics. Here, we perform a comprehensive investigation using 3,059 newly generated low-depth whole-genome sequences from eight European isolates and two matched general populations, together with published data from the 1000 Genomes Project and UK10K. Sequencing data give deeper and richer insights into population demography and genetic characteristics than genotype-chip data, distinguishing related populations more effectively and allowing their functional variants to be studied more fully. We demonstrate relaxation of purifying selection in the isolates, leading to enrichment of rare and low-frequency functional variants, using novel statistics, DVxy and SVxy. We also develop an isolation-index (Isx) that predicts the overall level of such key genetic characteristics and can thus help guide population choice in future complex-trait association studies.