Loss-of-function mutations cause many Mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10-261).
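The reported deficit can be put on a percentage scale with a little arithmetic. The sketch below assumes that the deficit is measured against the Mendelian expectation that a quarter of the offspring of two heterozygous parents are homozygous. That baseline, the helper `observed_share`, and the constant names are assumptions made for illustration, not details stated in the abstract.

```python
from fractions import Fraction

# Assumed Mendelian baseline: 1/4 of the offspring of two heterozygous
# parents are homozygous, i.e. 2,500 per 10,000 transmissions.
EXPECTED_PER_10K = 2_500


def observed_share(deficit_per_10k: int) -> Fraction:
    """Share of homozygous offspring implied by a deficit per 10,000 transmissions."""
    return Fraction(EXPECTED_PER_10K - deficit_per_10k, 10_000)


point = observed_share(136)    # point estimate of the deficit
ci_low = observed_share(261)   # largest deficit in the CI gives the lowest share
ci_high = observed_share(10)   # smallest deficit in the CI gives the highest share

print(f"implied share: {float(point):.2%} "
      f"(range {float(ci_low):.2%} to {float(ci_high):.2%})")
```

Under this reading, the implied share of homozygous offspring sits modestly below 25%, and the upper end of the interval nearly reaches the Mendelian expectation.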
Meiotic recombinations contribute to genetic diversity by yielding new combinations of alleles. Recently, high-resolution recombination maps were inferred from high-density single-nucleotide polymorphism (SNP) data using linkage disequilibrium (LD) patterns that capture historical recombination events. The use of these maps has been demonstrated by the identification of recombination hotspots and associated motifs, and the discovery that the PRDM9 gene affects the proportion of recombinations occurring at hotspots. However, these maps provide no information about individual or sex differences. Moreover, locus-specific demographic factors like natural selection can bias LD-based estimates of recombination rate. Existing genetic maps based on family data avoid these shortcomings, but their resolution is limited by relatively few meioses and a low density of markers. Here we used genome-wide SNP data from 15,257 parent-offspring pairs to construct the first recombination maps based on directly observed recombinations with a resolution that is effective down to 10 kilobases (kb). Comparing male and female maps reveals that about 15% of hotspots in one sex are specific to that sex. Although male recombinations result in more shuffling of exons within genes, female recombinations generate more new combinations of nearby genes. We discover novel associations between recombination characteristics of individuals and variants in the PRDM9 gene and we identify new recombination hotspots. Comparisons of our maps with two LD-based maps inferred from data of HapMap populations of Utah residents with ancestry from northern and western Europe (CEU) and Yoruba in Ibadan, Nigeria (YRI) reveal population differences previously masked by noise and map differences at regions previously described as targets of natural selection.
Aortic valve stenosis (AS) is the most common valvular heart disease, and valve replacement is the only definitive treatment. Here we report a large genome-wide association (GWA) study of 2,457 Icelandic AS cases and 349,342 controls with a follow-up in up to 4,850 cases and 451,731 controls of European ancestry. We identify two new AS loci, on chromosome 1p21 near PALMD (rs7543130; odds ratio (OR) = 1.20, P = 1.2 × 10^[…]) and on chromosome 2q22 in TEX41 (rs1830321; OR = 1.15, P = 1.8 × 10^[…]). Rs7543130 also associates with bicuspid aortic valve (BAV) (OR = 1.28, P = 6.6 × 10^[…]) and aortic root diameter (P = 1.30 × 10^[…]), and rs1830321 associates with BAV (OR = 1.12, P = 5.3 × 10^[…]) and coronary artery disease (CAD) (OR = 1.05, P = 9.3 × 10^[…]). The results implicate both cardiac developmental abnormalities and atherosclerosis-like processes in the pathogenesis of AS. We show that several pathways are shared by CAD and AS. Causal analysis suggests that the shared risk factors of Lp(a) and non-high-density lipoprotein cholesterol contribute substantially to the frequent co-occurrence of these diseases.
Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism, yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.
The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function and are therefore more likely to be causative. Using data from whole-genome sequencing of 2,636 Icelanders and the association results for 96 quantitative and 123 binary phenotypes, we estimated the enrichment of association signals by sequence annotation. We propose a weighted Bonferroni adjustment that controls for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have estimated in Iceland to derive significance thresholds for other populations with different numbers and combinations of sequence variants.
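The idea of a weighted Bonferroni adjustment can be sketched as follows. This is a generic formulation in which each annotation class receives a threshold proportional to its enrichment, scaled so that the thresholds summed over all tested variants equal the target FWER; it is not claimed to be the authors' exact procedure. The annotation classes, variant counts, and enrichment values below are hypothetical and chosen only for illustration.

```python
def weighted_bonferroni_thresholds(counts, enrichment, alpha=0.05):
    """Per-class p-value thresholds proportional to enrichment.

    The thresholds satisfy sum_c counts[c] * threshold[c] == alpha, so the
    union bound keeps the family-wise error rate at or below alpha.
    """
    if counts.keys() != enrichment.keys():
        raise ValueError("counts and enrichment must cover the same classes")
    total_weight = sum(counts[c] * enrichment[c] for c in counts)
    return {c: alpha * enrichment[c] / total_weight for c in counts}


# Hypothetical inputs (not taken from the study): number of tested variants
# per annotation class and relative enrichment of association signals.
counts = {"loss_of_function": 10_000, "missense": 200_000, "other": 30_000_000}
enrichment = {"loss_of_function": 100.0, "missense": 30.0, "other": 1.0}

thresholds = weighted_bonferroni_thresholds(counts, enrichment)
fwer_bound = sum(counts[c] * thresholds[c] for c in counts)
uniform_threshold = 0.05 / sum(counts.values())  # standard Bonferroni

for name, value in thresholds.items():
    print(f"{name}: {value:.2e}")
print(f"uniform Bonferroni: {uniform_threshold:.2e}, FWER bound: {fwer_bound:.3f}")
```

With these illustrative inputs, the enriched classes get more lenient thresholds than the uniform correction, while the large unenriched class pays for this with a slightly stricter one. That trade-off is where any gain in power would come from.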
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Objective
The level of cartilage acidic protein 1 (CRTAC1) in plasma was recently discovered to be associated with osteoarthritis (OA) risk and progression to joint replacement in Iceland. This study was undertaken to validate these findings in an independent population.
Methods
In this study, 1,462 plasma proteins were measured in 54,265 participants from the UK Biobank on the Olink Explore platform. We analyzed the association of plasma proteins with prevalent OA, incident OA, and progression to joint replacement. We assessed the specificity of OA association through comparison of associations with inflammatory joint diseases and with previous joint replacement.
Results
The CRTAC1 protein showed the strongest association with prevalent knee OA (odds ratio [OR] 1.34 [95% confidence interval (95% CI) 1.27, 1.41]) and was associated with hip OA (OR 1.19 [95% CI 1.11, 1.28]). It predicted incident diagnosis of OA in the knee (hazard ratio [HR] 1.40 [95% CI 1.35, 1.46]) and hip (HR 1.25 [95% CI 1.19, 1.31]), as well as progression to joint replacement (HR 1.20 [95% CI 1.08, 1.33] for the knee and HR 1.22 [95% CI 1.08, 1.38] for the hip), while no association was found with inflammatory joint diseases. Individuals in the highest quintile of risk based on CRTAC1 level, age, sex, and body mass index had a 10‐fold risk of knee or hip OA within 5 years compared to those in the lowest quintile. Adding aggrecan core protein (ACAN) and neurocan core protein (NCAN) to the model improved the prediction of OA but not joint replacement. Furthermore, we replicated the association of CUB domain–containing protein 1 with prior joint replacement.
Conclusion
Plasma CRTAC1 is a specific biomarker for OA and a predictor of OA risk and progression to joint replacement. Adding ACAN and NCAN protein levels to the CRTAC1 model improved the prediction of OA.
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.
The characterization of mutational processes that generate sequence diversity in the human genome is of paramount importance both to medical genetics and to evolutionary studies. To understand how the age and sex of transmitting parents affect de novo mutations, here we sequence 1,548 Icelanders, their parents, and, for a subset of 225, at least one child, to 35× genome-wide coverage. We find 108,778 de novo mutations, both single nucleotide polymorphisms and indels, and determine the parent of origin of 42,961. The number of de novo mutations from mothers increases by 0.37 per year of age (95% CI 0.32-0.43), a quarter of the 1.51 per year from fathers (95% CI 1.45-1.57). The number of clustered mutations increases faster with the mother's age than with the father's, and the genomic span of maternal de novo mutation clusters is greater than that of paternal ones. The types of de novo mutation from mothers change substantially with age, with a 0.26% (95% CI 0.19-0.33%) decrease in cytosine-phosphate-guanine to thymine-phosphate-guanine (CpG>TpG) de novo mutations and a 0.33% (95% CI 0.28-0.38%) increase in C>G de novo mutations per year, respectively. Remarkably, these age-related changes are not distributed uniformly across the genome. A striking example is a 20 megabase region on chromosome 8p, with a maternal C>G mutation rate that is up to 50-fold greater than the rest of the genome. The age-related accumulation of maternal non-crossover gene conversions also mostly occurs within these regions. Increased sequence diversity and linkage disequilibrium of C>G variants within regions affected by excess maternal mutations indicate that the underlying mutational process has persisted in humans for thousands of years. Moreover, the regional excess of C>G variation in humans is largely shared by chimpanzees, less by gorillas, and is almost absent from orangutans. This demonstrates that sequence diversity in humans results from evolving interactions between age, sex, mutation type, and genomic location.
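The parental age effects quoted above lend themselves to a quick arithmetic check. The sketch below uses only the two reported slopes and assumes a purely linear age effect. The abstract gives no intercepts, so only age-related increments are computed, not absolute mutation counts. The function and constant names are illustrative.

```python
# Reported slopes: additional de novo mutations per year of parental age.
MATERNAL_PER_YEAR = 0.37
PATERNAL_PER_YEAR = 1.51


def extra_mutations(years: float, rate_per_year: float) -> float:
    """Additional de novo mutations expected after `years`, assuming linearity."""
    return years * rate_per_year


# Ratio of maternal to paternal slope ("a quarter" in the text).
slope_ratio = MATERNAL_PER_YEAR / PATERNAL_PER_YEAR

# Extra mutations attributable to a parent being 10 years older.
extra_maternal = extra_mutations(10, MATERNAL_PER_YEAR)
extra_paternal = extra_mutations(10, PATERNAL_PER_YEAR)

print(f"maternal/paternal slope ratio: {slope_ratio:.3f}")
print(f"10 extra years: +{extra_maternal:.1f} maternal, +{extra_paternal:.1f} paternal")
```

Under these assumptions, a decade of additional parental age corresponds to a few extra maternal mutations and roughly four times as many extra paternal ones.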
Meiotic recombination involves a combination of gene conversion and crossover events that, along with mutations, produce germline genetic diversity. Here we report the discovery of 3,176 SNP and 61 indel gene conversions. Our estimate of the non-crossover (NCO) gene conversion rate (G) is 7.0 for SNPs and 5.8 for indels per megabase per generation, and the GC bias is 67.6%. For indels, we demonstrate a 65.6% preference for the shorter allele. NCO gene conversions from mothers are longer than those from fathers, and G is 2.17 times greater in mothers. Notably, G increases with the age of mothers, but not the age of fathers. A disproportionate number of NCO gene conversions in older mothers occur outside double-strand break (DSB) regions and in regions with relatively low GC content. This points to age-related changes in the mechanisms of meiotic gene conversion in oocytes.