Sequence variants in the parental genomes that are not transmitted to a child (the proband) are often ignored in genetic studies. Here we show that nontransmitted alleles can affect a child through ...their impacts on the parents and other relatives, a phenomenon we call "genetic nurture." Using results from a meta-analysis of educational attainment, we find that the polygenic score computed for the nontransmitted alleles of 21,637 probands with at least one parent genotyped has an estimated effect on the educational attainment of the proband that is 29.9% (
= 1.6 × 10
) of that of the transmitted polygenic score. Genetic nurturing effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition- and heath-related traits.
Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 ...crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs. Crossovers exhibit a mutagenic effect, with overrepresentation of DNMs within 1 kilobase of crossovers in males and females. In females, a higher mutation rate is observed up to 40 kilobases from crossovers, particularly for complex crossovers, which increase with maternal age. We identified 35 loci associated with the recombination rate or the location of crossovers, demonstrating extensive genetic control of meiotic recombination, and our results highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a ...median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.
The plasma proteome can help bridge the gap between the genome and diseases. Here we describe genome-wide association studies (GWASs) of plasma protein levels measured with 4,907 aptamers in 35,559 ...Icelanders. We found 18,084 associations between sequence variants and levels of proteins in plasma (protein quantitative trait loci; pQTL), of which 19% were with rare variants (minor allele frequency (MAF) < 1%). We tested plasma protein levels for association with 373 diseases and other traits and identified 257,490 associations. We integrated pQTL and genetic associations with diseases and other traits and found that 12% of 45,334 lead associations in the GWAS Catalog are with variants in high linkage disequilibrium with pQTL. We identified 938 genes encoding potential drug targets with variants that influence levels of possible biomarkers. Combining proteomics, genomics and transcriptomics, we provide a valuable resource that can be used to improve understanding of disease pathogenesis and to assist with drug discovery and development.
Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively ...large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short-reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole-genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high-confidence.
Abstract
Motivation
Meiotic recombination is the main driving force of human genetic diversity, along with mutations. Recombinations split into crossovers, separating large chromosomal regions ...originating from different homologous chromosomes, and non-crossovers (NCOs), where a small segment from one chromosome is embedded in a region originating from the homologous chromosome. NCOs are much less studied than mutations and crossovers as NCOs are short and can only be detected at markers heterozygous in the transmitting parent, leaving most of them undetectable.
Results
The detectable NCOs, known as gene conversions, hide information about NCOs, including their number and length, waiting to be unveiled. We introduce NCOurd, software, and algorithm, based on an expectation–maximization algorithm, to estimate the number of NCOs and their length distribution from gene conversion data.
Availability and implementation
https://github.com/DecodeGenetics/NCOurd
A major challenge to long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios ...that Ratatosk reduces the error rate of long reads 6-fold on average with a median error rate as low as 0.22 %. SNP calls in Ratatosk corrected reads are nearly 99 % accurate and indel calls accuracy is increased by up to 37 %. An assembly of Ratatosk corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and less misassemblies than a PacBio HiFi reads assembly.
Abstract
Motivation
Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology.
Results
In ...human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data.
Availabilityand implementation
github.com/DecodeGenetics/read_haps.
The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without ...similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions.
We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.
The source code of PopIns is available from http://github.com/bkehr/popins
birte.kehr@decode.is
Supplementary data are available at Bioinformatics online.
Imprinting is the preferential expression of one parental allele over the other. It is controlled primarily through differential methylation of cytosine at CpG dinucleotides. Here we combine 285 ...methylomes and 11,617 transcriptomes from peripheral blood samples with parent-of-origin phased haplotypes, to produce a new map of imprinted methylation and gene expression patterns across the human genome. We demonstrate how imprinted methylation is a continuous rather than a binary characteristic. We describe at high resolution the parent-of-origin methylation pattern at the 15q11.2 Prader-Willi/Angelman syndrome locus, with nearly confluent stochastic paternal methylation punctuated by 'spikes' of maternal methylation. We find examples of polymorphic imprinted methylation unrelated (at VTRNA2-1 and PARD6G) or related (at CHRNE) to nearby SNP genotypes. We observe RNA isoform-specific imprinted expression patterns suggestive of a methylation-sensitive transcriptional elongation block. Finally, we gain new insights into parent-of-origin-specific effects on phenotypes at the DLK1/MEG3 and GNAS loci.