The nature of nurture: Effects of parental genotypes Kong, Augustine; Thorleifsson, Gudmar; Frigge, Michael L ...
Science (American Association for the Advancement of Science), 2018-01-26, Volume: 359, Issue: 6374
Journal Article
Peer reviewed
Open access
Sequence variants in the parental genomes that are not transmitted to a child (the proband) are often ignored in genetic studies. Here we show that nontransmitted alleles can affect a child through their impacts on the parents and other relatives, a phenomenon we call "genetic nurture." Using results from a meta-analysis of educational attainment, we find that the polygenic score computed for the nontransmitted alleles of 21,637 probands with at least one parent genotyped has an estimated effect on the educational attainment of the proband that is 29.9% (P = 1.6 × 10^-14) of that of the transmitted polygenic score. Genetic nurturing effects of this polygenic score extend to other traits. Paternal and maternal polygenic scores have similar effects on educational attainment, but mothers contribute more than fathers to nutrition- and health-related traits.
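The split between transmitted and nontransmitted polygenic scores described above can be illustrated with a minimal sketch. It assumes phased parental haplotypes coded as 0/1 effect-allele indicators and a per-variant index saying which parental haplotype the child inherited; the function name, data layout, and weights are hypothetical and are not taken from the paper.

```python
def transmitted_and_nontransmitted_scores(parents, weights):
    """Return (PGS_T, PGS_NT) summed over the genotyped parents.

    parents: list of (hap_a, hap_b, transmitted) tuples, one per parent.
        hap_a, hap_b: lists of 0/1 effect-allele indicators per variant.
        transmitted: list of 0/1 indices; 0 means the allele on hap_a
        was passed to the child at that variant, 1 means hap_b.
    weights: per-variant effect sizes (hypothetical values here).
    """
    pgs_t = 0.0
    pgs_nt = 0.0
    for hap_a, hap_b, transmitted in parents:
        for i, w in enumerate(weights):
            pair = (hap_a[i], hap_b[i])
            t = transmitted[i]
            pgs_t += w * pair[t]        # allele the child received
            pgs_nt += w * pair[1 - t]   # allele left behind in the parent
    return pgs_t, pgs_nt


# Toy data: three variants, both parents genotyped and phased.
weights = [0.5, -0.2, 0.1]
father = ([1, 0, 1], [0, 1, 1], [0, 0, 1])
mother = ([1, 1, 0], [0, 0, 0], [1, 0, 0])
pgs_t, pgs_nt = transmitted_and_nontransmitted_scores([father, mother], weights)
```

In this framing, an association between the nontransmitted score and the child's trait cannot run through the child's own genotype, which is what makes it a probe for genetic nurture.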
Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood where independent data is available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have maximum effect size at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.
Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs. Crossovers exhibit a mutagenic effect, with overrepresentation of DNMs within 1 kilobase of crossovers in males and females. In females, a higher mutation rate is observed up to 40 kilobases from crossovers, particularly for complex crossovers, which increase with maternal age. We identified 35 loci associated with the recombination rate or the location of crossovers, demonstrating extensive genetic control of meiotic recombination, and our results highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.
Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment.
We present KmerStream, a streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input, and the space requirements are logarithmic in the size of the input. We derive a simple model that allows us to estimate the error rate of the sequencing experiment, as well as the genome size, using only the aggregate statistics reported by KmerStream. As an application, we show how KmerStream can be used to compute the error rate of a DNA sequencing experiment. We run KmerStream on a set of 2656 whole genome sequenced individuals and compare the error rate to quality values reported by the sequencing equipment. We discover that while the quality values alone are largely reliable as a predictor of error rate, there is considerable variability in the error rates between sequencing runs, even when accounting for reported quality values.
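To make the quantities concrete, here is a minimal sketch that counts k-mers exactly and builds the frequency histogram mentioned above. It is not the KmerStream algorithm: KmerStream estimates these statistics in a streaming fashion with logarithmic space, whereas this illustration stores every k-mer in memory. The function names and toy reads are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Exact count of every k-mer (substring of length k) in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequency_histogram(counts):
    """Map each multiplicity f to the number of distinct k-mers seen f times."""
    return Counter(counts.values())


# Toy input: two short reads, k = 3.
reads = ["ACGTAC", "CGTA"]
counts = count_kmers(reads, 3)
distinct = len(counts)                  # 4 distinct 3-mers: ACG, CGT, GTA, TAC
histogram = frequency_histogram(counts) # two 3-mers seen once, two seen twice
```

In real sequencing data, k-mers seen only once are enriched for sequencing errors, which is why such aggregate statistics carry information about the error rate.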
The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions.
We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.
The source code of PopIns is available from http://github.com/bkehr/popins
birte.kehr@decode.is
Supplementary data are available at Bioinformatics online.
Max point-tolerance graphs Catanzaro, Daniele; Chaplick, Steven; Felsner, Stefan ...
Discrete Applied Mathematics, 01/2017, Volume: 216
Journal Article
Peer reviewed
Open access
A graph G is a max point-tolerance (MPT) graph if each vertex v of G can be mapped to a pointed-interval (I_v, p_v), where I_v is an interval of R and p_v ∈ I_v, such that uv is an edge of G iff I_u ∩ I_v ⊇ {p_u, p_v}. MPT graphs model relationships among DNA fragments in genome-wide association studies as well as basic transmission problems in telecommunications. We formally introduce this graph class, characterize it, study combinatorial optimization problems on it, and relate it to several well known graph classes. We characterize MPT graphs as a special case of several 2D geometric intersection graphs; namely, triangle, rectangle, L-shape, and line segment intersection graphs. We further characterize MPT graphs as having certain linear orders on their vertex set. Our last characterization is that MPT graphs are precisely obtained by intersecting special pairs of interval graphs. We also show that, on MPT graphs, the maximum weight independent set problem can be solved in polynomial time, the coloring problem is NP-complete, and the clique cover problem has a 2-approximation. Finally, we demonstrate several connections to known graph classes; e.g., MPT graphs strictly contain interval graphs and outerplanar graphs, but are incomparable to permutation, chordal, and planar graphs.
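The edge rule in the definition above can be checked directly from a pointed-interval representation. The following is a minimal sketch, assuming closed intervals given as (left, right, point) triples; the function name and the sample representation are hypothetical.

```python
from itertools import combinations


def mpt_edges(pointed_intervals):
    """Edges of the MPT graph defined by a pointed-interval representation.

    pointed_intervals: dict mapping each vertex to (left, right, point),
    with left <= point <= right (closed intervals are assumed).
    uv is an edge iff both points lie in the intersection of the intervals.
    """
    edges = set()
    for u, v in combinations(sorted(pointed_intervals), 2):
        lu, ru, pu = pointed_intervals[u]
        lv, rv, pv = pointed_intervals[v]
        lo, hi = max(lu, lv), min(ru, rv)  # intersection is [lo, hi] if lo <= hi
        if lo <= pu <= hi and lo <= pv <= hi:
            edges.add((u, v))
    return edges


# Toy representation: a and c overlap on [3, 4], but a's point (2) lies outside it.
representation = {"a": (0, 4, 2), "b": (1, 5, 3), "c": (3, 6, 5)}
edges = mpt_edges(representation)  # {("a", "b"), ("b", "c")}
```

The example shows how MPT graphs differ from interval graphs: overlapping intervals alone do not give an edge, since both distinguished points must fall inside the overlap.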
Bone mineral density (BMD) is a heritable complex trait used in the clinical diagnosis of osteoporosis and the assessment of fracture risk. We performed a meta-analysis of five genome-wide association studies of femoral neck and lumbar spine BMD in 19,195 subjects of Northern European descent. We identified 20 BMD loci that reached genome-wide significance (GWS; P < 5 × 10^-8), of which 13 map to regions not previously associated with this trait: 1p31.3 (GPR177), 2p21 (SPTBN1), 3p22 (CTNNB1), 4q21.1 (MEPE), 5q14 (MEF2C), 7p14 (STARD3NL), 7q21.3 (FLJ42280), 11p11.2 (LRP4, ARHGAP1, F2), 11p14.1 (DCDC5), 11p15 (SOX6), 16q24 (FOXL1), 17q21 (HDAC5) and 17q12 (CRHR1). The meta-analysis also confirmed at GWS level seven known BMD loci on 1p36 (ZBTB40), 6q25 (ESR1), 8q24 (TNFRSF11B), 11q13.4 (LRP5), 12q13 (SP7), 13q14 (TNFSF11) and 18q21 (TNFRSF11A). The many SNPs associated with BMD map to genes in signaling pathways with relevance to bone metabolism and highlight the complex genetic architecture that underlies osteoporosis and variation in BMD.
Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing them.
Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First, we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite- and individual-specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis, we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes.
Source code is available on GitHub: https://github.com/DecodeGenetics/popSTR.
snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.