Objective: Familial hypercholesterolemia (FH) is traditionally defined as a monogenic disease characterized by severely elevated LDL-C (low-density lipoprotein cholesterol) levels. In practice, FH is ...commonly a clinical diagnosis without confirmation of a causative mutation. In this study, we sought to characterize and compare monogenic and clinically defined FH in a large sample of Icelanders.
Approach and Results: We whole-genome sequenced 49 962 Icelanders and imputed the identified variants into an overall sample of 166 281 chip-genotyped Icelanders. We identified 20 FH mutations in LDLR, APOB, and PCSK9 with combined prevalence of 1 in 836. Monogenic FH was associated with severely elevated LDL-C levels and increased risk of premature coronary disease, aortic valve stenosis, and high burden of coronary atherosclerosis. We used a modified version of the Dutch Lipid Clinic Network criteria to screen for the clinical FH phenotype among living adult participants (N=79 058). Clinical FH was found in 2.2% of participants, of whom only 5.2% had monogenic FH. Mutation-negative clinical FH has a strong polygenic basis. Both individuals with monogenic FH and individuals with mutation-negative clinical FH were markedly undertreated with cholesterol-lowering medications and only a minority attained an LDL-C target of <2.6 mmol/L (<100 mg/dL; 11.0% and 24.9%, respectively) or <1.8 mmol/L (<70 mg/dL; 0.0% and 5.2%, respectively), as recommended for primary prevention by European Society of Cardiology/European Atherosclerosis Society cholesterol guidelines.
Conclusions: Clinically defined FH is a relatively common phenotype that is explained by monogenic FH in only a minority of cases. Both monogenic and clinical FH confer high cardiovascular risk but are markedly undertreated.
Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant type ...of variation in the human genome, after single nucleotide polymorphisms (SNPs) and Indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large due to a lack of tools capable of analyzing them.
Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes.
Source code is available on Github: https://github.com/DecodeGenetics/popSTR.
snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.
A major challenge to long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios ...that Ratatosk reduces the error rate of long reads 6-fold on average with a median error rate as low as 0.22 %. SNP calls in Ratatosk corrected reads are nearly 99 % accurate and indel calls accuracy is increased by up to 37 %. An assembly of Ratatosk corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and less misassemblies than a PacBio HiFi reads assembly.
Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively ...large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short-reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole-genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high-confidence.
Lipoprotein(a) Lp(a) is a causal risk factor for cardiovascular diseases that has no established therapy. The attribute of Lp(a) that affects cardiovascular risk is not established. Low levels of ...Lp(a) have been associated with type 2 diabetes (T2D).
This study investigated whether cardiovascular risk is conferred by Lp(a) molar concentration or apolipoprotein(a) apo(a) size, and whether the relationship between Lp(a) and T2D risk is causal.
This was a case-control study of 143,087 Icelanders with genetic information, including 17,715 with coronary artery disease (CAD) and 8,734 with T2D. This study used measured and genetically imputed Lp(a) molar concentration, kringle IV type 2 (KIV-2) repeats (which determine apo(a) size), and a splice variant in LPA associated with small apo(a) but low Lp(a) molar concentration to disentangle the relationship between Lp(a) and cardiovascular risk. Loss-of-function homozygotes and other subjects genetically predicted to have low Lp(a) levels were evaluated to assess the relationship between Lp(a) and T2D.
Lp(a) molar concentration was associated dose-dependently with CAD risk, peripheral artery disease, aortic valve stenosis, heart failure, and lifespan. Lp(a) molar concentration fully explained the Lp(a) association with CAD, and there was no residual association with apo(a) size. Homozygous carriers of loss-of-function mutations had little or no Lp(a) and increased the risk of T2D.
Molar concentration is the attribute of Lp(a) that affects risk of cardiovascular diseases. Low Lp(a) concentration (bottom 10%) increases T2D risk. Pharmacologic reduction of Lp(a) concentration in the 20% of individuals with the greatest concentration down to the population median is predicted to decrease CAD risk without increasing T2D risk.
Display omitted
Imprinting is the preferential expression of one parental allele over the other. It is controlled primarily through differential methylation of cytosine at CpG dinucleotides. Here we combine 285 ...methylomes and 11,617 transcriptomes from peripheral blood samples with parent-of-origin phased haplotypes, to produce a new map of imprinted methylation and gene expression patterns across the human genome. We demonstrate how imprinted methylation is a continuous rather than a binary characteristic. We describe at high resolution the parent-of-origin methylation pattern at the 15q11.2 Prader-Willi/Angelman syndrome locus, with nearly confluent stochastic paternal methylation punctuated by 'spikes' of maternal methylation. We find examples of polymorphic imprinted methylation unrelated (at VTRNA2-1 and PARD6G) or related (at CHRNE) to nearby SNP genotypes. We observe RNA isoform-specific imprinted expression patterns suggestive of a methylation-sensitive transcriptional elongation block. Finally, we gain new insights into parent-of-origin-specific effects on phenotypes at the DLK1/MEG3 and GNAS loci.
The corneal endothelium is vital for transparency and proper hydration of the cornea. Here, we conduct a genome-wide association study of corneal endothelial cell density (cells/mm
), coefficient of ...cell size variation (CV), percentage of hexagonal cells (HEX) and central corneal thickness (CCT) in 6,125 Icelanders and find associations at 10 loci, including 7 novel. We assess the effects of these variants on various ocular biomechanics such as corneal hysteresis (CH), as well as eye diseases such as glaucoma and corneal dystrophies. Most notably, an intergenic variant close to ANAPC1 (rs78658973A, frequency = 28.3%) strongly associates with decreased cell density and accounts for 24% of the population variance in cell density (β = -0.77 SD, P = 1.8 × 10
) and associates with increased CH (β = 0.19 SD, P = 2.6 × 10
) without affecting risk of corneal diseases and glaucoma. Our findings indicate that despite correlations between cell density and eye diseases, low cell density does not increase the risk of disease.
Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring ...trios we estimate 63.7 (95% CI: 61.9-65.4) microsatellite de novo mutations (mDNMs) per offspring per generation, excluding one bp repeats motifs (homopolymers) the estimate is 48.2 mDNMs (95% CI: 46.7-49.6). Paternal mDNMs occur at longer repeats than maternal ones, which are in turn larger with a mean size of 3.4 bp vs 3.1 bp for paternal ones. mDNMs increase by 0.97 (95% CI: 0.90-1.04) and 0.31 (95% CI: 0.25-0.37) per year of father's and mother's age at conception, respectively. Here, we find two independent coding variants that associate with the number of mDNMs transmitted to offspring; The minor allele of a missense variant (allele frequency (AF) = 1.9%) in MSH2, a mismatch repair gene, increases transmitted mDNMs from both parents (effect: 13.1 paternal and 7.8 maternal mDNMs). A synonymous variant (AF = 20.3%) in NEIL2, a DNA damage repair gene, increases paternally transmitted mDNMs (effect: 4.4 mDNMs). Thus, the microsatellite mutation rate in humans is in part under genetic control.
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic ...variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data
. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank
. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a ...median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.