Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits. We discovered an association of a rare deletion in PCSK9 with lower low-density lipoprotein (LDL) cholesterol levels, compared to the population average. We also discovered an association of a multiallelic SV in ACAN with height; we found 11 alleles that differed in the number of a 57-bp-motif repeat and observed a linear relationship between the number of repeats carried and height. These results show that SVs can be accurately characterized at the population scale using LRS data in a genome-wide non-targeted approach and demonstrate how SVs impact phenotypes.
A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.
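To illustrate what it means for possible haplotypes to be represented as graph paths, here is a toy sketch. It assumes a simple chain of "bubbles" in which each site lists its alternative alleles. This is not Graphtyper's data structure or alignment algorithm: the names `graph`, `haplotypes` and `supporting_paths` are hypothetical, and exact substring matching stands in for real read realignment.

```python
from itertools import product

# Toy variation graph (hypothetical): an ordered chain of sites. A one-element
# list is sequence shared by all haplotypes; a multi-element list is a variant
# site whose entries are the alternative alleles ("" = deleted sequence).
graph = [
    ["ACGT"],    # shared segment
    ["A", "G"],  # SNP
    ["TTC"],     # shared segment
    ["CA", ""],  # 2-bp deletion
    ["GG"],      # shared segment
]


def haplotypes(g):
    """Yield the sequence spelled by every source-to-sink path through the graph."""
    for choice in product(*g):
        yield "".join(choice)


def supporting_paths(g, read):
    """Return the paths whose sequence contains the read exactly (naive stand-in for alignment)."""
    return [h for h in haplotypes(g) if read in h]


all_paths = list(haplotypes(graph))
print(all_paths)
print(supporting_paths(graph, "GTTCGG"))  # a read spanning the SNP and the deletion
```

Enumerating every path grows exponentially with the number of variant sites, so a practical genotyper would align reads against the graph itself rather than list all haplotypes. The toy version only shows the one-path-per-haplotype idea, and how a single read can support a specific combination of alleles.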
Despite the important role that monozygotic twins have played in genetics research, little is known about their genomic differences. Here we show that monozygotic twins differ on average by 5.2 early developmental mutations and that approximately 15% of monozygotic twins have a substantial number of these early developmental mutations specific to one of them. Using the parents and offspring of twins, we identified pre-twinning mutations. We observed instances where a twin was formed from a single cell lineage in the pre-twinning cell mass and instances where a twin was formed from several cell lineages. CpG>TpG mutations increased in frequency with embryonic development, coinciding with an increase in DNA methylation. Our results indicate that allocations of cells during development shape genomic differences between monozygotic twins.
Sequence variants affecting blood lipids and coronary artery disease (CAD) may enhance understanding of the atherogenicity of lipid fractions. Using a large resource of whole-genome sequence data, we examined rare and low-frequency variants for association with non-HDL cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides in up to 119,146 Icelanders. We discovered 13 variants with large effects (within ANGPTL3, APOB, ABCA1, NR1H3, APOA1, LIPC, CETP, LDLR, and APOC1) and replicated 14 variants. Five variants within PCSK9, APOA1, ANGPTL4, and LDLR associate with CAD (33,090 cases and 236,254 controls). We used genetic risk scores for the lipid fractions to examine their causal relationship with CAD. The non-HDL cholesterol genetic risk score associates most strongly with CAD (P = 2.7 × 10(-28)), and no other genetic risk score associates with CAD after accounting for non-HDL cholesterol. The genetic risk score for non-HDL cholesterol confers CAD risk beyond that of LDL cholesterol (P = 5.5 × 10(-8)), suggesting that targeting atherogenic remnant cholesterol may reduce cardiovascular risk.
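A genetic risk score for a lipid fraction is commonly computed as a weighted sum of allele dosages, with each variant weighted by its estimated effect on that fraction. The sketch below assumes that common definition; it is not necessarily the exact construction used in the study. The variant identifiers and weights are hypothetical placeholders, not values from the study.

```python
def genetic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted allele-count score: sum over scored variants of effect size x dosage.

    dosages: effect-allele count (0, 1, 2, or an imputed value in between) per variant.
    weights: per-variant effect estimates on the lipid fraction (hypothetical here).
    Raises KeyError if a scored variant is missing from dosages.
    """
    return sum(w * dosages[variant] for variant, w in weights.items())


# Hypothetical weights for one lipid fraction and one individual's dosages.
nonhdl_weights = {"var_1": 0.20, "var_2": -0.10, "var_3": 0.05}
person = {"var_1": 2, "var_2": 1, "var_3": 0}
score = genetic_risk_score(person, nonhdl_weights)
print(round(score, 3))
```

Scores built this way for each lipid fraction can then be entered together as predictors of CAD. That is how one can ask whether a score still associates with CAD after accounting for another, as the abstract does for non-HDL and LDL cholesterol.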
De novo mutations (DNMs) cause a large proportion of severe rare diseases of childhood. DNMs that occur early may result in mosaicism of both somatic and germ cells. Such early mutations can cause recurrence of disease. We scanned 1,007 sibling pairs from 251 families and identified 878 DNMs shared by siblings (ssDNMs) at 448 genomic sites. We estimated DNM recurrence probability based on parental mosaicism, sharing of DNMs among siblings, parent-of-origin, mutation type and genomic position. We detected 57.2% of ssDNMs in the parental blood. The recurrence probability of a DNM decreases by 2.27% per year for paternal DNMs and 1.78% per year for maternal DNMs. Maternal ssDNMs are more likely to be T>C mutations than paternal ssDNMs, and less likely to be C>T mutations. Depending on the properties of the DNM, the recurrence probability ranges from 0.011% to 28.5%. We have launched an online calculator to allow estimation of DNM recurrence probability for research purposes.
Glaucoma is a leading cause of irreversible blindness. A genome-wide search yielded multiple single-nucleotide polymorphisms (SNPs) in the 15q24.1 region associated with glaucoma. Further investigation revealed that the association is confined to exfoliation glaucoma (XFG). Two nonsynonymous SNPs in exon 1 of the gene LOXL1 explain the association, and the data suggest that they confer risk of XFG mainly through exfoliation syndrome (XFS). About 25% of the general population is homozygous for the highest-risk haplotype, and their risk of suffering from XFG is more than 100 times that of individuals carrying only low-risk haplotypes. The population-attributable risk is more than 99%. The product of LOXL1 catalyzes the formation of elastin fibers found to be a major component of the lesions in XFG.
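Why a common high-risk genotype can drive the population-attributable risk (PAR) close to 100% can be sketched with the standard formula for a multi-category exposure:

PAR = 1 − 1/Σᵢ pᵢRRᵢ

Here pᵢ is the population frequency of genotype class i, and RRᵢ is its relative risk against the reference class (here, carriers of only low-risk haplotypes, with RR = 1). The class frequencies and relative risks below are invented for illustration and are not the study's estimates. Treating odds ratios as relative risks is a further assumption that holds only approximately, for rare diseases.

```python
def population_attributable_risk(groups):
    """PAR = 1 - 1 / sum(p_i * RR_i) over all exposure classes, reference included.

    groups: iterable of (population_frequency, relative_risk) pairs, with
    relative risk measured against the reference class (RR = 1).
    """
    groups = list(groups)
    if abs(sum(p for p, _ in groups) - 1.0) > 1e-9:
        raise ValueError("class frequencies must sum to 1")
    mean_rr = sum(p * rr for p, rr in groups)
    return 1.0 - 1.0 / mean_rr


# Illustrative (made-up) genotype classes: (frequency, relative risk).
classes = [
    (0.02, 1.0),    # carries only low-risk haplotypes (reference)
    (0.73, 30.0),   # intermediate-risk genotypes
    (0.25, 100.0),  # homozygous for the highest-risk haplotype
]
par = population_attributable_risk(classes)
print(f"{par:.3f}")
```

The formula shows that when the reference class is rare and most people carry genotypes with large relative risks, Σᵢ pᵢRRᵢ is large and the PAR approaches 1.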
Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10(…), odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.
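The linkage-disequilibrium threshold used to link NRNR variants to GWAS catalog markers is the squared allelic correlation between two loci. A minimal sketch, assuming phased haplotypes with alleles coded 0/1, computes it as:

r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A·p_B

The function name and toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) pairs, each 0 or 1,
    one pair per phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                       # freq of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n                       # freq of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # freq of 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Toy haplotypes: the two alleles almost always travel together.
toy = [(1, 1)] * 8 + [(1, 0)] * 1 + [(0, 0)] * 11
r2 = ld_r2(toy)
print(f"r^2 = {r2:.3f}; passes 0.8 threshold: {r2 > 0.8}")
```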
Effects of susceptibility variants may depend on the parent from whom they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide association studies, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five of them (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.
Mycobacterium tuberculosis infections cause 9 million new tuberculosis cases and 1.5 million deaths annually. To identify variants conferring risk of tuberculosis, we tested 28.3 million variants identified through whole-genome sequencing of 2,636 Icelanders for association with tuberculosis (8,162 cases and 277,643 controls), pulmonary tuberculosis (PTB) and M. tuberculosis infection. We found associations with three variants in the region harboring genes encoding the class II human leukocyte antigens (HLAs). Two of them are located between HLA-DQA1 and HLA-DRB1: rs557011T (minor allele frequency (MAF) = 40.2%), associated with M. tuberculosis infection (odds ratio (OR) = 1.14, P = 3.1 × 10(-13)) and PTB (OR = 1.25, P = 5.8 × 10(-12)), and rs9271378G (MAF = 32.5%), associated with PTB (OR = 0.78, P = 2.5 × 10(-12)). The third is a missense variant encoding p.Ala210Thr in HLA-DQA1 (MAF = 19.1%, rs9272785), associated with M. tuberculosis infection (OR = 1.14, P = 9.3 × 10(-9)). We replicated association of these variants with PTB in samples of European ancestry from Russia and Croatia (P < 5.9 × 10(-4)). These findings show that the HLA class II region contributes to genetic risk of tuberculosis, possibly through reduced presentation of protective M. tuberculosis antigens to T cells.
Marfan syndrome (MFS) is an autosomal dominant condition characterized by aortic aneurysm, skeletal abnormalities, and lens dislocation, and is caused by variants in the FBN1 gene. To explore causes of MFS and the prevalence of the disease in Iceland, we collected information from all living individuals with a clinical diagnosis of MFS in Iceland (n = 32) and performed whole-genome sequencing of those who did not have a confirmed genetic diagnosis (27/32). Moreover, to assess potential underdiagnosis of MFS in Iceland, we attempted a genotype-based approach to identify individuals with MFS. We interrogated deCODE genetics' database of 35,712 whole-genome sequenced individuals to search for rare sequence variants in FBN1. Overall, we identified 15 pathogenic or likely pathogenic variants in FBN1 in 44 individuals, only 22 of whom were previously diagnosed with MFS. The most common of these variants, NM_000138.4:c.8038C>T p.(Arg2680Cys), is present in a multi-generational pedigree and was found to stem from a single forefather born around 1840. The p.(Arg2680Cys) variant associates with a form of MFS that seems to have an enrichment of abdominal aortic aneurysm, suggesting that this may be a particularly common feature of p.(Arg2680Cys)-associated MFS. Based on these combined genetic and clinical data, we show that MFS prevalence in Iceland could be as high as 1/6,600, compared to 1/10,000 based on clinical diagnosis alone, which indicates underdiagnosis of this actionable genetic disorder.