Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10-261).
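The definition of a complete knockout (homozygote or compound heterozygote for rare loss-of-function alleles) can be made concrete with phased data: a gene is knocked out when each of the two haplotypes carries at least one loss-of-function allele in that gene with MAF below 2%. The sketch below assumes haplotypes are given as sets of variant IDs and that each variant is already annotated with its gene and MAF; the names `LofVariant`, `knocked_out_genes`, and the gene/variant IDs are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass

MAF_THRESHOLD = 0.02  # rare loss-of-function cutoff from the abstract (MAF < 2%)


@dataclass(frozen=True)
class LofVariant:
    gene: str   # gene the loss-of-function variant disrupts
    maf: float  # minor allele frequency in the population


def knocked_out_genes(hap1, hap2, variants):
    """Return genes in which BOTH phased haplotypes carry a rare LoF allele.

    hap1, hap2: sets of variant IDs carried on each haplotype.
    variants:   dict mapping variant ID -> LofVariant annotation.
    """
    def rare_lof_genes(hap):
        return {variants[v].gene for v in hap
                if v in variants and variants[v].maf < MAF_THRESHOLD}

    # The same variant on both haplotypes is a homozygote; different variants
    # in one gene on opposite haplotypes is a compound heterozygote. Two LoF
    # alleles in cis (same haplotype) leave the other copy intact, so they
    # do not count.
    return rare_lof_genes(hap1) & rare_lof_genes(hap2)


variants = {
    "v1": LofVariant("GENE_A", 0.005),
    "v2": LofVariant("GENE_A", 0.010),
    "v3": LofVariant("GENE_B", 0.001),
    "v4": LofVariant("GENE_C", 0.150),  # too common to count as rare
}
print(knocked_out_genes({"v1", "v3", "v4"}, {"v2", "v4"}, variants))  # {'GENE_A'}
```

Phasing matters here: without it, an individual carrying `v1` and `v2` cannot be distinguished from a carrier of both alleles in cis, who still has one functional copy.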
A fundamental requirement for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to use this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in the whole genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in both small and population-scale sequencing studies.
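The idea of a pangenome in which haplotypes are graph paths can be illustrated with a toy graph: invariant reference segments alternate with variant sites ("bubbles"), and choosing one allele per site spells one path. This is a conceptual sketch only, not Graphtyper's data structure or aligner; the graph layout and the functions `enumerate_paths` and `supporting_paths` are hypothetical, and exact substring matching stands in for real read alignment.

```python
from itertools import product


def enumerate_paths(graph):
    """Yield (allele_choice, sequence) for every path through a toy graph.

    graph: list of sites; each site is a list of alternative sequences.
    A site with a single entry is invariant reference sequence, and an
    empty string encodes a deletion allele.
    """
    for alleles in product(*(range(len(site)) for site in graph)):
        seq = "".join(site[i] for site, i in zip(graph, alleles))
        yield alleles, seq


def supporting_paths(graph, read):
    """Return allele choices of all paths that contain the read exactly."""
    return [alleles for alleles, seq in enumerate_paths(graph) if read in seq]


# Reference ACGT, a SNP (A/G), TTC, an insertion ("" or CA), then GGA.
graph = [["ACGT"], ["A", "G"], ["TTC"], ["", "CA"], ["GGA"]]
print(supporting_paths(graph, "GTTCCA"))  # [(0, 1, 0, 1, 0)]
```

A read spanning two variant sites supports only the path carrying both alternate alleles, which is how a graph can resolve nearby variants jointly. Enumerating every path grows exponentially with the number of sites, so practical graph genotypers align reads to the graph directly rather than listing haplotypes.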
Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
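One standard way to quantify excess homozygosity at a single biallelic site is the inbreeding coefficient F = 1 − H_obs / H_exp, where H_exp = 2p(1 − p) is the heterozygosity expected under Hardy-Weinberg equilibrium. The sketch below assumes genotypes coded as alternate-allele counts; it illustrates the general measure and is not necessarily the statistic used in the study, and `inbreeding_coefficient` is a hypothetical name.

```python
def inbreeding_coefficient(genotypes):
    """Estimate F = 1 - H_obs / H_exp at one biallelic site.

    genotypes: list of alternate-allele counts (0, 1 or 2), one per person.
    F > 0 means fewer heterozygotes (more homozygotes) than Hardy-Weinberg
    equilibrium predicts.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)          # alternate allele frequency
    h_exp = 2 * p * (1 - p)               # expected heterozygosity under HWE
    if h_exp == 0:
        raise ValueError("monomorphic site: expected heterozygosity is zero")
    h_obs = sum(1 for g in genotypes if g == 1) / n
    return 1 - h_obs / h_exp


# 40 hom-ref, 40 het, 20 hom-alt: p = 0.4, H_exp = 0.48, H_obs = 0.40
print(round(inbreeding_coefficient([0] * 40 + [1] * 40 + [2] * 20), 4))  # 0.1667
```

Averaged over many sites, a positive F is one signature of a population with elevated homozygosity.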
Sequence variants affecting blood lipids and coronary artery disease (CAD) may enhance understanding of the atherogenicity of lipid fractions. Using a large resource of whole-genome sequence data, we examined rare and low-frequency variants for association with non-HDL cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides in up to 119,146 Icelanders. We discovered 13 variants with large effects (within ANGPTL3, APOB, ABCA1, NR1H3, APOA1, LIPC, CETP, LDLR, and APOC1) and replicated 14 variants. Five variants within PCSK9, APOA1, ANGPTL4, and LDLR associate with CAD (33,090 cases and 236,254 controls). We used genetic risk scores for the lipid fractions to examine their causal relationship with CAD. The non-HDL cholesterol genetic risk score associates most strongly with CAD (P = 2.7 × 10⁻²⁸), and no other genetic risk score associates with CAD after accounting for non-HDL cholesterol. The genetic risk score for non-HDL cholesterol confers CAD risk beyond that of LDL cholesterol (P = 5.5 × 10⁻⁸), suggesting that targeting atherogenic remnant cholesterol may reduce cardiovascular risk.
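A genetic risk score for a lipid fraction is commonly built as a weighted sum of effect-allele dosages, GRS = Σ_j w_j d_j, with each weight w_j taken from the variant's estimated per-allele effect on that fraction. The sketch below shows only that scoring step, assuming dosages and weights keyed by variant ID; the variant IDs and weights are hypothetical, and the study's subsequent association testing against CAD is not reproduced.

```python
def genetic_risk_score(dosages, weights):
    """Weighted allele-dosage score: sum over variants of weight * dosage.

    dosages: dict variant ID -> (imputed) dosage of the effect allele, 0.0-2.0.
    weights: dict variant ID -> per-allele effect on the lipid fraction.
    Raises KeyError if a weighted variant has no dosage for this person.
    """
    return sum(w * dosages[v] for v, w in weights.items())


weights = {"varA": 0.10, "varB": -0.05, "varC": 0.20}   # hypothetical effects
person = {"varA": 2.0, "varB": 1.0, "varC": 0.5}        # hypothetical dosages
print(round(genetic_risk_score(person, weights), 3))  # 0.25
```

Separate scores (for example for non-HDL and LDL cholesterol) can then be entered jointly into a model of disease risk, which is the kind of comparison the abstract describes.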
We tested 16 million SNPs, identified through whole-genome sequencing of 457 Icelanders, for association with gout and serum uric acid levels. Genotypes were imputed into 41,675 chip-genotyped Icelanders and their relatives, giving effective sample sizes of 968 individuals with gout and 15,506 individuals with serum uric acid measurements. We identified a low-frequency missense variant (c.1580C>G) in ALDH16A1 associated with gout (OR = 3.12, P = 1.5 × 10⁻¹⁶, at-risk allele frequency = 0.019) and serum uric acid levels (effect = 0.36 s.d., P = 4.5 × 10⁻²¹). We confirmed the association with gout by performing Sanger sequencing on 6,017 Icelanders. The association with gout was stronger in males than in females. We also found a second variant on chromosome 1 associated with gout (OR = 1.92, P = 0.046, at-risk allele frequency = 0.986) and serum uric acid levels (effect = 0.48 s.d., P = 4.5 × 10⁻¹⁶). This variant is close to a common variant previously associated with serum uric acid levels. This work illustrates how whole-genome sequencing data allow the detection of associations between low-frequency variants and complex traits.
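The odds ratios reported above compare the odds of carrying the at-risk allele in cases versus controls. In its simplest unadjusted form this is an allelic odds ratio from a 2×2 table of allele counts, with a Wald 95% confidence interval on the log scale. The sketch below assumes all four counts are nonzero; the counts are made up for illustration, and the study's own analysis (imputed genotypes, related individuals, covariates) is not reproduced here.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio and 95% Wald CI from a 2x2 table of allele counts.

    case_risk / case_other: at-risk and other allele counts in cases.
    ctrl_risk / ctrl_other: at-risk and other allele counts in controls.
    All counts must be positive (a zero cell makes the log OR undefined).
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR): sqrt of the sum of reciprocal cell counts.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    low = math.exp(math.log(odds_ratio) - 1.96 * se)
    high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, (low, high)


odds_ratio, (low, high) = allelic_odds_ratio(30, 170, 20, 180)  # hypothetical counts
print(round(odds_ratio, 2))  # 1.59
```

For a low-frequency allele such as the one in ALDH16A1, the risk-allele cells are small, so the interval is wide unless the sample is large; this is one reason imputation into tens of thousands of genotyped individuals adds power.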
We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip-genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X, and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all of the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed in either dbSNP (build 137) or the Exome Sequencing Project (ESP), whereas in coding regions, the most heavily studied parts of the genome, only 60% of rare (DAF < 0.5%) SNPs and 20% of rare indels have been observed in these public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to those in published reports.
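The transition/transversion (Ts/Tv) ratio mentioned above is a routine quality check on a SNP call set. Transitions are purine-purine (A↔G) or pyrimidine-pyrimidine (C↔T) changes; every other single-base substitution is a transversion. The sketch below assumes SNPs are supplied as (ref, alt) base pairs; `ts_tv_ratio` is a hypothetical helper, not part of any specific toolkit.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # A<->G and C<->T


def ts_tv_ratio(snps):
    """Return the transition/transversion ratio of biallelic SNPs.

    snps: iterable of (ref, alt) single-base alleles, e.g. ("A", "G").
    Raises ValueError for non-SNP or non-ACGT alleles and
    ZeroDivisionError if the set contains no transversions.
    """
    ts = tv = 0
    for ref, alt in snps:
        pair = frozenset((ref.upper(), alt.upper()))
        if len(pair) != 2 or not pair <= set("ACGT"):
            raise ValueError(f"not a biallelic SNP: {ref}>{alt}")
        if pair in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv


calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(calls))  # 1.5
```

Because random sequencing errors tend to produce transversions more often than true variation does, a whole-genome call set whose Ts/Tv falls well below the commonly cited value of roughly 2.0 to 2.1 often signals false positives.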
Understanding sequence diversity is the cornerstone of the analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders, whom we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. We addressed this issue by using the segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high-quality DNMs. Furthermore, we used our extended family structure and read-pair tracing of DNMs to a panel of phased SNPs to determine the parent of origin of 42,961 DNMs.
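The logic of a transmission assay can be sketched in a simplified form: a candidate DNM is a heterozygous call in the proband whose parents are both homozygous reference, and in a three-generation family a true germline DNM should also appear in some of the proband's children, whereas a calling artifact in the proband should not segregate. The sketch below is a hypothetical simplification (hard genotype calls, a spouse who must lack the allele, "transmitted to at least one child" as the acceptance rule); the study's actual assay and thresholds are not reproduced, and `Trio`, `is_candidate_dnm`, `transmission_supported`, and `miss_probability` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Trio:
    child: int   # alt-allele count (0, 1, 2) in the proband
    father: int  # alt-allele count in the proband's father
    mother: int  # alt-allele count in the proband's mother


def is_candidate_dnm(trio):
    """Mendelian-inconsistent call: proband het, both parents hom-ref."""
    return trio.child == 1 and trio.father == 0 and trio.mother == 0


def transmission_supported(trio, spouse, grandchildren):
    """Accept a candidate only if the proband passed the allele on.

    spouse: alt-allele count in the proband's partner; must be 0 so that a
    heterozygous child can only have inherited the allele from the proband.
    grandchildren: alt-allele counts in the proband's children.
    """
    if not is_candidate_dnm(trio) or spouse != 0:
        return False
    return any(g == 1 for g in grandchildren)


def miss_probability(n_children):
    """Chance that a true heterozygous germline DNM reaches none of n children."""
    return 0.5 ** n_children


print(transmission_supported(Trio(child=1, father=0, mother=0), 0, [0, 1, 0]))  # True
print(miss_probability(3))  # 0.125
```

This rule trades sensitivity for specificity: a real DNM is missed with probability 0.5^n when the proband has n children, so such a filter is best used to estimate and control the false positive rate rather than to enumerate every mutation.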