Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to "phase 3 finished" status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides "lift-over" co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads.
We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/.
PBHoney implements two variant-identification approaches that exploit the high mappability of long reads, and it is shown to be effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the presence of Rac-Phage in the isolate and evidence of E. coli's circular genome.
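The soft-clipped tails mentioned above can be illustrated with a small sketch. This is not PBHoney's implementation; it only shows how the lengths of soft-clipped read ends could be read from a SAM CIGAR string. The function name `soft_clipped_tails` and the `min_tail` threshold of 200 bp are hypothetical choices made for illustration.

```python
import re

# A CIGAR string is a run of <count><operation> pairs (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_tails(cigar, min_tail=200):
    """Return (left, right) soft-clip lengths; tails shorter than min_tail become 0.

    min_tail is a hypothetical cutoff, not a value taken from PBHoney.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (left if left >= min_tail else 0,
            right if right >= min_tail else 0)
```

A read with a long unaligned tail on one side, for example `soft_clipped_tails("350S4000M50S")`, reports only the 350 bp left tail, which is the kind of read a tail-based caller would then try to re-map elsewhere in the reference.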
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
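The idea of loading only local segments of the genotype matrix while handling several phenotypes in parallel can be sketched as follows. This is a simplified illustration and not REGENIE's code: it fits a single ridge regression per block with one fixed penalty, and it omits cross-validation, any second-level model, and the Firth test. The names `blockwise_ridge_predictions`, `load_block`, and `lam` are assumptions introduced here.

```python
import numpy as np

def blockwise_ridge_predictions(load_block, n_blocks, Y, lam=1.0):
    """Fit ridge regression per genotype block for all phenotypes at once.

    load_block(b) is a hypothetical callback returning the
    (n_samples x block_size) genotype segment for block b, so only one
    segment is in memory at a time. Y is (n_samples x n_phenotypes).
    Returns an array of shape (n_blocks, n_samples, n_phenotypes).
    """
    preds = []
    for b in range(n_blocks):
        G = np.asarray(load_block(b), dtype=float)
        G = G - G.mean(axis=0)                  # centre each variant
        A = G.T @ G + lam * np.eye(G.shape[1])  # ridge normal equations
        beta = np.linalg.solve(A, G.T @ Y)      # one solve covers every phenotype
        preds.append(G @ beta)
    return np.stack(preds)
```

Because the matrix `A` depends only on the genotypes, a single factorisation per block serves every column of `Y`, which is one way a multi-trait analysis can share work across phenotypes.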
Particulate matter (PM) pollution from China is transported eastward to Korea and Japan and has been suggested to influence surface air quality on the West Coast of the United States. However, remote sensing studies have been inconclusive as to recent trends in Chinese emissions and transport. We reconciled different passive remote sensing points of view and found that while aerosol optical thickness (AOT) as an indicator of particulate pollution has increased from the start of the observation period (2000) to 2006–2007 from the main Chinese coastal outflow regions, since then there has been a 10–20% decrease in AOT (with respect to 2007). Reductions were observed in spring, summer, and fall seasons. No improvement in exported PM pollution is found for the winter season.
Key Points
Following an increase in AOT from 2000 to 2007, an ~10–20% decrease is found for AOT exiting the east coast of China from 2008 to 2015
Decreases in AOT are found for spring, summer, and fall seasons. No change is detected for the winter season
Fine‐mode aerosols dominate the detected changes in summer and fall. A decrease in coarse‐mode AOT is also found for spring
Plain Language Summary
Particulate matter pollution from China is transported eastward to Korea and Japan and has been suggested to influence surface air quality on the West Coast of the United States. However, remote sensing studies have been inconclusive as to recent trends in Chinese emissions and transport. We reconciled different passive remote sensing points of view and found that while aerosol optical thickness (AOT) as an indicator of particulate pollution has increased to 2006–2007 in the main exit regions of China's coast, since then there has been a 10–20% decrease in AOT (with respect to 2007). Reductions were observed in spring, summer, and fall seasons. No improvement in exported particulate matter pollution is found for the winter season.
Currently, the Moderate‐resolution Imaging Spectroradiometers (MODIS) level II aerosol product (MOD04/MYD04) is the best aerosol optical depth product suitable for near‐real‐time aerosol data assimilation. However, a careful analysis of biases and error variances in the MOD04/MYD04 aerosol optical depth product is necessary before implementing the MODIS aerosol product in aerosol forecasting applications. Using 1 year's worth of Sun photometer and MOD04/MYD04 aerosol optical depth (τ) data over global oceans, we studied the major biases in the MODIS over‐ocean aerosol product due to wind speed, cloud contamination, and aerosol microphysical properties. For τ less than 0.6, we found similar uncertainties in the mean MOD04/MYD04 τ as suggested by the MODIS aerosol group, while biases are nonlinear for τ larger than 0.6. We showed that uncertainties in MOD04/MYD04 data can be reduced, and the correlation between MODIS and Sun photometer τ can be improved by reducing the systematic biases in MOD04/MYD04 data through empirical corrections and quality assurance procedures. By removing noise and outliers and ensuring that only the highest‐quality data were included, we created a modified aerosol optical depth product that removes most massive outliers and ultimately reduced the absolute error (MODIS–Sun photometer) in MODIS τ at 0.55 μm (τ0.55) by 10–20%. Averaged over 1 year's worth of Terra MODIS aerosol product over global oceans, we found a 12% reduction in MODIS τ0.55 with extremes of 30% over the southern midlatitudes and the North Pacific due to a reduction in cloud contamination. This modified aerosol optical depth product will be used operationally.
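The reduction in absolute error (MODIS minus Sun photometer) after quality screening can be expressed as a short calculation. The sketch below is illustrative only: the function names and the boolean `keep` mask are assumptions, and it does not reproduce the empirical corrections or the quality assurance criteria used in the study.

```python
import numpy as np

def mean_absolute_error(modis_tau, sunphot_tau):
    """Mean |MODIS tau - Sun photometer tau| over collocated pairs."""
    diff = np.asarray(modis_tau, dtype=float) - np.asarray(sunphot_tau, dtype=float)
    return float(np.mean(np.abs(diff)))

def error_reduction_percent(modis_tau, sunphot_tau, keep):
    """Percent drop in mean absolute error when only keep==True pairs are retained.

    keep is a hypothetical boolean mask standing in for a quality screen.
    """
    modis_tau = np.asarray(modis_tau, dtype=float)
    sunphot_tau = np.asarray(sunphot_tau, dtype=float)
    keep = np.asarray(keep, dtype=bool)
    before = mean_absolute_error(modis_tau, sunphot_tau)
    after = mean_absolute_error(modis_tau[keep], sunphot_tau[keep])
    return 100.0 * (before - after) / before
```

Applied to a year of collocated retrievals, the same two functions would give the before and after error for any candidate screening rule, which makes it easy to compare rules on equal terms.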
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Familial hypercholesterolemia (FH) remains underdiagnosed despite widespread cholesterol screening. Exome sequencing and electronic health record (EHR) data of 50,726 individuals were used to assess the prevalence and clinical impact of FH-associated genomic variants in the Geisinger Health System. The estimated FH prevalence was 1:256 in unselected participants and 1:118 in participants ascertained via the cardiac catheterization laboratory. FH variant carriers had significantly increased risk of coronary artery disease. Only 24% of carriers met EHR-based presequencing criteria for probable or definite FH diagnosis. Active statin use was identified in 58% of carriers; 46% of statin-treated carriers had a low-density lipoprotein cholesterol level below 100 mg/dl. Thus, we find that genomic screening can prompt the diagnosis of FH patients, most of whom are receiving inadequate lipid-lowering therapy.
Higher-than-normal levels of circulating triglycerides are a risk factor for ischemic cardiovascular disease. Activation of lipoprotein lipase, an enzyme that is inhibited by angiopoietin-like 4 (ANGPTL4), has been shown to reduce levels of circulating triglycerides.
We sequenced the exons of ANGPTL4 in samples obtained from 42,930 participants of predominantly European ancestry in the DiscovEHR human genetics study. We performed tests of association between lipid levels and the missense E40K variant (which has been associated with reduced plasma triglyceride levels) and other inactivating mutations. We then tested for associations between coronary artery disease and the E40K variant and other inactivating mutations in 10,552 participants with coronary artery disease and 29,223 controls. We also tested the effect of a human monoclonal antibody against ANGPTL4 on lipid levels in mice and monkeys.
We identified 1661 heterozygotes and 17 homozygotes for the E40K variant and 75 participants who had 13 other monoallelic inactivating mutations in ANGPTL4. The levels of triglycerides were 13% lower and the levels of high-density lipoprotein (HDL) cholesterol were 7% higher among carriers of the E40K variant than among noncarriers. Carriers of the E40K variant were also significantly less likely than noncarriers to have coronary artery disease (odds ratio, 0.81; 95% confidence interval, 0.70 to 0.92; P=0.002). K40 homozygotes had markedly lower levels of triglycerides and higher levels of HDL cholesterol than did heterozygotes. Carriers of other inactivating mutations also had lower triglyceride levels and higher HDL cholesterol levels and were less likely to have coronary artery disease than were noncarriers. Monoclonal antibody inhibition of Angptl4 in mice and monkeys reduced triglyceride levels.
Carriers of E40K and other inactivating mutations in ANGPTL4 had lower levels of triglycerides and a lower risk of coronary artery disease than did noncarriers. The inhibition of Angptl4 in mice and monkeys also resulted in corresponding reductions in these values. (Funded by Regeneron Pharmaceuticals.)
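An odds ratio with a 95% confidence interval, as reported above for coronary artery disease, can be computed from a 2×2 carrier-by-disease table. The sketch below uses the standard unadjusted Wald interval on the log odds ratio. It is a simplification: the abstract does not give the underlying counts or say how the published estimate was modelled, so the function name and any counts passed to it are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers with disease      b: carriers without disease
    c: noncarriers with disease   d: noncarriers without disease
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An odds ratio below 1 whose upper bound is also below 1, as in the reported 0.81 (0.70 to 0.92), indicates that carriers had lower odds of disease than noncarriers at that confidence level.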
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Hegel’s critique of Early German Romanticism and its theory of irony resonates to the core of his own philosophy in the same way that Plato’s polemics with the Sophists have repercussions that go to the centre of his thought. The Anti-Romantic examines Hegel’s critique of Fr. Schlegel, Novalis and Schleiermacher. Hegel rarely mentions these thinkers by name and the texts dealing with them often exist on the periphery of his oeuvre. Nonetheless, individually, they represent embodiments of specific forms of irony: Schlegel, a form of critical individuality; Novalis, a form of sentimental nihilism; Schleiermacher, a monstrous hybrid of the other two. The strength of Hegel’s polemical approach to these authors shows how irony itself represents for him a persistent threat to his own idea of systematic Science. This is so, we discover, because Romantic irony is more than a rival ideology; it is an actual form of discourse, one whose performative objectivity interferes with the objectivity of Hegel’s own logos. Thus, Hegel’s critique of irony allows us to reciprocally uncover a Hegelian theory of scientific discourse. Far from seeing irony as a form of consciousness overcome by Spirit, Hegel sees it as having become a pressing feature of his own contemporary world, as witnessed in the popularity of his Berlin rival, Schleiermacher. Finally, to the extent that ironic discourse seems, for Hegel, to imply a certain world beyond his own notion of modernity, we are left with the hypothesis that Hegel’s critique of irony may be viewed as a critique of post-modernity.