While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, ...recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.
Abstract
Motivation
The COVID-19 pandemic has ignited a broad scientific interest in viral research in general and coronavirus research in particular. The identification and characterization of viral ...species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers often are not able to assemble many viruses (including coronaviruses) into a single contig. Coverage variation between datasets and within dataset, presence of close strains, splice variants and contamination set a high bar for assemblers to process viral datasets with diverse properties.
Results
We developed coronaSPAdes, a novel assembler for RNA viral species recovery in general and coronaviruses in particular. coronaSPAdes leverages the knowledge about viral genome structures to improve assembly extending ideas initially implemented in biosyntheticSPAdes. We have shown that coronaSPAdes outperforms existing SPAdes modes and other popular short-read metagenome and viral assemblers in the recovery of full-length RNA viral genomes.
Availability and implementation
coronaSPAdes version used in this article is a part of SPAdes 3.15 release and is freely available at http://cab.spbu.ru/software/spades.
Supplementary information
Supplementary data are available at Bioinformatics online.
A recently published article in BMCGenomics by Fuentes-Trillo et al. contains a comparison of assembly approaches of several noroviral samples via different tools and preprocessing strategies. It ...turned out that the study used outdated versions of tools as well as tools that were not designed for the viral assembly task. In order to improve the suboptimal assemblies, authors suggested different sophisticated preprocessing strategies that seem to make only minor contributions to the results. We have reproduced the analysis using state-of-the-art tools designed for viral assembly, and we demonstrate that tools from the SPAdes toolkit (rnaviralSPAdes and coronaSPAdes) allow one to assemble the samples from the original study into a single contig without any additional preprocessing.
In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused ...by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling. We applied these methods to clinical specimens gathered from 669 patients in New York City during the first two months of the outbreak, yielding a broad molecular portrait of the emerging COVID-19 disease. We find significant enrichment of a NYC-distinctive clade of the virus (20C), as well as host responses in interferon, ACE, hematological, and olfaction pathways. In addition, we use 50,821 patient records to find that renin-angiotensin-aldosterone system inhibitors have a protective effect for severe COVID-19 outcomes, unlike similar drugs. Finally, spatial transcriptomic data from COVID-19 patient autopsy tissues reveal distinct ACE2 expression loci, with macrophage and neutrophil infiltration in the lungs. These findings can inform public health and may help develop and drive SARS-CoV-2 diagnostic, prevention, and treatment strategies.
Background
A definitive diagnosis of prostate cancer requires a biopsy to obtain tissue for pathologic analysis, but this is an invasive procedure and is associated with complications.
Purpose
To ...develop an artificial intelligence (AI)‐based model (named AI‐biopsy) for the early diagnosis of prostate cancer using magnetic resonance (MR) images labeled with histopathology information.
Study Type
Retrospective.
Population
Magnetic resonance imaging (MRI) data sets from 400 patients with suspected prostate cancer and with histological data (228 acquired in‐house and 172 from external publicly available databases).
Field Strength/Sequence
1.5 to 3.0 Tesla, T2‐weighted image pulse sequences.
Assessment
MR images reviewed and selected by two radiologists (with 6 and 17 years of experience). The patient images were labeled with prostate biopsy including Gleason Score (6 to 10) or Grade Group (1 to 5) and reviewed by one pathologist (with 15 years of experience). Deep learning models were developed to distinguish 1) benign from cancerous tumor and 2) high‐risk tumor from low‐risk tumor.
Statistical Tests
To evaluate our models, we calculated negative predictive value, positive predictive value, specificity, sensitivity, and accuracy. We also calculated areas under the receiver operating characteristic (ROC) curves (AUCs) and Cohen's kappa.
Results
Our computational method (https://github.com/ih-lab/AI-biopsy) achieved AUCs of 0.89 (95% confidence interval CI: 0.86–0.92) and 0.78 (95% CI: 0.74–0.82) to classify cancer vs. benign and high‐ vs. low‐risk of prostate disease, respectively.
Data Conclusion
AI‐biopsy provided a data‐driven and reproducible way to assess cancer risk from MR images and a personalized strategy to potentially reduce the number of unnecessary biopsies. AI‐biopsy highlighted the regions of MR images that contained the predictive features the algorithm used for diagnosis using the class activation map method. It is a fully automatic method with a drag‐and‐drop web interface (https://ai-biopsy.eipm-research.org) that allows radiologists to review AI‐assessed MR images in real time.
Level of Evidence
1
Technical Efficacy Stage
2
Abstract
Synthetic long read sequencing techniques such as UST’s TELL-Seq and Loop Genomics’ LoopSeq combine 3
$$'$$
′
barcoding with standard short-read sequencing to expand the range of linkage ...resolution from hundreds to tens of thousands of base-pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3
$$'$$
′
unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm, that can be used to extract single-species read-clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
Clean rooms of the Space Assembly Facility (SAF) at the Jet Propulsion Laboratory (JPL) at NASA are the final step of spacecraft cleaning and assembly before launching into space. Clean rooms have ...stringent methods of air-filtration and cleaning to minimize microbial contamination for exoplanetary research and minimize the risk of human pathogens, but they are not sterile. Clean rooms make a selective environment for microorganisms that tolerate such cleaning methods. Previous studies have attempted to characterize the microbial cargo through sequencing and culture-dependent protocols. However, there is not a standardized metagenomic workflow nor analysis pipeline for spaceflight hardware cleanroom samples to identify microbial contamination. Additionally, current identification methods fail to characterize and profile the risk of low-abundance microorganisms.
A comprehensive metagenomic framework to characterize microorganisms relevant for planetary protection in multiple cleanroom classifications (from ISO-5 to ISO-8.5) and sample types (surface, filters, and debris collected via vacuum devices) was developed. Fifty-one metagenomic samples from SAF clean rooms were sequenced and analyzed to identify microbes that could potentially survive spaceflight based on their microbial features and whether the microbes expressed any metabolic activity or growth. Additionally, an auxiliary testing was performed to determine the repeatability of our techniques and validate our analyses. We find evidence that JPL clean rooms carry microbes with attributes that may be problematic in space missions for their documented ability to withstand extreme conditions, such as psychrophilia and ability to form biofilms, spore-forming capacity, radiation resistance, and desiccation resistance. Samples from ISO-5 standard had lower microbial diversity than those conforming to ISO-6 or higher filters but still carried a measurable microbial load.
Although the extensive cleaning processes limit the number of microbes capable of withstanding clean room condition, it is important to quantify thresholds and detect organisms that can inform ongoing Planetary Protection goals, provide a biological baseline for assembly facilities, and guide future mission planning. Video Abstract.
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion's share of these non-reference sequences (NRSs) cannot be ...reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on ...hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the ...time of writing) exceeds 20 petabases and is growing exponentially
. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10
novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.