The computational reconstruction of genome sequences from shotgun sequencing data has been greatly simplified by the advent of sequencing technologies that generate long reads. In the case of relatively small genomes (e.g., bacterial or viral), complete genome sequences can frequently be reconstructed computationally without the need for further experiments. However, large and complex genomes, such as those of most animals and plants, continue to pose significant challenges. In such genomes, assembly software produces incomplete and fragmented reconstructions that require additional experimentally derived information and manual intervention in order to reconstruct individual chromosome arms. Recent technologies originally designed to capture chromatin structure have been shown to effectively complement sequencing data, leading to much more contiguous reconstructions of genomes than previously possible. Here, we survey these technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.
Long-read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short-read assemblies. Although assembly contiguity has increased, an assembly usually does not reconstruct a full chromosome or a chromosome arm, resulting in an unfinished chromosome-level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long-range contact information between chromosomes in the genome.
We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C.
We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics.
Our new algorithm SALSA produces more accurate scaffolds compared to the existing state of the art method LACHESIS.
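The general idea of joining contigs by Hi-C contact evidence can be illustrated with a toy greedy procedure. The sketch below is not the SALSA or LACHESIS algorithm; it is a minimal illustration under stated assumptions: the function name `greedy_scaffold`, the normalization of link counts by the product of contig lengths, and the `min_score` cutoff are all hypothetical choices made for this example, and contig orientation is ignored.

```python
from collections import defaultdict


def greedy_scaffold(contig_lengths, link_counts, min_score=0.0):
    """Toy greedy scaffolder (illustrative only, not SALSA or LACHESIS).

    contig_lengths: dict mapping contig name -> length in bases.
    link_counts: dict mapping (contig_a, contig_b) -> number of Hi-C
        read pairs linking the two contigs.
    min_score: hypothetical cutoff; weaker normalized links are ignored.
    Returns a list of scaffolds, each an ordered list of contig names.
    """
    # Assumption: normalize by the product of lengths so that long
    # contigs do not dominate simply because they attract more reads.
    scored = sorted(
        (
            (count / (contig_lengths[a] * contig_lengths[b]), a, b)
            for (a, b), count in link_counts.items()
        ),
        reverse=True,
    )

    # Union-find keeps track of which contigs already share a scaffold.
    parent = {name: name for name in contig_lengths}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    neighbours = defaultdict(list)
    for score, a, b in scored:
        if score < min_score:
            break  # remaining links are weaker still
        # A contig inside a linear scaffold has at most two neighbours.
        if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # joining would close a cycle
        parent[root_a] = root_b
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk each linear path starting from one of its endpoints.
    scaffolds, visited = [], set()
    for start in sorted(contig_lengths):
        if start in visited or len(neighbours[start]) == 2:
            continue
        path, prev, cur = [start], None, start
        visited.add(start)
        while True:
            nxt = [n for n in neighbours[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            visited.add(cur)
        scaffolds.append(path)
    return scaffolds


lengths = {"c1": 10, "c2": 10, "c3": 10, "c4": 10}
links = {("c1", "c2"): 50, ("c3", "c4"): 45, ("c2", "c3"): 2}
print(greedy_scaffold(lengths, links, min_score=0.1))
```

With the cutoff in place, the weak link between `c2` and `c3` is ignored and the contigs fall into two scaffolds; without it, all four are chained into one. This shows why a purely greedy scheme can fuse separate chromosomes when spurious contacts are accepted.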
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain more frequent than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
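The NG50 statistic mentioned above can be computed directly from its definition. The following is a minimal sketch, assuming contig lengths and the known genome size are given in the same units; the function name `ng50` and the convention of returning 0 when the contigs cover less than half of the genome are choices made for this example.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are taken from longest to shortest until their combined
    length reaches 50% of the known genome size; the length of the last
    contig taken is the NG50. Returns 0 (a convention assumed here) if
    the contigs together cover less than half of the genome.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Contigs of 40, 30, 20 and 10 units against a genome of 120 units:
# 40 alone covers less than 60, 40 + 30 = 70 reaches it, so NG50 is 30.
print(ng50([40, 30, 20, 10], 120))
```

Unlike N50, which uses the total assembly length as the reference, NG50 uses the known genome size, so two assemblies of the same genome remain comparable even when their total lengths differ.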
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems.
The switch/sucrose non-fermentable (SWI/SNF) complex has a crucial role in chromatin remodelling and is altered in over 20% of cancers. Here we developed a proteolysis-targeting chimera (PROTAC) degrader of the SWI/SNF ATPase subunits, SMARCA2 and SMARCA4, called AU-15330. Androgen receptor (AR)-positive, forkhead box A1 (FOXA1)-positive prostate cancer cells are exquisitely sensitive to dual SMARCA2 and SMARCA4 degradation relative to normal and other cancer cell lines. SWI/SNF ATPase degradation rapidly compacts cis-regulatory elements bound by transcription factors that drive prostate cancer cell proliferation, namely AR, FOXA1, ERG and MYC, which dislodges them from chromatin, disables their core enhancer circuitry, and abolishes the downstream oncogenic gene programs. SWI/SNF ATPase degradation also disrupts super-enhancer and promoter looping interactions that wire supra-physiologic expression of the AR, FOXA1 and MYC oncogenes themselves. AU-15330 induces potent inhibition of tumour growth in xenograft models of prostate cancer and synergizes with the AR antagonist enzalutamide, even inducing disease remission in castration-resistant prostate cancer (CRPC) models without toxicity. Thus, impeding SWI/SNF-mediated enhancer accessibility represents a promising therapeutic approach for enhancer-addicted cancers.
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes.
We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes.
The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures.
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate the uneven depth of coverage among organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing genomic variants. We present a variant-aware metagenomic scaffolder called MetaCarvel, which combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants.