The characterization of baseline microbial and functional diversity in the human microbiome has enabled studies of microbiome-related disease, diversity, biogeography, and molecular function. The National Institutes of Health Human Microbiome Project has provided one of the broadest such characterizations so far. Here we introduce a second wave of data from the study, comprising 1,631 new metagenomes (2,355 total) targeting diverse body sites with multiple time points in 265 individuals. We applied updated profiling and assembly methods to provide new characterizations of microbiome personalization. Strain identification revealed subspecies clades specific to body sites; it also quantified species with phylogenetic diversity under-represented in isolate genomes. Body-wide functional profiling classified pathways into universal, human-enriched, and body site-enriched subsets. Finally, temporal analysis decomposed microbial variation into rapidly variable, moderately variable, and stable subsets. This study furthers our knowledge of baseline human microbial diversity and enables an understanding of personalized microbiome function and dynamics.
Cytosine methylation is a DNA modification generally associated with transcriptional silencing. Factors that regulate methylation have been linked to human disease, yet how they contribute to malignancies remains largely unknown. Genomic maps of DNA methylation have revealed unexpected dynamics at gene regulatory regions, including active demethylation by TET proteins at binding sites for transcription factors. These observations indicate that the underlying DNA sequence largely accounts for local patterns of methylation. As a result, this mark is highly informative when studying gene regulation in normal and diseased cells, and it can potentially function as a biomarker. Although these findings challenge the view that methylation is generally instructive for gene silencing, several open questions remain, including how methylation is targeted and recognized and in what context it affects genome readout.
43.
Whole-Genome Annotation with BRAKER
Hoff, Katharina J; Lomsadze, Alexandre; Borodovsky, Mark ...
Methods in molecular biology, 01/2019, Volume 1962
Journal Article
Open access
BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if available, it uses extrinsic evidence for model refinement. From the protein-coding genes predicted by GeneMark-ES/ET, we select a set for training AUGUSTUS, one of the most accurate gene finding tools that, in contrast to GeneMark-ES/ET, integrates extrinsic evidence already into the gene prediction step. The first published version, BRAKER1, integrated genomic footprints of unassembled RNA-Seq reads into the training as well as into the prediction steps. The pipeline has since been extended to the integration of data on mapped cross-species proteins, and to the usage of heterogeneous extrinsic evidence, both RNA-Seq and protein alignments. In this book chapter, we briefly summarize the pipeline methodology and describe how to apply BRAKER in environments characterized by various combinations of external evidence.
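As a rough illustration of what "various combinations of external evidence" can mean in practice, the Python sketch below assembles a `braker.pl` argument list from whichever evidence files are at hand. The helper name `build_braker_command` and all file names are hypothetical. The flags (`--genome`, `--species`, `--bam`, `--prot_seq`, `--esmode`) are assumed from the BRAKER command-line interface and may differ between releases, so they should be checked against `braker.pl --help` for the installed version. The sketch only builds the command; it does not run the pipeline.

```python
from typing import List, Optional


def build_braker_command(
    genome: str,
    rnaseq_bam: Optional[str] = None,
    proteins: Optional[str] = None,
    species: Optional[str] = None,
) -> List[str]:
    """Assemble a braker.pl argument list for the available evidence.

    All flag names are assumptions to be verified against the
    installed BRAKER version; this function never executes anything.
    """
    cmd = ["braker.pl", f"--genome={genome}"]
    if species:
        # Name under which the trained species parameters are stored.
        cmd.append(f"--species={species}")
    if rnaseq_bam:
        # RNA-Seq read alignments as extrinsic evidence.
        cmd.append(f"--bam={rnaseq_bam}")
    if proteins:
        # Cross-species protein sequences as extrinsic evidence.
        cmd.append(f"--prot_seq={proteins}")
    if not rnaseq_bam and not proteins:
        # No extrinsic evidence: train from the genome sequence alone.
        cmd.append("--esmode")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_braker_command("genome.fa", rnaseq_bam="rnaseq.bam")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting, should one choose to execute it.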
The RNA-guided CRISPR-Cas9 nuclease from Streptococcus pyogenes (SpCas9) has been widely repurposed for genome editing. High-fidelity (SpCas9-HF1) and enhanced specificity (eSpCas9(1.1)) variants exhibit substantially reduced off-target cleavage in human cells, but the mechanism of target discrimination and the potential to further improve fidelity are unknown. Here, using single-molecule Förster resonance energy transfer experiments, we show that both SpCas9-HF1 and eSpCas9(1.1) are trapped in an inactive state when bound to mismatched targets. We find that a non-catalytic domain within Cas9, REC3, recognizes target complementarity and governs the HNH nuclease to regulate overall catalytic competence. Exploiting this observation, we design a new hyper-accurate Cas9 variant (HypaCas9) that demonstrates high genome-wide specificity without compromising on-target activity in human cells. These results offer a more comprehensive model to rationalize and modify the balance between target recognition and nuclease activation for precision genome editing.
• Novosphingobium resinovorum SA1 was the first single isolate capable of utilizing sulfanilic acid as sole carbon, nitrogen and sulfur source.
• The bacterium can also grow on several other substituted aromatic compounds.
• The complete and annotated genome of the strain is presented.
• Besides the chromosome, the isolate has four extrachromosomal elements of various sizes.
• The genes encoding proteins involved in the catabolism of sulfanilic acid and other aromatics are reported.
Sulfanilic acid (4-aminobenzenesulfonic acid) is a sulfonated aromatic amine widely used in chemical industries for synthesis of various organic dyes and sulfa drugs. There are quite a few microbial co-cultures or single isolates capable of completely degrading this compound. Novosphingobium resinovorum SA1 was the first single bacterium which could utilize sulfanilic acid as its sole carbon, nitrogen and sulfur source. The strain has versatile catabolic routes for the bioconversion of numerous other aromatic compounds. Here, the complete genome sequence of the N. resinovorum SA1 strain is reported. The genome consists of a circular chromosome of 3.8 Mbp and four extrachromosomal elements between 67 and 1,759.8 kbp in size. Three alternative 3-ketoadipate pathways were identified on the plasmids. Sulfanilic acid is decomposed via a modified 3-ketoadipate pathway and the oxygenases involved form a phylogenetically separate branch on the tree. Sequence analysis of these elements might provide a genetic background for deeper insight into the versatile catabolic metabolism of various aromatic xenobiotics, including sulfanilic acid and its derivatives. Moreover, this is also a good model strain for understanding the role and evolution of multiple genetic elements within a single strain.
The CRISPR-Cas9 genome-editing system is a part of the adaptive immune system in archaea and bacteria to defend against invasive nucleic acids from phages and plasmids. The single guide RNA (sgRNA) of the system recognizes its target sequence in the genome, and the Cas9 nuclease of the system acts as a pair of scissors to cleave the double strands of DNA. Since its discovery, CRISPR-Cas9 has become the most robust platform for genome engineering in eukaryotic cells. Recently, the CRISPR-Cas9 system has triggered enormous interest in therapeutic applications. CRISPR-Cas9 can be applied to correct disease-causing gene mutations or engineer T cells for cancer immunotherapy. The first clinical trial using the CRISPR-Cas9 technology was conducted in 2016. Despite the great promise of the CRISPR-Cas9 technology, several challenges remain to be tackled before it can be applied successfully in human patients. The greatest challenge is the safe and efficient delivery of the CRISPR-Cas9 genome-editing system to target cells in the human body. In this review, we will introduce the molecular mechanism and different strategies to edit genes using the CRISPR-Cas9 system. We will then highlight the current systems that have been developed to deliver CRISPR-Cas9 in vitro and in vivo for various therapeutic purposes.
Many proteins regulate the expression of genes by binding to specific regions encoded in the genome. Here we introduce a new data set of RNA elements in the human genome that are recognized by RNA-binding proteins (RBPs), generated as part of the Encyclopedia of DNA Elements (ENCODE) project phase III. This class of regulatory elements functions only when transcribed into RNA, as they serve as the binding sites for RBPs that control post-transcriptional processes such as splicing, cleavage and polyadenylation, and the editing, localization, stability and translation of mRNAs. We describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. Integrative analyses using five assays identify RBP binding sites on RNA and chromatin in vivo, the in vitro binding preferences of RBPs, the function of RBP binding sites and the subcellular localization of RBPs, producing 1,223 replicated data sets for 356 RBPs. We describe the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization. These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RBPs.
The recent successes of the Materials Genome Initiative have opened up new opportunities for data-centric informatics approaches in several subfields of materials research, including in polymer science and engineering. Polymers, being inexpensive and possessing a broad range of tunable properties, are widespread in many technological applications. The vast chemical and morphological complexity of polymers, however, gives rise to challenges in the rational discovery of new materials for specific applications. The nascent field of polymer informatics seeks to provide tools and pathways for accelerated property prediction (and materials design) via surrogate machine learning models built on reliable past data. We have carefully accumulated a data set of organic polymers whose properties were obtained either computationally (bandgap, dielectric constant, refractive index, and atomization energy) or experimentally (glass transition temperature, solubility parameter, and density). A fingerprinting scheme that captures atomistic to morphological structural features was developed to numerically represent the polymers. Machine learning models were then trained by mapping the fingerprints (or features) to properties. Once developed, these models can rapidly predict properties of new polymers (within the same chemical class as the parent data set) and can also provide uncertainties underlying the predictions. Since different properties depend on different length-scale features, the prediction models were built on an optimized set of features for each individual property. Furthermore, these models are incorporated in a user-friendly online platform named Polymer Genome (www.polymergenome.org). Systematic and progressive expansion of both chemical and property spaces is planned to extend the applicability of Polymer Genome to a wide range of technological domains.
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.
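One programmatic route to querying GEO, not necessarily the one the chapter describes, is the NCBI Entrez E-utilities service, in which GEO DataSets is exposed as the `gds` database. The minimal Python sketch below only constructs an `esearch` request URL and performs no network access. The helper name `geo_search_url` and the example search term are hypothetical, and NCBI's current usage guidelines (rate limits, API keys) should be consulted before sending real requests.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that queries the GEO DataSets ('gds') database.

    The function returns the URL as a string and does not contact NCBI.
    """
    params = {
        "db": "gds",        # Entrez database name for GEO DataSets
        "term": term,       # free-text or fielded Entrez query
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query, for illustration only.
    print(geo_search_url("DNA methylation", retmax=5))
```

The returned URL can be opened in a browser or fetched with any HTTP client; the response lists record identifiers that can then be passed to further E-utilities calls.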
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.