Proteins and RNA functionally and physically intersect in multiple biological processes; however, no universal method is currently available to purify protein-RNA complexes. Here, we introduce XRNAX, a method for the generic purification of protein-crosslinked RNA, and demonstrate its versatility to study the composition and dynamics of protein-RNA interactions by various transcriptomic and proteomic approaches. We show that XRNAX captures all RNA biotypes and use this to characterize the sub-proteomes that interact with coding and non-coding RNAs (ncRNAs) and to identify hundreds of protein-RNA interfaces. Exploiting the quantitative nature of XRNAX, we observe drastic remodeling of the RNA-bound proteome during arsenite-induced stress, distinct from autophagy-related changes in the total proteome. In addition, we combine XRNAX with crosslinking immunoprecipitation sequencing (CLIP-seq) to validate the interaction of ncRNA with lamin B1 and EXOSC2. Thus, XRNAX is a resourceful approach to study structural and compositional aspects of protein-RNA interactions to address fundamental questions in RNA biology.
• XRNAX purifies protein-crosslinked RNA of all biotypes from UV-crosslinked cells
• Discovery of the WKF RNA-binding domain
• Discovery of more than 700 proteins interacting with non-polyadenylated RNA
• Profiling of stress-induced changes in RNA-binding proteomes
A general approach for characterizing cellular RNA-protein interactions allows examination of dynamic changes to the RNA-bound proteome.
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
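The residue-level coverage figures quoted above (58% confident, 36% very high confidence) can be illustrated with a small sketch. It assumes that each residue carries a confidence score on a 0 to 100 scale (AlphaFold's pLDDT) and that the cutoffs are 70 for "confident" and 90 for "very high confidence"; neither the score name nor the cutoffs are stated in the abstract, and the function name and toy data are hypothetical.

```python
from typing import Dict, Iterable

# Assumed cutoffs (not given in the abstract): a per-residue confidence
# score >= 70 counts as "confident", >= 90 as "very high confidence".
CONFIDENT_CUTOFF = 70.0
VERY_HIGH_CUTOFF = 90.0


def residue_coverage(scores_by_protein: Dict[str, Iterable[float]]) -> Dict[str, float]:
    """Return the fraction of all residues at or above each cutoff.

    Fractions are taken over the pooled residues of every protein, so
    "very_high" is a subset of "confident", as in the abstract.
    """
    total = confident = very_high = 0
    for scores in scores_by_protein.values():
        for score in scores:
            total += 1
            if score >= CONFIDENT_CUTOFF:
                confident += 1
            if score >= VERY_HIGH_CUTOFF:
                very_high += 1
    if total == 0:
        raise ValueError("no residues supplied")
    return {"confident": confident / total, "very_high": very_high / total}


if __name__ == "__main__":
    # Hypothetical toy scores for two proteins, four residues each.
    toy = {"P1": [95, 80, 50, 30], "P2": [91, 72, 65, 99]}
    print(residue_coverage(toy))
```

Pooling residues across proteins, rather than averaging per-protein fractions, matches the way the abstract reports coverage as a share of all residues.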
Abstract
The Rfam database is a collection of RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. In this paper we introduce Rfam release 13.0, which switches to a new genome-centric approach that annotates a non-redundant set of reference genomes with RNA families. We describe new web interface features including faceted text search and R-scape secondary structure visualizations. We discuss a new literature curation workflow and a pipeline for building families based on RNAcentral. There are 236 new families in release 13.0, bringing the total number of families to 2687. The Rfam website is http://rfam.org.
Abstract
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13,000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
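The criterion described above, flagging protein regions that match two or more families not annotated as related, can be sketched as a simple interval-overlap check. This is only an illustration under stated assumptions, not the authors' pipeline: the `Match` record, the function name, the family labels, and the `related_pairs` set (standing in for whatever relationship annotation is available) are all hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple


class Match(NamedTuple):
    """One family match on a protein (hypothetical record layout)."""
    protein: str
    family: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping_unrelated(
    matches: Iterable[Match],
    related_pairs: Set[FrozenSet[str]],
) -> Set[Tuple[str, str, str]]:
    """Return (protein, family_a, family_b) for overlapping matches to
    two different families that are not listed as related."""
    by_protein = {}
    for m in matches:
        by_protein.setdefault(m.protein, []).append(m)

    flagged = set()
    for protein, protein_matches in by_protein.items():
        for a, b in combinations(protein_matches, 2):
            if a.family == b.family:
                continue
            pair = frozenset((a.family, b.family))
            if pair in related_pairs:
                continue
            # Two closed intervals overlap when each starts before the other ends.
            if a.start <= b.end and b.start <= a.end:
                flagged.add((protein, *sorted(pair)))
    return flagged


if __name__ == "__main__":
    # Hypothetical matches; FAM1 and FAM4 are declared related.
    demo = [
        Match("seqA", "FAM1", 10, 100),
        Match("seqA", "FAM2", 90, 150),
        Match("seqA", "FAM3", 200, 250),
        Match("seqB", "FAM1", 1, 50),
        Match("seqB", "FAM4", 40, 80),
    ]
    print(overlapping_unrelated(demo, {frozenset(("FAM1", "FAM4"))}))
```

Families that recur in the flagged set would be candidates for closer inspection; any minimum-overlap length or E-value filter would be an additional assumption layered on top of this sketch.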
Mobile genetic elements (MGEs) sequester and mobilize antibiotic resistance genes across bacterial genomes. Efficient and reliable identification of such elements is necessary to follow resistance spreading. However, automated tools for MGE identification are missing. Tyrosine recombinase (YR) proteins drive MGE mobilization and could provide markers for MGE detection, but they constitute a diverse family also involved in housekeeping functions. Here, we conducted a comprehensive survey of YRs from bacterial, archaeal, and phage genomes and developed a sequence‐based classification system that dissects the characteristics of MGE‐borne YRs. We revealed that MGE‐related YRs evolved from non‐mobile YRs by acquisition of a regulatory arm‐binding domain that is essential for their mobility function. Based on these results, we further identified numerous unknown MGEs. This work provides a resource for comparative analysis and functional annotation of YRs and aids the development of computational tools for MGE annotation. Additionally, we reveal how YRs adapted to drive gene transfer across species and provide a tool to better characterize antibiotic resistance dissemination.
SYNOPSIS
A systematic resource for tyrosine recombinase annotation is presented. Comparative sequence analysis of the protein family enables the functional classification of these enzymes and the identification of mobile genetic elements in bacterial genomes.
Phylogenetic analysis of the tyrosine recombinase protein family classifies its members into twenty subgroups.
Members of the subgroups have a specific function, sequence features and host taxonomy.
Tyrosine recombinases of mobile genetic elements carry an additional arm‐binding domain.
Tyrosine recombinase classification enables the identification of new mobile genetic elements in bacterial genomes.
...trees are essential to eliminate excess CO2; on average, a mature tree can sequester 11,000 gCO2 per year [12]. Since it depends on the energy needed to power the computer and the carbon footprint ...of producing such energy, it can be calculated fairly accurately. ...the end-to-end environmental impact of computers and data centres is substantial but difficult to quantify. ...try to use your gear for as long as is reasonable.
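The dependence described in the excerpt above (energy drawn by the computer, multiplied by the carbon footprint of producing that energy) can be written as a short worked calculation. This is a minimal sketch: the excerpt is elided, so reading "it" as the carbon footprint of a computation is an assumption, as are the function names and the example runtime, power draw, and carbon intensity. Only the figure of 11,000 gCO2 per tree per year comes from the text.

```python
# Figure quoted in the text: a mature tree sequesters about 11,000 gCO2 per year.
TREE_G_CO2_PER_YEAR = 11_000.0


def carbon_footprint_g(runtime_h: float, power_w: float, intensity_g_per_kwh: float) -> float:
    """Footprint in gCO2 = energy used (kWh) x carbon intensity (gCO2 per kWh).

    runtime_h, power_w and intensity_g_per_kwh are caller-supplied
    estimates; none of them are given in the source text.
    """
    energy_kwh = runtime_h * power_w / 1000.0
    return energy_kwh * intensity_g_per_kwh


def tree_years(footprint_g: float) -> float:
    """Express a footprint as years of sequestration by one mature tree."""
    return footprint_g / TREE_G_CO2_PER_YEAR


if __name__ == "__main__":
    # Illustrative values only: 100 h at 250 W on a 440 gCO2/kWh grid.
    grams = carbon_footprint_g(runtime_h=100, power_w=250, intensity_g_per_kwh=440)
    print(f"{grams:.0f} gCO2, about {tree_years(grams):.2f} tree-years")
```

The sketch covers only the energy consumed while running; it leaves out the manufacturing and end-of-life impact of the hardware, which the excerpt describes as substantial but difficult to quantify.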
Abstract
Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.
TreeFam (http://www.treefam.org) is a database of phylogenetic trees inferred from animal genomes. For every TreeFam family we provide homology predictions together with the evolutionary history of the genes. Here we describe an update of the TreeFam database. The TreeFam project was resurrected in 2012 and has seen two releases since. The latest release (TreeFam 9) was made available in March 2013. It has orthology predictions and gene trees for 109 species in 15,736 families covering ∼2.2 million sequences. With release 9 we made modifications to our production pipeline and redesigned our website with improved gene tree visualizations and Wikipedia integration. Furthermore, we now provide an HMM-based sequence search that places a user-provided protein sequence into a TreeFam gene tree and provides quick orthology prediction. The tool uses Mafft and RAxML for the fast insertion into a reference alignment and tree, respectively. Besides the aforementioned technical improvements, we present a new approach to visualize gene trees and alternative displays that focuses on showing homology information from a species tree point of view. From release 9 onwards, TreeFam is now hosted at the EBI.
The database iPfam, available at http://ipfam.org, catalogues Pfam domain interactions based on known 3D structures that are found in the Protein Data Bank, providing interaction data at the molecular level. Previously, the iPfam domain-domain interaction data was integrated within the Pfam database and website, but it has now been migrated to a separate database. This allows for independent development, improving data access and giving clearer separation between the protein family and interactions datasets. In addition to domain-domain interactions, iPfam has been expanded to include interaction data for domain-bound small-molecule ligands. Functional annotations are provided from source databases, supplemented by the incorporation of Wikipedia articles where available. iPfam (version 1.0) contains >9500 domain-domain and 15 500 domain-ligand interactions. The new website provides access to this data in a variety of ways, including interactive visualizations of the interaction data.