Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning–based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within the proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of yeast proteins, identify 1505 likely to interact, and build structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.
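A coevolution screen of protein pairs needs a paired alignment. In it, each row joins a homolog of protein A and a homolog of protein B from the same organism. The sketch below shows one simple pairing rule. It assumes that each MSA row carries a species label and that rows are sorted best-first, with the query first. The rule keeps the best-ranked hit per species and concatenates rows for species present in both alignments. The abstract does not describe the pairing procedure actually used, so the rule and all names here are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each row is (species label, aligned sequence),
# rows sorted best-first with the query sequence in the first row.
MSA = List[Tuple[str, str]]


def best_hit_per_species(msa: MSA) -> Dict[str, str]:
    """Keep only the highest-ranked (first) sequence for each species."""
    best: Dict[str, str] = {}
    for species, seq in msa:
        best.setdefault(species, seq)
    return best


def pair_msas(msa_a: MSA, msa_b: MSA) -> List[str]:
    """Concatenate best hits from species present in both alignments."""
    best_a = best_hit_per_species(msa_a)
    best_b = best_hit_per_species(msa_b)
    # Dicts keep insertion order, so the query pair stays first when both
    # queries come from the same species.
    return [best_a[sp] + best_b[sp] for sp in best_a if sp in best_b]


msa_a = [("yeast", "MKV-"), ("human", "MRVL"), ("human", "MKIL"), ("fly", "MK--")]
msa_b = [("yeast", "GAT"), ("fly", "GST"), ("worm", "AAT")]
print(pair_msas(msa_a, msa_b))  # ['MKV-GAT', 'MK--GST']
```

Species found in only one alignment are dropped. As a result, a paired alignment is usually shallower than either single-protein MSA.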
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.
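"Consistency-based" alignment can be illustrated with the generic transformation popularized by ProbCons. The sketch below assumes pairwise residue-match probability matrices are already available. It re-scores each sequence pair x, y using every sequence z as an intermediate: P'(x, y) = (1/N) Σ_z P(x, z) P(z, y), with P(x, x) taken as the identity. The abstract does not say how PROMALS3D weights structure-derived against sequence-derived constraints, so this shows only the generic step. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)] for row in a]


def consistency_transform(
    post: Dict[Tuple[int, int], Matrix], lengths: List[int]
) -> Dict[Tuple[int, int], Matrix]:
    """P'(x, y) = (1/N) * sum over all z of P(x, z) @ P(z, y), with P(x, x) = I.

    post[(x, y)] with x < y holds match probabilities between residues of x and y.
    """
    n = len(lengths)

    def P(a: int, b: int) -> Matrix:
        if a == b:
            return identity(lengths[a])
        return post[(a, b)] if (a, b) in post else transpose(post[(b, a)])

    out: Dict[Tuple[int, int], Matrix] = {}
    for x in range(n):
        for y in range(x + 1, n):
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                term = matmul(P(x, z), P(z, y))
                for r in range(lengths[x]):
                    for c in range(lengths[y]):
                        acc[r][c] += term[r][c] / n
            out[(x, y)] = acc
    return out


# Sequences 0 and 1 have no direct support, but both match sequence 2.
zero = [[0.0, 0.0], [0.0, 0.0]]
post = {(0, 1): zero, (0, 2): identity(2), (1, 2): identity(2)}
new = consistency_transform(post, [2, 2, 2])
print(new[(0, 1)])  # diagonal entries become 1/3, off-diagonal stay 0.0
```

The transformed matrices would then drive a progressive alignment. Agreement through third sequences can therefore rescue residue pairs that a single pairwise comparison misses.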
Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task.
Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with the improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent.
Availability: The PROMALS web server is available at: http://prodata.swmed.edu/promals/
Contact: jpei@chop.swmed.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
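The accuracy figure in the Results above ("close to half of the amino acid residues correctly") is a reference-dependent measure. The sketch below shows one common way to compute such a measure for a pair of sequences. It assumes that a test alignment and a trusted reference alignment are given as equal-length gapped strings, with "-" marking gaps. It collects the residue pairs that the reference aligns and reports the fraction the test alignment reproduces. The function names are illustrative and not taken from PROMALS.

```python
from typing import Set, Tuple


def aligned_pairs(row1: str, row2: str) -> Set[Tuple[int, int]]:
    """Return (i, j) residue indices (0-based, ignoring gaps) that share a column."""
    pairs: Set[Tuple[int, int]] = set()
    i = j = 0
    for a, b in zip(row1, row2):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_accuracy(candidate: Tuple[str, str], reference: Tuple[str, str]) -> float:
    """Fraction of reference residue pairs that the candidate alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(*candidate)) / len(ref_pairs)


reference = ("AC-DE", "ACGDE")
candidate = ("ACDE-", "ACGDE")
print(alignment_accuracy(candidate, reference))  # 0.5
```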
Understanding the evolution of a protein, including both close and distant relationships, often yields insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
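A small data model makes the difference between grouping by homology and grouping by topology concrete. In this sketch, each domain carries three labels: a homology group, a topology (fold) label and a sequence family. Homology is decided by the homology group alone, so two domains with different topologies can still be homologous. This is a hypothetical schema, not ECOD's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Domain:
    domain_id: str
    homology_group: str  # evolutionary (homology) group
    topology: str        # fold-level (topology) label
    family: str          # sequence family


def are_homologous(a: Domain, b: Domain) -> bool:
    """Homology is decided by the homology group, not by topology."""
    return a.homology_group == b.homology_group


def families_per_group(domains: Iterable[Domain]) -> Dict[str, int]:
    """Count distinct sequence families clustered into each homology group."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for d in domains:
        groups[d.homology_group].add(d.family)
    return {group: len(fams) for group, fams in groups.items()}


d1 = Domain("d1", "H1", "T1", "F1")
d2 = Domain("d2", "H1", "T2", "F2")  # same group, different topology
d3 = Domain("d3", "H2", "T1", "F3")  # same topology as d1, different group
print(are_homologous(d1, d2), are_homologous(d1, d3))  # True False
print(families_per_group([d1, d2, d3]))  # {'H1': 2, 'H2': 1}
```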
Cellular granules lacking boundary membranes harbor RNAs and their associated proteins and play diverse roles controlling the timing and location of protein synthesis. Formation of such granules was emulated by treatment of mouse brain extracts and human cell lysates with a biotinylated isoxazole (b-isox) chemical. Deep sequencing of the associated RNAs revealed an enrichment for mRNAs known to be recruited to neuronal granules used for dendritic transport and localized translation at synapses. Precipitated mRNAs contain extended 3′ UTR sequences and an enrichment in binding sites for known granule-associated proteins. Hydrogels composed of the low complexity (LC) sequence domain of FUS recruited and retained the same mRNAs as were selectively precipitated by the b-isox chemical. Phosphorylation of the LC domain of FUS prevented hydrogel retention, offering a conceptual means of dynamic, signal-dependent control of RNA granule assembly.
► A crystallized small molecule recruits mRNAs into granule-like aggregates
► The granule-like aggregates contain known granule RNAs
► Aggregates from brain lysates are enriched in mRNAs encoding synaptic proteins
► 3′ UTR lengths of recruited RNAs are significantly longer than average
Aggregation of RNA-binding proteins, including FUS, from mammalian cells within hydrogels retains mRNAs previously associated with RNA granules. Phosphorylation of FUS influences mRNA retention, suggesting a mechanism for regulating granule assembly and composition.
Nuclear export signal (NES) motifs are essential regulators of the subcellular location of proteins, acting through their interaction with the major nuclear exporter protein, CRM1. Prediction of NESs is of great interest in many areas of research, including cancer. However, currently available methods, most of which are sequence-based, suffer from high false-positive rates because NES consensus patterns are very common in protein sequences. Finding a feature that distinguishes real NES motifs from false positives would therefore improve predictive power, but this is difficult using sequence alone. Here, we provide a comprehensive table of validated cargo proteins that lists the locations of NES consensus patterns together with disorder propensity plots, known protein domain information, and predicted secondary structures. The table may help identify the most plausible NES region in the context of the whole protein sequence, and it suggests that some annotated regions may not actually bind CRM1. In addition, using the currently available crystal structures of CRM1 bound to various classes of NES peptides, we apply, for the first time, structure-based prediction of NES motifs bound to CRM1's binding groove. Combining sequence-based and structure-based predictions, we propose a new and more straightforward approach to identifying CRM1-binding NES sequences. The approach analyzes their structural prerequisites and energetically evaluates their stability at the CRM1 binding site.
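A scan for a simplified NES consensus shows why sequence-only prediction produces many false positives. This sketch uses a commonly cited simplified form of the leucine-rich NES consensus, Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ. Here Φ is a hydrophobic residue, taken as L, I, V, F or M, and x is any residue. Published NES classes differ in residue sets and spacings, so treat this pattern and the function name as assumptions. A lookahead reports a candidate at every start position, including overlapping ones.

```python
import re
from typing import List, Tuple

# Assumed simplified consensus: Phi-x(2,3)-Phi-x(2,3)-Phi-x-Phi, Phi in {L, I, V, F, M}.
PHI = "[LIVFM]"
NES_REGEX = re.compile(
    "(?=(" + PHI + ".{2,3}" + PHI + ".{2,3}" + PHI + "." + PHI + "))"
)


def find_nes_candidates(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each position where the pattern fits."""
    return [(m.start(), m.group(1)) for m in NES_REGEX.finditer(sequence.upper())]


print(find_nes_candidates("MSLAALAALALK"))  # [(2, 'LAALAALAL')]
```

The motif needs only four hydrophobic residues at loosely constrained spacings, so a scan like this flags many sites in ordinary proteins. That is the gap the structural and energetic evaluation described above aims to close.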
TMEM120A, also known as TACAN, is a novel membrane protein that is highly conserved in vertebrates and was recently proposed to be a mechanosensitive channel involved in sensing mechanical pain. Here we present the single-particle cryogenic electron microscopy (cryo-EM) structure of human TMEM120A, which forms a tightly packed dimer with extensive interactions mediated by the N-terminal coiled coil domain (CCD), the C-terminal transmembrane domain (TMD), and the re-entrant loop between the two domains. The TMD of each TMEM120A subunit contains six transmembrane helices (TMs) and has no clear structural feature of a channel protein. Instead, the six TMs form an α-barrel with a deep pocket where a coenzyme A (CoA) molecule is bound. Intriguingly, some structural features of TMEM120A resemble those of elongase for very long-chain fatty acids (ELOVL) despite the low sequence similarity between them. This points to the possibility that TMEM120A may function as an enzyme for fatty acid metabolism, rather than a mechanosensitive channel.
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy that we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods. These predictions incorporated residue-residue co-evolution information into the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for over 400,000 proteins belonging to the 58 families. They also suggest hypotheses about mechanism for the subset whose function is known, and hypotheses about function for the remainder.
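Residue-residue co-evolution rests on an observation: residues in contact tend to mutate in a correlated way across a family alignment. The simplest score for this is the mutual information between two alignment columns, sketched below with hypothetical names. Practical contact predictors typically use global statistical models instead, for example pseudo-likelihood Potts models such as GREMLIN, which correct for indirect and phylogenetic correlations that raw mutual information does not. The sketch therefore illustrates only the underlying idea. Gap characters are treated as ordinary symbols.

```python
import math
from collections import Counter
from typing import List


def column_mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (natural log) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(row[i] for row in msa)
    count_j = Counter(row[j] for row in msa)
    count_ij = Counter((row[i], row[j]) for row in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


covarying = ["AK", "AK", "CE", "CE"]    # column 1 tracks column 0
independent = ["AK", "AE", "CK", "CE"]  # every combination equally often
print(column_mutual_information(covarying, 0, 1))    # ln 2, about 0.693
print(column_mutual_information(independent, 0, 1))  # 0.0
```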
Eukaryotic cells contain assemblies of RNAs and proteins termed RNA granules. Many proteins within these bodies contain KH or RRM RNA-binding domains as well as low complexity (LC) sequences of unknown function. We discovered that exposure of cell or tissue lysates to a biotinylated isoxazole (b-isox) chemical precipitated hundreds of RNA-binding proteins with significant overlap to the constituents of RNA granules. The LC sequences within these proteins are both necessary and sufficient for b-isox-mediated aggregation, and these domains can undergo a concentration-dependent phase transition to a hydrogel-like state in the absence of the chemical. X-ray diffraction and EM studies revealed the hydrogels to be composed of uniformly polymerized amyloid-like fibers. Unlike pathogenic fibers, the LC sequence-based polymers described here are dynamic and accommodate heterotypic polymerization. These observations offer a framework for understanding the function of LC sequences as well as an organizing principle for cellular structures that are not membrane bound.
► A biotinylated small molecule precipitates RNA granule proteins from cell lysates
► Low complexity sequences in these proteins form hydrogels
► Amyloid-like fibers within the gels can trap LCS domains from other proteins
► The cell-free in vitro reactions model RNA granule architecture and formation
RNA-binding proteins with regions of low complexity sequence can form hydrogels in vitro composed of amyloid-like fibers, either via nucleation by a small molecule or by self-organization. Unlike pathologic amyloids, the fibers are dynamic and can incorporate low complexity domains from different proteins, suggesting a basis for assembly of RNA granules within cells.
SEA (sea urchin sperm protein, enterokinase, agrin) domains, many of which possess autoproteolysis activity, have been found in a number of cell surface and secreted proteins. Despite high sequence divergence, SEA domains were also proposed to be present in dystroglycan, based on a conserved autoproteolysis motif, and in the receptor‐type protein phosphatase IA‐2, based on structural similarity. A SEA domain adjacent to the transmembrane segment appears to be a recurring theme in many type I transmembrane proteins on the cell surface, such as MUC1, dystroglycan, IA‐2, and Notch receptors. By comparative sequence and structural analyses, we identified dystroglycan‐like proteins with SEA domains in Capsaspora owczarzaki of the Filasterea group, one of the closest single‐cell relatives of metazoans. We also detected novel and divergent SEA domains in a variety of cell surface proteins such as EpCAM, α/ε‐sarcoglycan, PTPRR, collectrin/Tmem27, amnionless, CD34, KIAA0319, fibrocystin‐like protein, and a number of cadherins. These proteins are mostly from metazoans or their single‐cell relatives such as choanoflagellates and Filasterea. However, fibrocystin‐like proteins with SEA domains were found in several other eukaryotic lineages including green algae, Alveolata, Euglenozoa, and Haptophyta, suggesting an ancient evolutionary origin. In addition, the intracellular protein Nucleoporin 54 (Nup54) acquired a divergent SEA domain in choanoflagellates and metazoans.