Although mmCIF is the current official format for deposition of protein and nucleic acid structures to the Protein Data Bank (PDB), the legacy PDB format is still the primary supported format for many structural bioinformatics tools. Therefore, reliable software to convert mmCIF structure files to PDB files is needed. Unfortunately, existing conversion programs fail to correctly convert many mmCIF files, especially those with many atoms and/or long chain identifiers.
This study presents BeEM, which converts any mmCIF-format structure file to PDB format. BeEM conversion faithfully retains all atomic and chain information, including chain IDs with more than two characters, which are not supported by any existing mmCIF-to-PDB converter. BeEM is at least ten times faster than existing converters such as MAXIT and Phenix. Part of the reason for the speed improvement is the avoidance of conversion between numerical values and text strings.
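The remark about avoiding conversion between numerical values and text strings can be illustrated with a minimal Python sketch. It is not BeEM's actual code: the function name `coord_field`, the 8-character column width, and the numeric fallback for over-long fractions are assumptions made for illustration. The idea shown is that a coordinate read as text from an mmCIF file can often be placed into a fixed-width PDB column by padding alone, with no parse to a float and back.

```python
def coord_field(text: str, width: int = 8, decimals: int = 3) -> str:
    """Place a coordinate string into a fixed-width, right-justified column.

    Hypothetical helper: the width of 8 and 3 decimals mirror the usual
    PDB coordinate columns, but the function itself is illustrative only.
    """
    whole, _, frac = text.partition(".")
    if len(frac) <= decimals:
        # Pad the fractional part with zeros; no float round-trip is needed.
        out = f"{whole}.{frac.ljust(decimals, '0')}"
    else:
        # Assumed fallback: convert numerically only when digits must be dropped.
        out = f"{float(text):.{decimals}f}"
    return out.rjust(width)


if __name__ == "__main__":
    for raw in ["12.345", "-7.5", "100", "1.23456"]:
        print(repr(coord_field(raw)))
```

Only the last input in the usage loop takes the numeric path; the others are handled purely as text.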
BeEM is a fast and accurate tool for mmCIF-to-PDB format conversion, which is a common procedure in structural biology. The source code is available under the BSD licence at https://github.com/kad-ecoli/BeEM/ .
The COFACTOR web server is a unified platform for structure-based multiple-level protein function predictions. By structurally threading low-resolution structural models through the BioLiP library, the COFACTOR server infers three categories of protein functions including gene ontology, enzyme commission and ligand-binding sites from various analogous and homologous function templates. Here, we report recent improvements of the COFACTOR server in the development of new pipelines to infer functional insights from sequence profile alignments and protein-protein interaction networks. Large-scale benchmark tests show that the new hybrid COFACTOR approach significantly improves the function annotation accuracy of the former structure-based pipeline and other state-of-the-art functional annotation methods, particularly for targets that have no close homology templates. The updated COFACTOR server and the template libraries are available at http://zhanglab.ccmb.med.umich.edu/COFACTOR/.
The recently released PyMod GUI integrates many of the individual steps required for protein sequence-structure analysis and homology modeling within the interactive visualization capabilities of PyMOL. Here we describe the improvements introduced in version 2.0 of PyMod.
The original code of PyMod has been completely rewritten and improved in version 2.0 to extend PyMOL with packages such as Clustal Omega, PSIPRED and CAMPO. Integration with the popular web services ESPript and WebLogo is also provided. Finally, a number of new MODELLER functionalities have also been implemented, including SALIGN, modeling of quaternary structures, DOPE scores, disulfide bond modeling and choice of heteroatoms to be included in the final model.
PyMod 2.0 installer packages for Windows, Linux and Mac OS X and user guides are available at http://schubert.bio.uniroma1.it/pymod/index.html. The open source code of the project is hosted at https://github.com/pymodproject/pymod.
alessandro.paiardini@uniroma1.it or giacomo.janson@uniroma1.it
Most proteins exist with multiple domains in cells for cooperative functionality. However, structural biology and protein folding methods are often optimized for single-domain structures, resulting in a rapidly growing gap between the improved capability for tertiary structure determination and high demand for multidomain structure models. We have developed a pipeline, termed DEMO, for constructing multidomain protein structures by docking-based domain assembly simulations, with interdomain orientations determined by the distance profiles from analogous templates as detected through domain-level structure alignments. The pipeline was tested on a comprehensive benchmark set of 356 proteins consisting of 2–7 continuous and discontinuous domains, for which DEMO generated models with correct global fold (TM-score > 0.5) for 86% of cases with continuous domains and for 100% of cases with discontinuous domain structures, starting from randomly oriented target-domain structures. DEMO was also applied to reassemble multidomain targets in the CASP12 and CASP13 experiments using domain structures excised from the top server predictions, where the full-length DEMO models showed a significantly improved quality over the original server models. Finally, sparse restraints of mass spectrometry-generated cross-linking data and cryo-EM density maps are incorporated into DEMO, resulting in improvements in the average TM-score by 6.3% and 12.5%, respectively. The results demonstrate an efficient approach to assembling multidomain structures, which can be easily used for automated, genome-scale multidomain protein structure assembly.
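The TM-score > 0.5 threshold used above for a correct global fold can be made concrete with a short Python sketch of the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8. This is a simplified illustration, not the DEMO or TM-score program code: it assumes the per-residue distances `distances` come from an already-chosen superposition (the real score maximizes over superpositions), and the 0.5 Å floor on d0 for short chains is an assumption added to keep the function defined.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length <= 15:
        # Assumed guard: the cube-root term is undefined here, so use a floor.
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor for short chains


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (angstroms) between each aligned residue pair.
    target_length: length L of the target protein used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    L = 100
    print(round(tm_d0(L), 3))        # distance scale for a 100-residue target
    print(tm_score([0.0] * L, L))    # identical structures
```

Because unaligned residues contribute nothing to the sum while the normalization stays at L, partial alignments are penalized automatically.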
As the infection of 2019-nCoV coronavirus is quickly developing into a global pneumonia epidemic, the careful analysis of its transmission and cellular mechanisms is sorely needed. In this Communication, we first analyzed two recent studies that concluded that snakes are the intermediate hosts of 2019-nCoV and that the 2019-nCoV spike protein insertions share a unique similarity to HIV-1. However, the reimplementation of the analyses, built on larger scale data sets using state-of-the-art bioinformatics methods and databases, presents clear evidence that rebuts these conclusions. Next, using metagenomic samples from Manis javanica, we assembled a draft genome of the 2019-nCoV-like coronavirus, which shows 73% coverage and 91% sequence identity to the 2019-nCoV genome. In particular, the alignments of the spike surface glycoprotein receptor binding domain revealed four times more variations in the bat coronavirus RaTG13 than in the Manis coronavirus compared with 2019-nCoV, suggesting the pangolin as a missing link in the transmission of 2019-nCoV from bats to humans.
Significant efforts have been devoted to developing efficient visible-light-driven photocatalysts for the conversion of CO₂ to chemical fuels. The photocatalytic efficiency for this transformation largely depends on CO₂ adsorption and diffusion. However, the CO₂ adsorption on the surface of photocatalysts is generally low due to their low specific surface area and the lack of matched pores. Here we report a well-defined porous hypercrosslinked polymer-TiO₂-graphene composite structure with relatively high surface area (988 m² g⁻¹) and CO₂ uptake capacity (12.87 wt%). This composite shows high photocatalytic performance, especially for CH₄ production (27.62 μmol g⁻¹ h⁻¹), under mild reaction conditions without the use of sacrificial reagents or precious metal co-catalysts. The enhanced CO₂ reactivity can be ascribed to the composite's improved CO₂ adsorption and diffusion, visible-light absorption, and photo-generated charge separation efficiency. This strategy provides new insights into the combination of microporous organic polymers with photocatalysts for solar-to-fuel conversion.
We report the results of two fully automated structure prediction pipelines, “Zhang‐Server” and “QUARK”, in CASP13. The pipelines were built upon the C‐I‐TASSER and C‐QUARK programs, which in turn are based on I‐TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence‐profiles for contact prediction; (b) an improved meta‐method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact‐maps by coupling precision‐matrices with deep residual convolutional neural‐networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM‐scores of the first models produced by C‐I‐TASSER and C‐QUARK were 28% and 56% higher than those constructed by I‐TASSER and QUARK, respectively. For the first time, contact‐map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM‐scores of C‐I‐TASSER models were significantly higher than those of I‐TASSER models with a P‐value < .05. Detailed data analyses showed that the success of C‐I‐TASSER and C‐QUARK was mainly due to the increased accuracy of deep‐learning‐based contact‐maps, as well as the careful balance between sequence‐based contact restraints, threading templates, and generic knowledge‐based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi‐domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact‐based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the high mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
• C-I-TASSER adds deep-learning contact prediction to fragment assembly simulations
• C-I-TASSER enables ab initio folding of proteins lacking homology in the PDB
• The inherent force field is critical for proteins with poor templates and sparse MSAs
• Half of unsolved Pfam families are foldable by C-I-TASSER
Taking advantage of the rapid progress in deep-learning technologies, residue-residue contact-map prediction recently achieved impressive breakthroughs. However, how to efficiently convert the binary contact maps into atomic-level structure models remains an important unsolved problem in ab initio protein structure prediction. In this work, we integrated the deep-learning contact-map predictions with cutting-edge threading assembly simulations and found that the inherent force field of the structural folding simulations is essential to maximize the potential of contact-assisted protein structure prediction, especially for the targets and regions that lack spatial restraints and sufficient evolutionary data.
Zheng et al. develop C-I-TASSER, which integrates interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations, for high-accuracy protein structure prediction. C-I-TASSER folds more than twice as many proteins without homology as I-TASSER and has successfully folded 50% of Pfam families without solved experimental structures.
In this work, a very simple, label-free, isothermal, and ultrasensitive electrochemical DNA biosensor has been developed on the basis of an autocatalytic and exonuclease III (Exo III)-assisted target recycling amplification strategy. A duplex DNA probe constructed by the hybridization of a quadruplex-forming oligomer with a molecular beacon is designed and assembled on the electrode as the recognition element. Upon sensing of the analyte nucleic acid, the molecular beacon strand in the duplex DNA probe is removed stepwise by Exo III, accompanied by the release of target DNA and autonomous generation of a new secondary target DNA fragment for the successive hybridization and cleavage process. Simultaneously, numerous quadruplex-forming oligomers are liberated and folded into G-quadruplex–hemin complexes with the help of K+ and hemin on the electrode surface to give a remarkable electrochemical response. Because of this autocatalytic target recycling amplification and the specifically catalyzed formation of G-quadruplex–hemin complexes, this newly designed protocol provides an ultrasensitive electrochemical detection of DNA down to the 10 fM level, can discriminate mismatched DNA from perfectly matched target DNA, and holds great potential for early diagnosis of gene-related diseases. It could further be developed as a universal protocol for the detection of various DNA sequences and may be extended to the detection of aptamer-binding molecules.
In this article, we report 3D structure prediction results by two of our best server groups (“Zhang‐Server” and “QUARK”) in CASP14. These two servers were built based on the D‐I‐TASSER and D‐QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I‐TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact‐based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network‐based method, DeepPotential, to predict multiple spatial restraints by co‐evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM‐scores of the first models produced by D‐I‐TASSER and D‐QUARK were 96% and 112% higher than those constructed by I‐TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well‐tuned force field that combines spatial restraints, threading templates, and generic knowledge‐based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi‐domain proteins due to low accuracy in inter‐domain distance prediction and modeling protein domains from oligomer complexes, as the co‐evolutionary analysis cannot distinguish inter‐chain and intra‐chain distances. Specifically tuning the deep learning‐based predictors for multi‐domain targets and protein complexes may be helpful to address these issues.