Abstract
Large biomolecular structures are being determined experimentally on a daily basis using established techniques such as crystallography and electron microscopy. In addition, emerging integrative or hybrid methods (I/HM) are producing structural models of huge macromolecular machines and assemblies, sometimes containing hundreds of millions of non-hydrogen atoms. The performance requirements for visualization and analysis tools delivering these data are increasing rapidly. Significant progress in developing online, web-native three-dimensional (3D) visualization tools was previously accomplished with the introduction of the LiteMol suite and NGL Viewers. Thereafter, Mol* development was jointly initiated by PDBe and RCSB PDB to combine and build on the strengths of LiteMol (developed by PDBe) and NGL (developed by RCSB PDB). The web-native Mol* Viewer enables 3D visualization and streaming of macromolecular coordinate and experimental data, together with capabilities for displaying structure quality, functional, or biological context annotations. High-performance graphics and data management allow users to simultaneously visualise up to hundreds of (superimposed) protein structures, stream molecular dynamics simulation trajectories, render cell-level models, or display huge I/HM structures. It is the primary 3D structure viewer used by PDBe and RCSB PDB. It can be easily integrated into third-party services. Mol* Viewer is open source and freely available at https://molstar.org/.
Graphical Abstract
Overview of the large array of entities and systems that can be visualized and manipulated with the Mol* Viewer.
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
The Protein Data Bank (PDB), the single global repository of experimentally determined 3D structures of biological macromolecules and their complexes, was established in 1971, becoming the first open-access digital resource in the biological sciences. The PDB archive currently houses ~130,000 entries (May 2017). It is managed by the Worldwide Protein Data Bank organization (wwPDB; wwpdb.org), which includes the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The four wwPDB partners operate a unified global software system that enforces community-agreed data standards and supports data Deposition, Biocuration, and Validation of ~11,000 new PDB entries annually (deposit.wwpdb.org). The RCSB PDB currently acts as the archive keeper, ensuring disaster recovery of PDB data and coordinating weekly updates. wwPDB partners disseminate the same archival data from multiple FTP sites, while operating complementary websites that provide their own views of PDB data with selected value-added information and links to related data resources. At present, the PDB archives experimental data, associated metadata, and 3D-atomic level structural models derived from three well-established methods: crystallography, nuclear magnetic resonance spectroscopy (NMR), and electron microscopy (3DEM). wwPDB partners are working closely with experts in related experimental areas (small-angle scattering, chemical cross-linking/mass spectrometry, Förster resonance energy transfer or FRET, etc.) to establish a federation of data resources that will support sustainable archiving and validation of 3D structural models and experimental data derived from integrative or hybrid methods.
ABSTRACT
We present the seventh report on the performance of methods for predicting the atomic resolution structures of protein complexes offered as targets to the community‐wide initiative on the Critical Assessment of Predicted Interactions. Performance was evaluated on the basis of 36 114 models of protein complexes submitted by 57 groups (including 13 automatic servers) in prediction rounds held during the years 2016 to 2019 for eight protein‐protein, three protein‐peptide, and five protein‐oligosaccharide targets with ligands of different lengths. Six of the protein‐protein targets represented challenging hetero‐complexes, due to factors such as availability of distantly related templates for the individual subunits, or for the full complex, inter‐domain flexibility, conformational adjustments at the binding region, or the multi‐component nature of the complex. The main challenge for the protein‐peptide and protein‐oligosaccharide complexes was to accurately model the ligand conformation and its interactions at the interface. Encouragingly, models of acceptable quality, or better, were obtained for a total of six protein‐protein complexes, which included four of the challenging hetero‐complexes and a homo‐decamer. However, fewer of these targets were predicted with medium or higher accuracy. High accuracy models were obtained for two of the three protein‐peptide targets, and for one of the protein‐oligosaccharide targets. The remaining protein‐sugar targets were predicted with medium accuracy. Our analysis indicates that progress in predicting increasingly challenging and diverse types of targets is due to closer integration of template‐based modeling techniques with docking, scoring, and model refinement procedures, and to significant incremental improvements in the underlying methodologies.
We present the quality assessment of 5613 models submitted by predictor groups from both CAPRI and CASP for the 15 most tractable targets from the second joint CASP‐CAPRI protein assembly prediction experiment. These targets comprised 12 homo‐oligomers and 3 hetero‐complexes. The bulk of the analysis focuses on 10 targets (of CAPRI Round 37), which included all 3 hetero‐complexes, and whose protein chains or the full assembly could be readily modeled from structural templates in the PDB. On average, 28 CAPRI groups and 10 CASP groups (including automatic servers) submitted models for each of these 10 targets. Additionally, about 16 groups participated in the CAPRI scoring experiments. A range of acceptable to high quality models were obtained for 6 of the 10 Round 37 targets, for which templates were available for the full assembly. Poorer results were achieved for the remaining targets due to the lower quality of the templates available for the full complex or the individual protein chains, highlighting the unmet challenge of modeling the structural adjustments of the protein components that occur upon binding or which must be accounted for in template‐based modeling. On the other hand, our analysis indicated that residues in binding interfaces were correctly predicted in a sizable fraction of otherwise poorly modeled assemblies, and with higher accuracy than published methods that do not use information on the binding partner. Lastly, the strengths and weaknesses of the assessment methods are evaluated and improvements suggested.
Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies resorting to structure-based drug design for COVID-19 are plentiful and show good promise. Structural biology provides key insights into 3D structures and into critical residues/mutations in SARS-CoV-2 proteins that are implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. A detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is key to structure-guided therapeutic design.
Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at an unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, impact of variants/mutations and molecular therapeutics.
Here, we provide an overview of ongoing efforts on developing structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies on understanding various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV, leading to increased infectivity of SARS-CoV-2, (ii) deciphering key residues in SARS-CoV-2 involved in receptor–antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. The CCD contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors, systematic chemical names and idealized coordinates. The content, preparation, validation and distribution of this CCD chemical reference dataset are described.
The CCD is updated regularly in conjunction with the scheduled weekly release of new PDB structure data. The CCD and amino acid variant reference datasets are hosted in the public PDB ftp repository at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz, ftp://ftp.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif.gz, and its mirror sites, and can be accessed from http://wwpdb.org.
Contact: jwest@rcsb.rutgers.edu
Supplementary data are available at Bioinformatics online.
Abstract
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.