Experimental constraints associated with NMR structures are available from the Protein Data Bank (PDB) in the form of "Magnetic Resonance" (MR) files. These files contain multiple types of data concatenated without boundary markers and are difficult to use for further research. Reported here are the results of a project initiated to annotate, archive, and disseminate these data to the research community from a searchable resource in a uniform format. The MR files from a set of 1410 NMR structures were analyzed and their original constituent data blocks annotated as to data type using a semi-automated protocol. A new software program called Wattos was then used to parse and archive the data in a relational database. From the total number of MR file blocks annotated as constraints, it proved possible to parse 84% (3337/3975). The constraint lists that were parsed correspond to three data types (2511 distance, 788 dihedral angle, and 38 residual dipolar coupling lists) from the three most popular software packages used in NMR structure determination: XPLOR/CNS (2520 lists), DISCOVER (412 lists), and DYANA/DIANA (405 lists). These constraints were then mapped to a developmental version of the BioMagResBank (BMRB) data model. A total of 31 data types originating from 16 programs have been classified, with the NOE distance constraint being the most commonly observed. The results serve as a model for the development of standards for NMR constraint deposition in computer-readable form. The constraints are updated regularly and are available from the BMRB web site (http://www.bmrb.wisc.edu).
We present two new databases of NMR-derived distance and dihedral angle restraints: the Database Of Converted Restraints (DOCR) and the Filtered Restraints Database (FRED). These databases currently correspond to 545 proteins with NMR structures deposited in the Protein Data Bank (PDB). The criteria for inclusion were that these should be unique, monomeric proteins with author-provided experimental NMR data and coordinates available from the PDB capable of being parsed and prepared in a consistent manner. The Wattos program was used to parse the files, and the CcpNmr FormatConverter program was used to prepare them semi-automatically. New modules, including a new implementation of Aqua in the BioMagResBank (BMRB) software Wattos, were used to analyze the sets of distance restraints (DRs) for inconsistencies, redundancies, NOE completeness, classification and violations with respect to the original coordinates. Restraints that could not be associated with a known nomenclature were flagged. The coordinates of hydrogen atoms were recalculated from the positions of heavy atoms to allow for a full restraint analysis. The DOCR database contains restraint and coordinate data that are made consistent with each other and with IUPAC conventions. The FRED database is based on the DOCR data but is filtered for use by test calculation protocols and longitudinal analyses and validations. These two databases are available from websites of the BMRB and the Macromolecular Structure Database (MSD) in various formats: NMR-STAR, CCPN XML, and in formats suitable for direct use in the software packages CNS and CYANA.
The completeness of experimentally observed NOE restraints of a set of 97 NMR protein structures deposited in the PDB has been assessed. Completeness is defined as the ratio of the number of experimentally observed NOEs to the number of 'expected NOEs'. A practical definition of 'expected NOEs' based on inter-proton distances in the structures up to a given cut-off distance is proposed. The average completeness for the set of 97 structures is 68, 48, and 26% up to 3, 4, and 5 Å cut-off distances, respectively. For recent state-of-the-art structures these numbers are approximately 90, 75, and 45%. Almost 20% of the observed NOEs are between atoms that are further than 5 Å apart in the final structures. The completeness is independent of the relative surface accessibility and does not depend strongly on residue type, secondary structure or local precision, although the number of observed NOEs in these classes varies considerably. The completeness of NOE restraints is a useful quality criterion in the course of structure refinement. The completeness per residue is more informative than the number of NOEs per residue, which makes it a useful tool to assess the quality of the NMR data set in relation to the resulting structures.
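The completeness ratio defined in the preceding abstract can be illustrated with a minimal sketch. It assumes a single set of proton coordinates and treats every proton pair within the cut-off as an 'expected NOE'; the function name `noe_completeness`, the input layout, and the choice to count only those observed NOEs that fall within the cut-off are assumptions made for illustration, not the published procedure.

```python
from itertools import combinations
from math import dist


def noe_completeness(proton_coords, observed_pairs, cutoff):
    """Fraction of expected NOEs (proton pairs within `cutoff` Å) that were observed.

    proton_coords:  dict mapping a proton name to an (x, y, z) tuple in Å (hypothetical layout)
    observed_pairs: iterable of (name_a, name_b) pairs with an observed NOE
    """
    # Expected NOEs: every unordered proton pair no further apart than the cut-off.
    expected = {
        frozenset((a, b))
        for a, b in combinations(proton_coords, 2)
        if dist(proton_coords[a], proton_coords[b]) <= cutoff
    }
    if not expected:
        return 0.0
    observed = {frozenset(pair) for pair in observed_pairs}
    # Observed NOEs between protons beyond the cut-off do not count here (assumption).
    return len(observed & expected) / len(expected)


# Toy data: four protons and three observed NOEs, one of them long-range (6 Å).
protons = {
    "H1": (0.0, 0.0, 0.0),
    "H2": (2.5, 0.0, 0.0),
    "H3": (0.0, 3.5, 0.0),
    "H4": (0.0, 0.0, 6.0),
}
observed = [("H1", "H2"), ("H2", "H3"), ("H1", "H4")]

for cutoff in (3.0, 4.0, 5.0):
    print(cutoff, noe_completeness(protons, observed, cutoff))
```

In the toy data the completeness is not monotonic in the cut-off, because both the set of expected pairs and the set of observed pairs that qualify grow as the cut-off is raised.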
Any protein structure determination process contains several steps, starting from obtaining a suitable sample, then moving on to acquiring data and spectral assignment, and lastly to the final steps of structure determination and validation. This unit describes all of these steps, starting with the basic physical principles behind NMR and some of the most commonly measured and observed phenomena such as chemical shift, scalar and residual coupling, and the nuclear Overhauser effect. Then, in somewhat more detail, the process of spectral assignment and structure elucidation is explained. Furthermore, the use of NMR to study protein-ligand interaction, protein dynamics, or protein folding is described.
Background: Integrase mediates a crucial step in the life cycle of the human immunodeficiency virus (HIV). The enzyme cleaves the viral DNA ends in a sequence-dependent manner and couples the newly generated hydroxyl groups to phosphates in the target DNA. Three domains have been identified in HIV integrase: an amino-terminal domain, a central catalytic core and a carboxy-terminal DNA-binding domain. The amino-terminal region is the only domain with unknown structure thus far. This domain, which is known to bind zinc, contains a HHCC motif that is conserved in retroviral integrases. Although the exact function of this domain is unknown, it is required for cleavage and integration.
Results: The three-dimensional structure of the amino-terminal domain of HIV-2 integrase has been determined using two-dimensional and three-dimensional nuclear magnetic resonance data. We obtained 20 final structures, calculated using 693 nuclear Overhauser effects, which display a backbone root-mean-square deviation versus the average of 0.25 Å for the well defined region. The structure consists of three α helices and a helical turn. The zinc is coordinated with His12 via the Nε2 atom, with His16 via the Nδ1 atom and with the sulfur atoms of Cys40 and Cys43. The α helices form a three-helix bundle that is stabilized by this zinc-binding unit. The helical arrangement is similar to that found in the DNA-binding domains of the trp repressor, the prd paired domain and Tc3A transposase.
Conclusion: The amino-terminal domain of HIV-2 integrase has a remarkable hybrid structure combining features of a three-helix bundle fold with a zinc-binding HHCC motif. This structure shows no similarity with any of the known zinc-finger structures. The strictly conserved residues of the HHCC motif of retroviral integrases are involved in metal coordination, whereas many other well conserved hydrophobic residues are part of the protein core.
The Worldwide Protein Data Bank (wwPDB; wwpdb.org) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive at ftp://ftp.wwpdb.org is the repository for the coordinates and related information for more than 47 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The members of the wwPDB, namely RCSB PDB (USA), MSD-EBI (Europe), PDBj (Japan) and BMRB (USA), have remediated this archive to address inconsistencies that have been introduced over the years. The scope and methods used in this project are presented.
Toluene 4-monooxygenase, a four-protein complex from Pseudomonas mendocina KR1, catalyzes the NADH- and O₂-dependent hydroxylation of toluene to form p-cresol. The solution structure of the 112-amino-acid Rieske ferredoxin component, T4moC, was determined from 2D and 3D ¹H, ¹³C, and ¹⁵N NMR data. The structural model was refined through simulated annealing by molecular dynamics in torsion angle space with input from 1650 experimental restraints, including 1264 inter-proton distance restraints obtained from NOEs, 247 non-redundant intra-residue NOEs, 26 hydrogen bond restraints, and 113 dihedral angle (φ, ψ) restraints. The 20 calculated conformers that best satisfied the input restraints were submitted to refinement in explicit solvent to improve the stereochemical quality. With exclusion of ill-defined N- and C-terminal segments (Ser2; His111-Ser112) and residues near to the 2Fe-2S cluster, the atomic root mean square deviation for the 20 conformers with respect to the mean coordinates was 1.09 Å for the backbone and 1.60 Å for all non-hydrogen atoms. The T4moC structure consists of 10 β-strands arranged in the three anti-parallel β-sheet topology observed in all Rieske 2Fe-2S domain proteins. The Sγ of Cys45 and Cys64 and the Nδ1 of His47 and His67 provide the ligands to the 2Fe-2S cluster of T4moC. ¹H-¹⁵N HSQC measurements show that both His47-Nε2 and His67-Nε2 are protonated at the pH of the NMR experiments. Comparisons are made between the present NMR structure, previous paramagnetic NMR studies of T4moC, and the X-ray structures of other members of the Rieske protein family.
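The "root mean square deviation with respect to the mean coordinates" quoted in the preceding abstract can be illustrated with a small sketch. It assumes the conformers have already been superposed on a common frame (no fitting step is performed here) and that the reported figure is the average of the per-conformer RMSD values; both points, as well as the function name `rmsd_to_mean` and the input layout, are assumptions for illustration rather than the authors' protocol.

```python
from math import sqrt


def rmsd_to_mean(conformers):
    """Average RMSD (Å) of each conformer from the mean coordinates.

    conformers: list of conformers, each a list of (x, y, z) tuples for the
                same atoms in the same order, already superposed (assumption).
    """
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    # Mean position of every atom over the ensemble.
    mean = [
        tuple(sum(conf[i][k] for conf in conformers) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]
    per_conformer = []
    for conf in conformers:
        squared = sum(
            (conf[i][k] - mean[i][k]) ** 2
            for i in range(n_atoms)
            for k in range(3)
        )
        per_conformer.append(sqrt(squared / n_atoms))
    return sum(per_conformer) / n_conf


# Toy ensemble: two conformers of a two-atom fragment.
ensemble = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
print(rmsd_to_mean(ensemble))
```

Restricting the atom list to backbone atoms or to all non-hydrogen atoms of the well-defined residues would give the two kinds of values that NMR structure reports usually quote.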
The three-dimensional solution structure of the immunodominant central conserved region of the attachment protein G (BRSV-G) of bovine respiratory syncytial virus has been determined by nuclear magnetic resonance (NMR) spectroscopy. In the 32-residue peptide studied, 19 residues form a small rigid core composed of two short helices, connected by a type I′ turn, and linked by two disulfide bridges. This unique fold is among the smallest stable tertiary structures known and could therefore serve as an ideal building block for the design of de novo proteins and as a test case for modeling studies. A characteristic hydrophobic pocket, lined by conserved residues, lies at the surface of the peptide and may play a role in receptor binding. This work provides a structural basis for further peptide vaccine development against the severe diseases associated with the respiratory syncytial viruses in both cattle and man.
A statistical analysis is reported of experimental data and coordinates of a set of 97 NMR structures deposited in the PDB. The aim is to assess the quality of these structures in relation to the amount of experimental information. Experimental restraints were analysed using the program AQUA. Many nomenclature inconsistencies between deposited restraint and coordinate files were observed. The experimental restraint files were found to contain a high proportion of redundant restraints. Procedures for analysing and correcting the inconsistencies and restraint counts are described.
The analysis of NOE restraint violations (using AQUA) and of a wide variety of geometrical quality indicators (using PROCHECK-NMR and WHAT IF) provides a reference for other NMR structure determinations. The extent of NOE violations is anti-correlated with the quality of the Ramachandran map. The precision, as measured by the circular variance of backbone dihedral angles, does increase with the amount of experimental data, as expected, but is sometimes overestimated. Bond lengths, bond angles and planarity of groups can deviate considerably from ideal values. Outliers appear to cluster per laboratory, indicating that the results depend on particulars of refinement protocols and/or software. We have identified a problem of atom overlap in a number of refined structures.
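The circular variance used above as a precision measure can be illustrated with a short sketch. It uses the common definition 1 - R/n, where R is the length of the summed unit vectors of the n angles; whether the analysis described above applied exactly this form is an assumption, and the function name `circular_variance` is hypothetical.

```python
from math import cos, hypot, radians, sin


def circular_variance(angles_deg):
    """Circular variance of dihedral angles in degrees: 0 = identical, 1 = fully spread."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    # Sum the unit vectors of all angles; the shorter the resultant, the larger the spread.
    c = sum(cos(radians(a)) for a in angles_deg)
    s = sum(sin(radians(a)) for a in angles_deg)
    return 1.0 - hypot(c, s) / n


# A well-defined phi angle across an ensemble versus a poorly defined one.
tight = [-62.0, -60.0, -58.0, -61.0]
loose = [-170.0, -60.0, 65.0, 150.0]
print(circular_variance(tight), circular_variance(loose))
```

Working with unit vectors rather than raw angle values keeps the measure correct across the ±180° boundary, so 179° and -179° are treated as nearly identical.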
We recommend adhering to the standard nomenclature as put forward by an IUPAC Task Group, to ensure consistency between restraints and coordinates, and to omit redundant restraints from the deposition. The results obtained from this analysis and the AQUA program are available through the World Wide Web.