Understanding how proteins and their complex interaction networks convert the genomic information into a dynamic living organism is a fundamental challenge in biological sciences. As an important step towards understanding the systems biology of a complex eukaryote, we cataloged 63% of the predicted Drosophila melanogaster proteome by detecting 9,124 proteins from 498,000 redundant and 72,281 distinct peptide identifications. This unprecedentedly high proteome coverage for a complex eukaryote was achieved by combining sample diversity, multidimensional biochemical fractionation and analysis-driven experimentation feedback loops, whereby data collection is guided by statistical analysis of prior data. We show that high-quality proteomics data provide crucial information to amend genome annotation and to confirm many predicted gene models. We also present experimentally identified proteotypic peptides matching approximately 50% of D. melanogaster gene models. This library of proteotypic peptides should enable fast, targeted and quantitative proteomic studies to elucidate the systems biology of this model organism.
The question of whether proteins originate from random sequences of amino acids is addressed. A statistical analysis is performed in terms of blocked and random walk values formed by binary hydrophobic assignments of the amino acids along the protein chains. Theoretical expectations of these variables from random distributions of hydrophobicities are compared with those obtained from functional proteins. The results, which are based upon proteins in the SWISS-PROT database, convincingly show that the amino acid sequences in proteins differ from what is expected from random sequences in a statistically significant way. By performing Fourier transforms on the random walks, one obtains additional evidence for nonrandomness of the distributions. We have also analyzed results from a synthetic model containing only two amino acid types, hydrophobic and hydrophilic. With reasonable criteria on good folding properties in terms of thermodynamical and kinetic behavior, sequences that fold well are isolated. Performing the same statistical analysis on the sequences that fold well indicates similar deviations from randomness as for the functional proteins. The deviations from randomness can be interpreted as originating from anticorrelations in terms of an Ising spin model for the hydrophobicities. Our results, which differ from some previous investigations using other methods, might have impact on how permissive with respect to sequence specificity the protein folding process is: only sequences with nonrandom hydrophobicity distributions fold well. Other distributions give rise to energy landscapes with poor folding properties and hence did not survive evolution.
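The blocked and random walk variables described above can be illustrated with a minimal sketch. The hydrophobic set `HYDROPHOBIC`, the mean-subtracted walk, and the helper names are assumptions made for illustration only; the abstract does not specify the binary assignment or the exact definitions used.

```python
from itertools import accumulate

# Assumed binary assignment: these residues count as hydrophobic (+1),
# all others as hydrophilic (-1). The abstract does not list the set.
HYDROPHOBIC = set("LIVFMW")


def to_spins(seq):
    """Map an amino acid sequence to Ising-like spins (+1 / -1)."""
    return [1 if aa in HYDROPHOBIC else -1 for aa in seq.upper()]


def random_walk(spins):
    """Cumulative sum of spins after subtracting the mean spin,
    so the walk ends at zero for any composition."""
    mean = sum(spins) / len(spins)
    return list(accumulate(s - mean for s in spins))


def block_sums(spins, block_size):
    """Sum the spins inside consecutive non-overlapping blocks."""
    last_start = len(spins) - block_size
    return [sum(spins[i:i + block_size])
            for i in range(0, last_start + 1, block_size)]


def block_variance(spins, block_size):
    """Variance of the block sums; anticorrelated spins give values
    below what independent random spins would give."""
    sums = block_sums(spins, block_size)
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / len(sums)
```

A perfectly alternating sequence is the extreme anticorrelated case: its block sums cancel, so `block_variance(to_spins("LKVE"), 2)` is zero, whereas independent random spins would on average give a positive block variance.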
Membrane proteins play a central role in biological processes, but their separation and quantification using two‐dimensional gel electrophoresis is often limited by their poor solubility and relatively low abundance. We now present a method for the simultaneous recovery, separation, identification, and relative quantification of membrane proteins, following their selective covalent modification with a cleavable biotin derivative. After cell lysis, biotinylated proteins are purified on streptavidin‐coated resin and proteolytically digested. The resulting peptides are analyzed by high‐pressure liquid chromatography and mass spectrometry, thus yielding a two‐dimensional peptide map. Matrix‐assisted laser desorption/ionization‐time of flight signal intensity of peptides, in the presence of internal standards, is used to quantify the relative abundance of membrane proteins from cells treated in different experimental conditions. As experimental examples, we present (i) an analysis of a BSA‐spiked human embryonic kidney membrane protein extract, and (ii) an analysis of membrane proteins of human umbilical vein endothelial cells cultured in normoxic and hypoxic conditions. This last study allowed the recovery of the vascular endothelial‐cadherin/actin/catenin complex, revealing an increased accumulation of beta‐catenin at 2% O2 concentration.
Polyphasic‐taxonomic studies of the past decade have shown that the Burkholderia cepacia complex (Bcc) comprises at least nine species, which share a high degree of 16S rDNA (98–100%) sequence similarity but only moderate levels of DNA‐DNA hybridization. Members of the Bcc are well known as opportunistic pathogens of plants, animals and humans but also as biocontrol and bioremediation agents. In this study, intra‐, surface‐associated and extracellular proteins of B. cenocepacia H111, which was isolated from a cystic fibrosis patient, were examined by 2‐DE coupled to MALDI‐TOF MS. MS and MS/MS data were searched against a database comprising all currently available annotated proteins of genetically closely related strains. In total, 642 protein spots were successfully identified, corresponding to 390 different protein species, which were classified into functional categories. The majority of these proteins could be linked to housekeeping functions in energy production, amino acid metabolism, protein folding, post‐translational modification and turnover, and translation. Noteworthy is the fact that a significant number of truly secreted and membrane proteins were identified in the extracellular and surface‐associated sub‐proteomes. This indicates that the pre‐fractionation protocol used in this study is a highly valuable strategy for unravelling the cellular location of the identified proteins.
We have developed a simple optimization procedure for assigning binary values to amino acids. The binary values are determined by a maximization of the degree of pattern conservation in groups of closely related protein sequences. The maximization is carried out at fixed composition. For compositions approximately corresponding to an equipartition of the residues, the optimal encoding is found to be strongly correlated with hydrophobicity. The stability of the procedure is demonstrated. Our calculations are based upon sequences in the SWISS-PROT database.
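As a rough illustration of maximizing pattern conservation at fixed composition, the sketch below searches exhaustively over all encodings that assign the value 1 to a fixed number of residue types. The conservation measure (fraction of aligned positions whose binary values agree), the pairwise-alignment input, and the function names are all assumptions; the abstract does not define the objective or the optimizer actually used.

```python
from itertools import combinations


def conservation(ones, aligned_pairs):
    """Fraction of aligned positions where both residues receive the
    same binary value. `ones` is the set of residues encoded as 1."""
    agree = 0
    total = 0
    for seq_a, seq_b in aligned_pairs:
        for x, y in zip(seq_a, seq_b):
            total += 1
            agree += (x in ones) == (y in ones)
    return agree / total


def best_encoding(alphabet, n_ones, aligned_pairs):
    """Exhaustive search over encodings with exactly `n_ones` residue
    types set to 1 (the fixed-composition constraint in this sketch)."""
    candidates = (frozenset(c)
                  for c in combinations(sorted(alphabet), n_ones))
    return max(candidates, key=lambda ones: conservation(ones, aligned_pairs))
```

Exhaustive search is only practical for toy alphabets; for the full 20-letter alphabet a heuristic search would be needed. When exactly half of the alphabet is set to 1, an encoding and its complement score identically, so the result is defined only up to swapping the two classes.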
Adipose tissue poses problems in two‐dimensional (2‐D) analysis due to its extremely high content of fat. To improve protein separation, detergents and chaotropes were varied in the IEF step. The most important factor for obtaining distinct spots in the 2‐D gel was whether thiourea was included or not. Many high molecular weight spots became resolved by using thiourea, while no spots disappeared or showed inferior characteristics, so approximately twice as many spots could be quantified. Hydrophobic indices were compared between a set of proteins that gave rise to sharper spots and proteins that were not improved by the use of thiourea. The comparison did not give any statistically significant difference between the two groups of proteins. One of the effects obtained by inclusion of thiourea was that the dominating protein, serum albumin, appeared as more condensed spots, allowing other minor proteins to be detected. This work resulted in a protocol which greatly enhances the resolution of proteins in adipose tissue. A 2‐D map of mouse white adipose tissue from epididymal fat pads was constructed in which 140 spots were identified by mass spectrometry. This work lays the ground for our further studies on white adipose tissue in metabolic diseases such as obesity and dyslipidemia.
An automated peak picking strategy is presented where several peak sets with different signal-to-noise levels are combined to form a more reliable statement on the protein identity. The strategy is compared against both manual peak picking and industry standard automated peak picking on a set of mass spectra obtained after tryptic in-gel digestion of 2D-gel samples from human fetal fibroblasts. The set of spectra contains samples ranging from strong to weak spectra, and the proposed multiple-scale method is shown to be much better on weak spectra than the industry standard method and a human operator, and equal in performance to these on strong and medium strong spectra. It is also demonstrated that peak sets selected by a human operator display a considerable variability and that it is impossible to speak of a single “true” peak set for a given spectrum. The described multiple-scale strategy both avoids time-consuming parameter tuning and exceeds the human operator in protein identification efficiency. The strategy therefore promises reliable automated user-independent protein identification using peptide mass fingerprints.
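The idea of combining peak sets picked at several signal-to-noise levels can be sketched as follows. The local-maximum peak picker, the match-count score, and the rule of summing scores across levels are assumptions chosen for illustration; the abstract does not state how the peak sets are actually combined, and all function and parameter names here are hypothetical.

```python
def pick_peaks(mz, intensity, noise, snr):
    """Return m/z values of local maxima whose intensity is at least
    `snr` times the (assumed known) noise level."""
    peaks = []
    for i in range(1, len(mz) - 1):
        is_local_max = (intensity[i] > intensity[i - 1]
                        and intensity[i] >= intensity[i + 1])
        if is_local_max and intensity[i] >= snr * noise:
            peaks.append(mz[i])
    return peaks


def count_matches(peaks, theoretical_masses, tol):
    """Count theoretical peptide masses matched by any picked peak."""
    return sum(any(abs(p - t) <= tol for p in peaks)
               for t in theoretical_masses)


def combined_scores(mz, intensity, noise, snr_levels, candidates, tol=0.2):
    """Score each candidate protein at every S/N level and sum the
    per-level match counts (an assumed combination rule)."""
    totals = {name: 0 for name in candidates}
    for snr in snr_levels:
        peaks = pick_peaks(mz, intensity, noise, snr)
        for name, masses in candidates.items():
            totals[name] += count_matches(peaks, masses, tol)
    return totals
```

Because every threshold contributes to the total, no single signal-to-noise cutoff has to be tuned by hand: a candidate supported by strong peaks accumulates score at all levels, while one supported only by weak peaks scores only at the permissive levels.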