Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
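As an illustration of what an intensity-based score can look like, the sketch below computes a normalized spectral angle between observed and predicted fragment ion intensities. This is only one commonly used similarity measure; the abstract does not say which scores INFERYS rescoring calculates, so the function name and the choice of metric are assumptions made for this example. Both vectors are assumed to be aligned to the same fragment ions.

```python
import math


def normalized_spectral_angle(observed, predicted):
    """Similarity in [0, 1] between two aligned intensity vectors.

    Hypothetical helper for illustration: 1.0 means identical relative
    intensities, 0.0 means no shared signal (orthogonal vectors).
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must cover the same fragment ions")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0  # an empty spectrum carries no intensity information
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

A score of this kind could then be supplied alongside the classical search engine scores as an extra feature for FDR estimation.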
After hundreds of generations of adaptive evolution at exponential growth, Escherichia coli grows as predicted using flux balance analysis (FBA) on genome‐scale metabolic models (GEMs). However, it is not known whether the predicted pathway usage in FBA solutions is consistent with gene and protein expression in the wild‐type and evolved strains. Here, we report that >98% of active reactions from FBA optimal growth solutions are supported by transcriptomic and proteomic data. Moreover, when E. coli adapts to growth rate selective pressure, the evolved strains upregulate genes within the optimal growth predictions, and downregulate genes outside of the optimal growth solutions. In addition, bottlenecks from dosage limitations of computationally predicted essential genes are overcome in the evolved strains. We also identify regulatory processes that may contribute to the development of the optimal growth phenotype in the evolved strains, such as the downregulation of known regulons and stringent response suppression. Thus, differential gene and protein expression from wild‐type and adaptively evolved strains supports observed growth phenotype changes, and is consistent with GEM‐computed optimal growth states.
Synopsis
When prokaryotes are maintained at early‐ to mid‐log phase growth through serial passaging for hundreds of generations, the strains improve fitness and evolve a higher growth rate (Lenski and Travisano, 1994; Ibarra et al, 2002). This increased growth rate is the result of the appearance of a few causal mutations (Herring et al, 2006; Conrad et al, 2009). In Escherichia coli, these altered growth phenotypes are consistent with predictions from genome‐scale models of metabolism (GEMs) (Ibarra et al, 2002; Fong and Palsson, 2004). However, it is still not known (1) whether absolute gene and protein expression levels and expression changes are consistent with optimal growth predictions from in silico GEMs or (2) whether measured expression changes can be linked to physiological changes that are based on known mechanisms or pathways. In this study, we begin to address these questions using constraint‐based modeling of E. coli K‐12 metabolism (Feist and Palsson, 2008) to analyze omic data that document the expression changes in E. coli under adaptive evolution in three different growth conditions.
Mapping high‐throughput data to a network can be useful for interpretation. However, it does not account for upstream and downstream effects of gene and protein expression changes. The analysis of data in the context of GEMs can suggest if predicted activity is consistent with the data. For this work, we used a variant of flux balance analysis (FBA), called Parsimonious enzyme usage FBA (pFBA) (Figure 1), to classify all genes according to whether they are used in the optimal growth solutions. Results from these models were compared with the data to assess whether the data were consistent with genes and proteins within the predicted optimal solutions, and whether the expression changes were consistent with measured physiology. Through this analysis, we find that the data provide a high coverage of genes that contribute to the optimal growth solutions (Figure 1B). In fact, the union of the proteomic and transcriptomic data for non‐essential genes provides support for 97.7% of all non‐essential gene‐associated reactions within the optimal growth predictions. Thus, the spectrum of expressed genes and proteins is consistent with the pathway utilization that is predicted for these optimal growth phenotypes.
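For readers unfamiliar with the optimization behind these predictions, the usual textbook formulation can be sketched as follows. The symbols are assumptions introduced here and do not appear in the text: S is the stoichiometric matrix, v the flux vector, c a vector selecting the biomass reaction, and v^min and v^max the flux bounds. FBA solves a linear program for the maximal growth rate; pFBA, as commonly described, then fixes growth at that optimum and minimizes total flux, with reversible reactions split into irreversible pairs so that all fluxes are non-negative.

```latex
\begin{align*}
\text{FBA:}\quad & Z^{*} = \max_{v}\; c^{\mathsf{T}} v
  \quad \text{s.t.}\quad S v = 0,\quad v^{\min} \le v \le v^{\max} \\
\text{pFBA:}\quad & \min_{v}\; \sum_{j} v_{j}
  \quad \text{s.t.}\quad S v = 0,\quad c^{\mathsf{T}} v = Z^{*},\quad 0 \le v \le v^{\max}
\end{align*}
```

Under this reading, genes whose reactions carry flux in some solution of the second problem are the ones counted as being within the optimal growth solutions.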
Laboratory‐evolved strains attain a higher growth rate. This higher growth rate is usually associated with an increased substrate uptake rate (Ibarra et al, 2002; Fong et al, 2005) and in some cases more efficient metabolism (Ibarra et al, 2002). Both of these properties are also witnessed in the strains studied here. It has been reported that in most cases, evolved strain growth phenotype is consistent with GEM predictions (Ibarra et al, 2002; Teusink et al, 2009). Here, we evaluate whether the laboratory‐evolved strains adjust the gene and protein expression levels in accordance with pathway usage in the optimal growth predictions. Essential and non‐essential genes and proteins within the optimal growth solutions are significantly upregulated (Figure 1B). This suggests that these proteins may be acting as bottlenecks that are relieved through the adaptive process, thereby allowing for a higher substrate uptake rate and growth rate. However, genes and proteins associated with reactions that cannot carry a flux in the given growth conditions are downregulated in the evolved strains (Figure 1B). Furthermore, there is downregulation of genes associated with less efficient pathways (Figure 5C). Thus, the omic data support the emergence of the predicted optimal growth states, consistent with the increased substrate uptake upstream and the increased biomass production downstream of these internal pathways.
Regulatory mechanisms, both known and unknown, are responsible for the changes seen here. Across all data sets, several metabolic regulons are significantly downregulated. However, in all but one data set, no known regulons were enriched among upregulated genes or proteins. Beyond directly regulating the metabolic pathways, these mechanisms lead to additional physiological changes. For example, in the minimal media growth conditions used here, the stringent response normally represses growth while upregulating amino‐acid biosynthetic processes. However, gene expression in the evolved strains indicates suppression of the stringent response, showing either no expression change or changes opposite to the normal stringent response.
The implications of this work are as follows: (1) genome‐scale gene and protein expression data are consistent with FBA computed optimal growth states, and evolved strains reinforce these optimal states; (2) genome‐scale models will have an important function bridging the gap between genotype and phenotype; and (3) the development of additional genome‐scale models of other growth‐related processes such as transcription and translation (Thiele et al, 2009) will have an important function in elucidating the mechanisms that contribute the most to altered phenotypes (Lewis et al, 2009a). In addition, reconstruction of the transcriptional regulation network will aid in identifying the control of expression changes seen in the other systems.
Proteomic and transcriptomic data from wild‐type and laboratory‐evolved strains of Escherichia coli are consistent with predicted pathway usage from optimal growth rate solutions.
In laboratory‐evolved strains, there is an upregulation of the pathways in the computed optimal growth states, and downregulation of non‐functional pathways.
Known regulatory mechanisms are only partially responsible for altered metabolic pathway activity.
In this study, we evaluated a concatenated low pH (pH 3) and high pH (pH 10) reversed‐phase liquid chromatography strategy as a first dimension for two‐dimensional liquid chromatography tandem mass spectrometry (“shotgun”) proteomic analysis of a trypsin‐digested human MCF10A cell sample. Compared with the more traditional strong cation exchange method, the use of concatenated high pH reversed‐phase liquid chromatography as a first‐dimension fractionation strategy resulted in 1.8‐ and 1.6‐fold increases in the number of peptide and protein identifications (with two or more unique peptides), respectively. In addition to broader identifications, advantages of the concatenated high pH fractionation approach include improved protein sequence coverage, simplified sample processing, and reduced sample losses. The results demonstrate that the concatenated high pH reversed‐phase strategy is an attractive alternative to strong cation exchange for two‐dimensional shotgun proteomic analysis.
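The pooling idea behind a concatenated fractionation can be sketched in a few lines. The example below assumes the common scheme in which first-dimension fractions collected at regular intervals are combined, so that each pooled sample mixes early, middle, and late eluting fractions. The function name and the fraction counts are hypothetical; the abstract does not state how many fractions were collected or pooled.

```python
def concatenate_fractions(n_collected, n_pooled):
    """Assign collected fractions (numbered from 1) to pooled samples.

    Hypothetical helper: fraction k goes to pool (k - 1) % n_pooled, so
    pool 1 receives fractions 1, 1 + n_pooled, 1 + 2 * n_pooled, and so on.
    """
    if n_pooled <= 0 or n_collected < n_pooled:
        raise ValueError("need at least one collected fraction per pool")
    pools = [[] for _ in range(n_pooled)]
    for fraction in range(1, n_collected + 1):
        pools[(fraction - 1) % n_pooled].append(fraction)
    return pools


# Illustrative numbers only: 6 collected fractions pooled into 3 samples.
for index, members in enumerate(concatenate_fractions(6, 3), start=1):
    print(f"pool {index}: fractions {members}")
```

Pooling non-adjacent fractions in this way keeps the number of second-dimension runs small while spreading peptides of different hydrophobicity across each run.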
A new strategy for the fast monitoring of peptide biomarkers is described. It is based on the use of accelerated in-solution trypsin digestions under an ultrasonic field provided by high-intensity focused ultrasound (HIFU) and the monitoring of several peptides by selected MS/MS ion monitoring in a linear ion trap mass spectrometer. The performance of the method was established for the unequivocal identification of all commercial fish species belonging to the Merlucciidae family. Using a particular combination of only 11 peptides, resulting from the HIFU-assisted tryptic digestion of the thermostable proteins parvalbumins, the workflow allowed the unequivocal identification of these closely related fish species in any seafood product, including processed and precooked products, in less than 2 h. The present strategy constitutes the fastest method for peptide biomarker monitoring. Its application to food quality control provides authorities with an effective and rapid method of food authentication and traceability to guarantee quality and safety for consumers.
Maintenance of macrophages in their basal state and their rapid activation in response to pathogen detection are central to the innate immune system, acting to limit nonspecific oxidative damage and promote pathogen killing following infection. To identify possible age-related alterations in macrophage function, we have assayed the function of peritoneal macrophages from young (3–4 months) and aged (14–15 months) Balb/c mice. In agreement with prior suggestions, we observe age-dependent increases in the extent of recruitment of macrophages into the peritoneum, as well as ex vivo functional changes involving enhanced nitric oxide production under resting conditions that contribute to a reduction in the time needed for full activation of senescent macrophages following exposure to lipopolysaccharides (LPS). Further, we observe enhanced bactericidal activity following Salmonella uptake by macrophages isolated from aged Balb/c mice in comparison with those isolated from young animals. Pathways responsible for observed phenotypic changes were interrogated using tandem mass spectrometry, which identified age-dependent increases in levels of proteins linked to immune cell pathways under basal conditions and following LPS activation. Immune pathways upregulated in macrophages isolated from aged mice include proteins critical to the formation of the immunoproteasome. Detection of these latter proteins is dramatically enhanced following LPS exposure for macrophages isolated from aged animals; in comparison, the identification of immunoproteasome subunits is insensitive to LPS exposure for macrophages isolated from young animals. Consistent with observed global changes in the proteome, quantitative proteomic measurements indicate that there are age-dependent abundance changes involving specific proteins linked to immune cell function under basal conditions. LPS exposure selectively increases the levels of many proteins involved in immune cell function in aged Balb/c mice.
Collectively, these results indicate that macrophages isolated from old mice are in a preactivated state that enhances their sensitivities to LPS exposure. The hyper-responsive activation of macrophages in aged animals may act to minimize infection by general bacterial threats that arise due to age-dependent declines in adaptive immunity. However, this hypersensitivity and the associated increase in the level of formation of reactive oxygen species are likely to contribute to observed age-dependent increases in the level of oxidative damage that underlie many diseases of the elderly.
Sample preparation is a fundamental step in the proteomics workflow. However, up-to-date compiled information on this subject is not easy to find. In this paper, the strategies and protocols for protein extraction and identification, following either classical or second generation proteomics methodologies, are reviewed. Procedures for tissue disruption, cell lysis, sample pre-fractionation, protein separation by 2-DE, protein digestion, mass spectrometry analysis, multidimensional peptide separations and quantification of protein expression level are described.
Nitric oxide is implicated in a variety of signaling pathways in different systems, notably in endothelial cells. Some of its effects can be exerted through covalent modifications of proteins and, among these modifications, increasing attention is being paid to S-nitrosylation as a signaling mechanism. In this work, we show by a variety of methods (ozone chemiluminescence, biotin switch, and mass spectrometry) that the molecular chaperone Hsp90 is a target of S-nitrosylation and identify a susceptible cysteine residue in the region of the C-terminal domain that interacts with endothelial nitric oxide synthase (eNOS). We also show that the modification occurs in endothelial cells when they are treated with S-nitroso-L-cysteine and when they are exposed to eNOS activators. Hsp90 ATPase activity and its positive effect on eNOS activity are both inhibited by S-nitrosylation. Together, these data suggest that S-nitrosylation may functionally regulate the general activities of Hsp90 and provide a feedback mechanism for limiting eNOS activation.
The process of protein digestion is a critical step for successful protein identification in bottom-up proteomic analyses. To replace the present practice of in-solution protein digestion, which is long, tedious, and difficult to automate, many efforts have been dedicated to the development of a rapid, recyclable and automated digestion system. Recent advances in nanobiocatalytic approaches have improved the performance of protein digestion by using various nanomaterials such as nanoporous materials, magnetic nanoparticles, and polymer nanofibers. In particular, the unprecedented success of trypsin stabilization in the form of trypsin-coated nanofibers, showing no activity decrease under repeated uses for 1 year and retaining good resistance to proteolysis, has demonstrated its great potential to be employed in the development of automated, high-throughput, and on-line digestion systems. This review discusses recent developments in nanobiocatalytic approaches for the improved performance of protein digestion in speed, detection sensitivity, recyclability, and trypsin stability. In addition, we also introduce approaches for protein digestion under unconventional energy input for protein denaturation and the development of microfluidic enzyme reactors that can benefit from recent successes of these nanobiocatalytic approaches.
Properties of Average Score Distributions of SEQUEST
Martínez-Bartolomé, Salvador; Navarro, Pedro; Martín-Maroto, Fernando ...
Molecular & Cellular Proteomics, June 2008, Volume 7, Issue 6
Journal Article, Peer reviewed, Open access
High throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpret large scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra by using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed by the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which takes optimally into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes spectra classification according to parameters such as charge state unnecessary and yields a peptide identification performance, on the basis of false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by the construction of single-spectrum SEQUEST score distributions.
These results make the robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm a very attractive alternative to determine peptide identification confidences and error rates in high throughput experiments.
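Because identification performance here is judged on the basis of false discovery rates, a minimal sketch of a generic target-decoy FDR estimate may help. It is not the probability ratio method, whose definition is not given in the abstract; the function name, the input layout, and the simple decoys-over-targets estimator are all assumptions made for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide spectrum matches (PSMs).

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Hypothetical helper using FDR ~ decoys / targets at each score cutoff.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the lowest FDR at which a PSM would still be accepted,
    # so take a running minimum from the worst score upward.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, q_values)]
```

Any score that separates correct from incorrect matches more cleanly will let more target PSMs pass a fixed q-value threshold in an estimate of this kind.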