A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large-scale proteome analysis has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry. This 'bottom-up' process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous characterization of alternative splice forms, diverse modifications (for example, acetylation and methylation) and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species. 'Top-down' interrogation of whole proteins can overcome these problems for individual proteins, but has not been achieved on a proteome scale owing to the lack of intact protein fractionation methods that are well integrated with tandem mass spectrometry. Here we show, using a new four-dimensional separation system, identification of 1,043 gene products from human cells that are dispersed into more than 3,000 protein species created by post-translational modification (PTM), RNA splicing and proteolysis. The overall system produced greater than 20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kDa and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply modified species in response to accelerated cellular ageing (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database, the data provide precise correlations to individual genes and proof-of-concept for large-scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research.
Top-down proteomics has improved over the past decade despite the significant challenges presented by the analysis of large protein ions. Here, the detection of these high mass species by electrospray-based mass spectrometry (MS) is examined from a theoretical perspective to understand the mass-dependent increases in the number of charge states, isotopic peaks, and interfering species present in typical protein mass spectra. Integrating these effects into a quantitative model captures the reduced ability to detect species over 25 kDa with the speed and sensitivity characteristic of proteomics based on <3 kDa peptide ions. The model quantifies the challenge that top-down proteomics faces with respect to current MS instrumentation and projects that depletion of 13C and 15N isotopes can improve detection at high mass by only <2-fold at 100 kDa whereas the effect is up to 5-fold at 10 kDa. Further, we find that supercharging electrosprayed proteins to the point of producing <5 charge states at high mass would improve detection by more than 20-fold.
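The mass dependence of the number of charge states can be illustrated with a minimal sketch. It is not the quantitative model described in the abstract: it only counts which protonation states could place a protein inside an assumed 500 to 2000 m/z window, and the window limits, the function names and the `max_charge` cap are all illustrative assumptions.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz_of(neutral_mass_da, charge):
    """m/z of a protein carrying `charge` protons (positive-mode electrospray)."""
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def charge_states_in_window(neutral_mass_da, mz_min=500.0, mz_max=2000.0,
                            max_charge=1000):
    """Charges whose m/z falls inside the assumed (hypothetical) window."""
    return [z for z in range(1, max_charge + 1)
            if mz_min <= mz_of(neutral_mass_da, z) <= mz_max]


# Compare a 10 kDa and a 100 kDa protein.
for mass in (10_000, 100_000):
    states = charge_states_in_window(mass)
    print(mass, "Da:", len(states), "possible charge states,",
          "z =", states[0], "to", states[-1])
```

Under these assumptions the count of admissible charge states grows roughly in proportion to mass. Real electrospray spectra populate only a subset of these states, so the numbers are an upper bound on signal splitting, not a prediction.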
Characterizing whole proteins by top-down proteomics avoids a step of inference encountered in the dominant bottom-up methodology when peptides are assembled computationally into proteins for identification. The direct interrogation of whole proteins and protein complexes from the venom of Ophiophagus hannah (king cobra) provides a sharply clarified view of toxin sequence variation, transit peptide cleavage sites and post-translational modifications (PTMs) likely critical for venom lethality. A tube-gel format for electrophoresis (called GELFrEE) and solution isoelectric focusing were used for protein fractionation prior to LC-MS/MS analysis, resulting in 131 protein identifications (18 more than bottom-up) and a total of 184 proteoforms characterized from 14 protein toxin families. Operating both GELFrEE and mass spectrometry to preserve non-covalent interactions generated detailed information about two of the largest venom glycoprotein complexes: the homodimeric l-amino acid oxidase (∼130 kDa) and the multichain toxin cobra venom factor (∼147 kDa). The l-amino acid oxidase complex exhibited two clusters of multiproteoform complexes corresponding to the presence of 5 or 6 N-glycan moieties, each consistent with a distribution of N-acetyl hexosamines. Employing top-down proteomics in both native and denaturing modes provides unprecedented characterization of venom proteoforms and their complexes. A precise molecular inventory of venom proteins will propel the study of snake toxin variation and the targeted development of new antivenoms or other biotherapeutics.
The direct analysis of intact proteins via MS offers compelling advantages in comparison to alternative methods due to the direct and unambiguous identification and characterization of protein sequences it provides. The inability to efficiently analyze proteins in the “middle mass range,” defined here as proteins from 30 to 80 kDa, in a robust fashion has limited the adoption of these “top‐down” methods. Largely a result of poor liquid chromatographic performance, the limitations in this mass range may be addressed by alternative separations that replace chromatography. Herein, the short migration times of CZE‐ESI‐MS/MS have been extended to size‐sorted whole proteins in complex mixtures from Pseudomonas aeruginosa PA01. An electrokinetically pumped nanospray interface, a coated capillary, and a stacking method for on‐column sample concentration were developed to achieve high loading capacity and separation resolution. We achieved full width at half maximum of 8–16 s for model proteins up to 29 kDa and identified 30 proteins in the mass range of 30–80 kDa from P. aeruginosa PA01 whole cell lysate. These results suggest that CZE‐ESI‐MS/MS is capable of identifying proteins in the middle mass range in top‐down proteomics.
An Orbitrap-based ion analysis procedure determines the direct charge for numerous individual protein ions to generate true mass spectra. This individual ion mass spectrometry (I2MS) method for charge detection enables the characterization of highly complicated mixtures of proteoforms and their complexes in both denatured and native modes of operation, revealing information not obtainable by typical measurements of ensembles of ions.
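The reason a directly measured charge yields a "true" mass spectrum is the arithmetic linking m/z, charge and neutral mass. The sketch below shows only that relation for protonated ions; it is not the processing pipeline of the method, and the ion list and the `neutral_mass` helper are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at `mz` carrying `charge` protons."""
    return charge * (mz - PROTON_MASS_DA)


# Hypothetical individual ions as (m/z, measured charge) pairs.
ions = [(1001.007276, 10), (501.007276, 20), (2901.007276, 10)]

# With charge known per ion, each ion maps straight onto the mass axis,
# so different charge states of one species collapse to a single mass.
masses = [round(neutral_mass(mz, z), 3) for mz, z in ions]
print(masses)
```

The first two ions differ in m/z but map to the same neutral mass once their charges are known, which is what an ensemble m/z measurement alone cannot establish without resolving charge-state or isotope patterns.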
It is well known that with Orbitrap-based Fourier-transform mass spectrometry (FT-MS) analysis, longer time-domain signals are needed to better resolve species of interest. Unfortunately, increasing the signal-acquisition period comes at the expense of increasing ion decay, which lowers signal-to-noise ratios and ultimately limits resolution. This is especially problematic for intact proteins, including antibodies, which demonstrate rapid decay because of their larger collisional cross-sections, which result in more frequent collisions with background gas molecules. Provided here is a method that utilizes numerous low-ion-count spectra and single-ion processing to reconstruct a conventional m/z spectrum. This technique has been applied to proteins varying in molecular weight from 8 to 150 kDa, with a resolving power of 677 000 achieved for transients of carbonic anhydrase (29 kDa) with a duration of only ∼250 ms. A resolution improvement ranging from 10- to 20-fold was observed for all proteins, providing isotopic resolution where none was previously present.
Targeted top-down (TD) and middle-down (MD) mass spectrometry (MS) offer reduced sample manipulation during protein analysis, limiting the risk of introducing artifactual modifications to better capture sequence information on the proteoforms present. This provides some advantages when characterizing biotherapeutic molecules such as monoclonal antibodies, particularly for the class of biosimilars. Here, we describe the results obtained analyzing a monoclonal IgG1, either in its ∼150 kDa intact form or after highly specific digestions yielding ∼25 and ∼50 kDa subunits, using an Orbitrap mass spectrometer on a liquid chromatography (LC) time scale with fragmentation from ion–photon, ion–ion, and ion–neutral interactions. Ultraviolet photodissociation (UVPD) used a new 213 nm solid-state laser. Alternatively, we applied high-capacity electron-transfer dissociation (ETD HD), alone or in combination with higher energy collisional dissociation (EThcD). Notably, we verify the degree of complementarity of these ion activation methods, with the combination of 213 nm UVPD and ETD HD producing a new record sequence coverage of ∼40% for TD MS experiments. The addition of EThcD for the >25 kDa products from MD strategies generated up to 90% of complete sequence information in six LC runs. Importantly, we determined an optimal signal-to-noise threshold for fragment ion deconvolution to suppress false positives yet maximize sequence coverage and implemented a systematic validation of this process using the new software TDValidator. This rigorous data analysis should elevate confidence for assignment of dense MS2 spectra and represents a purposeful step toward the application of TD and MD MS for deep sequencing of monoclonal antibodies.
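Sequence coverage figures of this kind are often computed as the fraction of inter-residue backbone bonds for which at least one matched fragment ion is observed. The sketch below assumes that definition and shows how complementary activation methods combine; it is not the TDValidator algorithm, and the function name, the 11-residue chain and the site lists are hypothetical.

```python
def sequence_coverage(sequence_length, cleaved_sites):
    """Fraction of backbone bonds with at least one observed cleavage.

    Sites are numbered 1 .. sequence_length - 1, where site i is the bond
    between residues i and i + 1. Duplicates and out-of-range values are
    ignored.
    """
    n_bonds = sequence_length - 1
    valid = {s for s in cleaved_sites if 1 <= s <= n_bonds}
    return len(valid) / n_bonds


# Hypothetical cleavage sites from two activation methods on an
# 11-residue chain (10 backbone bonds).
method_a_sites = [1, 2, 5]
method_b_sites = [5, 9]

combined = sequence_coverage(11, method_a_sites + method_b_sites)
print(f"combined coverage: {combined:.0%}")
```

Taking the union of sites is what makes complementary fragmentation methods additive: a bond cleaved by both methods counts once, while bonds unique to either method raise the total.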
Post-translational modifications (PTMs) of proteins play a central role in cellular information encoding, but the complexity of PTM state has been challenging to unravel. A single molecule can exhibit a “modform” or combinatorial pattern of co-occurring PTMs across multiple sites, and a molecular population can exhibit a distribution of amounts of different modforms. How can this “modform distribution” be estimated by mass spectrometry (MS)? Bottom-up MS, based on cleavage into peptides, destroys correlations between PTMs on different peptides, but it is conceivable that multiple proteases with appropriate patterns of cleavage could reconstruct the modform distribution. We introduce a mathematical language for describing MS measurements and show, on the contrary, that no matter how many distinct proteases are available, the shortfall in information required for reconstruction worsens exponentially with increasing numbers of sites. Whereas top-down MS on intact proteins can do better, current technology cannot prevent the exponential worsening. However, our analysis also shows that all forms of MS yield linear equations for modform amounts. This permits different MS protocols to be integrated and the modform distribution to be constrained within a high-dimensional “modform region”, which may offer a feasible proxy for analyzing information encoding.
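The scale of the shortfall can be illustrated with a toy count. The sketch assumes each site is either modified or unmodified and that a single protease splits the protein into peptides that each carry a fixed group of sites; it does not reproduce the mathematical language of the paper, and both function names are hypothetical.

```python
def modform_count(n_sites):
    """Number of possible modforms when each site is modified or not."""
    return 2 ** n_sites


def peptide_state_count(sites_per_peptide):
    """Distinct peptide-level modification states visible to bottom-up MS.

    `sites_per_peptide` lists how many sites fall on each peptide of one
    digest; correlations between peptides are lost, so states add up
    across peptides rather than multiplying.
    """
    return sum(2 ** k for k in sites_per_peptide)


# Toy comparison: n sites spread evenly over peptides of 2 sites each.
for n_sites in (6, 12, 24):
    groups = [2] * (n_sites // 2)
    print(n_sites, "sites:", modform_count(n_sites), "modforms vs",
          peptide_state_count(groups), "peptide states")
```

The unknowns grow as 2^n while the peptide-level observations in this toy digest grow linearly in n, which is the flavour of the exponential gap described in the abstract. The count of truly independent equations is smaller still, since the states on each peptide must sum to the same total amount.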
Histones, and their modifications, are critical components of cellular programming and epigenetic inheritance. Recently, cancer genome sequencing has uncovered driver mutations in chromatin modifying enzymes, spurring high interest in how such mutations change histone modification patterns. Here, we applied Top-Down mass spectrometry for the characterization of combinatorial modifications (i.e. methylation and acetylation) on full length histone H3 from human cell lines derived from multiple myeloma patients with overexpression of the histone methyltransferase MMSET as the result of a t(4;14) chromosomal translocation. Using the latest in Orbitrap-based technology for clean isolation of isobaric proteoforms containing up to 10 methylations and/or up to two acetylations, we provide extensive characterization of histone H3.1 and H3.3 proteoforms. Differential analysis of modifications by electron-based dissociation recapitulated antagonistic crosstalk between K27 and K36 methylation in H3.1, validating that full-length histone H3 (15 kDa) can be analyzed with site-specific assignments for multiple modifications. It also revealed that K36 methylation in H3.3 was affected less by the overexpression of MMSET because of its higher methylation levels in control cells. The co-occurrence of acetylation with a minimum of three methyl groups in H3K9 and H3K27 suggested a hierarchy in the addition of certain modifications. Comparative analysis showed that high levels of MMSET in the myeloma-like cells drove the formation of hypermethylated proteoforms containing H3K36me2 co-existent with the repressive marks H3K9me2/3 and H3K27me2/3. Unique histone proteoforms with such “trivalent hypermethylation” (K9me2/3-K27me2/3-K36me2) were not discovered when H3.1 peptides were analyzed by Bottom-Up. Such disease-correlated proteoforms could link tightly to aberrant transcription programs driving cellular proliferation, and their precise description demonstrates that Top-Down mass spectrometry can now decode crosstalk involving up to three modified sites.
Native mass spectrometry (MS) is becoming an important integral part of structural proteomics and systems biology research. The approach holds great promise for elucidating higher levels of protein structure: from primary to quaternary. This requires the most efficient use of tandem MS, which is the cornerstone of MS-based approaches. In this work, we advance a two-step fragmentation approach, or (pseudo)-MS3, from native protein complexes to a set of constituent fragment ions. Using an efficient desolvation approach and quadrupole selection in the extended mass-to-charge (m/z) range, we have accomplished sequential dissociation of large protein complexes, such as phosphorylase B (194 kDa), pyruvate kinase (232 kDa), and GroEL (801 kDa), to highly charged monomers which were then dissociated to a set of multiply charged fragmentation products. Fragment ion signals were acquired with a high resolution, high mass accuracy Orbitrap instrument that enabled highly confident identifications of the precursor monomer subunits. The developed approach is expected to enable characterization of stoichiometry and composition of endogenous native protein complexes at an unprecedented level of detail.