De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
Abstract
The Human Proteoform Atlas (HPfA) is a web-based repository of experimentally verified human proteoforms, available online at http://human-proteoform-atlas.org, and is a direct descendant of the Consortium for Top-Down Proteomics’ (CTDP) Proteoform Atlas. Proteoforms are the specific forms of protein molecules expressed by our cells and include the unique combination of post-translational modifications (PTMs), alternative splicing and other sources of variation deriving from a specific gene. The HPfA uses a FAIR system to assign persistent identifiers to proteoforms, which allows for redundancy calling and tracking from prior and future studies in the growing community of proteoform biology and measurement. The HPfA is organized around open ontologies and enables flexible classification of proteoforms. To achieve this, a public registry of experimentally verified proteoforms was also created. Submission of new proteoforms can be processed through email via nrtdphelp@northwestern.edu, and future iterations of these proteoform atlases will help to organize and assign function to proteoforms, their PTMs and their complexes in the years ahead.
A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large-scale proteome analysis has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry. This 'bottom-up' process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous characterization of alternative splice forms, diverse modifications (for example, acetylation and methylation) and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species. 'Top-down' interrogation of whole proteins can overcome these problems for individual proteins, but has not been achieved on a proteome scale owing to the lack of intact protein fractionation methods that are well integrated with tandem mass spectrometry. Here we show, using a new four-dimensional separation system, identification of 1,043 gene products from human cells that are dispersed into more than 3,000 protein species created by post-translational modification (PTM), RNA splicing and proteolysis. The overall system produced greater than 20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kDa and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply modified species in response to accelerated cellular ageing (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database, the data provide precise correlations to individual genes and proof-of-concept for large-scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research.
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post‐translational modifications. In top‐down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top‐down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
Successful high-throughput characterization of intact proteins from complex biological samples by mass spectrometry requires instrumentation capable of high mass resolving power, mass accuracy, sensitivity, and spectral acquisition rate. These demands often necessitate the performance of hundreds of LC–MS/MS experiments to obtain reasonable coverage of the targeted proteome, which is still typically limited to molecular weights below 30 kDa. The National High Magnetic Field Laboratory (NHMFL) recently installed a 21 T FT-ICR mass spectrometer, which is part of the NHMFL FT-ICR User Facility and available to all qualified users. Here we demonstrate top-down LC-21 T FT-ICR MS/MS of intact proteins derived from human colorectal cancer cell lysate. We identified a combined total of 684 unique protein entries observed as 3238 unique proteoforms at a 1% false discovery rate, based on rapid, data-dependent acquisition of collision-induced and electron-transfer dissociation tandem mass spectra from just 40 LC–MS/MS experiments. Our identifications included 372 proteoforms with molecular weights over 30 kDa detected at isotopic resolution, which substantially extends the accessible mass range for high-throughput top-down LC–MS/MS.
Premature ovarian insufficiency (POI) affects approximately 1% of women. We aim to understand the ovarian microenvironment, including the extracellular matrix (ECM) and associated proteins (matrisome), and its role in controlling folliculogenesis. We mapped the composition of the matrisome of porcine ovaries through the cortical compartment, where quiescent follicles reside, and the medullary compartment, where the larger follicles grow and mature. To do this, we sliced the ovaries uniformly in two anatomical planes, enriched for matrisome proteins and performed bottom-up shotgun proteomic analyses. We identified 42 matrisome proteins that were significantly differentially expressed across depths, and 11 matrisome proteins that have not been identified in previous ovarian protein analyses. We validated these data for nine proteins and confirmed compartmental differences with a second processing method. Here we describe a processing and proteomic analysis pipeline that revealed spatial differences and matrisome protein candidates that may influence folliculogenesis.
Amyloid-beta (Aβ) plays a key role in the pathogenesis of Alzheimer's disease (AD), but little is known about the proteoforms present in AD brain. We used high-resolution mass spectrometry to analyze intact Aβ from soluble aggregates and insoluble material in brains of six cases with severe dementia and pathologically confirmed AD. The soluble aggregates are especially relevant because they are believed to be the most toxic form of Aβ. We found a diversity of Aβ peptides, with 26 unique proteoforms including various N- and C-terminal truncations. N- and C-terminal truncations comprised 73% and 30%, respectively, of the total Aβ proteoforms detected. The Aβ proteoforms segregated between the soluble and more insoluble aggregates, with N-terminal truncations predominating in the insoluble material and C-terminal truncations segregating into the soluble aggregates. In contrast, canonical Aβ comprised the minority of the identified proteoforms (15.3%) and did not distinguish between the soluble and more insoluble aggregates. The relative abundance of many truncated Aβ proteoforms did not correlate with post-mortem interval, suggesting they are not artefacts. This heterogeneity of Aβ proteoforms deepens our understanding of AD and offers many new avenues for investigation into pathological mechanisms of the disease, with implications for therapeutic development.
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at http://psidev.info/proforma.
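As a rough illustration of what a human- and machine-readable string of this kind can look like, the sketch below reads a deliberately small subset of the notation: single-letter residues, each optionally followed by one bracketed tag holding either a modification name or a mass shift. The function name `parse_simple_proforma` and the example peptides are illustrative assumptions, not part of the specification or of any published implementation; terminal modifications, cross-links, ambiguous localization and isotope labels are intentionally rejected rather than handled.

```python
import re

# One residue letter, optionally followed by a single [tag].
# Assumption: this covers only a minimal subset of ProForma 2.0.
_TOKEN = re.compile(r"([A-Z])(?:\[([^\[\]]+)\])?")


def parse_simple_proforma(text):
    """Return (bare_sequence, {1-based residue position: tag string})."""
    residues = []
    mods = {}
    cursor = 0
    while cursor < len(text):
        match = _TOKEN.match(text, cursor)
        if match is None:
            # Anything outside the supported subset is refused, not guessed.
            raise ValueError(
                f"unsupported syntax at offset {cursor}: {text[cursor:]!r}"
            )
        residues.append(match.group(1))
        if match.group(2) is not None:
            # The tag belongs to the residue that was just appended.
            mods[len(residues)] = match.group(2)
        cursor = match.end()
    return "".join(residues), mods


if __name__ == "__main__":
    # Modification given by name.
    print(parse_simple_proforma("EM[Oxidation]EVEES[Phospho]PEK"))
    # Same sites given as mass shifts.
    print(parse_simple_proforma("EM[+15.9949]EVEES[+79.9663]PEK"))
```

The parser keeps tags as uninterpreted strings, so resolving a name against a controlled vocabulary or converting a mass shift to a number would be a separate step. For real work, the full specification and the listed software implementations remain the reference.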
Human biology is tightly linked to proteins, yet most measurements do not precisely determine alternatively spliced sequences or posttranslational modifications. Here, we present the primary structures of ~30,000 unique proteoforms, nearly 10 times more than in previous studies, expressed from 1690 human genes across 21 cell types and plasma from human blood and bone marrow. The results, compiled in the Blood Proteoform Atlas (BPA), indicate that proteoforms better describe protein-level biology and are more specific indicators of differentiation than their corresponding proteins, which are more broadly expressed across cell types. We demonstrate the potential for clinical application by interrogating the BPA in the context of liver transplantation and identifying cell and proteoform signatures that distinguish normal graft function from acute rejection and other causes of graft dysfunction.