Independent component analysis was applied to human breast cancer proteogenomic data, and pathway-level signatures were further integrated with clinical information. Our results demonstrated that ICA can be used to extract biologically relevant signals from multi-omics data in an unsupervised manner.
Highlights
• Unsupervised feature extraction from proteogenomics data.
• Pathway-level integration of multi-omics data based on clinical features.
Recent advances in multi-omics characterization necessitate knowledge integration across different data types that goes beyond individual biomarker discovery. In this study, we apply independent component analysis (ICA) to human breast cancer proteogenomics data to retrieve mechanistic information. We show that, as an unsupervised feature extraction method, ICA was able to construct signatures with known biological relevance on both transcriptome and proteome levels. Moreover, proteome and transcriptome signatures can be associated by their respective correlation with patient clinical features, providing an integrated description of phenotype-related biological processes. Our results demonstrate that the application of ICA to proteogenomics data could lead to pathway-level knowledge discovery. Potential extension of this approach to other data and cancer types may contribute to pan-cancer integration of multi-omics information.
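To illustrate the kind of unsupervised decomposition the abstract refers to, here is a minimal NumPy sketch of a FastICA-style algorithm applied to a toy two-signal mixture. It is not the authors' pipeline: the function names (`whiten`, `sym_decorrelate`, `fast_ica`), the tanh nonlinearity, and the synthetic data are all assumptions chosen for illustration. In a proteogenomic setting the rows would presumably be expression profiles rather than synthetic signals.

```python
import numpy as np

def whiten(X):
    """Center each row of X (signals x samples) and whiten to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ Xc

def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Estimate independent components with a parallel FastICA update (tanh contrast)."""
    Z = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)               # nonlinearity g(w^T z)
        G_prime = 1.0 - G ** 2           # its derivative g'(w^T z)
        W_new = G @ Z.T / m - np.diag(G_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is (anti)parallel to the old one.
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Toy data (hypothetical): two non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(1)
sources = np.vstack([rng.uniform(-1, 1, 2000), rng.laplace(0, 1, 2000)])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
recovered = fast_ica(mixing @ sources)

# Absolute correlation between each true source and each recovered component.
corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
```

ICA recovers components only up to order and sign, which is why the comparison uses absolute correlations and takes the best match per source.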
PGx: Putting Peptides to BED Askenazi, Manor; Ruggles, Kelly V; Fenyö, David
Journal of proteome research, 03/2016, Volume 15, Issue 3
Journal Article
Peer reviewed
Open access
Every molecular player in the cast of biology’s central dogma is being sequenced and quantified with increasing ease and coverage. To bring the resulting genomic, transcriptomic, and proteomic data sets into coherence, tools must be developed that do not constrain data acquisition and analytics in any way but rather provide simple links across previously acquired data sets with minimal preprocessing and hassle. Here we present such a tool: PGx, which supports proteogenomic integration of mass spectrometry proteomics data with next-generation sequencing by mapping identified peptides onto their putative genomic coordinates.
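The idea of mapping a peptide onto genomic coordinates can be sketched as follows. This is not PGx code: the function names, the plus-strand-only simplification, and the assumption that the coding exons are given as 0-based half-open intervals (as in BED) are all hypothetical choices made for illustration.

```python
def peptide_to_genomic_blocks(cds_exons, aa_start, aa_length):
    """Map a peptide to genomic blocks on the plus strand.

    cds_exons: coding exon intervals (start, end), 0-based half-open, in
               transcript order (assumed input layout).
    aa_start:  0-based index of the peptide's first residue in the protein.
    aa_length: number of residues in the peptide.
    """
    nt_start = 3 * aa_start              # peptide start within the CDS
    nt_end = nt_start + 3 * aa_length    # peptide end within the CDS (exclusive)
    blocks = []
    offset = 0                           # CDS position where the current exon begins
    for g_start, g_end in cds_exons:
        exon_len = g_end - g_start
        lo = max(nt_start, offset)
        hi = min(nt_end, offset + exon_len)
        if lo < hi:                      # peptide overlaps this exon
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += exon_len
    return blocks

def blocks_to_bed_lines(chrom, blocks, name):
    """Emit one simple tab-separated BED line (chrom, start, end, name) per block."""
    return [f"{chrom}\t{s}\t{e}\t{name}" for s, e in blocks]

# Hypothetical example: a peptide spanning an exon junction.
example_blocks = peptide_to_genomic_blocks([(100, 106), (200, 212)], aa_start=1, aa_length=3)
example_bed = blocks_to_bed_lines("chr1", example_blocks, "PEP")
```

A peptide that crosses a splice junction yields more than one block, which is the case a single start/end pair cannot represent.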
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them (STK11, EGFR, FAT1, SETBP1, KRAS and TP53) can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
This paper investigates the use of survival functions and expectation values to evaluate the results of protein identification experiments. These functions are standard statistical measures that can be used to reduce various protein identification scoring schemes to a common, easily interpretable representation. The relative merits of scoring systems were explored using this approach, as well as the effects of altering primary identification parameters. We would advocate the widespread use of these simple statistical measures to simplify and standardize the reporting of the confidence of protein identification results, allowing the users of different identification algorithms to compare their results in a straightforward and statistically significant manner. A method is described for measuring these distributions using information that is being discarded by most protein identification search engines, resulting in accurate survival functions that are specific to any combination of scoring algorithms, sequence databases, and mass spectra.
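As a rough illustration of the two measures named above, the sketch below computes an empirical survival function and the corresponding expectation value from a list of candidate scores. The abstract does not spell out the estimation procedure, so the purely empirical estimate, the "at least x" convention, and the function names are assumptions for illustration.

```python
import numpy as np

def survival(scores, x):
    """Empirical survival function: fraction of scores that are at least x."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores >= x))

def expectation_value(scores, x):
    """Expected number of candidates scoring at least x: n * s(x)."""
    return len(scores) * survival(scores, x)

# Hypothetical scores of candidate sequences matched against one spectrum.
candidate_scores = [1.0, 2.0, 3.0, 4.0]
s_at_3 = survival(candidate_scores, 3.0)
e_at_3 = expectation_value(candidate_scores, 3.0)
```

Because both quantities depend only on the score distribution, they can be compared across scoring schemes whose raw scores are on different scales.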
Cell function requires formation of molecular clusters localized to discrete subdomains. The composition of these interactomes, and their spatial organization, cannot be discerned by conventional microscopy given the resolution constraints imposed by the diffraction limit of light (∼200-300 nm). Our aims were (i) to implement single-molecule imaging and analysis tools to resolve the nano-scale architecture of cardiac myocytes, and (ii) using these tools, to map two molecules classically defined as components 'of the desmosome' and 'of the gap junction', and to define their spatial organization.
We built a set-up on a conventional inverted microscope using commercially available optics. Laser illumination, reducing, and oxygen scavenging conditions were used to manipulate the blinking behaviour of individual fluorescent reporters. Movies of blinking fluorophores were reconstructed to generate subdiffraction images at ∼20 nm resolution. With this method, we characterized clusters of connexin43 (Cx43) and of 'the desmosomal protein' plakophilin-2 (PKP2). In about half of Cx43 clusters, we observed overlay of Cx43 and PKP2 at the Cx43 plaque edge. siRNA-mediated loss of Ankyrin-G expression yielded larger Cx43 clusters, of less regular shape, and larger Cx43-PKP2 subdomains. The Cx43-PKP2 subdomain was validated by a proximity ligation assay (PLA) and by Monte-Carlo simulations indicating an attraction between PKP2 and Cx43.
(i) Super-resolution fluorescence microscopy, complemented with Monte-Carlo simulations and PLAs, allows the study of the nanoscale organization of an interactome in cardiomyocytes. (ii) PKP2 and Cx43 share a common hub that permits direct physical interaction. Its relevance to excitability, electrical coupling, and arrhythmogenic right ventricular cardiomyopathy is discussed.
Truly comprehensive proteome analysis is highly desirable in systems biology and biomarker discovery efforts. But complete proteome characterization has been hindered by the dynamic range and detection sensitivity of experimental designs, which are not adequate to the very wide range of protein abundances. Experimental designs for comprehensive analytical efforts involve separation followed by mass spectrometry-based identification of digested proteins. Because results are generally reported as a collection of identifications with no information on the fraction of the proteome that was missed, they are difficult to evaluate and potentially misleading. Here we address this problem by taking a holistic view of the experimental design and using computer simulations to estimate the success rate for any given experiment. Our approach demonstrates that simple changes in typical experimental designs can enhance the success rate of proteome analysis by five- to tenfold.
18.
Selenocysteine: Wherefore Art Thou? Fenyö, David; Beavis, Ronald C
Journal of proteome research, 02/2016, Volume 15, Issue 2
Journal Article
Peer reviewed
Selenocysteine is a naturally occurring proteogenic amino acid that is encoded in the genomic sequence of relatively abundant proteins in many of the model species commonly used for biomedical research. On the basis of an analysis of publicly available proteomics information, it was discovered that peptides containing selenocysteine were not being identified in tandem mass spectrometry proteomics data. Once the chemical basis for this exclusion was understood, a simple alteration in search parameters led to the confident identification of selenocysteine-containing peptides from existing proteomics data, with no change in experimental protocols required.
The emergence of SARS-CoV-2 variants threatens current vaccines and therapeutic antibodies and urgently demands powerful new therapeutics that can resist viral escape. We therefore generated a large nanobody repertoire to saturate the distinct and highly conserved available epitope space of SARS-CoV-2 spike, including the S1 receptor binding domain, N-terminal domain, and the S2 subunit, to identify new nanobody binding sites that may reflect novel mechanisms of viral neutralization. Structural mapping and functional assays show that indeed these highly stable monovalent nanobodies potently inhibit SARS-CoV-2 infection, display numerous neutralization mechanisms, are effective against emerging variants of concern, and are resistant to mutational escape. Rational combinations of these nanobodies that bind to distinct sites within and between spike subunits exhibit extraordinary synergy and suggest multiple tailored therapeutic and prophylactic strategies.
Data sharing in the field of MS has advanced greatly thanks to innovations such as standardized formats, data repositories, and publication guidelines. However, there is currently no data sharing mechanism that enables real‐time data browsing and deep linking on a large scale: unrestricted data access (particularly at the quantitative level) ultimately requires the user to download a local copy of the relevant data files (e.g., in order to generate extracted ion chromatograms, XICs). In this technical resource, we present a set of technologies (collectively termed OpenSlice) that enable the user to quantitatively query hundreds of hours of proteomics discovery data (i.e., nontargeted acquisition) in real time: the user is able to effectively generate XICs for arbitrary masses on the fly and across the entire dataset (so‐called global ion chromatograms), interacting with the results through a very intuitive browser‐based interface. A key design consideration underlying the OpenSlice approach is the notion that every aspect of the acquired data must be accessible through a RESTful uniform resource locator based application programming interface, up to and including individual chromatographic peaks (hence HyperPeaks). A publicly accessible demonstration of this technology based on the Clinical Proteomics Tumor Analysis Consortium CompRef dataset is made available at http://compref.fenyolab.org.
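For readers unfamiliar with XICs, the sketch below shows what generating one for an arbitrary mass amounts to: for each scan, sum the intensities of peaks whose m/z falls within a tolerance of the target. It does not use or reproduce the OpenSlice API. The in-memory scan layout, the function name `extract_xic`, and the 10 ppm default tolerance are assumptions made for illustration.

```python
def extract_xic(scans, target_mz, tol_ppm=10.0):
    """Build an extracted ion chromatogram for one target m/z.

    scans: list of (retention_time, peaks), where peaks is a list of
           (mz, intensity) pairs (assumed in-memory layout).
    Returns a list of (retention_time, summed_intensity) pairs.
    """
    tol = target_mz * tol_ppm * 1e-6     # absolute m/z tolerance
    xic = []
    for rt, peaks in scans:
        total = sum(
            (intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol),
            0.0,
        )
        xic.append((rt, total))
    return xic

# Hypothetical data: two scans, target m/z 500.0 with a 10 ppm window (0.005 m/z).
toy_scans = [
    (0.0, [(500.000, 10.0), (500.004, 5.0), (600.0, 7.0)]),
    (1.0, [(500.020, 3.0)]),
]
toy_xic = extract_xic(toy_scans, 500.0)
```

Serving such queries in real time over hundreds of hours of data would require indexing that this linear scan does not attempt.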