The PROTEOFORMER pipeline feeds ribosome profiling-driven information into an MS/MS search space. The pipeline has been greatly expanded and updated since its first publication. These novelties are presented and validated with matching MS/MS data, leading to the endorsement of a set of new proteoforms at the MS/MS level and to a collection of general considerations for the ribosome profiling-based proteogenomics community.
Highlights
• PROTEOFORMER adds ribosome profiling information to MS/MS search spaces.
• The PROTEOFORMER pipeline is greatly expanded and updated since its first publication.
• New features are demonstrated with matching ribosome profiling and MS/MS data.
• Experiments lead to MS/MS-proven proteoforms and general proteogenomic notices.
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates, and these can improve mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and, in the end, combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5′ and 3′ extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field.
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
Genomic imprinting plays an important role in growth and development. Loss of imprinting (LOI) has been found in cancer, yet systematic studies are impeded by data-analytical challenges. We developed a methodology to detect monoallelically expressed loci without requiring genotyping data, and applied it to The Cancer Genome Atlas (TCGA, discovery) and Genotype-Tissue Expression project (GTEx, validation) breast tissue RNA-seq data. Here, we report the identification of 30 putatively imprinted genes in breast. In breast cancer (TCGA), HM13 is characterized by LOI and expression upregulation, which is linked to DNA demethylation. Other imprinted genes typically demonstrate lower expression in cancer, often associated with copy number variation and aberrant DNA methylation. Downregulation in cancer frequently leads to higher relative expression of the (imperfectly) silenced allele, yet this is not considered canonical LOI given the lack of (absolute) re-expression. In summary, our novel methodology highlights the massive deregulation of imprinting in breast cancer.
Context:
MEN1 gene alterations have been implicated in lung carcinoids, but their effect on gene expression and disease outcome is unknown.
Objective:
Our objective was to analyze MEN1 gene and expression anomalies in lung neuroendocrine neoplasms and their correlations with clinicopathologic data and disease outcome.
Design:
We examined 74 lung neuroendocrine neoplasms including 58 carcinoids and 16 high-grade neuroendocrine carcinomas (HGNECs) for MEN1 mutations (n = 70) and allelic losses (n = 69), promoter hypermethylation (n = 65), and mRNA (n = 74) expression. Results were correlated with disease outcome.
Results:
MEN1 mutations were found in 7 of 55 (13%) carcinoids and in 1 HGNEC, mostly associated with loss of the second allele. Decreased MEN1 expression levels correlated with the presence of mutations (P = .0060), and expression was also lower in HGNECs than in carcinoids (P = .0024). MEN1 methylation was not associated with mRNA expression levels. Patients with carcinoids harboring MEN1 mutation and loss had shorter overall survival (P = .039 and P = .035, respectively), and low MEN1 mRNA levels correlated with distant metastasis (P = .00010) and shorter survival (P = .0071). In multivariate analysis, stage and MEN1 allelic loss were independent predictors of prognosis.
Conclusion:
Thirteen percent of pulmonary carcinoids harbor MEN1 mutation associated with reduced mRNA expression and poor prognosis. Also in mutation-negative tumors, low MEN1 gene expression correlates with an adverse disease outcome. Hypermethylation was excluded as the underlying mechanism.
The term peptidomics for a new promising “omics” field was not introduced until the beginning of 2000. The approach has proven successful in several domains such as neuroendocrine research and biomarker or drug discovery. This review reports on bioinformatics tools and methodologies within the peptidomics field and the application thereof. Obviously, a plethora of proteomics data analysis tools lend themselves to direct use in peptidomics because the latter is a subfield of the former, at least to a certain extent. Nevertheless, peptidomics-specific tool extensions, inventions, and validation procedures have emerged, and certain tools are more suitable for this subfield than others due to small but important differences in peptidomics sample analysis. This paper focuses on these topics. Furthermore, it gives a comprehensive overview of available online tools tailored to the peptidomics field. To conclude, an ideal pipeline for bioactive peptide identification is presented.
DNA methylation has a role in mediating epigenetic silencing of CpG island genes in cancer and other diseases. Identification of all gene promoters methylated in cancer cells, "the cancer methylome," would greatly advance our understanding of gene regulatory networks in tumorigenesis. We previously described a new method of identifying methylated tumor suppressor genes based on pharmacologic unmasking of the promoter region and detection of re-expression on microarray analysis. In this study, we modified and greatly improved the selection of candidates based on a new promoter structure algorithm and microarray data generated from 20 cancer cell lines of 5 major cancer types. We identified a set of 200 candidate genes that cluster throughout the genome, of which 25 were previously reported as harboring cancer-specific promoter methylation. The remaining 175 genes were tested for promoter methylation by bisulfite sequencing or methylation-specific PCR (MSP). Eighty-two of 175 (47%) genes were found to be methylated in cell lines, and 53 of these 82 genes (65%) were methylated in primary tumor tissues. From these 53 genes, cancer-specific methylation was identified in 28 genes (28 of 53; 53%). Furthermore, we tested 8 of the 28 newly identified cancer-specific methylated genes with quantitative MSP in a panel of 300 primary tumors representing 13 types of cancer. We found cancer-specific methylation of at least one gene with high frequency in all cancer types. Identification of a large number of genes with cancer-specific methylation provides new targets for diagnostic and therapeutic intervention, and opens fertile avenues for basic research in tumor biology.
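The percentages quoted for the validation funnel above follow directly from the reported counts. The short Python sketch below only re-derives them; the helper name `percent` and the variable names are illustrative and do not come from the study.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts quoted in the abstract above.
candidates_tested = 175            # candidate genes tested for promoter methylation
methylated_in_cell_lines = 82      # of the 175 tested
methylated_in_primary_tumors = 53  # of the 82 methylated in cell lines
cancer_specific = 28               # of the 53 methylated in primary tumors

# Each funnel step is expressed relative to the previous step.
funnel = [
    ("methylated in cell lines", methylated_in_cell_lines, candidates_tested),
    ("methylated in primary tumors", methylated_in_primary_tumors, methylated_in_cell_lines),
    ("cancer-specific methylation", cancer_specific, methylated_in_primary_tumors),
]

for label, part, whole in funnel:
    print(f"{label}: {part} of {whole} ({percent(part, whole)}%)")
```

Each step is reported relative to the preceding one, so the 28 cancer-specific genes correspond to 28 of the 175 tested candidates overall.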
454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform.
We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores.
Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
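To make the idea of calling homopolymer lengths from a hurdle Poisson distribution more concrete, the sketch below implements a generic, unweighted hurdle Poisson probability mass function in Python and turns the per-length probabilities into a Phred-style score. This is not the HPCall implementation: the actual method is weighted and models 454 noise predictors as covariates, whereas here the parameters `p_zero` and `lam`, the function names, the `max_len` cutoff and the Phred-type transform are all assumptions chosen for illustration.

```python
import math


def hurdle_poisson_pmf(k: int, p_zero: float, lam: float) -> float:
    """Generic hurdle Poisson pmf.

    P(Y = 0) = p_zero; for k >= 1 the mass (1 - p_zero) is spread
    over a zero-truncated Poisson distribution with rate lam.
    """
    if k < 0:
        return 0.0
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))


def call_homopolymer(p_zero: float, lam: float, max_len: int = 20):
    """Return (most probable length, Phred-style quality) for one flow.

    Probabilities are renormalised over 0..max_len (an assumed cutoff).
    The quality is -10 * log10(P(call is wrong)), which is an assumed
    transform, not necessarily the one used by HPCall.
    """
    probs = [hurdle_poisson_pmf(k, p_zero, lam) for k in range(max_len + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]
    best = max(range(len(probs)), key=probs.__getitem__)
    p_error = max(1.0 - probs[best], 1e-10)  # floor avoids log10(0)
    quality = -10.0 * math.log10(p_error)
    return best, quality


# Hypothetical parameter values, for illustration only.
length, quality = call_homopolymer(p_zero=0.05, lam=2.5)
print(f"called length: {length}, quality: {quality:.1f}")
```

Because the full probability vector over lengths is available, the probabilities of the neighbouring lengths (one shorter, one longer) can be inspected separately, which is what makes such scores informative about whether an insertion or a deletion error is more likely.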
The currently used pharmacogenetic genotyping assays offer limited haplotype information, which can potentially cause specific functional effects to be missed. This study tested if Targeted Locus Amplification (TLA), when using non-patient-specific primers combined with Illumina or Nanopore sequencing, can offer an advantage in terms of accurate phasing. The TLA method selectively amplifies and sequences entire genes based on crosslinking DNA in close physical proximity. This way, DNA fragments that were initially further apart in the genome are ligated into one molecule, making it possible to sequence distant variants within one short read. In this study, four pharmacogenes, CYP2D6, CYP2C19, CYP1A2 and BRCA1, were sequenced after enrichment using different primer pairs. Only 24% or 38% of the nucleotides mapped on target when using Illumina or Nanopore sequencing, respectively. With an average depth of more than 1000X for the regions of interest, none of the genes were entirely covered with either sequencing method. For three of the four genes, less than half of the variants were phased correctly compared to the reference. The Nanopore dataset with the optimized primer pair for CYP2D6 resulted in the correct haplotype, showing that this method can be used for reliable genotyping and phasing of pharmacogenes but does require patient-specific primer design and optimization to be effective.
Purpose:
Concern about possible false-negative prostate biopsy histopathology findings often leads to rebiopsy. A quantitative methylation specific polymerase chain reaction assay panel, including GSTP1, APC and RASSF1, could increase the sensitivity of detecting cancer over that of pathological review alone, leading to a high negative predictive value and a decrease in unnecessary repeat biopsies.
Materials and Methods:
The MATLOC study blindly tested archived prostate biopsy needle core tissue samples of 498 subjects from the United Kingdom and Belgium with histopathologically negative prostate biopsies, followed by positive (cases) or negative (controls) repeat biopsy within 30 months. Clinical performance of the epigenetic marker panel, emphasizing negative predictive value, was assessed and cross-validated. Multivariate logistic regression was used to evaluate all risk factors.
Results:
The epigenetic assay performed on the first negative biopsies of this retrospective review cohort resulted in a negative predictive value of 90% (95% CI 87–93). In a multivariate model correcting for patient age, prostate specific antigen, digital rectal examination and first biopsy histopathological characteristics the epigenetic assay was a significant independent predictor of patient outcome (OR 3.17, 95% CI 1.81–5.53).
Conclusions:
A multiplex quantitative methylation specific polymerase chain reaction assay determining the methylation status of GSTP1, APC and RASSF1 was strongly associated with repeat biopsy outcome up to 30 months after initial negative biopsy in men with suspicion of prostate cancer. Adding this epigenetic assay could improve the prostate cancer diagnostic process and decrease unnecessary repeat biopsies.
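Since the study emphasizes negative predictive value (NPV), it may help to recall how that quantity is computed: NPV is the fraction of assay-negative subjects who are truly disease-free, TN / (TN + FN). The Python sketch below shows the calculation; the counts used are hypothetical round numbers chosen to give 90% and are not the MATLOC study data, and the function name is illustrative.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN): the share of negative test results that are correct."""
    total_negative_calls = true_negatives + false_negatives
    if total_negative_calls == 0:
        raise ValueError("NPV is undefined when there are no negative test results")
    return true_negatives / total_negative_calls


# Hypothetical counts for illustration only (not taken from the study).
tn_example = 90  # assay negative, repeat biopsy negative
fn_example = 10  # assay negative, repeat biopsy positive

npv = negative_predictive_value(tn_example, fn_example)
print(f"NPV = {npv:.0%}")
```

Note that NPV depends on disease prevalence in the tested cohort, so a value observed in one cohort does not automatically transfer to a population with a different cancer rate.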
Despite the major physiological dissimilarities between mature root regions and their tips, differences in their gene expression profiles remain largely unexplored. In this research, the transcriptome of rice (Oryza sativa L. subsp. japonica) mature root tissue and root tips was monitored using mRNA-Seq at two time points. Almost 50 million 76 bp reads were mapped onto the rice genome sequence, expression patterns for different tissues and time points were investigated, and at least 1106 novel transcriptionally active regions (nTARs) expressed in rice root tissue were detected. More than 30 000 genes were found to be expressed in rice roots, among which were 1761 root-enriched and 306 tip-enriched transcripts. Mature root tissue appears to respond more strongly to external stimuli than tips, showing a higher expression of, for instance, auxin-responsive and abscisic acid-responsive genes, as well as the phenylpropanoid pathway and photosynthesis upon light. The root tip-enriched transcripts are mainly involved in mitochondrial electron transport, organelle development, secondary metabolism, DNA replication and metabolism, translation, and cellular component organization. During root maturation, genes involved in cell wall biosynthesis and modification, response to oxidative stress, and secondary metabolism were activated. For some nTARs, a potential role in root development can be put forward based on homology to genes involved in CLAVATA signalling, cell cycle regulators, and hormone signalling. A subset of differentially expressed genes and novel transcripts was confirmed using (quantitative) reverse transcription-PCR. These results uncover previously unrecognized tissue-specific expression profiles and provide an interesting starting point to study the different regulation of transcribed regions of these tissues.
Phage lytic proteins are a clinically advanced class of novel enzyme-based antibiotics, so-called enzybiotics. A growing community of researchers develops phage lytic proteins with the perspective of their use as enzybiotics. A successful translation of enzybiotics to the market requires well-considered selections of phage lytic proteins in early research stages. Here, we introduce PhaLP, a database of phage lytic proteins, which serves as an open portal to facilitate the development of phage lytic proteins. PhaLP is a comprehensive, easily accessible and automatically updated database (currently 16,095 entries). Capitalizing on the rich content of PhaLP, we have mapped the high diversity of natural phage lytic proteins and conducted analyses at three levels to gain insight into their host-specific evolution. First, we provide an overview of the modular diversity. Secondly, data mining and interpretable machine learning approaches were adopted to reveal host-specific design rules for domain architectures in endolysins. Lastly, the evolution of phage lytic proteins on the protein sequence level was explored, revealing host-specific clusters. In sum, PhaLP can act as a starting point for the broad community of enzybiotic researchers, while the steadily improving evolutionary insights will serve as a natural inspiration for protein engineers.