Genome-wide analysis of cancer/testis gene expression
Hofmann, Oliver; Caballero, Otavia L; Stevenson, Brian J ...
Proceedings of the National Academy of Sciences - PNAS, 12/2008, Volume 105, Issue 51
Journal Article
Peer reviewed
Open access
Cancer/Testis (CT) genes, normally expressed in germ line cells but also activated in a wide range of cancer types, often encode antigens that are immunogenic in cancer patients, and present potential for use as biomarkers and targets for immunotherapy. Using multiple in silico gene expression analysis technologies, including twice the number of expressed sequence tags used in previous studies, we have performed a comprehensive genome-wide survey of expression for a set of 153 previously described CT genes in normal and cancer expression libraries. We find that although they are generally highly expressed in testis, these genes exhibit heterogeneous gene expression profiles, allowing their classification into testis-restricted (39), testis/brain-restricted (14), and testis-selective (85) groups, the last showing additional expression in somatic tissues. The chromosomal distribution of these genes confirmed the previously observed predominance of X chromosome location, with CT-X genes being significantly more testis-restricted than non-X CT genes. Applying this core classification in a genome-wide survey, we identified >30 CT candidate genes; 3 of them, PEPP-2, OTOA, and AKAP4, were confirmed as testis-restricted or testis-selective using RT-PCR, with variable expression frequencies observed in a panel of cancer cell lines. Our classification provides an objective ranking for potential CT genes, which is useful in guiding further identification and characterization of these potentially important diagnostic and therapeutic targets.
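The three-way grouping above can be illustrated with a small rule-based classifier. This is a minimal sketch, not the authors' procedure: it assumes each gene comes with a normalized expression value per normal tissue (for example, EST counts per million), and both the detection cutoff `DETECTION_THRESHOLD` and the rule that testis must remain the top tissue for the testis-selective group are hypothetical choices.

```python
from typing import Dict

# Hypothetical cutoff: values at or below it are treated as "not detected".
DETECTION_THRESHOLD = 0.0


def classify_ct_gene(expression: Dict[str, float],
                     threshold: float = DETECTION_THRESHOLD) -> str:
    """Assign a gene to one of the three CT groups from its normal-tissue profile.

    `expression` maps normal tissue names (e.g. "testis", "brain", "liver")
    to an expression value. The rules are illustrative assumptions.
    """
    detected = {t for t, v in expression.items() if v > threshold}
    if "testis" not in detected:
        return "not CT-like"
    somatic = detected - {"testis"}
    if not somatic:
        return "testis-restricted"          # detected only in testis
    if somatic == {"brain"}:
        return "testis/brain-restricted"    # testis plus brain only
    # Expression elsewhere is tolerated only if testis remains the top tissue.
    if expression["testis"] >= max(expression[t] for t in somatic):
        return "testis-selective"
    return "not CT-like"


if __name__ == "__main__":
    # Hypothetical profiles for illustration only.
    profiles = {
        "GENE_A": {"testis": 120.0, "brain": 0.0, "liver": 0.0},
        "GENE_B": {"testis": 80.0, "brain": 5.0, "liver": 0.0},
        "GENE_C": {"testis": 60.0, "brain": 2.0, "liver": 4.0},
    }
    for name, profile in profiles.items():
        print(name, classify_ct_gene(profile))
```

Run as a script, this prints testis-restricted, testis/brain-restricted and testis-selective for the three hypothetical genes. A real analysis would also need to decide how to treat low-level somatic expression, which is where the cutoff choice matters most.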
WalKR (YycFG) is the only essential two-component regulator in the human pathogen Staphylococcus aureus. WalKR regulates peptidoglycan synthesis, but this function alone does not explain its essentiality. Here, to further understand WalKR function, we investigate a suppressor mutant that arose when WalKR activity was impaired: a histidine to tyrosine substitution (H271Y) in the cytoplasmic Per-Arnt-Sim (PAS) domain of the histidine kinase WalK. Introducing the WalK H271Y mutation into wild-type S. aureus activates the WalKR regulon. Structural analyses of the WalK PAS domain reveal a metal-binding site, in which a zinc ion (Zn²⁺) is tetrahedrally coordinated by four amino acids including H271. The WalK H271Y mutation abrogates metal binding, increasing WalK kinase activity and WalR phosphorylation. Thus, Zn²⁺ binding negatively regulates WalKR. Promoter-reporter experiments using S. aureus confirm Zn²⁺ sensing by this system. Identification of a metal ligand recognized by the WalKR system broadens our understanding of this critical S. aureus regulon.
Long intergenic non-coding RNAs (lncRNAs) represent an emerging and under-studied class of transcripts that play a significant role in human cancers. Because many lncRNAs show tissue- and cancer-specific expression patterns, they are thought to be ideal candidates for diagnostic biomarkers. However, many of these lncRNAs will remain elusive until each tumor type is examined more closely.
Here we characterize the lncRNA landscape in lung cancer using publicly available transcriptome sequencing data from a cohort of 567 adenocarcinoma and squamous cell carcinoma tumors. Through this compendium, we identify over 3,000 unannotated intergenic transcripts representing novel lncRNAs. By comparing both adenocarcinomas and squamous cell carcinomas with matched controls, we discover 111 differentially expressed lncRNAs, which we term lung cancer-associated lncRNAs (LCALs). A pan-cancer analysis of 324 additional tumor and adjacent normal pairs enables us to identify a subset of lncRNAs that display enriched expression specific to lung cancer, as well as a subset that appear to be broadly deregulated across human cancers. Integration of exome sequencing data reveals that expression levels of many LCALs are significantly associated with the mutational status of key oncogenes in lung cancer. Functional validation, using both knockdown and overexpression, shows that the most differentially expressed lncRNA, LCAL1, plays a role in cellular proliferation.
Our systematic characterization of publicly available transcriptome data provides the foundation for future efforts to understand the role of LCALs, develop novel biomarkers, and improve knowledge of lung tumor biology.
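The comparison of tumors with matched controls can be sketched as a per-pair log2 fold change summarized across pairs. The abstract does not state which statistical test was used, so this is only an illustration under assumptions: expression values are taken as already normalized, and the pseudocount, the median-fold-change cutoff and the direction-concordance cutoff are all hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

PSEUDOCOUNT = 1.0  # hypothetical; avoids log(0) when a transcript is absent


def paired_log2_fold_changes(pairs: Sequence[Tuple[float, float]]) -> List[float]:
    """Return log2((tumor + c) / (normal + c)) for each (tumor, normal) pair."""
    return [math.log2((t + PSEUDOCOUNT) / (n + PSEUDOCOUNT)) for t, n in pairs]


def call_differential(pairs: Sequence[Tuple[float, float]],
                      min_abs_median_lfc: float = 1.0,
                      min_concordance: float = 0.75) -> str:
    """Label a transcript 'up', 'down' or 'unchanged' across matched pairs.

    A call needs a large median fold change AND most pairs moving in the
    same direction as that median (both thresholds are assumptions).
    """
    lfcs = paired_log2_fold_changes(pairs)
    med = median(lfcs)
    if abs(med) < min_abs_median_lfc:
        return "unchanged"
    concordant = sum(1 for x in lfcs if x * med > 0) / len(lfcs)
    if concordant < min_concordance:
        return "unchanged"
    return "up" if med > 0 else "down"


if __name__ == "__main__":
    # Hypothetical (tumor, normal) expression pairs for one lncRNA.
    print(call_differential([(15, 3), (31, 3), (7, 1), (15, 1)]))
```

For these hypothetical pairs every log2 fold change is 2 or 3, so the script prints `up`. Pairing each tumor with its own control removes between-patient baseline differences, which is the main reason matched designs are preferred over pooled comparisons.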
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
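One way to picture "detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines" is to derive a deterministic build identifier from everything that defines a run, so an identical request is recognized instead of silently recomputed. The sketch below is hypothetical and does not reflect the GMS code base: `AnalysisRequest`, `build_id` and `BuildRegistry` are illustrative names, and the pipeline name and parameters are made up; only the sample names come from the text above.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnalysisRequest:
    """Hypothetical record of one pipeline run: what goes in and how."""
    pipeline: str
    sample_ids: List[str]
    parameters: Dict[str, str]


def build_id(request: AnalysisRequest) -> str:
    """Derive a stable ID so identical requests always map to the same build.

    Samples are sorted and keys serialized in sorted order, so the ID does not
    depend on the order in which they were listed.
    """
    payload = json.dumps(
        {"pipeline": request.pipeline,
         "samples": sorted(request.sample_ids),
         "params": request.parameters},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class BuildRegistry:
    """Remembers submitted builds so a repeated request reuses the earlier one."""

    def __init__(self) -> None:
        self._builds: Dict[str, AnalysisRequest] = {}

    def submit(self, request: AnalysisRequest) -> Tuple[str, bool]:
        """Return (build ID, True if this is a new build)."""
        bid = build_id(request)
        if bid in self._builds:
            return bid, False
        self._builds[bid] = request
        return bid, True


if __name__ == "__main__":
    registry = BuildRegistry()
    params = {"aligner": "example-aligner"}  # hypothetical parameter
    first = AnalysisRequest("somatic-variation", ["HCC1395", "HCC1395BL"], params)
    again = AnalysisRequest("somatic-variation", ["HCC1395BL", "HCC1395"], params)
    print(registry.submit(first)[1], registry.submit(again)[1])
```

The script prints `True False`: the second request lists the same samples in a different order, so it maps to the existing build. Hashing the full specification keeps provenance explicit, since any change to inputs or parameters yields a new, separately tracked build.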
miR-24, upregulated during terminal differentiation of multiple lineages, inhibits cell-cycle progression. Antagonizing miR-24 restores postmitotic cell proliferation and enhances fibroblast proliferation, whereas overexpressing miR-24 increases the G1 compartment. The 248 mRNAs downregulated upon miR-24 overexpression are highly enriched for DNA repair and cell-cycle regulatory genes that form a direct interaction network with prominent nodes at genes that enhance (MYC, E2F2, CCNB1, and CDC2) or inhibit (p27Kip1 and VHL) cell-cycle progression. miR-24 directly regulates MYC and E2F2 and some genes that they transactivate. Enhanced proliferation from antagonizing miR-24 is abrogated by knocking down E2F2, but not MYC; conversely, the inhibition of proliferation caused by miR-24 overexpression is rescued by miR-24-insensitive E2F2. Therefore, E2F2 is a critical miR-24 target. The E2F2 3′UTR lacks a predicted miR-24 recognition element. Instead, miR-24 regulates expression of E2F2, MYC, AURKB, CCNA2, CDC2, CDK4, and FEN1 by recognizing seedless but highly complementary sequences.
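The contrast between seed-based and "seedless but highly complementary" recognition can be made concrete by scoring an ungapped miRNA:mRNA duplex. In this sketch the seed is taken as miRNA nucleotides 2-7 (the canonical seed), G:U wobble pairs are optionally counted as paired, and the sequences are short hypothetical examples rather than miR-24 or any real target site. Real site prediction also handles bulges and gaps, which this sketch ignores.

```python
from typing import Dict, List, Union

# RNA base pairs; G:U wobbles are tolerated only when allow_wobble is True.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing(mirna: str, site: str, allow_wobble: bool = True) -> List[bool]:
    """Return, for each miRNA nucleotide (5'->3'), whether it pairs with the site.

    `site` is written 5'->3' as in the mRNA. The strands are antiparallel, so
    miRNA position i faces site position len(site) - 1 - i. Ungapped only.
    """
    if len(mirna) != len(site):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return [(m, s) in allowed for m, s in zip(mirna.upper(), site.upper()[::-1])]


def describe_site(mirna: str, site: str,
                  allow_wobble: bool = True) -> Dict[str, Union[bool, float]]:
    """Report whether the seed (nt 2-7) pairs fully and the overall paired fraction."""
    paired = pairing(mirna, site, allow_wobble)
    return {
        "seed_match": all(paired[1:7]),          # 0-based slice = nt 2-7
        "paired_fraction": sum(paired) / len(paired),
    }


if __name__ == "__main__":
    mirna = "AAAAGGGGCC"        # hypothetical 10-nt "miRNA"
    canonical = "GGCCCCUUUU"    # perfect complement, seed included
    seedless = "GGCCCCUCUU"     # one mismatch at miRNA nt 3 (inside the seed)
    print(describe_site(mirna, canonical))
    print(describe_site(mirna, seedless))
```

The first site reports a seed match with a paired fraction of 1.0; the second fails the seed check yet still pairs 90% of positions, which is the kind of site a seed-only predictor would miss.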
Beginning with precursor lesions, aberrant DNA methylation marks the entire spectrum of prostate cancer progression. We mapped the global DNA methylation patterns in select prostate tissues and cell lines using MethylPlex-next-generation sequencing (M-NGS). Hidden Markov model-based next-generation sequence analysis identified ∼68,000 methylated regions per sample. While global CpG island (CGI) methylation did not differ between benign adjacent and cancer samples, overall promoter CGI methylation significantly increased from ∼12.6% in benign samples to 19.3% and 21.8% in localized and metastatic cancer tissues, respectively (P < 2 × 10⁻¹⁶). We found distinct patterns of promoter methylation around transcription start sites, where methylation occurred not only on the CGIs, but also on flanking regions and CGI-sparse promoters. Among the 6691 methylated promoters in prostate tissues, 2481 differentially methylated regions (DMRs) are cancer-specific, including numerous novel DMRs. A novel cancer-specific DMR in the WFDC2 promoter showed frequent methylation in cancer (17/22 tissues, 6/6 cell lines), but not in benign tissues (0/10) or normal PrEC cells. Integration of LNCaP DNA methylation and H3K4me3 data suggested an epigenetic mechanism for alternate transcription start site utilization, and these modifications segregated into distinct regions when present on the same promoter. Finally, we observed differences in repeat element methylation, particularly LINE-1, between ERG gene fusion-positive and -negative cancers, and we confirmed this observation using pyrosequencing on a tissue panel. This comprehensive methylome map will further our understanding of epigenetic regulation in prostate cancer progression.
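A summary statistic such as "promoter CGI methylation increased from ∼12.6% to 19.3%" can be computed as the share of promoter CGIs that overlap at least one called methylated region. The sketch below assumes that definition (the abstract does not spell it out), uses half-open genomic intervals, and runs on made-up coordinates.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: Iterable[Interval]) -> Index:
    """Sort and merge methylated regions per chromosome for fast overlap queries."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[Tuple[int, int]] = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(query: Interval, index: Index) -> bool:
    """True if the query shares at least one base with any indexed region."""
    chrom, start, end = query
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(ends, start)   # first merged region ending after `start`
    return i < len(starts) and starts[i] < end


def percent_methylated(promoters: List[Interval],
                       regions: Iterable[Interval]) -> float:
    """Percentage of promoter CGIs overlapping a methylated region."""
    index = build_index(regions)
    hits = sum(overlaps(p, index) for p in promoters)
    return 100.0 * hits / len(promoters)


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    methylated = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    promoter_cgis = [("chr1", 150, 160), ("chr1", 200, 300),
                     ("chr1", 550, 700), ("chr2", 60, 80)]
    print(percent_methylated(promoter_cgis, methylated))
```

Two of the four hypothetical promoters overlap a region, so the script prints `50.0`; the promoter starting exactly at position 200 does not count because intervals are half-open. Merging regions first keeps the region ends sorted, which is what makes the single binary search per query valid.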
Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data it generates.
Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage.
Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
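To illustrate what HMM-based peak finding involves, the sketch below segments binned read counts into background and enriched states with a two-state HMM and Viterbi decoding. It is a generic textbook model, not HPeak itself: HPeak's actual emission distributions and its read-coverage weighting scheme are not reproduced, and the Poisson rates, stay probability and 50 bp bin size are hypothetical defaults.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads when the expected count is lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_states(counts: Sequence[int],
                   lam_background: float = 1.0,
                   lam_enriched: float = 10.0,
                   p_stay: float = 0.95) -> List[int]:
    """Most likely state per bin: 0 = background, 1 = enriched."""
    if len(counts) == 0:
        return []
    lams = (lam_background, lam_enriched)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    log_trans = [[log_stay, log_switch], [log_switch, log_stay]]
    # Equal prior probability for starting in either state.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lams[s]) for s in (0, 1)]
    backpointers: List[List[int]] = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor state for arriving in state s at this bin.
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + poisson_logpmf(k, lams[s]))
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def call_peaks(counts: Sequence[int], bin_size: int = 50,
               **hmm_kwargs: float) -> List[Tuple[int, int]]:
    """Merge consecutive enriched bins into (start, end) peaks in base pairs."""
    path = viterbi_states(counts, **hmm_kwargs)
    peaks: List[Tuple[int, int]] = []
    start = None
    for i, state in enumerate(path + [0]):   # trailing 0 closes a final peak
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            peaks.append((start * bin_size, i * bin_size))
            start = None
    return peaks


if __name__ == "__main__":
    counts = [0, 1, 0, 1, 12, 15, 11, 0, 1, 0]   # hypothetical reads per bin
    print(call_peaks(counts))
```

For the hypothetical counts, the three high bins are decoded as one enriched segment, and the script prints `[(200, 350)]`. The stay probability acts as a smoothing prior: raising it discourages isolated single-bin peaks, which is one reason model-based callers can give more stable boundaries than simple count thresholds.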
Translated non-canonical proteins derived from noncoding regions or alternative open reading frames (ORFs) can contribute to critical and diverse cellular processes. In the context of cancer, they also represent an under-appreciated source of targets for cancer immunotherapy, either through their tumor-enriched expression or by harboring somatic mutations that produce neoantigens. Here, we present the largest integrative proteogenomic analysis of novel peptides, assessing the prevalence of non-canonical ORFs (ncORFs) in more than 900 patient proteomes and 26 immunopeptidome datasets across 14 cancer types. The integrative proteogenomic analysis of whole-cell proteomes and immunopeptidomes revealed peptide support for a nonredundant set of 9760 upstream, downstream, and out-of-frame ncORFs in protein coding genes and 12811 in noncoding RNAs. Notably, 6486 ncORFs were derived from differentially expressed genes and 340 were ubiquitously translated across eight or more cancers. The analysis also led to the discovery of 34 epitopes and 8 neoantigens from non-canonical proteins in two cohorts as novel cancer immunotargets. Collectively, our analysis integrated both bottom-up proteogenomics and targeted peptide validation to illustrate the prevalence of translated non-canonical proteins in cancer and to provide a resource for prioritizing novel proteins supported by proteomic, immunopeptidomic, genomic and transcriptomic data, available at https://www.maherlab.com/crypticproteindb.
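The upstream, downstream and out-of-frame ncORF categories can be sketched by scanning a transcript for ATG-initiated ORFs and comparing each to the annotated coding sequence. This is a simplified, hypothetical classifier, not the study's pipeline: it scans only the forward strand in DNA letters, ignores near-cognate start codons, reports the longest ORF per stop codon, and labels any ORF that overlaps the CDS in another frame as out-of-frame (including upstream ORFs that run into the CDS). The example transcript is synthetic.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. Within each frame only the
    first ATG before a stop is used, i.e. the longest ORF per stop codon.
    `min_codons` counts codons including the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def classify_orf(orf: Tuple[int, int], cds: Tuple[int, int]) -> str:
    """Place an ORF relative to the annotated CDS (half-open transcript coordinates)."""
    start, end = orf
    cds_start, cds_end = cds
    if end <= cds_start:
        return "upstream"
    if start >= cds_end:
        return "downstream"
    # Overlaps the CDS: same frame means in-frame, any other frame is out-of-frame.
    if (start - cds_start) % 3 == 0:
        return "in-frame"
    return "out-of-frame"


if __name__ == "__main__":
    # Synthetic transcript: uORF, annotated CDS at [14, 35), then a downstream ORF.
    transcript = "ATGAAAAAATAA" + "CC" + "ATGGCCGCCGCCGCCGCCTGA" + "C" + "ATGCCCCCCTAG"
    cds = (14, 35)
    for orf in find_orfs(transcript, min_codons=3):
        print(orf, classify_orf(orf, cds))
```

On the synthetic transcript this reports (0, 12) as upstream, (14, 35) as the in-frame annotated CDS and (36, 48) as downstream. In a proteogenomic setting, each classified ORF would then be translated into a search database so that peptide spectra can provide the translational evidence.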
A growing number of gene-centric studies have highlighted the emerging significance of lncRNAs in cancer. However, these studies primarily focus on a single cancer type. Therefore, we conducted a pan-cancer analysis of lncRNAs, comparing tumor and matched normal expression levels using RNA-Seq data from ∼3,000 patients in 8 solid tumor types. While the majority of differentially expressed lncRNAs display tissue-specific expression, we discovered 229 lncRNAs with outlier or differential expression across multiple cancers, which we refer to as 'onco-lncRNAs'. Because their expression is consistently altered, we hypothesize that these onco-lncRNAs may have conserved oncogenic and tumor suppressive functions across cancers. To address this, we associated the onco-lncRNAs with biological processes based on their co-expressed protein coding genes. To validate our predictions, we experimentally confirmed that cell growth depends on 2 novel oncogenic lncRNAs, onco-lncRNA-3 and onco-lncRNA-12, and on a previously identified lncRNA, CCAT1. Overall, we discovered lncRNAs that may have broad oncogenic and tumor suppressor roles, which could significantly advance our understanding of cancer lncRNA biology.
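Associating onco-lncRNAs with biological processes through their co-expressed protein coding genes is a guilt-by-association approach. This is a minimal sketch under assumptions: it uses Pearson correlation across samples, a hypothetical top-k cutoff, and a gene-to-process annotation table. The gene names, expression values and annotations are invented, and the study's actual scoring is not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_coexpressed(lnc: Sequence[float],
                    coding: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Names of the k protein-coding genes most positively correlated with the lncRNA."""
    return sorted(coding, key=lambda g: pearson(lnc, coding[g]), reverse=True)[:k]


def predicted_processes(lnc: Sequence[float],
                        coding: Dict[str, Sequence[float]],
                        annotations: Dict[str, List[str]],
                        k: int = 2) -> List[Tuple[str, int]]:
    """Count process annotations among the top-k co-expressed genes."""
    counts: Counter = Counter()
    for gene in top_coexpressed(lnc, coding, k):
        counts.update(annotations.get(gene, []))
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical expression across five samples.
    lnc = [1, 2, 3, 4, 5]
    coding = {"GENE_X": [2, 4, 6, 8, 10],
              "GENE_Y": [1, 2, 3, 4, 6],
              "GENE_Z": [5, 4, 3, 2, 1]}
    annotations = {"GENE_X": ["cell cycle"],
                   "GENE_Y": ["cell cycle", "DNA repair"],
                   "GENE_Z": ["apoptosis"]}
    print(predicted_processes(lnc, coding, annotations))
```

Here the two positively correlated genes both carry a "cell cycle" annotation, so that process ranks first. A real analysis would replace the raw count with an enrichment test against all annotated genes, since common processes would otherwise dominate by chance.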