We mapped Polycomb-associated H3K27 trimethylation (H3K27me3) and Trithorax-associated H3K4 trimethylation (H3K4me3) across the whole genome in human embryonic stem (ES) cells. The vast majority of H3K27me3 colocalized on genes modified with H3K4me3. These comodified genes displayed low expression levels and were enriched in developmental function. Another significant set of genes lacked both modifications and was also expressed at low levels in ES cells but was enriched for gene function in physiological responses rather than development. Comodified genes could change expression levels rapidly during differentiation, but so could a substantial number of genes in other modification categories. SOX2, POU5F1, and NANOG, pluripotency-associated genes, shifted from modification by H3K4me3 alone to colocalization of both modifications as they were repressed during differentiation. Our results demonstrate that H3K27me3 modifications change during early differentiation, both relieving existing repressive domains and imparting new ones, and that colocalization with H3K4me3 is not restricted to pluripotent cells.
The core promoter of eukaryotic genes is the minimal DNA region that recruits the basal transcription machinery to direct efficient and accurate transcription initiation. The fraction of human and yeast genes that contain specific core promoter elements such as the TATA box and the initiator (INR) remains unclear and core promoter motifs specific for TATA-less genes remain to be identified. Here, we present genome-scale computational analyses indicating that ∼76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1-binding sites. We further identify two motifs – M3 (SCGGAAGY) and M22 (TGCGCANK) – that occur preferentially in human TATA-less core promoters. About 24% of human genes have a TATA-like element and their promoters are generally AT-rich; however, only ∼10% of these TATA-containing promoters have the canonical TATA box (TATAWAWR). In contrast, ∼46% of human core promoters contain the consensus INR (YYANWYY) and ∼30% are INR-containing TATA-less genes. Significantly, ∼46% of human promoters lack both TATA-like and consensus INR elements. Surprisingly, mammalian-type INR sequences are present – and tend to cluster – in the transcription start site (TSS) region of ∼40% of yeast core promoters and the frequency of specific core promoter types appears to be conserved in yeast and human genomes. Gene Ontology analyses reveal that TATA-less genes in humans, as in yeast, are frequently involved in basic “housekeeping” processes, while TATA-containing genes are more often highly regulated, such as by biotic or stress stimuli. These results reveal unexpected similarities in the occurrence of specific core promoter types and in their associated biological processes in yeast and humans and point to novel vertebrate-specific DNA motifs that might play a selective role in TATA-independent transcription.
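The consensus sequences quoted in this abstract (TATAWAWR, YYANWYY, SCGGAAGY, TGCGCANK) are written in the IUPAC nucleotide code, where for example W means A or T and Y means C or T. The following is a minimal sketch, not the authors' pipeline, of how such a consensus could be scanned against a sequence. The function names `iupac_to_pattern` and `find_motif` are hypothetical, the example sequences are made up, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Consensus sequences as quoted in the abstract.
MOTIFS = {
    "TATA": "TATAWAWR",
    "INR": "YYANWYY",
    "M3": "SCGGAAGY",
    "M22": "TGCGCANK",
}


def iupac_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex body."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones.

    Assumption: only the strand given in `sequence` is scanned.
    """
    # A lookahead matches without consuming characters, so overlapping hits are kept.
    pattern = re.compile("(?=(" + iupac_to_pattern(consensus) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_promoter = "GGTATAAAAGCCATTCC"  # made-up sequence for illustration only
    for name, consensus in MOTIFS.items():
        print(name, find_motif(consensus, toy_promoter))
```

Classifying a promoter as TATA-containing or TATA-less would additionally require a choice of search window around the TSS and a definition of "TATA-like", neither of which is specified in the abstract.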
DNAs released from tumor cells into blood (circulating tumor DNAs, ctDNAs) carry tumor-specific genomic aberrations, providing a non-invasive means for cancer detection. In this study, we aimed to leverage somatic copy number aberration (SCNA) in ctDNA to develop assays to detect early-stage hepatocellular carcinomas (HCCs).
We conducted low-depth whole-genome sequencing (WGS) to profile SCNAs in 384 plasma samples of hepatitis B virus (HBV)-related HCC and cancer-free HBV patients, using one discovery and two validation cohorts. To fully capture the robust signals of WGS data from the complete genome, we developed a machine learning-based statistical model that is focused on detection accuracy in early-stage HCC.
We built the model using a discovery cohort of 209 patients, achieving an overall area under the curve (AUC) of 0.893, with 0.874 for early-stage (Barcelona Clinic Liver Cancer [BCLC] stage 0-A) and 0.933 for advanced-stage (BCLC stage B-D) HCC. The performance of the model was then assessed in two validation cohorts (76 and 99 patients) that consisted only of patients with stage 0-A HCC. Our model exhibited robust predictive performance, with AUCs of 0.920 and 0.812 for the two validation cohorts. Further analyses showed the impact of tumor sample heterogeneity in model training on detecting early-stage tumors, and a refined model addressing the heterogeneity in the discovery cohort significantly increased model performance in validation.
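For readers unfamiliar with the metric, the AUC reported here can be read as the probability that a randomly chosen cancer sample receives a higher model score than a randomly chosen cancer-free sample. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels: 1 for a positive (e.g. cancer) sample, 0 for a negative sample.
    scores: model scores, where higher means more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up toy example: two positives, two negatives.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

The double loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred samples; a rank-based implementation would be preferable for larger data.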
We developed an SCNA-based, machine learning-driven model for the non-invasive detection of early-stage HCC in HBV patients and demonstrated its performance through strict independent validations.
Hepatocyte nuclear factor 4 alpha (HNF4α), a member of the nuclear receptor superfamily, is essential for liver function and is linked to several diseases including diabetes, hemophilia, atherosclerosis, and hepatitis. Although many DNA response elements and target genes have been identified for HNF4α, the complete repertoire of binding sites and target genes in the human genome is unknown. Here, we adapt protein binding microarrays (PBMs) to examine the DNA‐binding characteristics of two HNF4α species (rat and human) and isoforms (HNF4α2 and HNF4α8) in a high‐throughput fashion. We identified ∼1400 new binding sequences and used this dataset to successfully train a Support Vector Machine (SVM) model that predicts an additional ∼10,000 unique HNF4α‐binding sequences; we also identify new rules for HNF4α DNA binding. We performed expression profiling of an HNF4α RNA interference knockdown in HepG2 cells and compared the results to a search of the promoters of all human genes with the PBM and SVM models, as well as published genome‐wide location analysis. Using this integrated approach, we identified ∼240 new direct HNF4α human target genes, including new functional categories of genes not typically associated with HNF4α, such as cell cycle, immune function, apoptosis, stress response, and other cancer‐related genes. Conclusion: We report the first use of PBMs with a full‐length liver‐enriched transcription factor and greatly expand the repertoire of HNF4α‐binding sequences and target genes, thereby identifying new functions for HNF4α. We also establish a web‐based tool, HNF4 Motif Finder, that can be used to identify potential HNF4α‐binding sites in any sequence. (HEPATOLOGY 2009.)
Epithelial formation is a central facet of organogenesis that relies on intercellular junction assembly to create functionally distinct apical and basal cell surfaces. How this process is regulated during embryonic development remains obscure. Previous studies using conditional knockout mice have shown that loss of hepatocyte nuclear factor 4α (HNF4α) blocks the epithelial transformation of the fetal liver, suggesting that HNF4α is a central regulator of epithelial morphogenesis. Although HNF4α-null hepatocytes do not express E-cadherin (also called CDH1), we show here that E-cadherin is dispensable for liver development, implying that HNF4α regulates additional aspects of epithelial formation. Microarray and molecular analyses reveal that HNF4α regulates the developmental expression of a myriad of proteins required for cell junction assembly and adhesion. Our findings define a fundamental mechanism through which generation of tissue epithelia during development is coordinated with the onset of organ function.
Pluripotency, the ability of a cell to differentiate and give rise to all embryonic lineages, defines a small number of mammalian cell types such as embryonic stem (ES) cells. While it has been generally held that pluripotency is the product of a transcriptional regulatory network that activates and maintains the expression of key stem cell genes, accumulating evidence is pointing to a critical role for epigenetic processes in establishing and safeguarding the pluripotency of ES cells, as well as maintaining the identity of differentiated cell types. In order to better understand the role of epigenetic mechanisms in pluripotency, we have examined the dynamics of chromatin modifications genome-wide in human ES cells (hESCs) undergoing differentiation into a mesendodermal lineage. We found that chromatin modifications at promoters remain largely invariant during differentiation, except at a small number of promoters where a dynamic switch between acetylation and methylation at H3K27 marks the transition between activation and silencing of gene expression, suggesting a hierarchy in cell fate commitment over most differentially expressed genes. We also mapped over 50 000 potential enhancers, and observed much greater dynamics in chromatin modifications, especially H3K4me1 and H3K27ac, which correlate with expression of their potential target genes. Further analysis of these enhancers revealed potentially key transcriptional regulators of pluripotency and a chromatin signature indicative of a poised state that may confer developmental competence in hESCs. Our results provide new evidence supporting the role of chromatin modifications in defining enhancers and pluripotency.
Alu repeats, which account for ~10% of the human genome, were originally considered to be junk DNA. Recent studies, however, suggest that they may contain transcription factor binding sites and hence possibly play a role in regulating gene expression.
Here, we show that binding sites for a highly conserved member of the nuclear receptor superfamily of ligand-dependent transcription factors, hepatocyte nuclear factor 4alpha (HNF4α, NR2A1), are highly prevalent in Alu repeats. We employ high-throughput protein binding microarrays (PBMs) to show that HNF4α binds >66 unique sequences in Alu repeats that are present in ~1.2 million locations in the human genome. We use chromatin immunoprecipitation (ChIP) to demonstrate that HNF4α binds Alu elements in the promoters of target genes (ABCC3, APOA4, APOM, ATPIF1, CANX, FEMT1A, GSTM4, IL32, IP6K2, PRLR, PRODH2, SOCS2, TTR) and luciferase assays to show that at least some of those Alu elements can modulate HNF4α-mediated transactivation in vivo (APOM, PRODH2, TTR, APOA4). HNF4α-Alu elements are enriched in promoters of genes involved in RNA processing, and a sizeable fraction are in regions of accessible chromatin. Comparative genomics analysis suggests that there may have been a gain in HNF4α binding sites in Alu elements during evolution and that non-Alu repeats, such as Tiggers, also contain HNF4α sites.
Our findings suggest that HNF4α, in addition to regulating gene expression via high affinity binding sites, may also modulate transcription via low affinity sites in Alu repeats.
•Differentially expressed genes were analyzed and prognostic markers in LUAD were identified.
•A total of 68 protein-coding differentially expressed genes were identified using TCGA dataset.
•Nineteen genes were individually associated with overall survival.
•A risk score was developed for stratifying prognosis in LUAD patients.
•The 19-gene prognostic signature was independently validated using GEO datasets.
Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer. Understanding the molecular mechanisms underlying tumor progression is of clinical significance. This study aimed to identify novel molecular markers associated with LUAD prognosis.
RNA sequencing data from The Cancer Genome Atlas (TCGA) database of LUAD tumors and paired normal tissues, and microarray data from the Gene Expression Omnibus (GEO) database were obtained. In the TCGA dataset, differentially expressed (DE) genes were identified by comparing gene expression between early-stage tumors and normal tissue, as well as between advanced-stage and early-stage tumors. A risk score was developed using a weighted linear combination of individual dysregulated protein-coding genes that were associated with overall survival (OS). The prognostic value of the risk score was evaluated using Kaplan-Meier and multivariate Cox analysis. The gene signature was further validated using independent datasets from GEO.
Among the 68 identified DE genes, 19 were individually associated with OS in univariate analyses. A risk score was constructed for each patient based on the coefficients in the multivariate Cox model and the normalized expression levels of these 19 genes. LUAD patients with a low risk score had significantly better survival than those with a high risk score (log-rank P < 0.0001). After adjusting for age, sex, clinical stage, smoking history, and treatments, patients with a low risk score had an 81% decreased risk of death compared to those with a high risk score (hazard ratio 0.19, 95% confidence interval 0.097−0.36). The significant association of the risk score with OS in LUAD patients was further validated in three independent GEO datasets.
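The two calculations described in this paragraph can be illustrated with a short sketch: a risk score as a coefficient-weighted sum of normalized expression values, and the conversion of a hazard ratio into a percentage risk reduction (a hazard ratio of 0.19 corresponds to 1 − 0.19 = 0.81, i.e. an 81% decrease). The gene names, coefficients, and expression values below are placeholders, since the abstract does not list the 19 genes or their coefficients, and the function names are hypothetical.

```python
def risk_score(coefficients, expression):
    """Weighted linear combination: sum of coefficient * normalized expression.

    coefficients: gene -> Cox model coefficient (placeholder values in this sketch).
    expression:   gene -> normalized expression level for one patient.
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def percent_risk_reduction(hazard_ratio):
    """Convert a hazard ratio below 1 into a percentage decrease in hazard."""
    return (1.0 - hazard_ratio) * 100.0


if __name__ == "__main__":
    # Placeholder genes and numbers, for illustration only.
    toy_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
    toy_patient = {"GENE_A": 2.0, "GENE_B": 1.0}
    print(risk_score(toy_coefficients, toy_patient))

    # Hazard ratio reported in the abstract.
    print(round(percent_risk_reduction(0.19)))
```

How patients were split into low- and high-risk groups (for example, at the median score) is not stated in the abstract, so the sketch stops at the score itself.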
A novel 19-gene prognostic signature based on gene expression was identified in LUAD patients. The findings further improve the understanding of LUAD prognostication and have the potential to facilitate risk-stratified disease management.
Well-defined relationships between oligonucleotide properties and hybridization signal intensities (HSI) can aid chip design, data normalization and true biological knowledge discovery. We clarify these relationships using the data from two microarray experiments containing over three million probes from 48 high-density chips. We find that melting temperature (Tm) has the most significant effect on HSI while length for the long oligonucleotides studied has very little effect. Analysis of positional effect using a linear model provides evidence that the protruding ends of probes contribute more than tethered ends to HSI, which is further validated by specifically designed match fragment sliding and extension experiments. The impact of sequence similarity (SeqS) on HSI is not significant in comparison with other oligonucleotide properties. Using regression and regression tree analysis, we prioritize these oligonucleotide properties based on their effects on HSI. The implications of our discoveries for the design of unbiased oligonucleotides are discussed. We propose that designing isothermal probes by varying their length is a viable strategy to reduce sequence bias, though imposing selection constraints on other oligonucleotide properties is also essential.
Human embryonic stem (ES) cells exhibit a shorter G1 cell cycle phase than most somatic cells. Here, we examine the role of an abundant, human ES cell‐enriched microRNA, miR‐92b, in cell cycle distribution. Inhibition of miR‐92b in human ES cells results in a greater number of cells in the G1 phase and a lower number in the S phase. Conversely, overexpression of miR‐92b in differentiated cells results in a decreased number of cells in G1 phase and an increased number in S‐phase. p57, a gene whose product inhibits G1 to S‐phase progression, is one of the predicted targets of miR‐92b. Inhibition of miR‐92b in human ES cells increases p57 protein levels, and miR‐92b overexpression in differentiated cells decreases p57 protein levels. Furthermore, miR‐92b inhibits a luciferase reporter construct that includes part of the 3′ untranslated region of the p57 gene containing the predicted target of the miR‐92b seed sequence. Thus, we show that the miRNA miR‐92b directly downregulates protein levels of the G1/S checkpoint gene p57. STEM CELLS 2009;27:1524–1528