Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.
► High-resolution binding profiles representing most human transcription factors ► High-throughput SELEX can identify long and dimeric sites ► Full-length protein and DNA-binding domain specificities are similar ► Adjacent bases affect TF-DNA binding more than previously thought
High-throughput SELEX is used to determine high-resolution binding profiles representing most human transcription factors. Base-stacking interactions, and dimer orientation and spacing preferences, have a larger role in TF-DNA binding than previously appreciated.
The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome in vitro. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.
We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy in vitro and in vivo in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining approximately 50% of the variation in nucleosome occupancy in vitro.
Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.
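As a rough illustration of the two features the abstract singles out (G+C content and AAAA frequency), the sketch below computes both for a sequence and combines them linearly. The function names, the weights, and the intercept are hypothetical placeholders chosen for the example; the 14 attributes and fitted coefficients of the published model are not reproduced here.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequency(seq, kmer):
    """Overlapping occurrences of kmer divided by the number of start positions."""
    seq = seq.upper()
    kmer = kmer.upper()
    positions = len(seq) - len(kmer) + 1
    if positions <= 0:
        return 0.0
    hits = sum(1 for i in range(positions) if seq[i:i + len(kmer)] == kmer)
    return hits / positions


def linear_score(seq, weights, intercept=0.0):
    """Linear combination of the two features; weights are assumed, not fitted."""
    features = {
        "gc": gc_content(seq),
        "AAAA": kmer_frequency(seq, "AAAA"),
    }
    return intercept + sum(weights[name] * value for name, value in features.items())
```

In practice such a score would be computed over a nucleosome-sized sliding window and the weights fitted by regression against measured occupancy; both steps are left out of this sketch.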
Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.
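One common way to find "sequences matching a motif" is to scan with a position weight matrix of log-odds scores. The sketch below shows that generic technique only; it is not claimed to be the scoring scheme used in the study, and the function names, the pseudocount, and the uniform background of 0.25 are assumptions made for illustration.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[base] + pseudocount for base in "ACGT")
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_match(seq, matrix):
    """Return (start, score) of the highest-scoring window, or None if seq is too short.

    Assumes seq contains only A, C, G and T.
    """
    seq = seq.upper()
    width = len(matrix)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(matrix[i][seq[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

Comparing the best score for a reference allele with that for a variant allele is one simple way to flag a site whose predicted binding may change, under the same assumptions.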
The membrane attack complex of complement (MAC), apart from its classical role of lysing cells, can also trigger a range of non-lethal effects on cells, acting as a driver of inflammation. In the present study, we chose to investigate these non-lethal effects on inflammasome activation. We found that, following sublytic MAC attack, there is increased cytosolic Ca(2+) concentration, at least partly through Ca(2+) release from the endoplasmic reticulum lumen via the inositol 1,4,5-trisphosphate receptor (IP3R) and ryanodine receptor (RyR) channels. This increase in intracellular Ca(2+) concentration leads to Ca(2+) accumulation in the mitochondrial matrix via the 'mitochondrial calcium uniporter' (MCU), and loss of mitochondrial transmembrane potential, triggering NLRP3 inflammasome activation and IL-1β release. NLRP3 co-localises with the mitochondria, probably sensing the increase in calcium and the resultant mitochondrial dysfunction, leading to caspase activation and apoptosis. This is the first study that links non-lethal effects of sublytic MAC attack with inflammasome activation and provides a mechanism by which sublytic MAC can drive inflammation and apoptosis.
Alternative splicing (AS) is a key process underlying the expansion of proteomic diversity and the regulation of gene expression. Here, we identify an evolutionarily conserved embryonic stem cell (ESC)-specific AS event that changes the DNA-binding preference of the forkhead family transcription factor FOXP1. We show that the ESC-specific isoform of FOXP1 stimulates the expression of transcription factor genes required for pluripotency, including OCT4, NANOG, NR5A2, and GDF3, while concomitantly repressing genes required for ESC differentiation. This isoform also promotes the maintenance of ESC pluripotency and contributes to efficient reprogramming of somatic cells into induced pluripotent stem cells. These results reveal a pivotal role for an AS event in the regulation of pluripotency through the control of critical ESC-specific transcriptional programs.
► An ESC-specific splicing switch in FOXP1 transcripts produces the FOXP1-ES isoform ► FOXP1-ES has distinct DNA-binding properties compared to the canonical FOXP1 isoform ► FOXP1-ES stimulates key pluripotency genes and represses many differentiation genes ► FOXP1-ES is required for ESC pluripotency and efficient iPSC reprogramming
Alternative splicing produces an ESC-specific isoform of FOXP1 that represses genes responsible for differentiation and directly stimulates production of pluripotency genes, including Oct4 and Nanog.
Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequences associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif-finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity-selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif-finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.
A series of reports over the last few years have indicated that a much larger portion of the mammalian genome is transcribed than can be accounted for by currently annotated genes, but the quantity and nature of these additional transcripts remain unclear. Here, we have used data from single- and paired-end RNA-Seq and tiling arrays to assess the quantity and composition of transcripts in PolyA+ RNA from human and mouse tissues. Relative to tiling arrays, RNA-Seq identifies many fewer transcribed regions ("seqfrags") outside known exons and ncRNAs. Most nonexonic seqfrags are in introns, raising the possibility that they are fragments of pre-mRNAs. The chromosomal locations of the majority of intergenic seqfrags in RNA-Seq data are near known genes, consistent with alternative cleavage and polyadenylation site usage, promoter- and terminator-associated transcripts, or new alternative exons; indeed, reads that bridge splice sites identified 4,544 new exons, affecting 3,554 genes. Most of the remaining seqfrags correspond to either single reads that display characteristics of random sampling from a low-level background or several thousand small transcripts (median length = 111 bp) present at higher levels, which also tend to display sequence conservation and originate from regions with open chromatin. We conclude that, while there are bona fide new intergenic transcripts, their number and abundance are generally low in comparison to known exons, and the genome is not as pervasively transcribed as previously reported.
The Human Transcription Factors. Lambert, Samuel A.; Jolma, Arttu; Campitelli, Laura F. ...
Cell, 02/2018, Volume 172, Issue 4. Journal article, peer reviewed, open access.
Transcription factors (TFs) recognize specific DNA sequences to control chromatin and transcription, forming a complex system that guides expression of the genome. Despite keen interest in understanding how TFs control gene expression, it remains challenging to determine how the precise genomic binding sites of TFs are specified and how TF binding ultimately relates to regulation of transcription. This review considers how TFs are identified and functionally characterized, principally through the lens of a catalog of over 1,600 likely human TFs and binding motifs for two-thirds of them. Major classes of human TFs differ markedly in their evolutionary trajectories and expression patterns, underscoring distinct functions. TFs likewise underlie many different aspects of human physiology, disease, and variation, highlighting the importance of continued effort to understand TF-mediated gene regulation.
Knowing how and where transcription factors bind to the genome is crucial for understanding how they control gene expression. This Review looks at how human TFs are identified and the ways they interact with DNA sequences.
Cannabis is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety "Finola." The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase, which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ∼40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant.
Members of the large ETS family of transcription factors (TFs) have highly similar DNA‐binding domains (DBDs), yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA‐binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA‐binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high‐throughput microwell‐based TF DNA‐binding specificity assay, and protein‐binding microarrays (PBMs). Both approaches reveal that the ETS‐binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino‐acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP‐seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.