Transcriptional enhancers play critical roles in the regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving unanswered the question of whether there exists an optimal set of histone modifications for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random Forest-based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States), to integrate histone modification profiles for identification of enhancers, and used it to identify enhancers in a number of cell types. We show that RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint, and provide an unbiased approach for evaluating metrics of evolutionary constraint in humans. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
One of the essential processes in Mobile Ad hoc Networks (MANETs) is blind flooding to discover routes between source and destination mobile nodes. As the density of nodes in the network increases, the number of broadcast packets increases exponentially. This can lead to broadcast storms, a drain on the device's battery, and reduced network efficiency. We propose a Cross-layer Adaptive Fuzzy-based Ad hoc On-Demand Distance Vector routing protocol (CLAF-AODV) to minimize the routing broadcast traffic by considering the quality of service (QoS) (e.g. delay, throughput, packet loss), stability, and adaptability of the network. The suggested method employs two-level fuzzy logic and a cross-layer design approach to select the appropriate nodes with a higher probability of participating in broadcasting by considering parameters from the first three layers of the Open Systems Interconnection (OSI) model to achieve quality of service, stability, and adaptability. It investigates not only the quality of the node and the network density around the node to make a decision, but also the path that the broadcast packet traveled to reach this node. Simulation results reveal that our proposed protocol reduces the number of broadcast packets and significantly improves network performance with respect to throughput, packet loss, normalized routing load, collision rate, and average energy consumption compared to the standard AODV and the Fixed Probability AODV (FP-AODV) algorithms.
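The contrast between blind flooding and a fixed-probability baseline can be illustrated with a short simulation. This is a minimal sketch, not CLAF-AODV or the authors' simulator: it assumes that fixed-probability forwarding means every node other than the source rebroadcasts a route request with one fixed probability p. The function name `flood`, the adjacency-dictionary topology, and the ring example are hypothetical and chosen only for illustration.

```python
import random


def flood(adjacency, source, p, seed=0):
    """Simulate one route-request flood on a static topology.

    adjacency: dict mapping each node to a list of its neighbours (assumed input).
    p: probability that a non-source node rebroadcasts a request it received.
    Returns (set of nodes that received the request, number of transmissions).
    """
    rng = random.Random(seed)
    reached = {source}
    transmissions = 0
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            # The source always transmits; every other node forwards with
            # probability p (p = 1.0 corresponds to blind flooding).
            if node != source and not (rng.random() < p):
                continue
            transmissions += 1
            for neighbour in adjacency[node]:
                if neighbour not in reached:
                    reached.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return reached, transmissions


# Hypothetical 6-node ring topology.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
blind_reached, blind_tx = flood(ring, source=0, p=1.0)
silent_reached, silent_tx = flood(ring, source=0, p=0.0)
```

With p = 1.0 the sketch reduces to blind flooding, so every node that receives the request transmits it once. Lowering p reduces transmissions at the cost of possibly failing to reach some nodes, which is the trade-off that adaptive schemes try to manage per node rather than with one global value.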
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
Hundreds of chromatin regulators (CRs) control chromatin structure and function by catalyzing and binding histone modifications, yet the rules governing these key processes remain obscure. Here, we present a systematic approach to infer CR function. We developed ChIP-string, a meso-scale assay that combines chromatin immunoprecipitation with a signature readout of 487 representative loci. We applied ChIP-string to screen 145 antibodies, thereby identifying effective reagents, which we used to map the genome-wide binding of 29 CRs in two cell types. We found that specific combinations of CRs colocalize in characteristic patterns at distinct chromatin environments, at genes of coherent functions, and at distal regulatory elements. When comparing between cell types, CRs redistribute to different loci but maintain their modular and combinatorial associations. Our work provides a multiplex method that substantially enhances the ability to monitor CR binding, presents a large resource of CR maps, and reveals common principles for combinatorial CR function.
► We present a systematic approach to infer chromatin regulator (CR) function
► CR organization is modular and combinatorial; CR modules hold opposing activities
► Specific CR combinations bind in defined patterns at genes of coherent functions
► CRs maintain their modular and combinatorial associations between cell types
A multiplex method identifies antibodies that are effective for ChIP of chromatin regulators (CRs). Nearly 50 ChIP-Seq data sets reveal the genome-wide distribution of multiple classes of CRs in leukemia and ES cells, highlighting that CRs often assemble in predictable combinations.
The cellular mechanisms driving cardiac tissue formation remain poorly understood, largely due to the structural and functional complexity of the heart. It is unclear whether newly generated myocytes originate from cardiac stem/progenitor cells or from pre-existing cardiomyocytes that re-enter the cell cycle. Here, we identify the source of new cardiomyocytes during mouse development and after injury. Our findings suggest that cardiac progenitors maintain proliferative potential and are the main source of cardiomyocytes during development; however, the onset of αMHC expression leads to reduced cycling capacity. Single-cell RNA sequencing reveals a proliferative, "progenitor-like" population abundant in early embryonic stages that decreases to minimal levels postnatally. Furthermore, cardiac injury by ligation of the left anterior descending artery was found to activate cardiomyocyte proliferation in neonatal but not adult mice. Our data suggest that clonal dominance of differentiating progenitors mediates cardiac development, while a distinct subpopulation of cardiomyocytes may have the potential for limited proliferation during late embryonic development and shortly after birth.
Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched for or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.
Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only gene expression but also complex traits. However, detecting causal variants is challenging due to complex genetic correlation among variants, known as linkage disequilibrium (LD), and the presence of multiple causal variants within a locus. Although several fine-mapping approaches have been developed to overcome these challenges, they may produce large sets of putative causal variants when true causal variants are in high LD with many non-causal variants. In eQTL studies, there is an additional source of information that can be used to improve fine-mapping: allelic imbalance (AIM), which measures imbalance in gene expression between the two chromosomes of a diploid organism. In this work, we develop a novel statistical method that leverages both AIM and total expression data to detect causal variants that regulate gene expression. We illustrate through simulations and application to 10 tissues of the Genotype-Tissue Expression (GTEx) dataset that our method identifies the true causal variants with higher specificity than an approach that uses only eQTL information. Across all tissues and genes, our method achieves a median reduction rate of 11% in the number of putative causal variants. We use chromatin state data from the Roadmap Epigenomics Consortium to show that the putative causal variants identified by our method are enriched for active regions of the genome, providing orthogonal support that our method identifies causal variants with increased specificity.
Optic nerve degeneration caused by glaucoma is a leading cause of blindness worldwide. Patients affected by the normal-pressure form of glaucoma are more likely to harbor risk alleles for glaucoma-related optic nerve disease. We have performed a meta-analysis of two independent genome-wide association studies for primary open angle glaucoma (POAG) followed by a normal-pressure glaucoma (NPG, defined by intraocular pressure (IOP) less than 22 mmHg) subgroup analysis. The single-nucleotide polymorphisms that showed the most significant associations were tested for association with a second form of glaucoma, exfoliation-syndrome glaucoma. The overall meta-analysis of the GLAUGEN and NEIGHBOR dataset results (3,146 cases and 3,487 controls) identified significant associations between two loci and POAG: the CDKN2BAS region on 9p21 (rs2157719 G, OR = 0.69, 95% CI 0.63-0.75, p = 1.86×10⁻¹⁸), and the SIX1/SIX6 region on chromosome 14q23 (rs10483727 A, OR = 1.32, 95% CI 1.21-1.43, p = 3.87×10⁻¹¹). In subgroup analysis, two loci were significantly associated with NPG: 9p21 containing the CDKN2BAS gene (rs2157719 G, OR = 0.58, 95% CI 0.50-0.67, p = 1.17×10⁻¹²) and a probable regulatory region on 8q22 (rs284489 G, OR = 0.62, 95% CI 0.53-0.72, p = 8.88×10⁻¹⁰). Both NPG loci were also nominally associated with exfoliation-syndrome glaucoma (rs2157719 G, OR = 0.59, 95% CI 0.41-0.87, p = 0.004 and rs284489 G, OR = 0.76, 95% CI 0.54-1.06, p = 0.021), suggesting that these loci might contribute more generally to optic nerve degeneration in glaucoma. Because both loci influence transforming growth factor beta (TGF-beta) signaling, we performed a genomic pathway analysis that showed an association between the TGF-beta pathway and NPG (permuted p = 0.009). These results suggest that neuro-protective therapies targeting TGF-beta signaling could be effective for multiple forms of glaucoma.
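The relationship between an odds ratio, its confidence interval, and a p-value in results like those above can be made concrete with a small calculation. This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes a Wald-type interval that is symmetric on the log-odds scale and a two-sided normal test, and the function name `z_and_p_from_or` is hypothetical. Because published intervals are rounded to two decimals, and a meta-analysis may apply corrections not described in the abstract, the reconstructed p-value should only be expected to land in the same general range as the reported one, not to reproduce it.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the standard error, z-score and two-sided p-value
    from an odds ratio and its 95% confidence interval.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values taken from the POAG association at 9p21 quoted above.
se, z, p = z_and_p_from_or(0.69, 0.63, 0.75)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

A negative z here simply reflects an odds ratio below 1, meaning the tested allele is less common in cases than in controls.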
Tissue-specific gene expression defines cellular identity and function, but knowledge of early human development is limited, hampering application of cell-based therapies. Here we profiled 5 distinct cell types at a single fetal stage, as well as chondrocytes at 4 stages in vivo and 2 stages during in vitro differentiation. Network analysis delineated five tissue-specific gene modules; these modules and chromatin state analysis defined broad similarities in gene expression during cartilage specification and maturation in vitro and in vivo, including early expression and progressive silencing of muscle- and bone-specific genes. Finally, ontogenetic analysis of freshly isolated and pluripotent stem cell-derived articular chondrocytes identified that integrin alpha 4 defines 2 subsets of functionally and molecularly distinct chondrocytes characterized by their gene expression, osteochondral potential in vitro and proliferative signature in vivo. These analyses provide new insight into human musculoskeletal development and provide an essential comparative resource for disease modeling and regenerative medicine.