The single-cell RNA sequencing (scRNA-seq) technique began a new era by allowing the observation of gene expression at the single-cell level. However, scRNA-seq data also contain a large amount of technical and biological noise. Because of the low number of RNA transcripts and the stochastic nature of gene expression, there is a high chance that nonzero entries are recorded as zero; these are called dropout events.
We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute separates dropout zeros from true zeros significantly better than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction on nine published scRNA-seq datasets.
DrImpute can serve as a very useful addition to the currently existing statistical tools for single-cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute.
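The abstract does not spell out how DrImpute imputes values, so the following is only a toy illustration of what "imputing dropout zeros" means, not DrImpute's algorithm. It assumes cell group labels are already known (a hypothetical input) and replaces each zero with the mean of that gene across the cell's group. A gene that is zero in every cell of a group keeps its zero, which is one simple way to leave likely true zeros alone.

```python
def impute_zeros_by_group(expr, labels):
    """Replace zeros with the per-gene mean of the cell's group.

    expr   -- list of rows, one row per gene, one column per cell
    labels -- group label for each cell (assumed known in this sketch)
    """
    # Collect the column indices that belong to each group.
    groups = {}
    for cell, label in enumerate(labels):
        groups.setdefault(label, []).append(cell)

    imputed = [[float(v) for v in row] for row in expr]
    for gene, row in enumerate(expr):
        for cells in groups.values():
            group_mean = sum(row[c] for c in cells) / len(cells)
            for c in cells:
                if row[c] == 0:
                    # If the gene is silent in the whole group, group_mean
                    # is 0 and the entry stays zero (treated as a true zero).
                    imputed[gene][c] = group_mean
    return imputed


expr = [
    [2, 0, 4, 0, 0],  # gene A: the zero in the second cell looks like a dropout
    [0, 0, 0, 3, 3],  # gene B: silent in the first group
]
labels = ["g1", "g1", "g1", "g2", "g2"]
imputed = impute_zeros_by_group(expr, labels)
```

In this example only the second cell of gene A changes (to the group mean of 2.0); every other zero belongs to a gene that is silent across the whole group and is kept.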
The scarcity of donor organs may be addressed in the future by using pigs to grow humanized organs with lower potential for immunological rejection after transplantation in humans. Previous studies have demonstrated that interspecies complementation of rodent blastocysts lacking a developmental regulatory gene can generate xenogeneic pancreas and kidney. However, such organs contain host endothelium, a source of immune rejection. We used gene editing and somatic cell nuclear transfer to engineer porcine embryos deficient in ETV2, a master regulator of hematoendothelial lineages. ETV2-null pig embryos lacked hematoendothelial lineages and were embryonic lethal. Blastocyst complementation with wild-type porcine blastomeres generated viable chimeric embryos whose hematoendothelial cells were entirely donor-derived. ETV2-null blastocysts were injected with human induced pluripotent stem cells (hiPSCs) or hiPSCs overexpressing the antiapoptotic factor BCL2, transferred to synchronized gilts and analyzed between embryonic day 17 and embryonic day 18. In these embryos, all endothelial cells were of human origin.
Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) reveals chromatin accessibility across the genome. Currently, no method specifically detects differential chromatin accessibility. Here, SeATAC uses a conditional variational autoencoder model to learn the latent representation of ATAC-seq V-plots and outperforms MACS2 and NucleoATAC on six separate tasks. Applying SeATAC to several pioneer factor-induced differentiation or reprogramming ATAC-seq datasets suggests that induction of these factors not only relaxes the closed chromatin but also decreases chromatin accessibility of 20% to 30% of their target sites. SeATAC is a novel tool to accurately reveal genomic regions with differential chromatin accessibility from ATAC-seq data.
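For readers unfamiliar with V-plots, the sketch below shows one common way to build such a matrix: each fragment is binned by its length and by the offset of its midpoint from a site of interest. The window size, bin widths and the function name `vplot` are illustrative assumptions; the abstract does not state how SeATAC preprocesses its V-plots.

```python
def vplot(fragments, center, window=100, max_len=300, x_bin=10, y_bin=50):
    """Count fragments in a (fragment length) x (midpoint offset) grid.

    fragments -- iterable of (start, end) half-open intervals
    center    -- genomic position the plot is centred on
    All bin sizes are illustrative defaults, not SeATAC settings.
    """
    n_cols = 2 * window // x_bin   # offset bins covering [-window, window)
    n_rows = max_len // y_bin      # length bins covering (0, max_len]
    grid = [[0] * n_cols for _ in range(n_rows)]
    for start, end in fragments:
        length = end - start
        offset = (start + end) // 2 - center
        if -window <= offset < window and 0 < length <= max_len:
            row = (length - 1) // y_bin
            col = (offset + window) // x_bin
            grid[row][col] += 1
    return grid


fragments = [(950, 1050), (900, 1100), (1040, 1090), (0, 50)]
grid = vplot(fragments, center=1000)
```

The last fragment lies far outside the window and is ignored; the other three land in separate cells of the 6 x 20 grid.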
The vasculature is an essential organ for the delivery of blood and oxygen to all tissues of the body and is thus relevant to the treatment of ischaemic diseases, injury-induced regeneration and solid tumour growth. Previously, we demonstrated that ETV2 is an essential transcription factor for the development of cardiac, endothelial and haematopoietic lineages. Here we report that ETV2 functions as a pioneer factor that relaxes closed chromatin and regulates endothelial development. By comparing engineered embryonic stem cell differentiation and reprogramming models with multi-omics techniques, we demonstrated that ETV2 was able to bind nucleosomal DNA and recruit BRG1. BRG1 recruitment remodelled chromatin around endothelial genes and helped to maintain an open configuration, resulting in increased H3K27ac deposition. Collectively, these results will serve as a platform for the development of therapeutic initiatives directed towards cardiovascular diseases and solid tumours.
DCLEAR is an R package for single-cell lineage reconstruction. Advances in CRISPR-based gene editing technologies have enabled the prediction of cell lineage trees based on the edited barcodes observed in each cell. However, the performance of existing methods for reconstructing cell lineage trees was not assessed until recently. In response to this problem, the Allen Institute hosted the Cell Lineage Reconstruction DREAM Challenge in 2020 to crowdsource relevant knowledge from across the world. Our team won sub-challenges 2 and 3 of the competition.
The DCLEAR package contains the R code that was submitted in response to sub-challenges 2 and 3. Our method consists of two steps: (1) distance matrix estimation and (2) tree reconstruction from the distance matrix. We proposed two novel methods for distance matrix estimation, as outlined in the DCLEAR package. Using our method, we find that two of the more sophisticated distance methods perform substantially better than the traditional Hamming distance method. DCLEAR is open source and freely available from CRAN under the GNU General Public License, version 3.
DCLEAR is a powerful resource for single cell lineage reconstruction.
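The two-step structure described above (distance matrix, then tree) can be sketched as follows. This is a baseline illustration only: step 1 uses the traditional Hamming distance that the abstract names as the comparison point, and step 2 uses average-linkage clustering (UPGMA) as a stand-in, because the abstract does not say which tree-building method DCLEAR uses. The barcode strings and function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of barcode sites at which two cells differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            # Size-weighted average distance to the merged cluster.
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        del d[pair]
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Step 1: distance matrix from (made-up) edited barcodes, one character per site.
barcodes = {"c1": "0012", "c2": "0010", "c3": "1200", "c4": "1201"}
names = list(barcodes)
dist = [[hamming(barcodes[a], barcodes[b]) for b in names] for a in names]

# Step 2: tree reconstruction from the distance matrix.
tree = upgma(names, dist)
```

Swapping `hamming` for a different distance function leaves the second step unchanged, which is the kind of separation the two-step description suggests.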
The breakthrough high-throughput measurement of the cis-regulatory activity of millions of randomly generated promoters provides an unprecedented opportunity to systematically decode the cis-regulatory logic that determines expression values. We developed an end-to-end transformer encoder architecture named Proformer to predict expression values from DNA sequences. Proformer uses a Macaron-like Transformer encoder architecture, in which two half-step feed-forward network (FFN) layers are placed at the beginning and the end of each encoder block, and a separable 1D convolution layer is inserted after the first FFN layer and in front of the multi-head attention layer. The sliding k-mers from one-hot encoded sequences are mapped onto a continuous embedding and combined with the learned positional embedding and strand embedding (forward strand vs. reverse-complemented strand) to form the sequence input. Moreover, Proformer introduces multiple expression heads with mask filling to prevent the transformer model from collapsing when trained on a relatively small amount of data. We empirically determined that this design performs significantly better than conventional designs, such as using a global pooling layer as the output layer for the regression task. These analyses support the notion that Proformer provides a novel method of learning and enhances our understanding of how cis-regulatory sequences determine expression values.
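The input encoding described above can be illustrated with a small tokenizer. This is a minimal sketch under assumptions: the choice of k = 3, the base-4 id scheme and the function names are illustrative and not taken from Proformer. In a model, the token, position and strand ids would each index a learned embedding table, and how those embeddings are combined is not specified here.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_ids(seq, k=3):
    """Map each sliding k-mer to an integer id in [0, 4**k)."""
    ids = []
    for i in range(len(seq) - k + 1):
        idx = 0
        for base in seq[i:i + k]:
            idx = idx * 4 + BASES.index(base)  # base-4 positional code
        ids.append(idx)
    return ids


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def encode(seq, k=3):
    """Token, position and strand ids for the forward and reverse strands."""
    encoded = []
    for strand, s in enumerate((seq, reverse_complement(seq))):
        tokens = kmer_ids(s, k)
        encoded.append({
            "tokens": tokens,
            "positions": list(range(len(tokens))),
            "strand": [strand] * len(tokens),  # 0 = forward, 1 = reverse complement
        })
    return encoded


forward, reverse = encode("ACGTT")
```

For the sequence `ACGTT` the forward strand yields the k-mers ACG, CGT and GTT, and the reverse complement `AACGT` yields AAC, ACG and CGT.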
The mammalian heart has a limited regenerative capacity and typically progresses to heart failure following injury. Here, we defined a hedgehog (HH)-Gli1-Mycn network for cardiomyocyte proliferation and heart regeneration from amphibians to mammals. Using a genome-wide screen, we verified that HH signaling was essential for heart regeneration in the injured newt. Next, pharmacological and genetic loss- and gain-of-function studies of HH signaling demonstrated that HH signaling is required for neonatal, adolescent, and adult mouse heart regeneration, and for the proliferation of hiPSC-derived cardiomyocytes. Fate-mapping and molecular biological studies revealed that HH signaling, via a HH-Gli1-Mycn network, contributed to heart regeneration by inducing proliferation of pre-existing cardiomyocytes and not by de novo cardiomyogenesis. Further, Mycn mRNA transfection experiments recapitulated the effects of HH signaling and promoted adult cardiomyocyte proliferation. These studies defined an evolutionarily conserved function of HH signaling that may serve as a platform for human regenerative therapies.
Developmental, stem cell and cancer biologists are interested in the molecular definition of cellular differentiation. Although single-cell RNA sequencing represents a transformational advance for global gene analyses, novel obstacles have emerged, including the computational management of dropout events, the reconstruction of biological pathways and the isolation of target cell populations. We develop an algorithm named dpath that applies the concept of metagene entropy and allows the ranking of cells based on their differentiation potential. We also develop self-organizing map (SOM) and random walk with restart (RWR) algorithms to separate the progenitors from the differentiated cells and reconstruct the lineage hierarchies in an unbiased manner. We test these algorithms using single cells from Etv2-EYFP transgenic mouse embryos and reveal specific molecular pathways that direct differentiation programmes involving the haemato-endothelial lineages. This software program quantitatively assesses the progenitor and committed states in single-cell RNA-seq data sets in a non-biased manner.
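Two of the building blocks named above, entropy over metagene weights and random walk with restart, can be sketched generically. These are textbook formulations, not dpath's implementation: the abstract does not give dpath's exact definitions, and the restart probability, tolerance and toy graph are illustrative assumptions.

```python
import math


def shannon_entropy(weights):
    """Entropy (in bits) of a cell's non-negative metagene weights.

    A cell spread evenly over many metagenes scores high; a cell
    dominated by one metagene scores near zero.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return sum(-p * math.log2(p) for p in probs)


def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 with W the column-normalised adj."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = [
            (1 - restart) * sum(adj[i][j] / col_sums[j] * p[j]
                                for j in range(n) if col_sums[j])
            + restart * p0[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy chain of four states: 0 - 1 - 2 - 3, walk restarted at state 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

On this chain the visiting probability decreases with distance from the seed, which is the kind of ordering a lineage reconstruction could build on.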
Sonic hedgehog (Shh) is essential for limb development, and the mechanisms that govern the propagation and maintenance of its expression have been well studied; however, the mechanisms that govern the initiation of Shh expression are incompletely understood. Here we report that ETV2 initiates Shh expression by changing the chromatin status of the developmental limb enhancer, ZRS. Etv2 expression precedes Shh in limb buds, and Etv2 inactivation prevents the opening of limb chromatin, including the ZRS, resulting in an absence of Shh expression. Etv2 overexpression in limb buds causes nucleosomal displacement at the ZRS, ectopic Shh expression, and polydactyly. Areas of nucleosome displacement coincide with ETS binding site clusters. ETV2 also functions as a transcriptional activator of ZRS and is antagonized by ETV4/5 repressors. Known human polydactyl mutations introduce novel ETV2 binding sites in the ZRS, suggesting that ETV2 dosage regulates ZRS activation. These studies identify ETV2 as a pioneer transcription factor (TF) regulating the onset of Shh expression, having both a chromatin regulatory role and a transcriptional activation role.
In this study, we constructed a model to predict abnormal cardiac sounds using a diverse set of auscultation data collected from various auscultation positions. Abnormal heart sounds were identified by extracting features such as peak intervals and noise characteristics during systole and diastole. Instead of using the raw signals, we transformed them into 2D log-mel spectrograms, which served as the input to the CNN model. The main contribution of our model is the integration of a deep learning architecture with feature extraction techniques based on existing knowledge of cardiac data. Specifically, we propose a multi-channel-based heart signal processing (MCHeart) scheme, which incorporates our proposed features into the deep learning model. Additionally, we introduce the ReLCNN model by applying residual blocks and multi-head attention (MHA) mechanisms to the LCNN architecture. By adding murmur features with a smoothing function and training the ReLCNN model, the weighted accuracy of the model increased from 79.6% to 83.6%, an improvement of approximately 4 percentage points over the LCNN baseline model.
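The log-mel transformation mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not the study's pipeline: the frame length, hop size, number of mel bands and the common 2595 * log10(1 + f / 700) mel formula are assumptions, and a real pipeline would normally use an FFT-based audio library rather than the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge frequencies expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) * n_fft / sample_rate
             for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([max(0.0, min((b - lo) / (mid - lo), (hi - b) / (hi - mid)))
                     for b in range(n_bins)])
    return bank


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of time frames, each a list of log mel-band energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        frames.append([math.log(sum(w * p for w, p in zip(filt, power)) + 1e-10)
                       for filt in bank])
    return frames


# Synthetic test tone: 100 Hz sine sampled at 1000 Hz (not heart-sound data).
sample_rate = 1000
signal = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(256)]
spec = log_mel_spectrogram(signal, sample_rate)
```

The result is a 2D array (time frames by mel bands) that can be treated like an image, which is what makes it a natural input for a CNN.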