The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data.
Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression.
We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
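The step of selecting the genes that contribute most to the hidden units can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a gene's contribution to a hidden unit is measured by the absolute value of its encoder weight, and the function name `top_genes_per_unit`, the gene labels, and the toy weights are all hypothetical.

```python
def top_genes_per_unit(weights, gene_names, k):
    """Return the union of the k highest-contributing genes of every hidden unit.

    weights[i][j] is the (assumed) encoder weight linking gene i to hidden unit j.
    Contribution is taken to be the absolute weight, which is an assumption.
    """
    n_units = len(weights[0])
    selected = set()
    for j in range(n_units):
        # Rank gene indices by absolute weight on hidden unit j, largest first.
        ranked = sorted(
            range(len(gene_names)),
            key=lambda i: abs(weights[i][j]),
            reverse=True,
        )
        selected.update(gene_names[i] for i in ranked[:k])
    return sorted(selected)


# Toy example: 4 hypothetical genes, 2 hidden units.
genes = ["G1", "G2", "G3", "G4"]
weights = [
    [0.9, 0.1],
    [-0.2, 0.05],
    [0.1, -0.8],
    [0.3, 0.2],
]
candidates = top_genes_per_unit(weights, genes, k=1)
print(candidates)
```

The resulting candidate list would then serve as the feature set for a downstream classifier such as the random forest described above.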
Effector CD8+ T cell activation and its cytotoxic function are positively correlated with improved survival in breast cancer. tRNA-derived fragments (tRFs) have recently been found to be involved in gene regulation in cancer progression. However, it is unclear how interactions between expression of tRFs and T cell activation affect breast cancer patient survival. We used Kaplan–Meier survival and multivariate Cox regression models to evaluate the effect of interactions between expression of tRFs and T cell activation on survival in 1081 breast cancer patients. Spearman correlation analysis and weighted gene co-expression network analysis were conducted to identify genes and pathways that were associated with tRFs. tRFdb-5024a, 5P_tRNA-Leu-CAA-4-1, and ts-49 were positively associated with overall survival, while ts-34 and ts-58 were negatively associated with overall survival. Significant interactions were detected between T cell activation and ts-34 and ts-49. In the T cell exhaustion group, patients with a low level of ts-34 or a high level of ts-49 showed improved survival. In contrast, there was no significant difference in the activation group. Breast cancer related pathways were identified for the five tRFs. In conclusion, the five identified tRFs associated with overall survival may serve as therapeutic targets and improve immunotherapy in breast cancer.
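For readers unfamiliar with the Kaplan–Meier estimator mentioned above, the following is a minimal sketch of the product-limit calculation for a single patient group. It is not the authors' analysis code; the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data (arbitrary time units) for one expression group.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

In a comparison like the one described, the estimator would be run separately on the low- and high-expression groups of a given tRF and the two curves compared.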
To develop a systems biology model of fibrosis progression within the human lung we performed RNA sequencing and microRNA analysis on 95 samples obtained from 10 idiopathic pulmonary fibrosis (IPF) and 6 control lungs. Extent of fibrosis in each sample was assessed by microCT-measured alveolar surface density (ASD) and confirmed by histology. Regulatory gene expression networks were identified using linear mixed-effect models and dynamic regulatory events miner (DREM). Differential gene expression analysis identified a core set of genes increased or decreased before fibrosis was histologically evident that continued to change with advanced fibrosis. DREM generated a systems biology model (www.sb.cs.cmu.edu/IPFReg) that identified progressively divergent gene expression tracks with microRNAs and transcription factors that specifically regulate mild or advanced fibrosis. We confirmed model predictions by demonstrating that expression of POU2AF1, previously unassociated with lung fibrosis but proposed by the model as a regulator, is increased in B lymphocytes in IPF lungs and that POU2AF1-knockout mice were protected from bleomycin-induced lung fibrosis. Our results reveal distinct regulation of gene expression changes in IPF tissue that remained structurally normal compared with moderate or advanced fibrosis and suggest distinct regulatory mechanisms for each stage.
Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues.
We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on Illumina HiSeq 2000. TopHat2 was used to map sequencing reads to the human genome. On average ~62 million reads (53.4% of ~116 million reads) were mapped per sample. 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased; FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF relevant genes and pathways including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only and 15 concordant genes). High concordance of fold change and FDR was observed for each type of the samples (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90) and the number of discordant genes was reduced to four.
Our results demonstrate that RNA sequencing of RNA obtained from archived FFPE lung tissues is feasible. The results obtained from FFPE tissue are highly comparable to FF tissues. The ability to perform RNA-Seq on archived FFPE IPF tissues should greatly enhance the availability of tissue biopsies for research in IPF.
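The concordance checks reported above (genes changing in the same or opposite direction, and the correlation of fold changes between platforms) can be illustrated with a short sketch. It assumes fold changes are expressed on a log2 scale and that the reported r is a Pearson coefficient, neither of which is stated in the abstract; the function names and all numeric values are hypothetical, with only the gene name MMP7 taken from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def direction_agreement(fc_a, fc_b):
    """Split shared genes into concordant and discordant by sign of log2 fold change."""
    shared = sorted(set(fc_a) & set(fc_b))
    concordant = [g for g in shared if fc_a[g] * fc_b[g] > 0]
    discordant = [g for g in shared if fc_a[g] * fc_b[g] < 0]
    return concordant, discordant


# Hypothetical log2 fold changes (IPF vs control) from the two platforms.
rnaseq_ffpe = {"MMP7": 3.0, "GENE_A": -1.0, "GENE_B": 0.5}
microarray_ff = {"MMP7": 2.5, "GENE_A": -1.5, "GENE_B": -0.4}

concordant, discordant = direction_agreement(rnaseq_ffpe, microarray_ff)
shared = concordant + discordant
r = pearson_r([rnaseq_ffpe[g] for g in shared], [microarray_ff[g] for g in shared])
print(concordant, discordant, round(r, 2))
```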
Yan and Kaminski discuss the SNP calling analysis, conducted by Yun et al, from bulk RNA sequencing data of lung samples from 1,251 subjects, including disease-free control subjects and patients with chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF), collected by the Lung Tissue Research Consortium. They associated the identified somatic mutations with germline mutations, disease group, and cancer driver genes. When comparing the identified somatic mutations to cancer driver genes, COPD patients showed a significantly higher proportion of cancer driver gene mutations than normal control subjects.
The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma.
We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease.
Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes.
Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10⁻⁶) and hospitalization (P = 0.01), respectively.
There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.
Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link (LDA: latent Dirichlet allocation) connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.
Chlorophyll–a (Chl–a) concentration is an indicator of phytoplankton pigment, which is associated with the health of marine ecosystems. A commonly used method for the determination of Chl–a is satellite remote sensing. However, due to cloud cover, sun glint and other issues, remote sensing data for Chl–a are always missing in large areas. We reconstructed the Chl–a data from MODIS and VIIRS in the Arabian Sea within the geographical range of 12–28° N and 56–76° E from 2020 to 2021 by combining the Data Interpolating Convolutional Auto–Encoder (DINCAE) and the Bayesian Maximum Entropy (BME) methods, which we named the DINCAE–BME framework. The hold–out validation method was used to assess the DINCAE–BME method’s performance. The root–mean–square–error (RMSE) and the mean–absolute–error (MAE) values for the hold–out validation result obtained by the DINCAE–BME were 1.8824 mg m−3 and 0.4682 mg m−3, respectively; compared with in situ Chl–a data, the RMSE and MAE values for the DINCAE–BME–generated Chl–a product were 0.6196 mg m−3 and 0.3461 mg m−3, respectively. Moreover, DINCAE–BME exhibited better performance than the DINEOF and DINCAE methods. The spatial distribution of the Chl–a product showed that Chl–a values in the coastal region were the highest and the Chl–a values in the deep–sea regions were stable, while the Chl–a values in February and March were higher than in other months. Lastly, this study demonstrated the feasibility of combining the BME method and DINCAE.
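The two error metrics quoted above follow their standard definitions, sketched below for held-out observations and their reconstructed values. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between paired observations and predictions."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical held-out Chl-a values (mg m^-3) and their reconstructions.
held_out = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.0, 5.0]
print(rmse(held_out, reconstructed), mae(held_out, reconstructed))
```

Because RMSE squares each error before averaging, a few large misses raise it much more than they raise MAE, which is one possible reading of the gap between the two hold–out values reported above.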