Underlying every microarray experiment is an experimental question that one would like to address. Finding a useful and satisfactory answer relies on careful experimental design and the use of a variety of data-mining tools to explore the relationships between genes or reveal patterns of expression. While other sections of this issue deal with these lofty issues, this review focuses on the much more mundane but indispensable tasks of 'normalizing' data from individual hybridizations to make meaningful comparisons of expression levels, and of 'transforming' them to select genes for further analysis and data mining.
As the only nonlinear and the most diverse biological sequence, glycans offer substantial challenges for computational biology. These carbohydrates participate in nearly all biological processes, from protein folding to viral cell entry, yet are still not well understood. There are few computational methods to link glycan sequences to functions, and they do not fully leverage all available information about glycans. SweetNet is a graph convolutional neural network that uses graph representation learning to facilitate a computational understanding of glycobiology. SweetNet explicitly incorporates the nonlinear nature of glycans and establishes a framework to map any glycan sequence to a representation. We show that SweetNet outperforms other computational methods in predicting glycan properties on all reported tasks. More importantly, we show that glycan representations, learned by SweetNet, are predictive of organismal phenotypic and environmental properties. Finally, we use glycan-focused machine learning to predict viral glycan binding, which can be used to discover viral receptors.
RNA-Seq analysis in MeV Howe, Eleanor A; Sinha, Raktim; Schlauch, Daniel ...
Bioinformatics, 11/2011, Volume 27, Issue 22
Journal Article
Peer reviewed
Open access
RNA-Seq is an exciting methodology that leverages the power of high-throughput sequencing to measure RNA transcript counts at an unprecedented accuracy. However, the data generated from this process are extremely large, and biologist-friendly tools with which to analyze them are sorely lacking. MultiExperiment Viewer (MeV) is a Java-based desktop application that allows advanced analysis of gene expression data through an intuitive graphical user interface. Here, we report a significant enhancement to MeV that allows analysis of RNA-Seq data with these familiar, powerful tools. We also report the addition to MeV of several RNA-Seq-specific functions, addressing the differences in analysis requirements between this data type and traditional gene expression data. These tools include automatic conversion functions from raw count data to processed RPKM or FPKM values, as well as differential expression detection and functional annotation enrichment detection based on published methods.
Availability: MeV version 4.7 is written in Java and is freely available for download under the terms of the open-source Artistic License version 2.0. The website (http://mev.tm4.org/) hosts a full user manual as well as a short quick-start guide suitable for new users.
Contact: johnq@jimmy.harvard.edu
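The conversion from raw counts to RPKM mentioned in the abstract above can be illustrated with the commonly used definition (reads per kilobase of transcript per million mapped reads). The following is a minimal sketch of that standard formula, not MeV's own code; the function name `rpkm`, the gene names, and the toy counts and lengths are hypothetical, and the total of the supplied counts is assumed to stand in for the number of mapped reads.

```python
def rpkm(counts, lengths_bp):
    """Convert raw read counts to RPKM.

    counts: dict mapping gene -> raw read count for one sample
    lengths_bp: dict mapping gene -> transcript length in base pairs
    RPKM = count * 1e9 / (total mapped reads * length in bp)
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no mapped reads")
    return {
        gene: count * 1e9 / (total_reads * lengths_bp[gene])
        for gene, count in counts.items()
    }


# Hypothetical toy sample: three genes with made-up counts and lengths.
counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
values = rpkm(counts, lengths)
```

In this toy sample, geneA and geneB receive the same RPKM because geneB has three times the reads but is also three times as long. FPKM follows the same form with fragments counted in place of reads.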
PURPOSE To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen.
RESULTS The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. CONCLUSION Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.
Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that the regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue network. However, they do assume bottleneck positions due to variability in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue specificity is driven by context-dependent regulatory paths, providing transcriptional control of tissue-specific processes.
• Regulatory network connections are more tissue specific than nodes (genes and transcription factors)
• Tissue-specific function is not solely regulated by transcription factor expression
• Tissue-specific genes assume bottleneck positions in their corresponding networks
• Tissue specificity is driven by context-dependent, non-canonical regulatory paths
Understanding gene regulation is important for many fields in biology and medicine. Sonawane et al. reconstruct and investigate regulatory networks for 38 human tissues. They find that regulation of tissue-specific function is largely independent of transcription factor expression and that tissue specificity appears to be mediated by tissue-specific regulatory network paths.
Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, and each of these shortcomings impedes scalability, explainability, or both.
We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX's flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab.
PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological "first principles" as soft constraints on the GRN, allowing us to predict subsequent gene expression patterns in a biologically explainable manner.
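To make the Hill-Langmuir kinetics named in the abstract above concrete, here is a minimal sketch of a single target gene driven by one activator through a Hill function, integrated with a simple Euler scheme. It is a textbook toy model and not the PHOENIX framework: the function names and the parameter values (`beta`, `gamma`, `K`, `n`, step size) are assumptions chosen purely for illustration.

```python
def hill_activation(activator, K=1.0, n=2.0):
    """Hill-Langmuir activation term: a^n / (K^n + a^n), between 0 and 1."""
    return activator**n / (K**n + activator**n)


def simulate_target(activator, beta=2.0, gamma=0.5, x0=0.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = beta * hill(activator) - gamma * x.

    activator: constant activator level (hypothetical input)
    beta: maximal production rate; gamma: first-order decay rate
    Returns the target expression level after steps * dt time units.
    """
    x = x0
    for _ in range(steps):
        dxdt = beta * hill_activation(activator) - gamma * x
        x += dt * dxdt
    return x


# With activator == K the Hill term is 0.5, so the steady state is
# beta * 0.5 / gamma.
final = simulate_target(activator=1.0)
steady_state = 2.0 * hill_activation(1.0) / 0.5
```

After 50 time units the simulated level has effectively converged to the analytical steady state, which is a quick sanity check on the integration.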
The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models; (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework; and (iii) statistically compare the performance of competitive models.
Availability: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation, operating instructions and use case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. http://bioconductor.org/packages/release/bioc/html/survcomp.html
Contact: bhaibeka@jimmy.harvard.edu; mschroed@jimmy.harvard.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
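One widely used statistic for measuring the performance of a risk prediction model is the concordance index (C-index). The sketch below is a plain Python illustration of Harrell's formulation under right censoring. It is not the survcomp implementation (survcomp is an R package), and the function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks: predicted risk scores (higher means earlier expected event)
    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's observed time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk failed first
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four subjects, one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking. In the toy cohort, four of the five comparable pairs are ordered correctly.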
Single-cell analysis is a rapidly evolving approach to characterize genome-scale molecular information at the individual cell level. Development of single-cell technologies and computational methods has enabled systematic investigation of cellular heterogeneity in a wide range of tissues and cell populations, yielding fresh insights into the composition, dynamics, and regulatory mechanisms of cell states in development and disease. Despite substantial advances, significant challenges remain in the analysis, integration, and interpretation of single-cell omics data. Here, we discuss the state of the field and recent advances and look to future opportunities.
Tumors are characterized by somatic mutations that drive biological processes ultimately reflected in tumor phenotype. With regard to radiographic phenotypes, generally unconnected through present understanding to the presence of specific mutations, artificial intelligence methods can automatically quantify phenotypic characters by using predefined, engineered algorithms or automatic deep-learning methods, a process also known as radiomics. Here we demonstrate how imaging phenotypes can be connected to somatic mutations through an integrated analysis of independent datasets of 763 lung adenocarcinoma patients with somatic mutation testing and engineered CT image analytics. We developed radiomic signatures capable of distinguishing between tumor genotypes in a discovery cohort (n = 353) and verified them in an independent validation cohort (n = 352). All radiomic signatures significantly outperformed conventional radiographic predictors (tumor volume and maximum diameter). We found a radiomic signature related to radiographic heterogeneity that successfully discriminated between EGFR+ and EGFR− cases (AUC = 0.69). Combining this signature with a clinical model of EGFR status (AUC = 0.70) significantly improved prediction accuracy (AUC = 0.75). The highest performing signature was capable of distinguishing between EGFR+ and KRAS+ tumors (AUC = 0.80) and, when combined with a clinical model (AUC = 0.81), substantially improved its performance (AUC = 0.86). A KRAS+/KRAS− radiomic signature also showed significant albeit lower performance (AUC = 0.63) and did not improve the accuracy of a clinical predictor of KRAS status. Our results argue that somatic mutations drive distinct radiographic phenotypes that can be predicted by radiomics. This work has implications for the use of imaging-based biomarkers in the clinic, as applied noninvasively, repeatedly, and at low cost.