Abstract
We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. The ggtreeExtra package is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available for visualization and interpretation in an evolutionary context.
With the advancement of sequencing methodologies, the acquisition of vast amounts of multi-omics data presents a significant opportunity for understanding the intricate biological mechanisms underlying diseases and for achieving precise diagnosis and treatment of complex disorders. However, as diverse omics data are integrated, extracting sample-specific features within each omics modality and exploring potential correlations among modalities while avoiding mutual interference becomes a critical challenge in multi-omics data integration research. In this study, we propose MOSGAT, a framework that unites specificity-aware graph attention networks (GATs) and cross-modal attention to integrate different omics data. Specifically, we devise GATs tailored to each omics modality to extract features from samples. Additionally, an adaptive confidence attention weighting technique is incorporated to enhance the confidence in the extracted features. Finally, a cross-modal attention mechanism based on multi-head self-attention is devised to uncover potential correlations between different omics data. Extensive experiments on four publicly available medical datasets highlight the superiority of the proposed framework over state-of-the-art methods, particularly in classification tasks. The experimental results underscore MOSGAT's effectiveness in extracting features and exploring potential inter-omics associations.
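The cross-modal attention step described above can be illustrated with a minimal sketch. This is not MOSGAT's implementation: it assumes a single attention head with no learned query, key, or value projections, and it treats each omics modality's feature vector for one sample as one token. The function names `softmax` and `cross_modal_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(modal_feats):
    """Single-head scaled dot-product attention across modalities.

    modal_feats: list of M equal-length feature vectors, one per omics
    modality, all describing the same sample (an assumed input layout).
    Returns (fused, weights): fused[i] is modality i's vector re-expressed
    as an attention-weighted mix of all modalities, and weights[i][j] is
    how strongly modality i attends to modality j.
    """
    dim = len(modal_feats[0])
    scale = math.sqrt(dim)
    fused, weights = [], []
    for query in modal_feats:
        # Similarity of this modality to every modality (itself included).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in modal_feats
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the modality vectors, one output dimension at a time.
        fused.append([
            sum(row[j] * modal_feats[j][i] for j in range(len(modal_feats)))
            for i in range(dim)
        ])
    return fused, weights
```

A multi-head variant would apply separate learned projections per head and concatenate the results; the sketch omits that to keep the attention weighting itself visible.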
Soil carbon has been measured for over a century in applications ranging from understanding biogeochemical processes in natural ecosystems to quantifying the productivity and health of managed systems. Consolidating diverse soil carbon datasets is increasingly important to maximize their value, particularly with growing anthropogenic and climate change pressures. In this progress report, we describe recent advances in soil carbon data led by the International Soil Carbon Network and other networks. We highlight priority areas of research requiring soil carbon data, including (a) quantifying boreal, arctic and wetland carbon stocks, (b) understanding the timescales of soil carbon persistence using radiocarbon and chronosequence studies, (c) synthesizing long-term and experimental data to inform carbon stock vulnerability to global change, (d) quantifying root influences on soil carbon and (e) identifying gaps in model–data integration. We also describe the landscape of soil datasets currently available, highlighting their strengths, weaknesses and synergies. Now more than ever, integrated soil data are needed to inform climate mitigation, land management and agricultural practices. This report will aid new data users in navigating various soil databases and encourage scientists to make their measurements publicly available and to join forces to find soil-related solutions.
Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous associated data to a single tree file, including BEAST-compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.
Ggtree is a comprehensive R package for visualizing and annotating phylogenetic trees with associated data. It can also map and visualize associated external data on phylogenies with two general methods. Method 1 allows external data to be mapped onto the tree structure and used as visual characteristics in tree and data visualization. Method 2 plots the data side by side with the tree using different geometric functions after reordering the data based on the tree structure. These two methods integrate data with phylogeny for further exploration and comparison in the evolutionary biology context. Ggtree is available from http://www.bioconductor.org/packages/ggtree.
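The reordering step behind Method 2 can be sketched in a few lines. This is an illustration of the idea only, not ggtree's R API: it assumes the tree is given as nested tuples whose leaves are tip labels, and the helper names `tip_order` and `align_to_tree` are hypothetical.

```python
def tip_order(tree):
    """Return tip labels in depth-first order, i.e. the order in which
    the tips would be laid out along the tree's tip axis.

    tree: a tip label (str) or a tuple of subtrees (assumed encoding).
    """
    if isinstance(tree, str):
        return [tree]
    order = []
    for child in tree:
        order.extend(tip_order(child))
    return order


def align_to_tree(tree, data):
    """Reorder external data so each row lines up with its tip.

    data: dict mapping tip label -> associated value.
    Tips with no entry in data are skipped.
    """
    return [(tip, data[tip]) for tip in tip_order(tree) if tip in data]


# Example: the data arrive in arbitrary order and are aligned to the tree.
example_tree = (("A", "B"), ("C", ("D", "E")))
example_data = {"E": 5, "A": 1, "C": 3, "B": 2, "D": 4}
aligned = align_to_tree(example_tree, example_data)
```

Once the rows follow the tip order, any plot drawn from them (bars, heatmap cells, points) can share an axis with the tree, which is what makes the side-by-side display readable.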
Various forms of machine learning (ML) methods have historically played a valuable role in environmental remote sensing research. With an increasing amount of “big data” from earth observation and rapid advances in ML, increasing opportunities for novel methods have emerged to aid in earth environmental monitoring. Over the last decade, deep learning (DL), a state-of-the-art ML framework developed from the traditional neural network (NN), has considerably outperformed traditional models. Substantial progress in developing a DL methodology for a variety of earth science applications has been observed. Therefore, this review concentrates on the use of traditional NN and DL methods to advance the environmental remote sensing process. First, the potential of DL in environmental remote sensing, including land cover mapping, environmental parameter retrieval, data fusion and downscaling, and information reconstruction and prediction, is analyzed. A typical network structure is then introduced. Afterward, the applications of DL to environmental monitoring of the atmosphere, vegetation, hydrology, air and land surface temperature, evapotranspiration, solar radiation, and ocean color are specifically reviewed. Finally, challenges and future perspectives are comprehensively analyzed and discussed.
• The potential of deep learning (DL) in environmental remote sensing is analyzed.
• Typical DL network architectures in remote sensing applications are introduced.
• Progress on DL in remote sensing of ten more environmental parameters is reviewed.
• New insights on combining DL and physical/geographical laws are discussed.
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that it is difficult both to find the decisive factors of cancer and to develop methods that overcome the problems of integrating data, such as differences in data structures and data complexities. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction by categorizing methods as matrix factorization-based, kernel-based and network-based. We also present a short description of relevant databases used as benchmarks in drug response prediction analyses, followed by a brief discussion of the challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources in drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi