In various disciplines, information about the same phenomenon can be acquired from different types of detectors, at different conditions, in multiple experiments or subjects, among others. We use the term "modality" for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or "challenges," are common to multiple domains. This paper deals with two key issues: "why we need data fusion" and "how we perform it." The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, "diversity" is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.
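As a deliberately simple illustration of a matrix-decomposition approach to fusion, the sketch below assumes two or more modalities that share one mode (the same samples in the rows) and fits X_m ≈ A B_mᵀ with a single shared factor A. Placing the blocks side by side and taking a truncated SVD gives the best rank-r least-squares fit of this coupled model. The function name `coupled_svd` and the per-block Frobenius scaling are illustrative choices, not methods taken from the paper, and this model captures only the structure common to all modalities, not modality-specific diversity.

```python
import numpy as np


def coupled_svd(blocks, rank):
    """Fit X_m ~= A @ B_m.T for every block, with A shared across modalities.

    blocks: list of 2-D arrays with the same number of rows (the shared mode).
    Returns the shared factor A, the per-modality loadings B_m, and the
    scaled blocks that were actually factorised.
    """
    # Scale each block to unit Frobenius norm so that no modality dominates
    # the joint fit merely because of its units or its number of columns.
    scaled = [X / np.linalg.norm(X) for X in blocks]
    # The truncated SVD of the side-by-side concatenation minimises
    # sum_m ||X_m - A @ B_m.T||_F^2 over all rank-`rank` choices of A and B_m.
    U, s, Vt = np.linalg.svd(np.hstack(scaled), full_matrices=False)
    A = U[:, :rank] * s[:rank]
    loadings, start = [], 0
    for X in scaled:
        stop = start + X.shape[1]
        loadings.append(Vt[:rank, start:stop].T)  # B_m: the columns of modality m
        start = stop
    return A, loadings, scaled


# Usage: two synthetic modalities generated from the same two latent sources.
rng = np.random.default_rng(0)
sources = rng.normal(size=(30, 2))
X1 = sources @ rng.normal(size=(2, 5))
X2 = 100.0 * sources @ rng.normal(size=(2, 8))  # different scale and width
A, (B1, B2), (S1, S2) = coupled_svd([X1, X2], rank=2)
```

Because the synthetic blocks are exactly rank 2 and share their row space, the shared factor reproduces both scaled blocks; with real data, the rank and the block weights are modelling choices.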
Functional traits offer a rich quantitative framework for developing and testing theories in evolutionary biology, ecology and ecosystem science. However, the potential of functional traits to drive theoretical advances and refine models of global change can only be fully realised when species‐level information is complete. Here we present the AVONET dataset containing comprehensive functional trait data for all birds, including six ecological variables, 11 continuous morphological traits, and information on range size and location. Raw morphological measurements are presented from 90,020 individuals of 11,009 extant bird species sampled from 181 countries. These data are also summarised as species averages in three taxonomic formats, allowing integration with a global phylogeny, geographical range maps, IUCN Red List data and the eBird citizen science database. The AVONET dataset provides the most detailed picture of continuous trait variation for any major radiation of organisms, offering a global template for testing hypotheses and exploring the evolutionary origins, structure and functioning of biodiversity.
Existing morphological trait datasets for major taxonomic groups are highly incomplete, limiting their utility to ecologists and evolutionary biologists. We present a global dataset containing comprehensive morphological information, coupled with ecological and geographical variables, for all bird species. This detailed assessment of continuous trait variation across 11,009 species offers a global template for testing hypotheses and exploring the evolutionary origins, structure and functioning of biodiversity.
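Summarising raw per-individual measurements into species averages, optionally after translating names into another taxonomy, can be sketched as follows. The record layout, the trait names and the `taxon_map` crosswalk are hypothetical and do not reflect the actual AVONET column names or files; the sketch also assumes that the crosswalk maps each name to a single target taxon, which real taxonomic splits may not satisfy.

```python
from collections import defaultdict
from statistics import mean


def species_averages(records, traits, taxon_map=None):
    """Average per-individual trait measurements into one entry per taxon.

    records: iterable of dicts with a "species" key and numeric trait values;
             a trait may be None when it was not measured for that individual.
    taxon_map: optional dict translating species names into another taxonomy
               (hypothetical crosswalk; unmapped names are kept as-is).
    """
    taxon_map = taxon_map or {}
    values = defaultdict(lambda: defaultdict(list))
    for rec in records:
        taxon = taxon_map.get(rec["species"], rec["species"])
        for trait in traits:
            v = rec.get(trait)
            if v is not None:  # skip missing measurements instead of imputing
                values[taxon][trait].append(v)
    return {
        taxon: {t: (mean(by_trait[t]) if by_trait[t] else None) for t in traits}
        for taxon, by_trait in values.items()
    }


records = [
    {"species": "Species A", "wing_length": 10.0, "beak_length": 2.0},
    {"species": "Species A", "wing_length": 12.0, "beak_length": None},
    {"species": "Species B", "wing_length": 5.0, "beak_length": 1.0},
]
per_species = species_averages(records, ["wing_length", "beak_length"])
lumped = species_averages(
    records,
    ["wing_length", "beak_length"],
    taxon_map={"Species A": "Taxon X", "Species B": "Taxon X"},
)
```

Pooling individuals after lumping weights well-sampled species more heavily; averaging the species means instead is an equally defensible choice, and the abstract does not state which one is used.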
Defining cell types requires integrating diverse single-cell measurements from multiple experiments and biological contexts. To flexibly model single-cell datasets, we developed LIGER, an algorithm that delineates shared and dataset-specific features of cell identity. We applied it to four diverse and challenging analyses of human and mouse brain cells. First, we defined region-specific and sexually dimorphic gene expression in the mouse bed nucleus of the stria terminalis. Second, we analyzed expression in the human substantia nigra, comparing cell states in specific donors and relating cell types to those in the mouse. Third, we integrated in situ and single-cell expression data to spatially locate fine subtypes of cells present in the mouse frontal cortex. Finally, we jointly defined mouse cortical cell types using single-cell RNA-seq and DNA methylation profiles, revealing putative mechanisms of cell-type-specific epigenomic regulation. Integrative analyses using LIGER promise to accelerate investigations of cell-type definition, gene regulation, and disease states.
• Shared and dataset-specific metagene factors enable single-cell data integration
• LIGER reveals inter-individual differences in bed nucleus and substantia nigra cells
• Integration of in situ and dissociated scRNA-seq maps cell types in space
• Joint definition of cortical cell types from single-cell RNA and epigenome profiles
A platform called LIGER allows for the integration of gene expression, epigenetic regulation, and spatial relationships across single-cell datasets.
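The idea of shared and dataset-specific metagene factors can be illustrated with an integrative non-negative matrix factorisation (iNMF) objective, in which each dataset E_i (genes by cells) is approximated as (W + V_i) H_i, with W shared across datasets, V_i specific to dataset i, and a penalty λ‖V_i H_i‖² limiting how much variation the dataset-specific part absorbs. The sketch below assumes that formulation and minimises it with multiplicative updates, a swapped-in optimiser chosen for brevity; the abstract does not describe LIGER's solver, and the function name `inmf`, the value λ = 5 and the iteration count are illustrative assumptions.

```python
import numpy as np


def inmf(E_list, k, lam=5.0, n_iter=200, seed=0, eps=1e-10):
    """Integrative NMF sketch: E_i ~= (W + V_i) @ H_i for every dataset i.

    E_list: non-negative genes x cells matrices sharing the same genes (rows).
    W: shared metagenes; V_i: dataset-specific metagenes; H_i: cell loadings.
    Minimises sum_i ||E_i - (W + V_i) H_i||_F^2 + lam * ||V_i H_i||_F^2
    with multiplicative updates (an illustrative choice of optimiser).
    """
    rng = np.random.default_rng(seed)
    n_genes = E_list[0].shape[0]
    W = rng.random((n_genes, k))
    V = [rng.random((n_genes, k)) for _ in E_list]
    H = [rng.random((k, E.shape[1])) for E in E_list]

    def objective():
        return sum(
            np.linalg.norm(E - (W + Vi) @ Hi) ** 2 + lam * np.linalg.norm(Vi @ Hi) ** 2
            for E, Vi, Hi in zip(E_list, V, H)
        )

    history = []
    for _ in range(n_iter):
        # Cell loadings: the negative part of the gradient goes in the numerator
        # and the positive part in the denominator, so entries stay non-negative.
        for i, E in enumerate(E_list):
            WV = W + V[i]
            H[i] *= (WV.T @ E) / (WV.T @ WV @ H[i] + lam * V[i].T @ V[i] @ H[i] + eps)
        # Dataset-specific metagenes, shrunk by the lam penalty.
        for i, E in enumerate(E_list):
            HHt = H[i] @ H[i].T
            V[i] *= (E @ H[i].T) / ((W + V[i]) @ HHt + lam * V[i] @ HHt + eps)
        # Shared metagenes pool evidence from every dataset.
        numer = sum(E @ Hi.T for E, Hi in zip(E_list, H))
        denom = sum((W + Vi) @ Hi @ Hi.T for Vi, Hi in zip(V, H))
        W *= numer / (denom + eps)
        history.append(objective())
    return W, V, H, history


rng = np.random.default_rng(1)
E1 = rng.random((30, 20))  # 30 genes x 20 cells
E2 = rng.random((30, 25))  # the same 30 genes x 25 cells
W, V, H, history = inmf([E1, E2], k=3, n_iter=50)
```

Each update multiplies a factor by a ratio of non-negative terms, so all factors stay non-negative and, for this quadratic objective, the loss should not increase from one sweep to the next (up to the small `eps` added for numerical safety).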
• A 1D-CNN-based vibro-acoustic sensor data fusion (VAF) algorithm is proposed for bearing fault diagnosis.
• Multi-modal sensors are used to simultaneously collect vibration and acoustic signals as inputs.
• A visualization analysis is conducted to investigate the inner mechanism of the proposed method.
Bearing fault diagnosis is an important part of rotating machinery maintenance. Existing diagnosis methods based on single-modal signals not only have unsatisfactory accuracy but are also inherently at risk of being misled by noise in that single modality. A new method is proposed that fuses multi-modal sensor signals, namely the data collected by an accelerometer and a microphone, to achieve more accurate and robust bearing fault diagnosis. The proposed method extracts features from raw vibration and acoustic signals and fuses them using 1D-CNN-based networks. Extensive experimental results obtained on ten groups of bearings are used to evaluate the performance of the proposed method. An analysis of the loss function and accuracy under different SNRs shows empirically that the proposed method achieves higher diagnostic accuracy than algorithms based on a single-modal sensor. Moreover, a visualization analysis is conducted to investigate the inner mechanism of the proposed 1D-CNN-based method.
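A minimal, untrained forward pass illustrates what feature-level fusion of the two modalities means in a 1D-CNN: each signal passes through its own convolutional branch, the pooled features are concatenated, and a single classifier scores the fused vector. The helper `add_noise_at_snr` mimics an evaluation at different SNRs by adding white Gaussian noise. The layer sizes, kernel counts, number of classes and random weights are placeholders, not the architecture from the paper, and no training is performed.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)


def conv_branch(x, kernels):
    """One 1-D convolutional layer: cross-correlation, ReLU, global max pooling."""
    features = []
    for k in kernels:
        # np.convolve flips its second argument, so flip k back to obtain the
        # cross-correlation that CNN layers actually compute.
        response = np.convolve(x, k[::-1], mode="valid")
        features.append(np.max(np.maximum(response, 0.0)))
    return np.array(features)


def fused_class_probabilities(vibration, acoustic, params):
    """Feature-level fusion: concatenate both branches, then a softmax classifier."""
    fused = np.concatenate([
        conv_branch(vibration, params["vib_kernels"]),
        conv_branch(acoustic, params["ac_kernels"]),
    ])
    logits = params["W"] @ fused + params["b"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_kernels, kernel_len, n_classes = 8, 16, 4  # placeholder sizes
params = {
    "vib_kernels": rng.normal(size=(n_kernels, kernel_len)),
    "ac_kernels": rng.normal(size=(n_kernels, kernel_len)),
    "W": rng.normal(size=(n_classes, 2 * n_kernels)),
    "b": np.zeros(n_classes),
}
t = np.linspace(0.0, 1.0, 2048)
vibration = add_noise_at_snr(np.sin(2 * np.pi * 50 * t), snr_db=0.0, rng=rng)
acoustic = add_noise_at_snr(np.sin(2 * np.pi * 120 * t), snr_db=0.0, rng=rng)
probs = fused_class_probabilities(vibration, acoustic, params)
```

Fusing after feature extraction lets each branch learn filters suited to its own sensor; in a trained model the kernels and classifier weights would be fitted jointly by backpropagation.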
Single-cell atlases often include samples that span locations, laboratories and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. To guide integration method choice, we benchmarked 68 method and preprocessing combinations on 85 batches of gene expression, chromatin accessibility and simulation data from 23 publications, altogether representing >1.2 million cells distributed in 13 atlas-level integration tasks. We evaluated methods according to scalability, usability and their ability to remove batch effects while retaining biological variation using 14 evaluation metrics. We show that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, scANVI, Scanorama, scVI and scGen perform well, particularly on complex integration tasks, while single-cell ATAC-sequencing integration performance is strongly affected by choice of feature space. Our freely available Python module and benchmarking pipeline can identify optimal data integration methods for new data, benchmark new methods and improve method development.
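The two preprocessing choices highlighted by the benchmark, highly variable gene (HVG) selection and scaling, can be sketched in a few lines. This is a simplified stand-in that ranks genes by the dispersion (variance over mean) of log-normalised expression; it is not the exact HVG procedure used in the benchmark, and the target of 10,000 counts per cell and the function names are assumptions. Scaling is kept as a separate, optional step because the study reports that it shifts methods toward batch removal at the expense of biological variation.

```python
import numpy as np


def log_normalise(counts, target_sum=1e4):
    """Library-size normalise each cell (row) to target_sum counts, then log1p.

    Assumes every cell has at least one count.
    """
    library_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / library_size * target_sum)


def select_hvgs(expr, n_top):
    """Return sorted column indices of the n_top genes with the highest dispersion."""
    mean = expr.mean(axis=0)
    var = expr.var(axis=0)
    # Genes that are never detected get dispersion 0 instead of 0/0.
    dispersion = np.where(mean > 0, var / np.maximum(mean, 1e-12), 0.0)
    return np.sort(np.argsort(dispersion)[::-1][:n_top])


def scale(expr):
    """Z-score each gene; constant genes are centred but not divided by zero."""
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0
    return (expr - expr.mean(axis=0)) / sd


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(100, 50)).astype(float)  # 100 cells x 50 genes
counts[:, 7] = np.where(np.arange(100) % 2 == 0, 0.0, 40.0)  # a strongly variable gene
expr = log_normalise(counts)
hvg = select_hvgs(expr, n_top=10)
scaled = scale(expr[:, hvg])
```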
Effective asset management plays a significant role in delivering the functionality and serviceability of buildings. However, there is a lack of efficient strategies and comprehensive approaches for managing assets and their associated data that can help to monitor, detect, record, and communicate operation and maintenance (O&M) issues. As the value of Digital Twin (DT) concepts has been demonstrated in the architecture, engineering, construction and facility management (AEC/FM) sectors, this study provides a DT-enabled anomaly detection system for asset monitoring in daily O&M management, together with a data integration method based on extended industry foundation classes (IFC). This paper presents a novel IFC-based data structure that is used to extract, from building DTs, a set of monitoring data carrying diagnostic information on the operational condition of assets. Because assets run under changing loads determined by human demands, a Bayesian change point detection methodology that handles the contextual features of operational data is adopted to identify and filter contextual anomalies through cross-referencing with external operation information. Using the centrifugal pumps in the heating, ventilation and air conditioning (HVAC) system as a case study, the results indicate that the proposed DT-based anomaly detection process flow achieves continuous anomaly detection for pumps, contributing to efficient and automated asset monitoring in O&M.
• Research on daily O&M management and anomaly detection for assets was summarised
• A new DT-based automated anomaly detection process flow is proposed
• An IFC-based data integration method with an extension for O&M activities was developed
• Bayesian change point detection was adopted to contextually indicate anomalies
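The abstract does not specify which Bayesian change point formulation is used. As one well-known option, the sketch below implements Bayesian online change point detection in the style of Adams and MacKay for a signal with Gaussian noise of known variance, an unknown piecewise-constant mean and a constant hazard rate. The hazard, prior and noise parameters, and the function name, are assumptions; in the setting described above, a detected change would still need to be cross-referenced with external operation information (for example, a scheduled load change) before being flagged as an anomaly.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def bocpd_gaussian(data, hazard=0.01, mu0=0.0, var0=25.0, obs_var=1.0):
    """Return the most probable run length after each observation.

    Model (assumed): x_t ~ N(level, obs_var); each segment's level ~ N(mu0, var0);
    a new segment starts at any step with constant probability `hazard`.
    """
    log_r = np.array([0.0])        # log P(run length | data so far); starts at r = 0
    mu = np.array([mu0])           # posterior mean of the level, per run length
    prec = np.array([1.0 / var0])  # posterior precision of the level, per run length
    map_run_lengths = []
    for x in data:
        # 1) Predictive log-likelihood of x under each candidate run length.
        log_pred = gaussian_logpdf(x, mu, 1.0 / prec + obs_var)
        # 2) Growth: the current segment continues (run length + 1).
        log_growth = log_r + log_pred + np.log(1.0 - hazard)
        # 3) Change point: mass from every run length collapses to r = 0.
        log_cp = np.logaddexp.reduce(log_r + log_pred) + np.log(hazard)
        log_r = np.concatenate(([log_cp], log_growth))
        log_r -= np.logaddexp.reduce(log_r)  # normalise in log space
        # 4) Conjugate update of the level; r = 0 restarts from the prior.
        new_prec = prec + 1.0 / obs_var
        new_mu = (mu * prec + x / obs_var) / new_prec
        mu = np.concatenate(([mu0], new_mu))
        prec = np.concatenate(([1.0 / var0], new_prec))
        map_run_lengths.append(int(np.argmax(log_r)))
    return map_run_lengths


t = np.arange(200)
# Synthetic signal: a level shift at t = 100 plus a small deterministic ripple.
signal = np.where(t < 100, 0.0, 5.0) + 0.3 * np.sin(t)
run_lengths = bocpd_gaussian(signal)
```

A sudden drop in the most probable run length marks a candidate change point; for this synthetic signal it should reset at t = 100 and then grow again by one per sample.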
Radar and camera fusion sensing methods are used to overcome the inherent shortcomings of a single sensor in severe weather. Our fusion scheme uses radar as the primary sensor and the camera as the auxiliary sensor. The Mahalanobis distance is used to match the observed values of the target sequence, and data fusion is based on the joint probability function method. The algorithm was tested on real sensor data collected from a vehicle performing real-time environment perception. The test results show that radar and camera fusion performs better than single-sensor environment perception in severe weather and can effectively reduce the missed detection rate of autonomous vehicle environment perception under these conditions. The fusion algorithm improves the robustness of the environment perception system and provides accurate environment perception information to the decision-making and control systems of autonomous vehicles.
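Because the abstract does not give the exact formulation, the matching and fusion steps are sketched here under common tracking assumptions: each detection is a 2-D position with a Gaussian covariance, radar and camera detections are paired greedily by squared Mahalanobis distance with a chi-square gate (9.21, the 99% quantile for 2 degrees of freedom), and each matched pair is fused by multiplying the two Gaussian densities, which yields the inverse-covariance-weighted estimate. Keeping unmatched radar detections while discarding unmatched camera detections is an assumption reflecting radar's role as the primary sensor; the function names and the gate value are illustrative.

```python
import numpy as np

CHI2_GATE_2D_99 = 9.21  # chi-square 99% quantile, 2 degrees of freedom


def mahalanobis_sq(z_a, cov_a, z_b, cov_b):
    """Squared Mahalanobis distance between two independent Gaussian detections."""
    d = z_a - z_b
    return float(d @ np.linalg.solve(cov_a + cov_b, d))


def match_and_fuse(radar, camera, gate=CHI2_GATE_2D_99):
    """radar, camera: lists of (position, covariance). Returns (i, j, mean, cov)."""
    candidates = sorted(
        (mahalanobis_sq(zr, pr, zc, pc), i, j)
        for i, (zr, pr) in enumerate(radar)
        for j, (zc, pc) in enumerate(camera)
    )
    used_r, used_c, tracks = set(), set(), []
    for d2, i, j in candidates:  # greedy nearest neighbour, closest pairs first
        if d2 > gate or i in used_r or j in used_c:
            continue
        used_r.add(i)
        used_c.add(j)
        (zr, pr), (zc, pc) = radar[i], camera[j]
        # Product of two Gaussians: information matrices add.
        cov = np.linalg.inv(np.linalg.inv(pr) + np.linalg.inv(pc))
        mean = cov @ (np.linalg.solve(pr, zr) + np.linalg.solve(pc, zc))
        tracks.append((i, j, mean, cov))
    # Radar is the primary sensor: unmatched radar detections are kept as-is.
    for i, (zr, pr) in enumerate(radar):
        if i not in used_r:
            tracks.append((i, None, zr, pr))
    return tracks


# Radar is precise in x (range), the camera in y (lateral position).
radar = [(np.array([10.0, 0.0]), np.diag([1.0, 4.0])),
         (np.array([20.0, 5.0]), np.diag([1.0, 4.0])),
         (np.array([60.0, 30.0]), np.diag([1.0, 4.0]))]
camera = [(np.array([20.5, 4.0]), np.diag([4.0, 1.0])),
          (np.array([10.3, 0.5]), np.diag([4.0, 1.0]))]
tracks = match_and_fuse(radar, camera)
```

The fused estimate leans toward whichever sensor is more certain along each axis, which is why complementary radar and camera covariances can yield a tighter result than either sensor alone.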