Biomedical text summarization is a critical task for comprehension of an ever-growing amount of biomedical literature. Pre-trained language models (PLMs) with transformer-based architectures have been shown to greatly improve performance in biomedical text mining tasks. However, existing methods for text summarization generally fine-tune PLMs on the target corpora directly and do not consider how fine-grained domain knowledge, such as the PICO elements used in evidence-based medicine, can help to identify the context needed for generating coherent summaries. To fill this gap, we propose KeBioSum, a novel knowledge infusion training framework, and experiment with a number of PLMs as bases for the task of extractive summarization on biomedical literature. We investigate generative and discriminative training techniques to fuse domain knowledge (i.e., PICO elements) into knowledge adapters and apply adapter fusion to efficiently inject the knowledge adapters into the base PLMs for fine-tuning on the extractive summarization task. Experimental results on three biomedical literature datasets show that existing PLMs (BERT, RoBERTa, BioBERT, and PubMedBERT) are improved by incorporating the KeBioSum knowledge adapters, and our model outperforms the strong baselines.
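The adapter-fusion idea above can be illustrated with a minimal sketch: each knowledge source gets a small bottleneck adapter, and a fusion layer lets the hidden state attend over the adapter outputs. All sizes, weights, and the single-vector setting are toy assumptions for illustration; the paper's adapters sit inside full transformer layers.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_adapter, n_adapters = 8, 2, 3  # toy sizes, not the paper's

def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""
    return h + np.maximum(h @ W_down, 0.0) @ W_up

# One adapter per knowledge source (e.g., one per group of PICO elements).
adapters = [(rng.normal(size=(d_model, d_adapter)) * 0.1,
             rng.normal(size=(d_adapter, d_model)) * 0.1)
            for _ in range(n_adapters)]

def adapter_fusion(h, adapters, W_q, W_k):
    """Attention-style fusion: the hidden state queries the adapter outputs
    and returns their softmax-weighted combination."""
    outs = np.stack([adapter(h, Wd, Wu) for Wd, Wu in adapters])   # (n, d)
    scores = (h @ W_q) @ (outs @ W_k).T / np.sqrt(d_model)         # (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ outs  # convex combination of adapter outputs

W_q = np.eye(d_model)  # identity projections keep the sketch readable
W_k = np.eye(d_model)
h = rng.normal(size=d_model)   # stand-in for one token's hidden state
fused = adapter_fusion(h, adapters, W_q, W_k)
print(fused.shape)  # (8,)
```

Because the base PLM weights stay untouched, only the small adapter and fusion matrices would be trained per knowledge source.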
•A deep-learning-based metro passenger flow prediction architecture.•A multilayer deep-learning architecture combining LSTM, fully connected (FC) layers, and other components.•External-factor, temporal, spatial, and metro operation characteristics modules are fused.•Multiple variant components are compared with HD, ARIMA, and FNN baselines.
This study aims to combine the modeling skills of deep learning with domain knowledge in transportation for the prediction of metro passenger flow. We present an end-to-end deep learning architecture, termed Deep Passenger Flow (DeepPF), to forecast metro inbound/outbound passenger flow. The architecture of the model is highly flexible and extendable, thus enabling the integration and modeling of external environmental factors, temporal dependencies, spatial characteristics, and metro operational properties in short-term metro passenger flow prediction. Furthermore, the proposed framework achieves high prediction accuracy due to the ease of integrating multi-source data. Numerical experiments demonstrate that the proposed DeepPF model can be extended to general conditions to fit the diverse constraints that exist in the transportation domain.
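The module-fusion design described above can be sketched as follows: one small sub-network per input group (temporal, spatial, external, operational), with the outputs concatenated and regressed to a flow value. Feature sizes, layer shapes, and the plain feed-forward stand-in for the LSTM are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(x, W, b):
    """A single ReLU layer standing in for each module's sub-network."""
    return np.maximum(x @ W + b, 0.0)

# Toy feature vectors for one station/time step (illustrative sizes).
inputs = {
    "temporal": rng.normal(size=6),   # recent inbound/outbound counts
    "spatial": rng.normal(size=4),    # flows at connected stations
    "external": rng.normal(size=3),   # weather, holiday flags, etc.
    "operation": rng.normal(size=2),  # headway / operating-hours features
}

# One small sub-network per module; fuse by concatenation, then regress.
params = {name: (rng.normal(size=(x.size, 5)) * 0.1, np.zeros(5))
          for name, x in inputs.items()}
hidden = np.concatenate([mlp(x, *params[name])
                         for name, x in inputs.items()])
W_out = rng.normal(size=hidden.size) * 0.1
flow_pred = float(hidden @ W_out)  # predicted passenger count (unscaled)
```

The appeal of this layout is that a new data source only adds one more sub-network and widens the concatenation, which matches the abstract's claim about easily integrating multi-source data.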
Robust screening of materials on the basis of structure–property–activity relationships to discover active photocatalysts is a highly sought-after aspect of photocatalysis research. Recent advancements in machine learning offer considerable opportunities to evolve photocatalyst discovery practices. Machine learning has greatly facilitated various areas of science and engineering, including heterogeneous catalysis, but its adoption in photocatalysis research is still at an elementary stage. The scarcity of consistent training data is a major bottleneck, and we foresee the integration of photocatalysis domain knowledge into mainstream machine learning protocols as a viable solution. Here, we present a holistic framework incorporating machine learning and domain knowledge to set directions toward accelerated discovery of solar photocatalysts. This Perspective begins with a discussion of domain knowledge available in photocatalysis which could potentially be leveraged to liaise with machine learning methods. Subsequently, we present prevalent machine learning practices in heterogeneous catalysis tailored to assist the discovery of photocatalysts in a purely data-driven fashion. Lastly, we conceptualize various strategies for complementing data-driven machine learning with photocatalysis domain knowledge. The strategies involve the following: (i) integration of theoretical and prior empirical knowledge during the training of machine learning models; (ii) embedding the knowledge in feature space; and (iii) utilizing existing material databases to constrain machine learning predictions. The aforementioned human-in-the-loop framework (leveraging both human and machine intelligence) could possibly mitigate the lack of interpretability and reliability associated with data-driven machine learning and reinforce complex model architectures irrespective of data scarcity.
The concept could also offer substantial benefits to photocatalysis informatics by promoting a paradigm shift away from the Edisonian approach.
•A novel integration framework for trajectory prediction of heterogeneous traffic agents is proposed.•Associating domain knowledge with data-driven methods balances the conflict between prediction accuracy and reality.•The combined method generalizes well to new scenes and is universal with respect to data-driven method selection.
There is a dilemma between the accuracy and the realism of vehicle trajectory prediction, and balancing the two while predicting effective trajectories is a topic of debate in autonomous driving. We investigated this issue using knowledge-driven and data-driven methods to estimate the performance of the two most common approaches and found that improving accuracy while preserving realism is challenging. Therefore, we propose a novel trajectory prediction framework for heterogeneous traffic agents, in which knowledge residuals are associated with data-driven methods and correct their outputs to make them more consistent with actual traffic conditions while maintaining high accuracy. Experiments on six public datasets showed that the proposed framework outperforms the benchmarks. With an ablation study, we further verified that our method has good generalisability to new scenarios and high generality in data-driven model selection.
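The knowledge-residual idea of correcting a data-driven prediction toward physically plausible behaviour can be illustrated with one toy kinematic constraint: clip any per-step displacement that would exceed a feasible speed. The function name, the speed limit, and the single constraint are assumptions for illustration; the paper's residual terms are richer.

```python
import numpy as np

def knowledge_residual(pred_traj, v_max=15.0, dt=0.4):
    """Correct a data-driven trajectory with a simple kinematic rule.

    pred_traj: (T, 2) array of predicted x/y positions. v_max (m/s) and
    dt (s) are assumed constraints: no agent may move farther than
    v_max * dt between consecutive steps.
    """
    corrected = pred_traj.copy()
    max_step = v_max * dt
    for t in range(1, len(corrected)):
        step = corrected[t] - corrected[t - 1]
        dist = np.linalg.norm(step)
        if dist > max_step:  # infeasible jump: shrink toward previous point
            corrected[t] = corrected[t - 1] + step * (max_step / dist)
    return corrected

raw = np.array([[0.0, 0.0], [2.0, 0.0], [20.0, 0.0]])  # 18 m jump in one step
fixed = knowledge_residual(raw)
print(fixed[-1])  # last point pulled back to within 6 m of the previous one
```

Applied after any data-driven predictor, a correction like this leaves accurate predictions untouched and only intervenes where the output is physically implausible, which is why the framework is agnostic to the choice of data-driven model.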
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.
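One common way to introduce scientific consistency into model training is to add a physics-inconsistency penalty to the data-fit loss. The sketch below assumes a monotonicity constraint (e.g., a quantity such as water density that should not decrease with depth) purely as an illustration; constraint, names, and weighting are not from the paper.

```python
import numpy as np

def theory_guided_loss(y_pred, y_obs, depth, lam=1.0):
    """Data-fit MSE plus a penalty for violating a monotonicity constraint.

    Assumed physical rule (illustrative): predictions sorted by depth should
    be non-decreasing. Violations are penalized even at unlabeled points,
    which is how theory guidance supplements scarce training data.
    """
    mse = np.mean((y_pred - y_obs) ** 2)
    order = np.argsort(depth)
    diffs = np.diff(y_pred[order])          # change as depth increases
    violation = np.mean(np.maximum(-diffs, 0.0) ** 2)  # penalize decreases
    return mse + lam * violation
```

A model minimizing this loss is pushed toward predictions that both fit the observations and respect the stated theory, trading the two off through `lam`.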
Measurements of domain knowledge very often use and report Cronbach's alpha or similar indicators of internal consistency for test construction. In this short article, we argue that this approach is often at odds with the theoretical conception of knowledge underlying the measure. While domain knowledge is usually described theoretically as a formative construct (formed by the manifest observations), the use of Cronbach's alpha to construct and evaluate an empirical measure implies a reflective model (the construct is reflected in manifest behaviors). After illustrating the difference between reflective and formative models, we show how this mismatch between theoretical conception and empirical operationalization can have substantial implications for the assessment and modeling of domain knowledge. Specifically, the construct may be operationalized too narrowly or even be misinterpreted by applying criteria for item selection that focus on homogeneity, such as Cronbach's alpha. Rather than maximizing items' internal consistency, researchers constructing measures of domain knowledge should, therefore, make strong arguments for the theoretical merit of their items even if they are not correlated with each other.
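The standard formula behind the argument is worth stating: Cronbach's alpha is high exactly when items covary, so selecting items to maximize it presumes a reflective construct. A minimal implementation of the textbook formula:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, k_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    Alpha approaches 1 only when items are strongly inter-correlated, i.e.,
    when total-score variance dwarfs the summed item variances.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```

For a formative knowledge measure, heterogeneous items covering distinct facts are desirable even though they depress alpha, which is precisely the mismatch the article describes.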
The dramatic increase in the use of knowledge discovery applications requires end users to write complex database search requests to retrieve information. Such users are expected to grasp not only the structural complexity of complex databases but also the semantic relationships between data stored in databases. In order to overcome such difficulties, researchers have been focusing on knowledge representation and interactive query generation through ontologies, with particular emphasis on improving the interface between data and search requests in order to bring the result sets closer to users' research requirements. This paper discusses ontology-based information retrieval approaches and techniques by taking into consideration the aspects of ontology modelling, processing, and the translation of ontological knowledge into database search requests. It also extensively compares the existing ontology-to-database transformation and mapping approaches in terms of loss of data and semantics, structural mapping, and domain knowledge applicability. The research outcomes, recommendations, and future challenges presented in this paper can bridge the gap between ontology and relational models to generate precise search requests using ontologies. Moreover, the comparison presented between various ontology-based information retrieval, database-to-ontology transformation, and ontology-to-database mapping approaches provides a reference for enhancing the searching capabilities of massively loaded information management systems.
•Develop a highly accurate tree-based model for predicting ship deficiencies in PSC inspection.•Develop a ship inspector scheduling model and propose the concepts of inspection template, un-dominated inspection template, and strengthened constraint to reduce problem size and improve computation efficiency as well as model flexibility.•The proposed approaches outperform the current practice at ports by over 20%.•The gap between the proposed approaches and the perfect-forecast policy is only about 8%.
Maritime transportation is the backbone of the global supply chain. To improve maritime safety, protect the marine environment, and set out seafarers’ rights, port state control (PSC) empowers ports to inspect foreign visiting ships to verify that they comply with various international conventions. One critical issue faced by the port states is how to optimally allocate the limited inspection resources for inspecting the visiting ships. To address this issue, this study first develops a state-of-the-art XGBoost model to accurately predict ship deficiency numbers considering ship generic factors, dynamic factors, and inspection historical factors. In particular, the XGBoost model takes shipping domain knowledge regarding ship flag, recognized organization, and company performance into account to improve model performance and prediction fairness (e.g., for two ships that differ only in their flag performances, the one with a better flag performance should be predicted to have a better condition than the other). Based on the predictions, a PSC officer (PSCO) scheduling model is proposed to help the maritime authorities optimally allocate inspection resources. Considering that a PSCO can inspect at most four ships in a day, we further propose and incorporate the concepts of inspection template and un-dominated inspection template in the optimization models to reduce problem size as well as improve computation efficiency and model flexibility. Numerical experiments show that the proposed PSCO scheduling model with the predictions of XGBoost as the input is more than 20% better than the current inspection scheme at ports regarding the number of deficiencies detected. In addition, the gap between the proposed model and the model under the perfect-forecast policy is only about 8% regarding the number of deficiencies detected.
Extensive sensitivity experiments show that the proposed PSCO scheduling model has stable performance and is always better than the current model adopted at ports.
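The inspection-template idea can be sketched with a toy enumeration: a template is a set of at most four ships one PSCO could inspect in a day, and templates whose ship sets are strict subsets of another template are dominated (the superset detects at least as many predicted deficiencies). Names, the dominance rule, and the scoring are simplified assumptions; the paper embeds templates in a full optimization model with scheduling constraints.

```python
from itertools import combinations

def undominated_templates(ships, pred_deficiencies, max_ships=4):
    """Enumerate inspection templates of <= max_ships ships per PSCO and
    drop dominated ones, ranking the rest by total predicted deficiencies."""
    templates = [frozenset(c) for r in range(1, max_ships + 1)
                 for c in combinations(ships, r)]
    # A template is dominated if some other template is a strict superset.
    keep = [t for t in templates
            if not any(t < other for other in templates)]
    score = lambda t: sum(pred_deficiencies[s] for s in t)
    return sorted(keep, key=score, reverse=True)

# Toy predicted deficiency numbers (stand-ins for the XGBoost outputs).
preds = {"s1": 5, "s2": 3, "s3": 8, "s4": 1, "s5": 6}
best = undominated_templates(list(preds), preds)[0]
print(sorted(best))  # the four ships with the highest predicted deficiencies
```

Pruning dominated templates before optimization shrinks the candidate set a solver must consider, which is the computational point the highlights emphasize; the real problem of course adds per-day capacity and assignment constraints across multiple PSCOs.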