Because of its ability to find complex patterns in high-dimensional and heterogeneous data, machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to gain novel biological insights. Here, we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.
Machine learning (ML) has emerged as a powerful tool for harnessing big biological data. The complex structure underlying ML models can potentially provide insights into the problems they are used to solve. Because of model complexity, their inner logic is not readily intelligible to a human, hence the common critique of ML models as black boxes. However, advances in the field of interpretable ML have made it possible to identify important patterns and features underlying an ML model using various strategies. These interpretation strategies have been applied in genetics and genomics to derive novel biological insights from ML models. This area of research is becoming increasingly important as more complex and difficult-to-interpret ML approaches (i.e., deep learning) are being adopted by biologists.
The rise of sophisticated black-box machine learning models in artificial intelligence systems has prompted the need for explanation methods that reveal how these models work in a way that is understandable to users and decision makers. Unsurprisingly, the state of the art currently exhibits a plethora of explainers providing many different types of explanations. With the aim of providing a compass for researchers and practitioners, this paper proposes a categorization of explanation methods from the perspective of the type of explanation they return, also considering the different input data formats. The paper accounts for the most representative explainers to date, discussing the similarities and discrepancies of the returned explanations through their visual appearance. A companion website to the paper is provided and continuously updated as new explainers appear. Moreover, a subset of the most robust and widely adopted explainers is benchmarked with respect to a repertoire of quantitative metrics.
Cyanobacterial blooms are a common and serious problem in global freshwater environments. However, the response mechanisms of various cyanobacterial genera to multiple nutrients and pollutants, as well as the factors driving their competitive dominance, remain unclear or controversial. The relative abundance and cell density of two dominant cyanobacterial genera (i.e., Cyanobium and Microcystis) in river ecosystems along a gradient of anthropogenic disturbance were predicted by random forest with post hoc interpretability based on physicochemical indices. Results showed that the optimized predictions all achieved strong fits (R2 > 0.75), with conventional water quality indices playing a dominant role. One-dimensional and two-dimensional partial dependence plots (PDPs) revealed that the responses of Cyanobium and Microcystis to nutrients and temperature were similar, but the two genera differed in preferred nutrient utilization and in their responses to pollutants. Further prediction and PDP analysis of the ratio of Cyanobium to Microcystis unveiled that their distinct responses to PAHs and SPAHs were crucial drivers of their competitive dominance over each other. This study presents a new way of analyzing the responses of cyanobacterial genera to multiple environmental factors, and their dominance relationships, through interpretable machine learning, which is well suited to identifying and interpreting high-dimensional nonlinear ecosystems with complex interactions.
[Graphical abstract omitted: the two dominant cyanobacterial genera, Cyanobium (B350) and Microcystis (B372); 80 physicochemical indices (PCIs); 1044 bacterial operational taxonomic units (OTUs).]
•Random forest predicted the growth and ratio of Cyanobium and Microcystis.
•Interpretable machine learning uncovered their environmental responses.
•2D PDPs revealed the interactive effects of nutrients, pollutants, and temperature on them.
•The two genera differed in preferred nitrogen source and phosphorus demand.
•PAHs and SPAHs were crucial to their mutual competitive dominance.
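The random-forest-plus-PDP workflow summarized above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the study's code: the synthetic data, feature names, and model settings are all assumptions made for the example.

```python
# Minimal sketch: random forest regression with 1D and 2D partial
# dependence plots (PDPs). Data and feature names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "total_nitrogen": rng.uniform(0.1, 5.0, 500),     # mg/L (assumed)
    "total_phosphorus": rng.uniform(0.01, 0.5, 500),  # mg/L (assumed)
    "water_temp": rng.uniform(5, 35, 500),            # deg C (assumed)
})
# Hypothetical response: relative abundance of one cyanobacterial genus.
y = 0.3 * X["total_nitrogen"] + 0.05 * X["water_temp"] + rng.normal(0, 0.2, 500)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# 1D PDPs for single predictors, plus a 2D PDP for their interaction.
PartialDependenceDisplay.from_estimator(
    rf, X,
    features=["total_nitrogen", "water_temp", ("total_nitrogen", "water_temp")],
)
```

The 2D PDP is what lets a study like this read off interactive effects (e.g., nutrients with temperature) rather than single-variable trends alone.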
Interpretable machine learning aims at unveiling the reasons behind predictions returned by uninterpretable classifiers. One of the most valuable types of explanation consists of counterfactuals. A counterfactual explanation reveals what should have been different in an instance to observe a different outcome. For instance, a bank customer asks for a loan that is rejected; the counterfactual explanation consists of what should have been different for the customer in order to have the loan accepted. Recently, there has been an explosion of proposals for counterfactual explainers. The aim of this work is to survey the most recent explainers returning counterfactual explanations. We categorize explainers based on the approach adopted to return the counterfactuals, and we label them according to characteristics of the method and properties of the counterfactuals returned. In addition, we visually compare the explanations, and we report quantitative benchmarking assessing minimality, actionability, stability, diversity, discriminative power, and running time. The results make it evident that the current state of the art does not provide a counterfactual explainer able to guarantee all these properties simultaneously.
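To make the loan example concrete, here is a minimal, hand-rolled counterfactual search against a toy classifier. It illustrates only the concept: the features, the brute-force search, and the model are all assumptions, and the surveyed explainers use far more sophisticated strategies with guarantees this sketch lacks.

```python
# Minimal sketch: find a small feature change that flips a rejection
# into an acceptance. Features and classifier are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # [income, debt], standardized (assumed)
y = (X[:, 0] - X[:, 1] > 0).astype(int)    # 1 = loan accepted
clf = LogisticRegression().fit(X, y)

x = np.array([[-0.5, 0.5]])                # a rejected applicant
assert clf.predict(x)[0] == 0

# Scan outward for the smallest perturbation that flips the prediction:
# raise income, lower debt, or both.
counterfactual = None
for step in np.linspace(0.05, 3.0, 60):
    for direction in ([1.0, 0.0], [0.0, -1.0], [1.0, -1.0]):
        d = np.asarray(direction)
        candidate = x + step * d / np.linalg.norm(d)
        if clf.predict(candidate)[0] == 1:
            counterfactual = candidate
            break
    if counterfactual is not None:
        break

print("counterfactual instance:", counterfactual)
print("change needed:", counterfactual - x)
```

The properties benchmarked in the survey map directly onto choices in even this toy search: minimality (the step size), actionability (which directions are allowed), and diversity (returning more than one flipped candidate).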
Recently, a significant amount of research has investigated the interpretation of deep neural networks (DNNs), which are normally treated as black-box models. Among the methods that have been developed, local interpretation methods stand out, offering clearly expressed interpretations and low computational complexity. Different from existing surveys that cover a broad range of methods for interpreting DNNs, this survey focuses on local interpretation methods with an in-depth analysis of representative works, including newly proposed approaches. From the perspective of principles, we first divide local interpretation methods into two main categories: model-driven methods and data-driven methods. Then we make a fine-grained distinction between these two types of methods and highlight the latest ideas and principles. We further demonstrate the effects of a number of interpretation methods by reproducing the results through open-source software plugins. Finally, we point out research directions in this rapidly evolving field.
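As a concrete instance of a data-driven local interpretation, the sketch below computes a gradient-times-input saliency attribution for a single prediction of a small PyTorch network. The architecture and input are illustrative assumptions; gradient saliency is only one of the simplest variants of the methods such a survey covers.

```python
# Minimal sketch: gradient x input saliency, a simple local
# interpretation method. The tiny model and random input are
# purely illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

x = torch.randn(1, 20, requires_grad=True)   # one input instance
logits = model(x)
target = logits.argmax(dim=1).item()         # explain the predicted class

# Backpropagate the target logit to obtain d(logit)/d(input).
logits[0, target].backward()

# Gradient x input attributes the prediction to individual features.
saliency = (x.grad * x).detach().squeeze()
print(saliency)   # per-feature attribution for this one instance
```

The appeal noted in the abstract, clear expression and low computational cost, is visible here: one extra backward pass yields a per-feature attribution for a single instance.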
Forest biomass is an essential indicator in forest ecosystem carbon cycle and global climate change research, yet traditional machine learning cannot explain the mechanisms by which feature variables affect forest aboveground biomass (AGB). Therefore, we proposed an interpretable bamboo forest AGB prediction method based on SHapley Additive exPlanations (SHAP) and an XGBoost model to explain how feature variables affect AGB. Bamboo forest AGB is estimated using monthly- and annual-scale leaf area index (LAI), enhanced vegetation index (EVI), ratio vegetation index (RVI), precipitation (Pre), maximum temperature (Tmax), minimum temperature (Tmin), and solar radiation (Rad) data. The results showed that the method could effectively predict AGB and that precipitation was more important than temperature. The framework revealed a threshold effect: once the threshold value was exceeded, the impacts of LAI_Ann, EVI_Ann, and Pre_11 on AGB stabilized. The SHAP interaction value between LAI_Ann and EVI_Ann decreased with increasing EVI_Ann and LAI_Ann. By contrast, when Pre_11 increased, the SHAP interaction value between LAI_Ann and Pre_11 increased with increasing LAI_Ann. The framework can also be easily implemented, providing an interpretable machine learning model of forest AGB.
•An interpretable AGB prediction framework combining the SHAP method and the XGBoost model is proposed.
•The SHAP method can improve the feature selection process.
•The framework can interpret in depth the effects of multiple variables on bamboo forest AGB prediction.
•Among climatic factors, precipitation has a greater impact on AGB than temperature.
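The SHAP-plus-XGBoost framework described above follows a pattern the shap library supports directly. The sketch below is a minimal reconstruction under assumptions: the synthetic data and model settings are invented, and only the feature names (LAI_Ann, EVI_Ann, Pre_11) come from the abstract.

```python
# Minimal sketch: SHAP values and SHAP interaction values for an
# XGBoost regressor. Data are synthetic; feature names follow the
# abstract, units and relationships are assumed.
import numpy as np
import pandas as pd
import shap
import xgboost

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "LAI_Ann": rng.uniform(1, 6, 400),
    "EVI_Ann": rng.uniform(0.2, 0.8, 400),
    "Pre_11": rng.uniform(20, 120, 400),
})
# Hypothetical AGB response with a saturating (threshold-like) term.
y = 10 * np.minimum(X["LAI_Ann"], 4) + 0.1 * X["Pre_11"] + rng.normal(0, 1, 400)

model = xgboost.XGBRegressor(n_estimators=300, max_depth=4).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)              # per-feature attributions
interaction = explainer.shap_interaction_values(X)  # pairwise interactions

# Dependence plots expose threshold effects like those in the abstract.
shap.dependence_plot("LAI_Ann", shap_values, X)
```

The interaction values are what make statements such as "the SHAP interaction between LAI_Ann and Pre_11 increases with LAI_Ann" quantifiable feature pair by feature pair.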
Artificial intelligence (AI) is currently being utilized in a wide range of sophisticated applications, but the outcomes of many AI models are challenging to comprehend and trust due to their black-box nature. Usually, it is essential to understand the reasoning behind an AI model's decision-making. Thus, the need for eXplainable AI (XAI) methods to improve trust in AI models has arisen. XAI has become a popular research subject within the AI field in recent years. Existing survey papers have tackled the concepts of XAI, its general terms, and post-hoc explainability methods, but no reviews have examined the assessment methods, available tools, XAI datasets, and other related aspects. Therefore, in this comprehensive study, we provide readers with an overview of the current research and trends in this rapidly emerging area, together with a case study example. The study starts by explaining the background of XAI and common definitions, and by summarizing recently proposed XAI techniques for supervised machine learning. The review divides XAI techniques into four axes using a hierarchical categorization system: (i) data explainability, (ii) model explainability, (iii) post-hoc explainability, and (iv) assessment of explanations. We also introduce available evaluation metrics as well as open-source packages and datasets, along with future research directions. Then, the significance of explainability in terms of legal demands, user viewpoints, and application orientation is outlined, termed XAI concerns. This paper advocates for tailoring explanation content to specific user types. An examination of XAI techniques and their evaluation was conducted by reviewing 410 critical articles, published between January 2016 and October 2022 in reputed journals, using a wide range of research databases as sources of information. The article is aimed at XAI researchers who are interested in making their AI models more trustworthy, as well as researchers from other disciplines who are looking for effective XAI methods to complete tasks with confidence while communicating meaning from data.
•A novel four-axis framework to examine a model for robustness and explainability.
•Formulation of research questions at each axis and its corresponding taxonomy.
•Discussion of different explainability assessment methods.
•A novel methodological workflow for determining the model and explainability criteria.
•Revisited discussion on challenges and future directions of XAI and Trustworthy AI.