Deep learning (DL) has proved successful in medical imaging and, in the wake of the recent COVID-19 pandemic, some works have started to investigate DL-based solutions for the assisted diagnosis of lung diseases. While existing works focus on CT scans, this paper studies the application of DL techniques to the analysis of lung ultrasonography (LUS) images. Specifically, we present a novel fully-annotated dataset of LUS images collected from several Italian hospitals, with labels indicating the degree of disease severity at the frame level, video level, and pixel level (segmentation masks). Leveraging these data, we introduce several deep models that address relevant tasks for the automatic analysis of LUS images. In particular, we present a novel deep network, derived from Spatial Transformer Networks, which simultaneously predicts the disease severity score associated with an input frame and localizes pathological artefacts in a weakly-supervised way. Furthermore, we introduce a new method based on uninorms for effective frame-score aggregation at the video level. Finally, we benchmark state-of-the-art deep models for estimating pixel-level segmentations of COVID-19 imaging biomarkers. Experiments on the proposed dataset demonstrate satisfactory results on all the considered tasks, paving the way for future research on DL for the assisted diagnosis of COVID-19 from LUS data.
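As an aside on the aggregation step: the abstract does not specify which uninorm is used, so the following is only a minimal sketch of uninorm-based frame-score aggregation, using the classic cross-ratio (3-Pi) uninorm as a stand-in; the function names and example scores are illustrative.

```python
import numpy as np

def cross_ratio_uninorm(x, y, eps=1e-12):
    """Cross-ratio (3-Pi) uninorm with neutral element 0.5: conjunctive when
    both inputs are below 0.5, disjunctive when both are above, and
    compensatory otherwise."""
    num = x * y
    return num / (num + (1.0 - x) * (1.0 - y) + eps)

def aggregate_frame_scores(frame_probs):
    """Fold per-frame severity probabilities (values in (0, 1)) into a single
    video-level score; associativity makes the left fold well defined."""
    score = 0.5  # the neutral element leaves the aggregate unchanged
    for p in frame_probs:
        score = cross_ratio_uninorm(score, p)
    return score

print(round(aggregate_frame_scores(np.array([0.2, 0.7, 0.9, 0.6])), 3))  # ~0.887
```

The appeal of a uninorm here is that confidently pathological frames push the video score up, confidently healthy frames pull it down, and uninformative frames near the neutral element leave it unchanged, unlike a plain average.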
The centrality of the decision maker (DM) is widely recognized in the multiple criteria decision-making community. This translates into an emphasis on seamless human-computer interaction and on adapting the solution technique to the knowledge progressively acquired from the DM. This paper adopts the methodology of reactive search optimization (RSO) for evolutionary interactive multiobjective optimization. RSO follows the paradigm of "learning while optimizing," using online machine learning techniques as an integral part of a self-tuning optimization scheme. User judgments on pairs of solutions are used to build robust incremental models of the user utility function, with the objective of reducing the cognitive burden required from the DM to identify a satisficing solution. The technique of support vector ranking is used together with a k-fold cross-validation procedure to select, during utility function training, the best kernel for the problem at hand. Experimental results are presented for a series of benchmark problems.
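A hedged sketch of the support vector ranking step with k-fold kernel selection, using scikit-learn: pairwise DM judgments are converted into difference vectors (exact for linear kernels, a common approximation for nonlinear ones). The toy utility and parameter grid below are illustrative, not those of the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def to_pairwise(X, preferences):
    """Turn DM judgments 'solution i is preferred to solution j' into a
    balanced binary classification problem over difference vectors."""
    diffs, labels = [], []
    for i, j in preferences:
        diffs.append(X[i] - X[j]); labels.append(+1)
        diffs.append(X[j] - X[i]); labels.append(-1)
    return np.array(diffs), np.array(labels)

# toy setting: 20 candidate solutions in a 5-dimensional objective space,
# ranked by a hidden linear utility (unknown to the learner)
rng = np.random.default_rng(0)
X = rng.random((20, 5))
u = X @ np.array([0.5, 0.1, 0.2, 0.1, 0.1])
prefs = [(i, j) for i in range(20) for j in range(20) if u[i] > u[j] + 0.1]

D, y = to_pairwise(X, prefs)
# k-fold cross-validation to select the best kernel for the problem at hand
search = GridSearchCV(SVC(), {"kernel": ["linear", "rbf", "poly"],
                              "C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(D, y)
print(search.best_params_)
# rank unseen solutions: with a linear kernel, decision values on centered
# solutions recover the learned utility ordering (a heuristic otherwise)
print(search.best_estimator_.decision_function(X - X.mean(axis=0)))
```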
Research on Explainable Artificial Intelligence has recently started exploring the idea of producing explanations that, rather than being expressed in terms of low-level features, are encoded in terms of interpretable concepts. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in human-interpretable representation learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine's representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
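The abstract does not state the formal definitions; purely as an illustrative, assumption-laden sketch, one disentanglement-flavoured, information-theoretic reading of alignment between a machine representation Z and a human concept vocabulary G could be written as follows (this is not the paper's exact formalization).

```latex
% Illustrative sketch only:
% Z = (Z_1, \dots, Z_k): machine representation;
% G = (G_1, \dots, G_k): concepts in the human's vocabulary;
% I(\cdot\,; \cdot): (conditional) mutual information.
% Z is aligned with G if there is a permutation $\pi$ of $\{1, \dots, k\}$ with
\begin{align*}
  I(Z_i ; G_{\pi(i)}) &> 0
    && \text{each unit carries information about ``its'' concept,} \\
  I(Z_i ; G_j \mid G_{\pi(i)}) &= 0 \quad \forall j \neq \pi(i)
    && \text{and none about the others (no concept leakage).}
\end{align*}
```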
The classical view of eukaryotic gene expression proposes a forward-flow scheme whereby fluctuations in mRNA levels upon a stimulus contribute to determining variations in mRNA availability for translation. Here we address this issue by simultaneously profiling with microarrays the total mRNAs (the transcriptome) and the polysome-associated mRNAs (the translatome) after EGF treatment of human cells, and by extending the analysis to 19 other transcriptome/translatome comparisons in mammalian cells following different stimuli or undergoing cell programs.
Triggering of the EGF pathway results in an early induction of transcriptome and translatome changes, but 90% of the significant variation is limited to the translatome and the degree of concordant change is less than 5%. The survey of the 19 other transcriptome/translatome comparisons shows that extensive uncoupling is a general rule, in terms of both RNA movements and inferred cell activities, with a strong tendency of translation-related genes to be controlled purely at the translational level. Using different statistical approaches, we finally provide evidence of the lack of dependence between changes at the transcriptome and translatome levels.
We propose a model of diffused independence between variation in transcript abundance and variation in transcript engagement on polysomes, which implies the existence of specific mechanisms to couple these two ways of regulating gene expression.
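A minimal sketch of the kind of concordance bookkeeping such a transcriptome/translatome comparison involves: given per-gene log2 fold-changes and significance calls on both profiles, genes are classified as changing on one profile only, on both discordantly, or on both concordantly. The thresholds and synthetic data are illustrative; this is not the paper's statistical pipeline.

```python
import numpy as np

def concordance(t_lfc, p_lfc, t_sig, p_sig):
    """Classify genes by where they change significantly.
    t_*: transcriptome (total mRNA), p_*: translatome (polysomal mRNA).
    *_lfc: log2 fold-changes, *_sig: boolean significance calls."""
    both = t_sig & p_sig
    # concordant = significant on both profiles AND moving in the same direction
    concordant = both & (np.sign(t_lfc) == np.sign(p_lfc))
    n_changed = (t_sig | p_sig).sum()
    return {
        "translatome_only": (~t_sig & p_sig).sum() / n_changed,  # uncoupled
        "transcriptome_only": (t_sig & ~p_sig).sum() / n_changed,
        "concordant": concordant.sum() / n_changed,
    }

rng = np.random.default_rng(1)
t = rng.normal(0, 1, 5000)                 # synthetic transcriptome fold-changes
p = 0.1 * t + rng.normal(0, 1, 5000)       # weakly coupled translatome changes
stats = concordance(t, p, np.abs(t) > 2, np.abs(p) > 2)
print({k: round(v, 3) for k, v in stats.items()})
```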
The advent of high-throughput experimental techniques paved the way for genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, such as all proteins of a certain genome, many candidate annotations can be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for this type of constraint when making predictions could substantially contribute to the quality of machine-generated annotations at a genomic scale.
We present OCELOT, a predictive pipeline that simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer enforcing (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g., a protein labeled with a GO term should also be labeled with all of its ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the yeast genome shows that the integration of prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as OCELOT is), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.
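As a minimal illustration of the taxonomic constraint mentioned above, a hard post hoc consistency step can propagate prediction scores upward through the GO hierarchy so that no term scores higher than its ancestors. OCELOT's fuzzy-logic layer is more general (soft constraints plus interaction/function rules); the sketch below covers only this one rule, on a toy GO fragment.

```python
def propagate_true_path(scores, parents):
    """Enforce the GO taxonomic constraint: a term's score must not exceed
    any ancestor's score, so each ancestor is raised to the max of its
    descendants. scores: {term: prob}, parents: {term: [parent terms]}."""
    changed = True
    while changed:                      # repeated passes converge on a DAG
        changed = False
        for term, ps in parents.items():
            for p in ps:
                if scores.get(p, 0.0) < scores.get(term, 0.0):
                    scores[p] = scores[term]
                    changed = True
    return scores

# toy GO fragment: 'binding' is an ancestor of 'ATP binding'
parents = {"ATP binding": ["nucleotide binding"],
           "nucleotide binding": ["binding"]}
scores = {"ATP binding": 0.9, "nucleotide binding": 0.4, "binding": 0.2}
print(propagate_true_path(scores, parents))
# -> every ancestor raised to 0.9, satisfying the constraint
```

Raising ancestors to the descendants' maximum is one consistency policy; lowering descendants to the ancestors' minimum is the symmetric alternative.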
Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of investors’ increasing attention to Environmental, Social, and Governance (ESG) issues. Publicly released information on sustainability practices is often disclosed in diverse, unstructured, and multi-modal documentation. This poses a challenge in efficiently gathering and aligning the data into a unified framework from which to derive insights related to Corporate Social Responsibility (CSR). Information Extraction (IE) methods are thus an intuitive choice for delivering insightful and actionable data to stakeholders. In this study, we employ Large Language Models (LLMs), In-Context Learning, and the Retrieval-Augmented Generation (RAG) paradigm to extract structured insights related to ESG aspects from companies’ sustainability reports. We then leverage graph-based representations to conduct statistical analyses of the extracted insights. These analyses revealed that ESG criteria cover a wide range of topics, exceeding 500, often beyond those considered in existing categorizations, and are addressed by companies through a variety of initiatives. Moreover, disclosure similarities emerged among companies from the same region or sector, validating ongoing hypotheses in the ESG literature. Lastly, by incorporating additional company attributes into our analyses, we investigated which factors have the greatest impact on companies’ ESG ratings, showing that ESG disclosure affects the obtained ratings more than other financial or company data.
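A schematic sketch of the retrieve-then-generate loop described above. `embed` and `llm_complete` are hypothetical stand-ins for the embedding model and LLM endpoint, which the abstract does not name; the prompt, output schema, and assumption that the LLM returns valid JSON are all illustrative.

```python
import json
import numpy as np

def embed(texts):
    """Hypothetical stand-in for a sentence-embedding model call."""
    raise NotImplementedError

def llm_complete(prompt):
    """Hypothetical stand-in for an LLM completion call."""
    raise NotImplementedError

def extract_esg(report_chunks, question, k=5):
    """Retrieval-Augmented Generation: retrieve the k report passages most
    similar to the ESG question, then ask the LLM to answer as JSON."""
    E = embed(report_chunks)                       # (n_chunks, dim)
    q = embed([question])[0]
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q))
    context = "\n---\n".join(report_chunks[i] for i in np.argsort(sims)[-k:])
    prompt = (
        "Using only the excerpts below from a sustainability report, list the "
        "ESG initiatives they describe as JSON objects with keys "
        '"topic", "initiative", "evidence".\n\n'
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return json.loads(llm_complete(prompt))  # assumes well-formed JSON output
```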
Bundle recommendation aims to generate bundles of associated products that users tend to consume as a whole under certain circumstances. Modeling the bundle utility for users is a non-trivial task, as it requires accounting for the potential interdependencies between bundle attributes. To address this challenge, we introduce a new preference-based approach for bundle recommendation that exploits the Choquet integral. This allows us to formalize preferences over coalitions of environment-related attributes, thus recommending product bundles that account for synergies among product attributes. An experimental evaluation on a dataset of local food products in Northern Italy shows that the Choquet integral allows the natural formalization of a sensible notion of environmental friendliness, and that standard approaches based on weighted sums of attributes end up recommending bundles with lower environmental friendliness even when the weights are explicitly learned to maximize it. We further show how preference elicitation strategies can be leveraged to acquire the weights of the Choquet integral from user feedback in the form of preferences over candidate bundles, and show how a handful of queries suffices to recommend optimal bundles for a diverse set of user prototypes.
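For concreteness, the discrete Choquet integral aggregates attribute scores with respect to a capacity (a monotone set function), so coalitions of attributes can be worth more, or less, than the sum of their parts. The attribute names and capacity values below are illustrative, not those learned in the paper.

```python
def choquet(values, capacity):
    """Discrete Choquet integral of attribute scores w.r.t. a capacity.
    values: {attribute: score in [0, 1]}
    capacity: {frozenset of attributes: weight in [0, 1]}, monotone,
              with capacity[frozenset()] = 0 and capacity[full set] = 1."""
    items = sorted(values.items(), key=lambda kv: kv[1])  # ascending scores
    total, prev = 0.0, 0.0
    remaining = set(values)
    for attr, x in items:
        total += (x - prev) * capacity[frozenset(remaining)]
        prev = x
        remaining.remove(attr)
    return total

# a capacity rewarding the synergy between 'local' and 'organic': together
# they are worth more than the sum of their individual weights
cap = {frozenset(): 0.0,
       frozenset({"local"}): 0.3, frozenset({"organic"}): 0.3,
       frozenset({"packaging"}): 0.2,
       frozenset({"local", "organic"}): 0.8,          # synergy
       frozenset({"local", "packaging"}): 0.5,
       frozenset({"organic", "packaging"}): 0.5,
       frozenset({"local", "organic", "packaging"}): 1.0}

bundle = {"local": 0.9, "organic": 0.8, "packaging": 0.4}
print(choquet(bundle, cap))  # 0.75
```

With this capacity, a bundle scoring high on both "local" and "organic" is rewarded beyond what any weighted sum of the two attributes could express, which is exactly the kind of synergy a weighted-sum recommender cannot capture.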
Various studies have investigated the predictability of different aspects of human behavior, such as mobility patterns, social interactions, and shopping and online behaviors. However, existing research has often been limited to a single behavioral dimension, or to the combination of a few of them, and has adopted the perspective of an outside observer who is unaware of the motivations behind the specific behaviors or activities of a given individual. The key assumption of this work is that human behavior is deliberated based on an individual's own perception of the situation that s/he is in, and that it should therefore be studied under the same perspective. Taking inspiration from work in ubiquitous and context-aware computing, we investigate the role played by four contextual dimensions (or modalities), namely time, location, the activity being carried out, and social ties, on the predictability of individuals' behaviors, using a month of mobile phone sensor readings and self-reported annotations about these contextual modalities collected from more than two hundred study participants. Our analysis shows that any target modality (e.g., location) becomes substantially more predictable when information about the other modalities (time, activity, social ties) is made available. Multi-modality turns out to be in some sense fundamental, as some values (e.g., specific activities like "shopping") are nearly impossible to guess correctly unless the other modalities are known. Subjectivity also has a substantial impact on predictability: a location recognition experiment suggests that subjective location annotations convey more information about activity and social ties than objective information derived from GPS measurements. We conclude the paper by analyzing how the identified contextual modalities allow computing the diversity of personal behavior, showing that individuals are more easily identified by rarer, rather than frequent, context annotations. These results support the development of innovative computational models of human behavior enriched by a characterization of the context in which a given behavior occurs.
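The predictability gain from adding modalities can be made concrete with entropy estimates. Below is a sketch, on toy annotations, of how much knowing the other modalities reduces the uncertainty about a target modality; the measures used in the paper may differ.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of an empirical label distribution."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(target, context):
    """H(target | context), estimated from co-occurrence counts."""
    n = len(target)
    by_ctx = {}
    for t, c in zip(target, context):
        by_ctx.setdefault(c, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in by_ctx.values())

# toy annotation log: location paired with (time-slot, activity) context
location = ["home", "shop", "home", "uni", "shop", "home", "uni", "home"]
context = [("eve", "rest"), ("noon", "shopping"), ("eve", "rest"),
           ("morn", "study"), ("noon", "shopping"), ("night", "rest"),
           ("morn", "study"), ("eve", "rest")]
print("H(location)           =", round(entropy(location), 3))
print("H(location | context) =", round(conditional_entropy(location, context), 3))
# the drop in entropy quantifies how much the other modalities help
```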
Mobile Crowd Sensing (MCS) is a novel IoT paradigm where sensor data, as collected by users' mobile devices, are integrated with user-generated content, e.g., annotations, self-reports, or images. While providing many advantages, human involvement also brings major challenges, the most critical possibly being the poor quality of human-provided content, most often due to inaccurate input from non-expert users. In this paper, we propose Skeptical Learning, an interactive machine learning algorithm where the machine checks the quality of the user feedback and tries to fix it when a problem arises. In this context, the user feedback consists of answers to machine-generated questions, asked at times decided by the machine. The main idea is to integrate three core elements, namely (i) sensor data, (ii) user answers, and (iii) existing prior knowledge of the world, and to trigger a second round of validation with the user whenever these three types of information jointly generate an inconsistency. The proposed solution is evaluated in a project focusing on a university student life scenario, whose main goal is to recognize the locations and transportation modes of the students. The results highlight an unexpectedly high pervasiveness of user mistakes in this scenario, and show the advantages of Skeptical Learning in dealing with mislabeling in an interactive way and in improving prediction performance.
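A minimal sketch of the skeptical check as described: the prediction from sensor data, the user's answer, and prior world knowledge are compared, and a disagreement triggers a second round of validation. The names `prior_ok` and `ask_user_again` and the confidence threshold are hypothetical, not the paper's.

```python
def skeptical_update(model, x, user_label, prior_ok, ask_user_again,
                     confidence_threshold=0.9):
    """One step of a Skeptical Learning-style interaction (illustrative).
    model: classifier exposing predict_proba; x: sensor-feature vector;
    prior_ok(label, x): does the label respect prior world knowledge
    (e.g. 'driving' is inconsistent with zero accelerometer movement)?
    ask_user_again: callback performing the second validation round."""
    probs = model.predict_proba([x])[0]
    predicted = model.classes_[probs.argmax()]
    confident = probs.max() >= confidence_threshold

    # inconsistency: a confident machine disagrees with the user, or the
    # user's answer violates prior knowledge
    suspicious = (confident and predicted != user_label) \
        or not prior_ok(user_label, x)
    if suspicious:
        # second round of validation: the user may confirm or fix the answer
        user_label = ask_user_again(x, predicted, user_label)
    return user_label  # possibly corrected label, then used for training
```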
Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simplest formulation, the problem can be cast as a binary classification task at the residue level: predicting whether a residue is directly involved in the catalytic process. The task is quite hard even when structural information is available, due to the rather wide range of roles a functional residue can play and to the large imbalance between the number of catalytic and non-catalytic residues.
We developed an effective representation of structural information by modeling spherical regions around candidate residues and extracting statistics on their content, such as physico-chemical properties, atomic density, flexibility, and the presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminative power of the information provided by the presence of heterogens in the residue neighborhood.
Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for such results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.
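A hedged sketch of the spherical-neighborhood idea: atoms within a fixed radius of a candidate residue are summarized into simple statistics, and a class-weighted SVM counters the catalytic/non-catalytic imbalance. The radius, per-atom properties, and synthetic data below are illustrative only, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC

def sphere_features(coords, props, center, radius=8.0):
    """Summarize atoms within `radius` angstroms of a candidate residue.
    coords: (n_atoms, 3) atom coordinates; props: (n_atoms, d) per-atom
    properties (e.g. hydrophobicity, charge, B-factor, is-water flag)."""
    mask = np.linalg.norm(coords - center, axis=1) <= radius
    inside = props[mask]
    density = mask.sum() / (4.0 / 3.0 * np.pi * radius ** 3)  # atoms / A^3
    if inside.size == 0:
        return np.zeros(2 * props.shape[1] + 1)
    return np.concatenate([inside.mean(0), inside.std(0), [density]])

# synthetic demo data standing in for a parsed structure
rng = np.random.default_rng(0)
coords = rng.normal(0, 10, (500, 3))   # atom cloud
props = rng.random((500, 4))           # 4 per-atom properties
centers = rng.normal(0, 10, (40, 3))   # candidate residue positions
X = np.stack([sphere_features(coords, props, c) for c in centers])
y = rng.integers(0, 2, 40)             # placeholder catalytic labels

# class_weight='balanced' counters the catalytic/non-catalytic imbalance
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
print(clf.predict(X[:5]))
```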