Client knowledge remains a key strategic point in hospitality management. However, the role that the large amounts of information available in Customer Relationship Management (CRM) systems can play, when addressed with emerging Big Data techniques for efficient client profiling, is still in its early stages. In this work, we addressed the client profile in the data of the CRM system of an international hotel chain by using Big Data technology and Bootstrap resampling techniques for proportion tests. Strong consistency was found for the most representative feature of repeaters, namely, traveling without children. Profiles were more similar for British and German clients, and their main differences with Spanish clients were in stay duration and in age. For a vacation chain, these results suggest further analysis of the target orientation towards new market segments. Big Data technologies can be extremely useful for analyzing the indoor data available in CRM information systems in the hospitality industry.
•Big Data technology is useful for yielding specific guest profiles from CRM hotel information.
•Proportion differences using Bootstrap resampling yield a detailed characterization of the client profile.
•Repeaters included features such as traveling without children.
•Repeater profiles were similar between German and English clients but different from Spanish clients.
•German and English clients included the elderly staying for longer periods, whereas Spanish clients were short-term and younger.
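As a minimal sketch of the kind of Bootstrap proportion test used for this profiling (the data and variable names are hypothetical stand-ins, not the CRM records), one could compare the proportion of a binary feature, such as traveling without children, between repeaters and non-repeaters:

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_prop_diff(group_a, group_b, n_boot=2000, alpha=0.05):
    """Bootstrap CI for the difference of proportions between two
    binary samples (e.g., 'travels without children' for repeaters
    vs. non-repeaters)."""
    group_a = np.asarray(group_a)
    group_b = np.asarray(group_b)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement and compare proportions.
        pa = rng.choice(group_a, size=group_a.size, replace=True).mean()
        pb = rng.choice(group_b, size=group_b.size, replace=True).mean()
        diffs[i] = pa - pb
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return diffs.mean(), (lo, hi)

# Hypothetical data: 1 = travels without children, 0 = otherwise.
repeaters = rng.binomial(1, 0.7, size=500)
non_repeaters = rng.binomial(1, 0.5, size=1500)
diff, ci = bootstrap_prop_diff(repeaters, non_repeaters)
print(f"Estimated difference: {diff:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the confidence interval excludes zero, the proportion difference between the two groups can be considered significant at the chosen level.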
In the last few years, there has been a growing expectation about the analysis of the large amounts of data often available in organizations, which has been both scrutinized by the academic world and successfully exploited by industry. Nowadays, two of the most common terms heard in scientific circles are Big Data and Deep Learning. In this double review, we aim to shed some light on the current state of these different, yet somehow related, branches of Data Science, in order to understand their current state and future evolution within the healthcare area. We start by giving a simple description of the technical elements of Big Data technologies, as well as an overview of the elements of Deep Learning techniques, according to their usual description in the scientific literature. Then, we pay attention to the application fields that can be said to have delivered relevant real-world success stories, with emphasis on examples from large technology companies and financial institutions, among others. The academic effort that has been put into bringing these technologies to the healthcare sector is then summarized and analyzed from a twofold view as follows: first, the landscape of application examples is globally scrutinized according to the varying nature of medical data, including the data forms in electronic health recordings, medical time signals, and medical images; second, a specific application field is given special attention, namely electrocardiographic signal analysis, where a number of works have been published in the last two years. A set of toy application examples is provided with the publicly available MIMIC dataset, aiming to help beginners start with some principled, basic, and structured material and available code. Critical discussion is provided for current and forthcoming challenges on the use of both sets of techniques in our future healthcare.
•Interpretability of the solution is provided by a novel feature selection algorithm.
•Relevant, redundant, and non-informative input variables are identified.
•Analysis of the weights learned by resampling makes it possible to clarify relations among variables.
•Improvement in the interpretability of the results and in classification performance.
There is nowadays an increasing interest in discovering relationships among input variables (also called features) from data to provide better interpretability, which yields more confidence in the solution and provides novel insights about the nature of the problem at hand. We propose a novel feature selection method, called Informative Variable Identifier (IVI), capable of identifying the informative variables and their relationships. It transforms the input-variable space distribution into a coefficient-feature space using existing linear classifiers or a more efficient weight generator that we also propose, the Covariance Multiplication Estimator (CME). Informative features and their relationships are determined by analyzing the joint distribution of these coefficients with resampling techniques. IVI and CME select the informative variables and then pass them on to any linear or nonlinear classifier. Experiments show that the proposed approach can outperform state-of-the-art algorithms in terms of feature identification capabilities, and even in classification performance when subsequent classifiers are used.
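The following is a simplified, illustrative stand-in for the resampling idea behind IVI (not the published algorithm itself): linear-classifier weights are collected over bootstrap resamples, and a feature is flagged as informative when its weight distribution stays consistently away from zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def resampled_weights(X, y, n_boot=200):
    """Collect linear-classifier weights over bootstrap resamples,
    in the spirit of analyzing the coefficient-feature space."""
    n = X.shape[0]
    W = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        W[b] = clf.coef_.ravel()
    return W

# Toy problem: 5 informative features out of 20.
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=0)
W = resampled_weights(X, y)

# Flag a feature as informative when its weight distribution stays
# away from zero across resamples (simple sign-consistency test).
lo, hi = np.percentile(W, [2.5, 97.5], axis=0)
informative = (lo > 0) | (hi < 0)
print("Flagged as informative:", np.where(informative)[0])
```

The selected columns of X would then be passed on to any subsequent linear or nonlinear classifier.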
In recent years, attention has been paid to wireless sensor networks (WSNs) applied to precision agriculture. However, few studies have compared the technologies of different communication standards in terms of topology and energy efficiency. This paper presents the design and implementation of the hardware and software of three WSNs with different wireless communication technologies and topologies for tomato greenhouses in the Andean region of Ecuador, as well as a comparative study of the performance of each of them. Two companion papers describe the study of the dynamics of the energy consumption and of the monitored variables. Three WSNs were deployed, two of them with the IEEE 802.15.4 standard with star and mesh topologies (ZigBee and DigiMesh, respectively), and a third with the IEEE 802.11 standard with access point topology (WiFi). The measured variables were selected after investigation of the climatic conditions required for efficient tomato growth. The measurements for each variable could be displayed in real time using either a laboratory virtual instrument engineering workbench (LabVIEW™) interface or an Android mobile application. The comparative study of the three networks showed that the configuration of the DigiMesh network is the most complex for adding new nodes, due to its mesh topology. However, DigiMesh maintains the bit rate and prevents data loss through the location of the nodes as a function of crop height. It was also shown that the WiFi network has better stability, with greater precision in its measurements.
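For illustration only, here is a minimal sketch of what a ZigBee sensor node's transmission loop could look like with the digi-xbee Python library; the serial port, sampling period, and read_sensors() stub are hypothetical placeholders, not the deployed firmware.

```python
# Hypothetical sketch of a ZigBee node broadcasting greenhouse readings.
import time
import json
from digi.xbee.devices import XBeeDevice

PORT = "/dev/ttyUSB0"  # hypothetical serial port of the XBee module
BAUD = 9600

def read_sensors():
    # Placeholder for the actual transducer drivers (CO2, air
    # temperature, humidity, etc. in the greenhouse deployment).
    return {"temp_c": 21.4, "rh_pct": 68.0, "co2_ppm": 410}

device = XBeeDevice(PORT, BAUD)
device.open()
try:
    while True:
        payload = json.dumps(read_sensors())
        # In the star topology, the coordinator receives the broadcast.
        device.send_data_broadcast(payload)
        time.sleep(60)  # assumed one sample per minute
finally:
    device.close()
```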
Although certain genetic alterations have been defined as predictive and prognostic biomarkers in the context of ovarian cancer (OC), data science methods represent alternative approaches to identify novel correlations and define relevant markers in these gynecological tumors. Considering this potential, our work focused on both clinical and genomic data collected from patients with OC to identify relationships between clinical and genetic factors and disease progression-related variables. For this aim, we proposed two analyses: (1) a nonlinear exploration of an OC dataset using autoencoders, a type of neural network that can be used as a feature extraction tool to represent a dataset in a 3-dimensional latent space, so that we could assess whether there are intrinsic or natural nonlinear separability patterns between disease progression groups (in our case, platinum-sensitive and platinum-resistant groups); and (2) the identification of relevant variable relationships by means of an adaptation of the informative variable identifier (IVI), a feature selection method that labels each input feature as informative or noisy with respect to the task at hand, identifies the relationships among features, and builds a ranking of features, allowing us to study which input features and relationships may be most informative for OC disease progression classification and to define new biomarkers involved in disease progression. Our interest has been in clinical and genetic factors and in the combination of clinical features and the genetic profile. Results with autoencoders suggest a pattern of separability between disease progression groups in the clinical part and for the combination of genes and clinical features of the OC dataset, which is increased via supervised fine-tuning. In the genetic part, this pattern of separability is not observed, but it becomes more defined when supervised fine-tuning is performed. Results of the IVI-mediated feature selection method show significance for relevant clinical variables (such as type of surgery and neoadjuvant chemotherapy), some mutated genes, and low-risk genetic features. These results highlight the efficacy of the considered approaches to better understand the clinical course of OC.
•Data science methods are suitable for identifying biomarkers in OC.
•Feature selection methods show predictive roles of variables in an OC dataset.
•Feature extraction methods reveal some patterns of separability in an OC dataset.
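A minimal sketch of the autoencoder idea described above, with a 3-dimensional latent space; the layer sizes and toy data are hypothetical and do not reproduce the authors' architecture. Supervised fine-tuning would add a classification head on the encoder.

```python
import torch
import torch.nn as nn

# Toy stand-in for the clinical/genomic feature matrix (n x d);
# the dimensions are hypothetical, not those of the OC dataset.
X = torch.randn(256, 40)

class AE(nn.Module):
    def __init__(self, d_in, d_latent=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 16), nn.ReLU(),
                                 nn.Linear(16, d_latent))
        self.dec = nn.Sequential(nn.Linear(d_latent, 16), nn.ReLU(),
                                 nn.Linear(16, d_in))

    def forward(self, x):
        z = self.enc(x)            # 3-dimensional latent representation
        return self.dec(z), z

model = AE(X.shape[1])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    x_hat, z = model(X)
    loss = loss_fn(x_hat, X)       # unsupervised reconstruction objective
    loss.backward()
    opt.step()

# The latent codes z (n x 3) can then be inspected for separability
# between platinum-sensitive and platinum-resistant groups.
with torch.no_grad():
    _, z = model(X)
print(z.shape)
```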
Healthcare buildings exhibit different electrical load predictability depending on their size and nature. Large hospitals behave similarly to small cities, whereas primary care centers are expected to have different consumption dynamics. In this work, we jointly analyze the electrical load predictability of a large hospital and that of its associated primary care center. An unsupervised load forecasting scheme combining the classic methods of principal component analysis (PCA) and autoregressive (AR) modeling, and a supervised scheme using orthonormal partial least squares (OPLS), are proposed. Both methods reduce the dimensionality of the data to create an efficient and low-complexity data representation and to eliminate noise subspaces. Because the former method tended to underestimate the load and the latter tended to overestimate it in the large hospital, we also propose a convex combination of both to further reduce the forecasting error. The analysis of data from 7 years in the hospital and 3 years in the primary care center shows that the proposed low-complexity dynamic models are flexible enough to predict both types of consumption at practical accuracy levels.
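A minimal sketch of the convex combination step, assuming validation forecasts from both models are available; the toy data simply mimics one biased-low predictor (like the PCA+AR scheme) and one biased-high predictor (like OPLS).

```python
import numpy as np

def best_convex_combination(y_true, y_under, y_over, grid=101):
    """Pick alpha in [0, 1] minimizing the RMSE of
    alpha * y_under + (1 - alpha) * y_over on validation data."""
    alphas = np.linspace(0.0, 1.0, grid)
    rmse = [np.sqrt(np.mean((a * y_under + (1 - a) * y_over - y_true) ** 2))
            for a in alphas]
    best = int(np.argmin(rmse))
    return alphas[best], rmse[best]

# Hypothetical validation series for the hospital load.
rng = np.random.default_rng(1)
y = 100 + 10 * np.sin(np.linspace(0, 20, 500))
y_pca_ar = y - 3 + rng.normal(0, 2, y.size)  # biased low
y_opls = y + 3 + rng.normal(0, 2, y.size)    # biased high
alpha, err = best_convex_combination(y, y_pca_ar, y_opls)
print(f"alpha = {alpha:.2f}, validation RMSE = {err:.2f}")
```

The combined forecast inherits the low complexity of both component models while canceling their opposite biases.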
In recent years, Electrocardiographic Imaging (ECGI) has emerged as a powerful and promising clinical tool to support cardiologists. Starting from a plurality of potential measurements on the torso, ECGI yields a noninvasive estimation of their causing potentials on the epicardium. This unprecedented amount of measured cardiac signals needs to be conditioned and adapted to current knowledge and methods in cardiac electrophysiology in order to maximize its support to clinical practice. In this setting, many cardiac indices are defined in terms of the so-called bipolar electrograms, which correspond to differential potentials between two spatially close potential measurements. Our aim was to contribute to the usefulness of ECGI recordings within the current knowledge and methods of cardiac electrophysiology. For this purpose, we first analyzed the basic stages of conventional cardiac signal processing and scrutinized the implications of the spatial-temporal nature of signals in ECGI scenarios. Specifically, the stages of baseline wander removal, low-pass filtering, and beat segmentation and synchronization were considered. We also aimed to establish a mathematical operator to provide suitable bipolar electrograms from the ECGI-estimated epicardial potentials. Results were obtained on data from an infarction patient and from a healthy subject. First, the low-frequency and high-frequency noises are shown to be non-independently distributed in the ECGI-estimated recordings due to their spatial dimension. Second, bipolar electrograms are better estimated when using the criterion of the maximum-amplitude difference between spatial neighbors, but a temporal delay in discrete time of about 40 samples also has to be included to obtain the usual morphology of clinical bipolar electrograms from catheters. We conclude that spatial-temporal digital signal processing and bipolar electrograms can pave the way towards the usefulness of ECGI recordings in cardiological clinical practice. The companion paper is devoted to analyzing clinical indices obtained from ECGI epicardial electrograms, measuring waveform variability and repolarization tissue properties.
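A simplified reading of the proposed bipolar operator, shown as a sketch: for each node, the spatial neighbor giving the maximum-amplitude difference is selected, and a discrete-time delay is applied before subtraction. The toy mesh, data, and peak-to-peak amplitude criterion here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bipolar_egm(unipolar, neighbors, node, delay=40):
    """Build a bipolar EGM for `node` as the difference with the
    spatial neighbor giving the maximum-amplitude difference, with a
    discrete-time delay (about 40 samples in the described recordings).

    unipolar : (n_nodes, n_samples) array of ECGI-estimated potentials
    neighbors : dict mapping node index -> list of neighbor indices
    """
    ref = unipolar[node]
    best, best_amp = None, -np.inf
    for j in neighbors[node]:
        diff = ref - unipolar[j]
        amp = diff.max() - diff.min()  # peak-to-peak amplitude criterion
        if amp > best_amp:
            best_amp, best = amp, j
    # Delay the selected neighbor before subtracting, to mimic the
    # usual morphology of clinical bipolar EGMs from catheters.
    shifted = np.roll(unipolar[best], delay)
    return ref - shifted

# Hypothetical toy mesh: 3 nodes, 1000 samples each.
rng = np.random.default_rng(2)
u = rng.normal(size=(3, 1000))
print(bipolar_egm(u, {0: [1, 2]}, node=0).shape)
```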
In recent years, attention and controversy have surrounded the first commercially available equipment used in Electrocardiographic Imaging (ECGI), a new cardiac diagnostic tool that opens up a new field of diagnostic possibilities. Previous knowledge and criteria of cardiologists using intracardiac Electrograms (EGM) should be revisited in light of the newly available spatial-temporal potentials, and digital signal processing should be readapted to this new data structure. Aiming to contribute to the usefulness of ECGI recordings within the current knowledge and methods of cardiac electrophysiology, we previously presented two results: first, spatial consistency can be observed even for very basic cardiac signal processing stages (such as baseline wander removal and low-pass filtering); second, useful bipolar EGMs can be obtained by a digital processing operator searching for the maximum amplitude and including a time delay. In addition, this work aims to demonstrate the functionality of ECGI for cardiac electrophysiology from a twofold view, namely, through the analysis of the EGM waveforms, and by studying the ventricular repolarization properties. The former is scrutinized in terms of the clustering properties of the unipolar and bipolar EGM waveforms in control and myocardial infarction subjects, and the latter is analyzed using the properties of T-wave alternans (TWA) in control and Long-QT syndrome (LQTS) example subjects. Clustered regions of the EGMs were spatially consistent and congruent with the presence of infarcted tissue in unipolar EGMs, and bipolar EGMs with adequate signal processing operators held this consistency and yielded a larger, yet moderate, number of spatial-temporal regions. TWA was not present in the control subject, compared with the LQTS subject, in terms of the alternans amplitude estimated from the unipolar EGMs; however, higher spatial-temporal variation was present in the LQTS torso and epicardium measurements, which was consistent across three different methods of alternans estimation. We conclude that spatial-temporal analysis of EGMs in ECGI will pave the way towards enhanced usefulness in clinical practice, so that current signal processing approaches should be conveniently revisited to deal with the great amount of information that ECGI conveys to the clinician.
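As a hedged illustration of one common TWA estimation approach (the spectral method, which may differ from the three estimation methods used in this work), alternans appears as power at 0.5 cycles/beat in a beat-by-beat series of T-wave amplitudes:

```python
import numpy as np

def twa_spectral(t_wave_amplitudes):
    """Spectral estimate of T-wave alternans from a beat-by-beat
    series of T-wave amplitudes: alternans shows up as power at
    0.5 cycles/beat."""
    x = np.asarray(t_wave_amplitudes, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per beat
    return spectrum[np.argmin(np.abs(freqs - 0.5))]

# Toy series: alternating T-wave amplitude (ABAB pattern) plus noise.
rng = np.random.default_rng(3)
beats = 0.5 + 0.05 * (-1) ** np.arange(128) + rng.normal(0, 0.01, 128)
print(f"Power at 0.5 cycles/beat: {twa_spectral(beats):.4f}")
```

A control series without the alternating component would show no distinct peak at 0.5 cycles/beat.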
Artificial intelligence (AI) is rapidly shaping the global financial market and its services due to the great competence that it has shown for analysis and modeling in many disciplines. Especially remarkable is the potential that these techniques could offer to the challenging reality of credit fraud detection (CFD); but it is not easy, even for financial institutions, to remain in strict compliance with non-discriminatory and data protection regulations while extracting all the potential that these powerful new tools can provide. This reality effectively restricts nearly all possible AI applications to simple and easy-to-trace neural networks, preventing more advanced and modern techniques from being applied. The aim of this work was to create a reliable, unbiased, and interpretable methodology to automatically evaluate CFD risk. We therefore propose a novel methodology to address the mentioned complexity when applying machine learning (ML) to the CFD problem, one that uses state-of-the-art algorithms capable of quantifying the information of the variables and their relationships. This approach offers a new form of interpretability to cope with this multifaceted situation. First, a recently published feature selection technique, the informative variable identifier (IVI), is applied, which is capable of distinguishing among informative, redundant, and noisy variables. Second, a set of innovative recurrent filters defined in this work is applied, namely, the recurrent feature filter (RFF) and the maximally-informative feature filter (MIFF), which aim to minimize the training-data bias. Finally, the output is classified by using compelling ML techniques, such as gradient boosting, support vector machines, linear discriminant analysis, and linear regression. These models were applied first to a synthetic database, for better descriptive modeling and fine-tuning, and then to a real database. Our results confirm that our proposal yields valuable interpretability by identifying the informative features’ weights that link original variables with final objectives. Informative features were living beyond one’s means, lack or absence of a transaction trail, and unexpected overdrafts, which are consistent with other published works. Furthermore, we obtained 76% accuracy in CFD, which represents an improvement of more than 4% on the real databases compared to other published works. We conclude that with the presented methodology, we not only reduce dimensionality, but also improve accuracy and trace relationships among input and output features, bringing transparency to the ML reasoning process. The results obtained here were used as a starting point for the companion paper, which reports on extending this interpretability to nonlinear ML architectures.
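A minimal sketch of the overall pipeline shape (feature selection followed by a gradient boosting classifier); sklearn's mutual-information selector is used here only as a generic placeholder, since IVI and the RFF/MIFF filters have no off-the-shelf equivalent, and the synthetic data stands in for the CFD records.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the fraud data: imbalanced binary labels,
# few informative features among many.
X, y = make_classification(n_samples=2000, n_features=30,
                           n_informative=6, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),  # placeholder for IVI + filters
    GradientBoostingClassifier(random_state=0),
)
pipe.fit(X_tr, y_tr)
print(f"Held-out accuracy: {pipe.score(X_te, y_te):.3f}")
```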
World population growth currently brings unequal access to food, whereas crop yields are not increasing at a similar rate, so that future food demand could go unmet. Many recent research works address the use of optimization techniques and technological resources in precision agriculture, especially for crops in high demand, including the monitoring of climatic variables using wireless sensor networks (WSNs). However, few studies have focused on analyzing the dynamics of the environmental measurement properties in greenhouses. In the two companion papers, we describe the design and implementation of three WSNs with different technologies and topologies, further scrutinizing their comparative performance, and a detailed analysis of their energy consumption dynamics is also presented, both considering tomato greenhouses in the Andean region of Ecuador. The three WSNs use ZigBee with star topology, ZigBee with mesh topology (referred to here as DigiMesh), and WiFi with access point topology. The present study provides a systematic and detailed analysis of the environmental measurement dynamics from multiparametric monitoring in Ecuadorian tomato greenhouses. A set of monitored variables (including CO2, air temperature, and wind direction, among others) is first analyzed in terms of their intrinsic variability and their short-term (circadian) rhythmometric behavior. Then, their cross-information is scrutinized in terms of scatter representations and mutual information analysis. Based on Bland–Altman diagrams, good-quality rhythmometric models were obtained from high-rate sampled signals over four days when using moderate regularization and preprocessing filtering with a 100-coefficient order. Accordingly, and especially for the adjustment of fast-transition variables, it is appropriate to use high sampling rates and then to filter the signal to reject false peaks and noise. In addition, for variables with similar behavior, a longer period of data acquisition is required for adequate processing, which makes the long-term modeling of the environmental signals more precise.
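A minimal sketch of a single-component rhythmometric (cosinor) fit of the kind used for circadian modeling; the toy signal, sampling rate, and parameters are hypothetical, not the greenhouse measurements.

```python
import numpy as np

def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of a single-component rhythmometric (cosinor)
    model y = M + A * cos(2*pi*t/period + phi), linearized as
    y = M + b1*cos(w*t) + b2*sin(w*t)."""
    w = 2 * np.pi / period
    D = np.column_stack([np.ones_like(t_hours),
                         np.cos(w * t_hours),
                         np.sin(w * t_hours)])
    mesor, b1, b2 = np.linalg.lstsq(D, y, rcond=None)[0]
    amplitude = np.hypot(b1, b2)
    acrophase = np.arctan2(-b2, b1)
    return mesor, amplitude, acrophase

# Toy circadian temperature signal sampled every 10 minutes for 4 days.
rng = np.random.default_rng(4)
t = np.arange(0, 4 * 24, 1 / 6)  # time in hours
y = 20 + 5 * np.cos(2 * np.pi * t / 24 - 1.0) + rng.normal(0, 0.5, t.size)
print("MESOR, amplitude, acrophase:", cosinor_fit(t, y))
```

Low-pass filtering the raw signal before the fit, as suggested above, removes false peaks that would otherwise bias the amplitude estimate.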