The goal of this paper is to reduce the classification (inference) complexity of tree ensembles by choosing a single representative model out of ensemble of multiple decision-tree models. We compute ...the similarity between different models in the ensemble and choose the model, which is most similar to others as the best representative of the entire dataset. The similarity-based approach is implemented with three different similarity metrics: a syntactic, a semantic, and a linear combination of the two. We compare this tree selection methodology to a popular ensemble algorithm (majority voting) and to the baseline of randomly choosing one of the local models. In addition, we evaluate two alternative tree selection strategies: choosing the tree having the highest validation accuracy and reducing the original ensemble to five most representative trees. The comparative evaluation experiments are performed on six big datasets using two popular decision-tree algorithms (J48 and CART) and splitting each dataset horizontally into six different amounts of equal-size slices (from 32 to 1024). In most experiments, the syntactic similarity approach, named SySM—Syntactic Similarity Method, provides a significantly higher testing accuracy than the semantic and the combined ones. The mean accuracy of SySM over all datasets is
0.835
±
0.065
for CART and
0.769
±
0.066
for J48. On the other hand, we find no statistically significant difference between the testing accuracy of the trees selected by SySM and the trees having the highest validation accuracy. Comparing to ensemble algorithms, the representative models selected by the proposed methods provide a higher speed for big data classification along with being more compact and interpretable.
When running data-mining algorithms on big data platforms, a parallel, distributed framework, such asMAPREDUCE, may be used. However, in a parallel framework, each individual model fits the data ...allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms such as majority voting, aggregate the local models, rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by ...fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human–robot interaction, and cross-cultural communication.
Abstract
We demonstrate improved performance in the classification of bioelectric data for use in systems such as robotic prosthesis control, by data fusion using low-cost electromyography (EMG) and ...electroencephalography (EEG) devices. Prosthetic limbs are typically controlled through EMG, and whilst there is a wealth of research into the use of EEG as part of a brain-computer interface (BCI) the cost of EEG equipment commonly prevents this approach from being adopted outside the lab. This study demonstrates as a proof-of-concept that multimodal classification can be achieved by using low-cost EMG and EEG devices in tandem, with statistical decision-level fusion, to a high degree of accuracy. We present multiple fusion methods, including those based on Jensen-Shannon divergence which had not previously been applied to this problem. We report accuracies of up to 99% when merging both signal modalities, improving on the best-case single-mode classification. We hence demonstrate the strengths of combining EMG and EEG in a multimodal classification system that could in future be leveraged as an alternative control mechanism for robotic prostheses.
Human-object interaction is of great relevance for robots to operate in human environments. However, state-of-the-art robotic hands are far from replicating humans skills. It is, therefore, essential ...to study how humans use their hands to develop similar robotic capabilities. This article presents a deep dive into hand-object interaction and human demonstrations, highlighting the main challenges in this research area and suggesting desirable future developments. To this extent, the article presents a general definition of the hand-object interaction problem together with a concise review for each of the main subproblems involved, namely: sensing, perception, and learning. Furthermore, the article discusses the interplay between these subproblems and describes how their interaction in learning from demonstration contributes to the success of robot manipulation. In this way, the article provides a broad overview of the interdisciplinary approaches necessary for a robotic system to learn new manipulation skills by observing human behavior in the real world.
SubStrat Lazebnik, Teddy; Somech, Amit; Weinberg, Abraham Itzhak
Proceedings of the VLDB Endowment,
12/2022, Volume:
16, Issue:
4
Journal Article
Peer reviewed
Automated machine learning (AutoML) frameworks have become important tools in the data scientist's arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. ...Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection, and hyper parameters tuning steps - and finally output an optimal pipeline in terms of predictive accuracy.
However, when the dataset is large, each individual configuration takes longer to execute, therefore the overall AutoML running times become increasingly high.
To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size, rather than configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data
subset
that preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experimental results, performed on three popular AutoML frameworks, Auto-Sklearn, TPOT, and H2O show that SubStrat reduces their running times by 76.3% (on average), with only a 4.15% average decrease in the accuracy of the resulting ML pipeline.
In recent years, the threat facing airports from growing and increasingly sophisticated cyberattacks has become evident. Airports are considered a strategic national asset, so protecting them from ...attacks, specifically cyberattacks, is a crucial mission. One way to increase airports' security is by using Digital Twins (DTs). This paper shows and demonstrates how DTs can enhance the security mission. The integration of DTs with Generative AI (GenAI) algorithms can lead to synergy and new frontiers in fighting cyberattacks. The paper exemplifies ways to model cyberattack scenarios using simulations and generate synthetic data for testing defenses. It also discusses how DTs can be used as a crucial tool for vulnerability assessment by identifying weaknesses, prioritizing, and accelerating remediations in case of cyberattacks. Moreover, the paper demonstrates approaches for anomaly detection and threat hunting using Machine Learning (ML) and GenAI algorithms. Additionally, the paper provides impact prediction and recovery coordination methods that can be used by DT operators and stakeholders. It also introduces ways to harness the human factor by integrating training and simulation algorithms with Explainable AI (XAI) into the DT platforms. Lastly, the paper offers future applications and technologies that can be utilized in DT environments.
Financial crimes fast proliferation and sophistication require novel approaches that provide robust and effective solutions. This paper explores the potential of quantum algorithms in combating ...financial crimes. It highlights the advantages of quantum computing by examining traditional and Machine Learning (ML) techniques alongside quantum approaches. The study showcases advanced methodologies such as Quantum Machine Learning (QML) and Quantum Artificial Intelligence (QAI) as powerful solutions for detecting and preventing financial crimes, including money laundering, financial crime detection, cryptocurrency attacks, and market manipulation. These quantum approaches leverage the inherent computational capabilities of quantum computers to overcome limitations faced by classical methods. Furthermore, the paper illustrates how quantum computing can support enhanced financial risk management analysis. Financial institutions can improve their ability to identify and mitigate risks, leading to more robust risk management strategies by exploiting the quantum advantage. This research underscores the transformative impact of quantum algorithms on financial risk management. By embracing quantum technologies, organisations can enhance their capabilities to combat evolving threats and ensure the integrity and stability of financial systems.
This paper presents a comprehensive analysis of the shift from the traditional perimeter model of security to the Zero Trust (ZT) framework, emphasizing the key points in the transition and the ...practical application of ZT. It outlines the differences between ZT policies and legacy security policies, along with the significant events that have impacted the evolution of ZT. Additionally, the paper explores the potential impacts of emerging technologies, such as Artificial Intelligence (AI) and quantum computing, on the policy and implementation of ZT. The study thoroughly examines how AI can enhance ZT by utilizing Machine Learning (ML) algorithms to analyze patterns, detect anomalies, and predict threats, thereby improving real-time decision-making processes. Furthermore, the paper demonstrates how a chaos theory-based approach, in conjunction with other technologies like eXtended Detection and Response (XDR), can effectively mitigate cyberattacks. As quantum computing presents new challenges to ZT and cybersecurity as a whole, the paper delves into the intricacies of ZT migration, automation, and orchestration, addressing the complexities associated with these aspects. Finally, the paper provides a best practice approach for the seamless implementation of ZT in organizations, laying out the proposed guidelines to facilitate organizations in their transition towards a more secure ZT model. The study aims to support organizations in successfully implementing ZT and enhancing their cybersecurity measures.