Sarcasm means saying the opposite of what one intends to express, often to insult a person. Sarcasm detection in social networks (SNs) such as Twitter is a significant task because it assists in studying tweets using natural language processing (NLP). Many existing methods focus only on content-based features of sarcastic words and leave out lexical and context-based features, which loses the semantics of terms in a sarcastic expression. This study proposes an improved model to detect sarcasm in SNs. We engineer three feature sets: context-based, sarcastic-based, and lexical-based features. The model relies on two novel algorithms applied in two stages to deal with data from SNs: the first algorithm handles preprocessing and the second builds the feature sets. We apply several supervised machine learning (ML) classifiers, namely k-nearest neighbor (KNN), naïve Bayes (NB), support vector machine (SVM), and random forest (RF), to a TF-IDF representation of the data. We evaluate sarcasm detection performance in terms of precision, accuracy, recall, and F1 score, expressed as percentages. With lexical features alone, KNN achieves 89.19% accuracy, higher than the other classifiers. Combining two feature sets (sarcastic and lexical) gives a slight improvement with the same KNN classifier, at 90.00% accuracy. Combining all three feature sets (sarcastic, lexical, and context) gives a further slight improvement, with KNN reaching 90.51% accuracy. To observe the effect of the three feature sets, we run the experiments first with individual feature sets, then with two combined, and finally with all three combined. Combining all feature sets yields the best accuracy with the KNN classifier.
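The TF-IDF plus KNN pipeline named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tweets, the labels, the smoothed IDF formula, the cosine similarity measure, and all function names (fit_idf, tfidf, knn_predict, classify) are assumptions made for illustration, and the three engineered feature sets are not reproduced.

```python
import math
from collections import Counter

def fit_idf(token_docs):
    """Smoothed inverse document frequency for each term in the training set."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def cosine(a, b):
    """Dot product of two normalised sparse vectors, i.e. their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Hypothetical toy data, not taken from the study.
TRAIN = [
    ("oh great another monday", "sarcastic"),
    ("yeah right i love waiting in traffic", "sarcastic"),
    ("wow great service as always", "sarcastic"),
    ("the weather is nice today", "not_sarcastic"),
    ("i enjoyed the concert last night", "not_sarcastic"),
    ("lunch with friends was fun", "not_sarcastic"),
]
docs = [text.split() for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
train_vecs = [tfidf(d, idf) for d in docs]

def classify(text, k=3):
    """Classify a raw text with the toy TF-IDF + KNN model built above."""
    return knn_predict(tfidf(text.lower().split(), idf), train_vecs, labels, k)
```

Calling classify("oh great i love traffic") votes among the three nearest training tweets. In a setup closer to the one described, the TF-IDF vector would be concatenated with the engineered feature sets before classification.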
In this paper, the model output machine learning (MOML) method is proposed for simulating weather consultation, which can improve the forecast results of numerical weather prediction (NWP). During weather consultation, the forecasters obtain the final results by combining the observations with the NWP results and giving opinions based on their experience. It is obvious that using a suitable post-processing algorithm for simulating weather consultation is an interesting and important topic. MOML is a post-processing method based on machine learning, which matches NWP forecasts against observations through a regression function. By adopting different feature engineering of datasets and training periods, the observational and model data can be processed into the corresponding training set and test set. The MOML regression function uses an existing machine learning algorithm with the processed dataset to revise the output of NWP models combined with the observations, so as to improve the results of weather forecasts. To test the new approach for grid temperature forecasts, the 2-m surface air temperature in the Beijing area from the ECMWF model is used. MOML with different feature engineering is compared against the ECMWF model and modified model output statistics (MOS) method. MOML shows a better numerical performance than the ECMWF model and MOS, especially for winter. The results of MOML with a linear algorithm, running training period, and dataset using spatial interpolation ideas, are better than others when the forecast time is within a few days. The results of MOML with the Random Forest algorithm, year-round training period, and dataset containing surrounding gridpoint information, are better when the forecast time is longer.
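The idea of a regression function that matches forecasts against observations over a running training period can be sketched as follows. This is a deliberately reduced illustration, assuming a single predictor (the raw forecast at one gridpoint) and ordinary least squares; the function names fit_line and running_correction and the window length are hypothetical, and the spatial features and Random Forest variant described in the abstract are not modelled.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Constant forecasts in the window: fall back to a pure bias correction.
        return 1.0, mean_y - mean_x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def running_correction(forecasts, observations, window):
    """Correct each forecast with a line fitted on the previous `window` days."""
    corrected = []
    for t, forecast in enumerate(forecasts):
        if t < window:
            # Not enough history yet: pass the raw forecast through unchanged.
            corrected.append(forecast)
            continue
        a, b = fit_line(forecasts[t - window:t], observations[t - window:t])
        corrected.append(a * forecast + b)
    return corrected
```

Only data from days before t enter the fit for day t, so the correction never uses the observation it is trying to predict.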
Machining is a crucial constituent of the manufacturing industry, which has begun to transition from precision machinery to smart machinery. In particular, the introduction of artificial intelligence into computer numerically controlled (CNC) machine tools will enable them to self-diagnose during operation, improving the quality of finished products. In this study, feature engineering (FE) and principal component analysis (PCA) were combined with an online, real-time Gaussian mixture model (GMM) based on the Kullback-Leibler divergence (KLD) measure to monitor changes in manufacturing parameters in real time. Based on vibration signals from an attached accelerometer and current sensing of the spindle, the developed unsupervised GMM learning was successfully used to diagnose spindle speed changes of a CNC machine tool during milling. The improved experimental F1-scores for the X, Y, and Z axes were 0.95, 0.88, and 0.93, respectively. The established FE-PCA-GMM/KLD method can issue warnings when it predicts a change in a manufacturing process parameter, and a smart sensing device for diagnosing the machining status can be fabricated to implement it. The effectiveness of the developed method in detecting manufacturing parameter changes was verified experimentally.
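The role of the Kullback-Leibler divergence as a change indicator can be shown with a simplified sketch. It assumes a single univariate Gaussian per signal window, for which the divergence has a closed form, instead of the Gaussian mixture model used in the study (mixtures have no closed-form KLD). The function names, the variance floor, and the threshold are hypothetical.

```python
import math
from statistics import fmean, pvariance

def gaussian_kl(mu0, var0, mu1, var1):
    """KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def detect_change(reference, window, threshold, var_floor=1e-12):
    """Flag a change when the divergence between the two windows exceeds threshold."""
    mu0, mu1 = fmean(reference), fmean(window)
    # Floor the variances so that constant windows do not cause a division by zero.
    var0 = max(pvariance(reference), var_floor)
    var1 = max(pvariance(window), var_floor)
    return gaussian_kl(mu0, var0, mu1, var1) > threshold
```

In an online setting, the reference statistics would be updated as new windows arrive, and a warning would be issued whenever detect_change returns True.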
The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread hate-speech to intimidate individuals or groups because of their gender, ethnicity, or sexual orientation. The consequences of hate-speech are devastating, causing severe depression and even leading people to commit suicide. Hate-speech identification is challenging because the large volume of daily publications makes it impossible to review every comment by hand. Moreover, hate-speech is also spread by hoaxes that require language and context understanding. With the aim of reducing the number of comments that should be reviewed by experts, or even of developing autonomous systems, the automatic identification of hate-speech has gained academic relevance. However, the reliability of automatic approaches is still limited, specifically in languages other than English, in which some of the state-of-the-art techniques have not been analyzed in detail. In this work, we examine which features are most effective in identifying hate-speech in Spanish and how these features can be combined to develop more accurate systems. In addition, we characterize the language present in each type of hate-speech by means of explainable linguistic features and compare our results with state-of-the-art approaches. Our research indicates that combining linguistic features and transformers by means of knowledge integration outperforms current solutions for hate-speech identification in Spanish.
Accurate detection and localization of mechanical discontinuities are essential for industries dependent on natural, synthetic, and composite materials, e.g., the construction, aerospace, oil and gas, ceramics, metal, and geothermal industries. In this study, a physics-informed machine learning workflow is developed for detecting and locating a single, linear mechanical discontinuity in a homogeneous 2D material by processing the full waveforms recorded during multi-point compressional/shear transmission measurements. This work is based on fundamental aspects of wave propagation simulation, signal processing, feature engineering, and data-driven model evaluation. The k-Wave simulator is used to model compressional and shear wave transmission through a 2D numerical model of a material containing a single mechanical discontinuity. For a specific source-sensor configuration, the newly developed data-driven workflow can detect and locate the mechanical discontinuity with a coefficient of determination higher than 0.9. An AdaBoost regressor with k-Nearest Neighbor as the base estimator significantly outperforms all other models. In terms of sensitivity to noise, k-Nearest Neighbor is the most robust to both Gaussian and uniformly distributed noise.
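The coefficient of determination used as the accuracy measure above has a standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. A short sketch follows; the function name r2_score and the example values are illustrative only and are not data from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predicted discontinuity parameters match the true ones exactly, while a model that always predicts the mean of the true values scores 0.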
Several events in recent years have changed, to some extent, the common understanding of the electricity day-ahead market (DAM). The shape of the electricity price curve has been altered as some factors that underpinned the electricity price forecast (EPF) lost their importance and new influential factors emerged. In this paper, we aim to showcase the changes in EPF, understand the effects of uncertainties, and propose a forecasting method using machine learning (ML) algorithms to cope with random events such as the COVID-19 pandemic and the conflict in the Black Sea region. By adjusting the training period according to the standard deviation, which reflects price volatility, applying feature engineering, and using two regressors to weight the results, significant improvements in EPF performance are achieved. One of the contributions of the proposed method consists in adjusting the training period according to the price variation. Thus, we introduce a rule-based approach built on the empirical observation that, for days with a higher growth in prices, the training interval should be shortened to capture the sharp variations of prices. The results of several cutting-edge ML algorithms are the input to a predictive meta-model that produces the best forecasting solution. The input dataset spans from Jan. 2019 to Aug. 2022, so the proposed EPF method is tested on both stable and more tumultuous intervals, which demonstrates its robustness. This analysis provides decision makers with an understanding of the price trends and suggests measures to combat spikes. Numerical findings indicate that, on average, mean absolute error (MAE) improved by 48% and root mean squared error (RMSE) improved by 44% compared to the baseline model (without feature engineering or training-period adjustment). When the output of the ML algorithms is weighted using the proposed meta-model, MAE further improved by 2.3% in 2020 and 5.14% in 2022. Fewer errors are recorded in stable years like 2019 and 2020 (MAE = 6.71, RMSE = 14.67) compared to 2021 and 2022 (MAE = 9.45, RMSE = 20.64).
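The rule that shortens the training interval when prices are volatile, together with the two reported error metrics, can be sketched as below. The abstract does not give the actual rule, so the inverse scaling and the parameters base_days, min_days, and calm_std are hypothetical values chosen for illustration; only the direction of the rule (higher volatility, shorter window) comes from the text.

```python
import math
from statistics import pstdev

def training_window(recent_prices, base_days=60, min_days=14, calm_std=10.0):
    """Shrink the training window as the standard deviation of recent prices grows."""
    std = pstdev(recent_prices)
    if std <= calm_std:
        return base_days
    # Scale the window inversely with volatility, but never go below min_days.
    return max(min_days, round(base_days * calm_std / std))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares the residuals, it penalises price spikes more heavily than MAE, which is consistent with RMSE being roughly twice the MAE in the figures reported above.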
Digital recognition of meters aims to identify numbers in complex environments. Existing methods depend on deep networks supported by high-quality, large-scale data and feature extraction, and they are not effective on low-quality, small-scale digit datasets. Moreover, occlusion in digit images obstructs the extraction of fixed features. Multi-Classifier under Feature Engineering (MC-FE) is proposed to solve these problems. Specifically, MC-FE builds a feature library containing a variety of mainstream features and directly selects the optimal combination of distinguishing features for the current dataset, rather than using fixed features. In addition, ten regression machines integrated with support vector machines are adopted, and each regression machine determines the probability of its respective digit from 0 to 9. Because locating the meter is the premise of accurate identification, a multilayer kernel regression positioning (ML-KRP) method is designed to increase the accuracy of meter identification. Experiments on several digit recognition datasets show that MC-FE and ML-KRP outperform the state of the art in digit recognition under this small-scale setting.
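The decision step with ten per-digit regression machines can be sketched as picking the digit whose machine returns the highest probability. The sketch below assumes each machine is simply a callable returning a score; the toy scorers and the function name recognise_digit are hypothetical stand-ins, not the SVM-based regressors or the feature library of MC-FE.

```python
def recognise_digit(features, digit_models):
    """Return the digit whose model scores highest, together with all ten scores."""
    if len(digit_models) != 10:
        raise ValueError("expected one model per digit 0-9")
    scores = [model(features) for model in digit_models]
    best_digit = max(range(10), key=scores.__getitem__)
    return best_digit, scores

# Hypothetical stand-in scorers: each peaks when the first feature equals its digit.
toy_models = [lambda f, d=d: 1.0 / (1.0 + abs(f[0] - d)) for d in range(10)]
```

With these toy scorers, recognise_digit([7.2], toy_models) selects the scorer for digit 7. In practice, each scorer would be trained one-versus-rest on the selected feature combination.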
The agile earth observation satellite scheduling problem (AEOSSP) with time-dependent transition times is a complex combinatorial optimization problem that has emerged from the development of large-scale satellite management techniques. To address this problem, we propose a deep reinforcement learning-based construction model (DRL-CM) that consists of five parts: 1) a Markov decision process (MDP); 2) feature engineering; 3) a constructive heuristic neural network (CHNN); 4) an RL training method; and 5) an evaluation system. Specifically, the CHNN comprises six modules containing three special components that we propose: a dynamic encoder, a dynamic global layer, and a two-stage attention layer. First, we build the MDP of the AEOSSP and engineer the effective features required for decision-making. Second, we design the CHNN to function as the MDP policy and train it with an RL model. Finally, we propose a comprehensive evaluation system for the validation of our model. The experimental results indicate that the proposed DRL-CM outperforms the state-of-the-art algorithm in terms of both optimization speed and quality. In addition, the feature engineering and network architecture built in our model are verified to be effective in comprehensive experiments.