The accessibility of online hate speech has increased significantly, making it crucial for social-media companies to prioritize efforts to curb its spread. Although deep learning models demonstrate vulnerability to adversarial attacks, whether models fine-tuned for hate speech detection exhibit similar susceptibility remains underexplored. Textual adversarial attacks involve making subtle alterations to the original samples, designed so that the resulting adversarial examples deceive the target model while remaining correctly classifiable by human observers. Although many approaches have been proposed for word-level adversarial attacks on textual data, they struggle to preserve the semantic coherence of texts while generating adversarial counterparts, and the adversarial examples they produce are often easily distinguishable by human observers. This work presents a novel methodology that uses visually confusable glyphs and invisible characters to generate semantically and visually similar adversarial examples in a black-box setting. In the context of hate speech detection, our attack was effectively applied to several state-of-the-art deep learning models fine-tuned on two benchmark datasets. The major contributions of this study are: (1) demonstrating the vulnerability of deep learning models fine-tuned for hate speech detection; (2) a novel attack framework based on a simple yet potent modification strategy; (3) superior outcomes in terms of accuracy degradation, attack success rate, average perturbation, semantic similarity, and perplexity compared to existing baselines; (4) strict adherence to prescribed linguistic constraints while formulating adversarial samples; and (5) preservation of the ground-truth label while perturbing the original input with imperceptible modifications.
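The glyph-based perturbation strategy can be illustrated with a minimal sketch. This is not the paper's code; the homoglyph map, the perturbation budget, and the function names are assumptions chosen for illustration. The idea is that the text stays visually unchanged for a human reader while its tokenization changes for the model.

```python
# Illustrative sketch of a homoglyph + invisible-character perturbation
# (assumed details, not the paper's implementation).

HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a, visually identical to Latin 'a'
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "i": "\u0456",  # Cyrillic/Ukrainian i
}
ZERO_WIDTH = "\u200b"  # zero-width space, invisible to human readers

def perturb(text: str, budget: int = 3) -> str:
    """Swap up to `budget` characters for confusable glyphs, then insert
    one invisible character, keeping the text visually unchanged."""
    out, used = [], 0
    for ch in text:
        if used < budget and ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch])
            used += 1
        else:
            out.append(ch)
    perturbed = "".join(out)
    mid = len(perturbed) // 2
    return perturbed[:mid] + ZERO_WIDTH + perturbed[mid:]

adv = perturb("hate speech example")
```

A string perturbed this way renders almost identically on screen, yet no longer matches the vocabulary entries a fine-tuned classifier was trained on, which is what drives the accuracy degradation the abstract reports.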
The aim of this paper is to explore the phenomenon of euphemisms as perceived by Polish university students in the context of the COVID-19 pandemic. The article presents an overview of the major theoretical issues related to the linguistic concept of euphemisms and their use in everyday situations during the years of the pandemic. The data for this study were collected through an online questionnaire administered in December 2022. This stage of the research was preceded by a close examination of three internet blogs in terms of COVID-related euphemistic vocabulary. The discussion of the findings is intended to reveal the students’ perspective on the use of euphemisms, including the reasons for their use and the types of situations where such language forms are employed. The results are expected to shed some light on the possible impact that euphemistic language might have on student-teacher interactions. Another significant aspect addressed in this paper is the relationship between euphemisms and language creativity.
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.
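The setup described above (n-gram features feeding an SVM) can be sketched as a short scikit-learn pipeline. This is a minimal sketch under assumed details, not the authors' code; the toy English comments and labels below are placeholders for the labelled Arabic YouTube data.

```python
# Minimal sketch of the described pipeline: TF-IDF weighted word n-grams
# feeding a linear SVM. Toy data stands in for the labelled Arabic corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = [
    "you are awful",
    "such a terrible idiot",
    "what a lovely video",
    "great content, thanks",
]
labels = [1, 1, 0, 0]  # 1 = offensive, 0 = inoffensive

# Word unigrams + bigrams, mirroring the N-gram feature combinations explored.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(comments, labels)
```

In practice the paper's reported gains come from the choice of pre-processing and feature combinations rather than the classifier itself, so the `ngram_range` and tokenization settings are the parameters one would vary.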
•An abusive language detection model that performs multiclass classification of offensive language.
•Experimented with five deep learning models: Bi-LSTM, LSTM, Bi-GRU, GRU, and multi-dense LSTM.
•The dataset is classified into three levels: offensive language categorization (Level A), offensive language detection (Level B), and offensive language target identification (Level C).
•The Gated Recurrent Unit (GRU) achieved the highest accuracy for Level A (78.65%) and Level B (88.59%); for Level C, all models except the Long Short-Term Memory (LSTM) model achieved near-perfect accuracy of 99.9%.
With the rapid growth of social media culture, the use of offensive or hateful language has surged, which necessitates the development of effective abusive language detection models for online platforms. This paper focuses on developing a multi-class classification model to identify different types of offensive language. The input data is taken in the form of labeled tweets and is classified into offensive language detection, offensive language categorization, and offensive language target identification. The data undergoes pre-processing, which removes NaN values and punctuation and performs tokenization, followed by the generation of a word cloud to assess data quality. Further, the TF-IDF technique is used for feature selection. For classification, multiple deep learning techniques, namely bidirectional gated recurrent unit, multi-dense long short-term memory, bidirectional long short-term memory, gated recurrent unit, and long short-term memory, are applied; all models except long short-term memory achieved a high accuracy of 99.9% for offensive language target identification. Bidirectional LSTM and multi-dense LSTM obtained the lowest loss and RMSE values of 0.01 and 0.1, respectively. This research provides valuable insights and contributes to the development of effective abusive language detection methods to promote a safe and respectful online environment. The insights gained can aid platform administrators in efficiently moderating content and taking appropriate actions against offensive language.
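The pre-processing steps the abstract lists (dropping NaN values, removing punctuation, tokenization) can be sketched in a few lines. This is a rough illustration with assumed details, not the paper's code.

```python
# Sketch of the described pre-processing: drop missing tweets,
# strip punctuation, lowercase, and tokenize.
import string

def preprocess(tweets):
    """Drop None/NaN entries, remove punctuation, lowercase, tokenize."""
    cleaned = []
    for t in tweets:
        # NaN is the only value not equal to itself, so t != t flags it.
        if t is None or (isinstance(t, float) and t != t):
            continue
        t = t.translate(str.maketrans("", "", string.punctuation)).lower()
        cleaned.append(t.split())
    return cleaned

tokens = preprocess(["You're OFFENSIVE!!!", None, float("nan"), "be kind."])
# tokens == [["youre", "offensive"], ["be", "kind"]]
```

The resulting token lists would then feed the TF-IDF vectorizer and, downstream, the GRU/LSTM classifiers the paper compares.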
Offensive communications have made their way into social media posts. One of the most effective ways to deal with this problem is to use computational methods to distinguish objectionable content. This research proposes MOLD_DL (Multilingual Offensive Language Detection using deep learning), which applies deep learning and natural language processing techniques for feature selection and classification. The dataset was collected from YouTube, Twitter, and Facebook and pre-processed for noise removal, filtering, stop-word removal, and segmentation. Feature selection on the segmented data was carried out using a fuzzy-based convolutional neural network (FCNN). Extraction of the selected features and classification were then carried out using an ensemble architecture of a Bi-LSTM model with a Naïve Bayes architecture hybridised with Support Vector Machines (SVM). Offensive language detection is classified automatically based on the emotions of the text. The experimental analysis of the YouTube, Twitter, and Facebook datasets yielded an accuracy of 98%, precision of 95%, recall of 90%, an F1-score of 92.5%, and an RMSE of 45%, with a confusion matrix for detecting offensive text in various languages.
•Offensive communications have made their way into social media posts.
•This research proposes MOLD_DL (Multilingual Offensive Language Detection using deep learning), combining deep learning and natural language processing techniques for feature selection and classification.
•The dataset has been collected from social networks.
•The extracted features have been classified using an ensemble architecture of a Bi-LSTM model with Support Vector Machines (SVM) hybridised with a Naïve Bayes architecture.
The aim of the current research is to investigate the feasibility of identifying offensive language in Lithuanian by utilising the Simplified Offensive Language Taxonomy (SOLT). The key principle behind this taxonomy is its ability to complement existing offensive language ontologies and tagset systems, with the ultimate goal of integrating it into publicly accessible Linguistic Linked Open Data (LLOD) resources. The dataset used in the current study is a publicly available corpus of user-generated comments collected from a Lithuanian portal (Amilevičius et al. 2016). The study identified that offensive language predominantly focuses on collective derogatory language rather than individuals. The most common category of offensive language is related to physical and mental disabilities, followed by ideological offenses, xenophobic and sexist remarks, and less frequent categories like ageism, classism, homophobia, and religious discrimination. These results highlight the diverse range of offensive language online and underscore the need to combat discrimination and promote respectful discourse, particularly concerning marginalised groups.
The goal of the paper is to present a Simplified Offensive Language (SOL) Taxonomy, together with its application and testing in the Second Annotation Campaign conducted between March and May 2023 on four languages: English, Czech, Lithuanian, and Polish, to be verified and located in LLOD. Building on the previous offensive language taxonomic models proposed mostly by the same COST Action Nexus Linguarum WG 4.1.1 team, the number and variety of the categories underwent definitional revision, and the present typology was tested through annotation of publicly available offensive language datasets in each of the four languages. The results of the annotation are presented, and as they fall within the accepted statistical values for inter-annotator agreement on the SOL categories and their aspects, we propose this taxonomy as a core ontology representing the encoding of the supported offensive languages and justify its use on new data in terms of a more universal Linguistic Linked Open Data (LLOD) schema.
A considerable body of research deals with the automatic identification of hate speech and related phenomena. However, cross-dataset model generalization remains a challenge. In this context, we address two still open central questions: (i) to what extent does the generalization depend on the model and the composition and annotation of the training data in terms of different categories?, and (ii) do specific features of the datasets or models influence the generalization potential? To answer (i), we experiment with BERT, ALBERT, fastText, and SVM models trained on nine common public English datasets, whose class (or category) labels are standardized (and thus made comparable), in intra- and cross-dataset setups. The experiments show that indeed the generalization varies from model to model and that some of the categories (e.g., ‘toxic’, ‘abusive’, or ‘offensive’) serve better as cross-dataset training categories than others (e.g., ‘hate speech’). To answer (ii), we use a Random Forest model for assessing the relevance of different model and dataset features during the prediction of the performance of 450 BERT, 450 ALBERT, 450 fastText, and 348 SVM binary abusive language classifiers (1698 in total). We find that in order to generalize well, a model already needs to perform well in an intra-dataset scenario. Furthermore, we find that some other parameters are equally decisive for the success of the generalization, including, e.g., the training and target categories and the percentage of the out-of-domain vocabulary.
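Step (ii), using a Random Forest to assess which model and dataset features predict generalization performance, can be sketched as follows. This is a hypothetical illustration, not the paper's setup: the two features (intra-dataset F1 and out-of-domain vocabulary share) and the synthetic target are assumptions standing in for the 1698 real classifier results.

```python
# Hypothetical sketch: a Random Forest regressor scoring how features of a
# classifier/dataset pair predict cross-dataset performance, then reading
# off feature importances. Synthetic data stands in for the real results.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# columns: [intra-dataset F1, fraction of out-of-domain vocabulary]
X = rng.uniform(size=(200, 2))
# assumed relationship: generalization tracks intra-dataset performance
# and is penalised by out-of-domain vocabulary
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.01, size=200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importance = rf.feature_importances_  # relevance of each feature
```

Under this toy relationship the importance of intra-dataset F1 dominates, which mirrors the paper's finding that good intra-dataset performance is a precondition for generalization.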
•Cross-dataset model generalization for abusive language.
•Generalization of BERT, ALBERT and fastText models with respect to abusive language datasets.
•Experiments covering nine widely used public abusive speech datasets.
•Prediction of generalization by applying a random forest model.
Due to an exponential increase in Internet use by people from different countries and educational backgrounds, offensive online language detection has become a significant task in natural language processing. Considering the major negative impact of this type of content on young users, detecting online toxic language to protect users’ online safety has become an urgent issue. The project has two main goals: (1) developing an annotated corpus of offensive content for the Romanian language and (2) testing various machine learning algorithms to identify the best approach. The proposed methods achieve accuracy a few percentage points above the current SoTA.
Offensive language is one of the problems that have become increasingly severe along with the rise of the internet and social media usage. This language can be used to attack a person or specific groups. Automatic moderation, such as the usage of machine learning, can help detect and filter this particular language for those who need it. This study focuses on improving the performance of the soft voting classifier to detect offensive language by experimenting with combinations of the soft voting estimators. The model was applied to a Twitter dataset that was augmented using several augmentation techniques. The features were extracted using Term Frequency-Inverse Document Frequency, sentiment analysis, and GloVe embedding. In this study, there were two types of soft voting models: machine learning-based, with the estimators of Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and AdaBoost as the best combination, and deep learning-based, with the best estimator combination of Convolutional Neural Network, Bidirectional Long Short-Term Memory, and Bidirectional Gated Recurrent Unit. The results of this study show that the soft voting classifier was better in performance compared to classic machine learning and deep learning models on both original and augmented datasets.
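The machine learning-based soft voting ensemble named above can be sketched with scikit-learn's `VotingClassifier`. This is a minimal sketch under assumed data: the synthetic features below stand in for the TF-IDF, sentiment, and GloVe features the study extracts from tweets.

```python
# Minimal sketch of the best machine-learning estimator combination reported:
# RF + DT + LR + NB + AdaBoost under soft (probability-averaging) voting.
# Synthetic data stands in for the study's tweet features.
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=1)),
        ("dt", DecisionTreeClassifier(random_state=1)),
        ("lr", LogisticRegression()),
        ("nb", GaussianNB()),
        ("ab", AdaBoostClassifier(random_state=1)),
    ],
    voting="soft",  # average predicted probabilities rather than hard labels
).fit(X, y)
acc = vote.score(X, y)
```

Soft voting averages each estimator's predicted class probabilities, so a confident estimator can outweigh several uncertain ones; this is the property the study exploits when searching for the best estimator combination.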