Peer reviewed · Open access
  • Deep learning-based approac...
    Kaur, Simrat; Singh, Sarbjeet; Kaushal, Sakshi

    International Journal of Cognitive Computing in Engineering, 2024, Volume 5
    Journal Article

    •An abusive language detection model that performs multi-class classification of offensive language.
    •Experimented with five deep learning models: Bi-LSTM, LSTM, Bi-GRU, GRU, and multi-dense LSTM.
    •The dataset is classified into three levels: offensive language categorization (Level A), offensive language detection (Level B), and offensive language target identification (Level C).
    •The Gated Recurrent Unit (GRU) achieved the highest accuracy for Level A (78.65 %) and Level B (88.59 %). However, for Level C, all models except the Long Short-Term Memory (LSTM) model achieved near-perfect accuracy values of 99.9 %.

    With the rapid growth of social media culture, the use of offensive or hateful language has surged, necessitating effective abusive language detection models for online platforms. This paper develops a multi-class classification model to identify different types of offensive language. The input data, in the form of labeled tweets, is classified into offensive language detection, offensive language categorization, and offensive language target identification. The data undergoes pre-processing, which removes NaN values and punctuation and performs tokenization, followed by the generation of a word cloud to assess data quality. The tf-idf technique is then used for feature selection. For classification, multiple deep learning techniques, namely bidirectional gated recurrent unit, multi-dense long short-term memory, bidirectional long short-term memory, gated recurrent unit, and long short-term memory, are applied; all models except long short-term memory achieved a high accuracy of 99.9 % for offensive language target identification. Bidirectional LSTM and multi-dense LSTM obtained the lowest loss and RMSE values of 0.01 and 0.1, respectively.
This research provides valuable insights and contributes to the development of effective abusive language detection methods to promote a safe and respectful online environment. The insights gained can aid platform administrators in efficiently moderating content and taking appropriate actions against offensive language.
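    The preprocessing and feature-extraction steps the abstract describes (dropping missing entries, stripping punctuation, tokenizing, then weighting terms with tf-idf) can be sketched as follows. This is a minimal stdlib-only illustration with hypothetical tweet data, not the paper's actual pipeline or dataset:

    ```python
    # Hedged sketch of the preprocessing + tf-idf steps described in the abstract.
    # The tweets below are invented examples; the paper's dataset differs.
    import math
    import string
    from collections import Counter

    def preprocess(tweets):
        """Drop missing entries (analogous to NaN removal), strip punctuation,
        lowercase, and tokenize on whitespace."""
        cleaned = []
        for t in tweets:
            if t is None:  # stands in for a NaN value in the raw data
                continue
            t = t.translate(str.maketrans("", "", string.punctuation))
            cleaned.append(t.lower().split())
        return cleaned

    def tf_idf(docs):
        """Plain tf-idf: term frequency scaled by log inverse document frequency."""
        n = len(docs)
        df = Counter()  # number of documents containing each term
        for doc in docs:
            df.update(set(doc))
        scores = []
        for doc in docs:
            tf = Counter(doc)
            total = len(doc)
            scores.append(
                {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
            )
        return scores

    tweets = ["You are great!", None, "You are awful, truly awful."]
    docs = preprocess(tweets)
    weights = tf_idf(docs)
    # Terms present in every document (e.g. "you") get idf = log(1) = 0,
    # so only discriminative terms carry weight into the classifier.
    ```

    In the paper these weights feed the recurrent classifiers (GRU, LSTM, and their bidirectional and multi-dense variants); a production pipeline would typically use a library implementation such as scikit-learn's `TfidfVectorizer` rather than hand-rolled scoring.
    
    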