Natural disasters, such as pandemics and earthquakes, are among the main causes of distress and casualties. Governmental crisis management processes are crucial for dealing with these types of problems. Social media platforms are among the main sources of information about current events and public opinion, so they have been used extensively to aid disaster detection and prevention efforts. Consequently, there is a continuing need for better automatic systems that can detect and classify disaster-related social media data. In this work, we propose enhanced Arabic disaster data classification models. The suggested models use domain adaptation to provide state-of-the-art accuracy. We tested the proposed models on a standard dataset of Arabic disaster data collected from Twitter. Experimental results show that the proposed models significantly outperform the previous state-of-the-art results.
The importance of online handwriting recognition has increased rapidly in recent years due to technological advances in handheld devices and communication software with handwriting interfaces. Deep learning end-to-end (E2E) models have provided high recognition rates in online handwriting recognition systems. However, attaining even higher performance requires supplementing these models with adaptation techniques that cater to individual penmanship. This study proposes a writer adaptation technique for Arabic online handwriting recognition systems that employs adversarial Multi-Task Learning (MTL). Adversarial training and MTL modify the deep-feature distribution of the Writer Dependent (WD) model so that its output closely resembles that of the Writer Independent (WI) model. The proposed method entails two tasks, label classification (the primary task) and model feature discrimination (the secondary task), and was designed to jointly optimize both sub-networks. The proposed technique was tested on an E2E Connectionist Temporal Classification (CTC)-based model that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM). The proposed models were trained and evaluated on two large datasets, Online-KHATT and CHAW. In supervised adaptation, the method achieved an absolute Character Error Rate (CER) reduction of up to 1.83% and an absolute Word Error Rate (WER) reduction of 11.71% over the WI model. Additionally, supervised adaptation achieved an absolute CER reduction of up to 0.84% and an absolute WER reduction of 6.77% over the fine-tuned model. In unsupervised adaptation, the proposed method achieved an absolute CER reduction of up to 0.5% and an absolute WER reduction of 1.74% over the WI model.
Our experimental results indicate that the proposed supervised writer adaptation can achieve significant improvements in recognition accuracy compared with the baseline WI and fine-tuned models.
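CER and WER are conventionally computed as the Levenshtein edit distance between the recognized and reference transcriptions, divided by the reference length in characters or words. The sketch below illustrates these metrics and what an absolute (percentage-point) reduction means. It assumes whitespace tokenization for words and Unicode code points as characters, and it is not the evaluation code used in the study; all function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (costs 0 on a match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits / number of reference words (whitespace split)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def absolute_reduction(baseline_rate: float, adapted_rate: float) -> float:
    """Absolute reduction in percentage points, for rates given as fractions."""
    return 100.0 * (baseline_rate - adapted_rate)


if __name__ == "__main__":
    print(cer("كتاب", "كتب"))                 # one deletion over 4 characters
    print(wer("the cat sat", "the cat sit"))  # one substitution over 3 words
```

An absolute reduction is a plain difference of rates; a relative reduction would instead divide that difference by the baseline rate.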
This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) through dataset and OCR services. Accordingly, we discuss different services that support document analysis, retrieval, and processing, including dataset preparation and recognition. The ADIR services provide general Arabic OCR functions from which many other services in the OCR domain can be composed. Furthermore, the proposed work provides access to different document layout analysis methods through a platform where users can share and use such methods (services) without any setup requirements. One of the datasets used consists of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter, from Alif to Ya, 10 times on two forms. The forms were scanned at 300 DPI and segmented into two sets: a training set of 13,440 letters (480 images per class label) and a testing set of 3,360 letters (120 images per class label). A convolutional neural network (CNN) is adapted for Arabic handwritten letter classification. In an experimental test, our approach reached a 100% classification accuracy rate on the testing images. The ADIR services also provide a "service description", which includes an interface and the server's URL; the interface enables communication between clients and services. In addition, this article evaluates the IR results by comparing them with their corrected equivalents.
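The dataset figures can be cross-checked with simple arithmetic. This is a minimal sketch, assuming 28 letter classes (Alif to Ya) and a class-balanced split. The writer count per split is inferred under the further assumption that each writer's samples fall wholly in one split; the abstract does not state it.

```python
NUM_CLASSES = 28   # Arabic letters from Alif to Ya
WRITERS = 60
REPETITIONS = 10   # each writer wrote each letter 10 times

TOTAL = NUM_CLASSES * WRITERS * REPETITIONS  # total letter images
TRAIN, TEST = 13_440, 3_360


def per_class(n_images: int) -> int:
    """Images per class label, assuming the split is balanced across classes."""
    q, r = divmod(n_images, NUM_CLASSES)
    if r:
        raise ValueError("split is not balanced across classes")
    return q


def implied_writers(n_images: int) -> int:
    """Writers implied by a split, assuming each writer falls wholly in one split."""
    q, r = divmod(n_images, NUM_CLASSES * REPETITIONS)
    if r:
        raise ValueError("split does not contain whole writers")
    return q


if __name__ == "__main__":
    print(TOTAL == TRAIN + TEST)                          # True
    print(per_class(TRAIN), per_class(TEST))              # 480 120
    print(implied_writers(TRAIN), implied_writers(TEST))  # 48 12
```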
Social media posts are increasingly used in modern disaster management. Along with the textual information, the contexts and cues inherent in images posted on social media play an important role in identifying appropriate emergency responses to a particular disaster. In this paper, we proposed a disaster taxonomy of emergency response and used it, together with an emergency response pipeline and deep-learning-based image classification and object identification algorithms, to automate the emergency response decision-making process. We used the card sorting method to validate the completeness and correctness of the disaster taxonomy. We also used the VGG-16 and You Only Look Once (YOLO) algorithms to analyze disaster-related images and identify disaster types and relevant cues (such as objects that appear in those images). Furthermore, using decision tables and the analytic hierarchy process (AHP), we aligned the intermediate outputs to map a disaster-related image onto the disaster taxonomy and determine an appropriate type of emergency response for a given disaster. The proposed approach was validated using earthquakes, hurricanes, and typhoons as use cases. The results show that 96% of images were categorized correctly on the disaster taxonomy using YOLOv4. The accuracy can be further improved using an incremental training approach. Because it uses cloud-based deep learning algorithms for image analysis, our approach could be useful for real-time crisis management. The algorithms, along with the proposed emergency response pipeline, can be further enhanced with other spatiotemporal features extracted from multimedia information posted on social media.
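The AHP step can be illustrated with a minimal sketch: priority weights are derived from a reciprocal pairwise comparison matrix as its normalized principal eigenvector (computed here by power iteration), and Saaty's consistency ratio checks whether the judgments are coherent. The criteria and judgment values below are hypothetical; the abstract does not state which criteria, scales, or decision tables the authors used.

```python
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random consistency index


def ahp_weights(matrix, iterations=100):
    """Priority vector: principal eigenvector of the pairwise matrix, by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # normalize so the weights sum to 1
    return w


def consistency_ratio(matrix, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is usually accepted."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical criteria: visible casualties, structural damage, flooding.
    # Entry [i][j] states how much more important criterion i is than criterion j.
    pairwise = [
        [1.0, 3.0, 5.0],
        [1 / 3, 1.0, 3.0],
        [1 / 5, 1 / 3, 1.0],
    ]
    weights = ahp_weights(pairwise)
    print([round(x, 3) for x in weights], consistency_ratio(pairwise, weights))
```

In a full pipeline, such weights would score candidate emergency responses against the cues detected in an image; how the paper combines them with its decision tables is not described in the abstract.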
Semantic Textual Similarity (STS) is the task of identifying the semantic correlation between two sentences in the same or different languages. STS is an important task in natural language processing because it has many applications in domains such as information retrieval, machine translation, plagiarism detection, document categorization, semantic search, and conversational systems. The availability of STS training and evaluation data for some languages, such as English, has led to high-performing systems that achieve above 80% correlation with human judgment. Unfortunately, such STS data resources are not available for many languages, including Arabic. To overcome this challenge, this paper proposes three approaches to building effective Arabic STS models. The first evaluates automatic machine translation of English STS data into Arabic for use in fine-tuning. The second interleaves Arabic models with English data resources. The third fine-tunes knowledge-distillation-based models to boost their performance on Arabic using a proposed translated dataset. With very limited resources, consisting of just a few hundred Arabic STS sentence pairs, we achieved a correlation score of 81%, evaluated on the standard STS 2017 Arabic evaluation set. We also extended the Arabic models to two local dialects, Egyptian (EG) and Saudi Arabian (SA), with correlation scores of 77.5% for the EG dialect and 76% for the SA dialect, evaluated using dialectal conversions of the same standard STS 2017 Arabic set.
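STS systems are typically evaluated by correlating predicted similarity scores with human judgments; the SemEval STS 2017 task reported the Pearson correlation coefficient, so a correlation of 81% corresponds to r = 0.81. This is a minimal sketch, assuming a model that scores each pair by the cosine similarity of its sentence embeddings; the embeddings and gold scores below are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation coefficient between predicted and gold scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical embeddings for three sentence pairs, with gold scores on a 0-5 scale.
    pairs = [([1.0, 0.2], [0.9, 0.3]), ([1.0, 0.0], [0.0, 1.0]), ([0.5, 0.5], [0.4, 0.6])]
    gold = [4.5, 0.5, 4.0]
    predicted = [cosine(a, b) for a, b in pairs]
    print(f"Pearson r = {pearson(predicted, gold):.2f}")
```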
Machine translation for low-resource languages poses significant challenges, primarily due to the limited availability of data. In recent years, unsupervised learning has emerged as a promising way to overcome this issue by learning translations between languages without depending on parallel data. A wide range of methods has been proposed in the literature to address this complex problem. This paper presents an in-depth investigation of semi-supervised neural machine translation, focusing specifically on translating Arabic dialects, particularly Egyptian, into Modern Standard Arabic. The study employs two distinct datasets: a parallel dataset containing aligned sentences in both dialects, and a monolingual dataset in which the source dialect is not directly connected to the target language in the training data. Three translation systems are explored. The first is an attention-based sequence-to-sequence model that benefits from the vocabulary shared between the Egyptian dialect and Modern Standard Arabic to learn word embeddings. The second is an unsupervised transformer model that relies solely on monolingual data, without any parallel data. The third system starts with an initial supervised learning phase on the parallel dataset and then incorporates the monolingual data during training.
Deep learning based fusion strategies for personality prediction
El-Demerdash, Kamal; El-Khoribi, Reda A.; Ismail Shoman, Mahmoud A. ...
Egyptian Informatics Journal, Volume 23, Issue 1, March 2022
Journal article, peer reviewed, open access
Automated personality trait detection from text has emerged as a topic of considerable attention in affective computing and sentiment analysis. Most previous work has focused on feature engineering, such as linguistic styles and psycholinguistic databases that correlate with personality. Recently, natural language processing has been significantly affected by transfer learning based on feature extraction and on fine-tuning pre-trained language models. We propose a new deep learning-based model for personality prediction and classification that uses both data-level and classifier-level fusion. The model benefits from transfer learning in natural language processing through the leading pre-trained language models ELMo, ULMFiT, and BERT. The results demonstrate that the introduced method is a promising personality prediction model. Evaluation results show competitive and significant accuracy improvements of about 1.25% and 3.12% over the most recent results on the two gold-standard personality detection datasets, Essays and myPersonality.
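The two fusion levels can be sketched under the assumption that data-level fusion concatenates the feature vectors produced by the different pre-trained encoders, and classifier-level fusion averages the class probabilities of separately trained classifiers (soft voting). These are common instances of each fusion level, not necessarily the exact strategies used in the paper; all probabilities shown are hypothetical.

```python
from typing import List, Optional, Sequence


def feature_fusion(*features: Sequence[float]) -> List[float]:
    """Data-level fusion sketch: concatenate feature vectors from several encoders."""
    fused: List[float] = []
    for f in features:
        fused.extend(f)
    return fused


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Classifier-level fusion sketch: weighted mean of per-model class probabilities."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [sum(w * p[c] for w, p in zip(weights, probabilities)) / total
            for c in range(n_classes)]


if __name__ == "__main__":
    # Hypothetical [low, high] probabilities for one trait from three classifiers
    # built on ELMo, ULMFiT, and BERT, respectively.
    elmo, ulmfit, bert = [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]
    fused = soft_vote([elmo, ulmfit, bert])
    print(fused, "-> predicted class", fused.index(max(fused)))
```

Here two of the three classifiers lean toward class 0, yet the averaged probabilities favor class 1 because the ULMFiT-based classifier is more confident; a majority (hard) vote would have chosen class 0.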
Analytical approaches in Optical Character Recognition (OCR) systems can suffer from a significant number of segmentation errors, especially with cursive languages such as Arabic, where characters frequently overlap. Holistic approaches, which treat whole words as single units, were introduced as an effective way to avoid such segmentation errors. However, the main challenge for these approaches is their computational complexity, especially in large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. Using global word-level Discrete Cosine Transform (DCT) features in combination with local block-based features, our proposed approach generalizes to new font sizes that were not included in the training data. Evaluation results on different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems.
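A global DCT feature vector for a word image can be sketched as the low-frequency corner of its 2-D DCT-II. The abstract does not specify the image size, the number of retained coefficients, or their ordering (a zigzag scan is another common choice), so this sketch assumes an already size-normalized grayscale image and keeps the top-left k-by-k coefficients. The orthonormal scaling preserves the image energy; `dct_features` is a hypothetical helper name.

```python
import math


def dct2(image):
    """Orthonormal 2-D DCT-II of a rectangular grid of pixel values (direct O(N^4) form)."""
    rows, cols = len(image), len(image[0])

    def alpha(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0.0
            for x in range(rows):
                for y in range(cols):
                    s += (image[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out[u][v] = alpha(u, rows) * alpha(v, cols) * s
    return out


def dct_features(image, k=4):
    """Keep the k-by-k lowest-frequency coefficients as a flat feature vector."""
    if k > min(len(image), len(image[0])):
        raise ValueError("k exceeds the image size")
    coeffs = dct2(image)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


if __name__ == "__main__":
    # Hypothetical 8x8 binary "word image": a vertical stroke in columns 2-5.
    word = [[1.0 if 2 <= y <= 5 else 0.0 for y in range(8)] for _ in range(8)]
    print([round(c, 3) for c in dct_features(word, k=2)])
```

Vectors like these could also feed the word-shape clustering used for lexicon reduction, though the abstract does not say which features that step relies on.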
Airlift pumps can be used in the aquaculture industry to provide aeration while simultaneously moving water, using the dynamics of two-phase flow in the pump riser. The oxygen mass transfer from the injected compressed air to the water in aquaculture systems can be investigated experimentally to determine the pump's aeration capabilities. The objective of this study is to evaluate the effects of various airflow rates and injection methods on the oxygen transfer rate within a dual-injector airlift pump system. Experiments were conducted using an airlift pump connected to a vertical pump riser within a recirculating system. Both two-phase flow patterns and void fraction measurements were used to evaluate the dissolved oxygen mass transfer mechanism through the airlift pump. A dissolved oxygen (DO) sensor was used to determine the DO levels within the airlift pumping system under the different operating conditions required by the pump. Flow visualization imaging and particle image velocimetry (PIV) measurements were performed to better understand the effects of the two-phase flow patterns on aeration performance. The radial injection method was found to reach the saturation point faster at lower airflow rates, whereas the axial method performed better as the airflow rate increased. The standard oxygen transfer rate (SOTR) and standard aeration efficiency (SAE) were calculated and found to depend strongly on the injection method as well as on the two-phase flow patterns in the pump riser.
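SOTR and SAE are usually derived from a clean-water reaeration test. The DO concentration approaches saturation as C(t) = Cs - (Cs - C0) exp(-KLa t), so ln(Cs - C) falls linearly with time at slope -KLa. Then SOTR = KLa20 * Cs20 * V, and SAE = SOTR / power input. The sketch below uses this simplified log-deficit regression with the conventional temperature correction factor theta = 1.024; standard clean-water procedures such as the ASCE method fit the curve nonlinearly and apply further corrections. The paper's exact procedure, and which power figure it divides by, are not stated in the abstract, and all function names are hypothetical.

```python
import math


def estimate_kla(times_h, do_mg_l, c_sat):
    """Least-squares slope of ln(Cs - C) versus t gives -KLa (1/h)."""
    pts = [(t, math.log(c_sat - c)) for t, c in zip(times_h, do_mg_l) if c < c_sat]
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((t - mt) * (y - my) for t, y in pts)
             / sum((t - mt) ** 2 for t, _ in pts))
    return -slope


def kla_20(kla_t, temp_c, theta=1.024):
    """Correct KLa measured at temp_c (deg C) to the 20 deg C standard condition."""
    return kla_t * theta ** (20.0 - temp_c)


def sotr_g_per_h(kla20_per_h, c_sat20_mg_l, volume_m3):
    """SOTR in g O2/h; mg/L equals g/m^3, so KLa * Cs * V has units g/h."""
    return kla20_per_h * c_sat20_mg_l * volume_m3


def sae_kg_per_kwh(sotr_g_h, power_kw):
    """SAE in kg O2/kWh, assuming power_kw is the power delivered to the air supply."""
    return (sotr_g_h / 1000.0) / power_kw


if __name__ == "__main__":
    # Synthetic DO readings generated with KLa = 2.0 1/h, Cs = 9.0 mg/L, C0 = 1.0 mg/L.
    times = [0.0, 0.25, 0.5, 0.75, 1.0]
    do = [9.0 - 8.0 * math.exp(-2.0 * t) for t in times]
    kla = kla_20(estimate_kla(times, do, 9.0), temp_c=20.0)
    sotr = sotr_g_per_h(kla, 9.0, volume_m3=0.5)
    print(round(kla, 3), round(sotr, 3), round(sae_kg_per_kwh(sotr, 0.1), 3))
```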
Personality trait detection is an important text analytics task in Natural Language Processing (NLP). Text analytics is the process of extracting insight and knowledge from written text. Although most deep learning models deliver high performance, they often lack interpretability. Computer Vision (CV) has been significantly affected by inductive transfer learning; however, many NLP techniques still require training from scratch and task-specific modifications.
This paper addresses the problem of personality trait classification. We adopt Universal Language Model Fine-Tuning (ULMFiT) for personality trait detection. The model uses transfer learning rather than classical shallow word-embedding methods and has proved to be among the most powerful models for many NLP problems.
The main advantage of this model is that it requires no feature engineering before classification. When applied to a benchmark dataset, the proposed method shows an accuracy improvement of about 1% over state-of-the-art results for the Big Five personality traits.
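ULMFiT's fine-tuning relies on two learning-rate schedules introduced in the original ULMFiT paper. Discriminative fine-tuning divides the learning rate of each layer by 2.6 relative to the layer above it. Slanted triangular learning rates rise linearly for a short fraction of training and then decay linearly. The sketch below reproduces those formulas with that paper's default values (cut_frac = 0.1, ratio = 32, maximum rate 0.01). The abstract does not state which hyperparameters were used for personality detection, and the max(1, ...) guard for very short runs is an addition of this sketch.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """ULMFiT slanted triangular learning rate at iteration t (0-based)."""
    cut = max(1, math.floor(total_steps * cut_frac))  # guard added for tiny runs
    if t < cut:
        p = t / cut  # linear warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(lr_top, n_layers, factor=2.6):
    """Per-layer learning rates, bottom layer first; each lower layer gets lr / factor."""
    return [lr_top / factor ** (n_layers - 1 - i) for i in range(n_layers)]


if __name__ == "__main__":
    schedule = [slanted_triangular_lr(t, 1000) for t in range(1001)]
    print(schedule.index(max(schedule)), max(schedule))
    print(discriminative_lrs(0.01, 3))
```

For example, with 1,000 iterations the rate climbs from 0.01/32 to 0.01 at iteration 100 and then falls back to 0.01/32 at iteration 1,000.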