Social media (and the world at large) have been awash with news of the COVID-19 pandemic. With the passage of time, news and awareness about COVID-19 spread like the pandemic itself, with an explosion of messages, updates, videos, and posts. Mass hysteria manifested as another concern in addition to the health risk that COVID-19 presented. Predictably, public panic soon followed, mostly due to misconceptions, a lack of information, or sometimes outright misinformation about COVID-19 and its impacts. It is thus timely and important to conduct an ex post facto assessment of the early information flows during the pandemic on social media, as well as a case study of evolving public opinion on social media, which is of general interest. This study aims to inform policy that can be applied to social media platforms; for example, determining what degree of moderation is necessary to curtail misinformation on social media. This study also analyzes views concerning COVID-19 by focusing on people who interact and share social media on Twitter. As a platform for our experiments, we present a new large-scale sentiment data set COVIDSENTI, which consists of 90 000 COVID-19-related tweets collected in the early stages of the pandemic, from February to March 2020. The tweets have been labeled into positive, negative, and neutral sentiment classes. We analyzed the collected tweets for sentiment classification using different sets of features and classifiers. Negative opinion played an important role in conditioning public sentiment. For instance, we observed that people favored lockdown earlier in the pandemic; however, as expected, sentiment shifted by mid-March. Our study supports the view that there is a need to develop a proactive and agile public health presence to combat the spread of negative sentiment on social media following a pandemic.
Pre-processing plays an essential role in disambiguating the meaning of short-texts, not only in applications that classify short-texts but also for clustering and anomaly detection. Pre-processing can have a considerable impact on overall system performance; however, it is less explored in the literature in comparison to feature extraction and classification. This paper analyzes twelve different pre-processing techniques on three pre-classified Twitter datasets on hate speech and observes their impact on the classification tasks they support. It also proposes a systematic approach to text pre-processing to apply different pre-processing techniques in order to retain features without information loss. In this paper, two different word-level feature extraction models are used, and the performance of the proposed package is compared with state-of-the-art methods. To validate gains in performance, both traditional and deep learning classifiers are used. The experimental results suggest that some pre-processing techniques impact negatively on performance, and these are identified, along with the best performing combination of pre-processing techniques.
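As a rough illustration of what a configurable tweet pre-processing pipeline can look like, the sketch below applies a few commonly used steps (lowercasing, URL and mention replacement, hashtag unwrapping, collapsing elongated characters). The abstract does not list the twelve techniques studied, so the step names, the `<url>` and `<user>` placeholders, and the `preprocess` function are all assumptions made for this example, not the paper's package.

```python
import re

# Hypothetical patterns for this sketch; not taken from the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")   # a character repeated 3+ times
SPACE_RE = re.compile(r"\s+")

DEFAULT_STEPS = ("lower", "urls", "mentions", "hashtags", "repeats")


def preprocess(text, steps=DEFAULT_STEPS):
    """Apply the selected pre-processing steps to one short text."""
    if "lower" in steps:
        text = text.lower()
    if "urls" in steps:
        # Replace links with a placeholder instead of deleting them,
        # so the presence of a link is kept as a feature.
        text = URL_RE.sub(" <url> ", text)
    if "mentions" in steps:
        text = MENTION_RE.sub(" <user> ", text)
    if "hashtags" in steps:
        # Keep the hashtag word, drop only the '#' symbol.
        text = HASHTAG_RE.sub(r"\1", text)
    if "repeats" in steps:
        # Shorten runs of 3+ identical characters to two ("sooooo" -> "soo").
        text = REPEAT_RE.sub(r"\1\1", text)
    # Whitespace normalization is always applied.
    return SPACE_RE.sub(" ", text).strip()
```

Because each step is toggled by name, the same function can be run with different `steps` tuples to compare how individual techniques, or combinations of them, affect a downstream classifier.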
Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular methods among text mining techniques used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to the sparsity, redundancy, and unstructured format. In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique using fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects for term weights. Extensive experiments are conducted on six biomedical datasets. MKFTM achieved the highest classification accuracies of 99.04%, 99.62%, 99.69%, 99.61% in the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, 90.35% in the Ohsumed dataset. The CH index value of MKFTM is higher, which shows that its clustering performance is better than state-of-the-art topic models. The results confirm that the proposed MKFTM approach efficiently handles the sparsity and redundancy problems in biomedical text documents. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents. It gives better results for classification and clustering in biomedical documents. MKFTM is a new approach to topic modeling, which has the flexibility to work with a variety of clustering methods.
The abundance of biomedical text data coupled with advances in natural language processing (NLP) is resulting in novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, are reliant on the availability of domain-specific language models (LMs) that are trained on a massive amount of data. Most of the existing domain-specific LMs adopt the bidirectional encoder representations from transformers (BERT) architecture, which has limitations, and their generalizability is unproven as there is an absence of baseline results among common BioNLP tasks.
We present 8 variants of BioALBERT, a domain-specific adaptation of a lite bidirectional encoder representations from transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine-tuned for 6 different tasks across 20 benchmark datasets. Experiments show that a large variant of BioALBERT trained on PubMed outperforms the state-of-the-art on named-entity recognition (+11.09% BLURB score improvement), relation extraction (+0.80% BLURB score), sentence similarity (+1.05% BLURB score), document classification (+0.62% F1-score), and question answering (+2.83% BLURB score). It represents a new state-of-the-art in 5 out of 6 benchmark BioNLP tasks.
The large variant of BioALBERT trained on PubMed achieved a higher BLURB score than previous state-of-the-art models on 5 of the 6 benchmark BioNLP tasks. Depending on the task, 5 different variants of BioALBERT outperformed previous state-of-the-art models on 17 of the 20 benchmark datasets, showing that our model is robust and generalizable in the common BioNLP tasks. We have made BioALBERT freely available, which will help the BioNLP community avoid the computational cost of training and establish a new set of baselines for future efforts across a broad range of BioNLP tasks.
Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their power of expression, from the classical to modern-day state-of-the-art (SOTA) word representation language models (LMs). We describe a variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. In the end, this survey briefly discusses the commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
Future wireless communication, especially the densified 5G network using millimeter-Wave (mmWave), will bring numerous innovations to the current telecommunication industry. In such a scenario, the use of an Unmanned Aerial Vehicle (UAV) as a Base Station (BS) becomes one of the viable options for providing 5G services. The focus of this study is to investigate, analyze, and describe the distinctive, rich characteristics of mmWave propagation in access and backhaul networks simultaneously using a UAV. A mathematical framework is formulated for calculating the User Equipment (UE) received power for the relay path (BS–UAV–UE) based on the Friis transmission equation. We conduct simulations using a ray-tracing simulator in different scenarios, comparing and verifying the simulation results against the mathematical equations. Using the ray-tracing simulator, the effectiveness of diffracted, reflected, and scattered paths versus direct paths is described. Furthermore, using extensive simulations, we highlight the impact of UAV location on maximizing the performance of an Amplify-and-Forward UAV-based relay for providing enhanced coverage to the users.
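To make the relay-path calculation concrete, the following minimal sketch chains the standard Friis transmission equation over the two hops BS–UAV and UAV–UE, working in decibels. It assumes ideal free-space, line-of-sight propagation with no atmospheric absorption, blockage, or noise amplification, and it models the Amplify-and-Forward relay as a fixed gain in dB. The function names, parameters, and default values are hypothetical and are not the paper's framework.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d / wavelength)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength)


def friis_rx_dbm(tx_dbm, tx_gain_dbi, rx_gain_dbi, distance_m, freq_hz):
    """Friis equation in dB form: P_r = P_t + G_t + G_r - FSPL."""
    return tx_dbm + tx_gain_dbi + rx_gain_dbi - fspl_db(distance_m, freq_hz)


def relay_rx_dbm(bs_tx_dbm, d_bs_uav_m, d_uav_ue_m, freq_hz,
                 antenna_gain_dbi=0.0, relay_gain_db=0.0):
    """UE received power over BS -> UAV -> UE (idealized AF relay).

    Assumption: every antenna has the same gain, and the UAV simply
    re-transmits what it received, boosted by relay_gain_db.
    """
    # Hop 1: power arriving at the UAV from the base station.
    p_uav = friis_rx_dbm(bs_tx_dbm, antenna_gain_dbi, antenna_gain_dbi,
                         d_bs_uav_m, freq_hz)
    # Hop 2: amplified signal travels from the UAV to the user equipment.
    return friis_rx_dbm(p_uav + relay_gain_db, antenna_gain_dbi,
                        antenna_gain_dbi, d_uav_ue_m, freq_hz)
```

For example, `relay_rx_dbm(30.0, 100.0, 100.0, 28e9, relay_gain_db=90.0)` evaluates a 30 dBm transmitter at an assumed 28 GHz carrier with both hops at 100 m. Sweeping the two distances while keeping their geometry consistent is one simple way to explore how UAV placement changes the received power under these idealized assumptions.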
Stock price prediction can be made more efficient by considering the price fluctuations and understanding people’s sentiments. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT-based model trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes. Our proposed model FinALBERT is fine-tuned with these labels to achieve optimal results. We experimented with the labelled dataset by training it on traditional machine learning, BERT, and FinBERT models, which helped us understand how these labels behaved with different model architectures. Our labelling method’s competitive advantage is that it can help analyse the historical data effectively, and the mathematical function can be easily customised to predict stock movement.
Recent advancements in computing power and state-of-the-art algorithms have helped in more accessible and accurate diagnosis of numerous diseases. In addition, the development of de novo areas in imaging science, such as radiomics and radiogenomics, has further helped to personalize healthcare and stratify patients better. These techniques associate imaging phenotypes with the related disease genes. Various imaging modalities have been used for years to diagnose breast cancer. Nonetheless, digital breast tomosynthesis (DBT), a state-of-the-art technique, has produced comparatively promising results. DBT, a 3D mammography, is replacing conventional 2D mammography rapidly. This technological advancement is key to AI algorithms for accurately interpreting medical images. This paper presents a comprehensive review of deep learning (DL), radiomics, and radiogenomics in breast image analysis. This review focuses on DBT, its extracted synthetic mammography (SM), and full-field digital mammography (FFDM). Furthermore, this survey provides systematic knowledge about DL, radiomics, and radiogenomics for beginners and advanced-level researchers. A total of 500 articles were identified, with 30 studies included according to the set criteria. Parallel benchmarking of radiomics, radiogenomics, and DL models applied to the DBT images could allow clinicians and researchers alike to have greater awareness as they consider clinical deployment or development of new models. This review provides a comprehensive guide to understanding the current state of early breast cancer detection using DBT images. Using this survey, investigators with various backgrounds can easily seek interdisciplinary science and new DL, radiomics, and radiogenomics directions towards DBT.
In the recent phenomenon of social networks, both online and offline, two nodes may be connected, but they may not follow each other. Thus, two separate kinds of links are needed to capture this notion. Directed links are given if the nodes follow each other, and undirected links represent the regular connections (without following). Thus, this network may have both types of relationships/links simultaneously. This type of network can be represented by mixed graphs. However, uncertainties in following and connectedness exist in complex systems. To capture these uncertainties, fuzzy mixed graphs are introduced in this article. Some operations, completeness, regularity, and a few other properties of fuzzy mixed graphs are explained. The matrix representation of fuzzy mixed graphs and isomorphism theorems on fuzzy mixed graphs are developed. A network of COVID-19-affected areas in India is assumed, and central regions are identified as per the proposed theory.
People on social media share their thoughts and experiences using disease and symptom words for purposes other than describing their health, which can introduce biases in data-driven public health applications. For the advancement of health mention classification (HMC) research, in this study, we present the Reddit health mention dataset (RHMD), a new dataset of multi-domain Reddit data for HMC. RHMD is composed of 10,015 manually annotated Reddit posts that include 15 common disease or symptom terms and are labeled with four labels: personal health mentions (HMs), nonpersonal HMs, figurative HMs, and hyperbolic HMs. Empirical evaluation using recently proposed methods demonstrates the challenge of labeling user-generated text across these four types. Contributions of this work include the public release of a robustly annotated Reddit dataset (RHMD) for HM tasks and a comprehensive performance analysis of baseline methods. We expect that the release of the dataset and the evaluations will help facilitate the development of new methods for detecting HMs in user-generated text. The dataset is available at https://github.com/usmaann/RHMD-Health-Mention-Dataset .