• An improved Salp Swarm Algorithm is proposed for feature selection.
• Opposition-based learning is used to improve its population diversity.
• A new local search algorithm is developed to avoid the local optima problem.
• The algorithm outperforms the other algorithms it is compared with.
Many fields, such as data science and data mining, suffer from the rapid growth of data volume and high data dimensionality. The main problems faced by these fields include high computational cost, high memory cost, and low accuracy. These problems occur because these fields rely mainly on machine learning classifiers. However, machine learning accuracy is affected by noisy and irrelevant features. In addition, the computational and memory cost of machine learning is mainly affected by the size of the datasets used. Thus, to solve these problems, feature selection can be used to select an optimal subset of features and reduce the data dimensionality. Feature selection represents an important preprocessing step in many intelligent and expert systems such as intrusion detection, disease prediction, and sentiment analysis. An improved version of the Salp Swarm Algorithm (ISSA) is proposed in this study to solve feature selection problems and select the optimal subset of features in wrapper mode. Two main improvements were made to the original SSA to alleviate its drawbacks and adapt it to feature selection problems. The first improvement is the use of Opposition Based Learning (OBL) at the initialization phase of SSA to improve its population diversity in the search space. The second improvement is the development and use of a new Local Search Algorithm with SSA to improve its exploitation. To confirm and validate its performance, ISSA was applied to 18 datasets from the UCI repository. In addition, ISSA was compared with four well-known optimization algorithms: Genetic Algorithm, Particle Swarm Optimization, Grasshopper Optimization Algorithm, and Ant Lion Optimizer. Four different assessment criteria were used in these experiments. 
The results demonstrate that ISSA outperforms all baseline algorithms in terms of fitness values, accuracy, convergence curves, and feature reduction on most of the datasets used. The wrapper feature selection mode can be used in different application areas of expert and intelligent systems, as confirmed by the results obtained over different types of datasets.
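The opposition-based initialization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of an opposite point, `lower + upper - x`, and keeps the fittest half of the combined random and opposite candidates. The function names, the bounds, and the `toy_fitness` placeholder are hypothetical; the abstract does not give the authors' implementation or their wrapper fitness function.

```python
import random


def opposition_based_init(pop_size, dim, lower, upper, fitness, seed=0):
    """Return pop_size candidates chosen from a random population and its opposites.

    Assumption: the opposite of x in [lower, upper] is lower + upper - x,
    and lower fitness values are better (minimization).
    """
    rng = random.Random(seed)
    # Random starting positions, one list of `dim` values per candidate.
    population = [
        [rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)
    ]
    # Mirror every position across the centre of the search interval.
    opposites = [[lower + upper - value for value in ind] for ind in population]
    # Keep the best pop_size candidates from the combined pool.
    candidates = sorted(population + opposites, key=fitness)
    return candidates[:pop_size]


def toy_fitness(position):
    """Placeholder objective (sum of squares), standing in for a wrapper fitness."""
    return sum(value * value for value in position)


pop = opposition_based_init(pop_size=5, dim=3, lower=0.0, upper=1.0, fitness=toy_fitness)
for individual in pop:
    print([round(value, 3) for value in individual], round(toy_fitness(individual), 3))
```

In a wrapper setting, each position would typically be thresholded into a binary feature mask and scored by training a classifier on the selected features; that step is omitted here.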
Big data is an essential aspect of innovation which has recently gained major attention from both academics and practitioners. Considering the importance of the education sector, the current tendency is moving towards examining the role of big data in this sector. So far, many studies have been conducted to comprehend the application of big data in different fields for various purposes. However, a comprehensive review of big data in education is still lacking. Thus, this study aims to conduct a systematic review of big data in education in order to explore the trends, classify the research themes, highlight the limitations, and provide possible future directions in the domain. Following a systematic review procedure, 40 primary studies published from 2014 to 2019 were utilized and the related information extracted. The findings showed an increase in the number of studies addressing big data in education during the last 2 years. The current studies were found to cover four main research themes under big data in education, namely learner’s behavior and performance, modelling and educational data warehouse, improvement in the educational system, and integration of big data into the curriculum. Most of the research on big data in education has focused on learner’s behavior and performance. Moreover, this study highlights research limitations and portrays the future directions. This study provides a guideline for future studies and highlights new insights and directions for the successful utilization of big data in education.
Purpose
– The purpose of this paper is to investigate the factors that influence Facebook usage among small and medium enterprises (SMEs). In addition, it examines the impact of Facebook usage on financial and non-financial performance of the SMEs.
Design/methodology/approach
– Using an integrated model, this study examined the influence of compatibility, cost effectiveness, interactivity and trust on Facebook usage and its subsequent impact on organizational performance. Statistical analyses were based on data collected through a survey questionnaire from 259 SMEs in Malaysia. The Partial Least Squares (PLS) method was used to test the hypotheses.
Findings
– The study revealed that Facebook usage has a strong positive impact on the financial performance of SMEs. It also found that Facebook usage positively impacts the non-financial performance of SMEs in terms of cost reduction in marketing and customer service, improved customer relations and improved information accessibility. Additionally, compatibility, cost effectiveness and interactivity were identified as factors that influence Facebook usage among SMEs.
Research limitations/implications
– This study is limited in its sample selection. The sample covered only one community of SMEs in Malaysia, which limits the generalizability of the findings. This study provides a clearer idea of the real importance of Facebook and its benefits. The results should motivate and guide organizations in the adoption of Facebook for business activities. The study also makes various theoretical and practical contributions.
Originality/value
– Very few empirical studies investigated the actual impact of Facebook usage among organizations. This study investigated the effect of Facebook usage on the financial performance of the organizations which is really important to study as it reveals the exact value of using Facebook for business activities.
Breast cancer (BC) is the third leading cause of deaths in women globally. In general, histopathology images are recommended for early diagnosis and detailed analysis of BC. Thus, state-of-the-art classification models are required for the early prediction of BC using histopathology images. This study aims to develop an accurate and computationally feasible classification model named Biopsy Microscopic Image Cancer Network (BMIC_Net) to classify BC into eight distinct subtypes through deep learning (DL) and a hierarchical classification approach. For the experiments, the publicly available BreakHis dataset is used and split into training and testing sets. Furthermore, data augmentation was performed on the training set only, and 4096 result-oriented features were extracted through DL. In order to improve the classification performance, feature reduction schemes were tested to elicit the most discriminative feature subset. Finally, six machine-learning algorithms were analyzed to acquire the best results. The experimental results revealed that BMIC_Net outperformed existing baseline models by obtaining the highest accuracy of 95.48% for the first-level classifier and 94.62% and 92.45% for the second-level classifiers. Thus, this model can be deployed on a normal desktop machine in any healthcare center in less privileged areas of developing countries to serve as a second opinion for breast cancer classification.
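The two-level routing described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not BMIC_Net itself: it assumes the first level separates benign from malignant samples and that each branch has its own subtype classifier, following the benign/malignant grouping of the eight BreakHis subtypes. The toy classifiers and their threshold rules are hypothetical stand-ins for trained models.

```python
# The eight BreakHis subtypes, grouped by their parent class.
BENIGN = ["adenosis", "fibroadenoma", "phyllodes_tumor", "tubular_adenoma"]
MALIGNANT = [
    "ductal_carcinoma",
    "lobular_carcinoma",
    "mucinous_carcinoma",
    "papillary_carcinoma",
]


def hierarchical_predict(features, first_level, second_level):
    """Route a sample through a coarse classifier, then the matching subtype classifier."""
    coarse = first_level(features)
    subtype = second_level[coarse](features)
    return coarse, subtype


def toy_first_level(features):
    """Hypothetical stand-in for a trained first-level classifier."""
    return "malignant" if features[0] > 0.5 else "benign"


def make_toy_second_level(labels):
    """Build a hypothetical subtype classifier that buckets features[1] into labels."""
    def classify(features):
        index = min(int(features[1] * len(labels)), len(labels) - 1)
        return labels[index]
    return classify


second_level = {
    "benign": make_toy_second_level(BENIGN),
    "malignant": make_toy_second_level(MALIGNANT),
}

print(hierarchical_predict([0.9, 0.3], toy_first_level, second_level))
print(hierarchical_predict([0.1, 0.99], toy_first_level, second_level))
```

One consequence of this design is that a first-level error cannot be corrected at the second level, which is why the abstract reports accuracy separately for each level.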
Sarcasm is a form of sentiment whereby people express implicit information, usually the opposite of the message content, in order to hurt someone emotionally or criticise something in a humorous way. Sarcasm identification in textual data, being one of the hardest challenges in natural language processing (NLP), has recently become an interesting research area due to its importance in improving the sentiment analysis of social media data. Few studies have carried out a comprehensive literature review on sarcasm identification covering the existing primary studies of the last 11 years. Thus, this study carried out a review of the classification techniques for sarcasm identification under the aspects of datasets, pre-processing, feature engineering, classification algorithms, and performance metrics. The study considered articles published from 2008 to 2019. Forty (40) academic articles were selected from 7 standard academic databases in order to carry out the review and realize the objectives. The study revealed that most researchers created their own datasets since there are no standard datasets available in the domain of sarcasm identification. Context and content-based linguistic features were used in most of the studies. This review shows that n-gram and part-of-speech tagging were the most commonly used feature extraction techniques. Binary representation and term frequency were utilized for feature representation, whereas Chi-squared and information gain were used for feature selection. Moreover, classification algorithms such as support vector machine, Naïve Bayes, random forest, maximum entropy, and decision tree were mostly applied, with accuracy, precision, recall and F-measure as performance measures. Finally, research challenges and future directions are summarized in this review. 
This review reveals the impact of sarcasm identification in building effective product reviews and would serve as a handy resource for researchers and practitioners in sarcasm identification and text classification in general.
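For reference, the four performance measures named in this review (accuracy, precision, recall and F-measure) can be computed for a binary sarcasm classifier as in the minimal sketch below. The function name and the toy labels are illustrative and do not come from any of the reviewed studies; the F-measure shown is the balanced F1 score.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = sarcastic, 0 = not sarcastic (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```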
Sarcasm is the main reason behind the faulty classification of tweets. It poses a challenge in natural language processing (NLP) as it hampers the process of finding people's actual sentiment. Various feature engineering techniques are being investigated for the automatic detection of sarcasm. However, most related techniques have concentrated only on the content-based features of sarcastic expressions, leaving the contextual information in isolation. This leads to a loss of the semantics of words in the sarcastic expression. Another drawback is the sparsity of the training data. Due to the word limit of microblogs, the feature vector constructed by BoW for each sample contains null features. To address these problems, a Multi-feature Fusion Framework is proposed using two classification stages. The first-stage classification is constructed with the lexical feature only, extracted using the BoW technique, and trained using five standard classifiers (SVM, DT, KNN, LR, and RF) to predict the sarcastic tendency. In stage two, the constructed lexical sarcastic tendency feature is fused with eight other proposed features for modelling context to obtain a final prediction. The effectiveness of the developed framework is tested with various experimental analyses of the classifiers' performance. The evaluation shows that our classification models based on the developed novel feature fusion obtained a precision of 0.947 using a Random Forest classifier. Finally, the obtained results were compared with the results of three baseline approaches. The comparison outcome shows the significance of the proposed framework.
Breast cancer is a common and fatal disease among women worldwide. Therefore, the early and precise diagnosis of breast cancer plays a pivotal role in improving the prognosis of patients with this disease. Several studies have developed automated techniques using different medical imaging modalities to predict breast cancer development. However, few review studies are available to recapitulate the existing literature on breast cancer classification. These studies provide an overview of the classification, segmentation, or grading of many cancer types, including breast cancer, by using traditional machine learning approaches through hand-engineered features. This review focuses on breast cancer classification by using medical imaging multimodalities through state-of-the-art artificial deep neural network approaches. It is anticipated to maximize the procedural decision analysis in five aspects: types of imaging modalities, datasets and their categories, pre-processing techniques, types of deep neural network, and performance metrics used for breast cancer classification. Forty-nine journal and conference publications from eight academic repositories were methodically selected and carefully reviewed from the perspective of the five aforementioned aspects. In addition, this study provided quantitative, qualitative, and critical analyses of the five aspects. This review showed that mammograms and histopathologic images were mostly used to classify breast cancer. Moreover, about 55% of the selected studies used public datasets, and the remainder used exclusive datasets. Several studies employed augmentation, scaling, and image normalization pre-processing techniques to minimize inconsistencies in breast cancer images. Several types of shallow and deep neural network architecture were employed to classify breast cancer using images. The convolutional neural network was utilized frequently to construct an effective breast cancer classification model. 
Some of the selected studies employed a pre-trained network or developed new deep neural networks to classify breast cancer. Most of the selected studies used accuracy and area-under-the-curve metrics, followed by sensitivity, precision, and F-measure metrics, to evaluate the performance of the developed breast cancer classification models. Finally, this review presented 10 open research challenges for future scholars who are interested in developing breast cancer classification models through various imaging modalities. This review could serve as a valuable resource for beginners in medical image classification and for advanced scientists focusing on deep learning-based breast cancer classification through different medical imaging modalities.
Sarcasm is a complicated linguistic term commonly found on e-commerce and social media sites. Failure to identify sarcastic utterances in Natural Language Processing applications such as sentiment analysis and opinion mining will confuse classification algorithms and generate false results. Several studies on sarcasm detection have utilised different learning algorithms. However, most of these learning models have focused on the contents of the expression only, leaving the contextual information in isolation. As a result, they failed to capture the contextual information in the sarcastic expression. Secondly, many deep learning methods in NLP use a word embedding learning algorithm as a standard approach for feature vector representation, which ignores the sentiment polarity of the words in the sarcastic expression. To address these issues, this study proposes a context-based feature technique for sarcasm identification using a deep learning model, the BERT model, and conventional machine learning. Two Twitter benchmark datasets and the Internet Argument Corpus, version two (IAC-v2) benchmark dataset were utilised for classification using the three learning models. The first model uses embedding-based representation via a deep learning model with bidirectional long short-term memory (Bi-LSTM), a variant of the Recurrent Neural Network (RNN), applying Global Vectors (GloVe) for the construction of word embeddings and context learning. The second model is based on the Transformer, using pre-trained Bidirectional Encoder Representations from Transformers (BERT). The third model is based on feature fusion comprising BERT features, sentiment-related features, syntactic features, and GloVe embedding features with conventional machine learning. The effectiveness of this technique is tested with various evaluation experiments. The technique's evaluation on the two Twitter benchmark datasets attained highest precisions of 98.5% and 98.0%, respectively. 
The IAC-v2 dataset, on the other hand, achieved the highest precision of 81.2%, which shows the significance of the proposed technique over the baseline approaches for sarcasm analysis.
Robust findings of citations have a positive impact on researchers and significantly contribute to academic development. As a paper is cited more frequently or used as a reference in other articles, its citation count increases. Papers with higher citations tend to be more influential than those less cited. Research on predicting citation counts has evolved over the years in various fields. However, despite its recent growth, research on identifying commonly used features and techniques still lacks a comprehensive literature analysis. The present study addresses this gap and identifies frequently used features and existing techniques and their evaluation process for predicting an article’s citations. This study reviewed 150 articles from 2010 to 2023, and selected 107 based on established exclusion and inclusion criteria. It provides an overview of publication features and the standard techniques used for their identification to facilitate improvements in this field. The findings indicate that previous works frequently used (i) selected features such as paper features and citation features in predicting citations and (ii) machine learning techniques that are commonly applied to predict article citations. These findings can provide beneficial information for researchers aiming to enhance their papers and maximize their impact.
Advancements in information and communication technology and the growth of online web users have drawn attention to the virtual representation of each user, which is crucial for effective service personalization. Meeting users' needs and preferences is an ongoing challenge in service personalization. This issue can be addressed by building a comprehensive user profile. A user profile is a summary of the user's interests, characteristics, behaviours, and preferences, while user profiling is the process of collecting, organizing and inferring the user profile information. Many reviews on user profiling have been conducted, but none has focused on the effective profile modeling process. Hence, this article aims to provide a review of recent state-of-the-art approaches to user profiling. These include the methods, description, characteristics, and taxonomy of the user profile. A study of existing user profile modeling in terms of data acquisition, feature extraction, profiling techniques, and profiling approaches (with the identification of their strengths and weaknesses) and the performance measures is also provided. In addition, the research challenges are discussed with a focus on privacy, datasets, cold start issues, trust issues, and computational complexity. Moreover, the article identifies open research directions that serve as solutions to the identified challenges and as motivation for further research in advancing user profiling. The findings showed that an effective modeling process enhances the construction of an accurate user profile for service personalization.