The detection of mentioned aspects in product reviews is one of the significant and complex tasks in opinion mining. Recently, contextual approaches have significantly improved the accuracy of aspect extraction over non-contextual embeddings. However, these approaches are often computationally expensive and time-consuming; thus, applying such heavy models with insufficient resources and within runtime systems is impractical in many realistic scenarios. The present investigation sought an efficient, practical deep-learning-based model that relies on the complementary power of several existing non-contextual embeddings. To this end, two morphology-based embeddings (character and FastText) and two syntax-based embeddings (POS and extended dependency skip-gram) were used alongside a base word embedding (GloVe) to form an enriched word representation layer. This representation was integrated into the proposed network architecture (extended BiGRU). Finally, two novel post-processing rules were applied to refine errors in the model's predictions. The proposed model achieved F-scores of 0.86, 0.91, 0.79, and 0.80 on the SemEval 2014 laptop and restaurant domains and the SemEval 2015 and 2016 restaurant domains, respectively. Furthermore, the results were validated by comparing the computational and temporal efficiency of the proposed model with seven BERT-family transformers through statistical tests.
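The enriched word representation layer described above amounts to concatenating, per token, the vectors from each embedding source. A minimal NumPy sketch with illustrative dimensions (the vectors and sizes below are assumptions, not the paper's actual configuration):

```python
import numpy as np

# Hypothetical per-token vectors from each embedding source; the
# dimensions are illustrative, not those used in the paper.
glove_vec = np.random.rand(300)     # base word embedding (GloVe)
char_vec = np.random.rand(50)       # character-level embedding
fasttext_vec = np.random.rand(300)  # morphology-aware subword embedding
pos_vec = np.random.rand(25)        # POS-tag embedding
dep_vec = np.random.rand(100)       # extended dependency skip-gram embedding

# The enriched representation is the concatenation of all five sources,
# which then feeds the (extended BiGRU) network as one input vector.
enriched = np.concatenate([glove_vec, char_vec, fasttext_vec, pos_vec, dep_vec])
print(enriched.shape)  # (775,)
```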
Identifying nodes with high spreading power is an interesting problem in social networks. Finding super-spreader nodes becomes an arduous task when the nodes appear in large numbers and the links among them become enormous. One method for identifying such nodes is to rank them based on k-shell decomposition. Nevertheless, this method has two disadvantages: it assigns the same rank to all nodes of a shell, and it uses only a single indicator to rank the nodes. Moreover, while k-shell is useful for ranking individual spreaders, it is not efficient enough when a group of nodes with maximum spreading power must be selected; therefore, this method alone is insufficient. Accordingly, this study presents a hybrid method based on the k-shell measure to identify super spreaders. A suitable method is then presented to select a group of superior nodes in order to maximize the spread of influence. Experimental results on seven complex networks show that our proposed methods outperform other well-known measures and achieve comparatively more accurate performance in identifying super-spreader nodes.
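K-shell decomposition, the baseline criticized above, iteratively peels off the lowest-degree nodes and assigns each node the shell index of the round in which it is removed. A minimal sketch over a plain adjacency dict (this is the standard baseline, not the paper's hybrid method):

```python
def k_shell(adj):
    """Assign each node its k-shell index by iteratively pruning
    nodes of minimum remaining degree. adj: node -> set of neighbors."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    shell = {}
    remaining = set(adj)
    k = 0
    while remaining:
        k = max(k, min(degree[v] for v in remaining))
        changed = True
        while changed:  # repeatedly remove nodes with degree <= k
            changed = False
            for v in list(remaining):
                if degree[v] <= k:
                    shell[v] = k
                    remaining.discard(v)
                    for u in adj[v]:
                        if u in remaining:
                            degree[u] -= 1
                    changed = True
    return shell

# A 4-clique (a, b, c, d) with one pendant node e attached to a.
adj = {'a': {'b', 'c', 'd', 'e'}, 'b': {'a', 'c', 'd'},
       'c': {'a', 'b', 'd'}, 'd': {'a', 'b', 'c'}, 'e': {'a'}}
print(k_shell(adj))  # {'e': 1, 'a': 3, 'b': 3, 'c': 3, 'd': 3}
```

Note how all four clique members receive the same shell index 3: this is exactly the tie problem the abstract points out, which motivates combining k-shell with additional indicators.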
In the domain of question subjectivity classification, there exists a need for detailed datasets that can foster advancements in Automatic Subjective Question Answering (ASQA) systems. Addressing the prevailing research gaps, this paper introduces the Fine-Grained Question Subjectivity Dataset (FQSD), which comprises 10,000 questions. The dataset distinguishes between subjective and objective questions and offers additional categorizations such as Subjective-types (Target, Attitude, Reason, Yes/No, None) and Comparison-form (Single, Comparative). Annotation reliability was confirmed via robust evaluation techniques, yielding a Fleiss' Kappa score of 0.76 and Pearson correlation values up to 0.80 among three annotators. We benchmarked FQSD against existing datasets such as that of Yu, Zha, and Chua (2012), SubjQA (Bjerva 2020), and ConvEx-DS (Hernandez-Bocanegra 2021). Our dataset excelled in scale, linguistic diversity, and syntactic complexity, establishing a new standard for future research. We employed visual methodologies to provide a nuanced understanding of the dataset and its classes. Utilizing transformer-based models such as BERT, XLNET, and RoBERTa for validation, RoBERTa achieved an outstanding F1-score of 97%, confirming the dataset's efficacy for the advanced subjectivity classification task. Furthermore, we utilized Local Interpretable Model-agnostic Explanations (LIME) to elucidate model decision-making, ensuring transparent and reliable model predictions in subjectivity classification tasks.
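The inter-annotator agreement figure quoted above can be computed from an item-by-category count table with the standard Fleiss' Kappa formula. A minimal sketch (illustrative data, not FQSD's actual annotation counts):

```python
import numpy as np

def fleiss_kappa(counts):
    """Standard Fleiss' Kappa. counts[i, j] = number of annotators who
    assigned item i to category j; every item must be rated by the
    same number of annotators."""
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                          # annotators per item
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                           # observed agreement
    p_j = counts.sum(axis=0) / counts.sum()      # category proportions
    P_e = np.square(p_j).sum()                   # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement: three annotators, two items, unanimous each time.
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```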
•Effective bidding in multi-attribute combinatorial double auctions is complex.•A traded package in the history of the market does not reveal any unit price of the items.•Bidders should predict their true desired combination and quantity of items.•The proposed strategy is a personality- and market-based extension to MPT, FFM, mkNN, and MAUT.•It is simulated in a proposed test suite against benchmark strategies.
In a multi-attribute combinatorial double auction (MACDA), sellers’ and buyers’ preferences over multiple synergetic goods are best satisfied. Recent studies of MACDA typically assume that bidders already know their desired combination (and quantity) of items and the bundle price; they do not address how to determine the package combination that is most desirable to a bidder. This study presents a new packaging model called the multi-attribute combinatorial bidding (MACBID) strategy, which can be used by an agent on either the seller or buyer side of a MACDA. To find the combination (and quantities) of items and the total price that best satisfy the bidder’s need, the model considers the bidder’s personality, multi-unit trading item set, and preferences, as well as the market situation. The proposed strategy is an extension of Markowitz Modern Portfolio Theory (MPT) and the Five Factor Model (FFM) of personality. We use the mkNN learning algorithm and Multi-Attribute Utility Theory (MAUT) to devise a personality-based multi-attribute combinatorial bid. A test-bed (MACDATS) is developed for evaluating MACBID. This test suite provides algorithms for generating stereotypical artificial market data as well as bidders’ personalities, preferences, and item sets. Simulation results show that the success probability of the MACBID-proposed bundle for selling and buying item sets is on average 50% higher, and the error in the valuation of package attributes is 5% lower, than with other strategies.
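MAUT, one of the building blocks named above, scores a package in its simplest additive form as a weight-normalized sum of per-attribute utilities. A minimal sketch (the paper's exact aggregation over personality and market factors may differ):

```python
def additive_utility(weights, utilities):
    """Additive MAUT: overall utility is the sum of normalized
    attribute weights times per-attribute utilities in [0, 1]."""
    total_w = sum(weights)
    return sum(w / total_w * u for w, u in zip(weights, utilities))

# A hypothetical bidder weighing price twice as heavily as delivery
# time and quality, with per-attribute utilities 0.9, 0.5, 0.7.
print(additive_utility([2, 1, 1], [0.9, 0.5, 0.7]))  # 0.75
```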
Question answering (QA) systems have attracted considerable attention in recent years. They receive the user’s questions in natural language and respond with precise answers. Most work on QA was initially proposed for the English language, but some research studies have recently been performed on non-English languages. Answer selection (AS) is a critical component of QA systems. To the best of our knowledge, there is no research on AS for the Persian language. Persian is a (1) free-word-order, (2) right-to-left, (3) morphologically rich, and (4) low-resource language. Deep learning (DL) techniques have shown promising accuracy in AS; however, they require a considerable amount of annotated data for training. Many annotated datasets have been built for the AS task, but most of them are exclusively in English. To address the need for a high-quality AS dataset in the Persian language, we present PASD, the first large-scale native AS dataset for Persian. To show the quality of PASD, we employed it to train state-of-the-art QA systems. We also present PerAnSel, a novel deep-neural-network-based system for Persian question answering. Since Persian is a free-word-order language, PerAnSel parallelizes a sequential method and a transformer-based method to handle the various word orders in Persian. We then evaluate PerAnSel on three datasets: PASD, PerCQA, and WikiFA. The experimental results indicate strong performance on the Persian datasets, beating state-of-the-art answer selection methods by 10.66% on PASD, 8.42% on PerCQA, and 3.08% on WikiFA in terms of MRR.
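MRR, the metric the improvements above are reported in, averages the reciprocal rank of the first correct answer over all questions. A minimal sketch:

```python
def mean_reciprocal_rank(ranked_labels):
    """ranked_labels: for each question, a list of 0/1 relevance flags
    in the order the system ranked the candidate answers."""
    total = 0.0
    for labels in ranked_labels:
        for rank, relevant in enumerate(labels, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)

# Correct answer ranked 1st, 2nd, and 3rd on three questions:
# MRR = (1 + 1/2 + 1/3) / 3 ~= 0.611
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```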
Developing question answering (QA) systems is one of the main goals in artificial intelligence. With the advent of deep learning (DL) techniques, QA systems have witnessed significant advances. Although DL performs very well on QA, it requires a considerable amount of annotated data for training. Many annotated datasets have been built for the QA task; most of them are exclusively in English. To address the need for a high-quality QA dataset in the Persian language, we present PersianQuAD, the native QA dataset for the Persian language. We create PersianQuAD in four steps: 1) Wikipedia article selection, 2) question-answer collection, 3) three-candidate test set preparation, and 4) data quality monitoring. PersianQuAD consists of approximately 20,000 questions and answers made by native annotators on a set of Persian Wikipedia articles. The answer to each question is a segment of the corresponding article text. To better understand PersianQuAD and ensure its representativeness, we analyze it and show that it contains questions of varying types and difficulties. We also present three versions of a deep-learning-based QA system trained with PersianQuAD. Our best system achieves an F1 score of 82.97%, which is comparable to that of QA systems on the English SQuAD dataset developed at Stanford University. This shows that PersianQuAD works well for training deep-learning-based QA systems. Human performance on PersianQuAD is significantly better (96.49%), demonstrating that PersianQuAD is challenging enough and there is still plenty of room for future improvement. PersianQuAD and all QA models implemented in this paper are freely available.
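In SQuAD-style evaluation, the F1 score quoted above is a token-overlap F1 between the predicted answer span and the gold answer. A minimal sketch (omitting SQuAD's text normalization of articles and punctuation):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "the cat"))  # 0.8
```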
•Enhancing memory-based collaborative filtering techniques for group recommender systems by resolving the data sparsity problem.•Comparing the proposed method’s accuracy with basic memory-based techniques and the latent factor model.•Making accurate predictions for unknown ratings in sparse matrices based on the proposed method.•More users are satisfied with the group recommender system’s performance.
Memory-based collaborative filtering techniques are widely used in recommender systems. They rely on full initial ratings in a user-item matrix. However, in group recommender systems this matrix is most often sparse and users’ preferences are unknown. This deficiency may make memory-based collaborative filtering unsuitable for group recommender systems. This paper improves memory-based techniques for group recommender systems by resolving the data sparsity problem. The core of the proposed method is a support vector machine learning model that computes similarities between items. The method employs the calculated similarities to enhance basic memory-based techniques. Experiments demonstrate that the proposed method outperforms basic memory-based techniques. They also indicate that the presented work outperforms the latent factor approach, which is very efficient under sparse conditions. Finally, the proposed method is shown to give better performance than existing approaches at generating group recommendations.
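The enhancement step can be illustrated with the standard item-based memory prediction: an unknown rating is estimated as a similarity-weighted average of the user's known ratings. A minimal sketch in which the similarity matrix is an arbitrary stand-in for the SVM-learned similarities described above:

```python
import numpy as np

def predict_rating(R, sim, user, item):
    """Item-based memory prediction: weighted average of the user's
    known ratings, weighted by each rated item's similarity to the
    target item. R: user x item matrix with 0 marking an unknown
    rating. sim: item x item similarity matrix (here a hand-picked
    stand-in, not similarities learned by an SVM)."""
    rated = np.nonzero(R[user])[0]
    rated = rated[rated != item]
    w = sim[item, rated]
    if w.sum() == 0:
        return 0.0
    return float(w @ R[user, rated] / w.sum())

R = np.array([[5, 3, 0],
              [4, 0, 4]])
sim = np.array([[1.0, 0.5, 0.8],
                [0.5, 1.0, 0.2],
                [0.8, 0.2, 1.0]])
# User 0's unknown rating for item 2: (0.8*5 + 0.2*3) / 1.0 = 4.6
print(predict_rating(R, sim, 0, 2))
```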
Identification and ranking of influential users in social networks for the sake of news spreading and advertising has recently become an attractive field of research. Given the large number of users in social networks and the various relations that exist among them, providing an effective method to identify influential users has gradually come to be considered essential. In most existing methods, users who are located in an appropriate structural position of the network are regarded as influential. These methods usually pay no attention to the interactions among users and treat relations as binary in nature. This paper therefore proposes a new method to identify influential users in a social network by considering the interactions that exist among users. Since users tend to act within the frame of communities, the network is initially divided into different communities. The amount of interaction among users is then used as a parameter to set the weight of relations within the network. Afterward, by determining the neighbors’ role for each user, a two-level method is proposed for both detecting users’ influence and ranking them. Simulation and experimental results on Twitter data show that the users selected by the proposed method are distributed at more appropriate distances than those selected by existing methods. Moreover, the proposed method outperforms the others in terms of both the spreading speed and the spreading capacity of the users it selects.
•The goal is to identify and rank influential users in social networks.•We determine the role of each node’s neighbors in the network and apply it to rank the nodes.•We measure relation strength between nodes based on action logs.•We propose a two-level method to rank the nodes according to the neighbors’ role and relation strength among them.
Recently, an increasing amount of research has been devoted to the question of how the most influential nodes (seeds) can be found effectively in a complex network. A number of measures have been proposed for this purpose. For instance, the high-degree centrality measure reflects the importance of the network topology and has reasonable runtime performance in finding a set of nodes with the highest degrees, but such nodes do not have satisfactory dissemination potential in the network because they share many common neighbors (CN(1)) and common neighbors of neighbors (CN(2)). This flaw holds for other measures as well. In this paper, we compare the high-degree centrality measure with other well-known measures on ten datasets in order to find the proportion of common seeds in the seed sets they produce. We thereupon propose an improved high-degree centrality measure (named DegreeDistance) and enhance its accuracy in two phases, FIDD and SIDD, by placing a threshold on the number of common neighbors between already-selected seed nodes and a non-seed candidate node, and by considering the influence score that seed nodes exert on the candidate directly or through their common neighbors. To evaluate the accuracy and runtime performance of DegreeDistance, FIDD, and SIDD, we apply them to eight large-scale networks; the results show that SIDD dramatically outperforms other well-known measures and achieves comparatively more accurate performance in identifying the most influential nodes.
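The common-neighbor threshold idea can be sketched as a greedy selection: take nodes in decreasing degree order, but skip any candidate that shares too many neighbors with already-chosen seeds. This is a simplification for illustration, not the exact DegreeDistance/FIDD/SIDD procedure:

```python
def select_seeds(adj, k, max_common=1):
    """Greedy high-degree seed selection with a common-neighbor
    threshold. adj: node -> set of neighbors."""
    by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    seeds = []
    for v in by_degree:
        if len(seeds) == k:
            break
        # Accept v only if it shares at most max_common neighbors
        # with every seed chosen so far.
        if all(len(adj[v] & adj[s]) <= max_common for s in seeds):
            seeds.append(v)
    return seeds

# Hubs 'a' and 'b' share three neighbors, so choosing both would
# waste influence; 'c' reaches a fresh region of the network instead.
adj = {'a': {'1', '2', '3', '4'}, 'b': {'1', '2', '3', '5'},
       'c': {'6', '7', '8'}, '1': {'a', 'b'}, '2': {'a', 'b'},
       '3': {'a', 'b'}, '4': {'a'}, '5': {'b'},
       '6': {'c'}, '7': {'c'}, '8': {'c'}}
print(select_seeds(adj, 2))  # ['a', 'c']
```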
•The goal is to identify the most influential nodes in complex networks.•We propose DegreeDistance and improve it in two phases, FIDD and SIDD.•We take into account the distance between seeds as well as the influence score.•We investigate the rate of unique nodes influenced by our methods.•SIDD outperforms other measures by choosing a more appropriate seed set.