Abstract
Objective
Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers’ e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media.
Methods
Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network.
Results
Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed.
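The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the Bi-LSTM model, in percent.
reported_f = f_measure(94.10, 91.80)
print(round(reported_f, 2))  # 92.94
```

Under that assumption the three reported figures are mutually consistent.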
Conclusion
Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.
Content sharing platforms such as product review websites largely depend on reviewers' voluntary contributions. To motivate reviewers to contribute more, many platforms have established incentive mechanisms, either reputation-based or financial. Yet most existing research has focused on reputations that are everlasting, such as badges and virtual points, or on financial rewards that involve no evaluation of the users' contributed content, such as rebates. There is still a significant gap in our understanding of how incentives with a reevaluation mechanism actually influence reviewers' behaviors, such as their contribution levels, the opinions they express, and how they express them. In this paper, we fill this gap using data collected from the Yelp Elite Squad, where reviewers with a good reviewing history are admitted to the elite group and, most importantly, reevaluated each year. We draw on accountability theory and conduct a difference-in-differences analysis to empirically study the effect of incentives with a reevaluation mechanism on reviewers' behaviors in both the short term and the long term. The results show that in the short term, reviewers significantly increase their contribution levels, become more conservative, with a lower percentage of extreme ratings, and increase the readability of their reviews. In the long term, they continue improving the quality of their reviews, though their numerical rating behaviors stabilize. Our research has significant implications for business models that rely on user contributions.
•We investigate the influence of incentives with a reevaluation mechanism on reviewers’ behavior on content sharing platforms.
•We use propensity score matching and the difference-in-differences method to analyze data collected from the Yelp platform.
•In the short term, reviewers increase their contribution levels, become more conservative, and increase the readability of their reviews.
•In the long term, reviewers continue to improve the quality of their reviews while their numerical rating behaviors stabilize.
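The difference-in-differences idea named above can be illustrated with a minimal sketch on group means. This is not the authors' regression specification. The function name `did_estimate` and all numbers are hypothetical and are not Yelp data. The control group stands in for the matched non-elite reviewers.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on group means:
    (treated_post - treated_pre) - (control_post - control_pre)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical monthly review counts per reviewer (made-up numbers).
elite_before = [4, 5, 6]
elite_after = [8, 9, 10]
matched_before = [4, 5, 6]
matched_after = [5, 6, 7]

effect = did_estimate(elite_before, elite_after, matched_before, matched_after)
print(effect)
```

In this toy data both groups start at the same level. The treated group rises by 4 reviews and the control group by 1, so the estimated effect is 3 reviews per month. Subtracting the control group's change removes any trend shared by both groups.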
Cryptocurrency is a well-developed application of blockchain technology and is currently a hot topic throughout the world. The public availability of transaction histories offers an opportunity to analyze and compare different cryptocurrencies. In this paper, we present a dynamic network analysis of three representative blockchain-based cryptocurrencies: Bitcoin, Ethereum, and Namecoin. By analyzing the accumulated network growth, we find that, unlike most other networks, these cryptocurrency networks do not always densify over time, and they change constantly, with relatively low node and edge repetition ratios. We therefore construct separate networks on a monthly basis, trace the changes in typical network characteristics (including degree distribution, degree assortativity, clustering coefficient, and the largest connected component) over time, and compare the three. We find that the degree distributions of these monthly transaction networks cannot be well fitted by the well-known power-law distribution. At the same time, the currencies differ in their network properties: for example, both the Bitcoin and Ethereum networks are heavy-tailed with disassortative mixing, but only the former can be treated as a small world. These network properties reflect the evolutionary characteristics and competitive power of these three cryptocurrencies and provide a foundation for future research.
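Two of the monthly quantities discussed above can be sketched in a few lines. The abstract does not give its formulas, so both definitions are assumptions. Density is taken as the usual directed-graph ratio of observed to possible edges, a simple proxy that may differ from the paper's own measure of densification. The node repetition ratio is taken as the share of a month's nodes that also appeared in the previous month. The edge lists are hypothetical.

```python
def density(edges):
    """Directed density |E| / (|V| * (|V| - 1)), ignoring self-loops and duplicate edges."""
    unique = {(u, v) for u, v in edges if u != v}
    nodes = {n for edge in unique for n in edge}
    n = len(nodes)
    return len(unique) / (n * (n - 1)) if n > 1 else 0.0


def node_repetition_ratio(prev_edges, curr_edges):
    """Share of the current month's nodes that also appeared in the previous month."""
    prev_nodes = {n for edge in prev_edges for n in edge}
    curr_nodes = {n for edge in curr_edges for n in edge}
    return len(prev_nodes & curr_nodes) / len(curr_nodes) if curr_nodes else 0.0


# Hypothetical (sender, receiver) transaction edges for two consecutive months.
january = [("a", "b"), ("b", "c"), ("c", "a")]
february = [("a", "d"), ("d", "e"), ("e", "f"), ("f", "g")]

print(density(january), density(february))       # 0.5 0.2
print(node_repetition_ratio(january, february))  # 0.2
```

In this toy example the February network has more nodes and edges but is sparser, and only one of its five addresses was active in January.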
We predict the arrival time of a hypothetical new variant emerging from China for each country/region to examine the effectiveness of travel restrictions in preventing the importation of new variants of SARS-CoV-2.
Results show that travel restrictions are ineffective in delaying the arrival of the virus in the post-pandemic era.
During the COVID-19 pandemic, more than ever, data science has become a powerful weapon in combating an infectious disease epidemic and arguably any future infectious disease epidemic. Computer scientists, data scientists, physicists and mathematicians have joined public health professionals and virologists to confront the largest pandemic of the century by capitalizing on the large-scale 'big data' generated and harnessed for combating the COVID-19 pandemic. In this paper, we review the newly developed data science approaches to confronting COVID-19, including the estimation of epidemiological parameters, digital contact tracing, diagnosis, policy-making, resource allocation, risk assessment, mental health surveillance, social media analytics, drug repurposing and drug development. We compare the new approaches with conventional epidemiological studies, discuss lessons we learned from the COVID-19 pandemic, and highlight opportunities and challenges of data science approaches to confronting future infectious disease epidemics. This article is part of the theme issue 'Data science approaches to infectious disease surveillance'.
Hand, foot, and mouth disease (HFMD) mostly affects the health of infants and preschool children. Many studies of HFMD in different regions have been published. However, the epidemiological characteristics and space-time patterns of individual-level HFMD cases in a major city such as Beijing are unknown. The objective of this study was to investigate epidemiological features and identify high relative risk space-time HFMD clusters at a fine spatial scale.
Detailed information on age, occupation, pathogen and gender was used to analyze the epidemiological features of HFMD epidemics. Data on individual-level HFMD cases were examined using Local Indicators of Spatial Association (LISA) analysis to identify the spatial autocorrelation of HFMD incidence. Spatial filtering combined with scan statistics methods was used to detect HFMD clusters.
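One common LISA statistic is the local Moran's I. The sketch below is a minimal version that assumes row-standardized binary neighbor weights. The abstract does not say which LISA statistic or weighting scheme the study used, and the function name, incidence rates and adjacency are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I with row-standardized binary weights.

    values: one incidence rate per area.
    neighbors: dict mapping an area index to a list of neighboring area indices.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    if m2 == 0:
        return [0.0] * n
    scores = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation among the neighbors of area i.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        scores.append(z[i] * lag / m2)
    return scores


# Hypothetical incidence rates for four areas arranged in a line: 0-1-2-3.
rates = [10.0, 12.0, 2.0, 1.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = local_morans_i(rates, adjacency)
print(scores)
```

A positive score means an area resembles its neighbors. Area 0 has an above-average rate and an above-average neighbor, the High-High pattern, while area 3 is Low-Low. Areas 1 and 2 sit on the boundary between the two regimes and receive negative scores.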
A total of 157,707 HFMD cases (60.25% were male, 39.75% were female) reported in Beijing from 2008 to 2012 included 1465 severe cases and 33 fatal cases. The annual average incidence rate was 164.3 per 100,000 (ranged from 104.2 in 2008 to 231.5 in 2010). Male incidence was higher than female incidence for the 0 to 14-year age group, and 93.88% were nursery children or lived at home. Areas at a higher relative risk were mainly located in the urban-rural transition zones (the percentage of the population at risk ranged from 33.89% in 2011 to 39.58% in 2012) showing High-High positive spatial association for HFMD incidence. The most likely space-time cluster was located in the mid-east part of the Fangshan district, southwest of Beijing.
The space-time patterns of HFMD in Beijing (2008-2012) were relatively steady. The population at risk was mainly distributed in the urban-rural transition zones. The epidemiological features of HFMD in Beijing were generally consistent with previous research. The findings generated computational insights useful for disease surveillance, risk assessment and early warning.
Microblogging has become one of the most widely used forms of social media for people to share information and express opinions. As information propagates fast in social networks, understanding and analyzing the public sentiment implied in user-generated content is beneficial for many fields and has been applied in areas such as social management, business and public security. Most previous work on sentiment analysis does not distinguish between tweets by different users and ignores the diversity of people's word use. As some sentiment expressions are used only by specific groups of people, the corresponding textual sentiment features are often neglected in the analysis process. On the other hand, previous psychological findings have shown that personality influences the ways people write and talk, suggesting that people with the same personality traits tend to choose similar sentiment expressions. Inspired by this, in this paper we propose a method to facilitate sentiment classification in microblogs based on personality traits. To this end, we first develop a rule-based method to predict users’ personality traits based on the most well-studied personality model, the Big Five model. To leverage effective but not widely used sentiment features, we then extract those features grouped by different personality traits and construct personality-based sentiment classifiers. Moreover, we adopt an ensemble learning strategy to integrate traditional textual-feature-based classification with our personality-based sentiment classification. Experimental studies on a Chinese microblog dataset show the effectiveness of our method in improving the performance of both traditional and state-of-the-art sentiment classifiers. Our work is among the first to explicitly explore the role of users' personality in social media analytics and its application in sentiment classification.
In recent years, the emerging electronic cigarette (e-cigarette) marketplace has prospered all over the world. By analyzing online e-liquid reviews, we seek to identify the features that attract users.
We collected e-liquid reviews from one of the largest online e-liquid review websites and extracted the e-liquid features by keywords. Then we used sentiment analysis to classify the features into two polarities: positive and negative. The positive sentiment ratio of a feature reflects the e-cigarette users' preference for this feature.
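The positive sentiment ratio can be sketched as the share of a feature's mentions that are labeled positive. This definition is an assumption based on the description above. The function name `positive_ratio` is illustrative and the labeled mentions are hypothetical, not taken from the study's data.

```python
from collections import Counter


def positive_ratio(labeled_mentions):
    """Map each feature to the share of its mentions labeled 'positive'.

    labeled_mentions: iterable of (feature, polarity) pairs,
    where polarity is 'positive' or 'negative'.
    """
    positive, total = Counter(), Counter()
    for feature, polarity in labeled_mentions:
        total[feature] += 1
        if polarity == "positive":
            positive[feature] += 1
    return {feature: positive[feature] / total[feature] for feature in total}


# Hypothetical feature mentions with sentiment labels (made-up data).
mentions = [
    ("cloud production", "positive"),
    ("cloud production", "positive"),
    ("cloud production", "positive"),
    ("cloud production", "negative"),
    ("throat hit", "positive"),
    ("throat hit", "negative"),
    ("throat hit", "negative"),
    ("throat hit", "negative"),
]

ratios = positive_ratio(mentions)
popularity = Counter(feature for feature, _ in mentions)
print(ratios)      # {'cloud production': 0.75, 'throat hit': 0.25}
print(popularity)
```

Both features are mentioned four times in this toy data, so they are equally popular, yet their positive ratios differ. Popularity (mention count) and preference (positive ratio) are therefore separate measurements.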
The popularity and preference of e-liquid features are not correlated. Nuts and cream are the favorite flavor categories, while fruit and cream are the most popular categories. The top mixed flavors are preferable to single flavors. Fruit and cream categories are most frequently mixed with other flavors. E-cigarette users are satisfied with cloud production, but not satisfied with the ingredients and throat hit.
We identified the flavors that e-cigarette users were satisfied with, and we found the users liked e-cigarette cloud production. Therefore, flavors and cloud production are potential factors attracting new users.