Recently, there has been growing interest in sequential pattern mining, with a particular focus on clickstream pattern mining. These areas hold the potential for discovering valuable ...patterns. However, traditional mining algorithms in these domains often assume that databases are static, which simplifies the mining process. In reality, databases are updated incrementally over time, rendering a portion of the previous results invalid. This necessitates rerunning algorithms on the updated databases to obtain accurate frequent patterns. As database size increases, this approach can become time-consuming and affect performance. To tackle this issue, we propose PSB-CUP to mine frequent clickstream patterns in an incremental update manner. PSB-CUP employs the concept of search borders to reduce the search space and the information retained in memory. Furthermore, an IDList generation method called “partial imbalance join” is proposed to reconstruct possibly missing information during the incremental process. This join method, however, requires extra information to be cached in exchange for speed. We then improve this technique by introducing “recursive imbalance join”, which removes the need for extra cached data, in the PSB-CUP+ algorithm. The experimental results show that our proposed algorithms are efficient for incremental clickstream pattern mining.
WeChat official accounts have been increasingly adopted by Chinese government agencies to deliver public services, in response to the “Internet + Public Service” reform. While previous studies ...depended heavily on the expert-oriented approach to evaluate the accounts, this paper presents a user-centered study based on a mixed methods research design in which an unobtrusive clickstream data analysis was complemented by a card sorting study, stakeholder interviews, and a focus group. A 2-month server log file containing 42,188,760 clickstream records was obtained from an active government WeChat official account and analyzed at the movement level. The analysis found that the account was mainly used as a lookup tool, with most services underutilized, and that its home portals failed to support effective wayfinding to needed services. Deficiencies in information architecture, operation strategy, and interaction design of the account were identified in the complementary studies. This study not only enriches the knowledge about social media use in the Chinese government for public service delivery, but also introduces innovative methods to generate new research insights. The findings can inform government WeChat official accounts of how to improve service quality and user experience.
•A user-centered approach involving clickstream data analysis was employed.
•Most services on the government WeChat official account were underutilized.
•Deficiencies existed in information architecture, operation strategy, and interaction design.
•Public service delivery on social media depends on the legal and policy environment.
•Governments should collaborate with private sector in official account construction.
•We propose a weight measure for mining frequent weighted clickstream patterns.
•We extend CM-SPADE to develop an effective algorithm.
•We propose a pruning heuristic for mining frequent weighted ...clickstream patterns.
•We present an optimised data structure to mine in large databases.
Pattern mining has been an attractive topic for many researchers since its first introduction. Clickstream mining, a specific version of sequential pattern mining, has been shown to be important in the age of the Internet. However, most previous works have simply applied existing sequential pattern algorithms to the mining of clickstream patterns, and few have studied clickstreams with weights, which also have a wide range of applications. In this paper, we address this problem by proposing an approach based on the average weight measure for clickstream pattern mining and adapting a previous state-of-the-art algorithm to deal with the problem of weighted clickstream pattern mining. Following this, we propose an improved method named Compact-SPADE to improve both efficiency and memory consumption. Through various tests on both real-life and synthetic databases, we show that our proposed algorithms outperform state-of-the-art alternatives in terms of efficiency, memory requirements and scalability.
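The abstract does not define its average weight measure, so the following is only a minimal sketch of one plausible formulation: the weight of a pattern is taken as the mean of its item weights, and the weighted support is that mean multiplied by the fraction of clickstreams containing the pattern as a (not necessarily contiguous) subsequence. The function names, the per-page weights, and the formula itself are assumptions made for illustration, not the paper's definitions.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], stream: Sequence[str]) -> bool:
    """True if the pages of `pattern` appear in `stream` in order (gaps allowed)."""
    it = iter(stream)
    # `page in it` advances the iterator until it finds `page` or runs out.
    return all(page in it for page in pattern)


def average_weight(pattern: Sequence[str], weights: Dict[str, float]) -> float:
    """Assumed pattern weight: the mean of the weights of its pages."""
    return sum(weights[page] for page in pattern) / len(pattern)


def weighted_support(
    pattern: Sequence[str],
    database: List[Sequence[str]],
    weights: Dict[str, float],
) -> float:
    """Assumed measure: average pattern weight times relative support."""
    support = sum(is_subsequence(pattern, stream) for stream in database)
    return average_weight(pattern, weights) * support / len(database)
```

Under this assumed measure a pattern counts as frequent when its weighted support reaches a user-chosen threshold, so rarely visited but highly weighted pages can still surface.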
We evaluate the unexpected shutdown of kino.to, a major platform for unlicensed video streaming in the German market. Using highly disaggregated clickstream data in a difference-in-differences ...setting, we compare the web behavior of 20,000 consumers in Germany and three control countries. We find that this intervention was not very effective in reducing unlicensed consumption or encouraging licensed consumption, mainly because users quickly switch to alternative unlicensed sites. We highlight that the shutdown additionally had important unintended externalities. Individuals who never visited kino.to and who additionally clicked on news articles that covered the shutdown increased their visits to piracy websites substantially. We show that this effect largely comes from articles that explicitly mention alternative websites or suggest that users do not have to fear legal consequences from unlicensed streaming. Finally, we document that the unlicensed video streaming market is much more fragmented after the shutdown, potentially affecting future interventions, at least in the short run. We argue that our results can be helpful to understand why online piracy rates are still high, despite a plethora of enforcement efforts.
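For readers unfamiliar with the difference-in-differences setting mentioned above, the textbook two-group, two-period specification is reproduced here. It is not necessarily the specification estimated in the study, and all symbols are assumptions introduced for illustration: $Y_{it}$ is an outcome such as visits to unlicensed sites by consumer $i$ in period $t$, $\mathrm{Treat}_i$ marks consumers in the treated country, and $\mathrm{Post}_t$ marks periods after the shutdown.

```latex
Y_{it} = \alpha + \beta\,\mathrm{Treat}_i + \gamma\,\mathrm{Post}_t
       + \delta\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}
```

In this form the coefficient $\delta$ is the difference-in-differences estimate: the change in the treated group's outcome after the intervention, net of the change observed in the control group over the same period.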
Student video-watching behavior and quiz performance are studied in two Massive Open Online Courses (MOOCs). In doing so, two frameworks are presented by which video-watching clickstreams can be ...represented: one based on the sequence of events created, and another on the sequence of positions visited. With the event-based framework, recurring subsequences of student behavior are extracted, which contain fundamental characteristics such as reflecting (i.e., repeatedly playing and pausing) and revising (i.e., plays and skip backs). It is found that some of these behaviors are significantly correlated with changes in the likelihood that a student will be Correct on First Attempt (CFA) or not in answering quiz questions, and in ways that are not necessarily intuitive. Then, with the position-based framework, models of quiz performance are devised based on positions visited in a video. In evaluating these models through CFA prediction, it is found that three of them can substantially improve prediction quality, which underlines the ability to relate this type of behavior to quiz scores. Since this prediction considers videos individually, these benefits also suggest that these models are useful in situations where there is limited training data, e.g., for early detection or in short courses.
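The event-based framework described above can be illustrated with a small sketch that counts contiguous event subsequences recurring across students. This is only an assumed simplification: the event names (`play`, `pause`, `skip_back`), the fixed subsequence length, and the "minimum number of students" threshold are hypothetical choices, not the extraction procedure used in the study.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recurring_subsequences(
    streams: List[List[str]], length: int, min_students: int
) -> Dict[Tuple[str, ...], int]:
    """Count, per contiguous event subsequence of the given length,
    how many students' clickstreams contain it at least once."""
    counts: Counter = Counter()
    for events in streams:
        # A set ensures each student contributes at most once per subsequence.
        seen = {
            tuple(events[i : i + length])
            for i in range(len(events) - length + 1)
        }
        counts.update(seen)
    return {gram: n for gram, n in counts.items() if n >= min_students}
```

Subsequences that survive the threshold, such as repeated play and pause pairs, are candidates for behaviors like "reflecting", which could then be tested for correlation with quiz outcomes.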
In this paper, we propose a real-time online shopper behavior analysis system consisting of two modules which simultaneously predicts the visitor’s shopping intent and Web site abandonment ...likelihood. In the first module, we predict the purchasing intention of the visitor using aggregated pageview data tracked during the visit along with some session and user information. The extracted features are fed to random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP) classifiers as input. We use oversampling and feature selection preprocessing steps to improve the performance and scalability of the classifiers. The results show that the MLP trained using the resilient backpropagation algorithm with weight backtracking produces significantly higher accuracy and F1 score than RF and SVM. Another finding is that although clickstream data obtained from the navigation path followed during the online visit convey important information about the purchasing intention of the visitor, combining them with session information-based features that possess unique information about the purchasing interest improves the success rate of the system. In the second module, using only sequential clickstream data, we train a long short-term memory-based recurrent neural network that generates a sigmoid output showing the probability estimate of the visitor’s intention to leave the site without finalizing the transaction within a prediction horizon. The modules are used together to identify visitors who have purchasing intention but are likely to leave the site within the prediction horizon, and to take actions accordingly to improve the Web site abandonment and purchase conversion rates. Our findings support the feasibility of accurate and scalable purchasing intention prediction for virtual shopping environments using clickstream and session information data.
Sequential pattern mining is an important task in data mining. Its subproblem, clickstream pattern mining, is starting to attract more research due to the growth of the Internet and the need to ...analyze online customer behaviors. To date, only a few works have been proposed specifically for the problem of mining clickstream patterns. Although one approach is to use the general algorithms for sequential pattern mining, those algorithms’ performance may suffer and the resources needed are more than would be necessary with a dedicated method for mining clickstreams. In this paper, we present pseudo-IDList, a novel data structure that is more suitable for clickstream pattern mining. Based on this structure, a vertical format algorithm named CUP (Clickstream pattern mining Using Pseudo-IDList) is proposed. Furthermore, we propose a pruning heuristic named DUB (Dynamic intersection Upper Bound) to improve our proposed algorithm. Four real-life clickstream databases are used for the experiments and the results show that our proposed methods are effective and efficient regarding runtimes and memory consumption.
•A novel data structure named pseudo-IDList that is more suitable for mining clickstream patterns was developed.
•CUP (Clickstream pattern mining Using Pseudo-IDList) algorithm for mining clickstream patterns was proposed.
•We propose a pruning heuristic named DUB (Dynamic intersection Upper Bound constraint) to effectively prune candidates.
•We propose an implementation technique that uses bitmaps and the proposed pseudo-IDList.
•We perform an experimental evaluation of our proposed methods.
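To make the idea of a vertical format concrete, here is a minimal sketch of a generic SPADE-style layout in which each page maps to a list of (sequence id, position) pairs and patterns are extended by joining those lists. It does not implement pseudo-IDList, the bitmap technique, or the DUB heuristic, whose details are not given above; the function names and the join rule are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

IDList = List[Tuple[int, int]]  # (sequence id, position of the last page)


def build_idlists(database: List[Sequence[str]]) -> Dict[str, IDList]:
    """Vertical layout: for each page, every (sequence id, position) where it occurs."""
    idlists: Dict[str, IDList] = defaultdict(list)
    for sid, stream in enumerate(database):
        for pos, page in enumerate(stream):
            idlists[page].append((sid, pos))
    return idlists


def temporal_join(prefix_list: IDList, item_list: IDList) -> IDList:
    """IDList of the prefix pattern extended by one page: keep occurrences of
    the page that come after the earliest end of the prefix in the same sequence."""
    first_end: Dict[int, int] = {}
    for sid, pos in prefix_list:
        first_end[sid] = min(pos, first_end.get(sid, pos))
    return [
        (sid, pos)
        for sid, pos in item_list
        if sid in first_end and pos > first_end[sid]
    ]


def support(idlist: IDList) -> int:
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})
```

With this layout the database is scanned once to build the single-page lists, and the support of every longer candidate is obtained from joins alone, which is the property that vertical format miners rely on.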
Clickstream data recording each click that each individual user makes on a media website has become the currency for evaluating digital platforms in order to maximise advertising and/or subscription ...revenue. There is a growing recognition, however, that the mere volume of clicks is not adequate for this purpose. We propose a new systematic approach to this problem based on an underlying theory of engagement. Engagement is construed theoretically as user experiences that connect to higher-order personal goals or social values. We show that such experiences can be described qualitatively using survey items that form engagement measurement scales and that these engagement scales, in fact, explain a willingness-to-pay outcome variable. Moreover, these experiences can be translated into surrogate decomposed clickstream variables. We analyse data from three news websites and show that these decomposed clickstream variables predict willingness-to-pay for the sites better than raw, undecomposed clickstream data. Our methodological framework thus provides a new way of using clickstream data to detect engagement with digital content, a method that provides a basis for improving engagement and ultimately outcomes such as the willingness to pay for content.
•We propose a taxonomy for online channels based on contact origin and brand usage.
•We employ a proportional hazard model to handle multichannel customer journeys.
•We uncover meaningful interaction ...effects between contacts across channel types.
•Channel usage facilitates inferences on underlying purchase decision processes.
Retailers can choose from a plethora of online marketing channels to reach consumers on the Internet, and potential customers often use a vast range of channels during their customer journey. However, increasing complexity and sparse data continue to challenge retailers. This study proposes a taxonomy-based approach to help retailers better understand how channel usage along the customer journey facilitates inferences about the underlying purchase decision processes. A test of this approach with a large clickstream data set uses a proportional hazard model with time-varying covariates. Classifying online marketing channels along the dimensions of contact origin and brand usage uncovers several meaningful interaction effects between contacts across channel types. The proposed taxonomy also significantly improves model fit and outperforms alternative specifications. The results thus can help retailers gain a better understanding of customers’ decision-making progress in online, multichannel environments and optimize their channel structures.
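As background for the model class named above, the general form of a proportional hazard model with time-varying covariates is shown below. This is the standard Cox formulation rather than the study's exact specification, and the symbols are assumptions: $h_i(t)$ is the hazard of a purchase by customer $i$ at time $t$, $h_0(t)$ is an unspecified baseline hazard, and $\mathbf{x}_i(t)$ collects covariates that change along the journey, such as counts of prior contacts per channel type.

```latex
h_i(t) = h_0(t)\,\exp\!\bigl(\boldsymbol{\beta}^{\top}\mathbf{x}_i(t)\bigr)
```

Interaction effects between channel types would enter this form as products of covariates inside $\mathbf{x}_i(t)$, and each $\exp(\beta_k)$ can be read as the multiplicative change in the purchase hazard for a one-unit increase in covariate $k$.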