Online forums are rich sources of information about user communication activity over time. Finding temporal patterns in online forum communication threads can advance our understanding of the dynamics of conversations. The main challenge of temporal analysis in this context is the complexity of forum data. There can be thousands of interacting users, who can be numerically described in many different ways. Moreover, user characteristics can evolve over time. We propose an approach that decouples temporal information about users into sequences of user events and inter-event times. We develop a new feature space to represent the event sequences as paths, and we model the distribution of the inter-event times. We study over 30,000 users across four Internet forums, and discover novel patterns in user communication. We find that users tend to exhibit consistency over time. Furthermore, in our feature space, we observe regions that represent unlikely user behaviors. Finally, we show how to derive a numerical representation for each forum, and we then use this representation to derive a novel clustering of multiple forums.
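The decoupling step described in this abstract can be illustrated with a minimal sketch. It assumes each record is a hypothetical `(user, timestamp, event_label)` tuple with numeric timestamps; the function name `decouple` and the event labels are illustrative and not taken from the paper. The feature space for paths and the inter-event time model are not specified in the abstract, so neither is attempted here.

```python
from collections import defaultdict


def decouple(events):
    """Split raw activity records into per-user event sequences and gaps.

    `events` is an iterable of (user, timestamp, event_label) tuples
    (an assumed input format). Returns {user: (sequence, gaps)} where
    `sequence` is the time-ordered list of labels and `gaps` holds the
    differences between consecutive timestamps.
    """
    by_user = defaultdict(list)
    for user, timestamp, label in events:
        by_user[user].append((timestamp, label))

    result = {}
    for user, items in by_user.items():
        items.sort(key=lambda item: item[0])  # order by timestamp only
        sequence = [label for _, label in items]
        gaps = [later[0] - earlier[0] for earlier, later in zip(items, items[1:])]
        result[user] = (sequence, gaps)
    return result


if __name__ == "__main__":
    log = [("u1", 10, "post"), ("u1", 4, "reply"), ("u2", 7, "post"), ("u1", 25, "reply")]
    for user, (sequence, gaps) in decouple(log).items():
        print(user, sequence, gaps)
```

A user with a single event yields an empty list of gaps, so any downstream model of inter-event times would need to handle such users separately.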
This paper describes the research underpinning a networked application for the delivery of personalised streams of music over the Internet. The initial system used automated collaborative filtering (ACF), a ‘content-less’ approach to recommend new music to users. We show how we have improved on this basic technique by leveraging a light content-based technique that attempts to capture the user's current listening ‘context’. This involves a two-stage retrieval process where ACF recommendations are ranked according to the user's current interests. Finally, we demonstrate an on-line evaluation strategy that pits the ACF strategy against the context-boosted strategy in a real-time competition.
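The second stage of such a two-stage retrieval process might look roughly like the sketch below. The abstract does not say which content features or similarity measure are used, so the tag sets, the Jaccard overlap, and the name `rerank_by_context` are all assumptions made for illustration. The first-stage ACF candidate list is taken as given.

```python
def rerank_by_context(candidates, item_tags, context_tags):
    """Re-rank first-stage candidates by similarity to the current context.

    `candidates` is the ordered output of the first (ACF) stage,
    `item_tags` maps an item to a set of descriptive tags, and
    `context_tags` is a set describing what the user is listening to now.
    All three formats are hypothetical.
    """
    def overlap(item):
        tags = item_tags.get(item, set())
        union = tags | context_tags
        # Jaccard similarity; items without any tag information score 0.
        return len(tags & context_tags) / len(union) if union else 0.0

    # sorted() is stable, so items with equal overlap keep their ACF order.
    return sorted(candidates, key=overlap, reverse=True)


if __name__ == "__main__":
    acf_output = ["s1", "s2", "s3"]
    tags = {"s1": {"jazz"}, "s2": {"rock", "indie"}, "s3": {"rock"}}
    print(rerank_by_context(acf_output, tags, {"rock"}))
```

Relying on a stable sort means the collaborative-filtering order acts as the tie-breaker, which keeps the first stage relevant when the context signal is weak.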
Microblogging social media (mainly represented by Twitter) focuses on fast open real-time communication using short messages between users and their followers. These platforms generate large amounts of content and community finding techniques are an attractive alternative for organising it. However, there is no clear agreement in the literature for a definition of user community for the microblogging use case, leading to unreliable ground-truth data and evaluation. In this work, we differentiate between functional and structural definitions of communities for microblogging. A functional community groups its users by a common independent social function, e.g. fans of the same football team, while in a structural community the members exclusively depend on their connectivity in a network, e.g. modularity. We build and characterise eight types of functional communities to be used as user-labelled ground-truth and five types of live user interaction networks from Twitter. We then evaluate thirteen popular structural community definitions using five different Twitter datasets, exploring their goodness and robustness for detecting the functional ground-truth under different perturbation strategies. Our results show that definitions based on internal connectivity, e.g. Triangle Participation Ratio, Fraction Over Median Degree or Conductance, work best for the Twitter use case and are very robust. On the other hand, classic scores such as Modularity are limited and do not fit very well due to the sparsity and noise of microblogging.
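Three of the structural scores named in this abstract can be sketched on an undirected, unweighted graph as follows. The sketch uses the commonly cited definitions of these scores, which may differ in detail from the variants evaluated in the paper; the adjacency-dictionary representation and all function names are assumptions, and the community is assumed to be a non-empty set of nodes.

```python
from itertools import combinations
from statistics import median


def build_graph(edges):
    """Build an undirected adjacency dict {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, community):
    """Boundary edges divided by the community's total degree (2*m_S + c_S)."""
    internal_endpoints = 0  # each internal edge is counted twice
    boundary = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal_endpoints += 1
            else:
                boundary += 1
    volume = internal_endpoints + boundary
    return boundary / volume if volume else 0.0


def triangle_participation_ratio(adj, community):
    """Fraction of community members that sit in a triangle inside the community."""
    in_triangle = set()
    for u in community:
        inside = adj[u] & community
        for v, w in combinations(inside, 2):
            if w in adj[v]:
                in_triangle.update((u, v, w))
    return len(in_triangle) / len(community)


def fraction_over_median_degree(adj, community):
    """Fraction of members whose internal degree exceeds the graph's median degree."""
    d_m = median(len(neighbours) for neighbours in adj.values())
    over = sum(1 for u in community if len(adj[u] & community) > d_m)
    return over / len(community)


if __name__ == "__main__":
    graph = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
    members = {"a", "b", "c"}
    print(conductance(graph, members))
    print(triangle_participation_ratio(graph, members))
    print(fraction_over_median_degree(graph, members))
```

Lower conductance indicates a better-separated community, whereas higher values are better for the other two scores, so the three are not directly comparable without normalising their direction.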
Unsupervised graph-based topic labelling using dbpedia Hulpus, Ioana; Hayes, Conor; Karnstedt, Marcel ...
Proceedings of the sixth ACM international conference on Web search and data mining,
02/2013
Conference Proceeding
Open access
Automated topic labelling brings benefits for users aiming at analysing and understanding document collections, as well as for search engines targeting the linkage between groups of words and their inherent topics. Current approaches to achieve this suffer in quality, but we argue their performances might be improved by setting the focus on the structure in the data. Building upon research for concept disambiguation and linking to DBpedia, we are taking a novel approach to topic labelling by making use of structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text likely refer to concepts that belong closely together in the DBpedia graph. Using graph centrality measures, we show that we are able to identify the concepts that best represent the topics. We comparatively evaluate our graph-based approach and the standard text-based approach, on topics extracted from three corpora, based on results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia can achieve better results for topic labelling in terms of both precision and topic coverage.
Cross-Community Influence in Discussion Fora Belák, Václav; Lam, Samantha; Hayes, Conor
Proceedings of the International AAAI Conference on Web and Social Media,
08/2021, Volume: 6, Issue: 1
Journal Article
Online discussion fora have become an important cultural and business asset in the context of many services provided by both non-profit organizations and enterprises. In order to keep and eventually increase the value these systems deliver to their users, it is often necessary to moderate or even manage their dynamics. One way to do this efficiently is to focus primarily on the most influential actors in the system. However, identifying such users becomes increasingly hard with systems where there is a continuously growing large user base. We show that analysis and explanation of influence on the cross-community level is a promising way to provide a coarse-grained picture of a potentially very large system and that it may enable its stakeholders to find groups through which the system can be efficiently influenced, or it can help them to identify and avoid activity considered as malicious. In order to achieve that, we present a novel framework for cross-community influence analysis, which is evaluated on 10 years of data from the largest Irish online discussion system Boards.ie.
Online discussion boards, or Internet forums, are a significant part of the Internet. People use Internet forums to post questions, provide advice and participate in discussions. These online conversations are represented as threads, and the conversation trees within these threads are important in understanding the behaviour of online users. Unfortunately, the reply structures of these threads are generally not publicly accessible or not maintained. Hence, in this paper, we introduce an efficient and simple approach to reconstruct the reply structure in threaded conversations. We contrast its accuracy against three baseline algorithms, and show that our algorithm can accurately recreate the in- and out-degree distributions of forum reply graphs built from the reconstructed reply structures.
Cross-community effects on the behaviour of individuals and communities themselves can be observed in a wide range of applications. While previous work has tried to explain and analyse such phenomena, there is still a great potential for increasing the quality and accuracy of this analysis. In this work, we propose a general framework consisting of several different techniques to analyse and explain cross-community effects and the underlying dynamics. The proposed methodology works with arbitrary community algorithms, incorporates meta-data to improve the overall quality and expressiveness of the analysis and identifies particular phenomena in an automated manner. We illustrate the benefits and strengths of our approach by exposing in-depth details of cross-community effects between two closely related and well established areas of scientific research. This work focuses on techniques for understanding, defining and eventually predicting typical life-cycles and events in the context of cross-community dynamics.
In this paper we introduce SemStim, an unsupervised graph-based algorithm that addresses the cross-domain recommendation task. In this task, preferences from one conceptual domain (e.g. movies) are used to recommend items belonging to another domain (e.g. music). SemStim exploits the semantic links found in a knowledge graph (e.g. DBpedia), to connect domains and thus generate recommendations. As a key benefit, our algorithm does not require (1) ratings in the target domain, thus mitigating the cold-start problem and (2) overlap between users or items from the source and target domains. In contrast, current state-of-the-art personalisation approaches either have an inherent limitation to one domain or require rating data in the source and target domains. We evaluate SemStim by comparing its accuracy to state-of-the-art algorithms for the top-k recommendation task, for both single-domain and cross-domain recommendations. We show that SemStim enables cross-domain recommendation, and that in addition, it has a significantly better accuracy than the baseline algorithms.