In the past few years, the storage and analysis of large-scale and fast-evolving networks have presented a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. Studies on network sampling primarily analyze how network properties change under sampling. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with an additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction also to random walk and forest-fire sampling and evaluate the effects of subgraph induction on sampling accuracy. We analyze different real-world networks and the changes in their properties introduced by sampling. The results reveal that the techniques with subgraph induction improve on the performance of techniques without induction and create denser sample networks with a larger average degree. Furthermore, the accuracy of sampling decreases consistently across various sampling techniques when the sampled networks are smaller. Based on the results of the comparison, we introduce a scheme for selecting the most appropriate technique for network sampling. Overall, breadth-first exploration sampling proves to be the best-performing technique.
•Sampling techniques are compared based on the match of properties between networks.
•We apply a subgraph induction step to random walk and forest-fire sampling.
•Induction improves the performance in preserving degree and clustering distributions.
•Techniques with induction create denser networks with a larger average degree.
•We introduce a scheme for selecting the most appropriate sampling technique.
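The subgraph induction step described above can be sketched as follows. This is a minimal illustration using networkx, not the paper's actual implementation: a random walk first selects a set of nodes, and induction then keeps every edge of the original network between the sampled nodes (rather than only the traversed ones).

```python
import random
import networkx as nx

def random_walk_sample(G, n_nodes, seed=None):
    """Collect n_nodes distinct nodes via a simple random walk."""
    rng = random.Random(seed)
    current = rng.choice(list(G.nodes))
    visited = {current}
    while len(visited) < n_nodes:
        neighbors = list(G.neighbors(current))
        if not neighbors:
            # Dead end (e.g., isolated node): restart from a random node.
            current = rng.choice(list(G.nodes))
        else:
            current = rng.choice(neighbors)
        visited.add(current)
    return visited

def induced_sample(G, n_nodes, seed=None):
    """Random walk sampling followed by subgraph induction:
    keep ALL edges of G between the sampled nodes."""
    nodes = random_walk_sample(G, n_nodes, seed)
    return G.subgraph(nodes).copy()
```

Because induction can only add edges relative to the bare walk trace, the sampled networks come out denser, consistent with the finding above.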
The rapid growth of social media, news sites, and blogs increases the opportunity to express and share opinions on the Internet. Researchers from different fields take advantage of this nearly limitless data. Thus, in the past decade, opinion mining, or sentiment analysis, has become an important research discipline. In this paper, we focus on target-level sentiment analysis, wherein the task is to predict the sentiment concerning specific (possibly multiple) entities that appear as coreference mentions throughout a document. We created a new annotated dataset of Slovene news articles, additionally annotated with the named entities and coreferences that form the basis of the proposed task. Using an entity-document representation, we compared the task with traditional sentiment analysis, evaluating both traditional machine learning and deep neural network approaches. For existing approaches, the proposed task represents a challenging problem. The results show that the best performance is achieved using a customised BERT adapter (a minor improvement over a standard text-classification adapter). We outperformed existing aspect-based state-of-the-art approaches by 13%, reaching up to 77% accuracy and a 73% F1 score.
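One way to picture the entity-document representation is to build one pseudo-document per entity from the sentences that mention any member of that entity's coreference chain, and then classify each pseudo-document. The helper below is a hypothetical illustration of this idea (naive substring matching, not the paper's pipeline):

```python
def entity_documents(sentences, chains):
    """Build one pseudo-document per entity: concatenate all sentences
    containing a mention from that entity's coreference chain.
    NOTE: substring matching is a deliberate simplification; a real
    pipeline would use token-level mention spans from the annotations."""
    docs = {}
    for entity, mentions in chains.items():
        docs[entity] = " ".join(
            s for s in sentences if any(m in s for m in mentions))
    return docs

sentences = ["Ana won the award.", "She thanked the team.", "The team lost."]
chains = {"Ana": ["Ana", "She"], "team": ["team"]}
docs = entity_documents(sentences, chains)
```

Each resulting pseudo-document can then be fed to any document-level sentiment classifier, which is what allows a direct comparison with traditional sentiment analysis.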
Despite their diverse origins, networks of large real-world systems reveal a number of common properties, including small-world phenomena, scale-free degree distributions and modularity. Recently, network self-similarity as a natural outcome of the evolution of real-world systems has also attracted much attention in the physics literature. Here we investigate the scaling of density in complex networks under two classical box-covering renormalizations (network coarse-graining) and also under different community-based renormalizations. The analysis of over 50 real-world networks reveals a power-law scaling of network density and size under an adequate renormalization technique, irrespective of network type and origin. The results thus advance a recent discovery of a universal scaling of density among different real-world networks (P.J. Laurienti, K.E. Joyce, Q.K. Telesford, J.H. Burdette, S. Hayasaka, Universal fractal scaling of self-organized networks, Physica A 390 (20) (2011) 3608–3613) and imply the existence of a scale-free density also within complex real-world networks, that is, among their different self-similar scales. The latter further improves the comprehension of the self-similar structure of large real-world networks, with several possible applications.
► Scaling of density in complex networks is analyzed under different renormalizations.
► Various box-covering and community detection methods are considered.
► Network density follows a power-law scaling with respect to network size.
► A scale-free density exists among self-similar scales of real-world networks.
► The density scaling appears irrespective of network type, size and origin.
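A power-law scaling of density with size, d ≈ c·n^γ, is commonly estimated by linear regression in log-log space, since log d = γ log n + log c. The snippet below is a generic sketch of that estimation step (the paper's exact fitting procedure is not specified here):

```python
import numpy as np

def fit_density_scaling(sizes, densities):
    """Fit a power law d = c * n**gamma by least squares in log-log space.
    Returns the scaling exponent gamma and the prefactor c."""
    gamma, logc = np.polyfit(np.log(sizes), np.log(densities), 1)
    return gamma, np.exp(logc)
```

Applied to the sequence of renormalized networks, the fitted exponent gamma quantifies how density scales across the self-similar scales.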
Many real-world networks are large, complex and thus hard to understand, analyze or visualize. Data about networks are not always complete, their structure may be hidden, or they may change quickly over time. Therefore, understanding how an incomplete system differs from a complete one is crucial. In this paper, we study the changes in networks submitted to simplification processes (i.e., reduction in size). We simplify 30 real-world networks using six simplification methods and analyze the similarity between the original and simplified networks based on the preservation of several properties, for example, degree distribution, clustering coefficient, betweenness centrality, density and degree mixing. We propose an approach for assessing the effectiveness of the simplification process to define the most appropriate size of simplified networks and to determine the method that preserves the most properties of original networks. The results reveal that the type and size of original networks do not affect the changes in the networks when submitted to simplification, whereas the size of simplified networks does. Moreover, we investigate the performance of simplification methods when the size of simplified networks is 10% that of the original networks. The findings show that sampling methods outperform merging ones, particularly random node selection based on degree and breadth-first sampling.
•We explore the preservation of network properties under several simplification methods.
•A measure for assessing the effectiveness of the simplification process is proposed.
•We compare the original and simplified networks based on global and local properties.
•Simplification to 10% of the original network size provides a fair fit of properties.
•Random node selection based on degree and breadth-first sampling proved the best.
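Breadth-first sampling, one of the best-performing methods above, can be sketched as follows. This is a minimal networkx-based illustration under the usual formulation (expand outward from a seed node until the target size is reached, then induce the subgraph), not the authors' exact code:

```python
from collections import deque
import networkx as nx

def bfs_sample(G, n_nodes, start):
    """Breadth-first sampling: expand from `start` until n_nodes are
    collected, then induce the subgraph on the visited nodes."""
    visited = {start}
    queue = deque([start])
    while queue and len(visited) < n_nodes:
        u = queue.popleft()
        for v in G.neighbors(u):
            if v not in visited:
                visited.add(v)
                queue.append(v)
                if len(visited) == n_nodes:
                    break
    return G.subgraph(visited).copy()
```

Because every newly visited node is adjacent to an already visited one, the induced sample is connected, which helps preserve local structure such as clustering.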
Any network studied in the literature is inevitably just a sampled representative of its real-world analogue. Additionally, network sampling has lately often been applied to large networks to allow for their faster and more efficient analysis. Nevertheless, the changes in network structure introduced by sampling are still far from understood. In this paper, we study the presence of characteristic groups of nodes in sampled social and information networks. We consider different network sampling techniques including random node and link selection, network exploration and expansion. We first observe that the structure of social networks reveals densely linked groups like communities, while the structure of information networks is better described by modules of structurally equivalent nodes. However, despite these notable differences, the structure of sampled networks exhibits stronger characterization by community-like groups than the original networks, irrespective of their type and consistently across various sampling techniques. Hence, the rich community structure commonly observed in social and information networks is to some extent merely an artifact of sampling.
•We study the presence of characteristic groups of nodes in sampled networks.
•The structure of social networks reveals densely linked groups like communities.
•Information networks consist of modules of structurally equivalent nodes.
•Sampled networks contain more community-like groups irrespective of the network type.
In the past decade, social media has become an important part of our everyday life. The use of different social media changes the way we communicate, collaborate, gather information and, consequently, perceive the world around us. Thus, researchers from different fields exploit social media to provide deeper insight into human behaviour. Each social medium has its own privacy policies and levels of access to publicly available data. In this paper, we present a generic framework, along with the tools, to analyse different social media. The analysis covers basic usage statistics, differences in reach and engagement, and language, sentiment and gender identification for each social network's data. We compare data from Twitter, Facebook, Tumblr, Google+ and YouTube. The results reveal the specifics of each social medium, which to some extent also depend on the data available and the selected seed keywords. We find that the popularity of selected topics in social media is proportional to the number of hits on Google, that celebrities and politicians are the most talked-about topics, and that user behaviour differs across social media. For example, Twitter users prefer to post more, while Facebook and YouTube users prefer to comment. The majority of all social media posts are in English, a large share of them are negative, and they are often written by male users. The results of the proposed framework should serve as a tool to identify the appropriate source of data for a representative analysis of social media.
In the past few years, the storage and analysis of large-scale and fast-evolving networks have presented a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with an additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction also to random walk and forest-fire sampling. We analyze different real-world networks and the changes in their properties introduced by sampling. We compare several sampling techniques based on the match between the original networks and their sampled variants. The results reveal that the techniques with subgraph induction underestimate the degree and clustering distributions, while overestimating the average degree and density of the original networks. Techniques without the subgraph induction step exhibit exactly the opposite behavior. Hence, the performance of the sampling techniques from the random selection category does not differ significantly from network exploration sampling, while clear differences exist between the techniques with the subgraph induction step and those without it.
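The density effect reported above can be seen in a small experiment. The sketch below, a networkx-based illustration rather than the paper's code, records both the nodes and the traversed edges of a random walk; induction then keeps every original edge among the sampled nodes, so the induced sample is at least as dense as the bare walk trace:

```python
import random
import networkx as nx

def walk_sample(G, n_nodes, seed=0):
    """Random walk recording both visited nodes and traversed edges."""
    rng = random.Random(seed)
    current = rng.choice(list(G.nodes))
    nodes, edges = {current}, set()
    while len(nodes) < n_nodes:
        nbrs = list(G.neighbors(current))
        if not nbrs:
            # Dead end: restart the walk from a random node.
            current = rng.choice(list(G.nodes))
            continue
        nxt = rng.choice(nbrs)
        edges.add(frozenset((current, nxt)))
        nodes.add(nxt)
        current = nxt
    return nodes, edges

G = nx.karate_club_graph()
nodes, walk_edges = walk_sample(G, 15)
induced = G.subgraph(nodes)
# The walk trace is a subset of the induced edges, so induction
# can only increase the density of the sampled network.
assert len(walk_edges) <= induced.number_of_edges()
```

This asymmetry is exactly why induced samples overestimate average degree and density, while trace-only samples underestimate them.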
Many real-world networks are large, complex and thus hard to understand, analyze or visualize. Data about networks are not always complete, their structure may be hidden, or they may change quickly over time. Therefore, understanding how an incomplete system differs from a complete one is crucial. In this paper, we study the changes in networks under simplification (i.e., reduction in size). We simplify 30 real-world networks with six simplification methods and analyze the similarity between the original and simplified networks based on the preservation of several properties, for example, degree distribution, clustering coefficient, betweenness centrality, density and degree mixing. We propose an approach for assessing the effectiveness of the simplification process to define the most appropriate size of simplified networks and to determine the method that preserves the most properties of original networks. The results reveal that the type and size of original networks do not influence the changes of networks under the simplification process, while the size of simplified networks does. Moreover, we investigate the performance of simplification methods when the size of simplified networks is 10% of the original networks. The findings show that sampling methods outperform merging ones; in particular, random node selection based on degree and breadth-first sampling perform the best.