Community detection in social networks is a fundamental task of complex network analysis. A community is usually regarded as a functional unit. Real-world networks tend to have overlapping community structure, whereas traditional community detection algorithms assume that a vertex can belong to only one community. This paper proposes an efficient overlapping community detection algorithm named LED (Loop Edges Delete). The LED algorithm is based on structural clustering, which converts the structural similarity between vertices into weights of the network. LED is evaluated both on classical networks from the literature and on C-DBLP, a large real-life co-author social network in China. The results show that LED is superior in accuracy and efficiency to the methods it is compared with, the FastModularity and GN algorithms.
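As an illustration of turning structural similarity into weights, the sketch below uses the cosine-style similarity common in structural clustering (shared closed neighbourhood, normalised by the geometric mean of the neighbourhood sizes). The abstract does not give LED's exact formula, so this definition and the names `structural_similarity` and `weight_edges` are assumptions for illustration only.

```python
from math import sqrt


def structural_similarity(adj, u, v):
    """Cosine-style similarity of the closed neighbourhoods of u and v.

    Assumption: this is the usual structural-clustering definition,
    not necessarily the one used by LED.
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def weight_edges(adj):
    """Assign each undirected edge (u, v), u < v, its structural similarity."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                weights[(u, v)] = structural_similarity(adj, u, v)
    return weights


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
weights = weight_edges(adj)
```

Edges inside the triangle receive higher weights than the edge to the pendant vertex, which is the kind of signal a structural clustering method can use when deciding which edges to delete.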
In our study, we examine the impact of citation network structures on the ability to discern valuable research topics in Computer Science literature. We use the bibliographic information available in the DBLP database to extract candidate phrases from scientific paper abstracts. We then construct citation networks based on direct citation, co-citation and bibliographic coupling relationships between the papers. The candidate research topics, in the form of keyphrases and n-grammes, are subsequently ranked and filtered by a graph-text ranking algorithm. The highest-ranked potential topics are further evaluated by domain experts and against the Wikipedia knowledge base. The results obtained from these citation networks are complementary, returning valid but non-overlapping output phrases between some pairs of networks. In particular, bibliographic coupling appears to capture more unique information than either direct citation or co-citation. These findings point towards the possible added value of combining bibliographic coupling analysis with other structures. At the same time, the value of combining direct citation and co-citation is called into question. We expect our findings to be utilised in method design for research topic identification.
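The three relationships can be derived from the same direct-citation data. The sketch below is a minimal illustration with hypothetical paper identifiers and function names. Two papers are co-cited when a third paper cites both, and they are bibliographically coupled when they share at least one reference.

```python
from collections import Counter
from itertools import combinations


def co_citation(cites):
    """For each pair of cited papers, count how many papers cite both."""
    counts = Counter()
    for refs in cites.values():
        for a, b in combinations(sorted(refs), 2):
            counts[(a, b)] += 1
    return counts


def bibliographic_coupling(cites):
    """For each pair of citing papers, count their shared references."""
    counts = Counter()
    for p, q in combinations(sorted(cites), 2):
        shared = len(cites[p] & cites[q])
        if shared:
            counts[(p, q)] = shared
    return counts


# Direct citations: citing paper -> set of cited papers (hypothetical data).
cites = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P4", "P5"},
    "P3": {"P5"},
}
cocited = co_citation(cites)
coupled = bibliographic_coupling(cites)
```

Here P3 and P4 are co-cited twice (by P1 and P2), while P1 and P2 are coupled through two shared references. The two derived networks link different pairs of papers, which is consistent with the complementary results the abstract reports.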
Author name ambiguity in a digital library may affect the findings of research that mines the library's authorship data. This study evaluates author name disambiguation in DBLP, a digital library that is widely used but whose disambiguation performance has been insufficiently evaluated. In doing so, the study takes a triangulation approach, on the premise that author name disambiguation for a digital library can be better evaluated when its performance is assessed on multiple labeled datasets and compared with baselines. Tested on three types of labeled data containing 5000 to 6 M disambiguated names, DBLP is shown to assign author names quite accurately to distinct authors, resulting in pairwise precision, recall, and F1 measures around 0.90 or above overall. DBLP’s author name disambiguation performs well even on large ambiguous name blocks but poorly at distinguishing authors with the same names. Compared to other disambiguation algorithms, DBLP’s disambiguation performance is quite competitive, possibly due to its hybrid approach combining algorithmic disambiguation and manual error correction. A discussion follows on the strengths and weaknesses of the labeled datasets used in this study, for future efforts to evaluate author name disambiguation on a digital library scale.
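Pairwise precision, recall, and F1 compare the pairs of records that a disambiguation places under the same author with the pairs that truly belong together. The sketch below shows the standard computation on hypothetical record and author labels. It is not the evaluation code used in the study.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return all record pairs that share a cluster label."""
    records = sorted(labels)
    return {(a, b) for a, b in combinations(records, 2) if labels[a] == labels[b]}


def pairwise_prf(predicted, truth):
    """Pairwise precision, recall and F1 of a predicted clustering."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical records: r1-r3 belong to author A, r4 to author B.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B"}
# A predicted clustering that splits author A and merges r3 with r4.
predicted = {"r1": "x", "r2": "x", "r3": "y", "r4": "y"}
precision, recall, f1 = pairwise_prf(predicted, truth)
```

Splitting one author into several clusters lowers recall, and merging distinct authors lowers precision, so the two measures separate the two kinds of error discussed in the abstract.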
• Interactions in social networks exhibit asymmetry that needs to be accounted for during their analysis.
• The misconception that coauthorship networks contradict Granovetter’s strength of weak ties hypothesis can be attributed to the assumption about the symmetry of ties.
• Taking into account the asymmetry of social ties can remarkably increase the efficiency of link prediction methods.
The paper provides important insights into the factors that influence tie strength in social networks. Using local network measures that take into account the asymmetry of social interactions, we show that the observed tie strength is a kind of compromise, which depends on the relative strength of the tie as seen from both of its ends. This statement is supported by Granovetter-like, strongly positive weight-topology correlations, in the form of a power-law relationship between the asymmetric tie strength and the asymmetric neighbourhood overlap, observed in three different real coauthorship networks and in a synthetic model of scientific collaboration. This observation is juxtaposed against the current misconception that coauthorship networks, as a proxy for scientific collaboration networks, contradict Granovetter’s strength of weak ties hypothesis, and the reasons for this misconception are explained. Finally, by testing various link similarity scores, it is shown that taking into account the asymmetry of social ties can remarkably increase the efficiency of link prediction methods. The perspective outlined also allows us to comment on the surprisingly high performance of the resource allocation index, one of the most recognizable and effective local similarity scores, which can be rationalized by the strong triadic closure property, assuming that the property takes into account the asymmetry of social ties.
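For reference, the standard (symmetric) resource allocation index scores a candidate link between x and y by summing 1/k_z over their common neighbours z, where k_z is the degree of z. The sketch below implements only this textbook form on a hypothetical graph. The asymmetric measures introduced in the paper are not reproduced here.

```python
def resource_allocation(adj, x, y):
    """Standard resource allocation index: sum of 1/degree over common neighbours."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


# Hypothetical undirected graph stored as vertex -> set of neighbours.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
score_ab = resource_allocation(adj, "a", "b")  # common neighbours c (degree 2) and d (degree 3)
score_ae = resource_allocation(adj, "a", "e")  # common neighbour d only
```

Low-degree common neighbours contribute more to the score than hubs, so a shared collaborator with few other ties counts as stronger evidence for a future link.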
In this article we study and characterize the phenomenon of hyperprolific authors, who are the most productive researchers according to a given repository in a specific period of time. In particular, we are interested in investigating and characterizing a subset of such hyperprolific authors who show a sudden growth in the number of published articles and coauthors, and who concentrate their publications in a few specific journals, which can be seen as anomalous behavior. Using data collected from the DBLP repository and covering the last 10 years, we propose a set of discriminative dimensions (features) aimed at characterizing the behavior of hyperprolific authors, ultimately helping to identify anomalous ones. Moreover, using a strategy based on ranking aggregation to identify the most prominent anomalous authors, we demonstrate that the best dimensions to characterize such anomalous behaviors may vary significantly among authors, but it is possible to identify a clear subset of them who present such behavior. Our results show that the top-ranked (most anomalous) authors behave distinctly from the middle-ranked ones. Indeed, each of the five most anomalous authors published more than 48 journal articles in 2021 while collaborating with more than 1,000 coauthors in their careers. Specifically, one of these authors published more than 140 articles in a single journal.
In the academic world, the number of scientists grows every year and so does the number of authors sharing the same names. Consequently, it is challenging to assign newly published papers to their respective authors. Therefore, author name ambiguity is considered a critical open problem in digital libraries. This paper proposes an author name disambiguation approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first name initial. The author within each group is then identified by capturing the relation with their co-authors and area of research, the latter represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
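The first step, grouping authors by last name and first-name initial, can be sketched as follows. The name format ("First ... Last", whitespace-separated) and the function names `block_key` and `block_authors` are assumptions for illustration. The abstract does not describe how names are parsed.

```python
from collections import defaultdict


def block_key(name):
    """Return (last name, first initial), assuming 'First ... Last' order."""
    parts = name.strip().split()
    first, last = parts[0], parts[-1]
    return (last.lower(), first[0].lower())


def block_authors(names):
    """Group author name strings that share last name and first initial."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    return dict(blocks)


# Hypothetical author name strings.
names = ["Wei Wang", "W. Wang", "Wen Wang", "Li Wang", "Anna Smith"]
blocks = block_authors(names)
```

Each block then becomes a small candidate set, so a model such as the neural network mentioned above only has to separate authors within a block rather than across the whole repository.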
We compared general and specialized databases by searching for bibliographic information on journal articles in the computer science field, and by evaluating their bibliographic coverage and the quality of the bibliographic records retrieved. We selected a sample of computer science articles from an Italian university repository (AIR) to carry out our comparison. The databases selected were INSPEC, Scopus, Web of Science (WoS), and DBLP. We found that DBLP and Scopus indexed the highest number of unique articles (4.14 % and 4.05 %, respectively), that each of the four databases indexed a set of unique articles, that 12.95 % of the articles sampled were not indexed in any of the databases selected, that Scopus was better than WoS for identifying computer science publications, and that DBLP had a greater number of unique articles indexed (19.03 %) when compared to INSPEC (11.28 %). We also measured the quality of a set of bibliographic records by comparing five databases: Scopus, WoS, INSPEC, DBLP and Google Scholar (GS). We found that WoS, INSPEC and Scopus provided better quality indexing and better bibliographic records in terms of accuracy, control and granularity of information, when compared to GS and DBLP. WoS and Scopus also provided more sophisticated tools for measuring trends of scholarly publications.
The number of published academic papers has been increasing rapidly from year to year. However, this increase in publications must go hand in hand with an emphasis on quality. To ensure that academic papers meet the required quality standard, the peer review process is necessary. The main objective of reviewer assignment is to find an appropriate reviewer who can conduct a review based on their field of research. However, obstacles can arise when there is a conflict of interest in the process. This study aims to develop a method for assigning reviewers that overcomes such obstacles. Our approach combines Latent Dirichlet Allocation (LDA), classification, and link prediction. LDA is used to find topics in the research data of prospective reviewers, to ensure that the assigned reviewers are well suited to the submitted article. These data were used as training data for classification using Random Forest. Finally, link prediction is implemented to make reviewer recommendations. Using Mean Average Precision (MAP), we evaluated our proposed method and compared it with previous research that used cosine similarity as the last step in recommendation. Our proposed method achieved a MAP value of 0.87, an improvement over the previous approach. These results suggest that our approach has the potential to improve the effectiveness of academic peer review.
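Mean Average Precision averages, over all submissions, the precision observed at each rank where a relevant reviewer appears. The sketch below shows the standard computation on hypothetical reviewer rankings. It does not reproduce the study's data or its reported value of 0.87.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical recommendations for two submitted articles.
queries = [
    (["rev1", "rev2", "rev3"], {"rev1", "rev3"}),  # AP = (1/1 + 2/3) / 2
    (["rev4", "rev5", "rev6"], {"rev5"}),          # AP = 1/2
]
map_score = mean_average_precision(queries)
```

Because each hit is weighted by its rank, a method that places suitable reviewers near the top of the list scores higher than one that merely includes them somewhere in it.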
Accidentality in journal citation patterns
Mrowinski, Maciej J.; Gagolewski, Marek; Siudem, Grzegorz
Journal of Informetrics, November 2022, Volume 16, Issue 4
Journal Article, Peer reviewed, Open access
• A single indicator cannot quantify the impact of scientific journals.
• Citations are allotted according to a mixture of the “rich get richer” rule and sheer chance.
• More impactful journals tend to have more preferentially distributed citations.
• Less impactful journals are characterised by a higher degree of accidentality in their citation distribution.
We study an agent-based model for generating citation distributions in complex networks of scientific papers, where a fraction of citations is allotted according to the preferential attachment rule (rich get richer) and the remainder is allocated accidentally (purely at random, uniformly). Previously, we derived and analysed such a process in the context of describing individual authors, but now we apply it to scientific journals in computer and information sciences. Based on the large DBLP dataset as well as the CORE (Computing Research and Education Association of Australasia) journal ranking, we find that the impact of journals is correlated with the degree of accidentality of their citation distribution. Citations to impactful journals tend to be more preferential, while citations to lower-ranked journals are distributed in a more accidental manner. Further, applied fields of research such as artificial intelligence seem to be driven by a stronger preferential component – and hence have a higher degree of inequality – than the more theoretical ones, e.g., mathematics and computation theory.
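The mixture of preferential and accidental citation allocation can be illustrated with a deliberately simplified simulation. In the sketch below the number of papers is fixed and the parameter `rho` (the preferential fraction) is a hypothetical name. The authors' actual model and its fitting procedure are not reproduced here.

```python
import random


def simulate_citations(n_papers, n_citations, rho, seed=0):
    """Allot citations one at a time to a fixed set of papers.

    With probability rho a citation is preferential: it goes to a paper
    chosen in proportion to the citations it already has. Otherwise it is
    accidental: it goes to a paper chosen uniformly at random.
    """
    rng = random.Random(seed)
    counts = [0] * n_papers
    cited_log = []  # one entry per citation given so far
    for _ in range(n_citations):
        if cited_log and rng.random() < rho:
            # Drawing from the log selects a paper proportionally to its count.
            target = rng.choice(cited_log)
        else:
            target = rng.randrange(n_papers)
        counts[target] += 1
        cited_log.append(target)
    return counts


mostly_accidental = simulate_citations(n_papers=100, n_citations=2000, rho=0.1)
mostly_preferential = simulate_citations(n_papers=100, n_citations=2000, rho=0.9)
```

Comparing the two runs, for example by sorting the counts, gives a feel for how a larger preferential share can concentrate citations on fewer papers, the kind of inequality the abstract associates with higher-ranked journals.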
Computer science has experienced dramatic growth and diversification over the last twenty years. Towards a current understanding of the structure of this discipline, we analyze a large sample of the computer science literature from the DBLP database. To gain insight into the features of this cohort and the relationships among its components, we have constructed article-level clusters based on either direct citations or co-citations, and reconciled them with major and minor subject categories in the All Science Journal Classification. We describe complementary insights from clustering by direct citation and co-citation, and both point to the increase in computer science publications and their scope. Our analysis reveals cross-category clusters, some of which interact with external fields, such as the biological sciences, while others remain inward looking. Overall, we document an increase in computer science publications and their scope.