Abstract
Background
The academic publishing world is changing significantly, with ever-growing numbers of publications each year and shifting publishing patterns. However, the metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades. Moreover, recent studies indicate that these metrics have become targets and follow Goodhart’s Law, according to which, “when a measure becomes a target, it ceases to be a good measure.”
Results
In this study, we analyzed >120 million papers to examine how the academic publishing world has evolved over the last century, with a deeper look into the specific field of biology. Our study shows that the validity of citation-based measures is being compromised and their usefulness is lessening. In particular, the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers. Citation-based metrics, such as citation number and h-index, are likewise affected by the flood of papers, self-citations, and lengthy reference lists. Measures such as a journal’s impact factor have also ceased to be good metrics due to the soaring numbers of papers that are published in top journals, particularly from the same pool of authors. Moreover, by analyzing properties of >2,600 research fields, we observed that citation-based metrics are not beneficial for comparing researchers in different fields, or even in the same department.
Conclusions
Academic publishing has changed considerably; now we need to reconsider how we measure success.
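The h-index discussed in the abstract above has a simple definition: the largest h such that an author has h papers with at least h citations each. A minimal sketch of the computation (the function name and sample citation counts are illustrative, not from the study):

```python
def h_index(citations):
    """h-index: the largest h such that the author has h papers
    with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each
```

Note how self-citations and lengthy reference lists inflate the inputs to this computation without changing its definition, which is the abstract's point about the metric becoming a target.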
Abstract
Background
COVID-19 is the most rapidly expanding coronavirus outbreak in the past 2 decades. To provide a swift response to a novel outbreak, prior knowledge from similar outbreaks is essential.
Results
Here, we study the volume of research conducted on previous coronavirus outbreaks, specifically SARS and MERS, relative to other infectious diseases by analyzing >35 million articles from the past 20 years. Our results demonstrate that previous coronavirus outbreaks have been understudied compared with other viruses. We also show that the research volume of emerging infectious diseases is very high after an outbreak and decreases drastically upon the containment of the disease. This can yield inadequate research and limited investment in gaining a full understanding of novel coronavirus management and prevention.
Conclusions
Independent of the outcome of the current COVID-19 outbreak, we believe that measures should be taken to encourage sustained research in the field.
Open information about government organizations should interest all citizens who care about their governments’ functionality. Large-scale open governmental data open new opportunities for citizens and researchers to monitor their government’s activities and improve its transparency. Over the years, various projects and systems have processed and analyzed governmental data based on open government information. Here, we present the Collecting and Analyzing Parliament Data (CAPD) framework. This novel generic open framework enables collecting and analyzing large-scale public governmental data from multiple sources. This study utilized our framework to collect over 64,000 parliament protocols from over 90 committees in three countries and analyzed them to calculate structured features. Next, we utilized anomaly detection and time series analysis to achieve a number of insights into the committees’ activities. This study demonstrates that the CAPD framework can be utilized to effectively identify anomalous meetings, detect dates of events that affect the parliaments’ functionality, and help monitor their activities.
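One simple way to flag anomalous committee activity of the kind described above is a z-score test on a time series of meeting counts. The sketch below is illustrative only (the threshold, function name, and sample counts are assumptions, not the CAPD implementation):

```python
import statistics

def anomalous_months(counts, threshold=2.0):
    """Flag months whose meeting count deviates from the mean
    by more than `threshold` sample standard deviations."""
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)
    return [i for i, c in enumerate(counts)
            if stdev and abs(c - mean) / stdev > threshold]

# Hypothetical monthly meeting counts; month 5 might be a recess or strike.
counts = [30, 28, 31, 29, 30, 2, 30, 29]
print(anomalous_months(counts))  # [5]
```

In practice such a detector would run per committee, so that each committee's baseline activity level is learned separately.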
Abstract
In the last decades, global awareness of the importance of diverse representation has been increasing. The lack of diversity and discrimination toward minorities have not spared the film industry. Here, we examine ethnic bias in the film industry through commercial posters, the industry’s primary advertisement medium for decades. Movie posters are designed to establish the viewer’s initial impression. We developed a novel approach for evaluating ethnic bias in the film industry by analyzing nearly 125,000 posters using state-of-the-art deep learning models. Our analysis shows that while ethnic biases still exist, there is a trend of reduction of bias, as seen by several parameters. Particularly in English-speaking movies, the ethnic distribution of characters on posters from the last couple of years is approaching the actual ethnic composition of the US population. An automatic approach to monitoring ethnic diversity in the film industry, potentially integrated with financial value, may be of significant use for producers and policymakers.
•We constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs.
•Links are most prevalent among vertices that join a network at a similar time.
•The rate that new vertices join a network is a central factor in molding a network’s topology.
•The emergence of network stars (high-degree vertices) is correlated with fast-growing networks.
•A novel flexible network-generation model based on large-scale real-world data is presented.
Trends change rapidly in today’s world, prompting this key question: What is the mechanism behind the emergence of new trends? By representing real-world dynamic systems as complex networks, the emergence of new trends can be symbolized by vertices that “shine.” That is, at a specific time interval in a network’s life, certain vertices become increasingly connected to other vertices. This process creates new high-degree vertices, i.e., network stars. Thus, to study trends, we must look at how networks evolve over time and determine how the stars behave. In our research, we constructed the largest publicly available network evolution dataset to date, which contains 38,000 real-world networks and 2.5 million graphs. Then, we performed the first precise wide-scale analysis of the evolution of networks with various scales. Three primary observations resulted: (a) links are most prevalent among vertices that join a network at a similar time; (b) the rate that new vertices join a network is a central factor in molding a network’s topology; and (c) the emergence of network stars (high-degree vertices) is correlated with fast-growing networks. We applied our learnings to develop a flexible network-generation model based on large-scale, real-world data. This model gives a better understanding of how stars rise and fall within networks, and is applicable to dynamic systems both in nature and society.
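Observation (c) above, that network stars emerge in growing networks, can be illustrated with a toy preferential-attachment growth process. This is a standard textbook model, not the paper's data-driven generator; the parameters below are arbitrary:

```python
import random

def grow_network(n, seed=42):
    """Grow a network one vertex at a time. Each new vertex links to one
    existing vertex chosen with probability proportional to its degree
    (preferential attachment), so early joiners tend to become stars."""
    random.seed(seed)
    degree = {0: 1, 1: 1}  # start from a single edge 0-1
    for v in range(2, n):
        targets = list(degree)
        weights = [degree[t] for t in targets]
        t = random.choices(targets, weights=weights)[0]
        degree[v] = 1
        degree[t] += 1
    return degree

deg = grow_network(1000)
star = max(deg, key=deg.get)
print(f"star vertex {star} has degree {deg[star]}")
```

Even this crude model reproduces the qualitative effect: a few high-degree "stars" emerge, and their rise is tied to how quickly new vertices arrive.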
Online Social Networks: Threats and Solutions Fire, Michael; Goldschmidt, Roy; Elovici, Yuval
IEEE Communications Surveys and Tutorials, 01/2014, Volume 16, Issue 4
Journal Article · Peer reviewed · Open access
Many online social network (OSN) users are unaware of the numerous security risks that exist in these networks, including privacy violations, identity theft, and sexual harassment, just to name a few. According to recent studies, OSN users readily expose personal and private details about themselves, such as relationship status, date of birth, school name, email address, phone number, and even home address. This information, if put into the wrong hands, can be used to harm users both in the virtual world and in the real world. These risks become even more severe when the users are children. In this paper, we present a thorough review of the different security and privacy risks, which threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users. We also offer simple-to-implement recommendations for OSN users, which can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.
Complementing the formal organizational structure of a business are the informal connections among employees. These relationships help identify knowledge hubs, working groups, and shortcuts through the organizational structure. They carry valuable information on how a company functions de facto. In the past, eliciting the informal social networks within an organization was challenging; today they are reflected by friendship relationships in online social networks. In this paper we analyze several commercial organizations by mining data which their employees have exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among employees of targeted organizations. Our results show that it is possible to identify leadership roles within the organization solely by using centrality analysis and machine learning techniques applied to the informal relationship network structure. Valuable non-trivial insights can also be gained by clustering an organization’s social network and gathering publicly available information on the employees within each cluster. Knowledge of the network of informal relationships may be a major asset or might be a significant threat to the underlying organization.
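The centrality analysis mentioned above can be sketched with the simplest such measure, degree centrality, on an informal friendship network. The employee names and edges below are hypothetical, and the paper combines centrality with machine learning rather than using degree alone:

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality: a vertex's number of connections
    divided by the maximum possible (n - 1). High-centrality vertices
    are candidate informal leaders."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    return {v: d / (n - 1) for v, d in deg.items()}

# Hypothetical informal network: "dana" is connected to everyone.
edges = [("dana", "avi"), ("dana", "lee"), ("dana", "noa"), ("avi", "lee")]
cent = degree_centrality(edges)
print(max(cent, key=cent.get))  # dana
```

Richer measures (betweenness, eigenvector centrality) follow the same pattern: compute a per-vertex score from the relationship graph, then rank.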
Searching for a person's name is a common online activity. However, Web search engines provide few accurate results to queries containing names. In contrast to a general word that has only one correct spelling, there are several possible legitimate spellings when a name is provided as a query. Today, most techniques used to suggest diminutives and alternative spellings in online search are based on pattern matching and phonetic encoding; however, they often perform poorly. As a result, there is a need for an effective tool for improved alternative name suggestion for a name provided as a query. In this paper, we propose a novel approach for tackling the problem of alternative name suggestion. Our algorithm, GRAFT, utilizes historical data collected from genealogy websites, along with network algorithms. GRAFT is a general algorithm that suggests alternatives for input names using a graph based on names derived from digitized ancestral family trees. Alternative names are extracted from this graph, which is constructed using generic ordering functions that outperform other algorithms that suggest diminutives and alternative spellings based on a single dimension, a factor that limits their performance. We evaluated GRAFT's performance on three ground truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles and more than 700,000 unique forenames and 500,000 surnames. We compared GRAFT's performance at suggesting alternative names to the performance of 10 other algorithms, including phonetic encoding, string similarity, machine learning, and deep learning algorithms. The results show GRAFT's superiority with regard to both forenames and surnames and demonstrate its use as a tool to improve alternative name suggestion.
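The name-graph idea can be illustrated with one toy ordering function: link spellings that appear for the same individual across duplicate genealogy records. All names and records below are hypothetical, and this is a simplified sketch rather than GRAFT itself, which combines several ordering functions:

```python
from collections import defaultdict

def build_name_graph(profiles):
    """Build an undirected graph over names: two spellings are linked
    if they appear as variants of the same person in some record."""
    graph = defaultdict(set)
    for variants in profiles:
        for a in variants:
            for b in variants:
                if a != b:
                    graph[a].add(b)
    return graph

# Hypothetical records: each list holds spellings of one person's name.
profiles = [["William", "Bill"], ["William", "Will"], ["Bill", "Billy"]]
g = build_name_graph(profiles)
print(sorted(g["William"]))  # ['Bill', 'Will']
```

Alternative-name suggestion then becomes a graph query: return a queried name's neighbors (or nearby vertices), ranked by how strongly they are connected.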
Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT’s performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909.
•MSDT automatically detects code injection via anomaly detection in source code
•MSDT provides ranked anomalies measured by the precision@k metric
•A dataset of Python functions injected with real-world malicious codes is available
•With a given grammar, MSDT can support any programming language
Most software development procedures today rely heavily on open-source code managed in a community-based form, so that anyone can contribute to its maintenance. Bad actors exploit these properties to target developers and their end users with malicious code. To address this challenge, we developed a method for detecting malicious intent in open-source code using deep-learning algorithms, an unsupervised method for discovering malicious codes. We demonstrate the method by detecting real-world malicious codes injected into randomly selected functions.
There are many opportunities for further experimental and computational studies of malware analysis and code security based on this proposed generic method, which lays out innovative principles for dealing with this type of threat.
Modern software development often includes open-source code. Such code is managed by broad communities, so anyone can modify it. Recent years have seen hackers using this process to attack these communities and their users. MSDT, presented in this article, is an algorithm that detects malicious code injections at the source code level. It uses deep-learning and anomaly-detection practices applied to an open-source dataset of 607,461 functions, some of which were injected with several real-world malicious codes.
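The precision@k metric used to evaluate MSDT measures the fraction of the k highest-ranked anomalies that are truly malicious. A minimal sketch, with hypothetical function identifiers standing in for the dataset's real entries:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are truly malicious.
    `ranked` is ordered most-anomalous first; `relevant` is the set
    of ground-truth malicious items."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k

ranked = ["f3", "f7", "f1", "f9", "f2"]   # most anomalous first
relevant = {"f3", "f7", "f9"}             # actually injected functions
print(precision_at_k(ranked, relevant, 3))  # 2/3
```

The metric rewards putting true injections at the top of the ranking, which matters when an analyst can only review the first few flagged functions.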
Nowadays, detecting anomalous communities in networks is an essential task in research, as it helps discover insights into community-structured networks. Most of the existing methods leverage either information regarding attributes of vertices or the topological structure of communities. In this study, we introduce the Co-Membership-based Generic Anomalous Communities Detection Algorithm (referred to as CMMAC), a novel and generic method that utilizes the information of vertices’ co-membership in multiple communities. CMMAC is domain-free and almost unaffected by communities’ sizes and densities. Specifically, we train a classifier to predict the probability of each vertex in a community being a member of the community. We then rank the communities by the aggregated membership probabilities of each community’s vertices. The lowest-ranked communities are considered anomalous. Furthermore, we present an algorithm for generating a community-structured random network that enables the infusion of anomalous communities, to facilitate research in the field. We utilized it to generate, and publish, two datasets composed of thousands of labeled anomaly-infused networks. We experimented extensively on thousands of simulated and real-world networks infused with artificial anomalies. CMMAC outperformed other existing methods in a range of settings. Additionally, we demonstrated that CMMAC can identify abnormal communities in real-world unlabeled networks in different domains, such as Reddit and Wikipedia.