Understanding the sense of discourse relations between segments of text is essential to truly comprehend any natural language text. Several automated approaches have been suggested, but all rely on external resources and linguistic feature engineering, and their processing pipelines are built from substantially different models. In this paper, we introduce a novel system for sense classification of shallow discourse relations (FR system) based on focused recurrent neural networks (RNNs). In contrast to existing systems, the FR system consists of a single end-to-end trainable model for handling all types and senses of discourse relations, requires no feature engineering or external resources, is language-independent, and can be applied at the word and even character levels. At its core is our novel generalization of the focused RNNs layer, the first multi-dimensional RNN-attention mechanism for constructing text/argument embeddings. A filtering/gating RNN enables downstream RNNs to focus on different aspects of the input sequence and project it into several embedding subspaces; the resulting argument embeddings are then used for sense classification. The FR system has been evaluated using the official datasets and methodology of the CoNLL 2016 Shared Task. It falls only slightly behind state-of-the-art performance on English, the most researched and best-supported language, but outperforms the best existing systems by 2.5% overall on the Chinese blind dataset.
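The core idea of the focused RNNs layer — a gating RNN that weights each input timestep per subspace before separate downstream RNNs project the sequence into their own embedding subspaces — can be sketched roughly as follows. This is a minimal plain-Python illustration with toy dimensions and untrained random weights; the function names and the simplified recurrences are our own assumptions, not the authors' implementation.

```python
import math
import random

random.seed(0)

def rnn(inputs, dim, w_scale=0.5):
    """A toy simple-recurrent network: returns the hidden state per timestep.
    Weights are random stand-ins for learned parameters."""
    n_in = len(inputs[0])
    W = [[random.uniform(-w_scale, w_scale) for _ in range(n_in)] for _ in range(dim)]
    U = [[random.uniform(-w_scale, w_scale) for _ in range(dim)] for _ in range(dim)]
    h = [0.0] * dim
    states = []
    for x in inputs:
        h = [math.tanh(sum(W[i][j] * x[j] for j in range(n_in)) +
                       sum(U[i][j] * h[j] for j in range(dim)))
             for i in range(dim)]
        states.append(h)
    return states

def focused_rnns(inputs, n_subspaces=3, dim=4):
    """Gating RNN emits one weight per subspace per timestep; each downstream
    RNN sees the input sequence scaled by its own weight stream."""
    gate_states = rnn(inputs, n_subspaces)
    # squash gate activations into (0, 1) weights
    gates = [[1 / (1 + math.exp(-g)) for g in h] for h in gate_states]
    embeddings = []
    for k in range(n_subspaces):
        weighted = [[g[k] * v for v in x] for g, x in zip(gates, inputs)]
        embeddings.append(rnn(weighted, dim)[-1])  # last state = subspace embedding
    return [v for emb in embeddings for v in emb]  # concatenated argument embedding

seq = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]  # toy 3-step, 2-dim input sequence
emb = focused_rnns(seq)
```

In the full system both discourse arguments are embedded this way and the concatenated embeddings feed a sense classifier.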
In the past few years, the storage and analysis of large-scale and fast-evolving networks have presented a great challenge, and a number of different techniques have therefore been proposed for sampling large networks. Studies on network sampling primarily analyze how network properties change under sampling. In general, network exploration techniques approximate the original networks more accurately than random node and link selection, yet link selection with an additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction also to random walk and forest-fire sampling and evaluate its effects on sampling accuracy. We analyze different real-world networks and the changes of their properties introduced by sampling. The results reveal that techniques with subgraph induction improve upon those without induction and create denser sample networks with a larger average degree. Furthermore, sampling accuracy decreases consistently across various sampling techniques as the sampled networks become smaller. Based on the results of the comparison, we introduce a scheme for selecting the most appropriate network sampling technique. Overall, breadth-first exploration sampling proves to be the best-performing technique.
• Sampling techniques are compared based on the match of properties between networks.
• We apply a subgraph induction step to random walk and forest-fire sampling.
• Induction improves the performance in preserving degree and clustering distributions.
• Techniques with induction create denser networks with a larger average degree.
• We introduce a scheme for selecting the most appropriate sampling technique.
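The subgraph induction step applied to random-walk sampling works as follows: after the walk selects a set of nodes, all edges of the original network among those nodes are included in the sample. A minimal plain-Python sketch on an adjacency-dict graph; the function names, restart probability and toy network are illustrative, not taken from the paper's code.

```python
import random

def random_walk_nodes(adj, size, seed=0):
    """Select `size` distinct nodes via a simple random walk with restarts."""
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    node, sampled = start, {start}
    while len(sampled) < size:
        if rng.random() < 0.15 or not adj[node]:  # occasional restart
            node = start
        else:
            node = rng.choice(sorted(adj[node]))
        sampled.add(node)
    return sampled

def induced_subgraph(adj, nodes):
    """Subgraph induction: keep every original edge between sampled nodes."""
    return {u: {v for v in adj[u] if v in nodes} for u in nodes}

# toy undirected network as adjacency sets
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
sample = induced_subgraph(adj, random_walk_nodes(adj, size=4))
```

The induction step is what makes the sample denser: a plain random walk keeps only traversed edges, while induction recovers every original edge among the sampled nodes.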
Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, only informal notions of their reliability exist. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases, including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of the DBLP Computer Science Bibliography differs substantially from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of the DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies.
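The comparison over multiple graph statistics rests on average ranks, the quantity underlying a critical difference diagram: each statistic ranks the databases by consistency, and the ranks are then averaged per database. A minimal sketch with made-up scores — the database names are real, but the numbers are purely illustrative, and ties (which a full treatment would assign averaged ranks) are not handled:

```python
def average_ranks(scores):
    """scores: {statistic: {database: consistency score (higher = better)}}.
    Returns the mean rank of each database across statistics (1 = best)."""
    ranks = {db: [] for db in next(iter(scores.values()))}
    for per_db in scores.values():
        ordered = sorted(per_db, key=per_db.get, reverse=True)
        for r, db in enumerate(ordered, start=1):
            ranks[db].append(r)
    return {db: sum(rs) / len(rs) for db, rs in ranks.items()}

# purely illustrative consistency scores, not measured values
scores = {
    "clustering": {"WoS": 0.9, "DBLP": 0.4, "arXiv": 0.7},
    "degree":     {"WoS": 0.8, "DBLP": 0.5, "arXiv": 0.6},
    "bow-tie":    {"WoS": 0.7, "DBLP": 0.3, "arXiv": 0.8},
}
avg = average_ranks(scores)
```

A critical difference diagram then plots these average ranks on a line and connects databases whose rank difference is not statistically significant.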
Coreference resolution is one of the three key tasks of information extraction from text, alongside named entity recognition and relation extraction. Its goal is to group all entity mentions across a document into clusters, each cluster representing a single entity. Methods for this task have long been developed for some widely spoken languages, while none have yet been built for Slovene. In this paper we present a new, manually annotated corpus for coreference resolution in Slovene – the coref149 corpus. For automatic coreference resolution we adapted the SkipCor system, which we originally built for English. On the Slovene material, SkipCor achieved a CoNLL 2012 score of 76%. We further analyzed the influence of individual feature types and examined the most common errors. We also developed a software library with a web interface through which all the described analyses can be run and their performance compared directly. The results are promising and comparable to those for other, more widespread languages. This demonstrates that automatic coreference resolution in Slovene can be successful; in the future, a larger and higher-quality corpus should be built in which all the peculiarities of Slovene are covered by coreference annotation, enabling the development of effective methods for automatic coreference resolution.
Coreference resolution tries to identify all expressions (called mentions) in observed text that refer to the same entity. Besides entity extraction and relation extraction, it represents one of the three complementary tasks in information extraction. In this paper we describe a novel coreference resolution system, SkipCor, that reformulates the problem as a sequence labeling task. None of the existing supervised, unsupervised, pairwise or sequence-based models are similar to our approach, which uses only linear-chain conditional random fields and supports high scalability with fast model training and inference, and straightforward parallelization. We evaluate the proposed system against the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state of the art in coreference resolution.
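The reformulation as sequence labeling can be understood through skip-mention sequences: for each skip distance k, every k-th mention of the document forms one sequence, so a linear-chain model only ever has to score adjacent elements, even for mentions that are far apart in the text. A rough plain-Python sketch of this decomposition (the representation only, not the CRF itself; the function name and toy mentions are ours):

```python
def skip_mention_sequences(mentions, max_skip=3):
    """For each skip distance k and offset, form the sequence of every k-th
    mention; a linear-chain labeler can then tag consecutive pairs in each
    sequence as coreferent or not.  Returns {k: [sequences]}."""
    sequences = {}
    for k in range(1, max_skip + 1):
        sequences[k] = [mentions[offset::k] for offset in range(k)]
    return sequences

# toy document mentions in textual order
mentions = ["John", "he", "the car", "his", "it"]
seqs = skip_mention_sequences(mentions)
```

For k = 2, for example, "John" and "the car" become adjacent in one sequence and "he" and "his" in the other, so a linear-chain model can link mentions separated by intervening material.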
Science is a social process with far-reaching impact on our modern society. In recent years, for the first time, we are able to scientifically study science itself. This is enabled by the massive amounts of data on scientific publications that are increasingly becoming available. The data are contained in several databases, such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders such study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We find that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in the mutual consistency of different databases, which we interpret as recipes for future bibliometric studies.
Trust is a crucial aspect when cyber-physical systems have to rely on resources and services owned by various entities, as in Edge, Fog and Cloud computing. The DECENTER Fog Computing Platform is developed to support Big Data pipelines that start from the Internet of Things (IoT), such as cameras providing video streams for subsequent analysis. It is used to implement Artificial Intelligence (AI) algorithms across the Edge-Fog-Cloud computing continuum, which brings benefits to applications, including high Quality of Service (QoS), improved privacy and security, and lower operational costs. In this article, we present a trust management architecture for DECENTER that relies on blockchain-based Smart Contracts (SCs) and specifically designed trustless Smart Oracles. The architecture is implemented on the Ethereum ledger (testnet), and three trust management scenarios (trust management for cameras, trusted data flow, and QoS-based computing node selection) are used to illustrate the benefits of establishing trust relationships among entities, services and stakeholders of the platform.
Label propagation has proven to be a fast method for detecting communities in large complex networks. Recent developments have also improved the accuracy of the approach; however, a general algorithm is still an open issue. We present an advanced label propagation algorithm that combines two unique strategies of community formation, namely defensive preservation and offensive expansion of communities. The two strategies are combined in a hierarchical manner to recursively extract the core of the network and to identify whisker communities. The algorithm was evaluated on two classes of benchmark networks with planted partition and on 23 real-world networks, ranging from networks with tens of nodes to networks with several tens of millions of edges. It is shown to be comparable to the current state-of-the-art community detection algorithms and superior to all previous label propagation algorithms, with comparable time complexity. In particular, analysis of real-world networks shows that the algorithm has almost linear complexity, O(m¹·¹⁹), and scales even better than the basic label propagation algorithm (m is the number of edges in the network).
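The basic label propagation step that the algorithm builds on — each node repeatedly adopts the label most common among its neighbours — can be sketched as follows. This is a plain-Python sketch on an adjacency dict with a deterministic update order and tie-breaking for reproducibility; real label propagation randomizes both, and the paper's defensive preservation and offensive expansion strategies are not reproduced here.

```python
from collections import Counter

def label_propagation(adj, max_iter=100):
    """Basic label propagation: every node starts in its own community and
    repeatedly adopts the most frequent label among its neighbours, keeping
    its current label when it is already among the most frequent."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):            # deterministic order for this sketch
            if not adj[node]:
                continue
            counts = Counter(labels[v] for v in adj[node])
            best = max(counts.values())
            best_labels = [l for l, c in counts.items() if c == best]
            if labels[node] not in best_labels:
                labels[node] = max(best_labels)  # deterministic tie-break
                changed = True
        if not changed:                     # converged: no label changed
            break
    return labels

# two triangles joined by a single bridge edge: two clear communities
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj)
```

Each sweep is linear in the number of edges, which is the source of the near-linear scaling reported in the abstract.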
Due to notable discoveries in the fast-evolving field of complex networks, recent research in software engineering has also focused on representing software systems with networks. Previous work has observed that these networks follow scale-free degree distributions and reveal small-world phenomena, while here we explore another property commonly found in different complex networks, namely community structure. We adopt class dependency networks, where nodes represent software classes and edges represent dependencies among them, and show that these networks reveal a significant community structure, characterized by properties similar to those observed in other complex networks. However, although intuitive and anticipated by different phenomena, the identified communities do not exactly correspond to software packages. We empirically confirm our observations on several networks constructed from Java and various third-party libraries, and propose different applications of community detection to software engineering.
► Class dependency software networks reveal significant community structure.
► Communities have similar properties as observed in other complex networks.
► Though anticipated by various phenomena, they do not correspond to software packages.
► Community detection can be employed to obtain highly modular software packages.
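A class dependency network of the kind studied here can be built by extracting class-to-class references from source code. The toy plain-Python sketch below uses a crude name-mention heuristic on invented Java-like snippets; real dependency extraction (type resolution, inheritance, method calls) is far more involved, and all names here are illustrative.

```python
import re

def class_dependency_network(sources):
    """sources: {class name: source text}.  Adds an edge A -> B whenever
    class A's source mentions class B by name (a crude heuristic)."""
    classes = set(sources)
    return {
        name: {other for other in classes - {name}
               if re.search(r"\b%s\b" % re.escape(other), text)}
        for name, text in sources.items()
    }

# toy Java-like sources; class names are invented for illustration
sources = {
    "Parser":  "import Lexer; class Parser { Lexer lex; }",
    "Lexer":   "class Lexer { }",
    "Printer": "import Parser; class Printer { Parser p; }",
}
deps = class_dependency_network(sources)
```

Community detection is then run on the resulting directed graph, and the found communities can be compared against the declared package structure.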
The rapid growth of social media, news sites and blogs increases the opportunity to express and share opinions on the Internet, and researchers from different fields take advantage of this nearly limitless data. Thus, in the past decade, opinion mining or sentiment analysis has become an important research discipline. In this paper, we focus on target-level sentiment analysis, wherein the task is to predict the sentiment concerning specific (possibly multiple) entities that appear as coreference mentions throughout a document. We created a new annotated dataset of Slovene news articles, additionally annotated with the named entities and coreferences that form the basis of the proposed task. Using an entity-document representation, we compared the task with traditional sentiment analysis, evaluating traditional machine learning and deep neural network approaches. Judging by existing approaches, the proposed task represents a challenging problem. The results show that the best performance is achieved with a customised BERT adapter (a minor improvement over a standard text-classification adapter). We outperformed existing aspect-based state-of-the-art approaches by 13%, reaching up to 77% accuracy and a 73% F1 score.
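An entity-document representation of the kind described above can be sketched as marking every coreference mention of the target entity with special tokens, so that one classification input is produced per target entity and an ordinary document classifier can be applied. A plain-Python sketch; the marker tokens, function name and toy sentence are illustrative assumptions, not the paper's exact scheme.

```python
def entity_document(tokens, mention_spans):
    """Wrap every coreference mention of the target entity in marker tokens,
    producing one classification input per target entity.
    mention_spans: (start, end) token indices, inclusive."""
    starts = {s for s, _ in mention_spans}
    ends = {e for _, e in mention_spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append("[ENT]")
        out.append(tok)
        if i in ends:
            out.append("[/ENT]")
    return " ".join(out)

tokens = ["Ana", "praised", "the", "mayor", ";", "she", "was", "sincere"]
# coreference mentions of the target entity (here "Ana" and "she")
doc = entity_document(tokens, [(0, 0), (5, 5)])
```

Marking the pronoun "she" as well as the proper name is what distinguishes this coreference-aware setup from aspect-based sentiment analysis, which typically sees only the literal aspect term.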