Even in highly developed countries, as many as 15–30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones, while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle to progress in the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. In this study, we present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese, and provide details about the data selection and annotation procedures, to enable the compilation of comparable datasets in other languages and domains. As the first multilingual lexical simplification dataset in which instances in all three languages were selected and annotated using comparable procedures, it offers a direct comparison of lexical simplification systems across the three languages. To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs. non-neural) to all three languages and evaluate their performance on our new dataset. For a fairer comparison, we use several evaluation measures that capture varied aspects of the systems' efficacy, and discuss their strengths and weaknesses.
We find that a state-of-the-art neural lexical simplification system outperforms a state-of-the-art non-neural lexical simplification system in all three languages, according to all evaluation measures. More importantly, we find that the state-of-the-art neural lexical simplification system performs significantly better for English than for Spanish and Portuguese, raising the question of whether such an architecture can be used for successful lexical simplification in other languages, especially low-resourced ones.
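The task described in this abstract (replacing a complex word with a simpler one while preserving meaning) can be illustrated with a minimal sketch. It is not either of the systems evaluated in the study: the synonym table, the frequency counts, and the function names `rank_candidates` and `simplify` are all hypothetical, and word frequency is used here only as a simple stand-in for a simplicity ranking.

```python
from typing import Dict, List

# Hypothetical toy resources, invented for illustration only.
SYNONYMS: Dict[str, List[str]] = {
    "comprehend": ["understand", "grasp"],
    "purchase": ["buy", "acquire"],
}
FREQUENCY: Dict[str, int] = {
    "understand": 900, "grasp": 120, "comprehend": 60,
    "buy": 950, "acquire": 80, "purchase": 200,
}


def rank_candidates(word: str) -> List[str]:
    """Return substitution candidates for `word`, most frequent first."""
    candidates = SYNONYMS.get(word, [])
    return sorted(candidates, key=lambda c: FREQUENCY.get(c, 0), reverse=True)


def simplify(sentence: str) -> str:
    """Replace each token that has a more frequent synonym; keep the rest.

    Tokenization is a plain whitespace split, so punctuation and
    capitalization are not handled in this sketch.
    """
    output = []
    for token in sentence.split():
        key = token.lower()
        ranked = rank_candidates(key)
        if ranked and FREQUENCY.get(ranked[0], 0) > FREQUENCY.get(key, 0):
            output.append(ranked[0])
        else:
            output.append(token)
    return " ".join(output)


if __name__ == "__main__":
    print(simplify("patients must comprehend the form"))
```

A real system would also have to check that the substitute fits the context and preserves the meaning of the sentence, which this sketch does not attempt.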
We present work aimed at facilitating the comprehensibility of health-related English-Spanish parallel texts by means of the semantic annotation of biomedical concepts and the automatic expansion of their definitions. In order to overcome the limitations posed by the scarcity of resources available for Spanish, we propose to exploit existing tools targeted at English and then transfer the produced annotations. The evaluations performed show the feasibility of this approach. An enriched set of texts is made available, which can be retrieved, visualized and downloaded through a web interface.
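One common way to transfer annotations between parallel sentences is to project annotated token spans through a word alignment. The abstract does not state how the transfer is actually performed, so the sketch below is an assumption for illustration: the function `project_span`, the example sentences, and the alignment pairs are all hypothetical.

```python
from typing import List, Optional, Tuple

# Each pair is (english_token_index, spanish_token_index).
Alignment = List[Tuple[int, int]]


def project_span(span: Tuple[int, int], alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Project an inclusive English token span onto the Spanish sentence.

    Returns the smallest Spanish span covering every token aligned to the
    English span, or None when no token in the span is aligned.
    """
    start, end = span
    targets = [es for en, es in alignment if start <= en <= end]
    if not targets:
        return None
    return (min(targets), max(targets))


if __name__ == "__main__":
    english = ["the", "patient", "has", "acute", "renal", "failure"]
    spanish = ["el", "paciente", "tiene", "insuficiencia", "renal", "aguda"]
    # Hypothetical alignment; in practice it would come from an aligner.
    alignment = [(0, 0), (1, 1), (2, 2), (3, 5), (4, 4), (5, 3)]

    # Concept annotated on English tokens 3..5 ("acute renal failure").
    projected = project_span((3, 5), alignment)
    if projected is not None:
        first, last = projected
        print(" ".join(spanish[first:last + 1]))
```

Taking the minimal covering span keeps the projected annotation contiguous even when word order differs between the two languages, at the cost of occasionally including unaligned tokens that fall inside the span.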
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Related work sections or literature reviews are an essential part of every scientific article, being crucial for paper reviewing and assessment. However, writing a good related work section is an activity which requires considerable expertise to identify, condense/summarize, and combine relevant information from different sources. In this work we compare different automatic methods to produce “descriptive” related work sections, given as input the set of papers which need to be described. The main contribution of our work is a neural sequence learning process which produces citation sentences to be included in a related work section of an article. We train the neural architecture using an available scientific data set of citation sentences and we test over a data set of related work sections; we also compare the performance to a set of baseline extractive summarizers, an abstractive summarizer, and a state-of-the-art CNN approach. Our results indicate that our approach outperforms the simple as well as the informed baselines.
Making It Simplext. Saggion, Horacio; Štajner, Sanja; Bott, Stefan ...
ACM Transactions on Accessible Computing, 06/2015, Volume 6, Issue 4. Journal article, peer reviewed.
The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.
ABSTRACT Seven wild accessions of Coffea arabica from Ethiopia, prospected by the FAO Coffee Mission 1964-1965, were evaluated for resistance to 18 Brazilian strains and two Kenyan strains of Pseudomonas syringae pv. garcae and four P. syringae pv. tabaci strains, causal agents of bacterial halo blight and bacterial leaf spot, respectively. The C. arabica cultivars IPR 102, resistant to the diseases, and Mundo Novo IAC 376-4, susceptible, were used as experimental controls. Our results indicated that the Ethiopian accessions presented high levels of resistance to all Brazilian strains of P. syringae pv. garcae but were susceptible to infection by the Kenyan strains, which caused different levels of severity in the wild accessions and the experimental controls. The Ethiopian accessions were also considered resistant to the four P. syringae pv. tabaci strains, with low susceptibility (one point on the severity scale) observed in accession E-268 in response to one strain of the bacterium.
RESUMO Sete acessos selvagens de Coffea arabica da Etiópia prospectados pela FAO Coffee Mission 1964-1965 foram investigados quanto à resistência a 18 linhagens brasileiras e duas linhagens quenianas de Pseudomonas syringae pv. garcae e quatro P. syringae pv. tabaci, agentes etiológicos da mancha-aureolada e da mancha-foliar-bacterina, respectivamente. As cultivares de C. arabica IPR 102, resistente às doenças, e Mundo Novo IAC 376-4, suscetível, foram utilizadas como controle experimental. Nossos resultados indicaram que os acessos etíopes apresentaram altos níveis de resistência a todas as linhagens brasileiras de P. syringae pv. garcae avaliados, mas suscetíveis à infecção causada por linhagens quenianas, que causa diferentes níveis de severidade em acessos selvagens e nos controles experimentais. Os acessos etíopes também foram considerados resistentes às quatro linhagens de P. syringae pv. tabaci, tendo sido observada baixa suscetibilidade, um ponto na escala de severidade, no acesso E-268 em resposta a uma linhagem da bactéria.
Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pretrained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation of our approach on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively when compared with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Our model also obtains performance gains for Spanish and Portuguese.
Abstract
Motivation
Biomedical literature is one of the most relevant sources of information for knowledge mining in the field of Bioinformatics. Although English is the most widely addressed language in the field, in recent years there has been a growing interest from the natural language processing community in dealing with languages other than English. However, the availability of language resources and tools for the appropriate treatment of non-English texts is lagging behind. Our research is concerned with the semantic annotation of biomedical texts in the Spanish language, which can be considered an under-resourced language where biomedical text processing is concerned.
Results
We have carried out experiments to assess the effectiveness of several methods for the automatic annotation of biomedical texts in Spanish. One approach is based on the linguistic analysis of Spanish texts and their annotation using an information retrieval and concept disambiguation approach. A second method takes advantage of a Spanish–English machine translation process to annotate English documents and transfer annotations back to Spanish. A third method takes advantage of the combination of both procedures. Our evaluation shows that a combined system has competitive advantages over the two individual procedures.
Availability and implementation
UMLSMapper (https://snlt.vicomtech.org/umlsmapper) and the annotation transfer tool (http://scientmin.taln.upf.edu/anntransfer/) are freely available for research purposes as web services and/or demos.
Supplementary information
Supplementary data are available at Bioinformatics online.