This paper examines resultative constructions from a computational, corpus-based perspective. We claim that the array of expressions (traditionally classed as idioms, collocations, free word combinations, etc.) that are used to convey a person’s change of mental state (typically negative) are basically instances of the same resultative construction. The first part of the study will introduce basic tenets of Construction Grammar and resultatives. Then, our corpus-based methodology will be spelled out, including a description of the two giga-token corpora used and a detailed account of our protocolised heuristic strategies and tasks. Distributional analysis of matrix slot fillers will be presented next, together with a discussion of restrictions, novel instances, and productivity. A final section will round off our study, with special attention to notions like “idiomaticity”, “productivity” and “variability” of the pairings of form and meaning analysed. To the best of our knowledge, this is one of the first studies based on giga-token corpora that explores idioms as integral parts of higher-order resultative constructions.
• The notion of constructional idiom as a powerful means to account for idiomaticity and cross-linguistic comparison.
• Analyses based on giga-token web-crawled corpora are needed due to the low frequency of idioms.
• English ‘insanity’ constructions are more schematic and more prone to creative extensions.
• Spanish ‘insanity’ constructions tend to be more substantive and less productive.
• Translated constructions gravitate towards a neutral standard and reflect features of translationese.
This paper presents a corpus-based study of constructions in English and Spanish, with a special emphasis on equivalent semantic-functional counterparts, and potential mismatches. Although usage/corpus-based Construction Grammar (CxG) has attracted much attention in recent years, most studies have dealt exclusively with monolingual constructions. In this paper we will focus on two constructions that represent conventional ways to express ‘insanity’ in both languages. The analysis will cover grammatical, semantic and informative aspects in order to establish a multi-linguistic prototype of the constructions. To that end, data from several giga-token corpora of contemporary spoken English and Spanish (parallel and comparable) have been selected. This study advances the explanatory potential of constructional idioms for the study of idiomaticity, variability and cross-language analysis. In addition, relevant findings on the dialectal distribution of certain idiom features across both languages and their national varieties are also reported.
In recent years, an increasing number of studies have dealt with the computational treatment of multiword expressions: identification, extraction, translation, and the role they play in Natural Language Processing applications. This book aims to address the need for better understanding in this comparatively new field of Computational Phraseology.
The correct interpretation of Multiword Units is crucial to many applications in Natural Language Processing. This volume illustrates a variety of topics that address this challenge, such as rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.
Given the increase in production of data for the biomedical field and the unstoppable growth of the internet, the need for Information Extraction (IE) techniques has skyrocketed. Named Entity Recognition (NER) is one such IE task, useful for professionals in different areas. There are several settings where biomedical NER is needed, for instance, extraction and analysis of biomedical literature, relation extraction, organisation of biomedical documents, and knowledge-base completion. However, the computational treatment of entities in the biomedical domain has faced a number of challenges, including its high cost of annotation, ambiguity, and lack of biomedical NER datasets in languages other than English. These difficulties have hampered data development, affecting both the domain itself and its multilingual coverage. The purpose of this study is to overcome the scarcity of biomedical data for NER in Spanish, for which only two datasets exist, by developing a robust bilingual NER model. Inspired by back-translation, this paper leverages the progress in Neural Machine Translation (NMT) to create a synthetic version of the Colorado Richly Annotated Full-Text (CRAFT) dataset in Spanish. Additionally, a new augmented CRAFT dataset is constructed by replacing 20% of the entities in the original dataset. We evaluate two training methods, concatenation of datasets and continuous training, to assess the transfer learning capabilities of transformers using the newly obtained datasets. The best performing NER system on the development set achieved an F-1 score of 86.39%. The novel methodology proposed in this paper presents the first bilingual NER system and has the potential to improve applications across under-resourced languages.
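The entity-replacement step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes BIO-tagged token sequences, builds a per-type mention lexicon from the dataset itself, and swaps each mention with probability `rate` (so roughly, not exactly, 20% of entities are replaced). The function names `extract_spans`, `build_lexicon` and `augment`, the toy sentences, and the `GENE` label are all hypothetical.

```python
import random
from collections import defaultdict


def extract_spans(tags):
    """Return (start, end, type) for each BIO entity span; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def build_lexicon(dataset):
    """Collect every mention seen in the dataset, grouped by entity type."""
    lexicon = defaultdict(set)
    for tokens, tags in dataset:
        for s, e, t in extract_spans(tags):
            lexicon[t].add(tuple(tokens[s:e]))
    return {t: sorted(mentions) for t, mentions in lexicon.items()}


def augment(tokens, tags, lexicon, rate=0.2, rng=None):
    """Swap each mention for another of the same type with probability `rate`."""
    if rng is None:
        rng = random.Random(0)  # assumption: fixed seed for reproducibility
    new_tokens, new_tags, cursor = [], [], 0
    for s, e, t in extract_spans(tags):
        # Copy the non-entity tokens that precede this span unchanged.
        new_tokens += tokens[cursor:s]
        new_tags += tags[cursor:s]
        mention = tuple(tokens[s:e])
        candidates = [m for m in lexicon.get(t, []) if m != mention]
        if candidates and rng.random() < rate:
            mention = rng.choice(candidates)
        # Re-emit BIO tags so they match the (possibly longer) new mention.
        new_tokens += list(mention)
        new_tags += [f"B-{t}"] + [f"I-{t}"] * (len(mention) - 1)
        cursor = e
    new_tokens += tokens[cursor:]
    new_tags += tags[cursor:]
    return new_tokens, new_tags


# Hypothetical toy data, not taken from CRAFT.
dataset = [
    (["BRCA1", "binds", "p53"], ["B-GENE", "O", "B-GENE"]),
    (["tumor", "necrosis", "factor", "rises"], ["B-GENE", "I-GENE", "I-GENE", "O"]),
]
lexicon = build_lexicon(dataset)
augmented = [augment(toks, tags, lexicon, rate=0.2) for toks, tags in dataset]
# "Concatenation of datasets" then amounts to training on the union.
combined = dataset + augmented
```

Under these assumptions, the concatenation setting corresponds to training once on `combined`, whereas continuous training would fine-tune on one dataset and then continue on the next.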
The aim of this article is to propose a classification of the features present, to a greater or lesser extent, in postcolonial literature in any language. Although this taxonomy takes as its starting point earlier theoretical definitions of the key concepts related to postcolonial literature (Edwards 2008, Nayar 2008 and Ramone 2011), it appears to be the first formal classification drawn up on the subject. The article thus analyses established concepts while introducing the new notion of literary genre plasticity and exploring current trends in intersectionality research. As a result, we provide a decalogue of features of postcolonial literature that will support literary criticism and comparative literature studies.
This article compares the output of three neural machine translation systems (Google Translate, DeepL, and Phrase TMS) and human translation (undergraduate level students, English into Spanish). It focuses on five formal neologisms extracted from literary texts, and thus considers creativity as well as technology adoption and training.
Trends in E-Tools and Resources for Translators and Interpreters offers a collection of contributions from key players in the field of translation and interpreting that accurately outline some of the most cutting-edge technologies in this field.