Finite-State Text Processing
De Santo, Aniello
Computational Linguistics, 03/2023, Volume 49, Issue 1
Journal Article, Book Review
Peer reviewed
Open access
Finite-State Text Processing by Kyle Gorman and Richard Sproat (Graduate Center, City University of New York & Google LLC). Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 50), 2021, xvii+140 pp; paperback, ISBN: 9781636391137; ebook, ISBN: 9781636391144; hardcover, ISBN: 9781636391151, doi: 10.2200/S01086ED1V01Y202104HLT050.
Text length is a major concern in the measurement of lexical richness, and how lexical richness is affected by text length remains an open question. The present study aims to explore the relation between text length and lexical richness from an entropy-based perspective. Results show a non-linear growth pattern of lexical richness with increasing text length. Specifically, lexical richness increases rapidly in shorter texts. It soon reaches a boundary point from which it stabilizes despite the continued expansion of text length. The boundary point of lexical richness under the Shannon estimation is around 1000 tokens; under the Zhang estimation it is lower and more varied, including 500, 800, and 1000 tokens. Such stability may be explained by the stabilization of word probability in the text.
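The entropy-based measurement described in this abstract can be illustrated with a minimal sketch. It assumes the plain plug-in (maximum-likelihood) Shannon entropy over word frequencies, computed on prefixes of growing length; the abstract does not give the study's exact procedure, and the Zhang estimation is not reproduced here. The function names `shannon_entropy` and `entropy_by_length` and the `step` parameter are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Plug-in Shannon entropy (in bits) of the word distribution in `tokens`.

    Assumption: relative frequencies are used directly as probabilities.
    """
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_length(tokens, step):
    """Return (prefix_length, entropy) pairs for prefixes of growing length.

    Prefix lengths are step, 2*step, ... up to len(tokens).
    """
    return [
        (n, shannon_entropy(tokens[:n]))
        for n in range(step, len(tokens) + 1, step)
    ]
```

Plotting the pairs returned by `entropy_by_length` for a long tokenized text would show whether the curve flattens beyond some prefix length, which is the kind of boundary point the abstract reports.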
Text comprehension is a dynamic process in which readers enter into a dialog with the text and interact with it: one’s own knowledge is activated and the text is interpreted in its cultural and social context. The study of this process requires an interdisciplinary approach, for which this volume provides theoretical considerations, text-type-specific models of analysis, and recommendations.
This paper aims to observe the language of university students writing a summary in the writing laboratory. The object of the observation (in the formative perspective of interlingua) is the ability to use a structured and hierarchical language (“propositional”) rendering the textual macrostructure previously analyzed in the laboratory.
Eugenio Coseriu’s introductory lecture on text linguistics, edited and revised by the author of these lines, presents two forms of text linguistics: text linguistics in the narrower sense, i.e. “transphrastic grammar” tied to a specific language, and text linguistics in the broader sense, the “linguistics of meaning”. This latter form of text linguistics examines how the signs of the text of any language give meaning to the text as a whole not only through what they denote but also through what they evoke. This article attempts to bring Coseriu’s remarks on text linguistics into a systematic context with the problem of translation. In doing so, it also examines – for the time being only in a rudimentary way – what corpus-based machine translation systems can achieve in this area and where they reach their limits.