Hidden bias in empirical textualism Jennejohn, Matthew; Nelson, Samuel; Nunez, D. Carolina
The Georgetown Law Journal, 03/2021, Volume 109, Issue 4
Journal Article
Peer-reviewed
A new interpretive technique called "corpus linguistics" has exploded in use over the past five years from state supreme courts and federal courts of appeals to the U.S. Supreme Court. Corpus linguistics involves searching a large database, or corpus, of text to identify patterns in the way in which a certain term is used in context. Proponents of the method argue that it is a more "empirical" approach than referencing dictionaries to determine a word's public meaning, which is a touchstone in originalist approaches to legal interpretation.
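The basic search step can be illustrated with a keyword-in-context (KWIC) concordance, which lists every occurrence of a term together with a few words on either side so that a reader can look for patterns of use. This is a minimal sketch: the function name `kwic`, the three-token window, the crude tokenizer, and the toy corpus are illustrative assumptions, not the interface of any real corpus.

```python
import re

def kwic(text: str, term: str, width: int = 3) -> list[str]:
    """Return keyword-in-context lines with `width` tokens on each side of every hit."""
    # Crude tokenizer: runs of letters and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z']+", text)
    target = term.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:  # case-insensitive match on the whole token
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Toy corpus invented for illustration; it is not drawn from COHA.
corpus = ("The officer may carry a firearm. She chose to carry the bill "
          "through the legislature. Ships carry cargo across the sea.")
for line in kwic(corpus, "carry"):
    print(line)
```

Each output line shows one use of "carry" in its immediate context; a corpus-linguistic argument about ordinary meaning rests on reading and counting many such lines.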
This Article identifies an important concern about the use of corpus linguistics in legal interpretation that courts and scholarship have overlooked: bias. Using new machine learning techniques that analyze bias in text, this Article provides empirical evidence that the thousands of documents in the Corpus of Historical American English (COHA), the leading corpus currently used in judicial opinions, reflect gender bias. Courts and scholars have not considered that the COHA is sexist, raising the possibility that corpus linguistics methods could serve as a vehicle for infecting judicial opinions with longstanding prejudices in U.S. society.
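The abstract does not name the machine learning technique, so the following only illustrates one widely used way of measuring bias in word embeddings, the Word Embedding Association Test (WEAT) effect size. Target sets X and Y (say, career and family words) are compared by how much closer each word sits, by cosine similarity, to attribute set A (male words) than to attribute set B (female words). The two-dimensional vectors are hand-made toys rather than embeddings trained on COHA, and the sketch is not presented as the Article's own method.

```python
import math
from statistics import mean, stdev

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, A, B) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def weat_effect_size(X, Y, A, B) -> float:
    """Difference in mean association of target sets X and Y, divided by the
    sample standard deviation of the associations over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)

# Hand-made 2-d toy vectors (hypothetical; not real embeddings).
A = [[1.0, 0.0], [0.9, 0.1]]   # attribute set A, e.g. male terms
B = [[0.0, 1.0], [0.1, 0.9]]   # attribute set B, e.g. female terms
X = [[0.8, 0.2], [0.7, 0.3]]   # target set X, e.g. career terms
Y = [[0.2, 0.8], [0.3, 0.7]]   # target set Y, e.g. family terms
print(round(weat_effect_size(X, Y, A, B), 3))
```

A positive effect size means X is more strongly associated with A (and Y with B) than the reverse; swapping X and Y flips the sign.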
In addition to raising this important new problem, this Article charts a course for dealing with it. It explains how hidden biases can be made transparent and introduces steps for "debiasing" corpora used in legal interpretation. More broadly, it shows how the methods introduced here can be used to study biases in all areas of the law, raising the prospect of a revolution in our understanding of how discriminatory biases affect legal decisionmaking.
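The abstract does not spell out its debiasing steps. One corpus-level technique from the natural language processing literature, counterfactual data augmentation, adds a copy of each sentence with gendered words swapped, so that each listed pair of gendered words co-occurs with the same contexts equally often. The sketch below illustrates that general technique only; the swap list and function names are hypothetical, and it is not the Article's procedure.

```python
import re

# Tiny hypothetical swap list. Real lists are much longer; pronouns such as
# "her" are left out here because "her" can correspond to either "him" or "his".
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "husband": "wife", "wife": "husband",
}

def swap_gender(sentence: str) -> str:
    """Replace each listed gendered word with its counterpart, keeping an initial capital."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word  # not a listed gendered word: leave unchanged
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", repl, sentence)

def augment(sentences: list[str]) -> list[str]:
    """Counterfactual augmentation: keep every sentence and append its swapped twin."""
    return sentences + [swap_gender(s) for s in sentences]

print(augment(["He thanked the woman.", "The wife signed the deed."]))
```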
This title is about the design of a Setswana corpus for lexicography. It explores whether a corpus for lexicography must comprise a variety of texts drawn from different text types, or whether information of sufficient quality for lexicographic purposes can be extracted from a corpus with a single text type.
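One way to make the question concrete is to compare a word's frequency with its range, that is, the number of text types in which it occurs; a corpus built from a single text type cannot supply the second figure. The function name, the data layout (text type mapped to tokenized texts), and the handful of Setswana tokens below are illustrative assumptions, not the study's design.

```python
from collections import Counter, defaultdict

def frequency_and_range(corpus: dict[str, list[list[str]]]) -> dict[str, tuple[int, int]]:
    """Map each word to (total frequency, range), where range is the number of
    text types the word occurs in."""
    freq = Counter()
    text_types = defaultdict(set)
    for text_type, texts in corpus.items():
        for tokens in texts:
            for tok in tokens:
                freq[tok] += 1
                text_types[tok].add(text_type)
    return {w: (freq[w], len(text_types[w])) for w in freq}

# Hypothetical mini-corpus: two text types, each holding tokenized texts.
corpus = {
    "news": [["pula", "molao"], ["molao", "motho"]],
    "fiction": [["motho", "ntlo"], ["pula"]],
}
stats = frequency_and_range(corpus)
print(stats["pula"], stats["molao"])
```

Here `pula` and `molao` both occur twice, but only `pula` occurs in both text types.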
All Families and Genera Moskowich, Isabel; Lareo, Inés; Camiña, Gonzalo
2021, 2021-09-15
eBook
This volume is the fourth of its kind devoted to the analysis of English language use in different scientific disciplines from 1700 to 1900. Forty texts on biology and related fields form the basis for describing scientific discourse, with attention to methodological issues, the period, and the status of the discipline itself.
Input a Word, Analyze the World Almeida, Francisco Alonso; Barrera, Ivalla Ortega; Toledo, Elena Quintana
2016, 2016-01-01
eBook
Input a Word, Analyze the World represents current perspectives on Corpus Linguistics (CL) from a variety of linguistic subdisciplines. Corpus Linguistics has proven itself an excellent methodology for the study of language variation and change, and is well-suited for interdisciplinary collaboration, as shown by the studies in this volume. Its title is inspired by the use of CL to assess language in different registers and with a variety of purposes. This collection contains thirty contributions by scholars in the field from across the globe, dealing with current topics on corpus production and corpus tools; lexical analysis, phraseology and grammar; translation and contrastive linguistics; and language learning. Language specialists will find these papers inspiring, as they present new insights on aspects related to research and teaching.
In recent years, an increasing number of studies have dealt with the computational treatment of multiword expressions: their identification, extraction, and translation, and the role they play in Natural Language Processing applications. This book aims to address the need for better understanding in this comparatively new field of Computational Phraseology.
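Association measures are a common starting point for extracting multiword expressions. The sketch below ranks adjacent word pairs by pointwise mutual information (PMI). The `min_count` threshold, the restriction to bigrams, and the toy token list are simplifying assumptions, and the function is hypothetical rather than taken from the book.

```python
import math
from collections import Counter

def pmi_bigrams(tokens: list[str], min_count: int = 2) -> list[tuple[tuple[str, str], float]]:
    """Rank adjacent word pairs by PMI(x, y) = log2(P(x, y) / (P(x) * P(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)        # number of unigram tokens
    n_bi = len(tokens) - 1     # number of bigram tokens
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so skip them
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

tokens = "red tape slows the work . red tape annoys the staff . the work goes on".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```

"red tape" outranks "the work" because "the" is frequent on its own, so its pairing with "work" is less surprising.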
Focusing on the first journal in The Unabridged Journals of Sylvia Plath, this book makes a convincing case for the value of corpus-based stylistics and narrative psychology in the analysis of representations of the experience of affective states. Situated at the intersection between language study, psychology and healthcare, this study of the personal writing of a poet and novelist showcases a cutting-edge combination of quantitative and qualitative approaches, including metaphor analysis, corpus methods, and second person narration. Techniques that systematically account for representations of experiences of affective states, such as those in this book, are rare and crucial in improving understanding of these experiences. The findings and methods of this book may therefore have a bearing on the study, diagnosis and treatment of depression and other mental illnesses. Zsófia Demjén follows the cognitive turn in both literary studies and linguistics here, emerging with a greater understanding of Plath, her diarized output and her experience of her inner world.
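The abstract does not list the specific corpus methods used. A staple of corpus stylistics is keyness, which tests whether a word (for instance, a second-person pronoun) is unusually frequent in a target text compared with a reference corpus. The sketch below computes the log-likelihood (G²) keyness statistic; the function name and the counts in the example are hypothetical.

```python
import math

def log_likelihood(a: int, c: int, b: int, d: int) -> float:
    """Log-likelihood keyness (G2) for a word seen a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    total = a + b
    e1 = c * total / (c + d)  # expected count in the target corpus
    e2 = d * total / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:  # the term a*ln(a/e1) is taken as 0 when a == 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Hypothetical counts of "you" in a 10,000-token diary sample versus a
# 10,000-token reference sample.
print(round(log_likelihood(60, 10_000, 20, 10_000), 2))
```

Values above 3.84 are significant at p < 0.05 with one degree of freedom; a word used at the same relative rate in both corpora scores 0.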
This is the third volume in the series Within Language, Beyond Theories, which focuses on current linguistic research that surpasses the limits of contemporary theoretical frameworks in order to gain new insights into the structure of the language system and to offer more explanatorily adequate accounts of linguistic phenomena taken from a number of the world's languages. This book offers a collection of fourteen chapters organized into three parts and serves as a vehicle for surveying new voices in discourse analysis, pragmatics and corpus-based studies. Part I addresses a panorama of topics related to different discourse types, such as talk show discourse, multimodal discourse, and everyday spoken discourse, as well as written academic discourse. Part II covers a range of highly controversial issues in pragmatics, including the status of ad-hoc concepts, linguistically encoded meaning, explicit content, and the lexicographic treatment of modality. Part III encompasses chapters which offer an overview of some of the recent phenomena covered in the area of corpus-based research, including the semantic functions of the temporal meanings of selected prepositions; the diffusion of gerundive complements; the institutionalization and de-institutionalization of neologisms; contextual factors in the placement of the adverb "well"; the behaviour of the verb "bake" in copular constructions; the syntactic flexibility of English idioms and their thematic composition; tendencies in the formation of nouns in tabloids; and the application of cluster analysis to the categorization of linguistic data. Drawing on recent advances in discourse analysis, pragmatics and corpus-based studies, the book approaches most of these issues from a dual perspective: on the theoretical side, it surveys an array of theoretical models, and in the analytical parts, it tests the practical applications of those models against data from English (both British and American), Estonian and Polish. The wide range of theoretical and empirical issues discussed in this book will help to provoke further academic discussion on the study of language in the areas of discourse analysis, pragmatics, and corpus-based research.
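The abstract mentions cluster analysis without naming an algorithm. As one illustrative option, the sketch below performs single-linkage agglomerative clustering on feature vectors. The two "flexibility" scores per idiom are invented numbers, and both the features and the choice of linkage are assumptions rather than the chapter's method.

```python
import math
from itertools import combinations

def single_linkage(points: dict[str, list[float]], k: int) -> list[set[str]]:
    """Agglomerative clustering with single linkage: start with one cluster per item
    and repeatedly merge the two clusters whose closest members are nearest,
    until k clusters remain."""
    clusters = [{name} for name in points]

    def distance(c1: set[str], c2: set[str]) -> float:
        # Single linkage: Euclidean distance between the closest pair of members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return clusters

# Invented two-feature "flexibility" scores for four idioms (not real data).
idioms = {
    "kick the bucket": [0.0, 0.1],
    "shoot the breeze": [0.1, 0.0],
    "spill the beans": [0.8, 0.7],
    "break the ice": [0.9, 0.8],
}
print(single_linkage(idioms, k=2))
```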
The ability to program a computer has become increasingly important in work that involves corpora. Specialised research needs can no longer be met by available software, and purchasing customised programs is usually not an option. This book enables the researcher to write programs for text and corpus processing. Useful techniques are illustrated with the popular programming language Java, which is very well suited for handling textual data, and at the same time easy to learn.
Features:
* a general introduction to programming for readers with a linguistic background
* a practical introduction to corpus linguistics for readers with a programming background who are new to corpus processing
* a guide to relevant aspects of Java which will be useful for text processing
* a variety of sample programs which are in themselves useful tools for corpus research.
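The book's examples are in Java; as a compact illustration of the kind of small corpus tool it describes, here is a word-frequency list sketched in Python. The function name and the tokenization rule (lower-cased runs of letters and apostrophes) are assumptions, not taken from the book.

```python
import re
from collections import Counter

def frequency_list(text: str, top: int = 5) -> list[tuple[str, int]]:
    """Lower-case the text, split it into word tokens and return the `top` most
    frequent tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)

print(frequency_list("The cat saw the dog. The dog ran.", top=2))
# → [('the', 3), ('dog', 2)]
```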
Placing contemporary spoken English at the centre of phonological research, this book tackles the issue of language variation and change through a range of methodological and theoretical approaches.