In this volume, Matthew L. Jockers introduces readers to large-scale literary computing and the revolutionary potential of macroanalysis, a new approach to the study of the literary record designed for probing the digital-textual world as it exists today, in digital form and in large quantities. Using computational analysis to retrieve key words, phrases, and linguistic patterns across thousands of texts in digital libraries, researchers can draw conclusions based on quantifiable evidence regarding how literary trends develop over time, across periods, within regions, or within demographic groups, as well as how cultural, historical, and societal linkages may bind individual authors, texts, and genres into an aggregate literary culture. Moving beyond the limitations of literary interpretation based on the close reading of individual works, Jockers describes how this new method of studying large collections of digital material can help us better understand and contextualize the individual works within those collections.
• We model hundreds of themes in a corpus of some 3,200 19th-century novels.
• We analyze the statistical significance of these themes, with particular attention to author gender.
• We consider the impact of external factors on the use of themes and of the words within themes.
• We offer a robust method for exploring literary themes at scale.
External factors such as author gender, author nationality, and date of publication can affect both the choice of literary themes in novels and the expression of those themes, but the extent of this association is difficult to quantify. In this work, we apply statistical methods to identify and extract hundreds of topics (themes) from a corpus of 19th-century British, Irish, and American fiction. We use these topics as a measurable, data-driven proxy for literary themes and assess how external factors may predict fluctuations in the use of themes and the individual word choices within themes. We use topics not only to measure these associations but also to evaluate whether this evidence is statistically significant.
Previous studies of dependency distance as a measure of, or a proxy for, syntactic complexity do not consider factors such as sentence length and root distance. In the present study, we propose a new algorithm, Normalized Dependency Distance (NDD), that takes sentence length and root distance into consideration. Our analysis showed that the exponential distribution fits the distribution of NDD well, as it does that of Mean Dependency Distance (MDD), the algorithm used in previous studies. Findings indicated that NDD is significantly less dependent on sentence length than MDD is, which suggests that the new algorithm may, to some extent, address MDD's dependency on sentence length. It is argued that NDD may serve as a measure of syntactic complexity, a kind of linguistic universal limited by the capacity of human working memory.
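As an illustration, mean dependency distance (MDD) averages the linear distances between each word and its governor in a dependency parse. The sketch below computes MDD and a normalized variant; the specific normalization shown (dividing MDD by the square root of root position times sentence length before taking the absolute log) is an assumption for illustration and should be checked against the paper's own definition of NDD.

```python
import math

def mdd(heads):
    """Mean dependency distance. heads[i] is the 1-based index of the
    governor of word i+1; 0 marks the sentence root."""
    dists = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(dists) / len(dists)

def ndd(heads):
    """Hypothetical normalization: |ln(MDD / sqrt(root_position * length))|.
    This exact formula is an illustrative assumption, not the paper's."""
    length = len(heads)
    root_position = heads.index(0) + 1  # 1-based position of the root word
    return abs(math.log(mdd(heads) / math.sqrt(root_position * length)))

# Toy parse of "She quickly read the book": "read" (word 3) is the root;
# She -> read, quickly -> read, the -> book, book -> read.
heads = [3, 3, 0, 5, 3]
print(mdd(heads))  # (|1-3| + |2-3| + |4-5| + |5-3|) / 4 = 1.5
print(round(ndd(heads), 3))
```

Because the normalization divides out a function of sentence length, a measure of this shape would be expected to vary less with sentence length than raw MDD does, which is the property the abstract reports.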
Don't let copyright block data mining
Jockers, Matthew L.; Sag, Matthew; Schultz, Jason
Nature (London), 10/2012, Volume 490, Issue 7418
Journal Article
Peer-reviewed
Open access
In 2005, the Authors Guild, based in New York, with some 8,500 members including published authors, literary agents and lawyers, filed a class-action lawsuit claiming that Google's scanning activity was a "massive copyright infringement". According to the US Constitution, the purpose of copyright is "To promote the Progress of Science and useful Arts".
Judging style: The case of Bush versus Gore
Jockers, Matthew L.; Nascimento, Fernando; Taylor, George H.
Digital Scholarship in the Humanities, 06/2020, Volume 35, Issue 2
Journal Article
Peer-reviewed
Abstract
The judgments by members of the US Supreme Court in the 2000 case of Bush versus Gore remain controversial to the present. We use text mining and machine learning methods to compare the word usage patterns of Supreme Court Justices in order to explore the likely authorship of both the anonymous 5-4 per curiam decision in this case and the concurrence that is attributed to Chief Justice Rehnquist, with Scalia and Thomas joining. An analysis of high and medium frequency words suggests that Justice Kennedy was likely the main contributor to the per curiam decision. A similar analysis of the concurrence, however, suggests that Justice Scalia may have played a more central role than the document’s purported author, Justice Rehnquist. Our analysis indicates that while Chief Justice Rehnquist was likely the crafter of the document, much of the more forceful language of the concurrence resonates more clearly with a vocabulary indicative of Justice Scalia.
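The general technique behind such attribution studies compares documents by the relative frequencies of common function words. The sketch below is a hypothetical illustration of that idea; the toy texts, feature list, and distance measure are assumptions, not the study's actual data or method.

```python
from collections import Counter

# Common function words often used as stylometric features (illustrative list).
FEATURES = ["the", "of", "and", "to", "in"]

def profile(text):
    """Relative frequency of each feature word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in FEATURES]

def distance(p, q):
    # Mean absolute difference between two frequency profiles.
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

# Toy stand-ins for writing samples of known authorship and a disputed text.
known_a = "the law of the land and the will of the people"
known_b = "to err is to learn and to learn is to grow"
disputed = "the rule of the court and the weight of the law"

d_a = distance(profile(disputed), profile(known_a))
d_b = distance(profile(disputed), profile(known_b))
print("closer to A" if d_a < d_b else "closer to B")  # prints "closer to A"
```

Real stylometric work uses hundreds of features and more robust measures (e.g. Burrows's Delta) with cross-validation, but the attribution logic, assigning the disputed text to the stylistically nearest known author, is the same.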
Metaadat [Metadata]
Jockers, Matthew L.; Labádi, Gergely
Digitális bölcsészet, 09/2018, Volume 1, Issue 1
Journal Article
Peer-reviewed
Open access
Chapter 5 of Macroanalysis: Digital Methods and Literary History attempts to demonstrate that the computational analysis of bibliographic and demographic metadata makes it possible to reassess or rewrite literary-historical narratives. The chapter shows that literary-historical periods and trends can be placed in thought-provoking perspectives through the macroanalysis of book-level metadata: book titles, authors' origins, dates of publication, fictional settings and time periods, and so on. Drawing on a metadata database of Irish-American literature, the author reassesses the accepted narrative of Irish-American literary history and offers an alternative perspective on Charles Fanning's theory of the "lost generation" of Irish-American writers. The metadata provide context for Fanning's literary-historical reading and suggest that scholarly assumptions about the history of Irish-American literature rest on the analysis of a small number of works by a homogeneous group of authors. More abstractly, the chapter argues that traditional literary scholars are mistaken in thinking that large-scale analyses aim to replace close reading. On the contrary, the macroanalysis of metadata creates precisely the context that close reading requires, and it contributes to raising new questions and constructing new perspectives on literary history.
“The Ancient World in 19th-Century Fiction” is a lightly revised version of a lecture delivered at the first meeting of the Digital Classicists Association. The intent of the lecture, in accordance with the invitation to deliver it, was to introduce literary “macroanalysis” in the context of the ancient world and to offer some exploration of how the ancient world is represented in the 19th-century literary imagination.