Twitter was an integral part of Donald Trump's communication platform during his 2016 campaign. Although its topical content has been examined by researchers and the media, we know relatively little ...about the style of the language used on the account or how this style changed over time. In this study, we present the first detailed description of stylistic variation on the Trump Twitter account based on a multivariate analysis of grammatical co-occurrence patterns in tweets posted between 2009 and 2018. We identify four general patterns of stylistic variation, which we interpret as representing the degree of conversational, campaigning, engaged, and advisory discourse. We then track how the use of these four styles changed over time, focusing on the period around the campaign, showing that the style of tweets shifts systematically depending on the communicative goals of Trump and his team. Based on these results, we propose a series of hypotheses about how the Trump campaign used social media during the 2016 elections.
In this paper, we analyze double modal use in American English based on a multi-billion-word corpus of geolocated posts from the social media platform Twitter. We identify and map 76 distinct double ...modals totaling 5,349 examples, many more types and tokens of double modals than have ever been observed. These descriptive results show that double modal structure and use in American English is far more complex than has generally been assumed. We then consider the relevance of these results to three current theoretical debates. First, we demonstrate that although there are various semantic tendencies in the types of modals that most often combine, there are no absolute constraints on double modal formation in American English. Most surprisingly, our results suggest that double modals are used productively across the US. Second, we argue that there is considerable dialect variation in double modal use in the southern US, with double modals generally being most strongly associated with African American Language, especially in the Deep South. This result challenges previous sociolinguistic research, which has often highlighted double modal use in White Southern English, especially in Appalachia. Third, we consider how these results can help us better understand the origins of double modals in America English: although it has generally been assumed that double modals were introduced by Scots-Irish settlers, we believe our results are more consistent with the hypothesis that double modals are an innovation of African American Language.
In this Element, the authors introduce and apply a framework for the linguistic analysis of fake news. They define fake news as news that is meant to deceive as opposed to inform and argue that there ...should be systematic differences between real and fake news that reflect this basic difference in communicative purpose. The authors consider one famous case of fake news involving Jayson Blair of The New York Times, which provides them with the opportunity to conduct a controlled study of the effect of deception on the language of a single reporter following this framework. Through a detailed grammatical analysis of a corpus of Blair's real and fake articles, this Element demonstrates that there are clear differences in his writing style, with his real news exhibiting greater information density and conviction than his fake news. This title is also available as Open Access on Cambridge Core.
Dialect classification is a long-standing issue in Chinese dialectology. Although various theories of Chinese dialect regions have been proposed, most have been limited by similar methodological ...issues, especially due to their reliance on the subjective analysis of dialect maps both individually and in the aggregate, as well as their focus on phonology over syntax and vocabulary. Consequently, we know relatively little about the geolinguistic underpinnings of Chinese dialect variation. Following a review of previous research in this area, this article presents a theory of Chinese dialect regions based on the first large-scale quantitative analysis of the data from the
which was collected between 2000 and 2008, providing the most up-to-date picture of the full Chinese dialect landscape. We identify and map a hierarchy of 10 major Chinese dialect regions, challenging traditional accounts. In addition, we propose a new theory of Chinese dialect formation to account for our findings.
In this paper, I propose that replication failure in linguistics may be due primarily to inherent issues with the application of experimental methods to analyze an inextricably social phenomenon like ...language, as opposed to poor research practices. Because language use varies across social contexts, and because social context must vary across independent experimental replications, linguists should not be surprised when experimental results fail to replicate at the expected rate. To address issues with replication failure in linguistics, and to increase methodological rigor in our field more generally, I argue that linguists must use experimental methods carefully, keeping in mind their inherent limitations, while acknowledging the scientific value of observational methods, which are often the only way to pursue basic questions in our field.
For centuries, investigations of disputed authorship have shown that people have unique styles of writing. Given sufficient data, it is generally possible to distinguish between the writings of a ...small group of authors, for example, through the multivariate analysis of the relative frequencies of common function words. There is, however, no accepted explanation for why this type of
analysis is successful. Authorship analysts often argue that authors write in subtly different dialects, but the analysis of individual words is not licensed by standard theories of sociolinguistic variation. Alternatively, stylometric analysis is consistent with standard theories of register variation. In this paper, I argue that stylometric methods work because authors write in subtly different registers. To support this claim, I present the results of parallel stylometric and multidimensional register analyses of a corpus of newspaper articles written by two columnists. I demonstrate that both analyses not only distinguish between these authors but identify the same underlying patterns of linguistic variation. I therefore propose that register variation, as opposed to dialect variation, provides a basis for explaining these differences and for explaining stylometric analyses of authorship more generally.
Abstract
Cultural areas represent a useful concept that cross-fertilizes diverse fields in social sciences. Knowledge of how humans organize and relate their ideas and behavior within a society can ...help us to understand our actions and attitudes toward different issues. However, the selection of common traits that shape a cultural area is somewhat arbitrary. What is needed is a method that can leverage the massive amounts of data coming online, especially through social media, to identify cultural regions without ad-hoc assumptions, biases, or prejudices. This work takes a crucial step in this direction by introducing a method to infer cultural regions based on the automatic analysis of large datasets from microblogging posts. The approach presented here is based on the principle that cultural affiliation can be inferred from the topics that people discuss among themselves. Specifically, regional variations in written discourse are measured in American social media. From the frequency distributions of content words in geotagged tweets, the regional hotspots of words’ usage are found, and from there, principal components of regional variation are derived. Through a hierarchical clustering of the data in this lower-dimensional space, this method yields clear cultural areas and the topics of discussion that define them. It uncovers a manifest North–South separation, which is primarily influenced by the African American culture, and further contiguous (East–West) and non-contiguous divisions that provide a comprehensive picture of modern American cultural areas.
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured ...over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
In this paper, I introduce a situational approach to the study of linguistic complexity. As opposed to most research on linguistic complexity, which has focused on the grammatical complexity of ...languages, I consider this topic from a situational perspective. I make two proposals. First, I claim that languages can vary in their situational diversity. Languages that have been adapted for a wider range of communicative contexts are more situationally complex than languages that have been adapted for a narrower range of communicative contexts. To support this claim, I consider examples of situational diversity from across a range of different languages and varieties of languages, drawing on empirical research from linguistics and anthropology. Second, I claim that situational diversity can help explain variation in grammatical complexity. I propose that increasing situational diversity in a language over time should lead to decreasing grammatical complexity. Furthermore, I argue that this trade-off between situational and grammatical complexity could explain how overall linguistic complexity could be maintained across languages and over time.