The state of an individual's mental health depends on many factors. Determining the importance of any particular factor within a population requires access to unbiased data. We used publicly available datasets to investigate, at a population level, how surrogates of mental health covary with light exposure. We found strong seasonal patterns in antidepressant prescriptions, which correlate more strongly with day length than with levels of solar energy. Levels of depression in a population can also be estimated through proxy indicators such as web query logs. Furthermore, these proxies for depression correlate with day length rather than solar energy.
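Comparing a seasonal series with day length requires a day-length value for each date. The following is a minimal sketch, assuming the common cosine approximation for solar declination (no atmospheric refraction correction) and a plain Pearson coefficient. The abstract does not say which day-length model or correlation measure the authors used, and `day_length_hours` and `pearson` are hypothetical helper names.

```python
import math
from typing import Sequence


def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Approximate hours of daylight at a latitude on a given day (1-365).

    Assumption: simple cosine model of solar declination that ignores
    atmospheric refraction and the size of the solar disc.
    """
    # Solar declination in degrees (about -23.44 at the December solstice).
    declination = -23.44 * math.cos(2 * math.pi * (day_of_year + 10) / 365)
    phi = math.radians(latitude_deg)
    delta = math.radians(declination)
    # Cosine of the sunrise hour angle, clamped for polar day and polar night.
    cos_omega = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega_deg = math.degrees(math.acos(cos_omega))
    # The Earth turns 15 degrees per hour; daylight spans -omega..+omega.
    return 2 * omega_deg / 15.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

As an illustration, a monthly prescription series could be correlated once with `day_length_hours` evaluated at mid-month days and once with a solar-energy series for the same months. The two coefficients can then be compared to see which variable tracks the series more closely.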
Feminist news media researchers have long contended that masculine news values shape journalists' quotidian decisions about what is newsworthy. As a result, it is argued, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men's views and voices are given privileged space. When women do appear in the news, it is often as "eye candy," thus reinforcing the notion that women's value lies in their role as sources of visual pleasure rather than in the content of their views. To date, evidence to support such claims has tended to come from small-scale, manual analyses of news content. In this article, we report on findings from our large-scale, data-driven study of gender representation in online English-language news media. We analysed both words and images so as to give a broader picture of how gender is represented in online news. The corpus of news content examined consists of 2,353,652 articles collected over a period of six months from more than 950 different news outlets. From this initial dataset, we extracted 2,171,239 references to named persons and 1,376,824 images, resolving the gender of names and faces using automated computational methods. We found that males were represented more often than females in both images and text, but in proportions that changed across topics, news outlets and mode. Moreover, the proportion of females was consistently higher in images than in text, for virtually all topics and news outlets; women were more likely to be represented visually than to be mentioned as news actors or sources. Our large-scale, data-driven analysis offers important empirical evidence of macroscopic patterns in news content concerning the way men and women are represented.
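Once each reference has been assigned a gender, comparing topics and modes is a matter of counting. The following is a minimal sketch, assuming a hypothetical `Mention` record that already carries outlet, topic, mode (text or image) and a resolved gender label. The name and face classification itself, which the study performed with automated methods not detailed here, is out of scope.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """One resolved reference to a person (hypothetical record layout)."""
    outlet: str
    topic: str
    mode: str    # "text" for a named mention, "image" for a detected face
    gender: str  # "female" or "male", as assigned by an upstream classifier


def female_share(mentions: Iterable[Mention]) -> dict[tuple[str, str], float]:
    """Fraction of mentions labelled female, grouped by (topic, mode)."""
    totals: Counter = Counter()
    females: Counter = Counter()
    for m in mentions:
        key = (m.topic, m.mode)
        totals[key] += 1
        if m.gender == "female":
            females[key] += 1
    return {key: females[key] / totals[key] for key in totals}


def image_minus_text(shares: dict[tuple[str, str], float], topic: str) -> float:
    """Positive when women form a larger share of images than of text."""
    return shares[(topic, "image")] - shares[(topic, "text")]
```

Grouping by outlet instead of topic only requires changing the key to `(m.outlet, m.mode)`.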
We address the problem of observing periodic changes in the behaviour of a large population by analysing the daily contents of newspapers published in the United States and United Kingdom from 1836 to 1922. This is done by analysing the daily time series of the relative frequency of the 25K most frequent words for each country, resulting in the study of 50K time series over 31,755 days. Behaviours found to be strongly periodic include seasonal activities, such as hunting and harvesting. A strong connection with natural cycles is found, with a pronounced presence of fruits, vegetables, flowers and game. Periodicities dictated by religious or civil calendars are also detected and show a different waveform from those driven by weather. Other states that can be revealed include the presence of infectious disease, with clear annual peaks for fever, pneumonia and diarrhoea. Overall, 2% of the words are found to be strongly periodic, and the most frequently found period is 365 days. Comparisons between the UK and US, and between modern and historical news, reveal how the fundamental cycles of life are shaped by the seasons, but also how this effect has been reduced in modern times.
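Detecting a periodic word amounts to finding a dominant frequency in its daily relative-frequency series. The following is a minimal sketch using a plain discrete Fourier periodogram. This is one standard choice and not necessarily the method used in the study. `periodogram` and `dominant_period` are hypothetical names, and the share of power at the peak is offered only as an illustrative periodicity score.

```python
import cmath
import math
from typing import Sequence


def periodogram(series: Sequence[float]) -> list[tuple[float, float]]:
    """(period, power) pairs at the Fourier frequencies k/n, k = 1..n//2.

    Uses a direct O(n^2) DFT for clarity; an FFT would be used at scale.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the constant level first
    pairs = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)
        )
        pairs.append((n / k, abs(coeff) ** 2 / n))
    return pairs


def dominant_period(series: Sequence[float]) -> tuple[float, float]:
    """Period (in samples, here days) with most power, and that power's share."""
    pairs = periodogram(series)
    total = sum(power for _, power in pairs)
    period, peak = max(pairs, key=lambda pair: pair[1])
    return period, peak / total
```

With three years of daily data, a word with a purely yearly cycle concentrates its power at k = 3, which corresponds to a period of 3 × 365 / 3 = 365 days.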
Nowcasting the mood of the nation
Lansdall-Welfare, Thomas; Lampos, Vasileios; Cristianini, Nello
Significance (Oxford, England), 08/2012, Volume 9, Issue 4
Journal Article
Open access
Vast data-streams from social networks like Twitter and Facebook contain people's opinions, fears and dreams. Thomas Lansdall-Welfare, Vasileios Lampos and Nello Cristianini exploit a whole new tool for social scientists.
Recent studies have shown that macroscopic patterns of continuity and change over the course of centuries can be detected through the analysis of time series extracted from massive textual corpora. Similar data-driven approaches have already revolutionized the natural sciences and are widely believed to hold comparable potential for the humanities and social sciences, driven by the mass-digitization projects currently under way and by the ever-increasing number of documents that are ‘born digital’. As such, new interactive tools are required to discover and extract macroscopic patterns from these vast quantities of textual data. Here we present History Playground, an interactive web-based tool for discovering trends in massive textual corpora. The tool uses scalable algorithms to extract trends from textual corpora and then makes them available for real-time search and discovery, presenting users with an interface to explore the data. Included in the tool are algorithms for standardization, regression, change-point detection in the relative frequencies of n-grams, multi-term indices, and comparison of trends across different corpora.
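Two of the listed operations can be illustrated on a single n-gram series: standardization (z-scores) and change-point detection in its relative frequencies. The following is a minimal sketch, assuming per-year n-gram counts and corpus totals are already available. It uses a least-squares single change point, a common textbook formulation. History Playground's own algorithms are not specified here, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def relative_frequency(counts: Sequence[int], totals: Sequence[int]) -> list[float]:
    """Occurrences of an n-gram divided by all n-grams in each time slice."""
    return [c / t if t else 0.0 for c, t in zip(counts, totals)]


def zscore(series: Sequence[float]) -> list[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if sd == 0:
        return [0.0] * n  # a flat series has no variation to standardize
    return [(x - mean) / sd for x in series]


def _sse(segment: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    m = sum(segment) / len(segment)
    return sum((x - m) ** 2 for x in segment)


def change_point(series: Sequence[float], min_size: int = 2) -> Optional[int]:
    """Split index that best fits two constant-mean segments (least squares).

    Returns i such that series[:i] and series[i:] are the two segments,
    or None if the series is too short for two segments of min_size.
    """
    best_index, best_cost = None, math.inf
    for i in range(min_size, len(series) - min_size + 1):
        cost = _sse(series[:i]) + _sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index
```

The returned index marks where a term's usage shifts level. Recursing on each segment (binary segmentation) would be one way to find further change points.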
Content analysis of 150 years of British periodicals
Lansdall-Welfare, Thomas; Sudhahar, Saatviga; Thompson, James ...
Proceedings of the National Academy of Sciences - PNAS, 01/2017, Volume 114, Issue 4
Journal Article
Peer reviewed
Open access
Previous studies have shown that it is possible to detect macroscopic patterns of cultural change over periods of centuries by analyzing large textual time series, specifically digitized books. This method promises to empower scholars with a quantitative and data-driven tool to study culture and society, but its power has been limited by the use of data from books and by simple analytics based essentially on word counts. This study addresses these problems by assembling a vast corpus of regional newspapers from the United Kingdom, incorporating very fine-grained geographical and temporal information that is not available for books. The corpus spans 150 years and comprises millions of articles, representing 14% of all British regional outlets of the period. Simple content analysis of this corpus allowed us to detect specific events, like wars, epidemics, coronations, or conclaves, with high accuracy, whereas the use of more refined techniques from artificial intelligence enabled us to move beyond counting words by detecting references to named entities. These techniques allowed us to observe both a systematic underrepresentation and a steady increase of women in the news during the 20th century, as well as the change of geographic focus for various concepts. We also estimate the dates when electricity overtook steam and trains overtook horses as a means of transportation, both around the year 1900, along with observing other cultural transitions. We believe that these data-driven approaches can complement the traditional method of close reading in detecting trends of continuity and change in historical corpora.
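Dating a transition such as electricity overtaking steam can be framed as finding when one term's frequency series rises above another's for good. The following is a minimal sketch, assuming yearly relative frequencies for the two terms and a centred moving average to damp year-to-year noise. The abstract does not state how the study estimated these dates, and `crossover_year`, `moving_average` and the default window size are hypothetical.

```python
from typing import Optional, Sequence


def moving_average(series: Sequence[float], window: int) -> list[float]:
    """Centred moving average; windows are truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def crossover_year(
    years: Sequence[int], a: Sequence[float], b: Sequence[float], window: int = 5
) -> Optional[int]:
    """First year from which smoothed series a stays above smoothed series b.

    Returns None if a is not above b at the end of the period.
    """
    if not (len(years) == len(a) == len(b)):
        raise ValueError("years, a and b must have the same length")
    sa, sb = moving_average(a, window), moving_average(b, window)
    result = None
    for year, x, y in zip(years, sa, sb):
        if x > y:
            if result is None:
                result = year  # start of a run where a leads
        else:
            result = None  # a fell back; discard the current run
    return result
```

Requiring the lead to persist to the end of the series, rather than taking the first crossing, avoids dating a transition from a single noisy year.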
We have digitised a corpus of Italian newspapers published in 1873-1914 in Gorizia, the county town of an area in the North Adriatic at the crossroads of the Latin, Slavic and Germanic civilizations, then part of the Habsburg Empire and now divided between Italy and Slovenia. This new corpus of 47,466 pages is analysed along with a comparable set of local Slovenian newspapers, already digitised by the Slovenian National Library. This large, multilingual effort in digital humanities reveals the statistical traces of events and ideas that shaped a remarkable place and period. The emerging picture is one of rapid cultural, social and technological transformation, and of rising national awareness, combining the larger European pattern with uniquely local aspects.