In this tutorial, we show how to scrape and collect online data and perform sentiment analysis, social network analysis, tribe finding, and Wikidata cross-checks, all without writing a single line of programming code. In a step-by-step example, we use self-collected data to perform several analyses of the glass ceiling. Our tutorial can serve as a standalone introduction to data science for qualitative researchers and business researchers who have avoided learning to program. It should also be useful for experienced data scientists who want to learn about tools that allow them to collect and analyze data more easily and effectively.
Abstract
In recent years, with the advent of the era of big data, the importance of data has become more prominent. This article introduces the characteristics of today’s Internet data against the background of big data, and the main method of data scraping: crawlers. In addition, the concepts of data mining, as well as its hotspots and trends, are briefly introduced. The article highlights the importance of data scraping and data mining in today’s Internet field.
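The crawler idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the article: the names `LinkExtractor`, `extract_links`, `crawl`, and the `fetch` callback are hypothetical, and the example assumes that pages are fetched by a caller-supplied function, so that politeness rules, robots.txt handling, and network errors are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl starting at `start_url`.

    `fetch` is a hypothetical caller-supplied function mapping a URL to
    its HTML text. Returns a dict of URL -> HTML for the visited pages.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:  # visit each URL at most once
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` would wrap an HTTP client; passing it in as a parameter keeps the traversal logic separate from network access and makes the sketch easy to try on a small in-memory set of pages.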
The emergence of Covid-19 significantly changed the news carried by traditional media and on the web. Concerns such as cooking at home or ordering food through apps, health, shopping via e-commerce, and working from home became part of everyday life for families. Shortages of some items, queues and restrictions at supermarkets, and international disputes over food items were also reported. From this perspective, we sought to analyze the initial implications of the pandemic for agribusiness through the news in a Brazilian media outlet. To this end, we collected news reports on agribusiness and Covid-19 dating from the beginning of the pandemic from the internet portal of the newspaper Folha de São Paulo (FSP). We obtained 184 news reports, whose content was analyzed with the help of the QDA Miner software. The main topics addressed in the FSP publications in this initial period of the pandemic were related to the foreign market, poverty, and food prices.
In this paper, we present Pyrlato, an innovative tool developed in Python for collecting acoustic data from YouTube. The development of this tool was motivated by the need to conveniently collect real-world spoken data. By executing this Python code, researchers can obtain a spoken corpus of specific words, syllables, constituents, and more. We illustrate the main steps of the execution to demonstrate how the tool works and how to use it. Additionally, we provide a complete example for reference, demonstrating how to customize Pyrlato according to specific requirements. Finally, we discuss the future developments we plan for Pyrlato.
Research on sentence consistency in England and Wales has focused on disparities between courts, with differences between judges generally ignored. This is largely due to the limitations of official data. Applying text mining techniques to Crown Court sentence records available online, we generate a sample of 7,212 violent and sexual offences where both court and judge are captured. Multilevel time-to-event analyses of sentence length demonstrate that most disparities originate at the judge level, not the court level. Two important implications follow: i) the extent of sentencing consistency in England and Wales has been underestimated; and ii) the importance attributed to the location in which sentences are passed, in England and Wales and elsewhere, needs to be revisited. Further analysis of the judge-level disparities identifies judicial rotation across courts as a practice conducive to sentence consistency, which suggests that sentencing guidelines could be complemented with other, less intrusive, changes in judicial practice to promote consistency.
Communication channels play a crucial role in times of crisis, especially during disasters. Social media have become substantial means of communication, playing roles coextensive with those of traditional media. Social media present a communication format that can operate not only within areas directly affected by a disaster but also throughout the rest of the world. Twitter has proven to be an important social media platform for providing services and information conveyed by credible organizations in times of crisis when other means of communication become inaccessible. This study focuses on the different uses of Twitter during disasters in Asia and the Pacific in 2014 and 2015. The purpose of this study is to show the pattern of use of Twitter to send warnings and identify crucial needs and responses. This study is based on the premise that Twitter has considerable potential as a communication channel during disasters given its advantages and high compatibility with rapid information dissemination. We gather tweets by scraping https://twitter.com/search-advanced results using the Application Programming Interface of Twitter. The scraping process is conducted with the Python Tweepy library. Data are classified based on a social media framework, geographical area, and user type. We find that the pattern of Twitter users plays a crucial role in raising awareness as well as coordinating relief efforts during disasters. Various types of users utilize Twitter in ways that are consistent with their traditional roles. News organizations participate in secondhand reporting, and nongovernment organizations and celebrities are committed to relief coordination. The results shed light not only on how various types of users utilize Twitter in times of disaster but also on how a number of potential Twitter users are absent during disasters. Twitter use for relief coordination understandably occurs in the aftermath of a disaster, but the speed and reach of Twitter also make it an ideal platform for disaster preparedness coordination and planning.
Differentiated green loans. Giraudet, Louis-Gaëtan; Petronevich, Anna; Faucheux, Laurent.
Energy Policy, February 2021, Volume 149. Journal article. Peer reviewed. Open access.
Scaling up home energy retrofits requires that associated loans be priced efficiently. Using a unique dataset of posted loan prices scraped from online simulators made available by French credit institutions, we examine the differentiation of interest rates in relation to project risk. Crucially, our data are immune from sorting bias based on borrower characteristics. We find that greener, arguably less risky, automobile projects carry lower interest rates, but greener home retrofits do not. On the other hand, conventional automobiles carry lower interest rates than do conventional home retrofits, despite arguably similar risk. Our results hold up under a range of robustness checks, including placebo tests. Together they suggest that lenders use underlying assets to screen borrowers' unobserved willingness to pay, which can cause under-investment in home energy retrofits. We thereby point to a new form of the energy efficiency gap. This has important policy implications in that it can explain the low uptake of zero-interest green loan programs.
• Analysis of a unique dataset of posted loan prices scraped from online simulators.
• Interest rates are smaller for vehicles than for home retrofits.
• Green penalty for home retrofits; green discount for vehicles.
• New evidence of energy efficiency gap in home energy retrofit.
• Suggestive evidence of unconventional differentiation in loan pricing.
Abstract
Most research on sentencing discrimination in the United Kingdom has relied on aggregate analyses comparing disparities by ethnic group. These studies fail to consider differences in the individual characteristics of the cases processed. To circumvent the lack of official data, we scraped sentence records stored on a commercial website, from which a sample of 8,437 offenders sentenced to custody in the Crown Court from 2007 to 2017 was generated. Using the names of the offenders, we were able to classify 8.6 per cent of our sample as having a traditional Muslim name. We find that Muslim-named offenders received sentences 9.8 per cent longer than the rest of the sample. However, this difference disappeared once we accounted for the type of offence and other key case characteristics.