Hierarchical cluster analysis (HCA) belongs to the family of multifactorial exploratory approaches. It clusters individuals based on the distances between them. I illustrate HCA with the preposition data set described here.
Hierarchical Cluster Analysis
HCA comes in two flavors: agglomerative (or ascending) and divisive (or descending). Agglomerative clustering fuses the individuals into successively larger groups, whereas divisive clustering splits the individuals into successively finer groups. What ...
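As a rough illustration of the agglomerative flavor, here is a minimal sketch in plain Python rather than R. It is not the code used in the post. It assumes Euclidean distance and complete linkage by default. The function name `agglomerate` and the four toy points are made up for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points, linkage=max):
    """Bottom-up clustering; returns the merge history.

    linkage=max gives complete linkage, linkage=min gives single linkage.
    Both the function and its defaults are illustrative assumptions.
    """
    # Start with one singleton cluster per individual (stored as index tuples).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        def gap(pair):
            # Distance between two clusters under the chosen linkage rule.
            a, b = pair
            return linkage(dist(points[i], points[j]) for i in a for j in b)

        # Fuse the two closest clusters and record the height of the merge.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data: two tight pairs that lie far apart.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
history = agglomerate(toy)
for left, right, height in history:
    print(left, "+", right, "at height", round(height, 2))
```

Each entry in `history` corresponds to one fusion in the dendrogram. A divisive procedure would run in the opposite direction, starting from a single all-inclusive cluster and splitting it.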
In a previous post, I showcased the development of the split infinitive (this was, in fact, my very first post on this blog). I wanted to check whether the split infinitive had spiked after the airing of the original Star Trek series in the late 1960s (to find out whether that was indeed the case, I invite you to read the post!). In other words, I had a theoretical time partition in mind (before and after 1967), and I wanted to check whether it had any empirical relevance. Another strate...
I am pleased to announce that I will teach a course on exploratory statistics for corpus linguistics at the Corpus Linguistics Summer School. The summer school will take place at the University of Birmingham from 24 to 28 June 2019.
Topics
The course covers chapters 9 and 10 of Corpus Linguistics and Statistics with R: Multifactorial Exploratory Approaches: fundamentals; Exploring large contingency tables with correspondence analysis; Exploring large data sets of nominal variables wi...
BNC.query() is an interactive R script that I wrote for a course in computational sociolinguistics last semester. It is designed to run queries over the spoken component of the BNC-XML. It extracts a single word or a complex expression, along with speaker information (gender, age class, and social grade), tabulates the results, computes frequencies, and makes a barplot or an association plot (along with a χ² test). For this demo, I have used R version 3.5.1 (2018-07-02) -- "Feather Spray" on RSt...
Last edit: June 7th, 2019
BNC.2014.query() is an interactive R script that I wrote for a course in computational sociolinguistics last semester. It is designed to run queries over the spoken component of the BNC-2014. It extracts a single word or a complex expression, along with speaker information (gender, age class, and social grade), tabulates the results, computes frequencies, and makes a barplot or an association plot (along with a χ² test). For this demo, I have used R version 3.5.1 (201...
The British National Corpus 2014 is a project led by the Centre for Corpus Approaches to Social Science at Lancaster University to create a 100-million-word corpus of contemporary British English, as a counterpart to the original BNC (the BNC-XML), which is now over 20 years old. On November 19th, 2018, the spoken component of the BNC 2014 was made available for download for offline analysis. Before then, it was available via Lancaster University’s CQPweb. It is now accessible online in full, free of charge. The 11.5-million-word spoken co...
The data set presented here was compiled by Frédérique Gayet, a psychomotor therapist whose research I supervised in 2013. Gayet (2013) focused on spatial prepositions in French: à côté de "next to", en dessous de "below", au dessus de "on top of", à gauche/droite de "to the left/right of", etc. Psychomotor therapists interact physically with their patients. This physical interaction is cued by verbal references to space. The loss of spatial vocabulary by Alzheimer's patients is damaging to...
On 29 June 2018, I had the honor and pleasure of being invited by Alain Polguère to lead a session of the ATILF seminar at the Université de Lorraine (Nancy). I presented my work on the links between networks in Construction Grammar and graph theory. By clicking on the image below, you can watch the video in full (1 h 27 min).
A regression towards mediocrity
Originally, the term regression meant “going back”. It gained currency when Sir Francis Galton related the heights of children to the average height of their parents. Galton (1886) found that children whose parents were short were likely to be shorter than average, whereas children whose parents were tall tended to be taller than average. Galton also found that when the parents were “taller than mediocrity”, the children were in general shorter than their paren...
This post is the first of a series on word embeddings, i.e. vector representations of words in a vector space. Word embeddings have been known to linguists for quite some time. Recently, artificial neural networks have taken word embeddings to the next level. I will explain what makes artificial-intelligence-flavored word vectors so appealing in a future post. Right now, I wish to step back a little and explain the basics.
Words of a feather flock together
Because a corpus is not a mere bag o...
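To make "vector representations of words" concrete, here is a minimal Python sketch that builds count-based co-occurrence vectors and compares them with cosine similarity. It is only one simple way to obtain word vectors and is not necessarily the method developed in the post. The function names, the window size of 2, and the three-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` tokens."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Left and right neighbors inside the window, excluding the word itself.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, invented for the example.
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
vectors = cooccurrence_vectors(toy_corpus)
print(vectors["cat"])
print(cosine(vectors["milk"], vectors["water"]))
```

Words that occur in similar contexts end up with similar count vectors, so their cosine similarity is higher. With a corpus this small the numbers are only illustrative.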