The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data that can later be enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
Many human languages have words for emotions such as "anger" and "fear," yet it is not clear whether these emotions have similar meanings across languages, or why their meanings might vary. We estimate emotion semantics across a sample of 2474 spoken languages using "colexification," a phenomenon in which languages name semantically related concepts with the same word. Analyses show significant variation in networks of emotion concept colexification, which is predicted by the geographic proximity of language families. We also find evidence of universal structure in emotion colexification networks, with all families differentiating emotions primarily on the basis of hedonic valence and physiological activation. Our findings contribute to debates about universality and diversity in how humans understand and experience emotion.
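To make the notion of colexification concrete, the following is a minimal sketch of how a colexification network could be counted from word lists: two concepts are linked in a language when that language names both with the same word form, and each link is weighted by the number of languages that show it. The word list, language names, and the function colexification_counts are invented for illustration and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy word list: (language, concept, word form).
# These entries are made up for illustration only.
WORDLIST = [
    ("lang_a", "ANGER", "kira"),
    ("lang_a", "ENVY", "kira"),
    ("lang_a", "FEAR", "tomu"),
    ("lang_b", "ANGER", "selo"),
    ("lang_b", "ENVY", "selo"),
    ("lang_b", "FEAR", "selo"),
]


def colexification_counts(wordlist):
    """Count, for each concept pair, how many languages colexify it."""
    # Group the concepts that share one word form within one language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts under the same form is colexified in that language.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


counts = colexification_counts(WORDLIST)
for pair, n_languages in sorted(counts.items()):
    print(pair, n_languages)
```

The resulting dictionary can be read as a weighted edge list, with concepts as nodes and language counts as edge weights.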
Debates about human prehistory often center on the role that population expansions play in shaping biological and cultural diversity. Hypotheses on the origin of the Austronesian settlers of the Pacific are divided between a recent "pulse-pause" expansion from Taiwan and an older "slow-boat" diffusion from Wallacea. We used lexical data and Bayesian phylogenetic methods to construct a phylogeny of 400 languages. In agreement with the pulse-pause scenario, the language trees place the Austronesian origin in Taiwan approximately 5230 years ago and reveal a series of settlement pauses and expansion pulses linked to technological and social innovations. These results are robust to assumptions about the rooting and calibration of the trees and demonstrate the combined power of linguistic scholarship, database technologies, and computational phylogenetic methods for resolving questions about human prehistory.
The Sino-Tibetan language family is one of the world’s largest and most prominent families, spoken by nearly 1.4 billion people. Despite the importance of the Sino-Tibetan languages, their prehistory remains controversial, with ongoing debate about when and where they originated. To shed light on this debate we develop a database of comparative linguistic data, and apply the linguistic comparative method to identify sound correspondences and establish cognates. We then use phylogenetic methods to infer the relationships among these languages and estimate the age of their origin and homeland. Our findings point to Sino-Tibetan originating with north Chinese millet farmers around 7200 B.P. and suggest a link to the late Cishan and the early Yangshao cultures.
From the foods we eat and the houses we construct, to our religious practices and political organization, to who we can marry and the types of games we teach our children, the diversity of cultural practices in the world is astounding. Yet, our ability to visualize and understand this diversity is limited by the ways it has been documented and shared: on a culture-by-culture basis, in locally-told stories or difficult-to-access repositories. In this paper we introduce D-PLACE, the Database of Places, Language, Culture, and Environment. This expandable and open-access database (accessible at https://d-place.org) brings together a dispersed corpus of information on the geography, language, culture, and environment of over 1400 human societies. We aim to enable researchers to investigate the extent to which patterns in cultural diversity are shaped by different forces, including shared history, demographics, migration/diffusion, cultural innovations, and environmental and ecological conditions. We detail how D-PLACE helps to overcome four common barriers to understanding these forces: (i) location of relevant cultural data, (ii) linking data from distinct sources using diverse ethnonyms, (iii) variable time and place foci for data, and (iv) spatial and historical dependencies among cultural groups that present challenges for analysis. D-PLACE facilitates the visualisation of relationships among cultural groups and between people and their environments, with results downloadable as tables, on a map, or on a linguistic tree. We also describe how D-PLACE can be used for exploratory, predictive, and evolutionary analyses of cultural diversity by a range of users, from members of the worldwide public interested in contrasting their own cultural practices with those of other societies, to researchers using large-scale computational phylogenetic analyses to study cultural evolution.
In summary, we hope that D-PLACE will enable new lines of investigation into the major drivers of cultural change and global patterns of cultural diversity.
The island of New Guinea has the world's highest linguistic diversity, with more than 900 languages divided into at least 23 distinct language families. This diversity includes the world's third largest language family: Trans-New Guinea. However, the region is one of the world's least well studied, and primary data is scattered across a wide range of publications and more often than not hidden in unpublished "gray" literature. The lack of primary research data on the New Guinea languages has been a major impediment to our understanding of these languages, and the history of the peoples in New Guinea. TransNewGuinea.org aims to collect data about these languages and place them online in a consistent format. This database will enable future research into the New Guinea languages with both traditional comparative linguistic methods and novel cutting-edge computational techniques. The long-term aim is to shed light on the prehistory of the peoples of New Guinea, and to understand why there is such major diversity in their languages.
We report a new geometric maser distance estimate to the active galaxy NGC 4258. The data for the new model are maser line-of-sight (LOS) velocities and sky positions from 18 epochs of very long baseline interferometry observations, and LOS accelerations measured from a 10 yr monitoring program of the 22 GHz maser emission of NGC 4258. The new model includes both disk warping and confocal elliptical maser orbits with differential precession. The distance to NGC 4258 is 7.60 ± 0.17 ± 0.15 Mpc, a 3% uncertainty including formal fitting and systematic terms. The resulting Hubble constant, based on the use of the Cepheid variables in NGC 4258 to recalibrate the Cepheid distance scale, is H₀ = 72.0 ± 3.0 km s⁻¹ Mpc⁻¹.
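The quoted 3% figure can be checked with a short calculation. The sketch below assumes that the two uncertainty terms on the distance are independent and are therefore combined in quadrature; the abstract does not state how they were combined, so this is an illustrative check and not the authors' procedure. All variable names are chosen here for illustration.

```python
import math

# Values quoted in the abstract (Mpc).
distance_mpc = 7.60
first_err_mpc = 0.17   # first quoted uncertainty term
second_err_mpc = 0.15  # second quoted uncertainty term

# Assumption: the two terms are independent, so they add in quadrature.
total_err_mpc = math.hypot(first_err_mpc, second_err_mpc)
fractional_err = total_err_mpc / distance_mpc

print(f"combined uncertainty: {total_err_mpc:.3f} Mpc")
print(f"fractional uncertainty: {100 * fractional_err:.1f}%")
```

Under this assumption the combined uncertainty is about 0.23 Mpc, which is consistent with the roughly 3% stated for the distance.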
The 21 cm transition of neutral hydrogen is opening an observational window into the Cosmic Dawn of the universe, the epoch of first star formation. We use 28 hr of data from the Owens Valley Radio Observatory Long Wavelength Array to place upper limits on the spatial power spectrum of 21 cm emission at z 18.4 ( ), and within the absorption feature reported by the EDGES experiment. In the process we demonstrate the first application of the double Karhunen-Loève transform for foreground filtering, and diagnose the systematic errors that are currently limiting the measurement. We also provide an updated model for the angular power spectrum of low-frequency foreground emission measured from the northern hemisphere, which can be used to refine sensitivity forecasts for next-generation experiments.
The Murchison Widefield Array (MWA) has collected hundreds of hours of Epoch of Reionization (EoR) data and now faces the challenge of overcoming foreground and systematic contamination to reduce the data to a cosmological measurement. We introduce several novel analysis techniques, such as cable reflection calibration, hyper-resolution gridding kernels, diffuse foreground model subtraction, and quality control methods. Each change to the analysis pipeline is tested against a two-dimensional power spectrum figure of merit to demonstrate improvement. We incorporate the new techniques into a deep integration of 32 hours of MWA data. This data set is used to place a systematic-limited upper limit on the cosmological power spectrum of mK² at k = 0.27 h Mpc⁻¹ and z = 7.1, consistent with other published limits, and a modest improvement (factor of 1.4) over previous MWA results. From this deep analysis, we have identified a list of improvements to be made to our EoR data analysis strategies. These improvements will be implemented in the future and detailed in upcoming publications.
There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.