We aimed to develop a machine learning model to infer OCEAN traits from text.
The psycholexical approach allows retrieving information about personality traits from human language. However, it has rarely been applied because of methodological and practical issues that current computational advancements could overcome.
Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network for predicting trait values. The model's generalization performance was evaluated through two external validation studies involving experts (N = 11) and laypeople (N = 100) in a discrimination task about the best markers of each trait and polarity.
Intrinsic validation of the model yielded excellent results, with R values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming the model's efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less efficient in identifying the negative polarity of openness and conscientiousness.
This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research in personality and its practical applications in many fields.
A key issue in language processing is how we recognize and understand words in sentences. Research on sentence reading indicates that the time we need to read a word depends on how (un)expected it is. Research on single word recognition shows that each word also has its own recognition dynamics based on the relation between its orthographic form and its meaning. It is not clear, however, how these sentence-level and word-level dynamics interact. In the present study, we examine the joint impact of these sources of information during sentence reading. We analyze existing eye-tracking and self-paced reading data (Frank et al., 2013, Behavior Research Methods, 45(4), 1182–1190) to investigate the interplay of sentence-level prediction (operationalized as Surprisal) and word Orthography-Semantics Consistency in activating word meaning in sentence processing. Results indicate that both Surprisal and Orthography-Semantics Consistency influence several reading measures. The shape of the observed interaction differs across measures, but the results provide compelling evidence for a general trade-off between expectations based on sentence context and cues to meaning from word orthography.
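Surprisal, as used in the abstract above, is conventionally defined as the negative log probability of a word given its preceding context. The following is a minimal sketch of that definition only: the add-one-smoothed bigram model, the toy corpus, and the `bigram_surprisal` helper are illustrative assumptions, not the language model behind the study's estimates.

```python
import math
from collections import Counter

def bigram_surprisal(corpus, sentence):
    """Per-word surprisal in bits under an add-one-smoothed bigram model.

    Hypothetical helper for illustration; real studies typically use
    much richer language models.
    """
    bigrams = Counter()
    contexts = Counter()
    vocab = set(sentence)
    for sent in corpus:
        vocab.update(sent)
        padded = ["<s>"] + sent
        for prev, word in zip(padded, padded[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1

    surprisals = []
    prev = "<s>"
    for word in sentence:
        # Add-one smoothing keeps unseen bigrams at a non-zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        surprisals.append(-math.log2(p))
        prev = word
    return surprisals

toy_corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
values = bigram_surprisal(toy_corpus, ["the", "dog"])
print(values)
```

In this toy corpus "the" always opens a sentence, so it is less surprising than "dog", which competes with "cat" after "the".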
In morphological processing, research has repeatedly found different priming effects for English and German native speakers in the overt priming paradigm. In English, priming effects were found for word pairs with a morphological and semantic relation (SUCCESSFUL-success), but not for pairs without a semantic relation (SUCCESSOR-success). By contrast, morphological priming effects in German occurred for pairs both with a semantic relation (AUFSTEHEN-stehen, ‘stand up’-‘stand’) and without (VERSTEHEN-stehen, ‘understand’-‘stand’). These behavioural differences have been taken to indicate differential language processing and memory representations in these languages. We examine whether these behavioural differences can be explained by differences in language structure between English and German. To this end, we employed new developments in distributional semantics as a computational method to obtain both observed and compositional representations for transparent and opaque complex word meanings, which can in turn be used to quantify the degree of semantic predictability of the morphological system of a language. We compared the similarities between transparent and opaque words and their stems, and observed a difference between German and English, with German showing a higher morphological systematicity. The present results indicate that the investigated cross-linguistic effect can be attributed to quantitatively characterized differences in the speakers' language experience, as approximated by linguistic corpora.
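Comparisons between complex words and their stems in distributional semantics are commonly expressed as cosine similarities between word vectors. The sketch below illustrates only that similarity measure; the three-dimensional vectors are invented for the example and do not come from the corpora or models used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only.
stem = [1.0, 0.0, 1.0]
transparent_word = [0.9, 0.1, 0.8]  # meaning stays close to the stem
opaque_word = [0.1, 1.0, 0.2]       # meaning has drifted away from the stem

sim_transparent = cosine(stem, transparent_word)
sim_opaque = cosine(stem, opaque_word)
print(sim_transparent, sim_opaque)
```

A language whose opaque words stay closer to their stems would show a smaller gap between the two similarity values.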
Quantitative, data-driven models for mental representations have long enjoyed popularity and success in psychology (e.g., distributional semantic models in the language domain), but have largely been missing for the visual domain. To overcome this, we present ViSpa (Vision Spaces), high-dimensional vector spaces that include vision-based representations for naturalistic images as well as concept prototypes. These vectors are derived directly from visual stimuli through a deep convolutional neural network trained to classify images and allow us to compute vision-based similarity scores between any pair of images and/or concept prototypes. We successfully evaluate these similarities against human behavioral data in a series of large-scale studies. These include offline judgments, namely visual similarity judgments for the referents of word pairs (Study 1) and for image pairs (Study 2) and typicality judgments for images given a label (Study 3), as well as online processing times and error rates in a discrimination task (Study 4) and a priming task (Study 5) with naturalistic image material. ViSpa similarities predict behavioral data across all tasks, which renders ViSpa a theoretically appealing model for vision-based representations and a valuable research tool for data analysis and the construction of experimental material: ViSpa allows for precise control over experimental material consisting of images and/or words denoting imageable concepts and introduces a specifically vision-based similarity for word pairs. To make ViSpa available to a wide audience, this article (a) includes (video) tutorials on how to use ViSpa in R and (b) presents a user-friendly web interface at http://vispa.fritzguenther.de.
Normative measures of verbal material are fundamental in psycholinguistic and cognitive research for the control of confounding in experimental procedures and for achieving a better comprehension of our conceptual system. Traditionally, normative studies have focused on classical psycholinguistic variables, such as concreteness and imageability. Recent works have shifted researchers’ focus to perceptual strength, in which items are rated separately for each of the five senses. We present a resource that includes perceptual norms for 1,121 Italian words extracted from the Italian version of ANEW. Norms were collected from 57 native speakers. For each word, the participants provided perceptual-strength ratings for each of the five perceptual modalities. The performance of the perceptual norms in predicting human behavior was tested in two novel experiments, a lexical decision task and a naming task. Concreteness, imageability, and different composite variables representing perceptual-strength scores were considered as competing predictors in a series of linear regressions, evaluating the goodness of fit of each model. For both tasks, the model with imageability as the only predictor was found to be the best-fitting model according to the Akaike information criterion, whereas the model with the five modalities considered separately better described the data according to the explained variance. These results differ from the ones previously reported for English, in which maximum perceptual strength emerged as the best predictor of behavior. We investigated this discrepancy by comparing Italian and English data for the same set of translated items, thus confirming a genuine cross-linguistic effect. We thus confirmed that perceptual experience influences linguistic processing, even though evaluations from different languages are needed to generalize this claim.
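The disagreement between the Akaike information criterion and explained variance reported above follows from how the two are computed: explained variance never decreases when predictors are added to a nested model, whereas AIC adds a penalty for every estimated parameter. The sketch below illustrates this on synthetic data. The variable names, the simulated latencies, and the `fit_ols` helper are assumptions made for the example and are not the norms or the regression models from the study; AIC is computed up to an additive constant under Gaussian errors.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return (R^2, AIC).

    AIC is n * ln(RSS / n) + 2k, i.e. the Gaussian AIC up to a constant.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    k = design.shape[1] + 1  # regression coefficients plus error variance
    aic = n * np.log(rss / n) + 2 * k
    return 1 - rss / tss, float(aic)

# Synthetic data: latencies depend on one predictor only (an assumption).
rng = np.random.default_rng(0)
n = 200
imageability = rng.normal(size=n)
modalities = rng.normal(size=(n, 5))  # five unrelated extra predictors
latency = 600 - 20 * imageability + rng.normal(scale=30, size=n)

r2_small, aic_small = fit_ols(imageability, latency)
r2_large, aic_large = fit_ols(np.column_stack([imageability, modalities]), latency)
print(f"one predictor:  R^2={r2_small:.3f}  AIC={aic_small:.1f}")
print(f"six predictors: R^2={r2_large:.3f}  AIC={aic_large:.1f}")
```

The larger model always matches or exceeds the smaller one on explained variance, but it is preferred by AIC (lower value) only if the gain in fit outweighs the penalty for its extra parameters.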
Many theories on the role of semantics in morphological representation and processing focus on the interplay between the lexicalized meaning of the complex word on the one hand, and the individual constituent meanings on the other hand. However, the constituent meaning representations at play do not necessarily correspond to the free-word meanings of the constituents: Role-dependent constituent meanings can be subject to sometimes substantial semantic shift from their corresponding free-word meanings (such as -bill in hornbill and razorbill, or step- in stepmother and stepson). While this phenomenon is extremely difficult to operationalize using the standard psycholinguistic toolkit, we demonstrate how these as-constituent meanings can be represented in a quantitative manner using a data-driven computational model. After a qualitative exploration, we validate the model against a large database of human ratings of the meaning retention of constituents in compounds. With this model at hand, we then proceed to investigate the internal semantic structure of compounds, focussing on differences in semantic shift and semantic transparency between the two constituents.
We release a database of cloze probability values, predictability ratings, and computational estimates for a sample of 205 English sentences (1726 words), aligned with previously released word-by-word reading time data (both self-paced reading and eye-movement records; Frank et al., 2013, Behavior Research Methods, 45(4), 1182–1190) and EEG responses (Frank et al., 2015, Brain and Language, 140, 1–11). Our analyses show that predictability ratings are the best predictors of the EEG signal (N400, P600, LAN), self-paced reading times, and eye movement patterns, when spillover effects are taken into account. The computational estimates are particularly effective at explaining variance in the eye-tracking data without spillover. Cloze probability estimates have decent overall psychometric accuracy and are the best predictors of early fixation patterns (first fixation duration). Our results indicate that the choice of the best measurement of word predictability in context critically depends on the processing index being considered.
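Cloze probability is conventionally estimated as the proportion of participants who produce a given word as the continuation of a sentence fragment. A minimal sketch of that computation follows; the `cloze_probabilities` helper, its normalization choices (lowercasing, whitespace trimming), and the example responses are assumptions for illustration and do not reproduce the database's preprocessing.

```python
from collections import Counter

def cloze_probabilities(responses):
    """Map each produced continuation to its share of all responses.

    Hypothetical helper: responses are lowercased and stripped before counting.
    """
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Invented completions for a fragment such as "The postman was bitten by the ..."
responses = ["dog", "Dog", "cat", "dog "]
probs = cloze_probabilities(responses)
print(probs)
```

A word that no participant produces receives no entry at all, which is one reason cloze estimates are often complemented by ratings or model-based estimates.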
Pseudowords such as “knackets” or “spechy” – letter strings that are consistent with the orthotactical rules of a language but do not appear in its lexicon – are traditionally considered to be meaningless, and employed as such in empirical studies. However, recent studies that show specific semantic patterns associated with these words as well as semantic effects on human pseudoword processing have cast doubt on this view. While these studies suggest that pseudowords have meanings, they provide only extremely limited insight as to whether humans are able to ascribe explicit and declarative semantic content to unfamiliar word forms. In the present study, we employed an exploratory-confirmatory study design to examine this question. In a first exploratory study, we started from a pre-existing dataset of words and pseudowords alongside human-generated definitions for these items. Employing 18 different language models, we showed that the definitions actually produced for (pseudo)words were closer to their respective (pseudo)words than the definitions for the other items. Based on these initial results, we conducted a second, pre-registered, high-powered confirmatory study collecting a new, controlled set of (pseudo)word interpretations. This second study confirmed the results of the first one. Taken together, these findings support the idea that meaning construction is supported by a flexible form-to-meaning mapping system based on statistical regularities in the language environment that can accommodate novel lexical entries as soon as they are encountered.
How quantifiers are represented in the human mind is still a topic of intense debate. Seminal studies have addressed the issue of how a subclass of quantifiers, that is, number words, is spatially coded, displaying the Spatial-Numerical Association of Response Codes (SNARC) effect; yet, none of these studies have explored the spatial representation of nonnumerical quantifiers such as "some" or "many." The aim of the present study is to investigate whether nonnumerical quantifiers are spatially coded in the human mind. We administered two typical comparison tasks to 52 participants: the first task involved nonnumerical quantifiers; the second task involved number words. Results showed a response-side compatibility effect for both number words and nonnumerical quantifiers, suggesting that both types of quantifiers are encoded in a spatial format; quantifiers referring to "small" quantities are responded to faster with the left hand and quantifiers referring to "large" quantities are responded to faster with the right hand. We labeled this effect for nonnumerical quantifiers as the Spatial-Linguistic Association of Response Codes (SLARC) effect. Notably, we found that the SNARC and the SLARC effects were closely related to each other: the more sensitive a participant was to the SNARC effect in the number-word task, the more detectable the SLARC effect was in the nonnumerical quantifier task. These findings add evidence to the tendency of humans to align magnitude information on a mental line that is coded from left to right.
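One common way to quantify a response-side compatibility effect of this kind is to regress the right-hand minus left-hand response time difference on stimulus magnitude, with a negative slope indicating the small-left, large-right pattern. The sketch below illustrates that slope measure only. It is an assumption about a typical analysis, not necessarily the one used in the study, and the response times and the `compatibility_slope` helper are invented for the example.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def compatibility_slope(magnitudes, rt_left, rt_right):
    """Slope of (right-hand RT minus left-hand RT) over magnitude.

    Hypothetical helper: a negative value means larger magnitudes are
    answered relatively faster with the right hand.
    """
    differences = [right - left for left, right in zip(rt_left, rt_right)]
    return slope(magnitudes, differences)

# Invented mean response times (ms) for one participant.
magnitudes = [1, 2, 3, 4]
rt_left = [500, 510, 520, 530]
rt_right = [530, 520, 510, 500]
effect = compatibility_slope(magnitudes, rt_left, rt_right)
print(effect)
```

Computing one such slope per participant in each task would yield two values per person, which could then be correlated across participants to relate the two effects.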
When mentally exploring maps representing large-scale environments (e.g., countries or continents), humans are assumed to mainly rely on spatial information derived from direct perceptual experience (e.g., prior visual experience with the geographical map itself). In the present study, we instead tested whether temporal and linguistic information could also account for the way humans explore and ultimately represent this type of map. We quantified temporal distance as the minimum time needed to travel by train between Italian cities, while linguistic distance was retrieved from natural language through cognitively plausible AI models based on non-spatial associative learning mechanisms (i.e., distributional semantic models). In a first experiment, we show that temporal and linguistic distances capture real geographical distances with high confidence. Next, in a second behavioral experiment, we show that linguistic information can account for human performance over and above real spatial information (which plays the major role in explaining participants’ performance) in a task in which participants have to judge the distance between cities (while temporal information was found not to be relevant). These findings indicate that, when exploring maps representing large-scale environments, humans do take advantage of both perceptual and linguistic information, suggesting in turn that the formation of cognitive maps possibly relies on a close interplay between spatial and non-spatial learning principles.