We present the Chinese Lexical Database (CLD): a large-scale lexical database for simplified Chinese. The CLD provides a wealth of lexical information for 3913 one-character words, 34,233 two-character words, 7143 three-character words, and 3355 four-character words, and is publicly available through http://www.chineselexicaldatabase.com. For each of the 48,644 words in the CLD, we provide a wide range of categorical predictors, as well as an extensive set of frequency measures, complexity measures, neighborhood density measures, orthography-phonology consistency measures, and information-theoretic measures. We evaluate the explanatory power of the lexical variables in the CLD in the context of experimental data through analyses of lexical decision latencies for one-character, two-character, three-character and four-character words, as well as word naming latencies for one-character and two-character words. The results of these analyses are discussed.
This article introduces childLex, an online database of German read by children. childLex is based on a corpus of children’s books and comprises 10 million words that were syntactically annotated and lemmatized. childLex reports linguistic norms for lexical, superlexical, and sublexical variables in three different age groups: 6–8 (grades 1–2), 9–10 (grades 3–4), and 11–12 years (grades 5–6). Here, we describe how childLex was collected and analyzed. In addition, we provide information about the distributions of word frequency, word length, and orthographic neighborhood size, as well as their intercorrelations. Finally, we explain how childLex can be accessed using a Web interface.
ASL-LEX is a lexical database that catalogues information about nearly 1,000 signs in American Sign Language (ASL). It includes the following information: subjective frequency ratings from 25–31 deaf signers, iconicity ratings from 21–37 hearing non-signers, videoclip duration, sign length (onset and offset), grammatical class, and whether the sign is initialized, a fingerspelled loan sign, or a compound. Information about English translations is available for a subset of signs (e.g., alternate translations, translation consistency). In addition, phonological properties (sign type, selected fingers, flexion, major and minor location, and movement) were coded and used to generate sub-lexical frequency and neighborhood density estimates. ASL-LEX is intended for use by researchers, educators, and students who are interested in the properties of the ASL lexicon. An interactive website where the database can be browsed and downloaded is available at http://asl-lex.org.
This article presents CPB-LEX, a large-scale database of lexical statistics derived from children's picture books (age range 0-8 years). Such a database is essential for research in psychology, education and computational modelling, where rich details on the vocabulary of early print exposure are required. CPB-LEX was built through an innovative method of computationally extracting lexical information from automatic speech-to-text captions and subtitle tracks generated from social media channels dedicated to reading picture books aloud. It consists of approximately 25,585 types (wordforms) and their frequency norms (raw and Zipf-transformed), a lexicon of bigrams (two-word sequences and their transitional probabilities) and a document-term matrix (which shows the importance of each word in the corpus in each book). Several immediate contributions of CPB-LEX to behavioural science research are reported, including that the new CPB-LEX frequency norms strongly predict age of acquisition and outperform comparable child-input lexical databases. The database allows researchers and practitioners to extract lexical statistics for high-frequency words which can be used to develop word lists. The paper concludes with an investigation of how CPB-LEX can be used to extend recent modelling research on the lexical diversity children receive from picture books in addition to child-directed speech. Our model shows that the vocabulary input from a relatively small number of picture books can dramatically enrich vocabulary exposure from child-directed speech and potentially assist children with vocabulary input deficits. The database is freely available from the Open Science Framework repository: https://tinyurl.com/4este73c.
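To make the frequency norms and bigram statistics mentioned above concrete, here is a minimal Python sketch. It assumes the commonly used definition of the Zipf scale (log10 of frequency per million words, plus 3) and a forward transitional probability P(w2 | w1) = count(w1 w2) / count(w1). The abstract does not state the exact formulas or any smoothing used in CPB-LEX, so the function names and the toy token list are purely illustrative.

```python
import math
from collections import Counter


def zipf_values(tokens):
    """Zipf value per word type: log10(frequency per million tokens) + 3.

    Assumption: no smoothing is applied; published norms may differ.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: math.log10(count / total * 1_000_000) + 3
        for word, count in counts.items()
    }


def forward_transitional_probabilities(tokens):
    """P(w2 | w1) for every adjacent word pair (w1, w2) in the token list."""
    # Count each word only where it can start a bigram (all but the last token).
    first_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): n / first_counts[w1]
        for (w1, w2), n in bigram_counts.items()
    }


# Hypothetical toy corpus, not CPB-LEX data.
toy_tokens = "the cat sat on the mat".split()
toy_zipf = zipf_values(toy_tokens)
toy_tp = forward_transitional_probabilities(toy_tokens)
```

On a corpus this small the Zipf values are not meaningful in absolute terms; the sketch only shows how the two statistics are derived from raw counts.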
In this article, we present the Chinese Children’s Lexicon of Written Words (CCLOWW), the first grade-level database that provides frequency statistics of simplified Chinese characters and words for children. The database is computed from a corpus of 34,671,424 character tokens and 22,427,010 word tokens (including single- and multicharacter words), extracted from 2131 books. It contains 6746 different character types and 153,079 different word types. CCLOWW provides several frequency indices of simplified Chinese for three grade levels (grade 2 and below, grades 3–4, grades 5–6) to profile children’s experience with written Chinese in and outside of school. We describe in this article the distributions of frequency and contextual diversity of the characters and words, as well as word length and syntactic categories of the words in the corpus and the subcorpora. We also report results of correlation analyses with other written corpora and of several naming and lexical decision experiments. The findings suggest that CCLOWW frequency measures correlate well with other corpora. Importantly, they could reliably predict children’s and adults’ naming and lexical decision performances. They could also explain variance in adults’ visual word recognition, in addition to frequency measures computed in an adult corpus, indicating that early print exposure might influence readers’ lexical processing later on beyond an age of acquisition effect. CCLOWW will help researchers in language processing and development as well as educators with selecting language materials appropriate for children’s developmental stages. The database is freely available online at https://www.learn2read.cn/database/.
When describing variation at the lexical level in sign languages, researchers often distinguish between phonological and lexical variants, using the following principle: if two signs differ in only one of the major phonological components (handshape, orientation, movement, location), then they are considered phonological variants; otherwise they are considered separate lexemes. We demonstrate that this principle leads to contradictions in some simple and more complex cases of variation. We argue that it is useful to visualize the relations between variants as graphs, and we describe possible networks of variants that can arise using this visualization tool. We further demonstrate that these scenarios in fact arise in the case of variation in color terms and kinship terms in Russian Sign Language (RSL), using a newly created database of lexical variation in RSL. We show that it is possible to develop a set of formal rules that can help distinguish phonological and lexical variation in the problematic scenarios as well. However, we argue that it might be a mistake to dismiss the actual patterns of variant relations in order to arrive at the binary lexical vs. phonological variant opposition.
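The kind of contradiction described above can be illustrated with a small Python sketch. It links two variants when they differ in exactly one of the four major components, then checks whether each connected group is fully interlinked. The three signs and their component values are hypothetical placeholders, not entries from the RSL database, and the helper names are assumptions made for illustration; the authors' own formal rules are not given in the abstract.

```python
from itertools import combinations

COMPONENTS = ("handshape", "orientation", "movement", "location")


def n_differences(a, b):
    """Number of major phonological components in which two signs differ."""
    return sum(a[c] != b[c] for c in COMPONENTS)


def variant_graph(signs):
    """Undirected graph linking signs that differ in exactly one component."""
    graph = {name: set() for name in signs}
    for x, y in combinations(signs, 2):
        if n_differences(signs[x], signs[y]) == 1:
            graph[x].add(y)
            graph[y].add(x)
    return graph


def connected_components(graph):
    """Groups of signs reachable from one another via variant links."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def is_clique(graph, component):
    """True if every pair in the group is directly linked."""
    return all(y in graph[x] for x, y in combinations(component, 2))


# Hypothetical variants: A and B differ in handshape only, B and C differ in
# location only, so A and C differ in two components.
TOY_SIGNS = {
    "A": {"handshape": "h1", "orientation": "o1", "movement": "m1", "location": "l1"},
    "B": {"handshape": "h2", "orientation": "o1", "movement": "m1", "location": "l1"},
    "C": {"handshape": "h2", "orientation": "o1", "movement": "m1", "location": "l2"},
}

toy_graph = variant_graph(TOY_SIGNS)
# A connected group that is not a clique is a chain: B is a phonological
# variant of both A and C, yet A and C count as separate lexemes.
problem_groups = [
    group for group in connected_components(toy_graph)
    if not is_clique(toy_graph, group)
]
```

In this toy case the one-component principle is not transitive, so it cannot partition the three variants into lexemes without contradiction.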
This article introduces the Children and Young People's Books-Lexicon (CYP-LEX), a large-scale lexical database derived from books popular with children and young people in the United Kingdom. CYP-LEX includes 1,200 books evenly distributed across three age bands (7-9, 10-12, 13+) and comprises over 70 million tokens and over 105,000 types. For each word in each age band, we provide its raw and Zipf-transformed frequencies, all parts of speech in which it occurs with raw frequency and lemma for each occurrence, and measures of count-based contextual diversity. Together and individually, the three CYP-LEX age bands contain substantially more words than any other publicly available database of books for primary and secondary school children. Most of these words are very low in frequency, and a substantial proportion of the words in each age band do not occur on British television. Although the three age bands share some very frequent words, they differ substantially regarding words that occur less frequently, and this pattern also holds at the level of individual books. Initial analyses of CYP-LEX illustrate why independent reading constitutes a challenge for children and young people, and they also underscore the importance of reading widely for the development of reading expertise. Overall, CYP-LEX provides unprecedented insight into the nature of vocabulary in books that British children aged 7+ read, and is a highly valuable resource for those studying reading and language development.
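As a rough illustration of count-based contextual diversity, the Python sketch below counts, for each word type, the number of books in which it appears at least once. This is the usual document-count reading of the measure; the abstract does not specify how CYP-LEX computes it, and the function name, book titles, and tokens are hypothetical.

```python
from collections import Counter


def contextual_diversity(books):
    """Number of books containing each word type at least once.

    `books` maps a book identifier to its list of tokens.
    """
    diversity = Counter()
    for tokens in books.values():
        # A word contributes at most once per book, however often it occurs.
        diversity.update(set(tokens))
    return diversity


# Hypothetical toy books, not CYP-LEX data.
toy_books = {
    "book_1": ["the", "dragon", "slept", "the", "dragon", "woke"],
    "book_2": ["the", "wizard", "met", "the", "dragon"],
    "book_3": ["the", "wizard", "slept"],
}
toy_cd = contextual_diversity(toy_books)
```

Unlike raw frequency, this measure does not reward a word for being repeated many times within a single book.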
A novel lexical resource for treating speech impairments from childhood to senility is presented: DILLo, the Database Italiano del Lessico per Logopedisti (i.e., Italian Database for Speech-Language Pathologists). DILLo is a free online web application that allows extraction of filtered wordlists for flexible rehabilitative purposes. Its major aim is to provide Italian speech-language pathologists (SLPs) with a resource that takes advantage of Information and Communication Technologies for language in a healthcare setting. DILLo’s design adopts an integrated approach that envisages fruitful cooperation between clinical and linguistic professionals. The 7690 Italian words in the database have been selected based on phonological, phonotactic, and morphological properties, and their frequency of use. These linguistic features are encoded in the tool, which includes the orthographic and phonological transcriptions, and the phonotactic structure of each word. Moreover, most of the entries are associated with their respective ARASAAC pictogram, providing an additional and inclusive tool for treating speech impairments. The user-friendly interface is structured to allow for different and adaptable search options. DILLo allows SLPs to obtain a rich, tailored, and varied selection of suitable linguistic stimuli. It can be used to customize the treatment of many impairments, e.g., Speech Sound Disorders, Childhood Apraxia of Speech, Specific Learning Disabilities, aphasia, dysarthria, dysphonia, and the auditory training that follows cochlear implantations.
In this article, we introduce the Chinese Children’s Lexicon of Oral Words (CCLOOW), the first lexical database based on animated movies and TV series for 3-to-9-year-old Chinese children. The database is computed from 2.7 million character tokens and 1.8 million word tokens. It contains 3920 unique character types and 22,229 word types. CCLOOW reports frequency and contextual diversity metrics of the characters and words, as well as length and syntactic categories of the words. CCLOOW frequency and contextual diversity measures correlated well with those of other Chinese lexical databases, particularly with those computed from children’s books. The predictive validity of CCLOOW measures was confirmed with Grade 2 children’s naming and lexical decision experiments. Further, we found that CCLOOW frequencies could explain a considerable proportion of the variance in adults’ written word recognition, indicating that early language experience might have lasting impacts on the mature lexicon. CCLOOW provides validated frequency and contextual diversity estimates that complement current children’s lexical databases based on written language samples. It is freely accessible online at https://www.learn2read.cn/ccloow.