The present paper describes the process of creating CHARG (Corpus de Habla Radiofónica de Guayaquil, the Guayaquil Radiophonic Speech Corpus), the first systematized spoken corpus of this rather under-researched variety of Spanish. Guayaquil is the most populous city in Ecuador, whereas the capital is Quito. Ecuador is thus a rare case of a Spanish-speaking country with two major urban centers belonging to two separate dialect zones, which makes for a distinctive sociolinguistic context. CHARG is composed of Guayaquil radio programs, and its structure is organized by a non-linguistic criterion (program type) to ensure a representative and balanced sample. The paper describes the design of the corpus (definition of the study population, sampling and stratification) and its construction (recording procedure, coding of speakers and speech styles, transcription and annotation). The resulting corpus comprises 24 h of transcribed and annotated recordings from 142 speakers. The paper is of twofold interest: it presents a replicable, step-by-step procedure for corpus construction, and it introduces the corpus itself as research material.
Although compiling spoken learner corpora is not a recent enterprise, the number of developmental spoken learner corpora in corpus linguistics remains limited. This report describes the compilation of the Yeditepe Spoken Corpus of Learner English (YESCOLE), a 119,787-word corpus of Turkish students’ spoken English at tertiary level. YESCOLE was compiled as a developmental corpus of spoken interlanguage by collecting samples from learners at different English proficiency levels at short, regular intervals over seven months. To shed light on the laborious methodology of compiling a developmental spoken learner corpus, this paper sets out the steps taken to build YESCOLE and discusses its potential benefits for research and teaching.
The article describes the principles underlying a corpus of teachers’ speech that makes it possible to apply an ethnographic approach to the study of teaching practices. Through the analysis of a large dataset of real classroom recordings, the corpus aims to identify linguistic, psychological, and sociological factors that contribute to improving teaching effectiveness. It includes audio recordings of lessons in grades 5–8 from several schools in Russia and is annotated in Praat. To determine which linguistic parameters may influence teachers’ effectiveness and should therefore be annotated, we conducted a survey asking students to describe an ideal teacher and a poor one. Based on the survey results, together with an analysis of existing spoken corpora and of research in linguistics and education, we developed an annotation system comprising 19 levels. Some of these levels are found in any spoken corpus (orthographic transcription of words, lemmas, parts of speech, morphological annotation). The following levels are specific to our corpus: parts of the lesson (organizational stage, introduction of new material, etc.), a level that separates reading fragments from the rest of the teacher’s speech, four levels for marking pauses, a phonetic transcription level, volume annotation, two levels for error annotation (phonetic and grammatical separately), and four levels related to vocabulary (words with special derivational features, emotionally evaluative vocabulary, word usage domains, discourse markers). The corpus will make it possible to provide recommendations for improving teachers’ speech behavior.
Empirically capturing sociocultural interpretations, that is, situated interpretations of linguistic expressions shared among members of a group, can be difficult for two reasons. First, the interpretations themselves cannot be directly observed. Second, the contexts that enable them cannot be defined independently of these interpretations. Yet the reality of such interpretations, attested in study after study, calls for an explanation. This article outlines a bottom-up methodology that seeks to extract, from the data itself, context-sensitive definitions of both sociocultural interpretations and the context variables that covary with them. Uptake-based definitions of sociocultural interpretations are empirically verifiable and include the speaker’s, the addressee’s, and the context’s contributions to bringing about a given sociocultural interpretation. Dynamic definitions of macro-social variables (gender, age, class, ethnicity, region, etc.) can then emerge by gradually abstracting over the minimal contexts found to enable particular sociocultural interpretations. The article illustrates with examples how this methodology can be applied to spoken conversational data and also discusses some of its limitations.
This article examines direct reported speech that speakers in business meetings frame as hypothetical. While the past two decades have seen many empirical studies of direct reported speech (DRS) in spoken interaction, fewer have focused specifically on hypothetical reported speech (HRS). This study identifies and examines the discourse patterns and sequences used to perform HRS in a 1-million-word corpus of business interactions, and explores why HRS is used. It is thus the first study to locate and examine this discourse phenomenon across a spoken business corpus. Using an original methodology, the study found that HRS occurs as part of specific sequential patterns and is used largely as a persuasive device, fulfilling a range of related rhetorical functions. Like DRS, HRS can project a sense of either involvement or detachment. Unlike DRS, however, it also allows speakers to generalise, and both detachment and generalisability are particularly relevant to business contexts. The research makes a theoretical contribution to the study of HRS. It indicates that HRS is used strategically in professional contexts, often by senior employees, not only to persuade others but also to bring about changes in action relevant to the organisation’s professional practice.
• Hypothetical reported speech (HRS) occurs frequently in business meetings.
• HRS may be introduced by a range of phraseological patterns.
• HRS occurs as part of a sequential pattern (frame shift – HRS – evaluative summary).
• HRS was used to perform a range of persuasive functions through ‘involvement’ or ‘detachment’.
• In business meetings, HRS is used to generalize and bring about change in action.
The growth and consolidation of demand for telephone interpreting services has led to greater academic study of these services. The aim of this work was to build a spoken corpus of interpreter-mediated telephone interactions, designed specifically for the study of attacks on face. These interactions always include Spanish and are also conducted in German, Chinese, French, English, or Russian. First, the milestones reached in previous work are briefly described: the collection of anonymized recordings of the conversations, their initial processing, their transcription, and their translation. Second, the conversion of the transcriptions to EXMARaLDA format and their subsequent synchronization with the recordings are described in detail. Finally, the limitations and difficulties encountered in these conversion and synchronization processes are discussed.
The United Arab Emirates (UAE) is characterized by extensive language contact. Although Arabic is the official language, practically all communication in general, and in higher education in particular, takes place in English. The current study is part of the larger project Language, Attitudes, and Repertoires in the Emirates (LARES, 2019–2021) and investigates the use of English as a lingua franca (ELF) among university students in Sharjah, one of the seven sovereign emirates of the UAE. A spoken corpus based on 58 semi-structured interviews is used to examine the use of the discourse marker ‘like’. This marker has been shown to be a ubiquitous feature of English that is no longer confined to American English, and it occurs frequently in the corpus. It is undoubtedly a prominent discourse marker in the type of English spoken among the heterogeneous group of multicultural university students considered here. Although considerable individual variation in the normalized frequencies of ‘like’ can be observed, none of the social variables included in the analysis (gender, citizenship, L1, year of birth, number of languages, college, self-assessed proficiency in English, and English usage score) accounts for this variability. Instead, I argue that ‘like’ as a discourse marker is part of the English repertoire of all students and appears to be used even more frequently than in other varieties of English. This supports previous research arguing for an intensification of language change in ELF contexts, as well as high individual variation as a characteristic of multilingual ELF users.
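The ‘like’ study above compares speakers by normalized frequency. A normalized frequency is a raw count scaled to a common text length, so that speakers who simply talk more are not automatically counted as heavier users. The following is a minimal sketch under stated assumptions. It uses a base of 1,000 words, although the base used in the study is not stated here. It tokenizes naively on whitespace. The helper names `normalized_frequency` and `like_rate_per_speaker` are hypothetical. The sketch counts every token ‘like’. Separating discourse-marker uses from verbal or prepositional ones would require manual or tagged annotation, which is not modeled here.

```python
from collections import Counter


def normalized_frequency(count: int, total_words: int, base: int = 1000) -> float:
    """Occurrences per `base` words (assumed base: 1,000 words)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * base


def like_rate_per_speaker(transcripts: dict[str, str], base: int = 1000) -> dict[str, float]:
    """Hypothetical helper: per-speaker rate of the token 'like'.

    Assumption: tokens are whitespace-separated, lowercased, with surrounding
    punctuation stripped. Every 'like' is counted, so verb and preposition
    uses are NOT filtered out here.
    """
    rates = {}
    for speaker, text in transcripts.items():
        tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
        tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
        counts = Counter(tokens)
        rates[speaker] = normalized_frequency(counts["like"], len(tokens), base)
    return rates


if __name__ == "__main__":
    # Toy input invented for illustration only; not data from the LARES corpus.
    sample = {
        "S01": "It was like really hard, like, you know.",
        "S02": "I like the campus a lot.",
    }
    for speaker, rate in like_rate_per_speaker(sample).items():
        print(f"{speaker}: {rate:.1f} per 1,000 words")
```

With the toy input, S01 scores 250.0 per 1,000 words (2 of 8 tokens) and S02 about 166.7 (1 of 6). This also shows how strongly very short samples can inflate normalized rates.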
The grammatical complexity of L2 learners’ written and spoken language has been studied extensively. Their casual conversation, however, remains largely unexplored, even though it is considered one of the most basic forms of speech. This study explores whether proficiency level modulates grammatical complexity in casual conversation. We examined conversations produced by 51 Korean EFL learners at two proficiency levels (HIGH and LOW) and by 21 native speakers of American English (NS). Syntactic complexity was measured at a global scale (e.g., production length, use of subordination) and, through clause complexity, at a fine-grained scale (e.g., components within a clause). At the global scale, HIGH produced complex structures more often than LOW overall, and as often as or more often than NS. HIGH used subordination as often as NS did, but showed greater complexity in production length and complex nominals. NS used more coordination than the non-native speakers. At the fine-grained scale, HIGH produced more dependents per clause than LOW overall. Compared with NS, HIGH used more dependents and subordinating conjunctions, and a similar number of clausal complements and prepositions. In short, HIGH used grammatical structures closer to those of written composition than to those of natural conversation. The results suggest that proficient learners can readily use complex structures as often as NS do, but that their conversation is not as natural as that of NS.
Pilots and air traffic controllers must take a specific English test in order to be granted a license for international operations. A language proficiency scale was developed as a benchmark for all aviation regulatory agencies worldwide (ICAO 2010). It targets the language that air traffic controllers and pilots produce in radio communications when non-routine situations occur, such as technical problems, bird strikes, changes in weather, or health problems on board. However, there is a lack of empirical research that could shed light on this particular register and help users of the scale to understand it. To help fill this gap, this paper outlines the compilation of the Radiotelephony Plain English Corpus (RPTEC), built for research and pedagogical purposes. RPTEC is a spoken corpus of aeronautical communication consisting of transcriptions of exchanges between pilots and air traffic controllers in non-routine situations. By presenting the steps taken during the process, we intend to provide fellow researchers with data that may serve other purposes and support further analyses, and to inform similar investigations in the field of English for Specific Purposes.
In contrast to the well-studied prenominal relative clauses (RCs) of Chinese, little is known about postnominal RCs, which are non-canonical but attested in spoken Chinese. Focusing on Standard Mandarin, this paper examines the distributional patterns of postnominal RCs in a large-scale spoken corpus. The distributional patterns of prenominal RCs reported in existing corpus studies serve as benchmarks. Against these, we show that postnominal RCs in our spoken corpus of Standard Mandarin tend to modify sentential objects more frequently than sentential subjects. They are also likely to be short and very rarely contain aspect markers. Based on these patterns, we propose that postnominal RCs in Standard Mandarin are mostly afterthoughts, motivated by the information structure of spoken language and by word order principles. To better understand their overall coverage, we further investigate postnominal RCs in the Chinese dialects Yue, Min, Xiang, and Wu, using available resources. We then make a rough comparison of cross-dialectal similarities and differences. We conclude that postnominal RCs across Chinese varieties are similarly motivated, but that their degree of grammaticalization varies.