The study reported in the paper starts with the hypothesis that errors observable in writing performances can account for much of the variability of the ratings awarded to them. The assertion is that this may be the case even when prescribed rating criteria explicitly direct rater focus towards successfully performed aspects of a writing performance rather than towards errors. The hypothesis is tested on a sample of texts rated independently of the study, using a five-point analytic rating scale involving ‘Can do’-like descriptors. The relationship between errors and ratings is ascertained using ordinal logistic regression, which yields an overall pseudo-R² of 0.51. Thus, with roughly 50% of score variability explainable by error occurrences, the stated hypothesis is considered confirmed. The study goes on to discuss the consequences of the findings and their potential use in the assessment of writing beyond the local assessment context.
• Writing assessment criteria focus mainly on the positive, ‘Can do’ performance.
• Nevertheless, errors, as ‘Cannot do’ properties, still exert effects on raters.
• Significant effects of errors are evident even in contexts that discourage attention to them.
• The study shows errors explaining 50% of score variability in one such setting.
• The findings contribute to a better understanding of rater cognition.
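The pseudo-R² reported in the abstract above can be illustrated with a minimal sketch. The abstract does not say which pseudo-R² variant was used, so McFadden's measure is assumed here purely for illustration. The function names and the toy ratings are hypothetical, and the fitted model's log-likelihood is taken as an input that would normally come from the regression software.

```python
import math
from collections import Counter


def null_log_likelihood(ratings):
    """Log-likelihood of an intercept-only model for ordinal ratings.

    With no predictors, the model can do no better than predict each
    rating category with its observed relative frequency.
    """
    n = len(ratings)
    counts = Counter(ratings)
    return sum(c * math.log(c / n) for c in counts.values())


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo-R2 (an assumed variant): 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


# Toy ratings on a five-point scale (invented for illustration only).
toy_ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
ll_null = null_log_likelihood(toy_ratings)

# Hypothetical log-likelihood of a model with error counts as predictors;
# in practice this value is reported by the fitted regression.
ll_model = 0.49 * ll_null
pseudo_r2 = mcfadden_pseudo_r2(ll_model, ll_null)
```

Under this assumption, a value near 0.5 means that the error predictors cut the null model's log-likelihood roughly in half.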
Marking errors in L2 learner performance, though useful in both a didactic and academic sense, is a challenging process, one usually performed manually when involving learner corpora. This is because errors are largely latent phenomena whose manual identification and description involve a significant degree of judgment on the part of human annotators. The purpose of the paper is to discuss and demonstrate the implications of the two stages of the decision-making process involved in manual error coding, namely error location and error description, for measuring inter-annotator agreement as a marker of annotation quality. The crux of the study is the proposal that inter-annotator agreement on error location and on error description should be considered and reported separately rather than, as is common, together as a single measurement. The case study, grounded in a high-stakes exam context and typified using an established error taxonomy, demonstrates the method behind the proposal and showcases its usefulness in real-world settings.
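The idea of reporting agreement on error location and on error description separately can be sketched as follows. The abstract does not name an agreement coefficient or a unit of analysis, so Cohen's kappa and token-level location are assumptions made here for illustration. The function names, the annotation format (a mapping from token index to error tag) and the tags themselves are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def two_stage_agreement(ann_a, ann_b, n_tokens):
    """Return (location kappa, description kappa) for two annotators.

    ann_a, ann_b: dicts mapping a token index to an error tag.
    Location: did each annotator flag the token as erroneous at all?
    Description: on tokens flagged by both, do the tags match?
    """
    loc_a = [i in ann_a for i in range(n_tokens)]
    loc_b = [i in ann_b for i in range(n_tokens)]
    location = cohen_kappa(loc_a, loc_b)

    shared = sorted(set(ann_a) & set(ann_b))
    description = None
    if shared:
        description = cohen_kappa([ann_a[i] for i in shared],
                                  [ann_b[i] for i in shared])
    return location, description


# Hypothetical annotations of a ten-token text.
annotator_a = {1: "lexis", 4: "grammar", 7: "punctuation"}
annotator_b = {1: "lexis", 4: "lexis", 8: "punctuation"}
location_kappa, description_kappa = two_stage_agreement(annotator_a, annotator_b, 10)
```

Computing the two coefficients separately shows whether disagreement stems from annotators flagging different stretches of text or from tagging the same stretch differently, a distinction that a single combined figure would hide.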
Various linguistic and extra-linguistic criteria have been put forward as useful for evaluating the semantic prototypicality of polysemous lexemes. It has been suggested that most of these criteria are linked to frequency of occurrence of senses. However, the provision of additional quantitative linguistic data can shed more light on this complex lexical issue. The present paper embarks on a corpus-based investigation of whether a three-factor measurement of prototypicality based on a) frequency of occurrence of a sense (as the central feature), b) contextual saliency, and c) inter-category similarity produces significant results, particularly when applied to a highly polysemous lexeme – in this case the verb look. Besides investigating the quantitative linguistic background of the central (prototypical) member of a semantic category, the paper briefly scrutinizes whether the same combination of quantitative data can be applied to gauging the level of prototypicality of senses other than the prototype. The findings strongly support the application of the proposed three-factor methodology and point to the need for further work on the identification of suitable criteria for evaluating attested levels of category membership of multiple senses of a lexeme.
• A three-factor measurement of semantic prototypicality.
• Measures involve: frequency, contextual saliency and inter-category similarity.
• Exemplified by a highly polysemous lexeme – the verb look.
There is perhaps no better way to gain an insight into a society than to observe the language it uses. And there is no better way to analyze such discourse than to use the large representative language corpora available today. Following this claim, I wish to propose the hypothesis that the extent of discrimination in a society (a very important topic at this time of global migration) can be linked to the extent to which discriminatory expressions are used in everyday communication in that society. To demonstrate the empirical attestability of this hypothesis, I will offer an analysis of the extent of discriminatory expressions within four South-European languages (and, by hypothesized correlation, also societies), namely German, Romanian, Serbian, and Slovenian.
Task difficulty is an important but complex phenomenon in Applied Linguistics, for which there is relatively little empirical research. This article discusses approaches to defining task difficulty and focuses on objective task difficulty derived from ratings of performances and on difficulty derived from an error count in the performances, thus taking errors as indicators of writing task difficulty. Errors are described in terms of the Scope–Substance error taxonomy in writing performances from the Slovene General Matura examination in English. The most frequent errors are located at word and phrase level. Generally, error frequency decreases as writing proficiency increases, but some error types do not conform to this trend. This is the case for punctuation errors, which gain prominence at higher levels of mastery. The results of this study are relevant for assessment, particularly for rating scale development or revision, and for rater training. They are equally relevant for teaching, since knowing sources of difficulty in tasks is a prerequisite for effective pedagogical action. More generally, if applied to performances based on a wider range of tasks, viewing errors as indicators of difficulty can lead to a better understanding of difficulty‐generating task features.
Abstract
Task difficulty is an important but complex phenomenon in applied linguistics, on which there is relatively little empirical research. The article discusses approaches to defining task difficulty in the assessment of writing. It focuses on objective task difficulty, derived from the scores awarded to written performances and from the number of errors in them, which means that errors are treated as an indicator of task difficulty. Errors are classified according to the Scope–Substance taxonomy, which is applied to assessing written performances in the English examination of the Slovene General Matura. The most frequent errors occur at word and phrase level; their frequency mostly decreases as writing proficiency increases, but some error types do not follow this trend. This is the case for punctuation errors, which are more frequent at higher levels of proficiency. The results of the study are relevant for assessment, particularly for the development and revision of rating scales, and also for rater training. The findings are equally relevant for teaching, since understanding the reasons for difficulty is a prerequisite for effective pedagogical work. More broadly, errors can also be used as an indicator of difficulty for other types of tasks, thereby improving the identification and understanding of the task features that contribute to their difficulty.
Few notions in testing and assessment have been written about as much as validity. Seen as the central psychometric issue, it has had a long history of theoretical and practical development and has, over time, stirred considerable controversy in academic and non-academic circles. The present paper traces this development within educational (and psychological) testing and presents the current state of the art.
The second half of the last century and the beginning of the current one were marked by several major health crises caused by widespread and often deadly flu epidemics. The paper investigates the academic medical discourse used by the media as the major initial source of information about the epidemics. The focus of the study is on the conceptual metaphors used in medical academic publications to portray the flu, as they seem to be the most semantically accessible linguistic units used by the media to transfer information from scientific discourse to the general public. Theoretical and practical implications of this information transfer, as well as the limitations of the study, are discussed.