Keyword extraction involves the application of Natural Language Processing (NLP) algorithms or models developed in the realm of text mining. It is a common technique for exploring linguistic patterns in corpus linguistics, and Dunning’s Log-Likelihood Test (LLT) has long been integrated into corpus software as a statistics-based NLP model. While prior research has confirmed the widespread applicability of keyword extraction in corpus-based research, LLT has certain limitations that may impair its accuracy. This paper summarizes these limitations, which concern benchmark corpus interference, elimination of grammatical and generic words, consideration of sub-corpus relevance, flexibility in feature selection, and adaptability to different research goals. To address them, the paper proposes an extended Term Frequency-Inverse Document Frequency (TF-IDF) method. To verify its applicability, 20 highly cited research articles on climate change from the Web of Science (WOS) database were used as the target corpus, and the proposed method was compared with the traditional one. The experimental results indicate that the proposed method can effectively overcome the limitations of the traditional method and demonstrate the feasibility and practicality of incorporating the TF-IDF algorithm into corpus-based research.
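To make the contrast between the two scoring approaches concrete, the following is a minimal sketch, not the paper's extended method. It assumes the two-cell log-likelihood formula commonly used for keyness in corpus tools and the plain textbook TF-IDF weighting (relative term frequency times the natural log of inverse document frequency). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood (G2) keyness score.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference (benchmark) corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def tf_idf(docs):
    """Plain TF-IDF. docs is a list of token lists; returns one dict per
    document mapping each term to (relative frequency) * ln(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({t: (f / total) * math.log(n / df[t]) for t, f in tf.items()})
    return scores


# Hypothetical toy corpus: three tokenized "articles".
docs = [
    ["the", "climate", "change"],
    ["the", "model"],
    ["the", "climate"],
]
scores = tf_idf(docs)
```

In this sketch a word that occurs in every document, such as "the", receives a TF-IDF score of zero without any benchmark corpus, whereas the log-likelihood score depends entirely on the reference corpus chosen.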
• Semantic domains in which gestures commonly co-occur with speech were revealed.
• The gesture-speech relationship was not significantly affected by speakers’ L1.
• Advanced speakers use significantly more gestures with reinforcing and integrating functions.
• Less proficient speakers tend to use gestures as complements to speech.
This study investigates face-to-face interaction among Taiwanese, Indonesian, and Indian speakers utilizing a multimodal corpus linguistics approach to examine semantic categories of speech that most frequently co-occur with gestures, and whether the gesture-speech relationship is to a certain extent influenced by language/culture backgrounds or English proficiency levels of a speaker. The analysis of the semantic categories of the co-gesture speech demonstrates that speech most commonly co-occurs with gestures in the categories of moving, coming and going, general objects, numbers, location and direction, and time. The findings demonstrate similar preferences of gesture-speech production by speakers despite different cultural and linguistic backgrounds. The gesture-speech relationship was shown to fall into six discrete categories: reinforcing, integrating, supplementary, complementary, contradictory, and others. While results show that the gesture-speech relationship is not significantly influenced by different language backgrounds of a speaker, speakers at a high proficiency level tended to use significantly more gestures that serve reinforcing and integrating functions, whereas less proficient speakers produced more gestures as complements and other gestures that have no obvious relationship to the conceptual content of their accompanying speech.
In this paper we document the developmental trajectory of the complementizer system (CP-system) in Italian by looking at the earliest spontaneous production of eleven young children, whose transcriptions are available on CHILDES. We conducted a novel corpus analysis, tracking down a number of constructions in which the clausal left-periphery is activated. First, we considered the appearance of the different complementizer particles in the CP-system, which overtly realize the three distinct functional projections ForceP, IntP, and FinP. The analysis revealed that children acquiring Italian correctly use these complementizer particles already in the third year of life. Second, we looked for the simultaneous activation of different functional projections within the CP-system. We went through our corpus searching for complex sentences in which more than one constituent was moved to the left periphery. This option is allowed by the adult grammar of Italian and, as our search revealed, it is also attested in the grammar of young children. Soon after their second birthday, sequences in which a left-dislocated Topic and a Wh- element co-occur are attested, directly supporting the existence of a (high) Topic position above FocusP. Moreover, movement in general conforms to the constraints of the adult grammar, with no attested violation of obligatory inversion (a consequence of the Q-Criterion). Importantly, "why-questions" did not require inversion, much as in the adult grammar of Italian. Taken together, children's use of complementizer particles and their activation of multiple landing sites for movement show that 2-year-olds already possess a richly articulated functional structure of the CP-system, aligned to the layered adult structure. In concluding the paper, we also discuss some temporal differences between constructions activating high and low portions of the CP-system. In particular, we detect a temporal precedence for wh-questions over why-questions. Since the former activate a lower projection, this is consistent with the recently proposed hypothesis according to which the development of the CP-system proceeds stepwise.
In this paper, we present corpus data that question the concept of native speaker homogeneity as it is presumed in many studies using native speakers (L1) as a control group for learner data (L2), especially in corpus contexts. Usage-based research on second and foreign language acquisition often investigates quantitative differences between learners, and usually a group of native speakers serves as a control group, but often without elaborating on differences within this group to the same extent. We examine inter-personal differences using data from two well-controlled German native speaker corpora collected as control groups in the context of second and foreign language research. Our results suggest that certain linguistic aspects vary to an extent in the native speaker data that undermines general statements about quantitative expectations in L1. However, we also find differences between phenomena: while morphological and syntactic sub-classes of verbs and nouns show great variability in their distribution in native speaker writing, other, coarser categories, like parts of speech or types of syntactic dependencies, behave more predictably and homogeneously. Our results highlight the necessity of accounting for inter-individual variance in native speakers where L1 is used as a target ideal for L2. They also raise theoretical questions concerning a) explanations for the divergence between phenomena, b) the role of frequency distributions of morphosyntactic phenomena in usage-based linguistic frameworks, and c) the notion of the individual adult native speaker as a general representative of the target language in language acquisition studies or language in general.
This paper presents word clusters used to comment on results in the Discussion section of quantitative research articles in the field of applied linguistics. A corpus linguistic approach was adopted to identify clusters in 124 Discussion texts from leading applied linguistics journals. The identified clusters were then comprehensively analysed in context for their discourse functions. Next, the study mapped the clusters onto an analytical framework termed the ‘four-Step model’, based on Yang and Allison's (2003) genre-based description of the Commenting on results Move. The study provides a detailed corpus linguistic account of how the clusters were used in specific Steps described in the model. A detailed description of the linguistic features, the internal structure (Move/Step cycles and embedding) and communicative functions of specific Steps in the Commenting on results Move is also presented, based on the concordance analysis of the clusters. The findings further suggest that the use of specific clusters strongly manifests, and is conditioned by, the research article genre. The study has pedagogical implications for academic writing courses for students, especially for those from non-English language backgrounds.
• Word clusters in Comment on results Move in discussions examined.
• Keywords established, and associated clusters then identified.
• Clusters analysed for discourse functions and mapped onto Move framework.
• Concordance analyses of clusters showed linguistic features, cycles/embedding.
• Points to relationship between clusters and genre.
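As a rough illustration of how word clusters can be identified in a corpus, the sketch below counts contiguous n-word sequences and keeps those above a frequency threshold. It assumes that a cluster is a contiguous n-gram over pre-tokenized text. The cluster length, the threshold, the function name, and the sample sentence are hypothetical and are not taken from the study.

```python
from collections import Counter


def clusters(tokens, n=3, min_freq=2):
    """Return (cluster, frequency) pairs for contiguous n-token sequences
    that occur at least min_freq times, most frequent first."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(" ".join(g), f) for g, f in grams.most_common() if f >= min_freq]


# Hypothetical sample text, lower-cased and split on whitespace.
sample = "the results suggest that the results suggest that learners".split()
found = dict(clusters(sample, n=3, min_freq=2))
```

In a real study the candidate clusters would still need to be read in their concordance lines before any discourse function is assigned to them.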
Government and market are the two main factors that drive the practices of the Chinese media system and influence the news construction process. A dramatic, socially disruptive event like the 2014 Kunming terrorist attack has the potential both to damage the government's image and to attract readers. Analyzing how different types of media, specifically the state-sponsored and the market-oriented press, construct a terrorist attack may therefore reveal essential characteristics of the Chinese media system and its relationship with both government and market. The present study makes a contribution in terms of methodology, resources, and empirical description. Methodologically, drawing on a dataset of 275 news articles about the Kunming attack collected from 16 mainstream Chinese newspapers, we explore the possibilities of combining computer-assisted techniques (i.e. part-of-speech tagging, sentiment analysis, collocation, and concordance) with Discursive News Values Analysis (DNVA), on the basis of which we identified 699 Chinese lexical indicators distributed across ten news values. The open-source wordlist produced by this procedure will facilitate future quantitative DNVA and also fills a resource gap in non-English news values studies. Empirically, after calculating the mean normalized frequency of indicators under each news value, we found that the state-sponsored and the market-oriented press converge in foregrounding the news values of Eliteness and Personalization, in line with public expectations, while diverging in their use of the news values of Positivity, Negativity, and Superlativeness, which we can relate to the different aims and responsibilities of these two types of newspapers.
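The mean normalized frequency mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: articles are assumed to be pre-segmented into tokens, frequencies are normalized per 1,000 tokens, and the mean is taken over articles. The function name, the normalization base, and the placeholder tokens are hypothetical.

```python
def mean_normalized_frequency(articles, indicators, per=1000):
    """Average, over articles, of indicator hits per `per` tokens.

    articles: list of token lists (one list per news article)
    indicators: set of lexical indicators for a single news value
    """
    rates = []
    for tokens in articles:
        if not tokens:
            continue  # skip empty articles to avoid dividing by zero
        hits = sum(1 for t in tokens if t in indicators)
        rates.append(hits / len(tokens) * per)
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical placeholder tokens standing in for segmented Chinese text.
articles = [["a", "b", "c", "d"], ["a", "a", "x", "y", "z"]]
result = mean_normalized_frequency(articles, {"a"})
```

Computing this value separately for each news value and each press type gives figures that can be compared across newspapers of different lengths.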
Metadiscourse refers to the linguistic elements that writers use to communicate meanings to imagined readers and to express a viewpoint as members of a particular academic community. This study reports the distribution of interactive and interactional metadiscourse markers in a corpus of 99 research articles representing the English Language, Computer Sciences, and Education disciplines. To observe the writers’ use of metadiscourse devices in their discourse community, Hyland’s (Metadiscourse: exploring interaction in writing. Continuum, New York, 2005) metadiscourse taxonomy was employed. The data were analysed using descriptive statistics, the chi-square test, the Kruskal–Wallis test, and content analysis. The data revealed that although articles in all disciplines employed both interactive and interactional metadiscourse markers, English Language articles contained the most metadiscourse devices compared with Education and Computer Sciences articles. It was also observed that the book review writers used many more interactive markers, such as transition and evidential devices, than interactional markers. Among interactional markers, however, self-mention markers were extensively used. The data also indicated a statistically significant difference across disciplines in the use of interactive and interactional metadiscourse devices. These findings imply that academic writing teachers should focus on discipline-oriented metadiscourse devices when teaching academic writing skills.
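For readers unfamiliar with the chi-square test used for comparisons of this kind, the sketch below computes the Pearson chi-square statistic and degrees of freedom for a contingency table of counts (for example, disciplines by marker type). It assumes all row and column totals are nonzero. The function name and the counts are hypothetical and are not the study's data.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of equal-length rows of counts.
    Assumes every row total and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two disciplines, columns are
# interactive vs. interactional markers.
stat, dof = chi_square([[10, 20], [20, 10]])
```

The statistic is then compared against the chi-square distribution with the returned degrees of freedom to obtain a p-value, which a statistics package would normally report directly.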
This article investigates the use and non-use of objects with six transitive verbs in a corpus of English football match reports. The verbs were selected on the basis of their frequency as well as their lexico-grammatical features of "footballness" and transitivity. The study suggests that object omission may not be as pervasive as hinted at in previous studies (e.g. Bergh and Ohlander 2016; Ruppenhofer and Michaelis 2010). Regarding potential reasons for object omission, the study finds that the football verbs (net, save, play) are more prone to object omission than the general verbs (feed, create, take). This is attributed to the strong attraction of the former to recurrent collocates such as goal and ball. It suggests that verbs used to report on situations that are unremarkable and canonical in the game of football more readily omit the object, albeit not on a general basis, as individual differences between the verbs also emerge.