Visual understanding, such as image caption generation, has received extensive attention. Describing images in text is one way to help people achieve barrier-free visibility. This study focuses on the text-based image captioning (TextCaps) task. The TextCaps task is more complex than the traditional image captioning task because it depends on optical character recognition (OCR) and on the textual information that appears in the image. It also requires considering the relationship between the recognized objects and the linguistic content of the OCR tokens in the image. In this study, we propose maximizing the use of multiple modalities in an image to improve TextCaps performance. We enrich the image and OCR linguistic features using pre-trained Contrastive Language-Image Pre-training (CLIP) models. We then introduce two additional attention models in a transformer architecture to strengthen the representation of the image modality. The experimental results demonstrate that our proposed method, which introduces a multimodal transformer with four image-related modalities, outperforms existing methods on the TextCaps dataset.
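The abstract above does not specify how its attention models are built. As a rough illustration only, the following sketch shows generic scaled dot-product cross-attention in which OCR token features attend over image region features. The function names, the two-dimensional toy vectors, and the choice of cross-attention itself are assumptions made for illustration, not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value rows."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy features (real CLIP embeddings have hundreds of dimensions).
ocr_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each OCR token becomes a mixture of the image regions it is most similar to.
fused = cross_attention(ocr_tokens, image_regions, image_regions)
```

In a full model, the queries, keys, and values would each pass through learned linear projections first; they are omitted here to keep the sketch short.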
The purpose of this study is to examine the role of multimodal texts based on Indonesian local cultural content in improving the literacy skills of BIPA (Indonesian for Foreign Speakers) learners. The literacy skill in focus is the learner's ability to communicate using the communication culture of Indonesian society as a result of understanding multimodal texts. The study uses a single-subject experimental method with an A-B-A research design. This method was chosen because the limited number of participants could not be divided into control and experimental classes (Creswell, 2013; Fraenkel, Wallen, & Hyun, 2015). The instruments were multimodal texts on the theme of communication culture, observation sheets, and communication skills assessment sheets for applying the understanding of communication culture gained from the multimodal texts. Data were collected through authentic assessment and direct observation. The participants were four BIPA learners from various backgrounds: a student, an employee, an entrepreneur, and a housewife. Data analysis was carried out by examining the literacy competence of each subject in each condition. The research data take the form of literacy skills, namely the interpretation of multimodal texts in order to communicate. The results showed that the learners could develop their literacy skills, including ways of communicating, behaviours, and communication gestures. In single-subject research, overlap data are used to determine whether multimodal texts play a significant role in literacy skills.
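The abstract above does not state how its overlap data were computed. One simple convention, assumed here for illustration, is to report the percentage of intervention-phase (B) scores that do not exceed the highest baseline-phase (A) score; a lower overlap suggests a clearer intervention effect. The scores below are hypothetical and are not taken from the study.

```python
def percent_overlap(baseline, intervention):
    """Percentage of intervention scores that fall at or below the highest
    baseline score (assumes that a higher score means improvement)."""
    ceiling = max(baseline)
    overlapping = sum(1 for score in intervention if score <= ceiling)
    return 100.0 * overlapping / len(intervention)


# Hypothetical scores for one learner in an A-B-A design.
baseline_a1 = [50, 55, 52, 54]
intervention_b = [60, 65, 55, 70, 72]

# One of the five intervention scores (55) does not exceed the baseline maximum (55).
overlap = percent_overlap(baseline_a1, intervention_b)
```

The same function can be applied to the second baseline phase (A2) against the intervention phase to check whether the gains persist once the intervention is withdrawn.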
K-Pop has become a global phenomenon that affects Indonesian pop culture. K-Pop inspires a variety of Indonesian pop culture products, including television shows. In January 2021, an Indonesian television film (FTV) entitled Bagaimana Menyadarkan Istriku yang Terlalu Terobsesi K-Pop (How to Remind My K-Pop-Obsessed Wife) went viral on social media. The FTV was criticized for its depiction of K-Pop fans, which was considered excessive and inaccurate. This study examines how the Indonesian-made FTV represents K-Pop fans. The research was conducted using multimodal text analysis. Scenes that include K-Pop fan activity were selected based on Henry Jenkins's five levels of fan activity. Those scenes showed four fan activities: modes of reception, critical and interpretive practice, consumer activism, and alternative social community. The scenes were then analyzed by examining the relation between the visual (image) and audio (dialogue) modes, using Bogucki's relations between modes. The analysis found several inaccuracies in the depicted K-Pop fan activities, such as in the use of K-Pop terms, pronunciation, merchandise, fashion, and hairstyle. The FTV storyline focused on affairs and marriage issues. K-Pop was used merely to attract audience interest and to reach more viewers.
Recently, I came across the statement that “adaptation is a procedure for the translation of the film” (Post). Adaptation reaches not only the widening circles of cinema and art but also our lives, and the very concept of adaptation is constantly expanding. The main goal of this article is to show that film adaptation is an example of a multimodal translation pattern. As a result, we receive a completely new text, a film, in contrast to its literary prototype. To show that film adaptation is an example of a multimodal translation pattern, I will discuss it from a linguistic point of view. The article will often refer to modality and to the multimodal text. I will also present a small part of the research on film conducted from a linguistic point of view at the Philology University (Wyższa Szkoła Filologiczna) in Wrocław.
The graphic novel is considered as a multimodal text, a complex of verbal and visual components. The differences between comics and graphic novels are explained. The concept of “multimodality” is defined, and the main approaches to the study of a multimodal text are described. Attention is paid to the issue of identity in a multicultural context. Using a specific autobiographical graphic novel as an example, the discursive construction of identity by visual and linguistic means is analyzed. The use of critical discourse analysis to understand verbal and non-verbal connections, visual images and communications, as well as text and context, is substantiated. To study the linguistic modality of the graphic novel, the methods of linguo-stylistic, lexico-semantic and contextual analysis of the literary text were used, while the iconic components were examined through observation, interpretation and comparison with the text. The sociocultural dominants of food and appearance were identified in the novel; they contribute to the convergence of stylistic and iconic means of expressing meaning. Examples are given of how linguistic, metalinguistic and visual means combine to express aspects of identity in the American graphic novel as a multimodal text. The novelty of the study lies in its demonstration of identity markers in a multimodal text.
An attempt is made to analyze the place of political cartoons in the current socio-political media discourse in the United States. The material consists of cartoons published in the spring of 2020 in USA Today and the Philadelphia Inquirer that were prompted by the Covid-19 pandemic. The article considers the definition of the political cartoon as a multimodal text with a complex coding system. It is noted that in this type of text, phenomenological cognitive structures are actualized both through linguistic projection and through visual-spatial images. Attention is paid to intertextuality as the basis of the political cartoon: the authors proceed from the position that the recipient's decoding of meaning depends on whether the recipient and the author share background knowledge. It is shown that the Covid-19 pandemic is thematically embedded in the broader socio-political agenda, so that a successful interpretation requires the recipient to have background knowledge of the current socio-political challenges facing the United States, namely the domestic political agenda. It is stated that the studied cartoons are distinguished by their reliance on precedent, and that background knowledge is actualized through a combination of the visual and verbal components of the text. It is concluded that, among the linguistic means of creating a satirical effect, wordplay based on the literal and figurative meanings of individual lexical units stands out.
This qualitative study examined the interplay between teacher facilitation, children’s uptake of vocabulary and reasoning strategies, and the roles children assumed as learners as they experienced instruction grounded in Connected Teaching and Learning (CTL), an interdisciplinary instructional framework that leverages key practices from culturally responsive pedagogies and meaningful use of multimodal text sets. Analyses suggest that (1) students assumed more active roles in their learning as they “enacted” the work of scientists and (2) varied teacher facilitation practices and children’s vocabulary and reasoning uptake were key factors in children’s shift to more active roles. Although the findings suggest CTL is a promising instructional framework, they also underscore the significance of how teachers act on the framework.
Aim. The aim is to describe the interaction between verbal and non-verbal units that generates meaning in the dynamic scope of the heterogeneous screen text. Methodology. The main body of the work is an analysis of mass culture texts: film texts and teletexts. The research is based on the feature film “Midnight in Paris” and the TV show “Morning of the Friday” (season 6, episode 38), examined at the level of several shots. The chosen fragments are divided into structural units, whose meanings are analysed first as separate components of the text; then the ways in which new meanings are generated through their constellation with each other in different fragments of the narration are analysed. To structure and regulate the study, the following methods are used: philosophical general-logical; general scientific and empirical; disciplinary methods. Results. The author concludes that, in the space-time continuum of the text, the meaning of verbal and non-verbal units depends not only on their position within one or two shots but also on their juxtaposition in the syntagmatics of screen speech, that is, in the chain of shots, which is called “constellation” in this work. It is also concluded that the constellation of verbal and non-verbal units creates the mimetic layer of the text: the world of the imaginary universe, the field of action of audio-visual narration, which is perceived by the viewer directly. Research implications. Further study of the constellation of verbal and non-verbal units in the polycode-multimodal text will allow linguists to gain deeper insight into the mechanisms by which mass culture screen texts manipulate the viewer's consciousness.