Multimodal learning involves developing models that can integrate information from various sources such as images and text. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. Image-guided story ending generation (IgSEG) is a particularly significant task: it aims to understand the complex relationships between text and image data and to produce a complete story ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks on multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, searching for adversarial text and images in a more effective, iterative way. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating its potential for improving the adversarial robustness of multimodal text generation models in tasks such as multimodal machine translation and multimodal question answering.
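The abstract does not spell out the steps of Iterative-attack, so the following is only a minimal sketch of the general idea of alternating image and text attack steps, not the authors' method. It assumes a toy linear victim model, a sign-gradient (PGD-style) image step, and a greedy synonym-substitution text step; `PIXEL_WEIGHTS`, `WORD_WEIGHTS`, `SYNONYMS`, and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy victim: a linear score over pixels plus per-word weights.
# A real IgSEG model would be a deep network; this stands in for its confidence.
PIXEL_WEIGHTS = [0.5, -0.25, 0.75]
WORD_WEIGHTS = {"happy": 1.0, "glad": 0.4, "content": 0.7, "dog": 0.2, "puppy": 0.1}
SYNONYMS = {"happy": ["glad", "content"], "dog": ["puppy"]}  # assumed synonym table


def score(image: List[float], text: List[str]) -> float:
    """Toy confidence that the attack tries to lower."""
    pixel_part = sum(w * x for w, x in zip(PIXEL_WEIGHTS, image))
    word_part = sum(WORD_WEIGHTS.get(tok, 0.0) for tok in text)
    return pixel_part + word_part


def image_step(image: List[float], original: List[float],
               alpha: float, eps: float) -> List[float]:
    """One sign-gradient step, projected back into the eps-ball and [0, 1].

    For this linear toy model the gradient w.r.t. each pixel is its weight;
    in a real model it would also depend on the current adversarial text.
    """
    stepped = []
    for x, x0, w in zip(image, original, PIXEL_WEIGHTS):
        sign = (w > 0) - (w < 0)
        x_new = x - alpha * sign                      # move against the gradient
        x_new = max(x0 - eps, min(x0 + eps, x_new))   # stay within eps of the original
        stepped.append(max(0.0, min(1.0, x_new)))     # stay a valid pixel value
    return stepped


def text_step(image: List[float], text: List[str]) -> List[str]:
    """Apply the single synonym swap that lowers the score most, if any."""
    best, best_score = list(text), score(image, text)
    for i, tok in enumerate(text):
        for candidate in SYNONYMS.get(tok, []):
            trial = text[:i] + [candidate] + text[i + 1:]
            trial_score = score(image, trial)
            if trial_score < best_score:
                best, best_score = trial, trial_score
    return best


def iterative_attack(image: List[float], text: List[str], rounds: int = 3,
                     alpha: float = 0.05, eps: float = 0.1) -> Tuple[List[float], List[str]]:
    """Alternate image and text steps so each modality sees the other's update."""
    adv_image, adv_text = list(image), list(text)
    for _ in range(rounds):
        adv_image = image_step(adv_image, image, alpha, eps)
        adv_text = text_step(adv_image, adv_text)
    return adv_image, adv_text


clean_image, clean_text = [0.5, 0.5, 0.5], ["happy", "dog"]
adv_image, adv_text = iterative_attack(clean_image, clean_text)
print(score(clean_image, clean_text), score(adv_image, adv_text), adv_text)
```

The point of the alternation is that the text step is evaluated against the already perturbed image, and the next image step follows the already perturbed text, rather than attacking each modality once in isolation.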
Text-driven sentiment analysis has been widely studied in the past decade, on both random and benchmark textual Twitter datasets. A few pertinent studies have also reported visual analysis of images to predict sentiment, but much of the work has analyzed a single modality, that is, either text, image or GIF video. More recently, as images, memes and GIFs have come to dominate social feeds, typographic and infographic visual content has become a non-trivial element of social media. Such multimodal text combines text and image, defining a novel visual language that needs to be analyzed because it has the potential to modify, confirm or grade the polarity of the sentiment. We propose a multimodal sentiment analysis model to determine the sentiment polarity and score for any incoming tweet, whether textual, image, or infographic and typographic. Image sentiment scoring is done using SentiBank and SentiStrength scoring for Regions with convolutional neural network (R-CNN). Text sentiment scoring is done using a novel context-aware hybrid (lexicon and machine learning) technique. Multimodal sentiment scoring is done by separating text from image using an optical character recognizer and then aggregating the independently processed image and text sentiment scores. An accuracy of 91.32% is observed for the random multimodal tweet dataset used to evaluate the proposed model. The research further demonstrates that combining textual and image features outperforms separate models that rely exclusively on either image or text analysis.
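The aggregation step described above can be sketched as a simple late fusion of two independently computed scores. The abstract does not state the aggregation rule, so this sketch assumes a weighted average of scores in [-1, 1]; `fuse_sentiment`, `polarity`, `text_weight`, and `neutral_band` are hypothetical names, and the OCR, SentiBank, and text-scoring stages are taken as already done.

```python
from typing import Optional


def fuse_sentiment(text_score: Optional[float],
                   image_score: Optional[float],
                   text_weight: float = 0.5) -> float:
    """Combine independently computed text and image sentiment scores.

    Scores are assumed to lie in [-1, 1]. A missing modality (None) falls
    back to the other one, covering text-only and image-only tweets.
    """
    if text_score is None and image_score is None:
        raise ValueError("at least one modality score is required")
    if text_score is None:
        return image_score
    if image_score is None:
        return text_score
    # Assumed rule: weighted average of the two modality scores.
    return text_weight * text_score + (1.0 - text_weight) * image_score


def polarity(score: float, neutral_band: float = 0.1) -> str:
    """Map a fused score to a polarity label; the band width is an assumption."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Infographic tweet: text extracted by OCR scored 0.8, the image scored -0.2.
fused = fuse_sentiment(0.8, -0.2)
print(round(fused, 2), polarity(fused))
```

Late fusion of this kind keeps the two scorers independent, so either one can be replaced without retraining the other; the weight would normally be tuned on held-out data.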
Graphic novels are marketed as helpful for reluctant young readers. The supplementation of text with visual stimuli as part of a multimodal narrative is often claimed to improve reading comprehension and motivation in children and young adolescents. The translation into Arabic of Jeff Kinney’s Diary of a Wimpy Kid series, by comparison, fails to deliver an equally engaging reading experience. In the Arabic versions of Rodrick Rules and Hard Luck, language acts as an obstacle to comprehension due to the misrepresentation of textual paralanguage, broadly defined as the written representation of nonverbal aspects of communication including tone, stress and volume. As paralanguage is also involved in character portrayal, this translation approach paints a rather dull image of the series’ protagonist Greg. Using the textual paralanguage typology proposed by Luangrath et al. (J Consum Psychol 27:98–107, 2017), the case is made here for closer attention in translation to the pragmatic meanings contained in textual paralanguage. As the novel evolves to incorporate an ever-expanding array of multimodal elements, so should the translation strategies involved in rendering these texts into other languages.
The article deals with the linguistic and psychological concepts explaining the perception of multimodal texts that convey information about violence. The relevance of the subject stems from the task of countering the spread of ideologies of violence, extremism, and terrorism in Internet communication. The multimodal text is considered a popular means of propagating radical ideas that incite social discord. The article notes the absence of comprehensive models explaining the transmission of violence through multimodal texts and shows that linguistic and psychological concepts can be relevant resources for building such a model. The purpose of the article is to generalize these concepts in the form of intra- and extralinguistic factors that determine the perception of information about violence in a multimodal text. The generalization of cognitive linguistic theories into the group of intralinguistic factors is complemented by the conclusion that the semantic structure of a multimodal text is formed by visual, sign, and verbal coding systems; the perception of information about violence is determined by the typicality of linguistic means and the presence of logical and semantic links between the elements of a multimodal text. A further conclusion concerns the influence of aggressiveness, personal attitude toward violence, and the experience of using violence on the reader's perception of multimodal information about violence. The article may be of interest to linguists, psychologists, and a wide range of researchers in the humanities and humanities-related social sciences.
U.S. UNIVERSITY WEBSITES AS SPECIFIC MULTIMODAL TEXTS Fedorenko, Svitlana V.; Sheremeta, Kateryna B.
Visnyk universytetu imeni Alʹfreda Nobeli͡a︡. Filolohichni nauky,
12/2023, Volume 2, Issue 26/2
Journal Article
Peer reviewed
Open access
The aim of the article was to study the specifics of the interfaces of U.S. university websites as multimodal heterogeneous texts that synthesize elements of educational, scientific and advertising discourses. The objectives set to achieve this goal were as follows: to identify and distinguish the types of multimodal means on U.S. university websites that contribute to genre mixing and genre embedding; to establish the nature of the interaction of verbal, non-verbal and para-verbal components of U.S. university websites; and to determine their pragmatic features. The methodological basis of the research was a complex of the following methods: analysis (to study multimodal components of the university website as a specific multimodal text), synthesis (to identify the features of the integration of multimodal means of the websites of American universities), observation (for the selection of fragments with verbal means that actualize the visual content and of visual fragments that actualize the verbal content), discourse analysis (to highlight specific fragments of websites that arouse the interest of the authors of this article and have meaningful content), the structural method (to analyze the university website as a whole structure, which is provided by separate means of cohesion), and the functional method (to clarify the pragmatic potential of multimodal elements of the university website, which are means of communication between the university and the reader of its website). It also employed the systemic functional approach (drawing on the provisions of linguistic metafunctions and focusing on the categories of the grammar of visual design) and the socio-semiotic approach (grounded in the interrelationship of modes, their compatibility and the social needs they serve in making meanings). The chosen methodology made it possible to study the multimodality of the websites of U.S. universities, realized as a symbiosis of verbal, non-verbal and paraverbal resources. The multimedia corpus of the research consists of the websites of five American universities (Massachusetts Institute of Technology, Harvard University, University of Pennsylvania, Yale University and Princeton University). The main conclusion that can be drawn is that the complex discursive nature of the websites under study is determined by the features inherent in advertising discourse (presenting the benefits of services to influence the choice of the recipient), educational discourse (describing the educational process and educational services) and scientific discourse (providing information of a scientific nature). All universities under study employ the semiotic landscapes at their disposal to portray attractive brands on their websites. Being the most important way to ensure fast and effective communication of educational institutions with their target audience, the discourse of university websites has a pronounced pragmatic orientation. The purpose of the analyzed type of heterogeneous discourse is to create an image of an “ideal” educational institution, to attract potential students, researchers and sponsors, and to disseminate the latest achievements in the field of science and education. The multimodality of the websites of the analyzed U.S. universities as specific multimodal texts is manifested in visual content through a number of paragraphemic and infographic elements, whose synthesis results from the combination of language tools, visual content and the web technologies of modern website construction. The most common visual content used on U.S. university websites includes: unique photographs and the “color” mode (photos of the university and its students, classrooms, laboratories, events, etc.), which help to clearly illustrate the educational services offered and to set the desired emotional mood; infographics and data visualization, which are an effective way to combine text, pictures and design to present complex information (infographics do not always completely replace the text; more often they supplement or retell it); and video interviews with students and graduates and videos about studying at a university, which are one of the means of convincing potential students to make an admission decision. Video is a fairly popular form of visual content. With the help of video, universities can not only diversify the content of their websites but also satisfy the needs of those users who prefer visual content. Placing various videos on website pages helps to reinforce textual content, strengthen the arguments “for” admission and attract applicants to university educational programs. In this way, on the basis of the interaction of different discourses (advertising, educational and scientific) and various semiotic systems, a single visually and structurally coherent and functionally complete image of an attractive and popular university is created among readers of its website.
The article addresses the interrelationships, interdependence and pragmatics of colors in multimodal texts. Magazine covers were selected for analysis due to the lack of a comprehensive study of the cover as a multimodal text in both domestic and global multimodality studies. The purpose of the article is to clarify and describe the pragmatic potential of color as a key component of the visual-graphic text for expressing the information encoded in it. To achieve this goal, it was necessary to establish the essence, specificity and functions of the cover as a multimodal text consisting of heterogeneous components; to identify the potential of color for symbolizing information and the process of decoding the meanings presented on the cover through color; and to analyze how the main realities of the Russian-Ukrainian war are embodied on covers by means of color and how this relates to pragmatic tasks. The study was conducted on the covers of world publications, including “Time”, “The Economist”, “Society”, “Elle”, “Vanity Fair”, “Womankind”, “The Guardian Weekly”, “The New Yorker”, “The Washington Examiner”, “Tygodnik Powszechny”, etc., which illustrate the urgent problems of modern society as well as the realities of the Russian-Ukrainian war. The study confirmed that the main function of the cover is to attract the reader and increase potential sales. In addition to attracting the attention of the potential reader, the cover conveys key information by means of heterogeneous elements, that is, various semiotic codes. A magazine cover can be interpreted as a multimodal text, specifically a complex of verbal and visual components with strong pragmatic potential. The main visual characteristics of the cover include color, layout, prominence, framing, and photographs. These constitutive elements interact with each other and form a single semantically and informationally coherent and cohesive whole. Color plays an important role in coding and presenting information on a magazine cover. Based on the dominance of a certain color range, the surveyed covers were divided into several groups: 1) covers in the colors of the Ukrainian flag; 2) covers in a crimson-grey color paradigm; 3) covers in a red color paradigm; 4) covers with a black background. It was found that each of these groups contains coded information on events that illustrate the course of the war on the territory of Ukraine and the reaction of the world community to it. Decoding the visually and verbally presented meanings revealed the key ones, including grief, death, sadness, blood, despair and confusion. By contrast, light colors and yellow-blue splashes symbolize hope for peace. By means of different semiotic codes, the covers represent, on the one hand, support for the Ukrainian people and, on the other hand, the condemnation of Russian aggression, the inhumane behavior of the Russian occupiers in the temporarily occupied territories and the total conviction of the current Russian leader. According to the results of the research, despite the peculiarity of the text space and the form of the verbal part presented on covers, it is the context and its integrity that constitute the text’s key aspect. This is explained by the fact that the meaning of a fragment is clear and acceptable to the reader only when verbal and non-verbal means in a multimodal text, in particular on covers, are perceived inseparably. Color, as an essential element of the visual component of covers, makes it possible to perceive and interpret the key idea embedded in the analyzed multimodal samples. Verbal framing can be considered supplementary in that it enhances the overall pragmatic effect.
This research aims to describe students' perceptions regarding the integration of linguistic-visual multimodal texts in the design of Indonesian morphology textbooks under development. Data were obtained through non-face-to-face interviews, using a list of questions to explore opinions and expectations about the need to integrate linguistic-visual multimodal texts in the design of Indonesian morphology textbooks. Data analysis was carried out by applying the equivalent method, which includes identification, classification, and interpretation steps. The findings illustrate that students responded positively to the integration of linguistic-visual aspects in the design of the Indonesian morphology textbooks. The integration of elements such as images, tables, graphs, photos, text, colors, and infographics in textbook design was assessed very positively by respondents.
Multimodal text summarization is a complex and challenging task in the field of natural language processing. Its objective is to use a combination of features from various modalities to create a concise yet informative summary from a given set of input data. In our research, we conducted a thorough survey of techniques and methods used for multimodal text summarization and analyzed their implications for both research and industry. Moreover, we have developed a straightforward yet efficient model to address the challenges associated with this task. Our model has achieved state-of-the-art performance on the MMSS dataset. Additionally, we have proposed a semantic-based evaluation technique to measure the quality of the generated summaries. The effectiveness of our proposed technique has been substantiated through empirical evidence and appropriate analysis and discussion. Our goal is to make the code and models we have developed for our system publicly accessible.
The article examines the nature of manipulativeness as applied to multimodal media text, using English-language political cartoons as an example. It puts forward the hypothesis that every multimodal media text is persuasive but becomes manipulative only if it employs fallacious argumentation. American political cartoons from different time periods served as the material for analysis. Multimodal analysis followed by argumentation analysis verifies the hypothesis and reveals that political cartoons (viewed diachronically) may contain fallacies at all three vertices of Aristotle's rhetorical triangle: logos, ethos, and pathos.