Akademska digitalna zbirka Slovenije
University of Maribor Library (UKM)
Opening hours: Monday to Friday from 8.00 to 14.00, Wednesdays to 17.00 and Saturdays from 9.00 to 13.00. ČUK reading room opening hours: Monday to Saturday from 12.00 to 24.00, closed on Saturday. Information: 02 25 07 431, ukm@um.si
  • Analiza diskurza kot podpora sistemom strojnega simultanega prevajanja govora [Discourse analysis as support for systems for simultaneous machine translation of speech] : [doctoral dissertation]
    Verdonik, Darinka
    The aim of this work is to study telephone conversations in the tourism domain using concepts from discourse analysis that could be applied in speech-to-speech translation to better handle ... spontaneous speech phenomena. Speech-to-speech translation systems have to manage three different tasks. First, speech in the input language must be recognized. The text obtained through speech recognition usually contains errors and is not structured into clauses and sentences. The recognized text is then translated into the output language in a process called speech-centred translation. Translating spontaneous spoken text differs from translating written text because spoken text includes disfluencies, repairs, false starts, hesitations, filled pauses, silences, etc.; repetitions are much more frequent, more information is left implicit, and prosody is lost when speech is transformed into text. These and similar phenomena of spontaneous speech have been recognized as problematic in speech-to-speech translation: the C-STAR consortium (http://www.c-star.org/main/english/cstar2/) therefore suggests that simply combining machine translation techniques developed for written text with speech recognition and speech synthesis cannot achieve satisfactory quality, and that special approaches to speech-centred translation are needed. Similar conclusions were reached in Verbmobil (http://verbmobil.dtki.de/verbmobilNM.English.Mai1.30 .1 0.96.html) and other projects in which speech-to-speech translation systems were built. The last step in a speech-to-speech translation system is speech synthesis of the translated text in the output language. The system has to be reciprocal. An overview of machine translation and speech-to-speech translation shows that different approaches to the problem have been developed; the most promising recently are statistical corpus techniques.
When certain parts of traditional linguistic knowledge are used, machine translation and other language technologies can perform better; part-of-speech categories and other morpho-syntactic attributes, for example, are widely used. But when dealing with spontaneous speech we find many phenomena that go beyond traditional linguistic knowledge, since that knowledge was gained mostly by studying written language. Spontaneous speech has been studied more thoroughly in fields such as pragmalinguistics, conversation analysis and others that can be classified as discourse analysis. I therefore suggest using some of the linguistic knowledge of discourse analysis to handle the phenomena of spontaneous speech in speech-to-speech translation. The research was both theoretical and empirical. It was limited to the tourism domain: telephone conversations in a tourist agency, a tourist office and a hotel. The Turdis-1 corpus, comprising 30 conversations, was used as research material. Discourse analysis was examined in search of concepts that could be implemented as easily as possible in speech corpora as tagging attributes. In this work I suggest that spoken text be structured into smaller units: opening and closing sections, turns and utterances. The utterance is precisely defined. Hearer's signals (words such as mhm, aha, ja) are treated as special discourse events, not as turn-taking. I further suggest that the concept of discourse markers could be used. The empirical study shows that at least 15 expressions in the Turdis-1 corpus (ja, mhm, aha, aja, ne?, no, eee, dobro/v redu/okej/prav, glejte/poglejte, veste, mislim, zdaj) could be specified as discourse markers. In the function of discourse marker, these expressions represent almost 14% of all words in the 15,000-word corpus.
Their particularity is that they contribute little to the representational meaning of an utterance and are used mainly as pragmatic expressions: they help to connect discourse, express the speaker's attitude towards the discourse content, maintain the hearer's attention, organize discourse, etc. The structure of a spontaneous spoken utterance can be fuzzy and disfluent. I suggest using the concept of repair to eliminate a special, retrograde part of the utterance, which can disturb further processing because it is cut off. Repair was used in 8% of all utterances in the corpus. The present work could be continued by further research into the analyzed phenomena and into phenomena that were mentioned but not analyzed, such as repetitions, the topic structure of conversation and adjacency pairs. From the linguistic perspective, this work studies language use in a domain (spontaneous telephone conversations) and from perspectives (conversation structure, discourse markers, repair) that are all more or less new in Slovenian linguistics.
    Type of material - dissertation ; adult, serious
    Publication and manufacture - Ljubljana : [D. Verdonik], 2006
    Language - Slovenian
    COBISS.SI-ID - 227346944

Call number – location, accession no. ... Copy status Reservation
Skladišče II 0000065124/k Skladišče II 65124/k available - reading room
Skladišče CD 0000005980/cd Skladišče CD 5980/cd available - reading room