This article aims to provide a deeper understanding of the automatic subtitling tools currently in use and of their context of application. Based on the assumption that AVT in general, and subtitling in particular, are undergoing fundamental changes, our aim is to analyse the range of tools that allow AVT translators to enhance their productivity and efficiency. For this purpose, we have analysed 40 different automatic subtitling tools currently available and accessible on the Internet. Through this analysis, it has been possible to observe the main features and functioning of these tools. Different criteria have then been established in order to systematize this extensive inventory, on the basis of which 23 categories of software dedicated to automatic subtitling have been identified and illustrated with examples. The aim of this study is to provide a more accurate and systematic understanding of automated subtitling programs. The paper is addressed to AVT professionals as well as to teachers and students interested in the state of the art of automated subtitling.
A high percentage of theatrical prints and manuscripts from the aurisecular period have never been transcribed in an analogue or, of course, digital format. It is therefore impossible to use these documents to carry out searches of interest to us or to apply the valuable computational analyses (stylometry, topic modelling, sentiment analysis, etc.) that have been developed in recent years. Thanks to Artificial Intelligence (Transkribus) and HTR (Handwritten Text Recognition) techniques, I have trained three models, already public for the research community, capable of transcribing and orthographically modernizing these documents automatically with a high degree of precision: around 97% accuracy for prints and 91% for manuscripts. Using these models, I have been able to process some 1,300 theatrical plays contained in prints and manuscripts from numerous libraries, archives, and other digitized sources. The resulting transcripts are now part of the ETSO project and of the TEXORO search engine and, in addition to being an advanced starting point for careful editing of the texts, they themselves have sufficient quality to be subjected to stylometric analysis, which is yielding authorship attributions of interest.
Recording and transcribing interviews in qualitative social research is a vital but time-consuming and resource-intensive task. To tackle this challenge, researchers have explored various alternative approaches; automatic transcription utilising speech recognition algorithms has emerged as a promising solution. The question of whether automated transcripts can match the quality of transcripts produced by humans remains unanswered. In this paper we systematically compare multiple automatic transcription tools: Amberscript, Dragon, F4x, Happy Scribe, NVivo, Sonix, Trint, Otter, and Whisper. We evaluate aspects of data protection, accuracy, time efficiency, and costs for an English and a German interview. Based on the analysis, we conclude that Whisper performs best overall and that similar local automatic transcription tools are likely to become more relevant. For any type of transcription, we recommend reviewing the text to ensure accuracy. We hope to shed light on the effectiveness of automatic transcription services and provide a comparative frame for others interested in automatic transcription.
This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed on to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of the collected data, we focus on two methodological challenges: automatic transcription and the acoustic quality of remote recordings. First, we find that among commercially available transcription programs, Sonix far outperforms Google Transcribe and Vocalmatic on the measure of word error rate (WER). Second, we establish that the acoustic quality of the remote recordings was adequate for intonational and speech rate analysis. Moreover, this remote method of collecting and analyzing speech samples works successfully with elderly bilingual participants who speak a heritage language different from their dominant societal language, and it can become a new norm when face-to-face communication with elderly participants is not possible.
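The word error rate used above to compare transcription programs is conventionally defined as the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch (the function name and tokenization by whitespace are illustrative; production tools typically also normalize casing and punctuation first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution

    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` yields 2/6 ≈ 0.33, since two reference words were deleted.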
The current paper aims to analyse the main features and limitations of online automatic subtitling platforms. Built on different applications, such as video-to-text transcription programs, machine translation programs, and text segmenters, automatic subtitling involves a complex workflow and is meant to enhance the productivity of the professional subtitler. There are very few studies of online subtitling platforms; the analysis we carried out therefore provides comprehensive empirical data and contributes to a better knowledge of these innovative systems.
Organ tablature music notation differs considerably in structure and form from the music notation used today. The manual transcription of organ tablature compositions into modern music notation is time-consuming and often prone to errors. In this paper, we present a deep learning approach to automatically recognize organ tablature notation in scanned documents and transcribe it into modern music notation. Our approach is aimed at generating a uniform transcription that remains as close as possible to the original sheet music and therefore does not perform automatic error correction or musical interpretation. The artificial neural network model developed for the recognition of tablature characters is trained using a combination of real annotated tablature staves and tablatures produced by a synthetic data generator. The results of our experiments are evaluated on tablatures taken from two tablature books. We identify several types of error and validate that these are primarily caused by the poor legibility of relevant parts of some tablature scans. Overall, our approach achieves 97.2% and 99.3% correctly recognized bars, depending on whether note pitch and rest characters or note duration and special characters are considered, respectively.
The inclusion of people with disabilities has always been a challenge for governments and society. Providing environments accessible to everyone, teachers able to handle diversity in the classroom, and working conditions that foster inclusion are mandatory steps towards a fair community. A system was designed to create real-time speech subtitles during a presentation and textual support for audio transcription. With it, people with hearing impairments are able to follow a presentation just by reading the subtitles. Keywords: Accessibility; Inclusion; Hearing Impaired; Automatic Transcription System. 1. Introduction: Communication is an indispensable condition in the life of human beings, as it enables us to live in society.
Most work on automatic transcription produces "piano roll" data with no musical interpretation of the rhythm or pitches. We present a polyphonic transcription method that converts a music audio signal into a human-readable musical score by integrating multi-pitch detection and rhythm quantization methods. This integration is made difficult by the fact that multi-pitch detection produces erroneous notes, such as extra notes, and introduces timing errors that are added to the temporal deviations due to musical expression. Thus, we propose a rhythm quantization method that can remove extra notes by extending the metrical hidden Markov model, and we optimize the model parameters. We also improve the note-tracking process of multi-pitch detection by refining the treatment of repeated notes and the adjustment of onset times. Finally, we propose evaluation measures for transcribed scores. Systematic evaluations on commonly used classical piano data show that these treatments improve the performance of transcription and can serve as benchmarks for further studies.
Games can be used to exploit the computational power of humans to perform tasks that are difficult for computers. One of these difficult tasks is the transcription of video lectures. Indeed, the characteristics of the speech that occurs in video lectures are not well suited to speech recognition technologies. In this paper we propose ALGA, an ALtruistic GAme, designed to involve students in the production of transcripts. Players challenge each other by listening to short, randomly selected pieces of the audio stream and by submitting the corresponding transcription. When two players (unknown to each other) submit the same version, the transcript of the audio chunk is considered correct and the players gain points. To motivate players, ALGA provides the final transcript to all the players and maintains a high-score list for every video lecture. The evaluation shows that the accuracy of the obtained transcripts is higher than that obtained with speech recognition technologies, and that participants like the game approach. Hence, ALGA can be considered a reasonable, feasible, and affordable solution for producing transcripts of video lectures.
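The acceptance rule described above (a chunk's transcript is accepted when two independent players submit the same version) can be sketched as follows. The abstract does not specify how submissions are compared, so the normalization step here (lowercasing, stripping punctuation, collapsing whitespace) is an assumption intended to keep trivial formatting differences from blocking an agreement:

```python
import re
import string

def normalize(transcript: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace so that
    # trivial formatting differences do not prevent a match.
    text = transcript.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def agree(submission_a: str, submission_b: str) -> bool:
    """Accept a chunk's transcript when two independent submissions match."""
    return normalize(submission_a) == normalize(submission_b)
```

For example, `agree("Hello, world!", "hello world")` returns `True`, while two genuinely different transcriptions would not be accepted.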
In this paper, we propose an automated tabla syllable transcription method using image processing techniques. A beginner tabla learner learns faster by visualizing things rather than just listening; we have therefore adopted this technique for our study. We use a human perception based approach to learning tabla and implement the same. We create three regions of interest for each drum, dayan and bayan. The placement of the fingers' image features over these regions is tracked to determine the exact region where a finger strikes and produces a particular syllable. Each frame is initially labeled with a syllable. Finally, we use supervised classification to prune the labeling for each stroke, comparing incoming frames to the reference image for a particular syllable using the structural similarity index. Based on this, the syllables are classified and automatic transcription is performed. Using the proposed method, we are able to transcribe 97.14% of the tabla syllables with an F1 score of 0.98.
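The structural similarity index used above to compare incoming frames against reference images is usually computed over local windows (e.g. with `skimage.metrics.structural_similarity`). A minimal global version, shown here only to illustrate the formula, operates on two equal-sized grayscale images given as flat lists of pixel intensities; the constants follow the standard choice K1 = 0.01, K2 = 0.03 for an 8-bit dynamic range:

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global structural similarity index between two equal-sized
    grayscale images, given as flat lists of intensities in 0-255.
    c1 = (0.01 * 255)**2 and c2 = (0.03 * 255)**2 are the usual
    stabilizing constants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                      # means
    vx = sum((p - mx) ** 2 for p in x) / n               # variance of x
    vy = sum((q - my) ** 2 for q in y) / n               # variance of y
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n  # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical frames score 1.0, and the score drops as structure diverges, which is what makes the index usable as a per-syllable matching criterion.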