Text-to-speech (TTS) technology has made significant progress over the past decade and remains an active area of research and development for human–computer interactive systems. Although a number of speech synthesis models are available for different languages, built around domain requirements and motivating applications, no single source of information on current trends in Indian-language speech synthesis exists to date, making it difficult for beginners to initiate research on TTS systems for low-resourced languages. This paper reviews the contributions made by different researchers in the field of Indian-language speech synthesis, together with a study of Indian-language characteristics and the associated challenges in designing TTS systems. A set of applications and tools resulting from projects undertaken by different organizations, along with possible future developments, is also discussed to provide a single reference for this strand of speech synthesis research, which may benefit anyone interested in initiating research in this area.
This paper presents a novel technique for context-based numeral reading in Indian-language text-to-speech systems. The model uses a set of rules to determine the context of a numeral's pronunciation and is integrated with a waveform concatenation technique to produce speech from input text in Indian languages. Three Indian languages are considered: Odia, Hindi, and Bengali. To analyze the performance of the proposed technique, a set of experiments is performed covering different contexts of numeral pronunciation, and the results are compared with an existing syllable-based technique. The results show the effectiveness of the proposed technique in producing intelligible speech from the entered text utterances, with far lower storage and execution time than the existing technique.
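The idea of context-dependent numeral reading can be sketched as follows. This is a minimal illustration, not the paper's actual rule set: the contexts (year, phone number, cardinal) and the English digit names are placeholders for the language-specific rules and Odia/Hindi/Bengali word forms the paper uses.

```python
# Toy sketch of rule-based numeral-context detection for a TTS front end.
# The rules and contexts here are illustrative assumptions only.

ONES = ["zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]

def numeral_context(token, prev_word=""):
    """Decide how a numeral token should be read aloud."""
    digits = token.replace("-", "")
    if len(digits) == 10 and digits.isdigit():
        return "phone"   # 10-digit strings are read digit by digit
    if len(digits) == 4 and prev_word.lower() in {"in", "year", "since"}:
        return "year"    # e.g. "in 1947" is read as a year, not a cardinal
    return "cardinal"

def read_digits(token):
    """Digit-by-digit expansion, used for phone-number contexts."""
    return " ".join(ONES[int(d)] for d in token if d.isdigit())
```

The detected context then selects which expansion routine feeds the waveform concatenation back end.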
State-of-the-art voice conversion methods re-synthesize voice from spectral representations such as MFCCs and STRAIGHT, thereby introducing muffled artifacts. We propose a method that circumvents this concern using concatenative synthesis coupled with exemplar-based unit selection. Given parallel speech from source and target speakers, as well as a new query from the source, our method stitches together pieces of the target voice. It optimizes for three goals: matching the query, using long consecutive segments, and smooth transitions between segments. To achieve these goals, we perform unit selection at the frame level and introduce triphone-based preselection, which greatly reduces computation and enforces the selection of long, contiguous pieces. Our experiments show that the proposed method has better quality than baseline methods while preserving high individuality.
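The three goals above map naturally onto a dynamic-programming unit-selection pass. The sketch below is a toy version under simplifying assumptions: candidates are assumed to be already preselected (e.g. by triphone identity), costs are plain Euclidean distances, and the weights and the contiguity bonus are illustrative, not the paper's.

```python
import numpy as np

def select_units(query, candidates, w_target=1.0, w_join=0.5, contig_bonus=0.3):
    """query: (T, D) frames; candidates: per-frame list of (unit_id, frame).
    Returns the index of the chosen candidate at each time step."""
    T = len(query)
    cost = [[w_target * np.linalg.norm(query[0] - f) for _, f in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for t in range(1, T):
        row_cost, row_back = [], []
        for uid_j, f_j in candidates[t]:
            tgt = w_target * np.linalg.norm(query[t] - f_j)   # match the query
            best, arg = float("inf"), -1
            for i, (uid_i, f_i) in enumerate(candidates[t - 1]):
                join = w_join * np.linalg.norm(f_i - f_j)     # smooth transition
                if uid_i == uid_j:
                    join -= contig_bonus                      # reward long pieces
                if cost[t - 1][i] + join < best:
                    best, arg = cost[t - 1][i] + join, i
            row_cost.append(best + tgt)
            row_back.append(arg)
        cost.append(row_cost)
        back.append(row_back)
    k = int(np.argmin(cost[-1]))          # backtrack the best path
    path = [k]
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    return path[::-1]
```

Triphone preselection would shrink each `candidates[t]` list before this pass, which is where the reported computational savings come from.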
A command-line tool and Python framework is proposed for the exploration of a form of audio synthesis known as ‘concatenative synthesis’, which uses perceptual audio analyses to arrange small segments of audio based on their characteristics. The tool is designed to synthesise representations of an input target sound using a source database of sounds. This involves segmentation and analysis of both the input sound and the database, matching of input segments to their closest segments from the database, and resynthesis of the closest matches to produce the final result. The project aims to provide a tool capable of generating high-quality sonic representations of an input, to present a variety of examples that demonstrate the breadth of possibilities this style of synthesis has to offer, and to provide a robust framework on which concatenative synthesis projects can be developed easily. The primary purpose of this project was to highlight the potential for further development in concatenative synthesis and to provide a simple, intuitive tool that composers can use for sound design and experimentation. The breadth of possibilities for creating new sounds offered by this method makes it well suited to digital sound design and electroacoustic composition. Results demonstrate the wide variety of sounds that can be produced using this method of synthesis. A number of technical issues are outlined that impeded the overall quality of results and the efficiency of the software; however, the project clearly demonstrates the strong potential of this type of synthesis for creative purposes.
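The segment-matching step described above can be sketched in a few lines. This is an assumed minimal version: fixed-length windows described by two simple features (RMS energy and zero-crossing rate) stand in for the richer perceptual descriptors a real tool would compute.

```python
import numpy as np

def features(seg):
    """Two illustrative per-segment descriptors: energy and noisiness."""
    rms = np.sqrt(np.mean(seg ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(seg)))) / 2.0
    return np.array([rms, zcr])

def match_segments(target, database, seg_len=1024):
    """Replace each target window with its closest database window."""
    db_segs = [database[i:i + seg_len]
               for i in range(0, len(database) - seg_len + 1, seg_len)]
    db_feats = np.stack([features(s) for s in db_segs])
    out = []
    for i in range(0, len(target) - seg_len + 1, seg_len):
        f = features(target[i:i + seg_len])
        best = int(np.argmin(np.linalg.norm(db_feats - f, axis=1)))
        out.append(db_segs[best])
    return np.concatenate(out) if out else np.array([])
```

Resynthesis here is plain concatenation; a production tool would add windowing or crossfades at the joins.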
Spoken language training benefits from showing second-language learners a video of native speakers' articulatory movements. Typically, the articulatory video is prepared in conjunction with audio collected simultaneously with the articulatory recording. Articulatory video recording requires specialized equipment and is therefore expensive and time-consuming. In this work, we propose a concatenative synthesis approach to obtain articulatory videos for audio that may not have a simultaneous articulatory recording. In the training stage of the proposed approach, we build a repository of phoneme-specific articulatory image sequences from the available articulatory video. During testing, image sequences are selected from this repository so as to ensure a smooth transition across phonetic events. The selected image sequences are finally stitched together to synthesize the articulatory video for the test audio. Articulatory videos are synthesized for 50 words randomly selected from the MRI-TIMIT database and not seen in the training data. Subjective evaluation of the quality of the synthesized videos by twelve subjects suggests that they are close to the originals, with a rating of 3.78 out of 5, where a score of 5 (1) indicates that there is no (great) difference in quality between the original and the synthesized videos.
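The select-and-stitch step can be pictured schematically as below. This is an assumed simplification: each phoneme maps to several candidate image sequences, and the candidate whose first frame is closest to the previous sequence's last frame is chosen to keep the transition smooth; the actual selection criterion in the paper may differ.

```python
import numpy as np

def synthesize_video(phoneme_seq, repository):
    """repository: {phoneme: [array of shape (n_frames, H, W), ...]}.
    Returns the stitched frame sequence for the given phoneme string."""
    video, prev_last = [], None
    for ph in phoneme_seq:
        candidates = repository[ph]
        if prev_last is None:
            best = candidates[0]
        else:
            # pick the sequence whose first frame best continues the video
            best = min(candidates,
                       key=lambda seq: float(np.abs(seq[0] - prev_last).sum()))
        video.append(best)
        prev_last = best[-1]
    return np.concatenate(video, axis=0)
```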
We present a machine learning approach to automatically generate expressive (ornamented) jazz performances from unexpressive music scores. Features extracted from the scores and the corresponding audio recordings performed by a professional guitarist were used to train computational models for predicting melody ornamentation. As a first step, several machine learning techniques were explored to induce regression models for timing, onset, and dynamics (i.e. note duration and energy) transformations, and an ornamentation model for classifying notes as ornamented or non-ornamented. In a second step, the most suitable ornament for predicted ornamented notes was selected based on note-context similarity. Finally, concatenative synthesis was used to automatically synthesize expressive performances of new pieces using the induced models. Supplemental online material for this article, containing musical examples of the automatically generated ornamented pieces, can be accessed at doi:10.1080/17459737.2016.1207814 and https://soundcloud.com/machine-learning-and-jazz. In the Online Supplement we present an example of the musical piece Yesterdays by Jerome Kern, which was modeled using our methodology for expressive music performance in jazz guitar.
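The ornamented/non-ornamented classification step can be illustrated with a tiny nearest-neighbour classifier. The paper explored several learners, so this is only a stand-in, and the note features (e.g. duration, pitch, beat strength) are assumed placeholders.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify a note's feature vector as ornamented (1) or not (0)
    by majority vote among its k nearest training notes."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(d)[:k]
    return int(np.round(train_y[nearest].mean()))
```

A note predicted as ornamented would then receive the ornament of its most similar training-set note, and concatenative synthesis renders the result.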
Speech synthesis deals with the artificial production of speech; a text-to-speech (TTS) system converts natural-language text into a spoken waveform. A number of TTS systems are available today for different languages, yet Indian languages still lag behind in providing high-quality synthesized speech. Even though almost all Indian languages share a common phonetic base, no generic model for all official Indian languages is yet available, and existing speech synthesis techniques are found to be less effective for the scripting format of Indian languages. Considering the intelligibility of speech production and the growing memory requirement of the concatenative speech synthesis technique, in this paper we propose an efficient technique for text-to-speech synthesis in Indian languages. The model uses a pronunciation-rule-based waveform concatenation approach to produce intelligible speech while minimizing memory requirements. To show the effectiveness of the technique, as an initial step of implementation the Odia (formerly Oriya), Bengali, and Hindi languages are considered. The model is compared with an existing technique, and the results of our experiments show that our technique outperforms it.
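The waveform-concatenation back end such an approach relies on can be sketched as follows. This is a minimal assumed version: a small repository of prerecorded unit waveforms keyed by unit name, joined with a short linear crossfade; the unit inventory and pronunciation rules for Odia, Bengali, and Hindi are specific to the paper and not shown.

```python
import numpy as np

def concatenate_units(unit_names, repository, fade=32):
    """Join stored unit waveforms with a short crossfade at each join."""
    out = repository[unit_names[0]].copy()
    ramp = np.linspace(0.0, 1.0, fade)
    for name in unit_names[1:]:
        nxt = repository[name]
        # blend the overlap region to smooth the join
        out[-fade:] = out[-fade:] * (1 - ramp) + nxt[:fade] * ramp
        out = np.concatenate([out, nxt[fade:]])
    return out
```

Pronunciation rules would supply the `unit_names` sequence from the input text; the crossfade length is an illustrative choice.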
Developing a text-to-speech (TTS) system that sounds similar to natural human speech has been attempted over the years, but has still not been achieved by even the best presently available TTS algorithms. Most still sound robotic unless human speech itself is present in them. However, such human speech necessitates the creation of a large database of each and every word of the language, which is an onerous task. This research article illustrates a new approach and methodology that helps to reduce database size by using syllable-based concatenative speech synthesis. In this method, new words are ‘created’ using existing words and syllables from the database. The naturalness of these ‘created’ words is further improved by position-based syllabification and objective spectral noise reduction. A combination of neural classification networks and non-neural methods is used for syllabification. After new words are ‘created’, the spectral distortion present at the joins is reduced with objective spectral estimation and reduction methods in the time and frequency domains. These approaches result in improved naturalness for the proposed Marathi TTS.
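A rule-based fallback for the syllabification step might look like the sketch below. This is a deliberately simplified assumption: a greedy CV(C) split over a romanized string, whereas the paper combines neural and non-neural methods and operates on Marathi script.

```python
VOWELS = set("aeiou")

def syllabify(word):
    """Greedy CV(C) split: each syllable ends at a vowel, taking one coda
    consonant when a cluster of two or more consonants follows."""
    syls, cur, i = [], "", 0
    while i < len(word):
        cur += word[i]
        if word[i] in VOWELS:
            j = i + 1
            while j < len(word) and word[j] not in VOWELS:
                j += 1
            if j - (i + 1) >= 2:          # consonant cluster ahead
                cur += word[i + 1]        # keep one consonant as coda
                i += 1
            syls.append(cur)
            cur = ""
        i += 1
    if cur:                               # trailing consonants join the last syllable
        if syls:
            syls[-1] += cur
        else:
            syls = [cur]
    return syls
```

Each resulting syllable would index into the recorded database, and the joins would then pass through the spectral noise-reduction stage.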