Search results

hits: 207
1.
  • Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model
    Saeki, Takaaki; Takamichi, Shinnosuke; Saruwatari, Hiroshi IEEE signal processing letters, 2021, Volume: 28
    Journal Article
    Peer reviewed
    Open access

    This letter presents an incremental text-to-speech (TTS) method that performs synthesis in small linguistic units while maintaining the naturalness of output speech. Incremental TTS is generally ...
2.
  • Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization
    Hsu, Wei-Ning; Zhang, Yu; Weiss, Ron J. ... ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Conference Proceeding

    To leverage crowd-sourced data to train multi-speaker text-to-speech (TTS) models that can synthesize clean speech for all speakers, it is essential to learn disentangled representations which can ...
3.
  • Speech Synthesis Based on Hidden Markov Models
    Tokuda, Keiichi; Nankaku, Yoshihiko; Toda, Tomoki ... Proceedings of the IEEE, 05/2013, Volume: 101, Issue: 5
    Journal Article
    Peer reviewed
    Open access

    This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this ...
4.
  • Mellotron: Multispeaker Expressive Voice Synthesis by Conditioning on Rhythm, Pitch and Global Style Tokens
    Valle, Rafael; Li, Jason; Prenger, Ryan ... ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Conference Proceeding
    Open access

    Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. By explicitly conditioning on rhythm and ...
5.
6.
  • Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
    Yasuda, Yusuke; Wang, Xin; Yamagishi, Junichi Computer speech & language, 20/May, Volume: 67
    Journal Article
    Peer reviewed
    Open access

    • We compare sequence-to-sequence text-to-speech synthesis systems with pipeline text-to-speech synthesis systems.
    • We investigate sequence-to-sequence text-to-speech from three aspects: a) model ...
7.
  • Controllable speech synthesis by learning discrete phoneme-level prosodic representations
    Ellinas, Nikolaos; Christidou, Myrsini; Vioni, Alexandra ... Speech communication, January 2023, Volume: 146
    Journal Article
    Peer reviewed
    Open access

    In this paper, we present a novel method for phoneme-level prosody control of F0 and duration using intuitive discrete labels. We propose an unsupervised prosodic clustering process which is used to ...
8.
  • Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
    Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi ... IEEE/ACM transactions on audio, speech, and language processing, 01/01, Volume: 26, Issue: 1
    Journal Article
    Peer reviewed
    Open access

    A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural networks techniques can be applied to artificially ...
9.
  • FastTalker: A neural text-to-speech architecture with shallow and group autoregression
    Liu, Rui; Sisman, Berrak; Lin, Yixing ... Neural networks, 09/2021, Volume: 141
    Journal Article
    Peer reviewed

    Non-autoregressive architecture for neural text-to-speech (TTS) allows for parallel implementation, thus reduces inference time over its autoregressive counterpart. However, such system architecture ...
10.
  • Evaluating Speech-Phoneme Alignment and its Impact on Neural Text-To-Speech Synthesis
    Zalkow, Frank; Govalkar, Prachi; Müller, Meinard ... ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023-June-4
    Conference Proceeding
    Open access

    In recent years, the quality of text-to-speech (TTS) synthesis vastly improved due to deep-learning techniques, with parallel architectures, in particular, providing excellent synthesis quality at ...