Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

E-resources

PDF

Full text

Peer reviewed Open access

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi

IEEE/ACM transactions on audio, speech, and language processing, 2018-Jan., 2018-1-00, Volume: 26, Issue: 1

Journal Article

A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural networks techniques can be applied to artificially synthesize speech waveform, the synthetic speech quality is low compared with that of natural speech. One of the issues causing the quality degradation is an oversmoothing effect often observed in the generated speech parameters. A GAN introduced in this paper consists of two neural networks: a discriminator to distinguish natural and generated samples, and a generator to deceive the discriminator. In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. Since the objective of the GANs is to minimize the divergence (i.e., distribution difference) between the natural and generated speech parameters, the proposed method effectively alleviates the oversmoothing effect on the generated speech parameters. We evaluated the effectiveness for text-to-speech and voice conversion, and found that the proposed method can generate more natural spectral parameters and F 0 than conventional minimum generation error training algorithm regardless of its hyperparameter settings. Furthermore, we investigated the effect of the divergence of various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works the best in terms of improving the synthetic speech quality.

Keep searching

Author

Saito, Yuki | Takamichi, Shinnosuke | Saruwatari, Hiroshi

Access to the JCR database is permitted only to users from Slovenia. Your current IP address is not on the list of IP addresses with access permission, and authentication with the relevant AAI accout is required.

Year	Impact factor		Edition		Category		Classification
Year	JCR	SNIP	JCR	SNIP	JCR	SNIP	JCR	SNIP

Links to authors' personal bibliographies	Links to information on researchers in the SICRIS system

Source: Personal bibliographies and: SICRIS

Upload image

Shelf entry

Adding material to shelf was successful.

Adding material to shelf failed.

It was not necessary to add the material to the shelf.

Permalink

E-mail

Impact factor

Select the library membership card:

DRS, in which the journal is indexed

Citations

Theme