This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.
To clarify the real risk of nerve injury during elbow arthroscopy, the distances from the radial and median nerves to the elbow joint were investigated using ultrasonography in patients who underwent surgery.
A total of 35 patients who underwent arthroscopic surgery of the elbow were investigated. The distances from the nerves to the capsule and bony landmarks were measured using ultrasonography. The radial nerve distances were measured at the capitellum, joint space, radial head, and radial neck levels. The median nerve distances were measured at the trochlear, joint space, and coronoid process levels. The patients were divided into 2 groups: 9 patients in the hydrarthrosis (HA) group and 26 patients in the non-hydrarthrosis (non-HA) group. HA was defined as intra-articular effusion seen on magnetic resonance imaging scans.
The radial nerve ran closer to the capsule at the radial neck level in the HA group than in the non-HA group (2.0 mm vs. 5.9 mm, P < .01). In the non-HA group, the radial nerve ran closer to the radial head than in the HA group (6.3 mm vs. 8.5 mm, P = .01). The median nerve ran closer to the capsule at the trochlear level in the HA group than in the non-HA group (5.2 mm vs. 8.8 mm, P < .01). In the HA group, a nerve lying ≤2 mm from the capsule was found in 7 patients for the radial nerve at the radial neck level and in 2 patients for the median nerve at the trochlear level. In the non-HA group, this was found for the radial nerve in 3 patients at the radial head level and in 1 patient at the joint space level.
The dangerous locations for nerve injury during elbow arthroscopy vary depending on whether hydrarthrosis is present, and this risk should be recognized during arthroscopic surgery.
This paper describes the development of an open-source toolkit that makes it possible to explore a wide variety of aspects of speech interaction in spoken dialog systems and speech interfaces. The toolkit tightly integrates recent speech recognition and synthesis technologies with a 3-D CG rendering module that can manipulate expressive embodied agent characters. The software and its interfaces are carefully designed so that the toolkit is fully open. Ongoing public demonstration experiments indicate that it is promoting related research and development of voice interaction systems in various settings.
This paper proposes a generative adversarial training method for deep neural network (DNN)-based singing voice synthesis. The DNN-based approach has been used in statistical parametric singing voice synthesis and has improved the naturalness of the synthesized singing voice [1]. Recently, generative adversarial networks (GANs) [2] have attracted significant attention in various machine learning research areas, including speech synthesis [3]. GANs have achieved great success in modeling the distributions of complex data, and they have the potential to alleviate the over-smoothing problem in the generated speech parameters in speech synthesis. In this paper, we propose a DNN-based singing voice synthesis system incorporating the GAN. Experimental results show that the proposed method outperforms the conventional method in the naturalness of the synthesized singing voice.
This paper investigates how to use neural networks in statistical parametric speech synthesis. Recently, deep neural networks (DNNs) have been used for statistical parametric speech synthesis. However, exactly how DNNs should be used in this setting has not been studied thoroughly. The generation process of statistical parametric speech synthesis based on generative models can be divided into several components, and each of those components can be represented by a DNN. In this paper, the effect of using a DNN for each component is investigated by comparing DNNs with generative models. Experimental results show that using a DNN as the acoustic model is effective and that parameter generation combined with a DNN improves the naturalness of synthesized speech.
This paper presents PeriodNet, a non-autoregressive (non-AR) waveform generative model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR raw waveform generative models have enabled the fast generation of high-quality waveforms. However, the variations of waveforms that these models can reconstruct are limited by training data. In addition, typical non-AR models reconstruct a speech waveform from a single Gaussian input despite the mixture of periodic and aperiodic signals in speech. These may significantly affect the waveform generation process in some applications such as singing voice synthesis systems, which require reproducing accurate pitch and natural sounds with less periodicity, including husky and breath sounds. PeriodNet uses a parallel or series model structure to model a speech waveform to tackle these problems. Two sub-generators connected in parallel or in series take an explicit periodic and aperiodic signal (sine wave and Gaussian noise) as an input. Since PeriodNet models periodic and aperiodic components by focusing on whether these input signals are autocorrelated or not, it does not require external periodic/aperiodic decomposition during training. Experimental results show that our proposed structure improves the naturalness of generated waveforms. We also show that speech waveforms with a pitch outside of the training data range can be generated with more naturalness.
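As a rough illustration of the two explicit input signals mentioned in the PeriodNet abstract (a sine wave and Gaussian noise), the following minimal sketch builds a pitch-following sine excitation and a noise sequence in plain Python. The function names, the per-sample F0 representation, and the convention that unvoiced samples have F0 of zero are assumptions made for this example; the abstract does not specify how the inputs are constructed, and the sub-generators themselves are not modeled here.

```python
import math
import random


def sine_excitation(f0_per_sample, sample_rate):
    """Sine wave whose instantaneous frequency follows the given F0 values.

    Assumption for this sketch: f0 <= 0 marks an unvoiced sample, which is
    emitted as 0.0 and does not advance the phase.
    """
    phase = 0.0
    out = []
    for f0 in f0_per_sample:
        if f0 > 0:
            # Accumulate phase so that pitch changes do not cause jumps.
            phase += 2.0 * math.pi * f0 / sample_rate
            out.append(math.sin(phase))
        else:
            out.append(0.0)
    return out


def gaussian_noise(num_samples, seed=0):
    """Zero-mean, unit-variance Gaussian noise (the aperiodic input)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


if __name__ == "__main__":
    sr = 16000
    f0 = [100.0] * 160 + [0.0] * 40  # one 100 Hz period, then unvoiced
    periodic = sine_excitation(f0, sr)
    aperiodic = gaussian_noise(len(f0))
    print(len(periodic), len(aperiodic))
```

Because the sine input carries the pitch explicitly, a pitch outside the training range can at least be expressed at the input by changing the F0 values; how well a trained model follows it is the empirical question the abstract reports on.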
This paper presents a method for hidden Markov model (HMM)-based Mandarin-Tibetan cross-lingual statistical speech synthesis using speaker adaptive training (SAT). A Speech Assessment Methods Phonetic Alphabet (SAMPA) set is designed to label the pronunciation of the initials and finals of Mandarin and Tibetan syllables according to the similarities in pronunciation between the two languages. A grapheme-to-phoneme conversion method is implemented to convert Chinese or Tibetan sentences to SAMPA-based Pinyin sequences. A Mandarin statistical speech synthesis framework is employed to realize Mandarin-Tibetan cross-lingual speech synthesis. A context-dependent label format is designed to label the context information of Mandarin and Tibetan sentences, and a question set is designed for context-dependent decision tree clustering. Initials and finals are used as the synthesis units, and a set of average mixed-lingual models is trained with SAT from a large multi-speaker Mandarin corpus and a small single-speaker Tibetan corpus. Then, the speaker adaptation transformation is applied to the speaker-dependent (SD) training data to obtain a set of SD Mandarin or Tibetan models from the average mixed-lingual models. Mandarin or Tibetan speech is then synthesized from the SD Mandarin or Tibetan models. Tests show that this method outperforms the method using only Tibetan SD models when only a small number of Tibetan training utterances are available. When the number of Tibetan training utterances is increased, the performances of the two methods tend to be the same. Mixing in Tibetan training sentences has only a small effect on the quality of synthesized Mandarin speech.
Background: For correction of cubitus varus deformity resulting from supracondylar fracture of the humerus, we developed an operative method with use of a custom-made surgical guide, designed on the basis of 3-dimensional (3D) computer simulation with computed tomography data. The purpose of this study was to investigate the postoperative accuracy of this system in clinical cases. Methods: Subjects included 17 consecutive patients (13 males and 4 females) with cubitus varus deformity after supracondylar fracture. Patients underwent 3D corrective osteotomy with use of a custom-made surgical guide. Postoperative computed tomography scan was performed after bone union diagnosis on plain radiographs, and postoperative 3D bone models were compared with preoperative simulation by surface registration technique. In addition, we evaluated radiographic parameters (humerus-elbow-wrist angle and tilting angle) and range of elbow motion at the most recent follow-up. Results: Mean errors in 3D corrective osteotomy were 0.6° ± 0.7° in varus-valgus rotation, 0.8° ± 1.3° in flexion-extension rotation, 2.9° ± 2.8° in internal-external rotation, 1.7 ± 1.8 mm in anterior-posterior translation, 1.3 ± 1.8 mm in lateral-medial translation, and 7.1 ± 6.3 mm in proximal-distal translation. The mean humerus-elbow-wrist angle on plain radiographs of the affected side was 15° in varus before surgery and improved to 6° in valgus after surgery. The mean tilting angle of the affected side was 31° before surgery and improved to 40° after surgery. Conclusion: The 3D correction of cubitus varus deformity was performed accurately within the allowable error limits.
This paper presents a mel-cepstrum-based quantization noise shaping method for improving the quality of synthetic speech generated by neural-network-based speech waveform synthesis systems. Since mel-cepstral coefficients closely match the characteristics of human auditory perception, the proposed method effectively masks the white noise introduced by the quantization typically used in neural-network-based speech waveform synthesis systems. The paper also describes a computationally efficient implementation of the proposed method using the structure of the mel-log spectrum approximation filter. Experiments using the WaveNet generative model, which is a state-of-the-art model for neural-network-based speech waveform synthesis, showed that speech quality is significantly improved by the proposed method.
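To make the source of the quantization noise concrete, the sketch below shows 8-bit mu-law companding, the quantization scheme commonly associated with WaveNet-style models. It is an assumption that this is the quantizer meant by the abstract, which does not name one, and the function names are hypothetical. The sketch only illustrates where the error comes from; it does not implement the mel-cepstrum-based noise shaping itself.

```python
import math


def mu_law_encode(x, mu=255):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic companding: fine resolution near zero, coarse near +/-1.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1.0) / 2.0 * mu + 0.5)


def mu_law_decode(q, mu=255):
    """Map an integer class in [0, mu] back to a sample in [-1, 1]."""
    y = 2.0 * q / mu - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantization_error(samples, mu=255):
    """Per-sample difference between the reconstructed and original signal."""
    return [mu_law_decode(mu_law_encode(s, mu), mu) - s for s in samples]


if __name__ == "__main__":
    # A 440 Hz tone at 16 kHz, half of full scale (illustrative values).
    tone = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 16000.0)
            for n in range(160)]
    err = quantization_error(tone)
    print(max(abs(e) for e in err))
```

The residual returned by `quantization_error` is the noise that the abstract describes as being masked perceptually by the proposed shaping method.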