Psoriasis, though not an immediately life‐threatening disorder, still lowers quality of life and disrupts daily functions. While many treatments for psoriasis exist, few are convenient and safe long‐term; even if sufficiently effective, treatments themselves often decrease quality of life and make long‐term treatment adherence difficult. Approved in Japan in December 2016, apremilast was expected to function as a simple, long‐term, systemic therapeutic agent for psoriasis. We report on the clinical outcomes of administering apremilast for 2 years as observed in 46 psoriatic patients at the Saruwatari Dermatology Clinic between March 2017 and February 2019. We believe the ease of clinic consultation (as compared with large general hospital consultation) draws patients who are busy or have mild symptoms and who consequently adhere poorly to regular visits and treatment, so our cohort reflects realistic conditions. Major adverse events with apremilast treatment were diarrhea (37.0%) and nausea (15.3%), with most cases of diarrhea being mild. Drug survival analysis by the Kaplan–Meier method revealed a 1‐year continuation rate of 46.8% and a 2‐year continuation rate of 37.4%. Mean Dermatology Life Quality Index (DLQI) scores declined from 9.3 to 2.8 during the observation period (P < 0.0001), and the DLQI‐0/1 achievement rate was 28.6%. Based on our findings, we conclude that apremilast is suitable as a long‐lasting basic treatment that is easy to prescribe in small clinics and easy to use in everyday life.
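The continuation rates above come from the Kaplan–Meier (product-limit) estimator. A minimal sketch of that estimator is shown below for readers unfamiliar with it; the durations and event flags are hypothetical toy values, not the study's data, and the function names are illustrative. Patients still on the drug at their last visit are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of a drug-survival curve.

    durations: time on drug for each patient (e.g., months)
    events: 1 if the patient discontinued at that time, 0 if censored
            (still on the drug at the last observation)
    Returns (time, S(time)) pairs at each time with at least one discontinuation.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    discontinued = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = discontinued.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # patients censored at t still count as at risk at t
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy data (months on drug), not the study's cohort.
curve = kaplan_meier([3, 5, 5, 8, 12, 12, 20, 24], [1, 1, 0, 1, 0, 1, 1, 0])
print(round(survival_at(curve, 12), 3))  # 0.45
```

A 1-year continuation rate is then simply `survival_at(curve, 12)` when durations are in months.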
This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), and this derivation reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.
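To make the "single-channel NMF update rules" concrete, here is a minimal sketch of one NMF source-model update for a single separated source, assuming the Itakura–Saito divergence and the square-root multiplicative rules that majorization-minimization yields for it. The abstract does not state the divergence; the function names are hypothetical, and the IVA demixing update that alternates with this step is omitted.

```python
import numpy as np


def is_divergence(P, R):
    """Itakura-Saito divergence D_IS(P | R), summed over all bins."""
    Q = P / R
    return float(np.sum(Q - np.log(Q) - 1.0))


def nmf_source_model_update(P, T, V, eps=1e-12):
    """One MM update of bases T (F x K) and activations V (K x J) so that
    T @ V models the power spectrogram P (F x J) of one separated source."""
    R = T @ V + eps
    T = T * np.sqrt(((P / R**2) @ V.T) / ((1.0 / R) @ V.T))
    R = T @ V + eps
    V = V * np.sqrt((T.T @ (P / R**2)) / (T.T @ (1.0 / R)))
    return T, V


# Hypothetical example: |y|^2 of a separated source, 64 bins x 100 frames, K = 4 bases.
rng = np.random.default_rng(0)
P = np.abs(rng.standard_normal((64, 100)) + 1j * rng.standard_normal((64, 100))) ** 2
T = rng.uniform(0.1, 1.0, (64, 4))
V = rng.uniform(0.1, 1.0, (4, 100))
before = is_divergence(P, T @ V)
for _ in range(50):
    T, V = nmf_source_model_update(P, T, V)
print(before > is_divergence(P, T @ V))  # True: the MM updates do not increase D_IS
```

The low-rank product `T @ V` then plays the role of the time-frequency-varying source variance that replaces IVA's spherical source model.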
A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network techniques can be applied to artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natural speech. One of the issues causing the quality degradation is an oversmoothing effect often observed in the generated speech parameters. The GAN used in this paper consists of two neural networks: a discriminator to distinguish natural and generated samples, and a generator to deceive the discriminator. In the proposed framework incorporating GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation error loss and an adversarial loss for deceiving the discriminator. Since the objective of the GANs is to minimize the divergence (i.e., distribution difference) between the natural and generated speech parameters, the proposed method effectively alleviates the oversmoothing effect on the generated speech parameters. We evaluated the effectiveness for text-to-speech and voice conversion, and found that the proposed method can generate more natural spectral parameters and F0 than the conventional minimum generation error training algorithm regardless of its hyperparameter settings. Furthermore, we investigated the effect of the divergence of various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works best in terms of improving the synthetic speech quality.
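The acoustic-model objective described above is a weighted sum of a generation-error term and an adversarial term. A minimal sketch of the losses follows, assuming mean squared error as a stand-in for the minimum generation error loss, the standard binary cross-entropy GAN, and a plain weight `w_d`. The paper's exact weighting or normalization may differ, and all names are hypothetical.

```python
import numpy as np

EPS = 1e-7  # clip probabilities away from 0 and 1 before taking logs


def mge_loss(y_nat, y_gen):
    """Stand-in for the minimum generation error loss: mean squared error."""
    return float(np.mean((np.asarray(y_nat) - np.asarray(y_gen)) ** 2))


def adversarial_loss(d_gen):
    """Generator-side loss: -log D(y_gen), small when D is fooled (D -> 1)."""
    return float(-np.mean(np.log(np.clip(d_gen, EPS, 1.0))))


def discriminator_loss(d_nat, d_gen):
    """Binary cross-entropy: D should output 1 for natural, 0 for generated."""
    d_nat = np.clip(d_nat, EPS, 1.0 - EPS)
    d_gen = np.clip(d_gen, EPS, 1.0 - EPS)
    return float(-np.mean(np.log(d_nat)) - np.mean(np.log(1.0 - d_gen)))


def acoustic_model_loss(y_nat, y_gen, d_gen, w_d=1.0):
    """Weighted sum minimized by the acoustic model (the generator)."""
    return mge_loss(y_nat, y_gen) + w_d * adversarial_loss(d_gen)


y_nat = np.array([[0.0, 1.0], [2.0, 3.0]])
y_gen = y_nat + 0.5
print(acoustic_model_loss(y_nat, y_gen, d_gen=np.array([0.5, 0.5]), w_d=0.0))  # 0.25
```

Training alternates: update the discriminator on `discriminator_loss`, then the acoustic model on `acoustic_model_loss`. For the Wasserstein variant the abstract found best, the critic outputs an unbounded score and the generator term becomes `-mean(D(y_gen))` instead of the log term.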
A sound field recording method based on spherical or circular harmonic analysis for arbitrary array geometry and directivity of microphones is proposed. In current methods based on harmonic analysis, a sound field is decomposed into harmonic functions with a center given in advance, which is called a global origin, and their coefficients are obtained up to a certain truncation order using microphone measurements. However, the accuracy of the reconstructed sound field depends on the predefined position of the global origin and the truncation order, which makes it difficult to apply this technique to an asymmetric array since the criterion to determine the position of the global origin and the truncation order is not obvious. We formulate an estimate of the harmonic coefficients on the basis of infinite-order analysis. This formulation enables us to estimate the harmonic coefficients at an arbitrary desired position independently of the position of the global origin without truncation errors. Numerical simulation results indicated that the proposed method makes it possible to avoid performance degradation caused by inappropriate setting of the global origin.
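For contrast with the proposed method, the conventional truncated approach described above can be sketched in 2-D. The sketch expands the field in interior circular harmonics about a chosen global origin, truncates at a fixed order, and fits the coefficients to microphone pressures by regularized least squares. It assumes omnidirectional microphones and is not the proposed infinite-order method; `origin` and `order` are exactly the two settings that the abstract notes are hard to choose for asymmetric arrays. Function names and the setup are hypothetical.

```python
import numpy as np
from scipy.special import jv


def circular_harmonics(points, k, order, origin=(0.0, 0.0)):
    """Interior 2-D basis J_n(k r) e^{i n phi}, n = -order..order, about `origin`."""
    d = np.atleast_2d(np.asarray(points, float)) - np.asarray(origin, float)
    r = np.hypot(d[:, 0], d[:, 1])
    phi = np.arctan2(d[:, 1], d[:, 0])
    n = np.arange(-order, order + 1)
    return jv(n[None, :], k * r[:, None]) * np.exp(1j * n[None, :] * phi[:, None])


def fit_coefficients(mic_pos, pressures, k, order, origin=(0.0, 0.0), reg=1e-10):
    """Truncated expansion coefficients by Tikhonov-regularized least squares."""
    A = circular_harmonics(mic_pos, k, order, origin)
    gram = A.conj().T @ A + reg * np.eye(A.shape[1])
    return np.linalg.solve(gram, A.conj().T @ pressures)


def evaluate(coeffs, points, k, order, origin=(0.0, 0.0)):
    """Reconstruct the field at `points` from the truncated expansion."""
    return circular_harmonics(points, k, order, origin) @ coeffs


# Hypothetical setup: plane wave along +x at 500 Hz, 25 mics at irregular positions.
k = 2 * np.pi * 500 / 343
rng = np.random.default_rng(0)
mics = rng.uniform(-0.15, 0.15, (25, 2))
p = np.exp(1j * k * mics[:, 0])
c = fit_coefficients(mics, p, k, order=6)
target = np.array([[0.02, -0.03]])
print(abs(evaluate(c, target, k, 6)[0] - np.exp(1j * k * 0.02)))
```

Moving `origin` away from the array or lowering `order` degrades the reconstruction, which is the dependence the proposed formulation is designed to remove.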
A wave field estimation method exploiting prior information on source direction is proposed. First, we formulate a wave field estimation problem as regularized least squares, where the norm of the wave field is used as the regularization term. The norm of the wave field is defined on the basis of the weighting function that reflects the prior information on the source direction. We derive the closed-form solution using theories on Hilbert spaces. Results of numerical experiments indicated that high estimation accuracy can be achieved by using the proposed method in comparison with other current methods that do not use any prior information.
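A closed-form regularized least-squares solution of this kind has the form of kernel ridge regression. The sketch below assumes 2-D, omnidirectional microphones, and a kernel built as a directionally weighted superposition of plane waves, evaluated by quadrature over directions. The von Mises-shaped weight, the numerical quadrature, and all function names are hypothetical choices for illustration; the paper derives its own kernel in closed form. With a uniform weight, the kernel reduces to J_0(k|r1 - r2|).

```python
import numpy as np


def directional_kernel(r1, r2, k, weight, n_dirs=360):
    """kappa(r1, r2) = (1/2pi) * integral of w(theta) exp(i k eta(theta).(r1 - r2)) dtheta,
    approximated by the trapezoidal rule over n_dirs plane-wave directions (2-D)."""
    theta = 2 * np.pi * np.arange(n_dirs) / n_dirs
    eta = np.stack([np.cos(theta), np.sin(theta)], axis=1)             # (D, 2)
    diff = np.asarray(r1, float)[:, None, :] - np.asarray(r2, float)[None, :, :]
    return np.exp(1j * k * (diff @ eta.T)) @ weight(theta) / n_dirs    # (M1, M2)


def estimate_field(mic_pos, pressures, eval_pos, k, weight, lam=1e-3):
    """Kernel ridge regression: alpha = (K + lam I)^{-1} s, u(r) = sum_m alpha_m kappa(r, r_m)."""
    K = directional_kernel(mic_pos, mic_pos, k, weight)
    alpha = np.linalg.solve(K + lam * np.eye(len(mic_pos)), pressures)
    return directional_kernel(eval_pos, mic_pos, k, weight) @ alpha


def von_mises_weight(theta0, beta):
    """Hypothetical prior: up-weights plane waves propagating near direction theta0."""
    return lambda theta: np.exp(beta * np.cos(theta - theta0))


# Hypothetical setup: 300 Hz plane wave propagating along +x, 16 irregular mics.
k = 2 * np.pi * 300 / 343
rng = np.random.default_rng(1)
mics = rng.uniform(-0.3, 0.3, (16, 2))
s = np.exp(1j * k * mics[:, 0])
u_hat = estimate_field(mics, s, np.array([[0.0, 0.0]]), k, von_mises_weight(0.0, 5.0))
```

Because the weight is nonnegative, the kernel matrix is Hermitian positive semidefinite, so `K + lam * I` is always invertible for `lam > 0`.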
This paper describes several important methods for the blind source separation of audio signals in an integrated manner. Two historically developed routes are featured. One started from independent component analysis and evolved to independent vector analysis (IVA) by extending the notion of independence from a scalar to a vector. In the other route, nonnegative matrix factorization (NMF) has been extended to multichannel NMF (MNMF). As a convergence point of these two routes, independent low-rank matrix analysis has been proposed, which integrates IVA and MNMF in a clever way. All the objective functions in these methods are efficiently optimized by majorization-minimization algorithms with appropriately designed auxiliary functions. Experimental results for a simple two-source two-microphone case are given to illustrate the characteristics of these five methods.
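As one concrete instance of the majorization-minimization optimization mentioned above, here is a minimal sketch of auxiliary-function IVA with iterative-projection (IP) updates. It assumes the determined case and a spherical Laplace source model, so the auxiliary covariance is weighted by 1/r. Names are hypothetical, and projection back for fixing the output scale is omitted.

```python
import numpy as np


def auxiva_ip(X, n_iter=30, eps=1e-9):
    """AuxIVA with iterative-projection (IP) updates for the determined case.

    X: mixture STFT, shape (F, T, M), with M microphones and M sources.
    Returns demixing matrices W, shape (F, M, M); Y[f, t] = W[f] @ X[f, t].
    Source model (assumed): spherical Laplace, so the MM weight is 1 / r.
    """
    F, T, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))
    for _ in range(n_iter):
        for n in range(M):
            y_n = np.einsum("fk,ftk->ft", W[:, n, :], X)             # source n, all f
            r = np.sqrt(np.sum(np.abs(y_n) ** 2, axis=0)) + eps       # (T,) vector norm
            # auxiliary-function covariance V_n[f] = (1/T) sum_t x x^H / r_t
            V = np.einsum("ftm,ftk->fmk", X / r[None, :, None], X.conj()) / T
            for f in range(F):
                w = np.linalg.solve(W[f] @ V[f], np.eye(M)[:, n])      # w = (W V)^{-1} e_n
                w /= np.sqrt(np.real(w.conj() @ V[f] @ w))             # w^H V w = 1
                W[f, n, :] = w.conj()                                  # row n of W is w^H
    return W


def auxiva_cost(W, X):
    """IVA objective: (1/T) sum_t sum_n r_{n,t} - sum_f log|det W_f|."""
    Y = np.einsum("fmk,ftk->ftm", W, X)
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))                       # (T, M)
    logabsdet = np.linalg.slogdet(W)[1]
    return float(np.mean(np.sum(r, axis=1)) - np.sum(logabsdet))


# Hypothetical toy mixture: 2 sources with shared time-varying envelopes, random mixing.
rng = np.random.default_rng(0)
F, T, M = 32, 200, 2
scale = rng.exponential(1.0, (1, T, M))
S = scale * (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M)))
A = rng.standard_normal((F, M, M)) + 1j * rng.standard_normal((F, M, M))
X = np.einsum("fmk,ftk->ftm", A, S)
W = auxiva_ip(X)
print(auxiva_cost(np.tile(np.eye(M), (F, 1, 1)), X) > auxiva_cost(W, X))  # True
```

Each IP step minimizes the auxiliary function exactly for one row of W, which is why the objective can never increase. ILRMA keeps this demixing update and replaces the 1/r weight with the inverse of an NMF-modeled variance.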
•We propose phase reconstruction methods from amplitude spectrograms using directional statistics deep neural networks (DNNs).
•The directional statistics DNN is a novel deep generative model that has a circular probability distribution as the conditional probability.
•We use the DNN to model not only the phase of speech signals but also group delay, which is strongly related to amplitude spectra.
•Experimental evaluation demonstrates that our method outperforms the conventional signal-processing-based method.
This paper presents a deep neural network (DNN)-based method for reconstructing phase from amplitude spectrograms. In speech processing, amplitude spectrograms are often the representation being processed, and the corresponding phases are reconstructed from them by using the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. To solve this problem, we propose directional-statistics DNNs for predicting phases from amplitude spectrograms. We first propose the von Mises distribution DNN, a generative model that uses the von Mises distribution to model histograms of a periodic variable. We extend it to model group delay, which has a stronger connection to the amplitude spectrograms. Furthermore, we generalize the group-delay modeling and propose another DNN, called the sine-skewed generalized cardioid distribution DNN, for modeling asymmetric histograms such as those of group delay. Results from objective and subjective evaluations indicate that (1) our von Mises distribution DNN predicts group delay more accurately than it predicts phases, (2) our DNN serves as a better initialization for the Griffin-Lim method, (3) the phase reconstruction methods based on our von Mises distribution DNN achieve better speech quality than the conventional Griffin-Lim method, and (4) our sine-skewed generalized cardioid distribution DNN models the group delay more accurately than our von Mises distribution DNN.
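Two ingredients of the von Mises distribution DNN can be shown without the network itself: the von Mises negative log-likelihood, assuming it serves as the maximum-likelihood training loss for predicted angles, and group delay computed as the wrapped negative phase difference along frequency. This is a minimal sketch; the function names are hypothetical and the DNN that outputs `mu` and `kappa` is omitted.

```python
import numpy as np


def von_mises_nll(x, mu, kappa):
    """Negative log-likelihood of angles x under the von Mises density
    exp(kappa cos(x - mu)) / (2 pi I0(kappa)). np.i0 overflows for very large kappa."""
    return -kappa * np.cos(x - mu) + np.log(2 * np.pi * np.i0(kappa))


def wrap(angle):
    """Map angles to the principal interval (-pi, pi]."""
    return np.angle(np.exp(1j * angle))


def group_delay(phase):
    """Discrete group delay: negative wrapped phase difference along frequency (last axis)."""
    return -wrap(np.diff(phase, axis=-1))


tau = 1.0                                      # hypothetical delay, radians per frequency bin
phase = wrap(-tau * np.arange(8))              # wrapped linear phase of a pure delay
print(np.allclose(group_delay(phase), tau))    # True
```

The example shows why group delay is easier to model than raw phase: the wrapped phase of a pure delay jumps around the circle, while its group delay is the constant `tau`.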
•We propose unsupervised text-to-speech synthesis using subword tokenization and prosodic-context extraction.
•The subword tokenization can determine language units suitable for prosody generation.
•The context extraction can retrieve contexts from pairs of subwords and prosody.
•Experimental evaluation demonstrates that the proposed methods outperform conventional methods in terms of synthetic speech quality.
This paper presents text tokenization and context extraction without using language knowledge for text-to-speech (TTS) synthesis. To generate prosody, statistical parametric TTS synthesis typically requires expert knowledge of the target language. As a result, TTS synthesis has largely been limited to resource-rich languages. To achieve TTS synthesis without using language knowledge, we propose acoustic model-based subword tokenization and unsupervised extraction of prosodic contexts. The subword tokenization can determine language units suitable for prosody generation. The context extraction can retrieve contexts from pairs of subwords and prosody. The proposed methods function without language knowledge and can improve F0 prediction accuracy. Experimental evaluation demonstrates that 1) the training of the proposed subword tokenization, which uses the expectation-maximization algorithm and deep neural networks, is empirically stable, 2) the proposed subword tokenization tokenizes text into subwords that are close to language-specific units, and 3) the proposed methods outperform the conventional methods using language model-based tokenization in terms of synthetic speech quality.
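The abstract does not spell out the tokenizer's internals. As a generic illustration only, EM-style subword tokenizers alternate between segmenting the training text under the current unit scores and re-estimating those scores from the resulting segmentations. The segmentation step is a Viterbi search, sketched below. The score table is a hypothetical stand-in for the acoustic-model-based scores the paper uses, and the re-estimation step is omitted.

```python
import math


def viterbi_segment(text, unit_log_score, max_len=4):
    """Split `text` into known units maximizing the summed per-unit log score.

    unit_log_score: dict mapping candidate subword -> log score (hypothetical
    stand-in for acoustic-model-based scores).
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = text[start:end]
            if unit in unit_log_score and best[start] > -math.inf:
                score = best[start] + unit_log_score[unit]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with the given units")
    units, end = [], n
    while end > 0:  # follow back-pointers from the end of the string
        units.append(text[back[end]:end])
        end = back[end]
    return units[::-1]


scores = {"a": -3.0, "b": -3.0, "c": -3.0, "ab": -2.0, "bc": -2.5, "abc": -5.5}
print(viterbi_segment("abc", scores))  # ['ab', 'c']
```

Dynamic programming keeps the search linear in text length (times `max_len`), which matters because the E-step re-segments the whole corpus on every iteration.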
•This is the first work to decompose and reconstruct a sound field based on the reciprocity gap functional (RGF) in the spherical harmonic domain.
•Unlike sparse-representation algorithms, the proposed method does not require discretizing the target region into grid points.
•The proposed method avoids the decomposition errors for off-grid sources and the high computational cost of sparse representation.
•The RGF is applied to sound field decomposition, which enables the sound field to be decomposed in closed form with a flexible arrangement of microphones.
A sound field decomposition method based on the reciprocity gap functional (RGF) in the spherical harmonic domain is proposed. To estimate and reconstruct a continuous sound field including sources by using multiple microphones, an intuitive and powerful strategy is to decompose the sound field into Green's functions. Sparse-representation algorithms have been applied to this decomposition problem; however, they require discretizing the target region into grid points to construct a dictionary matrix. Discretization-based methods lead to decomposition errors for off-grid sources and a high computational cost of sparse representation. We apply the RGF to sparse sound field decomposition, which makes it possible to decompose the sound field as a closed-form solution without discretization. In addition, the formulation in the spherical harmonic domain enables a flexible arrangement of microphones under the assumption of a spherical target region. Numerical simulation results indicated that high decomposition and reconstruction accuracies can be achieved by the proposed method, especially at low frequencies, with a low computational cost.
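The key property behind RGF-based decomposition is that, for a field u = a G(·, s) from a point source inside the region and any test function v solving the homogeneous Helmholtz equation, Green's second identity gives RG(u, v) = a v(s), with no grid over candidate source positions. The sketch below demonstrates this in a strongly simplified setting: 2-D, a single source, plane-wave test functions, and pressure plus normal derivative assumed known on a circle. The paper instead works in 3-D in the spherical harmonic domain with flexible microphone placement. All names and the setup are hypothetical.

```python
import numpy as np
from scipy.special import hankel1


def point_source(points, src, k, amp):
    """2-D field amp * (i/4) H0(k|r - s|), which satisfies (Laplacian + k^2) u = -amp * delta_s,
    and its gradient (uses dH0/dz = -H1)."""
    d = points - src
    rho = np.linalg.norm(d, axis=1)
    u = amp * 0.25j * hankel1(0, k * rho)
    grad = (-amp * 0.25j * k * hankel1(1, k * rho) / rho)[:, None] * d
    return u, grad


def reciprocity_gap(u, du_dn, v, dv_dn, radius):
    """RG(u, v) = contour integral of (u dv/dn - v du/dn) over a circle of the given
    radius, sampled at equally spaced points (trapezoidal rule)."""
    return np.sum(u * dv_dn - v * du_dn) * 2 * np.pi * radius / len(u)


def plane_wave(points, normals, k, eta):
    """Test function v = exp(i k eta . r), a Helmholtz solution, and its normal derivative."""
    v = np.exp(1j * k * points @ eta)
    return v, 1j * k * (normals @ eta) * v


def locate_single_source(u, du_dn, points, normals, k, radius):
    """Closed-form position of ONE source: RG(u, e^{+ik x}) / RG(u, e^{-ik x}) = e^{2ik s_x}
    (likewise for y). Unambiguous only while |2 k s| < pi along each axis."""
    est = []
    for eta in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
        rg_plus = reciprocity_gap(u, du_dn, *plane_wave(points, normals, k, eta), radius)
        rg_minus = reciprocity_gap(u, du_dn, *plane_wave(points, normals, k, -eta), radius)
        est.append(np.angle(rg_plus / rg_minus) / (2 * k))
    return np.array(est)


# Hypothetical setup: 128 boundary points on a circle of radius 0.5 m, 200 Hz.
k, radius = 2 * np.pi * 200 / 343, 0.5
theta = 2 * np.pi * np.arange(128) / 128
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
points = radius * normals
u, grad = point_source(points, np.array([0.12, -0.08]), k, amp=1.0 + 0.5j)
du_dn = np.sum(grad * normals, axis=1)
print(locate_single_source(u, du_dn, points, normals, k, radius))  # ~ [0.12, -0.08]
```

The source amplitude follows in the same closed form, `a = RG(u, v) / v(s)` once `s` is known; handling several sources requires more test functions than this single-source sketch uses.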