Emotional Speech Synthesis using Subspace Constraints in Prosody
Toronto, ON, Canada July 09-July 12
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICME.2006.262725
2006 IEEE International Conference on ...
Shinya Mori, Keio University, Department of Information and Computer Science, 3-14-1 Hiyoshi, Yokohama-shi, Kanagawa 223-8522, Japan. kpeg@ozawa.ics.keio.ac.jp
Tsuyoshi Moriyama, Keio University, Department of Information and Computer Science, 3-14-1 Hiyoshi, Yokohama-shi, Kanagawa 223-8522, Japan. kpeg@ozawa.ics.keio.ac.jp
Shinji Ozawa, Keio University, Department of Information and Computer Science, 3-14-1 Hiyoshi, Yokohama-shi, Kanagawa 223-8522, Japan. kpeg@ozawa.ics.keio.ac.jp
An efficient speech synthesis method that uses a subspace constraint in prosody is proposed. Conventional unit selection methods concatenate speech segments stored in a database and therefore require an enormous number of waveforms to synthesize various emotional expressions for arbitrary text. The proposed method employs principal component analysis to reduce the dimensionality of the prosodic components, which also allows us to generate new speech that is similar to the training samples. The subspace constraint ensures that the prosody of the synthesized speech, including F0, power, and speech length, preserves the correlations found in the training samples of emotional speech. We assume that the combination of the number of syllables and the accent type determines the correlative dynamics of prosody, and we construct a separate subspace for each such combination. Each subspace is then linearly related to emotions by multiple regression analysis, using emotion ratings obtained by subjective evaluation of the training samples. Experimental results demonstrated that only 4 dimensions were sufficient to represent the prosodic changes due to emotion, accounting for over 90% of the total variance. The synthesized emotions were successfully recognized by listeners, especially "anger", "surprise", "disgust", "sorrow", "boredom", "depression", and "joy".
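The pipeline the abstract outlines (PCA over prosodic vectors, then multiple regression tying the subspace to emotion ratings) might look roughly like the following minimal sketch. This is not the authors' implementation. The function names, the layout of the prosody vector (F0 contour, power contour, and length concatenated into one row), the regression direction (from emotion ratings to subspace coordinates), and the random toy data are all assumptions made for illustration only.

```python
import numpy as np


def fit_prosody_subspace(X, n_components=4):
    """PCA via SVD on prosody vectors X of shape (n_samples, n_features).

    Assumption: each row concatenates F0, power, and length values for one
    utterance sharing the same syllable count and accent type.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]                 # orthonormal rows, (k, n_features)
    scores = Xc @ basis.T                     # subspace coordinates, (n_samples, k)
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return mean, basis, scores, explained


def fit_emotion_regression(E, scores):
    """Multiple linear regression: scores ~= [E, 1] @ W (least squares).

    E holds hypothetical listener ratings, shape (n_samples, n_emotions).
    """
    A = np.hstack([E, np.ones((E.shape[0], 1))])   # append intercept column
    W, *_ = np.linalg.lstsq(A, scores, rcond=None)
    return W                                        # (n_emotions + 1, k)


def synthesize_prosody(e, W, mean, basis):
    """Map one emotion rating vector to a prosody vector inside the subspace."""
    coords = np.append(e, 1.0) @ W
    return mean + coords @ basis


# Toy data only: 60 utterances, 30 prosodic features, 7 emotion ratings.
rng = np.random.default_rng(0)
E_train = rng.uniform(0.0, 1.0, size=(60, 7))
X_train = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 30))
X_train += 0.01 * rng.normal(size=X_train.shape)

mean, basis, scores, explained = fit_prosody_subspace(X_train, n_components=4)
W = fit_emotion_regression(E_train, scores)
new_prosody = synthesize_prosody(E_train[0], W, mean, basis)
print(f"variance explained by 4 components: {explained:.3f}")
```

Because every generated vector is the training mean plus a combination of the retained principal axes, its F0, power, and length values can only vary together along directions seen in the training data, which is one way to read the "subspace constraint" in the abstract. Under the paper's assumption, one such model would be fitted per combination of syllable count and accent type.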
Citation:
Shinya Mori, Tsuyoshi Moriyama, Shinji Ozawa, "Emotional Speech Synthesis using Subspace Constraints in Prosody," icme, pp.1093-1096, 2006 IEEE International Conference on Multimedia and Expo, 2006