In this paper we address the task of modeling phoneme duration for Greek speech synthesis. In particular, we apply well-established machine learning approaches to the WCL-1 prosodic database to predict segmental durations from shallow morphosyntactic and prosodic features. We employ decision trees, instance-based learning, and linear regression. Trained on a 5,500-word database, the CART and linear regression models proved the most effective for the task, with root mean square errors of 0.0252 and 0.0251, respectively.
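For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a root mean square error between predicted and observed segment durations could be computed. The function name `rmse`, the sample values, and the choice of seconds as the unit are assumptions made for illustration only; they are not taken from the paper or from the WCL-1 database.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical phoneme durations, assumed to be in seconds (not WCL-1 data).
observed = [0.080, 0.120, 0.065, 0.150]
predicted = [0.090, 0.100, 0.070, 0.140]
error = rmse(predicted, observed)
```

Because the errors are squared before averaging, a few large duration mispredictions raise this score more than many small ones do, which is worth keeping in mind when comparing two models whose scores are nearly identical.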
Citation:
Alexandros Lazaridis, Panagiotis Zervas, Georgios Kokkinakis, "Segmental Duration Modeling for Greek Speech Synthesis," 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), vol. 2, pp. 518-521, 2007.