The increasing interest in multilingual applications such as speech-to-speech translation systems is accompanied by a need for speech recognition front-ends that cover many languages and can also handle multiple input languages at the same time. In this paper we describe a universal speech recognition system that meets these needs. It is trained by sharing speech and text data across languages, which significantly reduces the number of parameters and the system overhead at the cost of only a slight loss in accuracy. The final recognizer eases the burden of maintaining several monolingual engines, makes a dedicated language identification component unnecessary, and allows for code-switching within an utterance. To achieve these goals, we developed new methods for constructing multilingual acoustic models and multilingual n-gram language models.
Index Terms:
Multilingual acoustic modeling, data-driven, IPA, Multilingual n-gram language modeling
Citation:
Zhirong Wang, Umut Topkara, Tanja Schultz, Alex Waibel, "Towards Universal Speech Recognition," Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02), p. 247, 2002.