Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video
Abstract—We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American Sign Language (ASL) using a single camera to track the user's unadorned hands. The first system observes the user from a desk-mounted camera and achieves 92 percent word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98 percent word accuracy (97 percent with an unrestricted grammar). Both experiments use a 40-word lexicon.
Index Terms:
Gesture recognition, hidden Markov models, wearable computers, sign language, motion and pattern analysis.
Citation:
Thad Starner, Joshua Weaver, Alex Pentland, "Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1371-1375, Dec. 1998, doi:10.1109/34.735811