Arm and body pose are useful cues for deictic reference: users naturally extend their arms toward objects of interest in a dialog. We present recent progress on untethered sensing of articulated arm and body configuration using robust stereo vision techniques. These techniques allow accurate, real-time tracking of 3-D position and orientation. We demonstrate users' performance with our system on object selection tasks and describe our initial efforts to integrate this system into a multimodal conversational dialog framework.