Successful tracking of articulated hand motion is the first step in many computer vision applications, such as gesture recognition. However, the non-rigidity of the hand, complex background scenes, and occlusion make tracking a challenging task. We take a divide-and-conquer approach to tracking by decomposing complex motion into non-rigid motion and rigid motion. We present a learning-based algorithm for analyzing the non-rigid motion. In this method, appearance-based models are learned from image data, and the underlying motion patterns are explored using a generative model. Non-linear dynamics of the articulation, such as fast appearance deformation, can thus be analyzed without resorting to a complex kinematic model. We approximate the rigid motion as planar motion, which can be handled by a filtering method. We unify our treatments of non-rigid and rigid motion into a single, robust Bayesian framework and demonstrate the efficacy of this method by performing successful tracking in the presence of significant occlusion and clutter.
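
To illustrate the idea of treating rigid hand motion as planar motion handled by a filter, the following is a minimal sketch only. The text above does not name the filter used, so an alpha-beta filter is assumed here as a simple stand-in. The class `PlanarAlphaBeta`, the helper `track`, and the gain values `alpha` and `beta` are hypothetical and chosen for illustration. The non-rigid, appearance-based component is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PlanarAlphaBeta:
    """Alpha-beta filter for a point moving in the image plane.

    Assumption: this filter is a stand-in; the source does not specify
    which filtering method is used for the rigid (planar) motion.
    """
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    alpha: float = 0.5   # position gain (illustrative value)
    beta: float = 0.1    # velocity gain (illustrative value)

    def step(self, zx: float, zy: float, dt: float = 1.0):
        # Predict the position with a constant-velocity motion model.
        px = self.x + dt * self.vx
        py = self.y + dt * self.vy
        # Residual between the measured and the predicted position.
        rx = zx - px
        ry = zy - py
        # Correct position and velocity using the residual.
        self.x = px + self.alpha * rx
        self.y = py + self.alpha * ry
        self.vx += (self.beta / dt) * rx
        self.vy += (self.beta / dt) * ry
        return self.x, self.y


def track(measurements, alpha=0.5, beta=0.1):
    """Filter a sequence of (x, y) measurements, starting from the first one."""
    x0, y0 = measurements[0]
    f = PlanarAlphaBeta(x0, y0, alpha=alpha, beta=beta)
    return [f.step(zx, zy) for zx, zy in measurements[1:]]


if __name__ == "__main__":
    # Synthetic, noise-free hand-centroid positions moving at constant velocity.
    zs = [(2.0 * k, -1.0 * k) for k in range(51)]
    print(track(zs)[-1])
```

In a full tracker along the lines described above, the measurement passed to `step` would come from the appearance-based model rather than from synthetic positions, and a probabilistic filter would replace the fixed gains so that both motion components fit into a Bayesian framework.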