Hand Tracking for Hand Performance Capture



In the summer after my sophomore year, I was one of 50 bachelor's and master's students, chosen from a pool of 1,500 applicants, to participate in the Summer@EPFL program, a research internship fully funded by the School of Computer and Communication Sciences, EPFL. There, I worked mostly on the computer vision side of a project on hand performance capture, supervised by Dr. Andrea Tagliasacchi and Prof. Mark Pauly. The eventual goal of the project is to generate a 3D model of a person's hand in real time from the RGB-D data of a depth sensor and to manipulate that model to look like the paws, claws, etc. of a fictional character. I detected the hand and fingertips in the RGB data and created a triangulated 3D mesh of a human hand from the RGB-D data, which updates in real time. I also extended a method used for face detection to hand detection, but it proved unsuitable because hands take on far more generic poses than faces. In addition, while working on the project, I implemented the “RGB-D sensor extension” for “Starlab”, an open-source 3D geometry processing environment used as a teaching tool for computer graphics at EPFL.

Technical Details:

This project helped me develop strong coding skills for computer vision, graphics, and Kinect-based applications. I used C++ and Qt to create the user interface for Starlab's “Kinect” plugin. RGB and depth frames were acquired from the Kinect sensor using OpenNI. I used OpenGL to triangulate the points obtained from the depth data and implemented smooth shading to create a mesh that updates in real time as the person moves and as the user rotates the model; vertex buffer objects keep the smooth-shading code efficient. Haar cascades for hand detection available online did not produce acceptable results, so the NiTE library and the Kinect sensor were used to generate training data for our own Haar cascade. Cascades trained on the dataset we collected produced good results for palm detection, but proved unsuitable for detecting other hand poses, since hands have too many generic poses, or degrees of freedom. Subsequently, I implemented convex-hull-based fingertip detection in C++ with OpenCV. Finally, since our goal was to align a 3D model with a person's hand captured from Kinect data, I built a tool that allowed us to extract a rigged hand model from an FBX file and import it into Starlab.
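The triangulation step can be sketched as follows. This is a minimal, illustrative version, not the original Starlab plugin code: it assumes a row-major depth buffer where a value of 0 means "no reading", and it emits two triangles per grid quad, skipping quads with any invalid corner so holes in the depth map stay holes in the mesh. All names here are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Index buffer for a triangle mesh; three entries per triangle.
struct Mesh {
    std::vector<uint32_t> indices;
};

// Build triangle indices over a w x h depth frame treated as a regular grid.
Mesh triangulateDepthGrid(const std::vector<uint16_t>& depth, int w, int h) {
    Mesh mesh;
    auto valid = [&](int x, int y) { return depth[y * w + x] > 0; };
    for (int y = 0; y + 1 < h; ++y) {
        for (int x = 0; x + 1 < w; ++x) {
            // Only triangulate quads whose four corners all have valid
            // depth; otherwise missing data would produce spurious spikes.
            if (!valid(x, y) || !valid(x + 1, y) ||
                !valid(x, y + 1) || !valid(x + 1, y + 1))
                continue;
            uint32_t a = y * w + x,       b = y * w + x + 1;
            uint32_t c = (y + 1) * w + x, d = (y + 1) * w + x + 1;
            // Split the quad into two triangles: (a,b,c) and (b,d,c).
            mesh.indices.insert(mesh.indices.end(), {a, b, c});
            mesh.indices.insert(mesh.indices.end(), {b, d, c});
        }
    }
    return mesh;
}
```

In the real pipeline the resulting index and vertex data would be uploaded once into OpenGL vertex buffer objects and only the vertex positions refreshed each frame, which is what makes the real-time update cheap.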
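The convex-hull idea behind the fingertip detector can be illustrated in a standalone way. The project used OpenCV (e.g. its built-in convex hull over the hand contour); the sketch below instead implements Andrew's monotone chain in plain C++ so it is self-contained, with hypothetical names. Fingertips are then found near the hull vertices that stick out from the hand contour.

```cpp
#include <algorithm>
#include <vector>

// 2D point on the hand contour (pixel coordinates).
struct Pt { double x, y; };

// Cross product of (o->a) and (o->b); positive for a counter-clockwise turn.
static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: returns the hull vertices in counter-clockwise
// order. Candidate fingertips are hull vertices far from the palm centroid.
std::vector<Pt> convexHull(std::vector<Pt> pts) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    size_t n = pts.size();
    if (n < 3) return pts;
    std::vector<Pt> hull(2 * n);
    size_t k = 0;
    // Lower hull: pop the last point while it makes a non-left turn.
    for (size_t i = 0; i < n; ++i) {
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    // Upper hull: same scan from right to left.
    for (size_t i = n - 1, t = k + 1; i-- > 0;) {
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1); // the last point duplicates the first
    return hull;
}
```

For example, a unit square with an extra interior point yields a four-vertex hull; on a real hand contour, the interior points are the palm and finger valleys, and the surviving hull vertices cluster at the fingertips.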