Robust Hand Detection via Computer Vision


I am currently working on a system for robust hand detection.

The first step is to take a photo of the hand (in HSV color space) with the hand placed in a small rectangle to determine the skin color. I then apply a thresholding filter to set all non-skin pixels to black and all skin pixels white.

So far it works quite well, but I wanted to ask if there is a better way to solve this? For example, I found a few papers mentioning concrete color spaces for caucasian people, but none with a comparison for asian/african/caucasian color-tones.

By the way, I'm working with OpenCV via Python bindings.

10/1/2012 5:30:02 PM

Accepted Answer

Have you taken a look at the camshift paper by Gary Bradski? You can download it from here

I used the the skin detection algorithm a year ago for detecting skin regions for hand tracking and it is robust. It depends on how you use it.

The first problem with using color for tracking is that it is not robust to lighting variations or like you mentioned, when people have different skin tones. However this can be solved easily as mentioned in the paper by:

  1. Convert image to HSV color space.
  2. Throw away the V channel and consider the H and S channel and hence discount for lighting variations.
  3. Threshold pixels with low saturation due to their instability.
  4. Bin the selected skin region into a 2D histogram. (OpenCV"s calcHist function) This histogram now acts as a model for skin.
  5. Compute the "backprojection" (i.e. use the histogram to compute the "probability" that each pixel in your image has the color of skin tone) using calcBackProject. Skin regions will have high values.
  6. You can then either use meanShift to look for the mode of the 2D "probability" map generated by backproject or to detect blobs of high "probability".

Throwing away the V channel in HSV and only considering H and S channels is really enough (surprisingly) to detect different skin tones and under different lighting variations. A plus side is that its computation is fast.

These steps and the corresponding code can be found in the original OpenCV book.

As a side note, I've also used Gaussian Mixture Models (GMM) before. If you are only considering color then I would say using histograms or GMM makes not much difference. In fact the histogram would perform better (if your GMM is not constructed to account for lighting variations etc.). GMM is good if your sample vectors are more sophisticated (i.e. you consider other features) but speed-wise histogram is much faster because computing the probability map using histogram is essentially a table lookup whereas GMM requires performing a matrix computation (for vector with dimension > 1 in the formula for multi-dimension gaussian distribution) which can be time consuming for real time applications.

So in conclusion, if you are only trying to detect skin regions using color, then go with the histogram method. You can adapt it to consider local gradient as well (i.e. histogram of gradients but possibly not going to the full extent of Dalal and Trigg's human detection algo.) so that it can differentiate between skin and regions with similar color (e.g. cardboard or wooden furniture) using the local texture information. But that would require more effort.

For sample source code on how to use histogram for skin detection, you can take a look at OpenCV"s page here. But do note that it is mentioned on that webpage that they only use the hue channel and that using both hue and saturation would give better result.

For a more sophisticated approach, you can take a look at the work on "Detecting naked people" by Margaret Fleck and David Forsyth. This was one of the earlier work on detecting skin regions that considers both color and texture. The details can be found here.

A great resource for source code related to computer vision and image processing, which happens to include code for visual tracking can be found here. And not, its not OpenCV.

Hope this helps.

12/12/2013 9:36:57 AM

I have worked on something similar 2 years ago. You can try with Particle Filter (Condensation), using skin color pixels as input for initialization. It is quite robust and fast. The way I applied it for my project is at this link. You have both a presentation (slides) and the survey. If you initialize the color of the hand with the real color extracted from the hand you are going to track you shouldn't have any problems with black people.

For particle filter I think you can find some code implementation samples. Good luck.

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow