Color Tracking



Introduction

The basic idea here is to get a sample of an object's color by grabbing a small (16 x 16 pixel) window of the image where the user clicks. This can be done multiple times to capture light-dark variation. In color space (RGB here) these sampled pixels form a cluster of points. If the sampled object is of fairly uniform color and is nearly Lambertian, then theory tells us that the cluster will lie along a single line between the origin (black) and the color reflected from the object (mostly the object's intrinsic color). The points will not be perfectly collinear because of object color variation and camera noise, so we use principal component (PC) analysis to compute the parameters of a bounding ellipsoid for the sample points; the ellipsoid is essentially a fat version of a segment of the theoretical line.
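As a rough illustration of the fitting step, the sketch below computes the mean, principal axes, and per-axis standard deviations of a set of RGB samples with NumPy. The function name `fit_color_ellipsoid` and its return layout are assumptions made for this example and are not taken from the original tracker.

```python
import numpy as np

def fit_color_ellipsoid(samples):
    """Fit a PC ellipsoid to RGB samples (any array reshapeable to (N, 3)).

    Returns (mean, axes, sigmas):
      mean   - centroid of the samples in RGB space
      axes   - 3x3 matrix whose columns are unit principal axes,
               ordered from largest to smallest variance
      sigmas - standard deviation of the samples along each axis
    """
    pixels = np.asarray(samples, dtype=float).reshape(-1, 3)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)       # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # largest variance first
    sigmas = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return mean, eigvecs[:, order], sigmas
```

For a nearly Lambertian object of uniform color, the first column of `axes` should point roughly along the line from black to the object's color, and the other two standard deviations should be small by comparison.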
Once the bounding ellipsoid has been computed, it can be used as a membership function. Pixels captured later can be tested to see whether they are inside or outside the ellipsoid representing the object color. Another possible membership function that is easier to construct and faster to evaluate is a sphere of some radius centered at the mean (in RGB space) of the sample pixels. This is inadequate for most situations, though, because the sample cluster is elongated rather than round.
In the following images you can get an idea of the discrimination ability of the above membership functions in one particular indoor situation. I took a sample from just below my right eye in the first image. In the second and third images, pixels drawn in red are those that have been classified as inside the computed membership function. White pixels in the third image represent colors that were saturated in at least one color channel; they are automatically discarded if present in the sample and are always classified as outside by the membership function.
Indoor scene
Sphere of radius 30
Ellipsoid scaled to one standard deviation
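The two membership functions compared in the images above could be sketched as follows. This is a minimal version under stated assumptions: pixels are 8-bit RGB (so a channel value of 255 counts as saturated), and the ellipsoid is described by a mean, a 3x3 matrix of principal axes stored as columns, and per-axis standard deviations. All function and parameter names are hypothetical.

```python
import numpy as np

def unsaturated(pixels):
    """True where no channel is saturated (assumes 8-bit channels)."""
    return (np.asarray(pixels) < 255).all(axis=-1)

def inside_sphere(pixels, mean, radius=30.0):
    """Sphere test: within `radius` of the sample mean in RGB space."""
    pixels = np.asarray(pixels, dtype=float)
    inside = np.linalg.norm(pixels - mean, axis=-1) <= radius
    return inside & unsaturated(pixels)

def inside_ellipsoid(pixels, mean, axes, sigmas, scale=1.0):
    """Ellipsoid test: within `scale` standard deviations along the PC axes."""
    pixels = np.asarray(pixels, dtype=float)
    coords = (pixels - mean) @ axes                   # coordinates along each axis
    normalized = coords / (scale * np.maximum(sigmas, 1e-6))
    inside = (normalized ** 2).sum(axis=-1) <= 1.0
    return inside & unsaturated(pixels)
```

Both functions accept either a list of pixels of shape (N, 3) or a whole image of shape (H, W, 3) and return a boolean mask of the matching shape.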
We can use a membership function for tracking by initializing a small (128 x 128 pixels) tracking window centered on the mean image location of the color sample clicks. Each tracking cycle, every pixel in the tracking window is evaluated by the membership function. The mean image location of the pixels classified as inside (red pixels) is used as the center of the tracking window in the next cycle.
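One tracking cycle as described above might look like the sketch below. It assumes the frame is a NumPy array of shape (H, W, 3), that `classify` is any function returning a boolean mask for a patch of pixels, and that the window is clamped to the frame borders; the clamping and the `None` return for an empty window are choices made for this example rather than details from the original system.

```python
import numpy as np

def track_step(image, center, classify, window=128):
    """Run one tracking cycle.

    center is (row, col) of the current tracking window. Returns the mean
    (row, col) of the pixels classified as inside, or None if there are none.
    """
    height, width = image.shape[:2]
    half = window // 2
    # Top-left corner of the window, clamped so the window stays in the frame.
    row0 = int(np.clip(center[0] - half, 0, max(height - window, 0)))
    col0 = int(np.clip(center[1] - half, 0, max(width - window, 0)))
    patch = image[row0:row0 + window, col0:col0 + window]
    rows, cols = np.nonzero(classify(patch))
    if rows.size == 0:
        return None                     # nothing matched: the target is lost
    return (row0 + rows.mean(), col0 + cols.mean())
```

Calling `track_step` repeatedly, feeding each returned center back in as the next `center`, reproduces the loop described above.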
Here, a red latex ball is being tracked using a PC ellipsoid as the membership function. The X Window display slows things down, but it gives you an idea of how much information the tracker has to work with.



Drawing directly to the monitor is much faster. Furthermore, by keeping a record of the tracked ball's consecutive positions, we can gain additional knowledge for use as input to another program.



Face Tracking

This technique also works well for tracking faces. Here's what is segmented in a sample image; note the false positives caused by highlights in my hair.



With the camera mounted on a Directed Perception PTU-46 pan-tilt unit, the tracker can follow a person as they walk around the room. This is done by moving the camera incrementally in the direction that will bring the tracking window back to the center of the 640 x 480 video signal. Here are four images taken from a continuous sequence; note the changes in scale, orientation, and incident illumination of the tracked face.
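The incremental centering rule can be sketched independently of any pan-tilt hardware. The function below only decides which way to nudge the camera; the dead band, the step size, and the sign conventions (positive pan turns right, positive tilt turns up) are all assumptions for illustration, and no actual PTU-46 commands are shown.

```python
def pan_tilt_step(window_center, frame_size=(640, 480), deadband=20, step=1):
    """Return (pan, tilt) increments that move the window toward frame center.

    window_center is (x, y) in pixels, with y increasing downward.
    Offsets smaller than `deadband` pixels produce no motion.
    """
    x, y = window_center
    dx = x - frame_size[0] / 2          # positive: target is right of center
    dy = y - frame_size[1] / 2          # positive: target is below center
    pan = 0 if abs(dx) <= deadband else (step if dx > 0 else -step)
    tilt = 0 if abs(dy) <= deadband else (-step if dy > 0 else step)
    return pan, tilt
```

For example, a tracking window centered at (500, 240) would yield a one-step pan to the right and no tilt.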




Refinding

Occasionally the tracker will lose track of its quarry. This may be due to excessive speed, occlusion, or distraction. Distraction depends solely on the discrimination ability of the membership function, but the first two cases can often be resolved by repeatedly resampling the entire image at a lower resolution. Pixels satisfying the color membership function in this low-resolution image are weighted according to the inverse of their distance from the point of loss and how clustered they are, yielding hypotheses for the new location of the object. When a sufficiently strong hypothesis is found, tracking is restarted at that point.
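A simple version of this refinding step is sketched below. The text above does not specify the exact weighting, so this example makes several assumptions: the image is subsampled by taking every `stride`-th pixel, clustering is measured as the number of matching pixels in each 3 x 3 neighborhood, and the score is that count divided by one plus the distance from the point of loss. The names `refind`, `stride`, and `min_score` are hypothetical.

```python
import numpy as np

def refind(image, classify, lost_at, stride=8, min_score=0.5):
    """Search a subsampled image for the strongest relocation hypothesis.

    lost_at is the (row, col) where tracking failed. Returns a full-resolution
    (row, col) at which to restart tracking, or None if no hypothesis is
    strong enough.
    """
    small = image[::stride, ::stride]               # low-resolution resample
    mask = classify(small).astype(float)
    h, w = mask.shape
    # Clustering: count matching pixels in each 3x3 neighborhood.
    padded = np.pad(mask, 1)
    neighbors = sum(padded[r:r + h, c:c + w] for r in range(3) for c in range(3))
    # Distance from the point of loss, in low-resolution pixels.
    rows, cols = np.indices(mask.shape)
    dist = np.hypot(rows * stride - lost_at[0], cols * stride - lost_at[1]) / stride
    score = mask * neighbors / (1.0 + dist)
    best = np.unravel_index(np.argmax(score), score.shape)
    if score[best] < min_score:
        return None
    return (int(best[0]) * stride, int(best[1]) * stride)
```

Running this once per frame while the target is lost, and restarting the tracking window at the returned point, matches the recovery loop described above.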
With failure recovery, tracking is fast and robust enough to track tossed or bouncing balls. In the following figure, tracking was initialized on the red latex ball, which was then removed from view. The ball was then thrown into the camera's view without warning; the tracker reacquired it before it hit the ground. The red line shows the ball's trajectory as it was subsequently tracked.



In the next figure, the red ball is being tracked as it and a blue ball are juggled (the camera is turned on its side). Note that the trajectory line has gaps: at some points in its arc the ball was moving too fast to track, and it was picked up again only after it slowed down enough. Also, the act of catching the ball often obscures it enough to interrupt tracking; this accounts for the disconnectedness of the line between catches and throws. When the image was grabbed, the blue ball was in front of the juggler's nose and the red ball was falling into his left hand.



Future Work

Face tracking is just flesh tracking unless geometry or other information is incorporated. Other body parts or other people's faces may distract the tracker.