Introduction
The basic idea is to sample an object's color by grabbing a small (16 x 16 pixel)
window of the image where the user clicks. This can be done several times to capture
light-dark variation. In color space (RGB here), these sampled pixels form a cluster
of points. If the sampled object is of fairly uniform color and is nearly Lambertian,
theory tells us that the cluster will lie along a single line between the origin
(black) and the color reflected from the object (mostly the object's intrinsic color).
The points will not be perfectly collinear because of variation in the object's color
and camera noise, so we use principal component (PC) analysis to compute the
parameters of a bounding ellipsoid for the sample points. This ellipsoid is
essentially a thickened version of a segment of the theoretical line.
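The sampling and fitting steps just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original code: images are assumed to be lists of rows of (r, g, b) tuples, the function names are hypothetical, and the eigen-decomposition of the 3x3 covariance matrix uses the classical cyclic Jacobi method so that the sketch needs only the standard library. The factor `k` scales each semi-axis to `k` standard deviations along its principal component.

```python
import math


def sample_window(image, cx, cy, size=16):
    """Return the RGB pixels of a size x size window centered on a click at (cx, cy).

    `image` is assumed to be a list of rows of (r, g, b) tuples; the window is
    clipped at the image border.
    """
    half = size // 2
    h, w = len(image), len(image[0])
    return [image[y][x]
            for y in range(max(0, cy - half), min(h, cy + half))
            for x in range(max(0, cx - half), min(w, cx + half))]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def symmetric_eigen(a, sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric 3x3 matrix.

    Returns (eigenvalues, v) where column i of v is the eigenvector
    for eigenvalues[i].
    """
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off < 1e-20:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # Choose the rotation angle that zeroes the (p, q) entry.
            theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            rot = [[float(i == j) for j in range(3)] for i in range(3)]
            rot[p][p] = rot[q][q] = c
            rot[p][q], rot[q][p] = s, -s
            rot_t = [list(col) for col in zip(*rot)]
            a = _matmul(rot_t, _matmul(a, rot))  # a <- R^T a R
            v = _matmul(v, rot)                  # accumulate eigenvectors
    return [a[i][i] for i in range(3)], v


def fit_pc_ellipsoid(samples, k=1.0):
    """Fit a PC bounding ellipsoid to RGB samples.

    Returns (mean, axes, semi_axes): axes[i] is the i-th principal direction
    (unit vector, largest variance first) and semi_axes[i] is k standard
    deviations along it.
    """
    n = len(samples)
    mean = [sum(p[i] for p in samples) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in samples) / n
            for j in range(3)] for i in range(3)]
    values, vectors = symmetric_eigen(cov)
    order = sorted(range(3), key=lambda i: values[i], reverse=True)
    axes = [[vectors[r][i] for r in range(3)] for i in order]
    semi_axes = [k * math.sqrt(max(values[i], 0.0)) for i in order]
    return mean, axes, semi_axes
```

Samples from several clicks can simply be concatenated before fitting, for example `fit_pc_ellipsoid(sample_window(img, 40, 60) + sample_window(img, 44, 90))`. For a well-behaved sample, `axes[0]` should point roughly from black toward the object's color, matching the line described above.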
Once the bounding ellipsoid has been computed, it can be used as a
membership function. Pixels captured later can be tested to see whether
they are inside or outside the ellipsoid representing the object color.
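Testing membership in the ellipsoid does not require the explicit principal axes. A point x lies inside the ellipsoid whose semi-axes are k standard deviations along each principal component exactly when its squared Mahalanobis distance from the sample mean, (x - mean)^T inv(C) (x - mean) with C the sample covariance, is at most k^2. The sketch below (hypothetical names, pure Python) relies on that equivalence. The small `ridge` added to the covariance diagonal is an assumption of this sketch: it keeps the inverse defined for degenerate samples, at the cost of inflating every axis slightly.

```python
def fit_ellipsoid_membership(samples, k=1.0, ridge=1.0):
    """Build a membership function for the k-sigma PC ellipsoid of `samples`.

    A pixel x is inside when (x - mean)^T inv(cov) (x - mean) <= k^2, which is
    the same test as summing (projection_i / (k * sigma_i))^2 over the
    principal axes. `ridge` (assumed value) is added to the covariance
    diagonal so that the inverse always exists.
    """
    n = len(samples)
    mean = [sum(p[i] for p in samples) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in samples) / n
            + (ridge if i == j else 0.0) for j in range(3)] for i in range(3)]
    # Signed cofactors of a 3x3 matrix via cyclic indices; inverse = adjugate / det.
    cof = [[cov[(i + 1) % 3][(j + 1) % 3] * cov[(i + 2) % 3][(j + 2) % 3]
            - cov[(i + 1) % 3][(j + 2) % 3] * cov[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    det = sum(cov[0][j] * cof[0][j] for j in range(3))
    inv = [[cof[j][i] / det for j in range(3)] for i in range(3)]

    def member(pixel):
        d = [pixel[i] - mean[i] for i in range(3)]
        m2 = sum(d[i] * inv[i][j] * d[j] for i in range(3) for j in range(3))
        return m2 <= k * k

    return member
```

With k = 1 this corresponds to the ellipsoid scaled to one standard deviation. Per pixel, the test costs only a 3-vector difference and a 3x3 quadratic form, so it stays cheap enough to run on every pixel of a tracking window.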
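Another possible membership function, easier to construct and faster to
evaluate, is a sphere of some radius centered at the mean (in RGB space) of the
sample pixels. This is inadequate in most situations, though: the sample cluster is
elongated along the light-dark line, so a sphere small enough to exclude nearby
colors cuts off the ends of the cluster, while one large enough to cover the cluster
also admits many colors that lie well off the line.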
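For comparison, the sphere membership function takes only a few lines. This sketch uses a hypothetical function name, needs Python 3.8 or later for `math.dist`, and defaults to the radius of 30 RGB units used in the figure.

```python
import math


def fit_sphere_membership(samples, radius=30.0):
    """Membership test: inside a sphere of `radius` around the mean RGB sample."""
    n = len(samples)
    mean = tuple(sum(p[i] for p in samples) / n for i in range(3))

    def member(pixel):
        # Euclidean distance in RGB space, the same in every direction.
        return math.dist(pixel, mean) <= radius

    return member
```

Because the distance is isotropic, the sphere cannot follow the elongated light-dark spread: the brightest and darkest pixels of a typical sample can fall outside it, while an off-line color near the mean is accepted.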
The following images give an idea of the discrimination ability of the above
membership functions in one particular indoor scene. I took a sample from
just below my right eye in the first image. In the second
and third images, pixels drawn in red are those that have been classified as
inside the computed membership function. White pixels in the third image
represent colors that were saturated in at least one color channel; they are
automatically discarded if present in the sample and are always classified as outside
by the membership function.
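The saturation rule can be expressed as a filter applied to the sample and a wrapper around any membership function. This sketch assumes 8-bit channels, so a channel value of 255 is treated as saturated; the function names are hypothetical.

```python
def is_saturated(pixel, max_value=255):
    """True if any channel is at the sensor maximum (8-bit channels assumed)."""
    return any(c >= max_value for c in pixel)


def drop_saturated(samples, max_value=255):
    """Discard saturated pixels from a color sample before fitting."""
    return [p for p in samples if not is_saturated(p, max_value)]


def reject_saturated(member, max_value=255):
    """Wrap a membership function so that saturated pixels are always outside."""
    def wrapped(pixel):
        return not is_saturated(pixel, max_value) and member(pixel)
    return wrapped
```

Treating saturated pixels as outside is conservative: a clipped channel no longer carries reliable color information, so such a pixel cannot be placed on the light-dark line.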

Figure: (1) indoor scene; (2) sphere of radius 30; (3) ellipsoid scaled to one standard deviation.

We can use a membership function for tracking by initializing a small (128 x 128 pixel)
tracking window centered on the mean image location of the color sample clicks.
In each tracking cycle, every pixel in the tracking window is evaluated by the membership
function. The mean image location of the pixels classified as inside (red pixels)
is used as the center of the tracking window in the next cycle.
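One tracking cycle, and a loop that records the successive window centers as a trajectory, might look like the following. It is a sketch with hypothetical names: frames are assumed to be lists of rows of (r, g, b) tuples, `member` is any membership function (for example an ellipsoid test), and the window is clipped at the image border. What to do when no pixel is classified as inside is left to a separate recovery step.

```python
def track_step(image, center, member, size=128):
    """One tracking cycle: classify every pixel in a size x size window around
    `center` and return the mean (x, y) of the inside pixels, or None if none."""
    cx, cy = center
    half = size // 2
    h, w = len(image), len(image[0])
    sum_x = sum_y = count = 0
    for y in range(max(0, cy - half), min(h, cy + half)):
        row = image[y]
        for x in range(max(0, cx - half), min(w, cx + half)):
            if member(row[x]):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (round(sum_x / count), round(sum_y / count))


def track(frames, center, member, size=128):
    """Run track_step over successive frames, recording the trajectory."""
    trajectory = []
    for frame in frames:
        new_center = track_step(frame, center, member, size)
        if new_center is None:
            break  # lost track; a refinding step would take over here
        center = new_center
        trajectory.append(center)
    return trajectory
```

Only the pixels inside the window are classified, so the per-cycle cost is fixed by the window size rather than by the full image size.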
Here, a red latex ball is being tracked using a PC ellipsoid as the membership
function. The X Window display slows things down, but it gives you an idea of how much
information the tracker has to work with.

Drawing directly to the monitor is much faster. Furthermore, by keeping a record of
the tracked ball's consecutive positions, we obtain its trajectory, which can be
used as input to another program.

Face-tracking
This technique also works well for tracking faces. Here's what is segmented in a
sample image; note the false positives caused by highlights in my hair.

With the camera mounted on
a Directed Perception
PTU-46 pan-tilt unit, the tracker can follow a person as they walk around the room.
This is done by moving the camera
incrementally in the direction that will bring the tracking window back to the
center of the 640 x 480 video signal. Here are four images taken from a
continuous sequence; note the changes in scale, orientation, and incident illumination
of the tracked face.
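The camera-steering rule can be sketched as a function that turns the tracking window's offset from the frame center into pan and tilt increments. The fixed step size, the dead band (which keeps the camera from jittering when the face is already near the center), and the sign conventions are all assumptions of this sketch, and the function name is hypothetical; the actual commands sent to the PTU-46 are not shown.

```python
def pan_tilt_step(window_center, frame_size=(640, 480), step=(10, 10), deadband=20):
    """Return (pan, tilt) increments that move the camera toward re-centering
    the tracking window.

    Assumed conventions: positive pan turns the camera right, positive tilt
    turns it up, and image y grows downward. The units of `step` are whatever
    the pan-tilt unit accepts (hypothetical here).
    """
    dx = window_center[0] - frame_size[0] / 2
    dy = window_center[1] - frame_size[1] / 2
    pan = 0 if abs(dx) <= deadband else (step[0] if dx > 0 else -step[0])
    tilt = 0 if abs(dy) <= deadband else (-step[1] if dy > 0 else step[1])
    return pan, tilt
```

Moving by small fixed increments each cycle, rather than jumping straight to the target, matches the incremental motion described above and limits how far the scene shifts between frames.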


Refinding
Occasionally the tracker will lose track of its quarry. This may be due
to excessive speed, occlusion, or distraction (locking onto another region of similar
color). Distraction depends solely on the discrimination ability of the membership
function, but the first two cases can often be resolved by repeatedly resampling the
entire image at a lower resolution. Pixels satisfying the color membership function in
this low-resolution image are weighted
according to the inverse of their distance from the point of loss and how clustered
they are, yielding hypotheses for the new location of the object. When a
sufficiently strong hypothesis is found, tracking is restarted at that point.
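A refinding pass along these lines is sketched below. The exact weighting is not specified above, so the scoring here is an assumption: each low-resolution hit counts its neighboring hits (clustering), and that count is divided by one plus the hit's distance from the point of loss, measured in low-resolution cells. The function name and the stride, neighborhood radius, and acceptance threshold are hypothetical.

```python
import math


def refind(image, lost_at, member, stride=8, radius=2, threshold=1.0):
    """Look for the lost object in a subsampled copy of the whole image.

    Returns the (x, y) of the strongest hypothesis, or None if no hypothesis
    scores at least `threshold`.
    """
    h, w = len(image), len(image[0])
    # Low-resolution pass: test only every `stride`-th pixel in each direction.
    hits = [(x, y) for y in range(0, h, stride) for x in range(0, w, stride)
            if member(image[y][x])]
    reach = radius * stride
    best, best_score = None, 0.0
    for x, y in hits:
        # Clustering: number of other hits within `radius` low-res cells.
        neighbors = sum(1 for u, v in hits
                        if (u, v) != (x, y)
                        and abs(u - x) <= reach and abs(v - y) <= reach)
        # Inverse-distance weighting, with distance in low-res cells.
        cells = math.dist((x, y), lost_at) / stride
        score = neighbors / (1.0 + cells)
        if score > best_score:
            best, best_score = (x, y), score
    return best if best_score >= threshold else None
```

Under this scoring an isolated matching pixel scores zero no matter how close it is to the point of loss, so stray false positives do not restart the tracker; compact patches of matching color near the point of loss score highest. Calling `refind` on each new frame until it returns a position gives the repeated resampling described above.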
With failure recovery, tracking is fast and robust enough to track tossed or
bouncing balls. In the following figure, tracking was initialized on the red latex
ball, which was then removed from view. The ball was then thrown into the camera's
view without warning; the tracker reacquired it before it hit the ground. The red line
shows the ball's trajectory as it was subsequently tracked.

In the next figure, the red ball is being tracked as it and a blue ball are juggled
(the camera is turned on its side).
Note that the trajectory line has gaps: at some points in its arc the ball moved too
fast to track, and it was picked up again where it slowed down. Also, the act of
catching the ball often obscures it enough to interrupt tracking; this accounts for the
breaks in the line between catches and throws. When the image was grabbed, the blue
ball was in front of the juggler's nose and the red ball was falling into his left hand.

Future Work
Face tracking is really just flesh-color tracking unless geometry or other
information is incorporated, so other body parts or other people's faces
may distract the tracker.
