Overview: My current interests center on
artificial hand/eye coordination and vision-based
robotics. In particular, I'd like to know the best way to tell a robot what to
do and ensure it gets done.
Why worry about how to tell a robot what to do? The motivation comes from the
simple observation that robots can do things and people have things to
do. If these two sides - task specification and task execution -
can communicate, things will get done.
Since robots have always been
told what to do, why isn't the problem already solved? The problem is solved only
for robots that blindly follow commands to move from position to position -
the kinds of robots used to assemble cars, for example. However, the rapidly
maturing field of visual servoing has enabled robotic systems to act on what
they "see" in their environment. When a robot and a human are looking at the same
thing, it isn't obvious how to translate the human's intentions into robotic
action. What is it we want the robot to see? To date, people have borne the entire burden of answering this question. As in the early days of computing, when programmers painstakingly wrote (or punched) each elementary instruction in machine code, visual tasks are often specified by deciding exactly how each visual feature should look when the task is accomplished.
Certainly, a robot will depend on this low level of feature interaction to perform a task successfully, just as machine instructions continue to run the world's computers. Our goal is to push more of the work of generating those task-specific details onto the computer itself - in the spirit of the compilers and development environments which facilitate software programming today.
The following links offer greater detail on our approaches to this project, as well as some related (and unrelated) work in vision-based robotics.
What can be done with a seeing robot? This seems a logical starting point,
and, in fact, the answer to this question provides a structure on performable
visual tasks from which we build a language for specifying them.
A calculus for feature-based manipulation: We use the basic structure
imposed by a system's cameras to determine a family of primitive
performable skills. A set of operators on those skills generates a
language for composing and executing any performable task.
Abstracting to object-level interaction: The promise of a general hand/eye
task language is the ability to specify tasks as object-based
interactions. With appropriate object models available, the system itself can
"compile" this high-level task description into the "machine code" necessary
for accurate execution.
What vision algorithms are needed? The transition between objects and
features, in either direction, is not to be taken lightly. What constitutes an object and what
constitutes a feature are, in general, ill-defined. We accept a group of
shortcuts to get the visual system's job accomplished, but these skirt some
important issues.
The human-computer interface: Some of the vision tools we use to control
the robot can enhance the interface between man and machine. Why tell when you
can show?
Linux Device Driver for the Zebra Zero 6 DOF Robot: I have been developing
an interface to the motor control board of IMI's Zebra Zero Robot under
Linux. This will allow for considerably easier networking than the supplied
DOS-based system. The code is available.
Papers Available: A number of papers on this and other work are available
in PostScript and/or HTML form.