The folks in the Cognitive Machines Group at The MIT Media Lab have built a robot that allows them to "investigate connections between natural language semantics, perception, and action". The robot, called Ripley, has seven degrees of freedom and many sensors, including touch, vision, gravity, position, and sound. You can talk to Ripley about its world of objects, and Ripley will understand you. There's an interesting paper on how they use a visually grounded language model to handle natural descriptions of scenes. Some sample phrases are:

the green one on the left that's hidden by a purple one

on the left the purple one that's all the way in the corner and it's separate

in the middle towards the right there's a line of purple ones and then
there's a kink in the line and the one that's right where the line turns

the purple one all the way on the right in the front

Using phrases like these, a person can command Ripley to select and manipulate a single object among a group of objects. Ripley can distinguish objects based on size, color, weight (it will weigh objects if it needs to), position, and relation to other objects.

Ripley uses Sphinx-4 for speech recognition. The Cognitive Machines Group has been great to work with and has contributed many good features and enhancements to Sphinx-4.
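
For readers curious about what driving Sphinx-4 from Java looks like, here is a minimal sketch using the high-level API (the edu.cmu.sphinx.api package that ships with later Sphinx-4 releases) and the default bundled US English models. It simply prints live recognition hypotheses from the microphone; it is not Ripley's actual integration, whose grammar and configuration aren't described in this post.

    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.LiveSpeechRecognizer;
    import edu.cmu.sphinx.api.SpeechResult;

    public class LiveRecognitionSketch {
        public static void main(String[] args) throws Exception {
            // Assumed setup: the default US English models bundled with Sphinx-4.
            // Ripley's actual models and grammar are not described in the post.
            Configuration config = new Configuration();
            config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
            config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
            config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

            // Capture audio from the microphone and decode it continuously.
            LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(config);
            recognizer.startRecognition(true);

            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                // Print the best hypothesis for each utterance.
                System.out.println(result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }

A system like Ripley would more likely swap the statistical language model for a domain grammar of spatial phrases like the samples above, but the overall recognize-then-interpret loop is the same.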
