Multi-sensor fusion for human-robot interaction in crowded environments
[Thesis]
McKeague, Stephen John
Yang, Guang-Zhong
Imperial College London
2015
Ph.D.
Imperial College London
2015
For challenges associated with the ageing population, robot assistants are becoming a promising solution. Human-Robot Interaction (HRI) allows a robot to understand the intention of humans in an environment and react accordingly. This thesis proposes HRI techniques to facilitate the transition of robots from lab-based research to real-world environments. The HRI aspects addressed in this thesis are illustrated in the following scenario: an elderly person, engaged in conversation with friends, wishes to attract a robot's attention. This composite task consists of many problems. The robot must detect and track the subject in a crowded environment. To engage with the user, it must track their hand movement. Knowledge of the subject's gaze would ensure that the robot doesn't react to the wrong person. Understanding the subject's group participation would enable the robot to respect existing human-human interaction. Many existing solutions to these problems are too constrained for natural HRI in crowded environments. Some require initial calibration or static backgrounds. Others deal poorly with occlusions, illumination changes, or real-time operation requirements. This work proposes algorithms that fuse multiple sensors to remove these restrictions and increase the accuracy over the state-of-the-art. The main contributions of this thesis are: A hand and body detection method, with a probabilistic algorithm for their real-time association when multiple users and hands are detected in crowded environments; An RGB-D sensor-fusion hand tracker, which increases position and velocity accuracy by combining a depth-image based hand detector with Monte-Carlo updates using colour images; A sensor-fusion gaze estimation system, combining IR and depth cameras on a mobile robot to give better accuracy than traditional visual methods, without the constraints of traditional IR techniques; A group detection method, based on sociological concepts of static and dynamic interactions, which incorporates real-time gaze estimates to enhance detection accuracy.