DYGEST - Neural Architecture for DYnamic GESTure Recognition


 
Introduction
 

Gestures are part of everyday natural human communication. They are used as an accompaniment to spoken language and as an expressive medium in their own right.

Recently, there have been strong efforts to develop intelligent, natural interfaces between users and systems based on gesture recognition, which can be used easily and intuitively. Such interfaces can replace common input devices (keyboard, mouse, data glove, etc.) and extend their functionality.
The operational area of such intelligent interfaces covers a broad range of application fields in which an arbitrary system is to be controlled by an external user or in which system and user have to interact directly.

For the recognition of dynamic gestures, the extraction and subsequent exploitation of motion information is required. Many approaches deal with motion-based recognition: motion patterns are extracted from a sequence of images and then classified. The choice of classifier ranges from template matching and statistical techniques to neural networks and Hidden Markov Models.
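
For illustration only, here is a minimal NumPy sketch of one such motion pattern: frame differencing followed by a coarse grid summary that yields a fixed-length feature vector per frame transition. The function names, threshold, and grid size are assumptions of this example, not part of the project.

import numpy as np

def motion_energy(frames, threshold=15.0):
    """Extract a crude motion mask from a grayscale image sequence.

    frames: array of shape (T, H, W), intensities in [0, 255].
    Returns a (T-1, H, W) boolean array marking pixels whose intensity
    changed by more than `threshold` between consecutive frames.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs > threshold

def motion_feature_vector(frames, grid=(4, 4)):
    """Summarize the motion mask on a coarse grid, yielding one
    fixed-length feature vector per frame transition -- a typical
    input for the classifiers mentioned above.
    H and W must be divisible by the grid dimensions.
    """
    mask = motion_energy(frames)
    t, h, w = mask.shape
    gh, gw = grid
    cells = mask.reshape(t, gh, h // gh, gw, w // gw)
    # Fraction of moving pixels per grid cell, flattened per frame.
    return cells.mean(axis=(2, 4)).reshape(t, -1)
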
One of the crucial problems in the recognition of dynamic gestures is dealing with their varying temporal structure. This results in the need for algorithms that transform a dynamic gesture obtained from an image sequence into a predefined temporal scheme, in order to match the obtained gesture against the stored gesture instances.
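
As a concrete, hypothetical illustration of such a transformation, the following Python sketch aligns an observed gesture to stored templates via dynamic time warping (DTW). The feature representation and function names are assumptions made for this example; as argued further below, the project aims to avoid this normalization step altogether.

import numpy as np

def dtw_distance(query, template):
    """Dynamic time warping: align two feature sequences of possibly
    different lengths and return the accumulated alignment cost.

    query:    (T1, D) feature sequence of the observed gesture
    template: (T2, D) stored gesture instance
    """
    t1, t2 = len(query), len(template)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(query[i - 1] - template[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[t1, t2]

def classify(query, templates):
    """Nearest-template classification under the DTW distance.
    templates: dict mapping gesture label -> (T2, D) sequence."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

Note that the double loop makes DTW quadratic in the sequence lengths, which is one reason such matching becomes costly for large gesture vocabularies.
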
So far, the methods proposed for gesture recognition have been tested on very small sets of simple gestures and thus have very limited scope. The open problem is how to design a system that can work with a large "vocabulary" of gestures and remain user-independent.

Aims
 

The final outcome of this project is a highly structured neural architecture capable of detecting a potential user within the operational area of the system, recognizing and interpreting predefined static gestures (poses), and visually guiding the navigation of the system according to the user's intention, conveyed via the corresponding gestures.

Project Description
 

One part of the project will deal with neural mechanisms for motion-based saliency to improve the robust detection of a potential user in an unknown operational area. These mechanisms will complement a neural architecture for gesture-based human-machine interaction which has been developed in the research project GESTIK.
The first step involves the robust detection of motion in order to support the localization of a potential user in an unknown indoor environment. These mechanisms are to interact with modules working in parallel that detect skin color, facial structure, and the structure of a head-shoulder area. All these modules drive a neural saliency system for the localization of a user.
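
As a rough sketch of how such parallel cues could drive a common saliency map, the following Python fragment fuses normalized cue maps by a weighted sum and picks the most salient location. The fusion scheme and all names here are illustrative assumptions, not the project's actual neural saliency system.

import numpy as np

def fuse_saliency(cue_maps, weights=None):
    """Combine per-cue maps (e.g. motion, skin color, facial structure,
    head-shoulder contour) into one saliency map.

    cue_maps: dict of name -> (H, W) array. Each map is normalized
    to [0, 1] before the weighted sum, so no single cue dominates.
    """
    if weights is None:
        weights = {name: 1.0 for name in cue_maps}
    fused = None
    for name, cue in cue_maps.items():
        span = cue.max() - cue.min()
        norm = (cue - cue.min()) / span if span > 0 else np.zeros_like(cue)
        contrib = weights[name] * norm
        fused = contrib if fused is None else fused + contrib
    return fused / sum(weights.values())

def locate_user(saliency):
    """Return the (row, col) of the most salient location as the
    hypothesized user position."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)
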
The second and main part of the proposed research project deals with the motion-based description and recognition of dynamic gestures. This results from the need to extend gesture-based interaction from static to dynamic gestures in order to obtain a highly flexible, more natural, and instructive interaction scheme.
A neural architecture has to be designed and implemented that is capable of obtaining a sufficient, gesture-relevant description of the images within the (video) sequence, and of representing the current dynamic gesture in such a way that efficient matching against stored gesture instances can be performed.
By using a hybrid of Hidden Markov Models and neural networks, it should be possible to model dynamic gestures while taking their temporal variations into account. This would avoid the need for time warping or related methods that transform the current dynamic gesture into a predefined temporal structure. Furthermore, only in this way would it be possible to perform dynamic gesture recognition in real time on conventional computers.
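
The following Python sketch illustrates, under loose assumptions, how such a hybrid might score a gesture: a neural network supplies per-frame state scores (e.g. scaled posteriors), and the HMM forward algorithm integrates them over time, with self-transitions absorbing temporal variation so that no explicit time warping is needed. The scaled-posterior convention and the shared state inventory assumed in recognize() are choices made for this example only.

import numpy as np

def forward_log_likelihood(frame_scores, log_trans, log_prior):
    """Forward algorithm for one gesture HMM in a hybrid HMM/NN setup.

    frame_scores: (T, S) per-frame log state scores, e.g. a neural
        network's log-posteriors scaled by state priors, one row
        per frame.
    log_trans: (S, S) log transition matrix of the gesture model.
    log_prior: (S,) log initial state distribution.

    Because self-transitions absorb temporal variation, no explicit
    time warping of the input sequence is required.
    """
    log_alpha = log_prior + frame_scores[0]
    for t in range(1, len(frame_scores)):
        # Log-sum-exp over predecessor states, vectorized over successors.
        m = log_alpha.max()
        log_alpha = (np.log(np.exp(log_alpha - m) @ np.exp(log_trans)) + m
                     + frame_scores[t])
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())

def recognize(frame_scores, models):
    """Pick the gesture model with the highest sequence likelihood.
    models: dict of gesture label -> {"log_trans": ..., "log_prior": ...};
    assumes the network's state inventory is shared across models."""
    return max(models, key=lambda g: forward_log_likelihood(
        frame_scores, models[g]["log_trans"], models[g]["log_prior"]))

In contrast to the DTW sketch above, scoring here is linear in the sequence length for a fixed number of states, which is what makes real-time operation on conventional hardware plausible.
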