Academic Journals Database
Disseminating quality-controlled scientific knowledge

Statistical Lip-Appearance Models Trained Automatically Using Audio Information

Author(s): Daubias Philippe | Deléglise Paul

Journal: EURASIP Journal on Advances in Signal Processing
ISSN: 1687-6172

Volume: 2002
Issue: 11
Start page: 720534
Date: 2002

Keywords: lip-appearance model | lip-shape model | automatic lip-region labeling | artificial neural networks | dynamic time warping | audio-visual corpora

ABSTRACT
We aim to model the appearance of the lower face region to assist visual feature extraction for audio-visual speech processing applications. In this paper, we present a neural-network-based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner-mouth classes. Training this model requires labeled examples, and we propose to label images automatically by employing a lip-shape model and a red-hue energy function. To improve lip-tracking performance, we propose to use blue marked-up image sequences of the same subject uttering the same sentences as in the natural, non-marked-up ones. The lip shapes easily extracted from the blue images are then mapped to the natural ones using acoustic information. The resulting lip-shape estimates simplify lip-tracking on the natural images, as they reduce the dimensionality of the parameter space in the red-hue energy minimization, thus yielding better estimates of contour shape and location. We applied the proposed method to a small audio-visual database of three subjects, achieving pixel-classification errors of around 6%, compared to 3% for hand-placed contours and 20% for filtered red-hue.
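The central step of the automatic labeling procedure is the audio-based transfer of lip shapes: the blue marked-up and natural recordings of the same sentence are aligned frame by frame with dynamic time warping (DTW) over their acoustic features, and the lip-shape parameters extracted from the blue sequence are carried over along the warping path. The Python sketch below illustrates this idea only; the function names, the Euclidean local cost, the standard symmetric DTW moves, and the per-frame lip-shape representation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dtw_path(X, Y):
    """Dynamic time warping between two acoustic feature sequences.

    X: (n, d) features of the blue (marked-up) recording
    Y: (m, d) features of the natural recording
    Returns the optimal alignment path as a list of (i, j) frame pairs.
    """
    n, m = len(X), len(Y)
    # Local cost: Euclidean distance between frame-level feature vectors.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with the standard (match, insertion, deletion) moves.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def transfer_lip_shapes(blue_shapes, blue_audio, natural_audio):
    """Map lip-shape parameters extracted from the blue sequence onto the
    natural sequence via the audio alignment (hypothetical helper)."""
    path = dtw_path(blue_audio, natural_audio)
    estimates = [None] * len(natural_audio)
    for i, j in path:
        # When several blue frames align to one natural frame, the last one
        # wins here; averaging them would be an equally reasonable choice.
        estimates[j] = blue_shapes[i]
    return estimates
```

Under this reading, the transferred shapes serve only as initial estimates: they constrain the red-hue energy minimization on the natural images to a lower-dimensional search, which the abstract credits for the improved contour shape and location estimates.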