AI Can Now Animate Characters From Speech

NVIDIA reported that researchers from the Max Planck Institute for Intelligent Systems have created an end-to-end deep learning model that takes any speech signal as input and uses it to realistically animate a wide range of adult faces.

“There is an extensive literature on estimating 3D face shape, facial expressions, and facial motion from images and videos. Less attention has been paid to estimating 3D properties of faces from sound,” the team wrote in their paper. “Understanding the correlation between speech and facial motion thus provides additional valuable information for analyzing humans, particularly if visual data are noisy, missing, or ambiguous.”

The team used a dataset of 12 subjects and 480 sequences of about 3-4 seconds each to train their deep neural network model, VOCA (Voice Operated Character Animation), on NVIDIA Tesla GPUs with the cuDNN-accelerated TensorFlow deep learning framework.
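To make the setup concrete, here is a minimal sketch of the general idea behind a speech-driven facial animation model in TensorFlow: regress per-frame 3D vertex offsets for a template face mesh from a short window of speech features. This is not the authors' exact VOCA architecture; the window length, feature dimension, and layer sizes are illustrative assumptions (the vertex count matches the FLAME template mesh that VOCA animates).

```python
# Illustrative sketch, not the published VOCA network.
import tensorflow as tf

NUM_VERTICES = 5023   # FLAME template mesh vertex count
AUDIO_WINDOW = 16     # frames of speech features per prediction (assumed)
FEATURE_DIM = 29      # e.g. per-frame speech-recognition features (assumed)

def build_speech_to_face_model():
    """Map a short window of speech features to 3D vertex displacements."""
    speech = tf.keras.Input(shape=(AUDIO_WINDOW, FEATURE_DIM), name="speech")
    # Temporal convolutions aggregate context across the audio window.
    x = tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same")(speech)
    x = tf.keras.layers.Conv1D(128, 3, activation="relu", padding="same")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    # Dense decoder outputs an (x, y, z) offset for every template vertex;
    # adding the offsets to a neutral face mesh yields the animated frame.
    offsets = tf.keras.layers.Dense(NUM_VERTICES * 3, name="vertex_offsets")(x)
    offsets = tf.keras.layers.Reshape((NUM_VERTICES, 3))(offsets)
    return tf.keras.Model(speech, offsets)

model = build_speech_to_face_model()
model.compile(optimizer="adam", loss="mse")  # L2 loss on vertex positions
model.summary()
```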

“Our goal for VOCA is to generalize well to arbitrary subjects not seen during training,” the researchers pointed out. “Generalization across subjects involves both (i) generalization across different speakers in terms of the audio (variations in accent, speed, audio source, noise, environment, etc.) and (ii) generalization across different facial shapes and motion.”
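One common way to let a single network learn several speaking styles while still accepting arbitrary audio is to condition it on a subject identity code. The helper below is a hedged illustration of that idea, not the paper's exact mechanism; the function name and shapes are assumptions.

```python
# Illustrative subject conditioning for cross-identity generalization.
import numpy as np

def condition_on_subject(speech_feats: np.ndarray,
                         subject_id: int,
                         num_subjects: int) -> np.ndarray:
    """Concatenate a one-hot subject code to every frame of speech features.

    speech_feats: (frames, feature_dim) array of per-frame audio features.
    Returns an array of shape (frames, feature_dim + num_subjects).
    """
    one_hot = np.zeros(num_subjects, dtype=speech_feats.dtype)
    one_hot[subject_id] = 1.0
    tiled = np.tile(one_hot, (speech_feats.shape[0], 1))
    return np.concatenate([speech_feats, tiled], axis=1)

# At inference time, any training identity's code can be paired with an
# unseen speaker's audio to animate the face in that identity's style.
```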

You can learn more about the research here.
