The Future of Animation and Real-Time Capturing

The Cubic Motion team talked with us about digital animation and their real-time animation technology Persona, which allows digital characters and real actors to be merged.

Introduction

80lv: Guys, could you introduce yourselves to us and tell us more about Cubic Motion?

Cubic Motion was founded in 2009 by a team of Computer Vision scientists and facial animation experts. We are based in Manchester, England, and use our proprietary tracking and solving technology to create facial animation for video games, TV, and film. Recent projects include God of War with Sony Santa Monica and Spider-Man with Insomniac Games. Advances in our real-time technology have led to live demonstrations such as Hellblade (winner of Real-Time Live at SIGGRAPH 2016), MeetMike at SIGGRAPH 2017, Siren at GDC 2018 and ultimately to the development of our real-time animation product, Persona.

Prior to joining Cubic Motion, many of the team members were involved in Image Metrics, which was founded by our CEO Dr Gareth Edwards, working on projects including GTA 4, the Digital Emily Project and The Curious Case of Benjamin Button.

An exhibitor demonstrates Cubic Motion's digital facial animation technology during the Game Developers Conference (GDC) in San Francisco, March 2018. Photographer: David Paul Morris/Bloomberg via Getty Images

The Challenges of Digitizing Humans

80lv: Why are humans so hard to digitize organically? For years, we’ve seen developers trying to do proper mocap, but a lot of times game characters ended up looking more like puppets regardless of the number of polys and the quality of the skin. What are the challenges that make it so complicated?

Humans are hard to digitize because facial movement is complex and non-linear. Rendering a photo-realistic human involves detailed physical models of skin, eyes, and hair. Even if this can be done well, the character must also move and convey emotion in a believable way. Audiences are also very good judges of when a digital face doesn’t look right.

We try to use as much real data as possible (performance capture, 3D and 4D scans) to build high-fidelity models and to simulate the underlying physical processes as accurately as possible. This is of course still an approximation to the real thing, so no matter how many polygons the face has, it will still look wrong if the movement is too simplistic. Overcoming this linear rather than organic appearance is probably the main challenge for digital characters.


Capturing Data

80lv: One of the key things you’re doing incredibly accurately is getting the data. Could you talk a bit about the way the cameras work, what kind of hardware is used and more importantly, what the key data is? What’s necessary to make the character and the actor become one?

Most of the data we capture is from a head-mounted camera (HMC) system. This allows detailed capture of the facial performance when the actor is moving around. This is a fairly specialized piece of equipment and we offer our own HMC as part of Persona to capture high-quality data. The cameras used are typically high frame rate machine vision cameras, deployed in frontal, frontal stereo or front-side configurations for different use cases. For real-time capture, we use front-side to capture depth information without the complication of stereo reconstruction, which is generally more suitable for offline work. The cameras are typically 60fps machine vision cameras, often using IR illumination and lens filters to ensure consistency of lighting.
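
To make the capture setup above concrete, here is a small illustrative sketch of how such an HMC configuration might be expressed in code. This is not a real Cubic Motion interface; the names and defaults are assumptions, with a front-side layout for real-time work and a stereo layout for offline reconstruction.

```python
# Hypothetical configuration sketch for a head-mounted camera (HMC) rig.
from dataclasses import dataclass
from enum import Enum

class CameraLayout(Enum):
    FRONTAL = "frontal"
    FRONTAL_STEREO = "frontal_stereo"   # offline: stereo reconstruction
    FRONT_SIDE = "front_side"           # real-time: depth cue without stereo

@dataclass
class HMCConfig:
    layout: CameraLayout
    frame_rate_hz: int = 60             # typical machine vision camera rate
    ir_illumination: bool = True        # IR light + lens filters for consistent lighting

realtime_rig = HMCConfig(layout=CameraLayout.FRONT_SIDE)
offline_rig = HMCConfig(layout=CameraLayout.FRONTAL_STEREO)
```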

Facial Rigging

80lv: After you get all the data, how big is the rig? It would be great to have a better look at how complex the result can turn out to be. You’ve shown an example of Gary Oldman’s Star Citizen character – it’s amazing how much stuff you get just for the lips and mouth movement alone. It would be awesome to hear from you about the elements that you think are of vital importance.

Whilst a facial rig can have thousands of polygons and hundreds of joints and blend shapes, the number of degrees of freedom representing facial expressions is much smaller and related to the activation of facial muscles. Many rigs are loosely based around the Facial Action Coding System (FACS) and typically have around 100-150 animation controls.

Our task in measuring the actor’s facial movements is to drive these controls as accurately as possible. We typically track around 200-400 points on the face in order to do this. We focus much of this on the shape of the lips, particularly the inner lip line. This is a difficult feature to track because it is an occluding boundary and varies a lot in appearance. The single most important thing to measure is the position of the jaw, as much of the lower face movement is related to it. We do this in some cases by explicitly tracking the teeth, using tools to estimate and interpolate when they are covered by the lips. Where it is difficult to track the teeth, for example in a real-time digital performance, we use additional information from a side camera to help infer the position of the jaw.
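
As an illustration of the jaw-tracking idea, the following minimal sketch estimates a normalised jaw-opening value from tracked upper and lower teeth points and interpolates across frames where the teeth are hidden by the lips. This is an assumed simplification, not Cubic Motion's pipeline; the point names and scaling constants are hypothetical.

```python
import numpy as np

def jaw_opening(upper_teeth_y: float, lower_teeth_y: float,
                rest_gap: float = 2.0, max_gap: float = 40.0) -> float:
    """Normalise the vertical gap between tracked teeth points to [0, 1]."""
    gap = lower_teeth_y - upper_teeth_y
    return float(np.clip((gap - rest_gap) / (max_gap - rest_gap), 0.0, 1.0))

def fill_occluded(values):
    """Linearly interpolate frames where the teeth could not be tracked (None)."""
    arr = np.array([np.nan if v is None else v for v in values], dtype=float)
    idx = np.arange(len(arr))
    known = ~np.isnan(arr)
    arr[~known] = np.interp(idx[~known], idx[known], arr[known])
    return arr

# Example: four frames of jaw estimates, two of them occluded by the lips.
frames = [jaw_opening(100.0, 104.0), None, None, jaw_opening(100.0, 126.0)]
print(fill_occluded(frames))
```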

Cleanup for Animation

80lv: How do you optimize the data for animation, do the cleanup, and make sure the animation looks accurate? Is it all done by hand, or is there some clever algorithm to it?

The animation can only be as good as the character rig: if the rig isn’t capable of certain movements, they will not be translated into the final animation. For offline work, we are able to check and clean up the data at the tracking and animation stages to give the best results. For real-time systems, we apply the same process to training the tracking and solving models. The solver, which maps the input performance data onto the character controls, defines how that data is interpreted to give the best result, using machine learning and statistical methods.
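
As a rough illustration of what mapping performance data onto character controls can look like in its simplest form, here is a sketch using a plain least-squares fit from tracked points to control values. The dimensions echo the figures quoted earlier, the training data is random placeholder data, and production solvers are far more sophisticated than this.

```python
import numpy as np

NUM_POINTS = 300       # tracked 2D facial points per frame (assumed)
NUM_CONTROLS = 120     # FACS-style rig controls (assumed)

rng = np.random.default_rng(0)

# Placeholder training data standing in for real captured frames:
# flattened (x, y) point positions paired with known control values.
X_train = rng.normal(size=(500, NUM_POINTS * 2))
Y_train = rng.uniform(0.0, 1.0, size=(500, NUM_CONTROLS))

# Fit a least-squares mapping from tracked points to rig controls.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

def solve_frame(points_2d: np.ndarray) -> np.ndarray:
    """Map one frame of tracked points to rig control activations in [0, 1]."""
    controls = points_2d.reshape(-1) @ W
    return np.clip(controls, 0.0, 1.0)

# Example: solve a single (placeholder) frame of tracked points.
frame_controls = solve_frame(rng.normal(size=(NUM_POINTS, 2)))
```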

Real-Time Capturing

80lv: What’s amazing is that your technology can be used not only to capture stuff for CGI but also for real-time. Could you tell us about the tech that lets you capture data in real-time? Is it the cameras or the software? Are the systems like these available for game artists on a wider scale?

The software allows all of this to happen in real-time. Our core, model-based tracking algorithms are able to track hundreds of feature points from multiple cameras at 60fps. Solving this data to the rig controls is even faster, since the complexity of the input image data has already been reduced to a more manageable level. This all runs on a wearable PC to allow actors to drive live performances wirelessly. It forms the basis of our real-time Persona product, which is available now and allows real-time capture of digital performances and on-set pre-visualisation. This opens up opportunities to improve production workflows and delivers immediate animation data without having to wait for an offline animation pass.
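
The overall shape of such a real-time pipeline can be sketched as a simple loop that grabs a frame, tracks points, solves to controls, and streams the result within a 60fps budget. The component interfaces below are hypothetical stand-ins, not Persona's API.

```python
import time
from typing import Callable

FRAME_BUDGET = 1.0 / 60.0   # ~16.7 ms per frame at 60 fps

def capture_loop(grab_frame: Callable, track: Callable, solve: Callable,
                 send: Callable, num_frames: int) -> None:
    for _ in range(num_frames):
        start = time.perf_counter()

        frame = grab_frame()          # image from the HMC
        points = track(frame)         # hundreds of 2D feature points
        controls = solve(points)      # map points to rig control values
        send(controls)                # drive the character live

        # Sleep off any remaining budget so the loop holds a steady 60 fps.
        elapsed = time.perf_counter() - start
        if elapsed < FRAME_BUDGET:
            time.sleep(FRAME_BUDGET - elapsed)

# Placeholder components, just to show the loop shape.
capture_loop(grab_frame=lambda: "frame",
             track=lambda f: [0.0] * 300,
             solve=lambda p: [0.0] * 120,
             send=lambda c: None,
             num_frames=3)
```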


AI Tools

80lv: Do you utilize any AI-powered tools that could help you decode the data? Do you think AI could actually help generate believable animation if provided with a sufficient amount of data?

AI is often used in this context to refer to deep convolutional neural networks (CNNs) and similar algorithms, which have proved successful in many areas. A good example of this being applied to faces is Digital Domain’s Digital Doug work.

This kind of algorithm is just one of a number of approaches to solving Machine Learning problems. We use a range of Machine Learning methods in our trackers and solvers, depending on the number of training examples available and the complexity of the data. Some of these algorithms are much better at handling sparse training data and less susceptible to over-training than CNNs, though deep learning methods are well suited to highly non-linear problems where large amounts of training and test data are available or can be synthetically generated. We also try to leverage prior knowledge using statistical models and physical constraints to simplify the learning problem where possible, in order to minimize the amount of training data required.

So we have really been using AI/Machine Learning to generate believable facial animation for a number of years.
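
One way to read "leveraging prior knowledge using statistical models" is illustrated below: a small PCA model built from example control poses constrains the solver's output to plausible facial configurations, which reduces the amount of training data needed compared with a purely data-driven deep network. This is an assumed, generic example rather than Cubic Motion's actual method.

```python
import numpy as np

def build_pca_prior(control_examples: np.ndarray, num_modes: int = 10):
    """Mean pose plus the dominant modes of variation of the rig controls."""
    mean = control_examples.mean(axis=0)
    centered = control_examples - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_modes]          # (mean, basis) of the prior

def project_to_prior(raw_controls: np.ndarray, mean: np.ndarray,
                     basis: np.ndarray) -> np.ndarray:
    """Snap an unconstrained per-frame solve back onto the learned subspace."""
    coeffs = (raw_controls - mean) @ basis.T
    return mean + coeffs @ basis

# Usage: constrain a noisy per-frame solve with a prior built from
# 200 hypothetical example poses of 120 controls.
rng = np.random.default_rng(1)
examples = rng.uniform(0.0, 1.0, size=(200, 120))
mean, basis = build_pca_prior(examples)
noisy = rng.uniform(0.0, 1.0, size=120)
constrained = project_to_prior(noisy, mean, basis)
```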

Distribution

80lv: How can users get hold of your solver? Are you planning to release it and make it more widely available?

This is currently provided by our support team as part of Persona, ensuring fast turnaround and a high level of tracking and animation quality. Future releases will give users more control to manipulate and set up the solver.

Cubic Motion Team

Interview conducted by Kirill Tokarev
