Performance capture is not animation

Cubic Motion’s Simon Elms explains the important decisions that must be made before considering motion capture
We provide specialist facial animation services to many of the world’s largest games developers. Most of the time, some type of performance capture is part of the pipeline and, as a result, new clients will often inquire about ‘capture technology’, wondering how ours compares to others.

Cubic Motion isn’t a vendor of capture technology. We create solutions – usually service-based – for delivering large volumes of high-end facial animation quickly. We’re not tied to any capture method. So we spend quite a lot of time explaining to developers that they’re on the wrong track if their first task is to survey capture technologies.

The reason is that performance capture isn’t animation. You can have the most powerful capture system ever seen and achieve nothing. Likewise, some developers produce great results without any capture at all. Misunderstanding the role of capture remains a major source of confusion, unnecessary cost, and wasted effort for too many developers.

We’ve always believed that any project in facial animation should start with a vision of what you want to see on the screen. Forget about capture at this stage – just decide how you want it to look. Is it photorealistic? Is it even human? Should it be stylised?

Are you trying to produce perfect replicas of real physical actors, or novel characters?

Once you’ve made this choice, you must move on to facial rigs. There have been a few examples of rig-free facial animation based on dense scanning, but we consider them impractical and far too limiting. Any visual advantage a rig-free system may have had has gone. Modern rigs can achieve the same results and better.


Use great rigs. You won’t make a more important decision, but here’s the unfortunate truth: great facial rigs are very hard to build. A common mistake is to collect a set of scans (FACS poses, for example) and assume that some basic processing will yield a set of blendshapes (or whatever) to do the job. It’s much more complex than that.

A great rig must be thoroughly decomposed into isolated controls so that it’s able to generalise to any shape required. If you’ve got a game to make, and don’t have a brilliant rigging team in place already, our advice would be to outsource it to experts.
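To make the idea of isolated controls concrete, here is a minimal sketch of a linear blendshape rig, where each control adds its own weighted offset to a neutral face. It is an illustration only, not a description of how any production rig is built: the function name, the control names (`jaw_open`, `smile_left`) and the two-vertex "face" are all hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def evaluate_blendshapes(
    neutral: List[Vec3],
    deltas: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Return neutral + sum(weight * delta) for every vertex.

    Assumes a simple linear blendshape model; real rigs are usually
    far more elaborate (correctives, joints, non-linear deformers).
    """
    result = [list(v) for v in neutral]
    for name, weight in weights.items():
        for i, delta in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += weight * delta[axis]
    return [(v[0], v[1], v[2]) for v in result]


# Hypothetical two-vertex "face": vertex 0 is the chin, vertex 1 a mouth corner.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# Each control moves only its own region, so the controls stay isolated.
deltas = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],
    "smile_left": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
}

pose = evaluate_blendshapes(neutral, deltas, {"jaw_open": 0.5, "smile_left": 1.0})
print(pose)
```

Because `jaw_open` only touches the chin and `smile_left` only touches the mouth corner, the two weights can be set independently to reach any combination of the two shapes. Controls that overlap or interfere lose that property, which is one reason raw scans rarely work as a rig without careful decomposition.
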

Once you’re happy with the rigs, and satisfied that your game engine can handle them, it’s time to consider capture options. Most developers tend to choose some form of head-mounted camera system. You can opt for single or multi-camera systems, and various other configurations.

The natural assumption is that more cameras must produce better results, but it’s not that simple. Many factors determine how well performance capture will work, but if you’ve worked on the capture stage itself, you’ll probably realise that stability and ease of use are very important. If a complex system shakes too much, or is uncomfortable to wear, you may find the benefits of extra cameras are outweighed by the problems.

The better your rig, the less complex your capture needs to be. When clients test with us, they’re often (pleasantly) shocked by the quality of results from a single-camera system. In many cases, this quality already exceeds what they expected from a multi-camera system, so they choose the simplest solution.

That’s not to say there isn’t a place for multi-camera capture, or other types of capture – even old-fashioned marker-based mo-cap. The point is simply that you should start with the end in mind, and then work backwards. What do you want to see on-screen? What rig do you need to build to deliver that vision? Then lastly, what type of capture best balances the quality of the data needed to drive that rig against convenience and robustness during filming?

Finally, don’t try to make all these decisions yourself – trust experts who’ve done this time and again.