The Missing Key to Sustainable Content
There’s something innately human about wanting to be around other humans. It’s our social programming. When I started playing Robo Recall, as much as I enjoyed the mechanics of telekinesis and catching bullets Matrix-style, I missed the battles against human characters from the game’s earlier life as a demo called Bullet Train. It’s also the feeling of lost context when trying demos or experiences with no one else in them at all. At SXSW, the consensus was that volumetric films will replace 360, given time, falling costs and the advancement of technology. For VR to work like film, the tools need to be accessible, with reasonable post-processing time for the given results. Right now it is easy to give life to almost anything except empathetic characters. I don’t mean controlled avatars, just relatable, animated figures for both tracked narratives and interaction. If we can cross this threshold, we will see content proliferate to every conceivable topic. To deal with the constraints of cost and tech limits, we need to look at the trade-offs we are willing to make to generate sustainable content.
Full Volumetric
If we want interaction AND realism, the options are somewhat limited right now, and expensive. 360 film does not separate actors from the scene, preventing the audience from moving around or interacting beyond today’s coarse methods. Photogrammetry tools like those employed by 8i can give us Buzz Aldrin standing in place at several thousand dollars a minute. He becomes a mesh with texture redrawn at 90 fps. This is no more a simulated technology than a camera filming a documentary could be considered fiction. It is a wonder to feel a familiar face in the same room as you, even looking them in the eye or walking around them as they talk about Mars. The real limitation is the post-processing time and the lack of adjustability found in traditional bone animation. These captures play more or less like little inserted movie clips, but can be triggered by interactive calls in a VR engine. In the next 5 years, I expect to see much refinement of this tech, especially through machine learning to allow interpolation between keyframes, so that the assets could begin to take on more diverse uses, like on-call actors in summer blockbusters. If these tools become available at a consumer level, where semi-pros are able to capture volume vids of their children playing or key milestones in their lives, we may have the next home movie format. Let us also not discount where light-field cameras like Lytro’s are headed.
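To make the keyframe idea concrete, here is a minimal, hypothetical sketch of pose interpolation: joint positions captured at two keyframes are blended to synthesize an in-between frame. The joint names and coordinates are invented for illustration; a real pipeline would use learned models and rotation-aware blending rather than straight linear interpolation.

```python
def lerp(a, b, t):
    """Linear interpolation between two scalar values."""
    return a + (b - a) * t

def interpolate_pose(pose_a, pose_b, t):
    """Blend two captured poses (dicts of joint -> (x, y, z)) at parameter t in [0, 1]."""
    return {
        joint: tuple(lerp(pa, pb, t) for pa, pb in zip(pose_a[joint], pose_b[joint]))
        for joint in pose_a
    }

# Two keyframes from a hypothetical volumetric/mocap capture.
key_0 = {"hand": (0.0, 1.0, 0.0), "elbow": (0.0, 0.5, 0.0)}
key_1 = {"hand": (1.0, 1.0, 0.5), "elbow": (0.5, 0.5, 0.2)}

halfway = interpolate_pose(key_0, key_1, 0.5)
print(halfway["hand"])  # (0.5, 1.0, 0.25)
```

Even this toy version shows why interpolation matters: two expensive captures can yield arbitrarily many in-between frames, which is what would let volumetric assets be re-posed instead of re-shot.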
Motion Capture Made Easy
True-to-life authenticity is not always needed. Some of the most celebrated, relatable TV shows star animated characters. Until recently, motion capture over green screen was a very expensive process. Now, there is software that can turn a DSLR or an Xbox Kinect into a basic body rig. These still require reflector suits, but that’s beginning to change as well. There are pattern-recognition, cloud-driven solutions that are getting better at pulling skeletons out of straight video. If 2 or 3 views are present, the accuracy becomes uncanny. I haven’t seen a good solution for capturing facial details, but I do not see any technical limitations here. Once motion point clouds become mainstream, I expect to see episodic VR animations with exciting new interactive paradigms beginning to emerge. The caution is the ever-present uncanny valley, like the CGI facial recreations in recent Marvel and Lionsgate films and the Ghost in the Shell VR short. These just don’t work, enough that I’d recommend covering the faces up completely. For this format, exaggerated motions and expressions on non-realistic characters seem to be the safe bet.
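The multi-view point can be illustrated with a deliberately idealized sketch: if two orthogonal cameras each report a joint’s 2D position, the 3D location falls out by combining axes. This assumes perfect orthographic cameras at a 90° offset, which is an invented simplification; real solutions solve a calibrated triangulation problem across views.

```python
def triangulate_orthographic(front_view, side_view):
    """Combine two idealized orthogonal views of one joint into a 3D point.

    front_view: (x, y) as seen by a camera facing the subject.
    side_view:  (z, y) as seen by a camera offset 90 degrees.
    """
    x, y_front = front_view
    z, y_side = side_view
    # Both cameras observe the joint's height; average to smooth disagreement.
    return (x, (y_front + y_side) / 2, z)

# A hypothetical wrist joint observed from the front and from the side.
wrist_3d = triangulate_orthographic((0.4, 1.2), (0.1, 1.2))
print(wrist_3d)  # (0.4, 1.2, 0.1)
```

The intuition is the same one driving multi-view skeleton extraction: a single view leaves depth ambiguous, while a second view pins it down, which is why 2 or 3 cameras improve accuracy so dramatically.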
Limited View Reality Capture
Now, something I am surprised has barely been done thus far is the use of masked 2D and 3D footage in a volumetric environment. VR does not have to be 100% free-roaming. Rather, it should aim to maintain quality within some manageable constraints. Imagine a fully rendered living room for the show The Big Bang Theory. The actors could be shot on green screen surrounded by multiple camera angles, which would let the audience see the show from several staging points. This is a fair trade-off of quality in exchange for a simple-to-reproduce new content format with interactive or point-of-view potential. The same idea could be applied to 360 cameras over green screen to capture a smoother, more realistic and possibly stereoscopic vantage point mixed into a photo-real rendered scene with all the bells and whistles that scene could afford. I hope to see volumetric tools like light-field and photogrammetry make the same trade-offs, limiting the viewing angle in favor of preserving quality, cost and time, so that we begin to see consumer-friendly character solutions by the end of the decade.
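The masking step underlying this hybrid format can be sketched with a crude per-pixel chroma key: green-screen pixels in the actor footage are replaced by pixels from the rendered scene. The threshold and pixel values here are invented for illustration; production keyers use far more sophisticated color models and edge handling.

```python
def is_green(pixel, threshold=0.4):
    """Crude chroma-key test: the green channel dominates red and blue."""
    r, g, b = pixel
    return g - max(r, b) > threshold

def composite(foreground, background, threshold=0.4):
    """Replace green-screen pixels in the foreground with the rendered background."""
    return [
        bg if is_green(fg, threshold) else fg
        for fg, bg in zip(foreground, background)
    ]

# One scanline: an actor pixel flanked by green-screen pixels,
# composited over a scanline from the rendered living-room scene.
actor_row = [(0.1, 0.9, 0.1), (0.8, 0.6, 0.5), (0.1, 0.9, 0.1)]
scene_row = [(0.2, 0.2, 0.3), (0.2, 0.2, 0.3), (0.2, 0.2, 0.3)]
print(composite(actor_row, scene_row))
# [(0.2, 0.2, 0.3), (0.8, 0.6, 0.5), (0.2, 0.2, 0.3)]
```

The appeal of the limited-view format is visible even here: because the actors are keyed out as flat layers, the expensive part (the photo-real scene) can be rendered once and reused, while new episodes only require new green-screen footage.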