Virtual Reality in Depth: Inside Google’s Quest to Bring Back 3D Video (EXCLUSIVE)

JUMP camera
Courtesy of Google

It all started with a 3D-printed piece of plastic, zip ties and a bunch of masking tape: Google’s Seattle-based computer vision team seemed to channel MacGyver when it began to experiment with capturing 3D video in the fall of 2014, testing out a bunch of different camera setups put together in the form of janky rigs.

Two years later, the results of those early trials have led to JUMP, Google’s 3D video capture and production platform. JUMP is just one part of Google’s efforts to popularize virtual reality; the company also introduced a low-cost VR headset dubbed Daydream View alongside its new phones last week.

One of the first prototypes of Google’s JUMP camera. Photo courtesy of Google

Here’s how these pieces fit together: Daydream wants to get people to use VR. JUMP aims to make VR more immersive, thanks to the power of 3D, while also lowering production costs to the point where many more content producers can shoot 3D video. “Before this, stereo 360 was entirely for high-end visual effects studios,” said JUMP technical lead Sameer Agarwal during a recent interview with Variety.

3D Is Back, Thanks to Virtual Reality

Let’s face it: 3D has become a bit of a dirty word for both Hollywood and consumers. Just a few years ago, the movie industry tied its hopes for the survival of home video to 3D, betting that the excitement for 3D blockbusters would translate to the living room. But 3D TVs largely flopped because of a lack of consumer interest, relegating 3D effects to theaters and big Hollywood blockbusters.

But with growing interest in VR headsets, 3D is getting a kind of second chance — albeit with a very different twist. There are some technical differences to the way 3D works in theaters, on 3D TVs and on VR headsets, but the bigger change is how content creators are using 3D. In VR, 3D is being used as a much more subtle effect to add to the overall sense of immersion — something that neither a TV screen nor a big theater can provide.

Headsets can put viewers in the middle of a refugee camp, on a beach or in a jungle, and make them feel like they’re really there, thanks to the ability to take charge and look around at their own pace. But while computer-generated imagery has long offered 3D experiences in VR, most 360-degree videos are still shot in 2D. Watching them in a headset feels like sitting in a round room without corners, with a movie playing on the entire wall-space.

The science behind Google’s JUMP cameras

Stereoscopic 3D adds depth to this experience by emulating the way human vision works: By relying on images from two eyes with slightly different vantage points, the human mind adds depth to what it sees.

However, the JUMP team found out quickly that capturing 3D with video cameras for VR was rather tricky. Our eyes can add 3D depth to anything around us because we can simply turn our head. Video cameras, on the other hand, have to be static, pointing towards a set direction, especially if the resulting video is supposed to allow viewers to explore a scene freely, instead of directing them where to look.

VR video producers sometimes use wide-angle lenses for 2D 360-degree videos, but these lenses don’t offer the same sense of depth, and can add distortion around the edges. Adding many more cameras to make up for different perspectives wouldn’t have been economical, if only for the reason that each and every camera captures huge amounts of data. That’s why Google’s JUMP team relied on science instead.

Google’s JUMP emulates the way humans perceive 3D by combining 16 GoPro cameras with an infinite number of “virtual” camera instances. Photo courtesy of Google

Researching cutting-edge algorithms for 3D still photography, they realized that they could simply arrange a set number of cameras in a circle, and have computers simulate an infinite number of virtual cameras that would fill in the gaps to allow omnidirectional 3D video capture.

(For the technically inclined: This is called Omnidirectional Stereo Projection, and the JUMP team will be presenting a research paper about their work in this field at Siggraph Asia in December.)
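For readers who want to see the geometry, here is a minimal sketch of the ray model commonly used for omnidirectional stereo. It is not Google's JUMP code. It assumes the usual description of the technique: for every horizontal viewing direction, each eye's ray starts on a small "viewing circle" whose diameter equals the distance between the eyes, and runs tangent to that circle. The function name `ods_ray`, the coordinate convention and the 64 mm eye distance are illustrative assumptions, not details from the JUMP team.

```python
import math

def ods_ray(theta_rad, ipd_m=0.064, eye="left"):
    """Return (origin, direction) of one horizontal omnidirectional-stereo ray.

    theta_rad: viewing direction in the horizontal plane, measured
               counter-clockwise from the +x axis (assumed convention).
    ipd_m:     distance between the eyes in meters (0.064 is an assumed,
               typical value, not a JUMP parameter).
    eye:       "left" or "right".
    """
    if eye not in ("left", "right"):
        raise ValueError("eye must be 'left' or 'right'")

    radius = ipd_m / 2.0

    # Unit vector the viewer is looking along.
    direction = (math.cos(theta_rad), math.sin(theta_rad))

    # Unit vector pointing to the viewer's right, perpendicular to direction.
    right = (math.sin(theta_rad), -math.cos(theta_rad))

    # The right eye sits one radius to the right, the left eye one to the left.
    sign = 1.0 if eye == "right" else -1.0
    origin = (sign * radius * right[0], sign * radius * right[1])

    return origin, direction


if __name__ == "__main__":
    # Sample the "virtual cameras" every 45 degrees around the circle.
    for degrees in range(0, 360, 45):
        theta = math.radians(degrees)
        left_origin, direction = ods_ray(theta, eye="left")
        right_origin, _ = ods_ray(theta, eye="right")
        print(degrees, left_origin, right_origin, direction)
```

Because the origin is always perpendicular to the viewing direction, every ray is tangent to the viewing circle. Sampling the angle ever more finely is what the "infinite number of virtual cameras" amounts to in this simplified picture.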

The team even developed a formula to find out how many real cameras it needed for any given circle size, 3D depth effect and the field of view of particular camera models. It’s complex math, but it led them to realize that it was actually fairly simple to build capture hardware with an array of 16 GoPro cameras, which happened to already have the optimal field of view. “We were kind of lucky with that,” said Agarwal.

Google’s Cloud to the rescue

The next step was to build the computing backend to make sense of the data from those 16 cameras. Google’s JUMP assembler doesn’t simply stitch their videos together, but instead interpolates imagery from those “virtual” cameras.

Add color correction and other automated editing functionality, and you end up with a pretty time-intensive process. “It’s more than 4000 times slower than real time,” explained Agarwal. On a normal desktop computer, processing of just one hour of JUMP video would take months. Google is instead letting JUMP users upload their footage to the cloud, where numerous servers churn away on it at the same time. Processing an hour of JUMP footage with 1000 cores in the cloud takes merely 10 hours.
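The numbers Agarwal quotes can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the figures from the interview (4000 times slower than real time, 1000 cores, 10 hours) and adds two simplifying assumptions of its own: that the slowdown applies to a single core, and that a month has 30 days. The function names are made up for illustration.

```python
SLOWDOWN = 4000            # "more than 4000 times slower than real time"
CORES = 1000               # cores quoted for the cloud run
REPORTED_CLOUD_HOURS = 10  # quoted time for one hour of footage

def single_machine_hours(footage_hours, slowdown=SLOWDOWN):
    """Hours of processing on one core, assuming the slowdown is per core."""
    return footage_hours * slowdown

def ideal_cloud_hours(footage_hours, cores=CORES, slowdown=SLOWDOWN):
    """Hours of processing if the work split perfectly across all cores."""
    return single_machine_hours(footage_hours, slowdown) / cores

def implied_efficiency(footage_hours, reported_hours=REPORTED_CLOUD_HOURS):
    """Ratio of the ideal parallel time to the reported time."""
    return ideal_cloud_hours(footage_hours) / reported_hours

if __name__ == "__main__":
    hours = single_machine_hours(1)
    print("single core, hours: ", hours)
    print("single core, days:  ", round(hours / 24, 1))
    print("single core, months:", round(hours / 24 / 30, 1))  # 30-day months assumed
    print("ideal on 1000 cores:", ideal_cloud_hours(1))
    print("implied efficiency: ", implied_efficiency(1))
```

Under these assumptions, one hour of footage works out to about 4000 hours on a single core, or roughly five and a half months, which matches the "months" estimate. A perfect split across 1000 cores would take four hours; the quoted 10 hours is consistent with the slowdown being "more than" 4000 times, with overhead from splitting and reassembling the work, or both.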

A side effect of this approach is that all of the stitching is being automated, which is not always the case with other VR video production environments. Wired, for example, used to have seven people on staff to do stitching work for VR and 360-degree videos, said Agarwal. “They are doing something else now.”

The cloud-centric approach also means that JUMP is a bit of a black box. Producers shoot their footage with a JUMP camera rig, and use a Mac application to upload the video files from each of the 16 cameras. Then, Google’s servers take over and turn all the source footage into a 3D VR video. Some hours later, the video is available for download, and producers can trim it, or do color correction for some final touches.

Some studios may decide that this doesn’t give them enough control, but the flip side of letting Google’s servers do all of this is that it significantly lowers the barrier to entry for 3D video. Sure, big production studios could feasibly also do similar things with a huge server farm, admitted Agarwal. But limiting the system to them would be a mistake, he argued. “If this system is only available to people who have a cluster at their disposal, then we are not doing our job.”

What’s next for JUMP

Google first showed off a JUMP camera prototype dubbed Odyssey at its Google I/O developer conference in 2015, and started to ship the first JUMP rigs to select partners soon after. It also equipped YouTube’s production studios in Los Angeles, New York, London and Tokyo with JUMP cameras, and JUMP partners like GoPro, the Atlantic and the AP have since shot dozens of videos with the platform.

Agarwal said that a huge part of his team’s work over the past few months has been to learn from these partners. “We are not filmmakers,” he acknowledged. Learning how JUMP is being used in the field, and what matters to producers, helped a lot, he said, with image quality being one of the key areas of future improvements.

JUMP partners also kept bringing up another issue, which Google is likely going to solve sooner or later: The current camera rigs only shoot omnidirectional video horizontally. Look straight up in any JUMP video, and you’ll only get to see a blurry spot that was out of reach for the rig’s camera lenses.

But more than anything, the next chapter of JUMP may be to scale up production. Google’s cooperation with GoPro on the existing JUMP Odyssey camera rig was little more than a test case, with the number of rigs in the wild being somewhere north of a hundred.

In May, Google announced new partnerships with Chinese camera maker Yi as well as Imax to produce additional JUMP rigs. It’s unlikely that these efforts will result in a cheap consumer camera any time soon, but making the rigs available to many more producers could be enough to kick off a true 3D renaissance.

Said Agarwal: “We are not done with this.”