106 Cameras, Holograms and Sticky Tape: Inside Microsoft’s Mixed Reality Capture Studios

Microsoft Mixed Reality Capture studio
Janko Roettgers / Variety

Before famed British broadcaster Sir David Attenborough ventured across the Atlantic to star in his first virtual reality film last year, an advance team took the trip with a key prop: a handful of Attenborough’s signature pale blue shirts.

The shirts had to be tested because Attenborough wasn’t just visiting a regular VR film set. Instead, he was being filmed at Microsoft’s Mixed Reality Capture studio, housed on the company’s Redmond campus. It’s a specialized studio dedicated to volumetric capture, capable of recording people in full 3D, ready to be turned into holograms.

Microsoft has been working on this technology for close to eight years, and recently opened another such studio in San Francisco. The company has ambitious plans to license its technology to a variety of operators — collaborations that could perhaps one day result in futuristic photo booths, capable of turning anyone into a hologram for a few bucks.

Courtesy of Factory 42

Sir David Attenborough is being turned into a hologram in Microsoft’s Redmond Mixed Reality Capture studio.

But right now, volumetric capture is still very much a cutting-edge technology, with lots of potential pitfalls. That’s why Factory 42, the U.K.-based immersive content studio that was working with Attenborough, had to test whether his pale blue shirts would cause complications with the studio’s 106 cameras.

Once Attenborough had arrived in Redmond, they had to glue down the collar of his shirts, and torture the 91-year-old broadcast veteran in hair and makeup. “We had to spray incredible amounts of hair spray on him,” recalled Factory 42 co-founder and creative director Dan Smith, adding: “It’s a technology in its infancy, so there are restrictions.”

Microsoft’s volumetric capture efforts began with the Kinect

Volumetric capture is still very new, but also advancing at a rapid pace, driven largely by increased demand for 3D assets for virtual and augmented reality. Microsoft’s work on volumetric capture began years ago, when Microsoft Research employees were trying to use the Xbox’s original Kinect motion sensor as a way to capture 3D and holographic images.

Those attempts didn’t pan out, but the company persisted and began tests using off-the-shelf cameras for volumetric capture. Initially, that work was driven by the desire to produce 3D content for the company’s HoloLens augmented reality headset. Nowadays, Microsoft thinks bigger, and sees its capture studios enabling mixed reality across headsets, mobile and desktop computing. “We are seeing us as trying to support mixed reality in general,” said Mixed Reality Capture Studios general manager Steve Sullivan.

Microsoft opened its Redmond Mixed Reality Capture Studio four years ago, and recently unveiled a similar studio housed in its San Francisco Microsoft Reactor space. There, 106 cameras encircle a space of roughly 25 by 25 feet, with green screens and lights arranged in all directions. Right outside of the capture space, monitors show the session in real time from four directions. And just next door, dozens of servers are humming to process all the raw video captured by those 106 cameras, to the tune of 10 GB of data per second.
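To put those figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the two numbers quoted above (106 cameras, 10 GB per second) and assumes, purely for illustration, that the data is split evenly across all cameras and that 1 GB equals 1,000 MB. The function names are made up for this example.

```python
# Figures quoted in the article.
CAMERA_COUNT = 106           # cameras on the capture stage
TOTAL_RATE_GB_PER_S = 10.0   # raw data rate across all cameras


def per_camera_rate_mb(total_gb_per_s: float, cameras: int) -> float:
    """Average raw data rate per camera in MB/s.

    Assumption: an even split across cameras and decimal units
    (1 GB = 1000 MB). Real per-camera rates likely differ.
    """
    return total_gb_per_s * 1000 / cameras


def raw_footage_gb(total_gb_per_s: float, seconds: float) -> float:
    """Raw data produced by a take of the given length, in GB."""
    return total_gb_per_s * seconds


if __name__ == "__main__":
    rate = per_camera_rate_mb(TOTAL_RATE_GB_PER_S, CAMERA_COUNT)
    print(f"{rate:.1f} MB/s per camera on average")
    print(f"{raw_footage_gb(TOTAL_RATE_GB_PER_S, 60):.0f} GB for a 60-second take")
```

Under these assumptions, each camera contributes a little under 100 MB per second on average, and a single one-minute take produces about 600 GB of raw footage, which helps explain the room full of servers.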

Microsoft's Mixed Reality Capture studio San Francisco

The capture stage of Microsoft’s Mixed Reality Capture studio in San Francisco.

Ultimately, Microsoft wants to make 3D holograms look virtually indistinguishable from regular video. “That’s the goal, and we are pretty close to it,” said Sullivan. To achieve this, Microsoft uses two different types of cameras: 53 RGB cameras used for capturing video from many different angles, as well as 53 infrared cameras that capture special IR laser light points projected onto the recorded subject.

The latter is used to create a kind of map of the surface of a person, which is then combined with silhouette data and traditional 3D video to recreate a 3D model of a person. (For the technically inclined: Microsoft published an in-depth paper about its approach to volumetric capture online.) Microsoft also uses some AI to aid this process. For instance, algorithms detect a person’s face so that the system can spend more of its rendering power on that region. After all, viewers will likely pay a lot more attention to an actor’s face than her shoes.
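The article doesn’t describe how Microsoft actually prioritizes the face, so the following is only a hypothetical sketch of the general idea: a fixed detail budget (here assumed to be mesh triangles) is split across body regions in proportion to how much attention each is expected to get. The region names, weights and the `allocate_budget` function are all invented for illustration.

```python
from typing import Dict


def allocate_budget(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split an integer budget across regions in proportion to integer weights.

    Hypothetical helper: not Microsoft's method, just proportional allocation.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")

    weight_sum = sum(weights.values())
    # Integer floor of each region's proportional share.
    shares = {name: total * w // weight_sum for name, w in weights.items()}

    # Rounding down can leave a few units unassigned; give them to the
    # highest-weighted regions first so the shares add up to the total.
    leftover = total - sum(shares.values())
    by_priority = sorted(weights, key=lambda name: weights[name], reverse=True)
    for name in by_priority[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Assumed weights: the face matters five times as much as the legs.
    region_weights = {"face": 5, "hands": 2, "torso": 2, "legs": 1}
    for region, triangles in allocate_budget(20000, region_weights).items():
        print(f"{region}: {triangles} triangles")
```

With these made-up weights, the face receives half of the 20,000-triangle budget while the legs get a tenth, mirroring the point that viewers look at faces far more than at shoes.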

How holograms can transform virtual reality

Microsoft isn’t the only company looking to popularize volumetric capture. New Zealand- and Los Angeles-based capture startup 8i has been working on similar technology for years, and Intel opened a much bigger volumetric capture stage in Los Angeles earlier this year.

But while Intel focuses on high-end productions that could include groups of actors moving across a bigger space, Microsoft is very much committed to keeping it compact and easily replicable. The capture space of Microsoft’s setup is much smaller than that of Intel’s Los Angeles studio, measuring about 8 feet in diameter, with a maximum height of 10 feet. But that also allows the company to keep the setup portable, ready to be broken down and relocated to other spaces if need be.

Microsoft Mixed Reality Capture Studio

The capture space of Microsoft’s Mixed Reality Capture studio in San Francisco measures about 8 feet in diameter.

Microsoft also made it a point to only use off-the-shelf cameras and other easily accessible components to put its Mixed Reality Capture studio together. That’s because Microsoft ultimately wants others to license its technology, and allow third parties to open their own capture studios around the world. A first such licensed space recently opened in London, and many more are planned.

The company is scaling up its efforts just in time for a rapidly growing demand for 3D assets, spurred in part by the growing popularity of virtual reality, and with that the hunt for ever-more-immersive experiences.

A few years back, VR experiences could largely be divided into two groups: Animated content produced with the help of game engines, offering users of higher-end VR headsets the ability to lean into the scenery, and even walk around computer-generated characters. On the other end of the spectrum were 360-degree videos with real actors. Even with 3D, these were still essentially films projected on a sphere around the viewer, without the ability to lean in.


The control room of Microsoft’s Mixed Reality Capture studio San Francisco.

Holograms like the ones captured at Microsoft’s Mixed Reality Capture Studios allow filmmakers to combine both approaches: Immersive sets that viewers can step into, rendered in real time by a game engine, with real-life actors, captured as 3D holographic assets. That’s an approach that worked really well for Factory 42 and its “Hold the World” project, which is set to be released by the British broadcaster Sky in the coming weeks. In it, viewers are able to go behind the scenes of London’s Natural History Museum, where they can virtually explore some of the Museum’s rare artifacts, guided by Attenborough himself.

“It’s not the same as normal VR,” said Factory 42 co-founder and CEO John Cassy. “It’s a big step closer to reality.”

Sullivan agreed, and said that adding holograms can make VR a lot more relatable. “Without the humans, it’s very sterile,” he said.

What it feels like to be turned into a hologram

Turning a human into a hologram, or the process of volumetric capture, is ultimately not all that different from a regular film shoot, as I got to experience during a recent visit to Microsoft’s Mixed Reality Capture studio in San Francisco. That is, if you discount the fact that there is not just one camera facing you, but a full 106 lenses, pointed all around you.

A video of the writer being turned into a hologram, showing off the way Microsoft’s technology captures 3D textures and silhouettes.

Many first-timers make the mistake of trying to address all of the cameras, but I was discouraged from turning a lot. Instead, one of the facility’s managers helped me by providing a point of focus. In Attenborough’s case, the setup even involved a few paper cups, turned upside down, to give him an idea of where the viewer would be, and where computer-generated objects would appear later on.

Like Attenborough, I was also told to wear only certain clothes, and to avoid busy patterns that could confuse the algorithms. My collar wasn’t glued down, but still arranged to cause fewer issues during the shoot. While 106 cameras can capture a lot of details, there are occasionally some blind spots. These are small gaps between pieces of clothing that the cameras can’t catch, and that the computer then fills in with random mush, resulting in extra color that can look similar to webbed fingers.

Developers need holograms to feed the mobile AR boom

In addition to VR, augmented reality (AR) is also driving a lot of the increased demand in volumetric capture. Headsets like Microsoft’s HoloLens may only have reached a small group of users thus far, but Apple opened the floodgates for AR last summer when it brought the technology to its mobile devices.

ARKit, which allows developers to add digital objects to the camera view of an iPhone or iPad, brought the technology to hundreds of millions of consumers overnight. Google has since followed suit with a similar technology for Android devices, and both Snapchat and Facebook have been investing heavily in mobile AR as well.

Many mobile AR apps simply add holograms to a view of the camera, allowing users to capture celebrities, wild tigers and more in their everyday life. That’s an ideal application for the content captured at Microsoft’s Mixed Reality Capture studios, as I got to learn when the team placed a juggling mini-me into someone’s living room.

Most consumers won’t have a chance to turn themselves into holograms like that just yet. However, Microsoft’s Mixed Reality Capture team is already thinking about democratizing the technology. As a test, Sullivan recently captured himself with his own kids. Now, he has a volumetric 3D video of them that he will be able to watch on VR headsets and any other future devices as they grow up. “It’s a very meaningful experience,” he said.

Sullivan doesn’t think that there will be a consumer-grade version of the technology any time soon. Even ignoring the significant equipment costs, setting up and calibrating the cameras is still too complicated for your average living room.

However, he argued that the technology could easily find its way into a neighborhood mall photo studio. Or perhaps even into a photo booth of the future, where you walk away not with a film strip, but a hologram. Microsoft is using its existing Mixed Reality Studios as a testing ground for such future deployments, said Sullivan. “This is teaching us how to get to a much more lightweight form factor.”

Janko Roettgers / Variety

Microsoft’s servers, crunching away on turning huge amounts of raw video data into holograms.

One of the improvements Microsoft is already working on is the processing. Ultimately, the company wants to stop relying on local servers, and instead do all the number crunching in the cloud, doing away with a key bottleneck. “It’s on our path to commoditizing this capture,” said Sullivan. The company has also already simplified camera calibration, and made the setup process in general much easier. “It’s no longer a science project,” he said.

That’s not to say that volumetric capture won’t have some quirks for some time to come, as Attenborough and anyone else who has gone through the process can attest. Said Smith:

“There is still sticky tape and coffee cups involved, and a whole lot of hair spray.”