David Beckham does not speak Arabic, Hindi or Mandarin. But when the soccer legend starred in a PSA for malaria awareness this spring, he effortlessly switched among these and six other languages, thanks to cutting-edge technology that could soon change how Hollywood localizes its movies and TV shows.
The PSA in question was produced with technology from Synthesia, a London-based startup that uses artificial intelligence for dubbing. In Beckham’s case, the company recorded video footage of the soccer star as well as of native speakers of each language it wanted to use.
Then it fed all that raw footage to an algorithm that “learned” the facial expressions for each word in languages like Spanish and Yoruba, and tweaked video of Beckham’s face accordingly. “You puppeteer video,” explains Synthesia co-founder and chief operating officer Steffen Tjerrild. “We have actual lip-sync.”
AI-based video editing has drawn a lot of attention for its darker side, best known as the “deep fake” phenomenon: porn videos altered by deep learning-based algorithms to convincingly feature the faces of celebrities. However, Synthesia’s work shows that the technology also holds promise for Hollywood, with dubbing being a key area of interest.
Traditionally, dubbing has been done without altering the source video. Instead, local script writers aim to match translated dialogue to the action on screen. “It takes a lot to prepare those scripts,” says Tjerrild. What’s more, voice actors have to time their delivery perfectly to make sure the dub doesn’t feel off, a process that can take weeks for feature films.
Artificial intelligence that tweaks an actor’s mouth movements to fit the local language could significantly shorten those timelines and make dubs even more accurate, agrees Markus Gross, vice president of research at Disney Research. “If we had the possibility to change lip movement in post-production, that would be a huge deal,” he says. “It points to ways to make the lip-sync indistinguishable from what the actor would do if he spoke that language.”
Disney Research has done some work on using deep learning for speech animation, and is looking at ways to apply the tech to special effects. While dubbing isn’t a high priority for the studio’s research arm, Gross signals that he is intrigued by the possibilities. “It is definitely on the horizon for us,” he says.
Dubbing is becoming ever more important for Hollywood as media companies target consumers worldwide with their own streaming services. Netflix, for instance, is dubbing in 31 languages, and began to target English-language audiences with dubbed versions of foreign originals last year. “We are taking this new initiative very seriously,” says the company’s head of international dubbing, Debra Chinn, about its efforts to unlock global stories for the English-speaking world.
The streaming giant has been exploring methods for automating some aspects of dubbing, but Chinn cautions that there may be limits to what technology can do. “Dubbing is an art,” she says. “It’s a creative process.”
There are indeed technical issues. Synthesia’s tech works best when actors look directly into the camera and is less effective with profile shots or action scenes.
Teaching the algorithms the facial idiosyncrasies of each individual actor is also challenging. “No actor can speak 20 languages,” says Gross. And while an algorithm can do a good job of making it look like someone’s mouth forms foreign-language words, it doesn’t know how an actor such as Leonardo DiCaprio would deliver a line in Mandarin.
Ultimately, studios might have to test their dubs with actual users, suggests Gross. “We have to expose the results to a lot of people and then ask them if they feel it’s authentic or if it looks weird.”
Employing AI for dubbing will be less demanding in animation, where the technology can also be used to better match the mouth movements of a character to the original language. “It is one important step to produce high-quality animation at a lower price tag,” says Gross.
Those savings could help convince Hollywood to embrace this kind of technology in the near future. “Within the next two, three years,” Gross says, “we could see first applications.”