A few years back, Google was actively exploring whether it should launch a male counterpart to Amazon’s female Alexa voice assistant.
However, at the time, text-to-speech technology was still struggling to make male voices sound natural. “You’d get these warble effects, oftentimes, with male voices,” said Ward. This ultimately forced the company to go with a female voice for the Assistant’s launch in 2016.
Fast forward three years, and the Assistant lets U.S. consumers choose among 11 male and female voices, with Google announcing this week that it is bringing additional voice choices to nine more languages.
Much of that has been made possible by rapid advancements in cutting-edge artificial intelligence. But the way Google presents itself through voice interfaces has just as much to do with early design decisions related to the Assistant’s name, personality and more. Ward and Google Assistant personality character lead Emma Coats recently talked to Variety to explain how Google found its voice.
Does software have a gender?
Early on, the team working on the Google Assistant had to figure out a key question: What is Google’s persona? Looking at existing Google products like Gmail and Chrome offered few clues. “You don’t see a lot of personality in these products,” recalled Coats.
Without clear guidance from existing products, the Assistant team decided to map out some potential choices. “There were two schools of thinking,” Coats said. One was to turn the Assistant into a kind of audible version of Google.com, a matter-of-fact oracle that spits out knowledge whenever you ask for it. The other was to create more of a character. Make it helpful, but also a bit playful.
Coats’ team developed 20 hypothetical questions consumers might ask the Assistant, along with the answers each of these two personalities might give. All of the questions and answers were hung up in a conference room, and key stakeholders were asked to mark the ones they preferred with little dot stickers.
The result of this dot-voting process was that the character won over the oracle — the right choice, if you ask Coats. After all, if you search for “Hello” on Google.com, the first result is a video of the Adele song. Said Coats: “Is that really what you want when you are speaking to something?”
But while the Assistant was supposed to have personality, it was also clear early on that consumers shouldn’t confuse it with an actual person. “It should speak like a human, but it should never pretend to be one,” said Coats.
Beyond personality traits, the Assistant team also had to settle a range of other issues, recalled Coats: “What are the implications of giving the Assistant a human name? Does software have a gender?”
To help the Assistant be more approachable across a variety of cultures, the team ultimately decided against giving it a human name and instead settled on the “Google Assistant” moniker. “We did want to make it feel like a conversation with Google,” she said. The decision also helped keep consumers from associating the Assistant with a single gender, ultimately paving the way for Google to roll out additional voices.
When the Assistant sounds like a ransom note
When Google initially developed the Assistant, it still relied on traditional text-to-speech technology. This required recording a lot of source material with a voice actor, which Google’s algorithms then chopped up and reassembled into words and sentences. “It’s kind of like a ransom note,” said Ward.
That system worked reasonably well for common words and phrases, but it often tripped up on edge cases. “It would sound really choppy,” he recalled. “Aberrations are hard.”
Google achieved a breakthrough in 2017, when it replaced its traditional text-to-speech model with a deep learning-based approach called Wavenet. Curious readers can find more about how Wavenet works on Google’s DeepMind blog. In essence, once the model has been trained on enough voice samples, it generates speech audio from scratch.
This not only resulted in a much more natural-sounding Google Assistant, but also made it significantly easier for Google to develop and deploy new voices for the Assistant. “We can build more voices in less time,” Ward explained.
With the help of Wavenet, Google has been able to launch 11 voices in total in the U.S. market, and many more internationally. The company was even able to work with John Legend to bring his voice to the Assistant.
Ward didn’t want to reveal exactly how many voice samples Legend had to record for this collaboration, but he said that it didn’t require much of the singer’s time. Now, when users ask Google to “talk like a legend,” they get to hear their local weather forecast and other tidbits in his voice. This is being generated by Wavenet on the fly, with a little bit of Legend ad-libbing and singing thrown in for good measure.
With Google offering more voices to choose from, the Assistant team found itself confronted with a new challenge: It didn’t want to resort to old gender and personality stereotypes when presenting all those choices. The solution has been a color picker within the Google app, which lets users quickly switch between voices that are identified by color rather than by gender or similar labels.
In select countries, Google has also been randomizing the default voice, starting users with either the “red” or the “orange” voice. And in Italy and South Korea, users heard a male Assistant voice by default at launch.
In the future, consumers may be able to change more than just the voice of the Google Assistant, Coats suggested. “We think a lot about personalizing the assistant.” For instance, consumers may one day be able to name the Assistant, or decide that it should be more professional during business hours and more playful after hours.
No matter what form the Assistant takes in the future, walking the fine line between professionalism and personality, while also keeping the Assistant approachable across demographics and cultures, will likely keep the team busy for some time to come. Said Coats: “We are still finding the balance.”