Sad Songs, Artificial Intelligence and Gracenote’s Quest to Unlock the World’s Music

It’s all about that vibe. Anyone who has ever compiled a mix-tape, or a Spotify playlist for that matter, knows that compilations succeed when they carry a certain emotional quality across their songs.

That’s why the music data specialists at Gracenote have long been classifying the world’s music by moods and emotions. Only, Gracenote’s team hasn’t actually listened to each and every one of the 100 million individual song recordings in its database. Instead, it has taught computers to detect emotions, using machine listening and artificial intelligence (AI) to figure out whether a song is dreamy, sultry, or just plain sad.

“Machine learning is a real strategic edge for us,” said Gracenote’s GM of music Brian Hamilton during a recent interview.

Gracenote began its work on what it calls sonic mood classification about 10 years ago. Over time, that work has evolved, as more traditional algorithms were switched out for cutting-edge neural networks. And quietly, it has become one of the best examples of the music industry’s increasing reliance on artificial intelligence.

How computers learn that Gaga’s “Lovegame” is a “sexy stomper”

First things first: AI doesn’t know how you feel. “We don’t know which effect a musical work will have on an individual listener,” said Gracenote’s VP of research Markus Cremer during an interview with Variety. Instead, Gracenote is trying to identify the musician’s intention as a kind of inherent emotional quality. In other words: It wants to teach computers which songs are truly sad, not which song may make you feel blue because of some heartbreak in your teenage years.

Still, teaching computers to identify emotions in music is a bit like therapy: First, you name your feelings. Gracenote’s music team initially developed a taxonomy of more than 100 vibes and moods, and has since expanded that list to more than 400 such emotional qualities.

Some of these include obvious categories like “sultry” and “sassy,” but there are also extremely specific descriptors like “dreamy sensual,” “gentle bittersweet,” and “desperate rabid energy.” New categories are constantly being added, while others are fine-tuned based on how well the system performs. “It’s sort of an iterative process,” explained Gracenote’s head of content architecture and discovery Peter DiMaria. “The taxonomy morphs and evolves.”

In addition to this list of moods, Gracenote also uses a so-called training set for its machine learning efforts. The company’s music experts have picked and classified some 40,000 songs as examples of these categories. Compiling that training set is an art of its own. “We need to make sure that we give it examples of music that people are listening to,” said DiMaria. At the same time, each song has to be the best possible example of a given emotion. “Some tracks are a little ambiguous,” he said.

The current training set includes Lady Gaga’s “Lovegame” as an example of a “sexy stomper,” Radiohead’s “Pyramid Song” as “plaintive,” and Beyonce’s “Me Myself & I” as an example of “soft sensual & intimate.”

Just like the list of emotions itself, that training set constantly needs to be kept fresh. “Artists are creating new types of musical expressions all the time,” said DiMaria. “We need to make sure the system has heard those.” Quickly evolving genres like electronica and hip-hop in particular require frequent updates.

To a computer, compression can sound like a musical style

Once the system has been trained with these songs, it is let loose on millions of tracks. But computers don’t simply listen to long playlists of songs, one by one. Instead, Gracenote’s system cuts up each track into 700-millisecond slices, and then extracts some 170 different acoustic values, like timbre, from each slice.
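To make the slicing step concrete, here is a minimal Python sketch. Only the 700-millisecond slice length comes from Gracenote's description. The function names are hypothetical, and the two features shown (RMS energy and zero-crossing rate) are common textbook stand-ins, not the roughly 170 values the company actually extracts, which it has not published.

```python
import math

def slice_track(samples, sample_rate, slice_ms=700):
    """Split a mono sample sequence into consecutive slices of slice_ms milliseconds."""
    slice_len = sample_rate * slice_ms // 1000
    # Drop a trailing partial slice so every slice has the same length.
    return [samples[i:i + slice_len]
            for i in range(0, len(samples) - slice_len + 1, slice_len)]

def rms_energy(audio_slice):
    """Root-mean-square amplitude, a rough loudness measure (illustrative feature)."""
    return math.sqrt(sum(x * x for x in audio_slice) / len(audio_slice))

def zero_crossing_rate(audio_slice):
    """Fraction of adjacent sample pairs that change sign (illustrative feature)."""
    crossings = sum(1 for a, b in zip(audio_slice, audio_slice[1:])
                    if (a < 0) != (b < 0))
    return crossings / (len(audio_slice) - 1)

def extract_features(samples, sample_rate):
    """Return one small feature dictionary per 700 ms slice."""
    return [{"rms": rms_energy(s), "zcr": zero_crossing_rate(s)}
            for s in slice_track(samples, sample_rate)]

# Usage: two seconds of a synthetic 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(2 * sample_rate)]
features = extract_features(tone, sample_rate)
```

Two seconds of audio yield two full slices here; the remaining 600 milliseconds are discarded in this sketch, a simplification a production system would likely handle differently.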

In addition, it sometimes takes larger chunks of a song to analyze rhythm and similar features. Those values are then compared against existing data to classify each song. The result isn’t just a single mood, but a mood profile.
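One way to picture a mood profile, as opposed to a single label, is sketched below in Python. Everything in it is an assumption for illustration: the reference points, the two-dimensional feature space, and the distance-based scoring are invented here, and Gracenote's actual system relies on neural networks whose details are not public.

```python
import math

# Hypothetical reference points: an (energy, brightness) pair per mood.
# Real systems would learn these from a labeled training set.
MOOD_CENTROIDS = {
    "plaintive": (0.2, 0.3),
    "sexy stomper": (0.8, 0.6),
    "dreamy sensual": (0.3, 0.7),
}

def slice_scores(features, centroids):
    """Score one slice against every mood; closer reference points score higher."""
    weights = {mood: math.exp(-math.dist(features, center))
               for mood, center in centroids.items()}
    total = sum(weights.values())
    return {mood: w / total for mood, w in weights.items()}

def mood_profile(slices, centroids=MOOD_CENTROIDS):
    """Average the per-slice scores into one profile that sums to 1."""
    profile = {mood: 0.0 for mood in centroids}
    for features in slices:
        for mood, score in slice_scores(features, centroids).items():
            profile[mood] += score / len(slices)
    return profile

# Usage: three slices of one track, each reduced to two made-up feature values.
track = [(0.75, 0.55), (0.85, 0.65), (0.3, 0.6)]
profile = mood_profile(track)
top_mood = max(profile, key=profile.get)
```

The track above leans toward "sexy stomper" but keeps a nonzero share of the other moods, which is the point of a profile: a song can be mostly one thing and partly another.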

All the while, Gracenote’s team has to periodically make sure that things don’t go wrong. “A musical mix is a pretty complex thing,” explained Cremer.

With instruments, vocals, and effects layered on top of each other and the result being optimized for car stereos or internet streaming, there is a lot to listen to for a computer — including things that aren’t actually part of the music.

“It can capture a lot of different things,” said Cremer. Unsupervised, Gracenote’s system could for example decide to pay attention to compression artifacts, and match them to moods, with Cremer joking that the system may decide: “It’s all 96 kbps, so this makes me sad.”

The world’s music, categorized by the world’s emotions

Once Gracenote has classified music by moods, it delivers that data to customers, which use it in a number of different ways. Smaller media services often license Gracenote’s music data as their end-to-end solution for organizing and recommending music. Media center app maker Plex for example uses the company’s music recommendation technology to offer its customers personalized playlists and something the company calls “mood radio.” Plex users can for example pick a mood like “gentle bittersweet,” press play, and then wait for Mazzy Star to do its thing.

Gracenote also delivers its data to some of the industry’s biggest music service operators, including Apple and Spotify. These big players typically don’t like to talk about how they use Gracenote’s data for their products. Bigger streaming services generally tend to operate their own music recommendation algorithms, but they often still make use of Gracenote’s mood data to train and improve those algorithms, or to help human curators pre-select songs that are then turned into playlists.

This means that some music fans may be acutely aware of Gracenote’s mood classification work, while others may have no idea that the company’s AI technology has helped to improve their music listening experience.

Either way, Gracenote has to make sure that its data translates internationally, especially as it licenses it into new markets. On Tuesday, the company announced that it will begin to sell its music data product, which among other things includes mood classification as well as descriptive, cleaned-up metadata for cataloging music, in Europe and Latin America. To make sure that nothing is lost in translation, the company employs international editors who do not just translate a word like “sentimental,” but actually listen to example songs to figure out which expression works best in their cultural context.

And the international focus goes both ways. Gracenote is also constantly scouring the globe to feed its training set with new, international sounds. “Our data can work with every last recording on the planet,” said Cremer.

In the end, classifying all of the world’s music is really only possible if companies like Gracenote do not just rely on humans, but also on artificial intelligence and technologies like machine listening. And in many ways, teaching computers to detect sad songs can actually help humans to have a better and more fulfilling music experience — if only because relying on humans would have left many millions of songs unclassified, and thus out of reach for the personalized playlists of their favorite music services.

Using data and technology to unlock these songs from all over the world has been one of the most exciting parts of his job, said Cremer: “The reason I’m here is to make sure that everyone has access to all of that music.”