Echo Nest: A Music Discovery Service That Also Knows How You'll Vote
February 06, 2014
by Tom Vanderbilt
The Echo Nest helps music services from Spotify to Rdio and Rhapsody suggest tunes you'll like. But your playlists also teach its algorithms what movies you'll watch -- and even how you'll vote.
"I just found Polish reggae two days ago," boasts Glenn McDonald, lead engineer at the Cambridge, Massachusetts-based music startup The Echo Nest, as we stare at a screen in a conference room.
"Apparently, before the Wall came down, and big western artists came into Poland, a common sound track of the protest movements was Bob Marley," says McDonald, a compact man with close cropped hair and an intense, caffeinated manner. Meanwhile, other reggae bands, such as the UK's Misty in Roots, played influential shows there during the 80s. As the Ramones at CBGB did for punk, he says, the mere exposure to this music seemed to launch scores of reggae acts, from Paprika Korps to Pablopavo i Ludziki.(According to Wikipedia, the latter "is one of the first vocalists singing raggamuffin in Polish".)
The Echo Nest claims it reaches around 100 million listeners per month, by powering music discovery services such as Spotify, Rdio, Rhapsody and VEVO, and delivering musical connections where none may have existed before. Even now, where every band is no more than a Google search away, you may never have heard of these two acts, or indeed the genre. But a few days before, as The Echo Nest's research algorithms were sifting through "unassigned artists", McDonald found a strange cluster. "The computer said, 'These go together more strongly than with anything else,'" says McDonald.
And so another entry was added to Every Noise at Once, McDonald's massive project to map the world's corpus of musical genres. Staring at the sprawling projection up on the wall, which resembles Mark Lombardi's unsettlingly internecine drawings of political conspiracies, one finds Polish reggae wedged roughly between Romanian pop and K-hop (or Korean hip-hop), closer in musical space to Chicago soul than it is to Finnish hip-hop.
These genre clusters are created not by what the artists sound like, but how they are described in the world: on music blogs; in record-company catalogues; on your cousin's Facebook page -- anything The Echo Nest's semantic trawl pulls up. Although to human ears, there is a distinct sound to Polish reggae -- McDonald says there is a "polka-folk melody to some of it", not to mention the lyrics being in Polish -- the distinction is murkier to a computer. There are reggae bands from Bulgaria to Omaha, Nebraska, that would sound, in terms of quantitative audio signal, fairly similar. "But Polish reggae is clearly a thing," says McDonald, "and bands from Bulgaria to Omaha aren't part of it, no matter what they sound like."
Every Noise at Once is more of an obsessive skunkworks-style exercise in computer learning than a revenue stream for The Echo Nest. The company grew out of research at MIT's Media Lab, and now essentially makes money as the behind-the-scenes engine powering any number of better-known applications, from Rdio to Spotify to Yahoo! to MTV. "In some ways," says Jim Lucchese, The Echo Nest's CEO, "we're the nerds doing the cool kids' homework," adding over drinks in Tribeca in New York City that, before McDonald joined the company, his hobby was "cleaning up music metadata".
But Every Noise at Once speaks to the same animating impulse on which the company was founded, as described by The Echo Nest's cofounder and chief technical officer, 36-year-old Brian Whitman. "We are trying to automatically understand everything about music," he says. "What people are saying about it, what it sounds like." That means teaching computers about human constructs such as genre, which provide handy "bookmarks", as McDonald describes it, to finding one's place in the musical universe; but which can also be imprecise or particular to a computer. Run a query for the hottest "rock" artists, he notes, and the computer returns Rihanna. With enough human guidance and algorithmic power, however, a kind of AI version of the frenzied, aspiringly omniscient hipster narrator of LCD Soundsystem's Losing My Edge ("I'm losing my edge to the internet seekers who can tell me every member of every good group from 1962 to 1978") can be created. As McDonald notes in a blog post, "our computers can now enter plausibly into arguments over almost 500 genres, from a cappella to zydeco."
The Echo Nest claims to have a knowledge base of more than a trillion data points, covering more than 35 million songs and 2.5 million artists. This is music in the age of Big Data, when you can carry "almost all of recorded music in your pocket", as Echo Nest development director Paul Lamere puts it. The algorithms can rank the danceability of a tune somewhere on a spectrum of zero to one: it turns out that the Top Ten's danceability has changed little from "The Twist" to twerking, according to The Echo Nest. The listening habits of music-service users can meanwhile be tracked in real-time and correlated across their digital lives to derive powerful insights about their likely non-musical preferences. At the heart of what The Echo Nest is trying to do is not simply find the right song to play for you next, but to understand something about what it is playing and how that might relate to you. As Whitman puts it: "We're trying to get to the stuff you and I would be implicitly figuring out when we have a conversation about music, about ourselves." Generically categorising the world's recorded music is, by itself, a gargantuan task. But even more formidable is cracking who's on the other end of the signal.
Brian Whitman and Tristan Jehan, cofounders of The Echo Nest, met a decade ago at the MIT Media Lab. Whitman had been recording "intelligent dance music" ("the only genre," he jokes, sitting in a conference room, "happy enough to compliment itself in its name") under the name Blitter, and found it hard, like so many musicians, to successfully do so "at scale". As he recalls, the audience "was out there, but hard to find". How could those fans be discovered and connected? Returning to his graduate studies, he began to work in natural language processing while thinking back to his original problem.
He began conducting experiments such as trying to automatically extract information ("with maximal semantic value") from song reviews, even taking on Pitchfork, the granddaddy of oblique music writing. Jehan, meanwhile, was toiling in the world of "music information retrieval", a wide-ranging discipline that seeks, in essence, to break down music into its component parts; to turn it into data. This is surprisingly difficult. For example, to a computer, the harpsichord and the guitar sound rather similar. "At the end of the day, it's a plucked string," says Jehan, a stylish 39-year-old Frenchman. Or take what he calls "valence". What is a song's mood? As Lamere had joked: "The happiest thrash metal song may sound like the most angry soft rock song." Is a song such as Joy Division's major-key Ceremony happy or sad? Then there's what might be called the Bohemian Rhapsody problem. "What do you do with a song like this?" asks Jehan. "You can't just have one data point representing one thing about it."
So The Echo Nest was born. The goal, Whitman says, was to get past the problems of "care and scale" that plagued the existing music recommendation and discovery technologies. Collaborative filtering ("those who bought X also bought Y") suffers on the "care" standard. "These technologies don't know about the music itself; they only know about some of the usage patterns around it," says Whitman. A Beatles album on Amazon will simply show that listeners also bought other Beatles albums, while the closed loop of popularity bias makes it nigh impossible for new music to enter the system. A service such as Pandora, Whitman argues, has expert human tagging, but fails on scale. "They've got a million songs. To us that's ridiculous. There's so much music that Pandora will never catalogue."
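To make the "those who bought X also bought Y" idea concrete, here is a minimal sketch of co-occurrence counting, the simplest form of collaborative filtering. It is an illustration only, not Amazon's or The Echo Nest's code; the function names and the purchase data are made up. It shows both halves of Whitman's complaint: the Beatles album points back to another Beatles album, and a record nobody has bought alongside anything else gets no recommendations at all.

```python
from collections import Counter
from itertools import permutations

def also_bought(baskets):
    """For each item, count how often every other item shares a basket with it."""
    co = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]

# Hypothetical purchase histories, one list per customer.
baskets = [
    ["Abbey Road", "Revolver"],
    ["Abbey Road", "Revolver", "Rubber Soul"],
    ["Abbey Road", "Revolver"],
    ["New Unknown Band EP"],
]
co = also_bought(baskets)
print(recommend(co, "Abbey Road", k=1))      # the most co-purchased album
print(recommend(co, "New Unknown Band EP"))  # empty: no usage data to go on
```

Nothing in this approach looks at the audio or at what is written about the music, which is the gap Whitman describes.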
The cramped offices of The Echo Nest are not far from both MIT and the Berklee College of Music, and the place seems to have a foot in both camps. "We understand everything there is to know about music," says Whitman, "and then do a lot of crazy stuff around that." Every so often, for instance, the company will unveil a music hack -- for instance, a "Bonhamiser" that will add the Led Zeppelin drummer's signature sound to any song; or "The Infinite Jukebox", which takes a song, breaks it down via the Echo Nest Analyzer (a piece of software that codes songs according to a number of attributes, such as "acousticness"), and will randomly jump from one part in the song to another that sounds vaguely similar, so that the song never ends. The atmosphere is a hothouse of hacker bravado and record-shop-assistant connoisseurship, something like The Social Network meets High Fidelity.
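The Infinite Jukebox idea can be sketched in a few lines, under some loud assumptions: each beat of a song is reduced to a small feature vector (made-up numbers here, standing in for what an audio analyzer might produce), "sounds vaguely similar" means the vectors lie within a distance threshold, and the jump probability is arbitrary. This is a guess at the general technique, not the hack's actual code.

```python
import math
import random

def similar_beats(features, threshold):
    """Map each beat index to the other beats whose features lie within `threshold`."""
    jumps = {}
    for i, a in enumerate(features):
        jumps[i] = [j for j, b in enumerate(features)
                    if j != i and math.dist(a, b) <= threshold]
    return jumps

def endless_play(features, steps, threshold=0.1, jump_prob=0.3, seed=0):
    """Walk through the beats, sometimes hopping to a similar-sounding beat."""
    rng = random.Random(seed)
    jumps = similar_beats(features, threshold)
    beat, order = 0, []
    for _ in range(steps):
        order.append(beat)
        at_end = beat == len(features) - 1
        # Jump at random, or always at the last beat if a similar beat exists.
        if jumps[beat] and (at_end or rng.random() < jump_prob):
            beat = rng.choice(jumps[beat])
        else:
            beat = (beat + 1) % len(features)  # otherwise play on, wrapping at the end
    return order

# Hypothetical per-beat features: beats 0 and 2 are alike, as are beats 1 and 3.
features = [(0.0, 0.0), (1.0, 1.0), (0.05, 0.0), (1.0, 1.05)]
print(endless_play(features, steps=12))
```

Because the final beat always has somewhere similar to jump to in this toy song, playback can continue for as many steps as requested.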
The Echo Nest's algorithms also conduct searches of what's being said about music online. "We're out there crawling words," says Whitman. "If people start using a word, the system will bubble it up." If something called EDM-core -- "I'm making it up" -- starts showing up on blogs, it could become a meaningful term to associate with a number of artists.
Semantic searching has its challenges, as McDonald notes. "The biggest problem is ambiguity," he says. Take something as simple as an artist's name. "There's a Polish death-metal band called Hate, there's a Compton rapper called Hate, there's a Japanese pop band called Hate." Humans can use contextual clues to disambiguate, "but the people writing about them are not putting unique identifiers about them -- it can be hard to tell which is which."
The big problem with an acoustic-based recommendation system is that just because a computer deems that two things sound alike does not mean a human will. Take Miley Cyrus. "You could sort of line up Miley Cyrus with a few indie female singer-songwriters," says Lamere. "There was a very similar acoustic footprint. But you'd never put Miley in that singer-songwriter cluster -- the cultural mismatch would just be too bad." What Lamere is talking about is perhaps the greatest machine-learning challenge of all: human taste.
What you listen to, Whitman argues, is the most direct route to who you are. "Music preference can predict more about you than anything else," he says. "If all I knew about you was the last five books you read, I wouldn't know much about you." Music is personal; people talk about "my music" in discussing playlists and collections. The Echo Nest, in a typical exercise called "What's Your Stereotype?", asks users to reveal the kind of music they like. It then delivers a profile; in Wired's case, it was witheringly accurate: "Hipster barista".
Whitman says that although the company's algorithms had become pretty good at making sense of music, they had less of a grip "on understanding how listeners relate to it". So its latest big push is what it calls its "Taste Profile" technology. At its furthest horizon, this is about using music to understand people's other affinities. In one exercise, The Echo Nest correlated listening preferences to political affiliation (expressed via data such as Facebook Likes). It turns out that, in the US, Pink Floyd are primarily liked by Republicans. Whitman speculates this has primarily to do with the changing demographics of an ageing fan base. Mining the data revealed other tendencies: Democrats liked more music genres (ten) than Republicans (seven); but liking The Beatles predicted nothing in the way of political preference. Curiously, the least predictive of all genres when it came to political affiliation was heavy metal. Loud and rebellious apparently cuts all kinds of ways.
One day, political campaigns may choose not to bother approaching voters who play a lot of Nordic metal on Spotify. But for now, a challenge looms for online listening services: maintaining listeners. A recent Echo Nest white paper notes that the majority of users of services such as Pandora and Rdio are inactive, and thus contribute little advertising value. "We have some really scary numbers from some of our customers about the sort of churn rate of people who will sign up for a free trial and never listen to anything," says Whitman. Call it search fright: you sign up for a service that has everything you could ever want to listen to, and suddenly the prospect of listening to any one thing becomes overwhelming. Not to mention the question of what to play after that.
The Echo Nest believes that figuring out what kind of listener you are -- rather than just knowing what you listened to -- is the key to engagement. So it is modelling attributes such as "mainstreamness" -- how far out do your tastes go compared to other listeners'? Are Radiohead experimental for you, or about as popular as you'll go? Or what it calls "taste freeze" -- does all of your listening tend to cluster in a particular time in your life? "A lot of our customers have this problem where the people who have money to pay for a streaming service tend to be a bit older," says Whitman. "They're slowing down discovering new music. It's a big problem for [streaming services] because the more music you discover, the longer you stay on." And there is, as always, the tyranny of the skip button, which not only provides an exact measure of when a playlist stopped appealing to someone, it arguably makes it harder to deliver that playlist. Whitman notes that when The Echo Nest designed playlisting applications for radio companies, this button faced resistance from programmers. "They think the road to getting people to like a new song is forcing them to listen to it four or five times," he says. Indeed, research backs this up: in what's called the "mere exposure" effect, people's liking for all kinds of music goes up the more they are exposed to it. After all, every song is unfamiliar at some point.
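The article does not say how "mainstreamness" is computed, so the following is only one plausible reading: score a listener by the play-weighted average popularity of the artists they play, where popularity is a number between zero and one. The function name, the popularity scores and the play counts are all hypothetical.

```python
def mainstreamness(plays, popularity):
    """Play-weighted average popularity (0..1) of the artists a listener plays."""
    total = sum(plays.values())
    if total == 0:
        return 0.0
    weighted = sum(n * popularity.get(artist, 0.0) for artist, n in plays.items())
    return weighted / total

# Hypothetical popularity scores and one listener's play counts.
popularity = {"Radiohead": 0.8, "Pablopavo i Ludziki": 0.1}
listener = {"Radiohead": 3, "Pablopavo i Ludziki": 1}
print(round(mainstreamness(listener, popularity), 3))
```

A listener for whom Radiohead is "about as popular as you'll go" would score low on a measure like this, while one for whom Radiohead is experimental would score high.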
Of course, there is more to our relationship with music than a set of curated preferences. Whitman suggests the next frontier in discovery and playlisting will get deeper into how people actually interact with music. As he noted at a talk at Microsoft, "not just what they skip, ban and recommend, but when? Did they just break up with their girlfriend?" He calls it watching people "fumbling with music".
Music today is like a non-stop soundtrack running through people's lives. Adrian North, a professor of psychology at Australia's Curtin University, says that music has become so pervasive that, although it still forms a central part of people's identity, "it's as though it's become so omnipresent that they've almost forgotten about it". Where it was once at the centre of vital subcultures, now "music just sits there enmeshed with a gazillion other apps on your iPhone. You can't separate your musical identity from the rest of your identity any more."
Having access to any song you want has only closed this circle, says North. He points to research by his lab suggesting that people's music selection is tied to what they're doing: while commuting, for example, music is "almost like a safe haven". What's ironic, he notes, is that what makes this sort of analysis possible is the very device that's playing the music -- and those people, like The Echo Nest, that are paying attention.
Lamere sees this as one of the next frontiers of music recommendation: "Being able to understand a person's context and what kind of music they want to listen to, and playing it for them." The phone, he says, "knows whether I'm running or browsing the web." But the APIs for converting this context into musical data are underdeveloped. As Whitman says, the privacy settings on smartphones typically do not apply to music apps: "They'll have to add that soon."
For now, Lamere says, "our smartphones look very similar to cassette players from 30 years ago. You've got a 'play' button and that's about it." But if your phone knows so much about you -- where you are, what you are doing and what kind of music you like -- why should we even have to choose the song we want to hear?