

Collaboration between music and data science communities to extract insights from the richest and largest music appreciation dataset ever shared – the EMI Million Interview dataset

London, 18th July 2012 – Data Science London, a non-profit community of data scientists, and EMI Music, one of the world’s leading music companies, today announce that they have joined forces to release parts of the EMI Million Interview dataset, which EMI is building and which is the deepest and most extensive collection of data on music consumers ever shared. This collaborative project will feature a series of community events, conferences and competitions to promote and build a community around music data science, starting with the Music Data Science Hackathon, a 24-hour global music competition taking place this weekend, 21st-22nd July.

Over time, the EMI Million Interview dataset promises to radically transform analysis and insight in the music industry, improving the understanding of how artists and their fans connect, to the benefit of music lovers everywhere. Before the EMI Million Interview dataset, public access to such information was very limited. Now, this wealth of data creates an opportunity for a new approach to music research and insight.

David Boyle, SVP Insight for EMI Music said: “EMI’s insight has profoundly increased our understanding of music consumers and the service we now provide for our artists. I’m excited and touched that our initiative is helping to bring the music and data science communities together. The Data Science London community is inspiring; I’ve never seen so many great people come together to share ideas, data and science. With the EMI Million Interview Dataset we hope to bring more new ways of thinking into our industry that will deliver enormous benefits to artists and their fans.”

The EMI Million Interview Dataset is the richest and largest music appreciation dataset ever: a massive, unique, high-quality dataset that contains the interests, attitudes, behaviours, familiarity, and appreciation of music expressed by music fans around the world. This new initiative with Data Science London will see a rich subset of data from those interviews shared extensively so that many eyes can bring new insights from the data.

Data scientists will have their first opportunity to showcase their talents and extract insight from parts of the EMI Million Interview Dataset at the Music Data Science Hackathon, a 24-hour music data science competition launching this Saturday, July 21st. The hackathon challenge will be “Can you predict if a listener will love a new song?” and will require entrants to develop an algorithm that can predict a listener’s level of appreciation for songs and artists, based on the listener’s demographics, word associations, and the past interviews contained in the EMI Million Interview Dataset.

The data scientists taking part in the hackathon will be competing for £6,500 in cash prizes sponsored by EMI and EMC. EMC, a world leader in data science and big data solutions, is providing IT infrastructure and analytical tools to the contestants, as well as operational support for the competition through its Greenplum division. The EMI Million Interview Dataset is the latest partnership between EMC Greenplum’s Data Science Community platform and Data Science London, which aims to promote the adoption of data science and enable collaboration across businesses and communities around the world.

“Community, learning and collaboration are at the heart of innovation. To succeed in the new world of Big Data, companies need to invest in innovation and experiment with data-sets to mine their real, untapped value,” said Chris Roche, Regional Director for EMC Greenplum. “I see this series of crowd sourcing events as one of Greenplum’s investments in community, learning, collaboration and innovation. We are pleased to support the Data Science London community and EMI both with our technology and expertise.”

Kaggle, the global leader in data science crowdsourcing, will be hosting the competition on its collaborative, real-time, online platform for predictive modeling competitions. In addition to providing the infrastructure for the competition, Kaggle will make the EMI Million Interview Dataset available to its online community of over 44,000 data scientists around the world.

“Given the extremely subjective nature of musical appreciation, predicting what kind of music people will like is a much tougher problem than, say, trying to predict someone’s click-through rate for online adverts. But using this kind of unique and vast data set of listeners’ own preferences really makes a lot of sense; after all, no one knows your music tastes better than you,” said Jeremy Howard, Kaggle’s President and Chief Scientist. “In a sense this is double crowdsourcing, because besides listeners’ data it’s also calling upon the best data science minds in the crowd to generate the most accurate predictive algorithms.”

Lightspeed Research worked closely with EMI to develop and implement an innovative research programme that created a tactical and strategic ‘Artist and Music research’ tool to help better understand how to connect artists and their music with consumers. The research, which has run for over two years across 25 countries, is a robust, scalable and flexible online research programme that is able to adapt to the ongoing research requirements of EMI.

James de Vick, Key Account Director at Lightspeed Research commented: “Lightspeed Research is delighted to be involved with EMI and the Music Data Science Hackathon, as it is a good example of how you can use data innovatively – creating an engaging experience for our panellists, while improving EMI’s understanding of the music market. It is great to see that EMI, as a forward thinking music company, have taken our research and helped develop a culture of Insight within their organisation, ensuring they are always close to their fans.”

Adatis, the UK consultancy specialising in Business Intelligence and Data Management, will sponsor the Data Visualization Prize with £500. “We are proud to be a key contributor to this project,” said Tim Kent, Director at Adatis. “We are committed to delivering leading-edge analytics and visualisation solutions that will enable EMI, our key strategic partner, to successfully deliver this project.”

Data Science London, a non-profit organization dedicated to the free, open, dissemination of data science, is one of the driving forces behind this project. With more than 825 members, this organisation is one of the largest and most active data science communities in the world. By involving EMC and Kaggle in this project, Data Science London aims to assemble a world-class team with the best data science platforms, people, and technologies.

“We are very excited to be launching this awesome project in partnership with EMI because it reflects many of the values and ideas promoted by our organisation, like collaboration, open data, and community development,” said Carlos Somohano, Founder of Data Science London. “I believe this project is unique in the sense that it brings together the best minds in the music and data science communities to develop new kinds of predictive algorithms that will transform the music industry.”

More than 150 data scientists will participate in the Music Data Science Hackathon on-site in London, and hundreds more will participate remotely via the Kaggle platform.

Richard O’Brien
EMI Group, London.
Tel: +44 (0)20 7795 7447

Ralph Risk, Marketing Director, EMEA
Lightspeed Research Ltd
Tel: +44 (0)20 7896 1950

Jenny Sneyd
Tel: +44 (0)20 7592 1200

Paige Schoknecht
Cutline Communications (for Kaggle)
Tel: +1 (415) 348-2708

Notes for Editors

About EMI Music
EMI Music is one of the world’s leading music companies, representing artists spanning all musical tastes and genres. Its record labels include Angel, Astralwerks, Blue Note, Capitol, Capitol Latin, Capitol Records Nashville, EMI Classics, EMI CMG, EMI Records, EMI Records Nashville, Manhattan, Parlophone, Virgin Classics and Virgin Records.

About Lightspeed Research
Lightspeed Research delivers valuable data to help clients make informed business decisions. With proprietary online panels throughout the world, our verified, engaged, and deeply profiled survey respondents can support research studies that vary in scope and complexity. Lightspeed Research’s expert Client Operations Team offers data collection services including survey design, sample management, programming and reporting. The company has offices throughout the United States, Europe, and Asia Pacific. Lightspeed Research is part of Kantar, the information insight and consultancy division of WPP. For more information, please visit

About EMC
EMC Corporation is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect and analyze their most valuable asset — information — in a more agile, trusted and cost-efficient way. Additional information about EMC can be found at

About Data Science London
Data Science London is a non-profit organization dedicated to the free, open, dissemination of data science. Created by data scientists for data scientists, we act as a forum of discussion and an exchange of ideas. In their quest to contribute to the community, our data scientists actively collaborate with organisations around the world in the development of data science best practices, insightful analytic methods, and innovative predictive models.
