Ulrike Rohn - What is Cultural Big Data?
When it comes to cultural big data, we distinguish different kinds of data. Raw cultural data refers to the massive amounts of digital content that can be found today. This data includes the billions of hours of video and audio and the billions of pages of written text, pictures, and photos that have been digitized or that were “digitally born.”
Metadata is data about that data. It tells us, for example, what the content is about and where it can be found. Metadata facilitates the reuse of content; without high-quality metadata, cultural data is unusable. Traditionally, experts in memory institutions such as archives, libraries, and museums have created this metadata.
In addition to this metadata, which contextualizes cultural content and makes it findable, there is another kind of metadata, usually referred to as “big data.”
Big data is information on the usage of content. Who used or consumed the content? What did they do with it? When we consume content online, we leave traces. We may “like” a picture on Facebook, add a comment to an online news article, forward and recommend a link to a podcast, or modify a piece of music. All this usage data makes up “cultural big data.”
In the digital media business, this kind of usage data has become a central commodity. Information about audiences and how they use and consume media is of vital interest, especially to marketers. Having data about audiences and consumers of cultural content is nothing new, but the sheer amount of data people leave about themselves when consuming and using content online is unprecedented. That is why it is now referred to as big data.
The problem, however, is that most usage data is owned by a few large online service providers, such as Google, YouTube, Facebook, and Twitter. These companies dominate the usage data market because they own the largest online platforms. Internet users gravitate toward the largest platforms because of “network externalities”: the benefit each user receives from a platform increases with the number of other people using it. The more people use Facebook, the more valuable Facebook becomes to each of its users.
A related challenge is that these large players are rarely transparent in terms of how they trade their usage data and how such trade affects the information services we use to make everyday decisions as consumers and citizens.
The European Union recently adopted a set of regulations establishing minimum rules for the protection of individual user data. These rules include informing individuals about how information about them is used. However, more policy initiatives are needed to make the data trade more transparent and to inform society at large about how this trade and data markets may affect the nature of knowledge and cultural services in Europe and elsewhere.