METADATA AS MEDIUM
On the Image Conference, September 1st 2016
Of the classified documents leaked by Edward Snowden, the first to be published by The Guardian revealed that Verizon was required to hand over, in bulk, the telephone records of its customers to the National Security Agency. These records did not include the content of telephone calls, but rather the metadata associated with the calls: phone numbers, GPS coordinates, duration and time of calls, SIM card ID, etc. Senator Dianne Feinstein, chair of the Senate intelligence committee, wrote in USA Today:
The call-records program is not surveillance. It does not collect the content of any communication, nor do the records include names or locations. The NSA only collects the type of information found on a telephone bill.
Here, Feinstein differentiates between metadata collection and surveillance – a distinction that many would contest. To follow the NSA’s logic, ‘surveillance’ would be limited to the collection and analysis of the content of conversations that, presumably, people deliberately participated in. Metadata, in contrast, constitutes supplementary information that is generated inadvertently.
So what is metadata? Matteo Pasquinelli says:
Metadata can be logically conceived as the ‘measure’ of information, the computation of its social dimension and its transformation into value.
When we talk about Big Data today, we are talking about large quantities of metadata. Metadata is information about information, carefully measured, quantified and computed. Some metadata is directly contributed by us when we tag videos, images and sounds with keywords, but most of it is persistently accumulated by the algorithmic machines of Google, Facebook, Amazon and others. To illustrate, if I were to turn a camera on this room, and run each video frame through Google’s Cloud Vision API, I would receive an average of ten descriptors per frame, alongside information about color, lighting, etc. On this basis, a one-minute video clip would generate more than 15,000 tags. And this would all happen in close to real time.
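The arithmetic behind that figure can be sketched in a few lines. This is a back-of-envelope estimate only, assuming a 25 frames-per-second clip and the average of ten labels per frame quoted above; the constants are illustrative, not Cloud Vision specifications.

```python
# Back-of-envelope estimate of the metadata volume described above.
# Assumes 25 frames per second and an average of 10 labels per frame
# (illustrative figures, per the numbers quoted in the text).
FPS = 25
LABELS_PER_FRAME = 10

def estimated_tags(seconds: int, fps: int = FPS, labels: int = LABELS_PER_FRAME) -> int:
    """Total label count generated for a clip of the given length."""
    return seconds * fps * labels

print(estimated_tags(60))  # → 15000 tags for a one-minute clip
```

At higher frame rates, or with richer per-frame annotation, the count only grows – which is the point: the metadata quickly outweighs the content.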
The logic of Big Data appears quite clear – once information generates metadata, it is highly indexable, searchable and commodifiable. A broad range of active and passive actions – buying things, searching for things, reading things – all generate metadata, and therefore value. We become targets for algorithms that find products with similar metadata. Increasingly, metadata governs the means of production – the things that are produced, sold, desired and discussed all rely upon patterns gleaned from metadata.
Like the NSA, my own work tends to privilege metadata over content. Much of my work is concerned with the ways in which metadata has become a means for us to read the world, and to read each other. The distinction between content and metadata can be slippery, but broadly I understand content in terms of the ways we (as humans) read, for example, an image, and metadata in terms of the ways a computer reads it. As the media, images and texts that we encounter become increasingly automated and governed by metadata, we find ourselves having to adapt to the ways in which the computer reads the data – via computer vision, natural language processing and user-contributed tags. Once this accommodation is established, the fluidity between content and metadata is further reinforced: gradually, metadata becomes content – we begin to read the computer’s view of the world as its content, and content and metadata come further into alignment.
Surveillance is a good example of this alignment: these are mediated images, interpreted by a computer vision algorithm that is looking for specific types of activity. The computer’s framing of events becomes validated because of its presumed objectivity (note Facebook’s recent controversy over using humans to filter its trending news stories). In a project I made in 2013, Swarm, stereo cameras build panoramic crowds of people from gallery visitors. Viewers are passively profiled for age, gender, race and other characteristics, and inserted into groupings. Some groups are large – white women in their 50s, perhaps – others much smaller. As is the case with, say, social media, the groupings are algorithmically constructed communities, brought together on the basis of common metadata. Here, the computer has not just read video frames through their metadata; it has used them to reconstruct reality, in real time. The character of the gallery changes: it becomes territorialized by people as metadata. There is a menace in measurement: according to the American Association of Museums, 79% of gallery visitors in the United States are white.

And although in Swarm we do not believe for a second that there is actually a crowd of people surrounding us, in other contexts we do – namely social media, with our demographically specific followers, friends and subscribers algorithmically put into proximity with our every action, based upon shared metadata measuring demographics, politics, geography, education, profession, socio-economic class, income, etc. Simply, in social media the data points are larger: there is more metadata from which to build the crowd. The more metadata a system can acquire, the more granular (and supposedly accurate) its classification systems become. Again, though, there is a menace in classification, and this is a feature of Swarm: what happens when we are uncategorizable, or mis-categorized, by the computer?
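The grouping logic at work in Swarm can be sketched minimally: visitors are reduced to a handful of metadata fields, and crowds are assembled from identical profiles. The field names, values and schema below are invented for illustration; they are not the actual system’s features.

```python
from collections import defaultdict

# Hypothetical sketch of metadata-based crowd construction:
# each visitor is reduced to a demographic profile, and groups
# are built from exact profile matches. All fields are invented.
visitors = [
    {"id": 1, "gender": "female", "age_band": "50s", "race": "white"},
    {"id": 2, "gender": "female", "age_band": "50s", "race": "white"},
    {"id": 3, "gender": "male", "age_band": "20s", "race": "asian"},
]

def group_by_metadata(people, keys=("gender", "age_band", "race")):
    """Cluster people whose metadata profiles match exactly."""
    groups = defaultdict(list)
    for person in people:
        profile = tuple(person[k] for k in keys)
        groups[profile].append(person["id"])
    return dict(groups)

print(group_by_metadata(visitors))
# → {('female', '50s', 'white'): [1, 2], ('male', '20s', 'asian'): [3]}
```

Note what the sketch makes plain: anyone whose profile matches no one else’s ends up in a group of one – categorized, but alone.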
What is the status of the designation “other” when it comes to metadata? Big Data is premised upon the principle of “inclusion at all costs”: all information should be taggable, searchable, findable and classified. To be outside of this system is reserved for people who pay for the privilege. In Swarm, everyone is categorized, whether accurately or not, and this makes visible what happens when we have less control over our own metadata.
In (re)collector, ten cameras were positioned around Cambridge city center. They run a computer vision algorithm that looks for events matching scenes from Antonioni’s 1966 film Blow-Up. Each day it attempts to reconstruct the original film using the clips it has recorded, combining them with keyword-based extracts from Julio Cortázar’s Las babas del diablo, the short story upon which Antonioni’s film was based. Its choice of clips is based entirely upon metadata – the points where the recorded videos share metadata with the scenes from the original film and text from the original story. The more metadata in the system, the more variation will occur as the clips’ tags become more and more granular. Since a version of the algorithmically generated movie is created each day, and played back on a large projector in the city center, there is the potential for people to observe a pattern – i.e. a scene from the original film – and then attempt to get into the next day’s version by performing for the algorithm in specific places in the city. So here we have a clear example of people modifying their behavior in order to satisfy how computers accumulate metadata. In other words, people read the metadata and then consciously become metadata, producing ‘value’ in the eyes of the system. And again, as with Swarm, how real are these performed metadata narratives? Do they become content – how do we distinguish a person acting a certain way in order to cooperate with metadata from someone who is not consciously doing so? Baudrillard’s precession of simulacra and Borges’ On Exactitude in Science come to mind, perhaps with the distinction that we are replacing our own reality with one interpreted for us by algorithms.
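The matching step can be sketched as a simple tag-overlap measure. This is a hedged reduction, assuming both film scenes and recorded clips are represented as tag sets and paired by similarity; the tags, clip names and the use of Jaccard overlap are my illustration, not the actual system.

```python
# Minimal sketch of metadata-based clip selection: a recorded clip is
# chosen for a scene when their tag sets overlap most. The tag sets,
# clip names and similarity measure are invented for illustration.
def tag_overlap(a: set, b: set) -> float:
    """Jaccard similarity between two tag sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

scene_tags = {"park", "camera", "two people", "grass"}
recorded_clips = {
    "clip_041": {"park", "bench", "camera", "grass"},
    "clip_107": {"street", "car", "night"},
}

# Pick the recorded clip whose metadata best matches the scene.
best = max(recorded_clips, key=lambda c: tag_overlap(scene_tags, recorded_clips[c]))
print(best)  # → clip_041
```

Under this logic, a passer-by who learns the scene’s tags can deliberately produce a matching clip – performing for the algorithm, exactly as described above.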
This blurry distinction between content and metadata is very familiar to us on social media platforms such as Facebook. A virtuosic user of Facebook understands the currency of a status post – ways to validate one’s actions within the rule set of friends, acquaintances, likes, feeds, walls, etc. How authentic are these actions – are they for humans or for the machine? In Today, too, I experienced something I hope to understand in a few days, I built a Facebook app that builds short narratives from people’s status posts. It then matched the demographics of the people responsible for the posts with video portraits of individuals who shared those demographics. Keywords from the status posts were then used as search terms on YouTube, and the resulting videos were juxtaposed with the portraits. Again, all associations were governed by metadata, not content.
Big Data is most useful when dealing with the relationship between one piece of information and other pieces of information, and this generally can be considered as a network – a network of machine-readable metadata accumulated from images, websites, social media profiles, purchases, interests, etc. Algorithms seek to compare, evaluate and find patterns between metadata. In this respect, metadata can show us relationships between things that we cannot see ourselves – songs that we might like, things that we might want to buy, etc. It can make those connections visible. But how do we make the metadata itself visible? Images are helpful for this: looking at one image, we read the content, not the metadata. Only when we see a multiplicity of images, or a stream of images, can we start to deduce the relationship between them – to infer what the computer sees as patterns of information. So in Today, too, I experienced something I hope to understand in a few days, we see the metadata through the unusual relationships between video clips. Watching it, maybe we smile as we congratulate ourselves on seeing what the computer saw.
My 2013 work Sanctum was installed for more than two years on the façade of the Henry Art Gallery in Seattle. Made in collaboration with composer Juan Pampin, Sanctum uses six video cameras to track and profile people as they walk towards the gallery. Once they have been profiled, voices that match their demographic are beamed at them via ultrasonic speakers. The voices read out narratives built from Facebook status posts, again matching their demographic. As they get closer to the gallery’s façade the voices become clearer, eventually resolving into a single voice. Once within 12 feet of the façade, a person’s live image is put onto video monitors that wrap around the gallery, paired with other people who match their demographic, and with the Facebook narrative as subtitles. So again the project relies upon metadata – marrying demographic information gleaned from the faces of visitors with narrativized status posts from Facebook. In other words, it uses your metadata to generate a narrative – not yours, but one that is demographically appropriate. Again, this encompasses the menace of measurement – narratives are superimposed based upon what the algorithm sees, not upon our own subjectivity. Over the course of two years, the project itself accumulated a great deal of metadata – there were over 25,000 “interactions” with the work, and we were approached by multiple research groups wanting to use Sanctum’s metadata to, for instance, estimate demographics from walking patterns.
The more metadata, the greater the value of the information to the network – the more granularity it will have in terms of how well it will match a search query, the more variety of search queries it will match, the more possible meanings the image will have. Marxist theories of surplus value come into play – with Moore’s Law there is no apparent limit to the amount of metadata that algorithms can generate, irrespective of human labor. The surplus created by metadata-tagging algorithms is only made visible by search. Furthermore, anyone who has ever used a search engine is well aware of this surplus: it has become a fact of life, a reminder of the need to individuate your content and make it machine-readable in order to avoid invisibility – or to encrypt your data appropriately in order to attempt invisibility. But there is also another kind of surplus: abstract human knowledge. Knowledge that has no place within these algorithmic machines. Within this framework, this becomes an excess, an affect that the machine cannot read or see.
In each of these works, the search or query is decisive. It is one thing to have a large dataset, full of metadata; the crucial thing is to figure out how to ask the right questions of it. In my work the query has aesthetic value – some searches are more interesting than others, some more valuable. In terms of art, when dealing with large datasets it is the query that makes things visible. That is the art of selection, filtering, choice. That is the responsibility of the artist, and a signal of human intent – the human surplus. And when we see the work, we see the query. So again, it is metadata over content. If I show you an image, do you look at the image itself, or do you understand that the image is simply the result of the query? In other words, the image is not only dependent upon, or contingent on, the query; it is subordinate to it. The query can be modified and the image will be different. Or in a dynamic system – such as Swarm – the query can remain the same but the image will change (give me all the people who have been in the gallery today). In some works, the query is appropriated from an existing template – such as scenes from Blow-Up in the case of (re)collector.
The metadata is made further visible by these projects’ focus on demographically similar types rather than individuals. The videos and narratives generated within these works make little sense if you linger on a specific individual as the protagonist – a habit we have, of course, developed from film: we don’t expect the lead actor to switch to someone else part way through the movie. But in the narratives I build in my work, we see individuals as archetypes, filling roles that have specific characteristics, specific metadata – for instance, the video subjects in Today, too, I experienced something I hope to understand in a few days.
Finally I want to discuss my most recent project, General Intellect (which is Marx’s term for abstract human knowledge). In many ways this work summarizes my previous works’ concerns – metadata, query, surplus, automation and visibility. General Intellect was made using Amazon’s Mechanical Turk service, through which you can hire people from a work force of several hundred thousand around the world. Requesters post HITs, or Human Intelligence Tasks, which typically consist of microtasks such as tagging images, reading handwriting, filling out surveys – all things which humans tend to do better than computers at the moment. Workers are typically paid 5–10 cents for each task, and sign away all intellectual property as a condition of service. For General Intellect, I hired several hundred workers to record eight one-minute videos, one per hour from 9am to 5pm. They were instructed to simply film whatever they happened to be doing at the time, to write a one-sentence description of what they were doing, and to provide demographic data about themselves. Workers were paid $3 for the eight videos, and the system accumulated over 3,000 videos. I built a database for the videos and their metadata.
The work was exhibited as a series of single and multi-channel videos, each generated from the results of queries to the database. The first version was shown in Bath, and then another version was shown in an abandoned school building near Amazon’s headquarters in Seattle. Sample queries included:
“all the videos that mention husbands or wives”
“all the videos that mention boredom”
“all the videos that mention prescription medication”
“all the videos by white women in North America, in their 50s”
The videos that were seen in the exhibition were those whose metadata matched the queries. Anything that did not match was not seen. The videos in each set were diverse, made by different workers who had specific metadata in common. These queries were then made available for sale, as the first new media artworks exhibited in Amazon’s new online art gallery. All profits from the sales were used to fund further HITs. Collectors would receive a monitor streaming videos that matched their query, updated daily.
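One way such queries might run against the video database can be sketched as follows. The schema, records and helper function are all invented for illustration; only the query phrasings themselves come from the exhibition.

```python
# Hypothetical sketch of General Intellect's query mechanism:
# each video record carries worker demographics and a one-sentence
# description; a query filters on metadata fields and/or keywords.
# The schema and sample records below are invented.
videos = [
    {"worker": "a", "gender": "female", "race": "white",
     "region": "North America", "age_band": "50s",
     "description": "Folding laundry while my husband cooks."},
    {"worker": "b", "gender": "male", "race": "white",
     "region": "Europe", "age_band": "30s",
     "description": "Sitting at my desk, waiting for more HITs."},
]

def query(records, text=None, **fields):
    """Select workers whose videos match every given metadata field
    and whose description mentions any of the given keywords."""
    results = []
    for r in records:
        if all(r.get(k) == v for k, v in fields.items()):
            if text is None or any(w in r["description"].lower() for w in text):
                results.append(r["worker"])
    return results

# "all the videos that mention husbands or wives"
print(query(videos, text=["husband", "wife"]))  # → ['a']
# "all the videos by white women in North America, in their 50s"
print(query(videos, gender="female", race="white",
            region="North America", age_band="50s"))  # → ['a']
```

What the sketch makes concrete is the exclusion: a video whose metadata matches no query simply never appears.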
General Intellect has much to say about the conditions of labor under post-industrial capitalism. MTurk is a real-life example of a human metadata factory: people converting content into machine-readable metadata. Deleuze and Guattari once described post-industrial capitalism as follows:
… it is as though human alienation through surplus labour were replaced by a generalized ‘machinic enslavement’, such that one may furnish surplus value without doing any work.
Without doing any work. As in my HIT – which asked workers to record themselves doing what they were already doing. The metadata is generated automatically. Indeed, one could say that the premise of most HITs is that human beings will convert information into something a computer can understand – i.e. turn content into metadata. The hierarchy of worker and requester in the MTurk system puts this premise front and center: workers try hard to ensure they are generating the right kind of metadata; otherwise, they may not get paid. Unlike Facebook, where we generate metadata for free, in MTurk a price is assigned. These are the sweatshops of metadata – workers who are rendered invisible by a system that desires total visibility of the general populace, and inclusion at all costs.
And so how do we read an image today? I would argue that we must include metadata and its politics in that reading. After all, if the image you are reading is online, you are already generating metadata by looking at it. As was the case with Swarm, Sanctum, and General Intellect, paying attention is no longer a passive act: it is meticulously measured and mined for meaning and value. In a content-rich, attention-poor economy, we must consider where the image is in the network, we must ask how this image is measured, and how it is actively measuring us in return. We must consider what its value is, and what our value is to it. We must ask who else has seen it – human or machine – and why we are looking at it now.
Metadata surplus is self-perpetuating. The more metadata algorithms generate, the more they can measure their own value. As we passively click, scroll and view content, we generate metadata – how long did we look at something? What did we do with our mouse? Images acquire metadata from the amount of attention we pay to them, increasing or decreasing their rankings and network value, adding to the surplus and increasing the efficiency of algorithms when they perform searches: like Mechanical Turk workers, we service the algorithm, contributing to our own obsolescence. Metadata is an instrument through which algorithms can autonomously create more surplus, and evolve beyond human limits. As processing speed improves, the ability of algorithms to match the tagging capability of humans will improve too. Computer vision will close the gap with the eye. As with the map in Borges’ On Exactitude in Science, metadata will approach and eventually exceed the brain’s processing and storage capacities. Increasingly, virtuosity will be understood as the ability to successfully work alongside these systems, to be sympathetic to the algorithmic machine and to understand how our actions will be interpreted by it.
So within this framework, where is the artistic value to be found? What gives a query artistic value? If I search for “dogs” can we call that art? No. But if I can use an algorithm to locate and identify pathos, a rupture in the quantitative logic of Big Data, then perhaps we can call it art. In my current project, I am mining SoundCloud for homemade songs that people have uploaded but no one has ever listened to. These are the orphans of Big Data – the invisible objects that only become searchable through their metadata, not their content: things we can read but computers cannot.
So computer vision subordinates content to metadata. Subjective human narrative and intention are subordinated to the search or query. In my work, form is subordinated to, or perhaps only authored through, metadata’s compositional logic. In other words, in, say, (re)collector, a shot is determined not by my authorial framing of it as aesthetically pleasing, but rather by its fidelity or proximity to Antonioni’s film. In General Intellect, the videos are framed by the workers, not by me. Composition occurs with metadata rather than with visual form. Form is appropriated, second-hand, a side-effect of compositional determinations made via query. The results of this approach are often messy, error-strewn and ambiguous; they are like the searches we do online every day: they are human. I would suggest that this gives us another way of representing the images that surround us today, one premised upon multiplicity rather than exactitude, upon metadata rather than content.