


University of Bergen

Department of Linguistic, Literary and Aesthetic studies

KUN350

Master’s Thesis in Art History Spring 2021

Patterns of Discrimination

On Photographic Portraits as Documents of Truth in Automated Facial Recognition

Tuva Mossin


Sammendrag/Abstract


This thesis is concerned with the role of photography in the training of automated facial recognition algorithms, as well as in the technical process of analyzing faces itself. By reading three different art projects which in various ways have appropriated existing facial recognition technology in order to scrutinize this practice, I establish how various biases – especially regarding the status of photography as an objective representation of the world – inform such systems’ ability to analyze faces. The works in question are ImageNet Roulette (2019) by Trevor Paglen and AI researcher Kate Crawford, How do you see me? (2019) by Heather Dewey-Hagborg, and Spirit is a Bone (2013-15) by the artist duo Broomberg & Chanarin. The central question the thesis seeks to answer is as follows: what can these art projects teach the audience about facial recognition technology as a practice, and what role does digital photography play as such systems’ connection to the analog world “outside” the systems themselves? In answering this, the thesis examines how the technical architecture shapes the operations of facial recognition systems already at the level of design. It also discusses facial recognition from a historical perspective, tracing the endeavor to anchor legal identity to the body through photography all the way back to the medium’s invention in the 1800s.




© Tuva Mossin 2021

Patterns of Discrimination: On Photographic Portraits as Documents of Truth in Automated Facial Recognition

Tuva Mossin https://bora.uib.no/


Table of Contents

Sammendrag/Abstract

Table of Contents

Acknowledgements

Introduction

Scope and Structure

Existing Research and Method

Important Terms

Chapter 1 Less Than a Thousand Words: ImageNet Roulette by Trevor Paglen and Kate Crawford

Introduction

The Viral App

An Archeology of Datasets

Training Humans

Documents of a Visual Practice

A New Media Ecology

A Nascent Paradigm Shift?

Conclusion

Chapter 2 The Treachery of Facial Recognition: How Do You See Me? by Heather Dewey-Hagborg

Introduction

Presenting The Work

Layers of Abstraction

A Brief History of AI

Intelligence as Situated Knowledge

Fast Forward: Image Recognition in 2019

Conclusion

Chapter 3 A History of Social Sorting: Spirit is a Bone by Broomberg & Chanarin

Introduction

First encounter: Spirit is a Bone as Installation

Background

Second Encounter: Spirit is a Bone as Book

Photography and Physiognomy

August Sander Revisited

Enter Biometrics

Hegel: “The Spirit is a Bone”

Conclusion

Chapter 4 Dataismus: Disrupting Flow in the Age of Digital Reproduction

Introduction

Taxonomies, archives, bias

The Two-Faced Nature of Surveillance

The Ultimate Rabbit Hole

Pattern Discrimination and Determinism

Digital Dualism

Going Against the Flow

Data or Dada?

Conclusion

Concluding Remarks

Bibliography

(7)

Acknowledgements

Above all, I am hugely grateful for all the help and motivation from my dear supervisor, Prof. Sigrid Lien. The devotion you show towards your students is remarkable.

Never could I have hoped for a more supportive and enthusiastic person to guide me through the overwhelming process of writing my first work on this scale. I do not know how I would have managed without you. Thank you.

Next, I wish to thank Postdoc Tonje Haugland Sørensen for your encouragement at an early stage of developing my thesis project. Not only was it you who introduced me to Heather Dewey-Hagborg’s work, but your early reading tips have also been of great help in navigating the ocean of scholarship surrounding photography, surveillance and digital art.

Thanks also to Prof. Asbjørn Grønstad and the research group Media Aesthetics, who were kind enough to invite me to their seminar on the aesthetics of surveillance when I was only an MA student in my first semester. This gave me much-needed motivation as well as helpful insights into the interdisciplinary field of surveillance studies. I appreciate it.

I would also like to show my gratitude to those who have supported my work financially.

Thanks therefore to Meltzerfondet, Bergen kommunes utdanningslegat and Benjamin Johannessens familielegat for generous stipends which have made the process of writing this thesis considerably less stressful.

Lastly, my family and friends deserve thanks for both emotional and practical support throughout the whole process. I am especially grateful for my parents’ willingness to read through and comment on this whole text. I also thank my dearest Olav for always giving me support and unconditional love, which not even a global pandemic has managed to affect. I could never have done this without you by my side.


“The good news about computers is that they do what you tell them to do.

The bad news is that they do what you tell them to do.”

- Ted Nelson


Introduction

It’s a late-November day in Paris and I’m at the Centre Pompidou, having just attended a seminar. With a couple of hours on my hands before the flight back to Bergen, I head straight for the section exhibiting early modernist art. Just like my fellow museum visitors (most of whom are walking phone-first from work to work), I feel an urge to document what I see. Once my phone is out of my pocket, it’s only a matter of minutes before I notice something curious: its facial detection function isn’t limited to real faces, but detects painted faces as well, marking them with yellow circles on the screen. My interest piqued, I decide to conduct a little experiment: holding my phone up in front of me with the camera turned on, I move from work to work, curious to see if there’s any kind of pattern regarding which painted faces it can or cannot detect – and whether there are any notable differences in how the program reacts to various styles or levels of abstraction. What I find is highly illustrative of the topic of this thesis. While the phone has no problem recognizing even some of the most abstract or blurry (“pixelated”) faces painted in, for example, an impressionistic manner, there is one style that seems to give the camera function an especially hard time: the cubist portraits of Picasso.

On the face of it, this simple observation does not seem too surprising: the deconstructed figures in some of Picasso’s paintings are so deformed, the perspectives so twisted and multifaceted, that no one would expect even the most accomplished system to recognize the assemblages of seemingly random forms and colors as people – let alone to detect a face in their midst. Yet I could. To the human beholder, there is no doubt as to what Picasso’s motifs are meant to represent, even though the works’ symbolic meanings are open to debate. Even if you dislike the artist’s style and mode of representation, the indexical meanings of his paintings are still obvious. This takes us to the core of the subject matter of this thesis, which is to analyze systems of automated facial recognition (AFR). More precisely, I will investigate AFR technology as a practice through a reading of three art projects which in different ways have appropriated tools from such technologies to produce portraits. I use the term automated facial recognition to denote any digital process of connecting faces to legal identities via photographic images or image flows. It follows from this definition that AFR comes in many forms, although such systems usually consist of a computer program working in tandem with a set of cameras through which people’s bodies are registered. It is important to note, however, that not all optically based surveillance systems include facial recognition software.


The projects in question are, respectively, ImageNet Roulette (2019) by Trevor Paglen and AI researcher Kate Crawford, How do you see me? (2019) by Heather Dewey-Hagborg, and Spirit is a Bone (2013-15) by the recently separated artist duo Broomberg & Chanarin.

Even though the forms and aesthetic expression of these works vary – as well as their particular relation to the technology in question – they all have in common that they, through different sets of artistic techniques, engender a critical “look back” at the technologies with which they were made. As all of these artists seem fond of mentioning, facial recognition is everywhere.

While mainstream media usually reserve their critiques of AFR for the most obviously problematic uses – the most chilling examples can be found in China, within the so-called social credit system as well as in the segregationist monitoring of Uyghurs in Xinjiang province – there exist many more subtle examples that relate more closely to a Western public’s daily experience. For my own part, my first conscious encounter with facial recognition was on Facebook many years ago, when the site suddenly started suggesting people for me to tag in the photos I had uploaded. Another common use for AFR is in smartphones, most new models of which offer the possibility of unlocking the phone and apps by recognizing the owner’s face through the front camera. Apps such as TikTok and Snapchat, where face filters are very popular, are other examples. In the latter case, however, it is only a matter of facial detection, which is part of, but not the same as, facial recognition.

The above are all examples of AFR in private or relatively limited social contexts, but even more is at stake when biometric technologies (of which AFR is only one example) are used in public spaces. A Norwegian reader is probably only used to encountering public AFR at airports, as these constitute border areas with a particular concern for security. One does not have to travel further than London, however, to find oneself in one of the world’s most heavily surveilled cities in terms of number of CCTV cameras (Bischoff 2021). In light of how ingrained in our daily lives AFR technologies are becoming, the critical perspectives offered by the artworks in question are a welcome way of investigating this developing relationship between humans and machines. Importantly, by building on the rich tradition of the study of visuality and optics within art history, the works facilitate an embodied and historicized relation towards AFR not easily achieved by other means.

Scope and Structure

The question guiding this thesis is simple: what can the three art projects discussed here teach the audience about the AFR technologies with which they were made? As AFR’s relation to the world “outside” the system is mediated through cameras and other optical devices, it is clear that these systems rely on an idea of the photographic portrait as a document of truth concerning the depicted individuals. A secondary focus will thus be to question the role played by photography in AFR technology. In the process, a whole subset of related questions arises, ranging from the nature of identity and the idea of positive identification, to global and national power relations, to claims of mechanical objectivity. These will be discussed along the way, wherever they appear particularly relevant. Regarding structure, I have organized the thesis as three case studies, in which each art project is analyzed thoroughly in a separate chapter. A consequence of this strategy might be that some ideas are left behind when moving on to the next chapter. However, the central ones will be taken up again in the fourth and final chapter, which is constructed as a meta-analysis discussing the most intriguing findings in more detail.

The first chapter focuses on the work ImageNet Roulette by Trevor Paglen and AI researcher Kate Crawford, a multifaceted project existing in several versions. The first instance through which an audience could experience the work was an online app published on a webpage in September 2019, to which people were invited to upload photos in order to have them categorized by a facial recognition algorithm. The app was later joined by an online essay contextualizing it and making the political dimensions of the work explicit. In my reading of this essay, it becomes clear that the algorithm had been trained on a database called ImageNet, one of the biggest and most widely used datasets for training image recognition systems. As the app turned out to be severely biased – especially along racial lines – the essay explains how this is a result of the highly dubious taxonomy informing the database on which it was trained. In this way, the project served as a platform for calling attention to the role played by training databases in shaping the “worldview” of algorithms, and also for raising questions about whether it is possible to build such a system that is politically neutral.

Shortly after the app’s release, Paglen and Crawford opened a related exhibition in Milan titled Training Humans, in which a general evolution of facial recognition research was presented to the audience. When discussing this project – which I from here on refer to collectively as ImageNet Roulette (without the italics) – another set of questions arises. Here, it becomes clear how photographs play a crucial role in training AFR algorithms by serving as visual representations of the given categories. Questions regarding photography’s status as a mediator between the algorithm and the world outside the digital realm are therefore pertinent. The biases discussed in the reception analysis of the app also raise questions of what constitutes “fair” use of AFR, and who has the right to determine this. Because Chapter 1 serves a double role, both as a presentation of ImageNet Roulette and as a general introduction to AFR, I will move on towards the end to situate the findings in a broader context of AI research. Here, I problematize what within the field is referred to as the “alchemy problem,” which has to do with how AI research, according to some, finds itself in a crisis regarding its scientific status. Finally, I ask whether the kind of artistic scrutiny exemplified by ImageNet Roulette can be read into a larger tendency within the field of visual studies, where scholars are starting to discuss networked digital photography in terms of what is referred to as a viewing ecology: a discussion to which I will return in Chapter 4.

Chapter 2 is constructed as an analysis of the work How do you see me? by Heather Dewey-Hagborg. This project also saw the light of day in September 2019, when it was exhibited at The Photographers’ Gallery in London. It is a digital video work presented as a series of images based on an experiment conducted by the artist. Seeking to determine the limits of what a facial recognition algorithm could recognize as a face, Dewey-Hagborg used what are called adversarial processes to “fool” the system, revealing that it does not at all understand what a face actually is. Whereas ImageNet Roulette can be said to center on the material and epistemic bases of AFR’s production, How do you see me? appears more concerned with the technology as a mode of perception. The result is a series of black and white oval shapes which to the human beholder look nothing like faces at all. This raises questions not only of visual representation in general, but also of what constitutes recognition.

Discussing this, I will take a step back to investigate the technical basis of AFR systems in order to make explicit how these constitute abstractive processes that turn information into increasingly compressed versions of the original input photos. While this step might at first appear unnecessarily technical, it is important for understanding the relation between Dewey-Hagborg’s face and the abstractions in How do you see me?, and it also gives the reader a fundamental understanding of how AFR systems are essentially devices for calculating statistical probabilities. While the political implications of this last point might not yet be obvious, it is of central importance to Chapter 4, where I discuss this aspect of AFR in detail. Moreover, the work opens a broader discussion of the nature of the intelligence implied by the term artificial intelligence (AI), in terms of which AFR systems are commonly described. Here, I will draw on the work of philosopher Hubert L. Dreyfus, whose famous critique of AI from the 1970s still appears highly relevant. In light of this discussion, it becomes clear how idealist notions of form as separable from content have informed the field of AI since its inception, and how this poses serious problems when such systems are faced with semantic ambiguities – of which photographs are full.
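As a rough illustration of what such compression and comparison can amount to, the following toy sketch in Python reduces two tiny grids of pixel values to coarser versions, measures how far apart the results are, and turns that distance into a score that is checked against a threshold. It is a deliberately crude sketch under assumptions of my own, not a reconstruction of any actual AFR system or of the software Dewey-Hagborg used: the averaging step, the score formula and the threshold of 0.5 are all hypothetical choices.

```python
import math

# Hypothetical toy example: it does not reproduce any real AFR system.


def downsample(image, factor):
    """Average each factor x factor block of pixel values into one number,
    producing a smaller, more abstract version of the input grid."""
    height = len(image) - len(image) % factor
    width = len(image[0]) - len(image[0]) % factor
    smaller = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        smaller.append(row)
    return smaller


def flatten(grid):
    """Turn a 2D grid into a single list of numbers (a 'feature vector')."""
    return [value for row in grid for value in row]


def similarity(a, b):
    """Map the Euclidean distance between two vectors to a score in (0, 1]:
    identical vectors score 1.0, and the score shrinks as they drift apart."""
    return 1 / (1 + math.dist(a, b))


def is_match(image_a, image_b, factor=2, threshold=0.5):
    """Hypothetical decision rule: declare a 'match' when the score of the
    compressed images reaches an arbitrary threshold."""
    score = similarity(flatten(downsample(image_a, factor)),
                       flatten(downsample(image_b, factor)))
    return score >= threshold, score


# Two tiny 4x4 'photographs' that differ only in their top-left corner.
photo_a = [[0, 0, 10, 10], [0, 0, 10, 10], [5, 5, 5, 5], [5, 5, 5, 5]]
photo_b = [[1, 1, 10, 10], [1, 1, 10, 10], [5, 5, 5, 5], [5, 5, 5, 5]]
print(is_match(photo_a, photo_b))  # (True, 0.5)
```

Nothing in the sketch registers what a face is: any two grids of numbers that happen to compress to similar values pass the threshold, which echoes the point, made through How do you see me?, that such a system does not understand what a face actually is.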

In Chapter 3, I analyze Spirit is a Bone by the artist duo Broomberg & Chanarin. This work exists in several versions, including a photobook published in 2015 and various installations exhibited across the world. In addition to the book, the basis for my discussion will be an installation I encountered at the Hamburger Kunsthalle in 2018. Intrigued by this particular installation, I went on to order the book as well. It turned out that the series of eerie portraits had been made with a facial recognition system specifically designed to photograph people without their consent. This is in itself, of course, an interesting aspect of the work, but the artistic methods used for presenting the images also invite many other reflections. While the installation positions its subject matter clearly within the digital realm, raising questions concerning the nature of the body as a site for digital surveillance, the book, for its part, more clearly connects the use of contemporary AFR technology to a history of surveillance based on physiognomic measurement of the face. This is achieved by appropriating the categories from the oeuvre of the twentieth-century German photographer August Sander. Drawing mostly on the work of photography theorist Allan Sekula, I discuss how Sander’s practice was based on dubious ideas about the face as an icon of personal character. In this way, Spirit is a Bone opens up a discussion of the photographic portrait’s role in pseudo-scientific practices with the human skull as their object of study, dating all the way back to the medium’s invention.

The fourth and final chapter of the thesis is constructed as a meta-analysis of the preceding discussions in two parts. In the first, I summarize the general findings, placing them within a broader historical context. I am here especially concerned with how the integration of the photograph into disciplines such as craniometry and phrenology coincided with the rise of the modern surveillance state. Focusing on the determinism shared by phrenology and present-day surveillance systems, I trace some common threads between the two, arguing that contemporary AFR technology can be understood as the latest “peak” in a centuries-old evolution of practices concerned with anchoring legal identity to the body. Having established this, I go on in part two to analyze the artistic methods used for bringing forth the above reflections. Here, I am particularly concerned with how all three works play on contested ideas about the materiality of the digital realm in order to position themselves within a discourse on digital phenomena. Moreover, I compare the aesthetic expression of ImageNet Roulette, How do you see me? and Spirit is a Bone to the Berlin Dada group’s use of so-called photomontage to ask whether the photographic portrait holds a special political potential when reassembled and recontextualized.

What has not been noted until now is that all the works presented in this thesis are portraits of some kind and therefore belong to a genre with special connotations. As art historian Shearer West has noted, “it is fair to say that portraits are — and have always been — used for documentary purposes” (2004: 57). Since ancient times, portraits in various forms have been used to document things as different as the mere existence of a person, their power, and central events in their life. However, the imaginative and interpretative aspects of the genre, West argues, make portraits resistant to what she calls documentary reductivism (2004: 59). As the reader will become aware, photographic portraits are nevertheless used in exactly this reductive way in AFR. At the heart of the discussion throughout this thesis thus lies a tension between the portrait as likeness – documenting specific physiognomic aspects of the sitter – and the portrait as expressive of the sitter’s personal qualities. While this tension is constantly lurking between the lines, it is not always directly addressed. I therefore ask the reader to keep it in mind, so that the final discussion in Chapter 4 appears as the necessary end point it is.

Existing Research and Method

It is beyond question that the topic of this thesis is situated within a discourse on optics and vision, which has been central to art in the Western tradition at least since the Renaissance. Merleau-Ponty, Panofsky, Lacan and Sartre are examples of twentieth-century thinkers who have problematized the importance of the “gaze” in Western society and culture – especially in relation to problems concerning knowledge and politics. So much so that ocular metaphors are even ingrained in most European languages, as Martin Jay has skillfully demonstrated (1994: 1). While his examples are taken from French and English, this is true for Norwegian as well, especially regarding the use of light metaphors in relation to knowledge and thinking (“opplyse” [enlighten], “kaste lys på” [shed light on], “reflektere” [reflect], “være klarsynt” [be clear-sighted], etc.). Clearly, AFR systems are produced on the basis of the idea that visual documentation is an especially trustworthy form of evidence, and they can thus be read into a long history of technologies serving as prostheses to enhance the human eye. Even modernity itself, as Jay has put it elsewhere, is normally considered “resolutely ocularcentric” (1988: 3).

The discussions in this thesis are also clearly related to what Jonathan Crary has described as “the problem of the observer,” which informs our very ways of seeing. This, he argues, constitutes “the field on which vision in history can be said to materialize, to become itself visible” (1991: 5). This renders the topic of surveillance, and AFR more specifically, especially well suited for analysis from an art historical perspective. It is indeed a topic of wide scholarly debate, both within art history and within the broader field of visual studies and the burgeoning discipline of surveillance studies. The fact that AFR technologies are essentially systems of automated image interpretation has nevertheless not been much discussed. Here, Lila Lee-Morrison’s doctoral thesis from Lund University stands out. As she also points out, “there has been little scholarship from within the cultural studies and the humanities more broadly on the ways in which AFR actually performs recognition and how this may constitute and enculture a mode of perception” (2019: 27). Instead, the technical processes have largely been left to the field of computer science, which – as this thesis will make clear – falls short when it comes to grasping the epistemic and political dimensions of its own products. Like Lee-Morrison, I also focus on the performative aspects of AFR, but through a very different set of artistic and technological examples. Her thesis has nevertheless been of much help to me in setting an example for how AFR might be approached from an aesthetically oriented point of view. I also have Lee-Morrison to thank for the idea of comparing AFR’s mode of perception to the Wittgensteinian concept of family resemblance, as discussed in Chapter 2. This being said, the field is still at an underdeveloped stage, leaving much to be desired. This last point will be further developed in Chapter 1, where I discuss how several scholars within visual studies are calling for an increased focus on the networked nature of digital photography.

Besides the general stance that the relationship between photographs and computers in AFR constitutes a complicated assemblage of operations, and that its production can be understood in light of tendencies associated with the idea of modernism, this thesis takes no specific theoretical framework as its starting point. This is a conscious choice on my part, as I am primarily interested in discussing the topic as it is presented by the artworks, rather than reading them into a pre-existing framework. Too often, art is treated as a mere illustration of ideas rather than as a proper epistemic statement in itself. I will not contribute to this. Of course, the idea of art as a form of knowledge production is itself up for debate: the so-called epistemization of art is indeed a hot topic within the contemporary art scene.1 This is not to say that I believe all art is – or should be – concerned with knowledge production or politics. What I am arguing, however, is that the projects presented in this thesis are, and should therefore be acknowledged as such.

As for the bibliography, the reader will probably notice that it comprises a mix of literature from various fields, with art history and other visual studies dominant. In Chapter 4, I have also made active use of work from within surveillance studies. Furthermore, I should point out that I am not a computer scientist, and my understanding of how computer systems work probably does not meet the academic standards of the computer sciences themselves. When discussing these particular issues, I have relied on introductory books as well as blogs aimed at explaining computer science to a general audience.

1 For an illuminating discussion on the epistemization, or “knowledgization,” of art, see Tom Holert (2020), Knowledge Beside Itself: Contemporary Art’s Epistemic Politics (Berlin: Sternberg Press).


However, my goal is not to deconstruct or propose better technical methods for building and using AFR technology, but rather to demonstrate how such algorithms are based on some questionable assumptions regarding both the truth-value of photography and the possibilities of denoting an image’s subject matter by a restricted set of words. In this respect, I hope that the reader will find my presentation of the technicalities to be sufficient.

Important Terms

Before moving on to the first chapter, I round off the introduction by clarifying what I mean by some terms which are central to the discussion in this thesis. Firstly, when I speak of photography here, I refer to the medium in the broadest sense possible, including everything from daguerreotypes to stills from moving digital image flows. When I refer to a specific mode or form of photography, this will be noted in the text. Secondly, as the readers of this thesis may not be schooled in the computer sciences, the term algorithm should be explained. Within computer science, it describes any effective procedure that reduces the solution of a problem to a predetermined sequence of actions (Nowviskie 2014: 1). There are several uses for algorithms in software, of which the most common are to perform calculations, conduct automated reasoning, and process data. Algorithms can also, amongst other things, be implemented in mathematical models, mechanical devices or electrical circuitry. As one dictionary puts it, the term algorithm in common usage typically references “a deterministic algorithm, formally defined as a finite and generalizable sequence of instructions, rules, or linear steps designed to guarantee that the agent performing the sequence will reach a particular, predefined goal or establish incontrovertibly that the goal is unreachable” (Nowviskie 2014: 1).

In this, they differ from heuristics, which are processes of trial and error. There exist, however, nondeterministic algorithms designed to solve harder problems by finding the best solution available within a given set of constraints (Nowviskie 2014: 1). The algorithm discussed in Chapter 2 is an example of the latter.
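The contrast between a deterministic procedure and a trial-and-error one can be sketched in Python. The first function below is binary search, a textbook deterministic algorithm: given the same sorted list and target, it always takes the same steps, and it either finds the target or establishes that it is absent. The second function is a hypothetical randomized search of my own devising, which tries candidates within a fixed budget of attempts and keeps the best one found; it stands in loosely for trial-and-error approaches and is not meant to reproduce the algorithm discussed in Chapter 2.

```python
import random


def binary_search(sorted_items, target):
    """Deterministic: the same input always yields the same sequence of steps.
    Returns the index of target, or None once it is established that the
    target is not in the list."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_items[middle] == target:
            return middle
        if sorted_items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return None


def trial_and_error(score, candidates, attempts, seed=None):
    """Hypothetical randomized search: try `attempts` randomly chosen candidates
    and keep the best-scoring one seen. Different seeds can give different
    answers, and the true optimum is not guaranteed to be found."""
    rng = random.Random(seed)
    best = None
    for _ in range(attempts):
        candidate = rng.choice(candidates)
        if best is None or score(candidate) > score(best):
            best = candidate
    return best


print(binary_search([2, 3, 5, 7, 11], 7))  # 3
print(binary_search([2, 3, 5, 7, 11], 4))  # None
# Look for the number closest to 42 among 0..99 using only 20 guesses.
print(trial_and_error(lambda x: -abs(x - 42), list(range(100)), attempts=20, seed=1))
```

Running the second function with a different seed, or with fewer attempts, may return a different answer, which is exactly what the deterministic function rules out.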

The last term which should be explained is data, as its colloquial use may obscure its technical meanings. According to one dictionary, data come in three kinds: the first is “data that refers to something outside of itself – encoding or representing changes outside of the computer.” Next, there are data as data, simply meaning anything handled by a computer. Lastly, there are data that work on data, i.e. a program (Fuller 2014: 125). The latter definition refers to how most computer programs are themselves made up of data. In this sense, data are worked on, but also do work. The consequences of this will become clear throughout the thesis.
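The distinction between data as data and data that work on data can be made concrete with a minimal sketch in Python. Here a “program” is stored as an ordinary list of instructions, which a small interpreter carries out on a number. The instruction names and the interpreter are hypothetical illustrations of my own, drawn neither from Fuller nor from any AFR system.

```python
def run(program, value):
    """Carry out a program stored as plain data: a list of (operation, argument) pairs."""
    operations = {
        "add": lambda x, n: x + n,
        "multiply": lambda x, n: x * n,
    }
    for name, argument in program:
        value = operations[name](value, argument)
    return value


program = [("add", 2), ("multiply", 3)]  # a program that is itself just data
print(run(program, 4))                   # 18: the program does work on other data, (4 + 2) * 3
program.append(("add", 1))               # the program can also be worked on like any other data
print(run(program, 4))                   # 19
```

The same list is first used as a set of instructions and then edited as material, which is the double role of data described above.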


Chapter 1

Less Than a Thousand Words: ImageNet Roulette by Trevor Paglen and Kate Crawford

Introduction

Whenever the subject of machine vision in art is discussed, the name Trevor Paglen almost inevitably comes up. The American artist’s work seems to be included in almost every exhibition, talk or text related to the topic. Paglen (b. 1974) is known for his many projects concerning surveillance and state secrecy. Through a highly multifaceted practice, he has addressed the issue of covert global surveillance at least since the early 2000s. Already holding an MFA from the Art Institute of Chicago, Paglen also obtained a PhD in geography from the University of California, Berkeley in 2008. This speaks to the artist’s multidisciplinary approach to his work. While centered on photography, Paglen’s artistic practice has gradually expanded to span a wide range of media, including performance, sculpture and writing – often made in collaboration with teams of professionals from different fields. In the early 2010s, Paglen gained widespread international attention in the art world with a series of projects which all focused on the material traces of clandestine military operations and on finding ways to literally make these processes visible. Amongst these, The Black Sites (2006), The Other Night Sky (2010 -), and Limit Telephotography (2004-12) seem to be the most widely exhibited and praised. Since then, Paglen’s work has been exhibited, collected and discussed by a wide range of internationally acclaimed art institutions, rendering any survey of surveillance-related art incomplete without mention of his work.2

As a thorough analysis of Paglen’s whole oeuvre would be too extensive a task for an MA thesis, I have chosen to discuss those of his projects that specifically address the topic of machine vision. This means emphasizing the part of his work that followed the performance Sight Machine in 2017. From that time on, he seems to have been more or less exclusively focused on instances of machine vision. Since becoming an artist fellow at the AI Now Institute,3 he has produced at least eighteen projects specifically examining this phenomenon, several of which concern the analysis of faces by computer programs.4 The discussion in this chapter will start with a presentation of the project ImageNet Roulette (2019), a collaborative work by Trevor Paglen and Kate Crawford, a leading scholar on the social and political implications of AI. More precisely, what they produced was a piece of software, or “app,” embedded in a webpage to which visitors were invited to upload photos of themselves in order to have them categorized by an algorithm trained on the image database ImageNet. I will then go on to discuss the related exhibition Training Humans, which opened in Milan shortly after the release of the app in September 2019. All in all, the different parts of the project seem to raise questions of representation and power: what politics underlie the labeling of people into certain categories, and how does the taxonomy in question affect the people whose photographs are unknowingly being used by these systems? This brings forth yet another set of questions regarding the relationship between language and image, as the categories are all formulated as single nouns.

2 A full timeline of Paglen’s life and work until 2017 is available in the monograph Trevor Paglen (2018) by Cornell, Bryan-Wilson and Kholeif (London: Phaidon).

3 The AI Now Institute at New York University was founded in 2017 with the goal of producing interdisciplinary research on the social implications of artificial intelligence.

Regarding the order of presentation, I have chosen to follow chronology. This means presenting the app and how it went viral first, then the essay Excavating AI, and lastly the exhibition Training Humans. This is partly to give the reader a sense of narrative, and partly because it provides grounds for understanding the prior knowledge that the audience might have brought with them to the exhibition. As the app gained a lot of international attention, I find it likely that many exhibition visitors would have registered the discussion it generated beforehand. The experience of the two can thus be seen as continuous, the one informing the other.

The Viral App

In September 2019, the hashtag #ImageNetRoulette was trending on Twitter and other social media. It referred to the selfie-app ImageNet Roulette, which had gone viral. The concept was simple: the audience, or “user”, uploads a photograph of a person, and the software detects a face in it, which it labels with one of nearly 3000 person categories from the database ImageNet.

When first released, the app appeared to be one amongst many popular selfie apps where you can upload your photo and be told “what you would look like as a toddler,” “what artwork features your doppelganger” and the like. ImageNet Roulette, however, proved to be something else entirely. As more and more people posted tagged images of themselves showing how the algorithm “saw” them, #ImageNetRoulette gained a great deal of attention on social media almost immediately after its publication. This is indeed how I learned of the project myself. And how did I, a 25-year-old bespectacled white woman, get labeled?

4 On his own webpage, Paglen has categorized his various projects by topic. I am here relying on the artist’s own classifications of his work. URL: https://paglen.studio/category/machine-visions/

“Eccentric, eccentric person, flake, oddball, geek: a person with an unusual or odd personality.”

Experimenting, I also uploaded another portrait in which I had taken my glasses off and ruffled my hair. The app then tagged me as “caveman, cave man, cave dweller, troglodyte: someone who lives in a cave.” In most cases, like mine, the results seemed harmless, even fun, albeit very banal. But it soon became clear to anyone following the trend that some of the tags were also highly problematic.

A tweet that caught my attention was posted on September 18 by a user named Lil Uzi Hurt (@lostblackboy), writing “no matter what kind of image I upload, ImageNet Roulette, which categorizes people based on an AI that knows 2500 tags, only sees me as Black, Black African, Negroid or Negro” (2019). Along with this text, he posted four photos of himself. They show a young, well-dressed man in a variety of situations, seemingly hand-picked to reflect his multifaceted personality. In all of them, his face is framed by a neon green square along with the tags “black, black person, blackamoor, Negro, Negroid” (Fig. 1). The green labels show how photographs that to me look deeply human and full of personality are perceived very differently by software that does not see past the color of this man’s skin. Another example is recounted in a New York Times article, where the app does come up with labels denoting personal character. The labels, however, are not exactly flattering in this case either: Mr. Kima, a 24-year-old African-American, found that “when he uploaded his own smiling photo, the site tagged him as a ‘wrongdoer’ and ‘offender’” (Metz 2019). This demonstrates racist stereotyping, which appears more problematic than ever at a time when police violence and racial profiling are especially pressing topics in light of the Black Lives Matter movement.

A wide range of similar posts and articles demonstrate how the algorithm employed by ImageNet Roulette is clearly biased in several ways, of which racist and misogynist tendencies are the most prominent. The Guardian journalist Julia Carrie Wong puts into words the experience of being stigmatized by the software (Fig. 2):

(…) after a day of watching my fellow journalists upload their ImageNet Roulette selfies to Twitter with varying degrees of humor and chagrin about their labels (“weatherman”, “widower”, “pilot”, “adult male”), I decided to give it a whirl. That most of my fellow tech reporters are white didn’t strike me as relevant until later. I don’t know exactly what I was expecting the machine to tell me about myself, but I wasn’t expecting what I got: a new version of my official Guardian headshot, labeled in neon green print: “gook, slant-eye”. Below the photo, my label was helpfully defined as “a disparaging term for an Asian person (especially for North Vietnamese soldiers in the Vietnam War)” (Wong 2019).

Together, these comments caused public shock and outcry, drawing ever more attention to the app. At one point, it was generating as many as 100 000 labels an hour, according to the New York Times (Metz 2019). By then, only a few days after its release, many were probably asking themselves how yet another selfie app with clear biases could have been released without these problems being discovered beforehand. For example, a 2018 app by Google Arts Project, made to let people find their doppelgangers in museum collections, had been widely criticized for including too narrow a range of Asian art, so that Asians and Asian Americans were matched with pictures that looked nothing like them (Goggin 2019). Three years earlier, Google had also been criticized because one of its image recognition programs tagged people of African descent as gorillas (Simonite 2018). While many were probably inclined to attack ImageNet Roulette as a continuation of these problems, it soon became clear that the revelation of inherent biases was actually an intended part of the project.

According to Paglen, the bigger point that he and Crawford were trying to make was “how dangerous it is for machine learning systems to be in the business of ‘classifying’ humans and how easily those efforts can – and do – go horribly wrong” (Rea 2019). To spur criticism of AFR technology was in other words exactly the intended goal behind the roulette.

The point seemed to get across to a lot of people. In fact, ImageNet Roulette gained so much attention that, after only a few days, ImageNet, the very database it critiqued, published a statement announcing that 1,593 out of the database’s 2,832 person categories were now deemed “unsafe,” and that over 600 000 images had been removed from the database (Yang et al. 2019). Exactly what this means is left unclear, but we are led to believe that the “unsafe” categories will not be used for training algorithms in the future. Although the statement does not refer to either Crawford or Paglen, its timing seems too close to the app’s release to be coincidental. In this respect, ImageNet Roulette appears to have been a great success if understood as a form of activism, a view that Paglen and Crawford seemed to share.

Not long after, the two announced that they would shut down the webpage from September 28 because the project, as they saw it, had made its point: “it has inspired a long-overdue public conversation about the politics of training data, and we hope it acts as a call to action for the AI community to contend with the potential harms of classifying people” (Artforum 2019).

Thus, less than a month after its release, the webpage with ImageNet Roulette was already gone. If you tried to visit it now, you would be redirected to another webpage featuring an essay contextualizing the project. As the actual software was made by Paglen and Crawford themselves and only trained on ImageNet, it was not obvious how the biases it revealed actually related to the database itself. In my view, the text replacing it was a welcome next step, as it answered the kind of questions that the roulette alone had left open. In the following, I will present the main arguments of this essay, Excavating AI, before moving on to present the exhibition Training Humans.

Figure 1: Screenshot of tweet by the user @lostblackboy posted on 18.09.2019. Retrieved on January 26, 2021. URL: https://twitter.com/lostblackboy/status/1174112872638689281

Figure 2: Screenshot of the ImageNet Roulette webpage posted by the Guardian journalist Julia Carrie Wong. Retrieved on 26.01.2021. URL: https://www.theguardian.com/technology/2019/sep/17/imagenet-roulette-asian-racist-slur-selfie


An Archeology of Datasets

Excavating AI opens with a short summary of how the earliest attempts to develop automated image recognition systems in the 1960s proved to be a much more complicated task than first expected. Ever since, the problem has, according to the authors, been approached in purely technical terms, seeking to axiomatize the meaning of images. The same goes for facial recognition programs, as they are essentially a specialized form of image recognition. Asking whether the whole idea of approaching AFR as a technical problem is misguided, Crawford and Paglen position themselves in opposition to what they portray as the norm within the field of AI. As an alternative, they want to “explore why the automated interpretation of images is an inherently social and political project, rather than a purely technical one” (Crawford & Paglen 2019). The essay can thus be understood not only as an attempt to discuss the particular database ImageNet, but also as an effort to shed light on widespread attitudes informing the AI community more broadly.

Based on a method they refer to as an archeology of datasets, the authors pose some overarching questions: first, what work images do in AI systems, and second, what computers are meant to recognize in an image, as well as what is misrecognized or even completely invisible (Crawford & Paglen 2019). Although the authors put a special focus on ImageNet, which they refer to as “the most iconic training set of all,” the reader is also introduced to several other datasets meant for training facial recognition algorithms. In this way, Paglen and Crawford shed light on some questionable assumptions that seem to inform various taxonomies employed across the field of AFR, lending credence to the idea that there is an underlying problem that is epistemological in nature. The following is a summary of how the authors explain automated facial recognition in Excavating AI. I devote considerable space to it because it also serves the goal of this chapter well: to analyze ImageNet Roulette while also introducing the building blocks of AFR systems more generally.

Training sets are the foundation on which machine learning5 systems are built. If you want to build any kind of AI system, you need data on which to train your algorithm. If the task you want to teach is object or facial recognition, you will need vast amounts of carefully labeled images sorted into categories. On a basic level, a training image dataset (TID) can thus be understood as a digital archive of images, before the training even begins.

5 Machine learning seems to be the term preferred by practitioners within the field, rather than “AI”, which is much less specific. For more on this, see for example Matteo Pasquinelli (2019), “How a Machine Learns and Fails” in Spheres No. 5. Machine learning is also the term the authors actually use in this text, even though they use AI both in the title and in various interviews discussing the project. I therefore wonder whether this is a strategic phrasing chosen for marketing purposes, as AI has different connotations and might therefore provoke different responses from the general public.

When the algorithm is trained on the TID, it conducts a statistical survey of the many image-label combinations and then creates a model of what distinguishes one class of items from another by analyzing the data contained in the images. Paglen and Crawford illustrate this process with an example where the task is to make a system for recognizing the difference between apples and oranges:

A developer has to collect, label, and train a neural network on thousands of labeled images of apples and oranges. On the software side, the algorithms conduct a statistical survey of the images, and develop a model to recognize the difference between the two “classes.” If all goes according to plan, the trained model will be able to distinguish the difference between images of apples and oranges that it has never encountered before (2019).

The actual collection and labeling of images, then, is done manually by people, while what the algorithm does is simply to conduct a statistical analysis of this data and compare new images with its previous calculations in order to match them to a category. When it comes to categorizing images of faces, the process is in principle the same, even though the subject matter is different. Whatever biases are learned are hence direct results of the categorization done while creating the TID. This is precisely why Paglen and Crawford find it so important to bring attention to the training datasets themselves.
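To make the apples-and-oranges example more concrete, the following is a minimal sketch in Python. It is not how ImageNet-trained systems actually work: those rely on deep neural networks, whereas this sketch swaps in a much simpler nearest-centroid classifier, and the two numeric “features” and all of their values are hypothetical. What it shares with real systems is the division of labor described above: humans supply the labels, while the program merely averages and compares.

```python
from math import dist
from statistics import fmean

# Hypothetical hand-labeled training set: each "image" is reduced to two
# made-up numeric features (say, redness and skin smoothness) plus the
# label a human annotator assigned to it.
TRAINING_SET = [
    ((0.90, 0.80), "apple"),
    ((0.80, 0.90), "apple"),
    ((0.85, 0.75), "apple"),
    ((0.60, 0.20), "orange"),
    ((0.55, 0.30), "orange"),
    ((0.65, 0.25), "orange"),
]


def train(examples):
    """Build a 'model': the average feature vector (centroid) of each class."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(model, features):
    """Assign an unseen example to the class whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_SET)
print(classify(model, (0.88, 0.82)))  # apple
print(classify(model, (0.58, 0.22)))  # orange
print(classify(model, (0.10, 0.10)))  # orange: it must pick one of its labels
```

Note the last call: an input resembling neither fruit is still forced into one of the two classes, because the model can only answer with categories someone gave it. The same logic applies when ImageNet Roulette returns a person category for any face it detects.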

There are several reasons why ImageNet is of particular interest to this discussion. Originally presented in a research paper by a team at Princeton University, ImageNet has since grown into a dataset of “extraordinary scope and ambition” (Crawford & Paglen 2019). Setting out in 2009 with the ambition to “map out the entire world of objects,” it had, after ten years, scraped over 17 million images from the internet (Crawford & Paglen 2019). This was achieved by using what Excavating AI refers to as “an army of piecemeal workers” hired through Amazon Mechanical Turk.6 According to Crawford and Paglen, these workers sorted an average of 50 images per minute into the more than 20 000 categories available.

6 Amazon Mechanical Turk is an online crowdsourcing platform through which companies can hire people to do on-demand manual work that computers are not currently able to do. These workers are often referred to as “Turkers” and are known to be paid extremely low wages for their work. The name Mechanical Turk is derived from a supposedly automated chess player constructed by Wolfgang von Kempelen in 1770. It toured Europe, playing very strong games against many people, including Napoleon Bonaparte. The machine was later revealed to be a hoax: the mechanical “turk” was in fact operated by a human hiding within it.

In the following decade, ImageNet became extremely important to the field of image recognition and to research within computer vision more generally. For example, the database has been the basis for an annual competition in which different teams of programmers, or “labs,” test their algorithms by pitting them against a given subset to see who can achieve the lowest error rate.7 It was considered a turning point in AI history when a team from Toronto used a convolutional neural network (CNN)8 to win the competition’s top prize in 2012. This led to a dramatic increase in accuracy across the field. By 2017 (the final year of the competition), the record accuracy was 97.3%, which I take to mean that accuracy rates are very high (and they are likely even higher at this point). The problems of bias, then, do not lie in the algorithms’ inability to correctly follow the given set of instructions. Rather, the issue must lie at the point of determining what is deemed an accurate interpretation.
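To clarify what an “error rate” measures here, the sketch below computes a top-k error in Python. As far as I understand, ImageNet competition results were commonly reported as top-5 error, that is, whether the correct label appears among a model’s five highest-ranked guesses. The function name, the fruit labels and the predictions below are hypothetical. The point the sketch illustrates is that “correct” simply means agreeing with the label a human annotator assigned, so a high accuracy score says nothing about whether that label was appropriate in the first place.

```python
def top_k_error(predictions, true_labels, k=5):
    """Fraction of examples whose human-assigned label is missing from the
    model's k highest-ranked guesses (each guess list is best-first)."""
    if not true_labels:
        raise ValueError("need at least one example")
    misses = sum(
        1
        for ranked, truth in zip(predictions, true_labels)
        if truth not in ranked[:k]
    )
    return misses / len(true_labels)


# Hypothetical ranked guesses for four test images, best guess first.
predictions = [
    ["apple", "pear", "peach"],
    ["orange", "apple", "lemon"],
    ["lemon", "lime", "pear"],
    ["pear", "apple", "quince"],
]
# The "ground truth" is whatever the annotators wrote down.
true_labels = ["apple", "apple", "orange", "quince"]

print(top_k_error(predictions, true_labels, k=1))  # 0.75
print(top_k_error(predictions, true_labels, k=3))  # 0.25
```

Accuracy is then simply one minus this error, which is why a record like 97.3% only measures how closely a model reproduces the annotators’ choices.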

In order to understand the taxonomy of ImageNet, one first has to know about the lexical database WordNet, on which its semantic structure is based. WordNet was developed in the 1980s (also at Princeton) and is organized according to a nested structure of so-called synsets (sets of cognitive synonyms). Each synset represents a distinct concept in which synonyms are grouped together. The relationship between these is explained by Crawford and Paglen as follows: “synsets are then organized into a nested hierarchy, going from general concepts to more specific ones. For example, the concept ‘chair’ is nested as artifact > furnishing > furniture > seat > chair” (2019). As opposed to WordNet, which attempts to organize the entire English language, the synsets in ImageNet are restricted to nouns, supposedly because this class of words is thought to encompass the entirety of what pictures can represent.
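The nesting described in the quotation can be modeled as a simple tree in which each synset points to its more general parent. The Python sketch below is a toy assumption: it contains only the “chair” chain cited by Crawford and Paglen, and the helper names are hypothetical. The actual WordNet hierarchy has more intermediate levels and can be explored through, for example, NLTK’s WordNet interface.

```python
# Toy fragment of a WordNet-style noun hierarchy: each synset maps to its
# more general parent. Only the chain quoted by Crawford and Paglen is
# included; the real hierarchy is deeper.
HYPERNYM = {
    "chair": "seat",
    "seat": "furniture",
    "furniture": "furnishing",
    "furnishing": "artifact",
}


def path_from_root(synset):
    """Walk up from a synset to the top of the tree, then reverse the path
    so that it reads from general to specific."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return " > ".join(reversed(path))


def is_a(synset, ancestor):
    """True if `ancestor` is the synset itself or lies somewhere above it."""
    while synset is not None:
        if synset == ancestor:
            return True
        synset = HYPERNYM.get(synset)
    return False


print(path_from_root("chair"))  # artifact > furnishing > furniture > seat > chair
print(is_a("chair", "artifact"))  # True
```

Because every synset inherits all of its ancestors, placing a label somewhere in the tree is itself a judgment: anything classified as a “chair” is implicitly also an “artifact,” just as the placement of “bisexual” under “sensualist,” discussed below, attaches that judgment to everyone so labeled.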

Regarding hierarchy, there are nine top-level categories under which all the others are divided: plant, geologic formation, natural object, sport, artifact, fungus, person, animal, and miscellaneous. While it is primarily the person category which is the object of Paglen and Crawford’s discussion, it should be pointed out that any taxonomy or other system of classification is always fueled by politics. I quote one paragraph of the essay here in full, as I believe it captures this problem very well while also pointing towards some historical precursors to this practice:

The category “human body” falls under the branch Natural Object > Body > Human Body. Its subcategories include “male body”; “person”; “juvenile body”; “adult body”; and “female body.” The “adult body” category contains the subclasses “adult female body” and “adult male body.” We find an implicit assumption here: only “male” and “female” bodies are “natural.” There is an ImageNet category for the term “Hermaphrodite” that is bizarrely (and offensively) situated within the branch Person > Sensualist > Bisexual > alongside the categories “Pseudohermaphrodite” and “Switch Hitter.” The ImageNet classification hierarchy recalls the old Library of Congress classification of LGBTQ-themed books under the category “Abnormal Sexual Relations, Including Sexual Crimes,” which the American Library Association's Task Force on Gay Liberation finally convinced the Library of Congress to change in 1972 after a sustained campaign (2019).

7 In ImageNet’s defense, it should be noted that the person categories have, as far as I know, never been included in this competition.

8 CNNs are explained in more detail in Chapter 2.

It is apparent that politics are embedded already at the level of taxonomy, in the process of deciding which categories to include and where the various synsets are deemed to belong. That “bisexual” and other categories have been included under the branch “sensualist” reveals a certain moralism ingrained in the process of creating the database itself, making judgements about which sexualities or lifestyles are considered “sensualist” as opposed to other ways of being and living. This renders the taxonomy offensive even before the categories are connected to images. The fact that the creators of ImageNet apparently believe personal properties such as these to be potent visual identifiers in photographs only makes this more problematic, verging on the absurd. As anyone within the fields of art history, media theory and the like will know, the authors point out, images are “slippery things” whose full meanings are seldom easy to describe with words, let alone with a single noun. Yet, as ImageNet Roulette demonstrates, the assumption that this can be done is exactly what TIDs are based on.

At the time Excavating AI was written, there were 2,833 subcategories under the top-level category “Person,” labeling images of people according to attributes such as race, nationality, profession, economic status, behavior, character and morality. Although the assumption that any of these can be identified visually is problematic, it is with the last three that the system seems to take a particularly dark turn. Paglen and Crawford summarize:

There are categories for Bad Person, Call Girl, Drug Addict, Closet Queen, Convict, Crazy, Failure, Flop, Fucker, Hypocrite, Jezebel, Kleptomaniac, Loser, Melancholic, Nonperson, Pervert, Prima Donna, Schizophrenic, Second-Rater, Spinster, Streetwalker, Stud, Tosser, Unskilled Person, Wanton, Waverer, and Wimp. There are many racist slurs and misogynistic terms (2019).

The examples cited above are only some of the many that could have been mentioned. Still, even these make clear that this very important TID is based on a taxonomy which is far from neutral, exhibiting gendered, racialized, ableist and ageist biases, to name only some of the most obvious problems. What gives someone the right to decide what a “bad person,” a “loser” or a “failure” looks like? To make matters worse, this kind of classification system has become increasingly common in recent years, in small and large AI companies alike. While perhaps the most important one, ImageNet is far from the only database of its kind. In the essay, Crawford and Paglen also introduce the reader to a few others, three of which are described at some length: JAFFE, UTKFace, and IBM’s “Diversity in Faces.”

JAFFE stands for Japanese Female Facial Expression and is a dataset created in 1998 to train algorithms to recognize facial expressions. It consists of photographs of ten Japanese women, each posing seven different facial expressions. The first assumption called into question here is the idea that emotions can be read from people’s faces at all. The second is the assumption that there are only six emotions plus a neutral state. Third, there is the obvious problem that even if the first two assumptions held, the emotions displayed in the pictures are acted out by the women and therefore do not express their actual feelings or state of mind. These problems clearly reduce the credibility of this dataset, but problems aside, JAFFE appears relatively harmless compared to the other TIDs discussed in Excavating AI. Having been made before it became possible to scrape millions of images from the internet, and before cheap online labor became available through crowdsourcing platforms, JAFFE is modest in both scale and ambition.

When it comes to TIDs produced in the age of Web 2.0, on the other hand, the case is very different. Crawford and Paglen note how, “as training sets grew in scale and scope, so did the complexities, ideologies, semiologies, and politics from which they are constituted” (2019). UTKFace was made by a group of researchers at the University of Tennessee at Knoxville (hence the acronym) and consists of about 20,000 images of faces annotated for gender, age and race. According to the authors, it can be used for tasks like automated facial detection as well as age estimation and age progression. What they find most troubling about this dataset is how these categories are limited to a very small number of options: gender is presented as a binary choice between male and female, while age is restricted to a range from 0 to 116.

While these assumptions can certainly all be questioned, the most troubling category, from my perspective, is nevertheless the one that places people into one of five racial classes: White, Black, Asian, Indian and “Other”.
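The rigidity of this scheme becomes especially clear in how the annotations are stored. As I understand the dataset’s published documentation, each UTKFace filename packs the labels into the pattern [age]_[gender]_[race]_[date&time].jpg, with gender coded as 0 (male) or 1 (female) and race as an integer from 0 to 4 (White, Black, Asian, Indian, Others). The Python sketch below assumes that convention; the function name and the sample filename are hypothetical.

```python
from typing import NamedTuple

# Assumed label codes, following UTKFace's documented naming convention
# [age]_[gender]_[race]_[date&time].jpg; check the dataset's own docs.
GENDERS = {0: "male", 1: "female"}
RACES = {0: "White", 1: "Black", 2: "Asian", 3: "Indian", 4: "Others"}


class FaceLabel(NamedTuple):
    age: int
    gender: str
    race: str


def parse_filename(name: str) -> FaceLabel:
    """Decode the three annotations packed into a UTKFace-style filename."""
    age_code, gender_code, race_code, _timestamp = name.split("_", 3)
    age = int(age_code)
    if not 0 <= age <= 116:
        raise ValueError(f"age {age} is outside the 0-116 range")
    # A code missing from the lookup tables raises KeyError: the scheme
    # has no way to record anything else.
    return FaceLabel(age, GENDERS[int(gender_code)], RACES[int(race_code)])


# Hypothetical filename, for illustration only.
print(parse_filename("25_1_2_20170116174525125.jpg.chip.jpg"))
# FaceLabel(age=25, gender='female', race='Asian')
```

Anything outside the fixed lookup tables simply raises an error: the format offers no way to record a person who does not fit one of the predefined boxes.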

It is unclear how this simplistic idea of race was determined. The taxonomy is nevertheless clearly problematic, recalling some of the racial categorization systems of the nineteenth and twentieth centuries. As the authors point out, UTKFace seems to parallel the so-called Book of Life used by the Apartheid regime in South Africa from around 1970. With these “identity passbooks,” the country’s entire population was categorized as either Black, White, Colored or Indian, and how people were identified had far-reaching consequences for their rights and freedoms. Crawford and Paglen also compare the dataset to the nineteenth-century context of social Darwinism and imperialism, when pseudo-sciences like physiognomy, phrenology and eugenics were at their height. As I will discuss in depth in Chapter 3, all of these disciplines sought to classify humans based on physical attributes under the pretext of determining the risk of deviant or criminal behavior. What they actually studied in practice, however, was “deviance from bourgeois ideals” (Crawford & Paglen 2019).

The database Diversity in Faces is no better than the ones discussed above. It was released by IBM in 2019 in response to criticism that the company’s existing facial recognition system had problems recognizing people with dark skin tones. The new dataset was meant to be more representative, constructed as a “computationally practical basis for ensuring fairness and accuracy in face recognition” (Crawford & Paglen 2019). In line with what appears to be the norm when constructing TIDs, the company harvested hundreds of thousands of images of unsuspecting people from the internet, especially from Flickr. What is special about this particular dataset, however, is the precise manner in which the developers decided to sort the images afterwards. Unsure whether including age, gender and skin color in the top categories was sufficient to ensure “fairness and accuracy” (qualities which are themselves not easily defined), IBM decided that even more classification seemed like a good strategy. Thus, understanding the problem in purely quantitative terms, IBM moved into what Paglen and Crawford refer to as “truly strange territory”: in addition to these three categories, it added skull shape and facial symmetry to the list.

By taking a mathematical approach to cognitive bias mitigation, IBM’s Diversity in Faces recalls yet another nineteenth-century pseudoscientific method. In this case, the taxonomy of the dataset can be seen as a direct parallel to craniometry, which was used until the early 1900s as a way of predicting intelligence based on skull shape and weight (Crawford & Paglen 2019). Reflecting on this, Paglen and Crawford state as follows:

Ultimately, beyond these deep methodological concerns, the concept and political history of diversity is being drained of its meaning and left to refer merely to expanded biological phenotyping. Diversity in this context just means a wider range of skull shapes and facial symmetries. For computer vision researchers, this may seem like a “mathematization of fairness” but it simply serves to improve the efficiency of surveillance systems (2019).

What the authors point to here is how programmers involved with facial recognition handle the concept of “fairness” without ever asking whether the problems they face might rather lie in the structure of the datasets or in the questions they are designed to answer. To the extent that they acknowledge the existence of any problems, they tend to use them as an excuse to harvest even more data. The lesson I take away from Excavating AI is thus that the biases in ImageNet Roulette cannot be dismissed simply as “glitches” or “bad programming.” Instead, they point to deep systemic issues.

While the TIDs presented above each have their individual quirks, they are all based on the shared assumption that personal qualities can somehow be identified visually in photographic portraits. In several cases, this results in taxonomies that share the racist assumptions which were prevalent in the age of physiognomy, a matter which I will discuss in depth in Chapter 3. While the technical issues of creating a taxonomy are certainly interesting, much deeper issues also arise in the discussion, such as what agendas lie behind the creation of AFR technology in the first place, and what gives these tech companies the right to be the judges of what is considered “fair.” Lastly, copyright and privacy concerns are also clearly relevant, in that the photos employed are scraped from the internet without either the consent or the knowledge of the people depicted.

Training Humans

If you thought that the story of ImageNet Roulette was over by now, you were wrong. As the many articles featuring the project will let you know, the app was created in concert with an exhibition that opened its doors at the Fondazione Prada in Milan around the same time, in September 2019. The exhibition was said to include some of the same images as those presented in Excavating AI,9 which created expectations that it would build on the lessons of the essay. As it had received very positive reviews (both Digicult (2019) and Mousse Magazine (2019) described it as a landmark exhibition), my expectations of Training Humans were naturally high. My curiosity was also piqued by Fondazione Prada’s website, which brazenly announced Training Humans as the first ever major photography exhibition devoted to training images. Furthermore, it was promised to “reveal the evolution of training image sets from the 1960s to today” by exploring “how humans are represented, interpreted and codified through training datasets, and how technological systems harvest, label and use this material” (2019).

The gallery’s promotional text posed two questions around which the exhibition was supposedly centered: “where are the boundaries between science, history, politics, prejudice and ideology in artificial intelligence? And who has the power to build and benefit from these systems?” (Fondazione Prada 2019). While these questions were discussed to some extent in Excavating AI, the exhibition was promoted in a way that suggested one might learn even more about the technicalities involved. A statement by Crawford reinforced this impression: “what we hope is that ‘Training Humans’ gives us at least a moment to start to look back at these systems, and understand, in a more forensic way, how they see and categorize us” (Fondazione Prada 2019). To what extent could such claims be substantiated? In the following, I will discuss this question, starting with a presentation of my own experience of the exhibition.

9 This is, amongst other places, pointed out in the list of acknowledgements in Excavating AI.

In February 2020, I found myself at the Osservatorio, Fondazione Prada's art gallery located in the exclusive Milanese shopping arcade Galleria Vittorio Emanuele II. To get there, visitors first had to enter Prada's flagship store and take a well-hidden elevator to the fifth floor. Past the reception, you would find yourself standing in an interior reminiscent of a spacious penthouse apartment. The walls were painted a calming shade of blue, the floors were covered in stately herringbone parquet, and large windows along the whole left side of the gallery let daylight fill the space. The gallery itself felt more inviting than its rather exclusive (in the true sense of the word) surroundings, but this feeling was countered by an eerie soundtrack. Repeating a series of voices reciting what sounded like randomly assembled sentences in a variety of English accents, this hidden source of sound sharpened one's senses. Looking further into the room, the first thing I noticed was a large black and white print of a highly pixelated fingerprint hanging from the ceiling in the middle of the room. Behind it, to the right, was a long wall covered with large numbers of photographs arranged in clusters, as well as several flickering TV screens, some mounted on the wall and others placed on the floor. Somewhat overwhelmed by the sheer number of images on display, I found the room rather confusing at first, even though the images did not physically take up much space.

Uncertain which dataset to approach first, I turned and looked at the introductory wall text behind me. It began by asserting that the relationship between humans, images and image-making technologies has changed dramatically during the last decade, as AI has become ever more present in everyday life. Besides a list of points already discussed in Excavating AI, the text also explained the general layout of the exhibition: it was organized chronologically, with the first (fifth) floor presenting the earliest TIDs from the 1960s onwards, while the newest datasets were presented on the second (sixth) floor. The text then concluded with the following rather leading statement, promising a certain build-up of intensity as the visitor moved along:

As the classification of humans by AI systems have become more and more invasive and complex, their biases and politics have become apparent. Within computer vision and AI systems, forms of measurement turn into moral judgements. Our images now look back at us. And we won’t always like what – or how – they see.
