
Chapter 2. Literature Review

2.1 The Data-Driven Life

2.1.3 Understanding Data

This thesis will argue that quantified data is a new means of mediating experience. To understand what this means, we first need to understand the constructed nature of data. As I’ll show below, there is considerable academic consensus from sociological perspectives on the necessarily constructed and subjective nature of data, but it’s worth unpacking this fully, in four parts. Having established this, we can recognise that what data means is not constant, neutral, or independent, but deeply situated. Data has to be interpreted, and applied in a context, to give it any meaning – for it to gain purchase as a representation of some reality.

A) Data is not raw

Firstly, ‘data’ is not simply out there in the world, a raw resource, waiting to be mined. On the cover of ‘Raw Data is an Oxymoron’, Gitelman (2013) suggests “we shouldn’t think of data as a natural resource but as a cultural one that needs to be generated, protected, and interpreted.” In boyd and Crawford’s (2012) take on Gitelman’s work:

“data needs to be imagined as data in the first instance…and this process of the imagination of data entails an interpretative base: every discipline and disciplinary institution has its own norms and standards for the imagination of data.” (p.667)

This makes clear that interpretation, and hence subjectivity, “is at the center of data analysis” (p.668), even from its very conception. The counting of steps depends upon a prior interpretation of what constitutes ‘a step’, and upon the assumption that steps effectively represent activity. Conveniently, and not coincidentally, steps are also easier to imagine and measure as a unit of activity. And so, they can become a dominant metric for measuring daily activity. To emphasise this inherent constructivism, Drucker (2011) argues that we should talk about capta – that which is captured – rather than data – that which is simply given. Rather than a mere reflection of reality or any ‘facts’ about the world, data is instead actively constructed.

More critically, beyond simply the imagination of data, authors have highlighted the politics of measurement (Beer, 2016; Pine and Liboiron, 2015) and the way data is constructed. Bowker and Star (2000) show historically how ‘sorting things out’ – any operation involving categorisation and classification to scaffold ‘information infrastructures’ (i.e. databases) – requires imaginative interpretation, and frequently admits bias and politics. Classification requires choices about categories: what counts, and what does not. Further, there are all kinds of ‘work’ people undertake to make these categories mutable or to fit better within them. As Bowker (2000) has argued, “the database will shape the world in its image”. In the context of healthcare, Pine and Liboiron (2015) show how such politics are rendered visible, or invisible, in the user interface of interactive systems. The implication is hence to question what kind of memory self-tracking tools will shape, and how this will manifest at the interface.

B) Data is reductive

As it models, and abstracts from, experience, data is essentially reductive. Steps become a proxy for ‘physical activity’, occluding many other movements which might otherwise count towards being physically active. Indeed, data is a powerful tool because of its reductiveness. Borges (1946/1996) illustrates the point in an absurd single-paragraph short story, “Del rigor en la ciencia” (or “On Exactitude in Science”), where to create a perfectly exact map of an empire the map must be on the same scale as the empire itself. This echoes an earlier short story by Lewis Carroll (1894), remarking on the uselessness of a map on the scale of ‘a mile to the mile’.

Sharon (2017) expresses concern about the ‘reductionist effect’ of viewing the world as data, where the scale is not properly acknowledged, or its fidelity and granularity are overstated:

“Self-tracking works on the basis of categories or indicators that act as proxies for what are commonly very messy and rich phenomena, from ‘mood’ to ‘health’ to ‘productivity’. When devices are described as giving users a ‘dashboard’ or a ‘perfect picture’ of their health, these data have a tendency to come to denote what health is.”

A central issue is that as rough edges are smoothed, outliers are removed and the rich messiness of everyday life is cleaned up, a model of reality is produced that is average, idealised or normative – “thereby pushing users to think about their own behaviors in accordance with predetermined standards and to conform to them in practice.” Lupton (2014b) demonstrates this viscerally in her analysis of a huge range of ‘Quantified Sex’ apps, which she argues reinforce gender stereotypes and “specific limited types of sexuality”.

This reductive nature can be understood further as the process of ‘commensuration’ described by Espeland and Stevens (1998):

“Commensuration can be understood as a system for discarding information and organizing what remains into new forms. In abstracting and reducing information, the link between what is represented and the empirical world is obscured and uncertainty is absorbed. Everyday experience, practical reasoning and empathetic identification become increasingly irrelevant bases for judgment as context is stripped away and relationships become more abstractly represented by numbers.”

Many of the contemporary concerns about ‘dataism’ are present in a careful reading of this description: that ‘uncertainty is absorbed’ and that empathy and everyday experience become ‘irrelevant bases for judgment’. Commensuration is the essence of quantification, and of the ability to represent and model reality numerically. Things that are essentially different are stripped down until they share some essential or defining characteristics, rendering them available for comparison, numerical manipulation and representation. Reducing my physical activity to steps allows me to count the number of steps I take each day, and compare them. Even though those steps were taken at different times, to different places, for different reasons, for the purposes of tracking physical activity they can be commensurate. The qualitative difference between my steps yesterday and today has been reduced, as it is deemed to be insignificant to the question of physical activity. And indeed, this may very well be the case.

In this light, it is clear that the construction of data is about choosing how to abstract, reduce and model phenomena in the world. So, the core questions then concern how people account for and interpret these reductive models of reality. When is data ‘too reductive’, and how is this made sense of? When or why does it matter that I know that my steps were taken on a trip to Paris, rather than a commute to work? To reiterate, data is necessarily reductive. The matter of interest lies in how real-world phenomena are reduced, and then interpreted.

C) Data displaces other ways of knowing

Bowker (2005) has described archives – be they data or other kinds of records – as ‘jussive’, exercising judgment, precisely because they exclude, or preclude, other ways of remembering:

“[B]y remembering all and only certain facts/discoveries/observations, [it] consistently and actively engages in the forgetting of other sets. This exclusionary principle is, I argue, the source of the archive’s jussive power.” (Bowker, 2005: 9)

Much the same could be said for data. As Rettberg (2014) notes, data often appears ‘beyond argument’ and presents an authoritative representation of the world. The purported precision of data makes it seem especially determinative and certain, particularly in the face of seemingly subjective argument or, as Wolf (2010) disparages, human intuition. Often, uncertainty is abstracted out; the constructed nature of data, and the numerous assumptions on which it turns, are left unacknowledged.

It displaces other ways of knowing by prioritising what can be easily or accurately categorised and counted. Especially as infrastructures are built on the basis of particular measures, data is institutionalised and becomes routine, and easy to hand. It can become the ‘objective’ and established basis for making decisions and other approaches can be marginalised. Sharon (2017) summarises sentiments that “As one’s trust in numbers grows, it is feared, one’s trust in subjective, embodied, and intuitive knowledge decreases.”

One novelist argues we risk losing “the sensory connection to our lives… all the raw materials of life, which by their very nature are disorganised” (Feiler, 2014). This fear might be felt most acutely in relation to parenthood. Numerous ‘Quantified Baby’ applications now exist (Kane, 2016), primarily pitched at easing the anxieties and burdens of new parents through monitoring of, for example, vital signs, sleeping and feeding. In short, according to one CEO: “You can look at your smartphone and know that everything is OK”. Gaunt et al. (2014) consider how to design such technologies to avoid an unhealthy dependence on numbers, perceived to hinder the development of a parental intuition.

By contrast, however, the most active and reflective users at Quantified Self meetups have described how self-tracking for them is not a window onto some inner truth, but instead a mirror which prompts reflection on the mundane and easily overlooked. The act of self-tracking itself – becoming attentive to something through the act of recording it – can be conveyed as a form of mindfulness (Sharon, 2017). At its most extreme, some have described heightening or training new senses – for example, learning one’s compass orientation (Stone, 2013), or how to better recognise different symptoms and their causes.

This body of scholarship demonstrates the constructed nature of data, which is not ‘raw’, which is necessarily reductive, and which can displace other ways of knowing. These critiques are essential to understanding data. However, Sharon’s work (2016; 2017) in particular offers more than criticism. Examining self-tracking practice, she suggests barbed terms such as ‘data fetishism’ can discount the nuance and mindfulness people can apply to their data, and a diversity of practice that extends beyond the pursuit of self-optimisation:

“…the relationship between so-called objective data and so-called subjective experience is hardly a zero-sum game for many QSers, but rather a tension and a negotiation that produces meaning, a process that QSers are often aware of partaking in.” (Sharon, 2017; p.114)

It is precisely this tension and negotiation that this thesis is interested in. boyd and Crawford (2012) highlight the importance of how the relationship between the ‘objective’ data and its ‘subjective’ context is maintained – “data out of context loses meaning”. Beyond the technical and epistemological limitations of data, what matters most is the terms on which data is interpreted and applied. In broader terms, what matters is how data is situated, and the stories it can plausibly tell.

D) Data is situated

The idea of understanding how data is situated is captured by Taylor et al.’s (2015) concept of ‘data-in-place’:

“Data, from this viewpoint, doesn’t by itself assert things in the world; rather, it helps to surface, assemble, cement and (at times) unravel forms of knowing, ideas, controversies, and so on. Also, it combines with and is entangled in wider forms of life, not always simplifying and narrowing in on the facts, but often further complicating what is at stake and introducing new and different forms of trouble.” (Taylor et al., 2015; p.2863)

In the context of street-level data collection on issues such as traffic, air quality and noise levels, Taylor et al. (2015) bring focus to how data of one kind or another actually comes to matter to those collecting it and living with it. Drawing from Wilson’s accounts of street-level data tracking (Wilson, 2011), they show how data can gain legitimacy and be put to work, for example in petitioning the council or raising awareness about an issue. Crucial here is a gentle push against the implication (from critics and technologists) that data, by its own technological clout and sophistication, is self-explanatory and

feature of data – not all data matters – instead, it is intimately tied to the social-geography of the community. As such, there are structures and boundaries to the data that is collected and how it is shared. Mattering has a dual meaning: both the data itself that matters; and how and when it is materialised, brought up, and made visible.

In this respect, data-in-place suggests “a reconceptualisation of data, one that accounts for the ways in which it is contingent on very particular circumstances.” In so doing, it does not “presume an intrinsic generality”, which is neutral, objective and abstract from its context. Instead, it “acknowledges precisely its place in and amongst other worldly things.” In such a way, Taylor et al. (2015) suggest their approach (like this thesis) is about small and particular rather than big and general data.

‘Data-in-place’ therefore invites analysis of the social process of data.

Ethnomethodological approaches (Crabtree and Mortier, 2015; Tolmie et al., 2016) are acutely attentive to the interactional qualities of this social process, emphasising the potential of data as a boundary object, and the ‘collaborative work’ that goes into making accounts of it. Drawing from Star and Griesemer (1989), this “turns upon ‘a mutual modus operandi’ involving ‘communications’ and ‘translations’ that order the ‘flow’ of information through ‘networks’ of participants. This, in turn, creates an ‘ecology’ of collaboration in which data interaction becomes stable.” (p.338)

This work concerns how data ‘coheres’ as a boundary object across settings. How do you and I agree upon and interpret data in the same way, so that it is a reliable mediator? Or, returning to Taylor’s terms: how does data settle in place, becoming stable in a particular context – in its meaning, interpretation and use? For example, Fiore-Gartland and Neff (2015) report that consumer health and self-tracking data is mostly unstable and fails as a boundary object in many interactions between doctors and patients. The meaning, interpretation and the resulting action are not at all stable. Instead, they argue that the doctors and patients have different data valences, which may rarely overlap. A clinician is looking for data that offers a definitive diagnosis, while a patient may see their data as a way of evidencing symptoms or a mode of self-discovery.

Turning specifically to self-tracking, this thesis asks broadly how quantified data comes to matter to people, particularly over the longer term. How does data gain purchase on reality and the way people can account for the past in their lives? What is the context in which a quantified past “helps to surface, assemble, cement and (at times) unravel forms of knowing, ideas, controversies” about the past? (Taylor et al., 2015; p.2863)

These are questions I will return to. However, from this more nuanced understanding of data, its construction and situated nature, we can now reflect on extant work, especially in HCI, on the design of self-tracking technologies.

In document A quantified past : fieldwork and design for remembering a data-driven life (Page 37-44)