The Ethics of Social Media Data. Huffer, Wood and Graham. Internet Archaeol. 52.

We begin with the living, which makes anonymisation the first ethical dilemma presented by our research. We are scraping tens of thousands of posts from Instagram and other social media platforms, but the fact that somebody has put material online in what could be construed as a public forum does not necessarily mean that they agree to have their material, their identities and their posts studied and pulled together into a dataset. Nor does the fact that one can scrape this material mean that it is appropriate to share usernames, for instance. Although the Instagram Terms of Service specify that users 'own' the content they upload, the act of uploading grants Instagram a worldwide right that makes it inevitable that material will spread and be collected into third-party databases. That said, an ability to 'eavesdrop' or gain access to 'big' datasets does not necessarily translate into informed consent to use the information obtained (Richardson 2018; Kirkegaard and Bjerrekaer 2016).

The question becomes whether a person has a reasonable expectation of privacy while posting publicly. If they are posting, for instance, things for sale, or are inviting others to admire whatever is depicted, then it seems reasonable to conclude that they have waived that right to privacy. It follows that all other materials collected as a by-product of the search do retain a right to privacy and should be eliminated from the dataset. Of course, being 'posted for others to admire' is the entire raison d'être of Instagram, just as being posted for others to admire or purchase is the purpose of commerce-orientated Facebook groups and Facebook Marketplace. As the Cambridge Analytica scandal so graphically reminds us (Kozlowska 2018), substantial amounts of data relating to an individual collector's activity patterns on Facebook might now also be stored, inadvertently, by Cambridge Analytica. (Even the so-called 'Dark Web' and transactions completed using 'untraceable' cryptocurrencies like Bitcoin leave sufficient digital traces for traders to be unmasked; see Paul 2018.)

Nevertheless, that does not mean that we have the right to re-post or identify individuals, or to facilitate ways in which those identities could become known. In any event, it seems impossible to provide true anonymisation any more. The advertising ecology and its attendant arms race track our every movement: a New York Times investigation found that the investigators could purchase data from data brokers (whose tracking code is built into myriad smartphone apps) and de-anonymise it with ease. Movements could be tracked at a time resolution of once every 21 minutes, on average (Valentino-DeVries et al. 2018). For the archaeologist interested in topics at the intersection of archaeology and social media, one way to move research forward ethically might be to discuss and reflect only on aggregate patterns, where the original data were posted with an expectation of being public, and not to share the raw data collected in the course of the study.
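As an illustration of what reporting only aggregate patterns might look like in practice, the following is a minimal sketch and not the procedure used in our study. It assumes each scraped post is a dictionary with hypothetical `account` and `hashtags` fields, and it reports how many distinct accounts used each hashtag while suppressing hashtags used by only a few accounts. The function name `aggregate_hashtags` and the threshold `min_accounts=5` are assumptions chosen for illustration, not established standards.

```python
from collections import defaultdict


def aggregate_hashtags(posts, min_accounts=5):
    """Count distinct accounts per hashtag, dropping rarely used hashtags.

    posts: iterable of dicts with hypothetical keys "account" and "hashtags".
    min_accounts: illustrative suppression threshold (an assumption).
    """
    accounts_by_tag = defaultdict(set)
    for post in posts:
        for tag in post["hashtags"]:
            # Lowercase so that "#Skull" and "#skull" are counted together.
            accounts_by_tag[tag.lower()].add(post["account"])

    # Only the counts are returned; account identifiers never leave the function.
    return {
        tag: len(accounts)
        for tag, accounts in accounts_by_tag.items()
        if len(accounts) >= min_accounts
    }
```

Suppressing small counts does not guarantee anonymity, but it avoids publishing a figure that can only describe one or two people.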

In our earlier study, we noted that despite our attempts to anonymise our data, elements of the posts, especially hashtags, can provide a unique fingerprint and a lever for tracking an individual across platforms. We surmised that the stability of the hashtag 'trains' (sequences of hashtags) used by individuals was a result of using the Instagram smartphone app, probably with autocorrect or a text expander that inserts the desired sequence of hashtags. The consequence is that the same pattern of hashtags appears each time, and a researcher can use that pattern to 'de-anonymise' a person. In that case, if we have made a dataset available but subsequently discover errors or elements that can be used to de-anonymise, should we delete it? People know that comments can be trawled, observed or read (indeed, we have seen posts in various Facebook groups devoted to buying and selling human remains that shared our original article discussing this very point), so it is likely that the image itself will contain information relevant to other collectors or dealers 'in the know'. This raises another ethical point: given the morally grey nature and dubious legality of this trade, should we be publishing in open access venues where participants might see our work? Perhaps not. Open access publishing is itself laden with questions of privilege and power (e.g. Tennant et al. 2016; De Castro and Salinetti 2004; Chesler 2013).
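To make the fingerprinting risk from hashtag 'trains' concrete, the sketch below shows one way a researcher might check a pseudonymised dataset before release. It is an illustration under stated assumptions and not the method of our earlier study: posts are taken to be `(pseudonym, caption)` pairs, and the names `hashtag_train` and `risky_trains`, along with the thresholds `min_length` and `min_repeats`, are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")


def hashtag_train(caption):
    """Return the ordered sequence of hashtags in a caption, lowercased."""
    return tuple(tag.lower() for tag in HASHTAG.findall(caption))


def risky_trains(posts, min_length=3, min_repeats=2):
    """Find hashtag trains reused by exactly one pseudonymous account.

    posts: iterable of (pseudonym, caption) pairs (an assumed layout).
    A train that one account repeats and nobody else uses acts as a
    fingerprint, even after the username has been replaced.
    """
    # train -> pseudonym -> number of posts carrying that exact train
    counts = defaultdict(lambda: defaultdict(int))
    for pseudonym, caption in posts:
        train = hashtag_train(caption)
        if len(train) >= min_length:
            counts[train][pseudonym] += 1

    risky = {}
    for train, by_account in counts.items():
        if len(by_account) == 1:
            pseudonym, n = next(iter(by_account.items()))
            if n >= min_repeats:
                risky[train] = pseudonym
    return risky
```

Any train flagged in this way would need to be removed, truncated or shuffled before the data could be shared, and even then other features of a post may still identify its author.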

Ultimately, we wish to avoid the danger that the publication of new research drives the trade further underground or into 'corners' of the internet not yet being actively investigated, or further muddies the waters between what is and is not licit.

Internet Archaeology is an open access journal based in the Department of Archaeology, University of York. Except where otherwise noted, content from this work may be used under the terms of the Creative Commons Attribution 3.0 (CC BY) Unported licence, which permits unrestricted use, distribution, and reproduction in any medium, provided that attribution to the author(s), the title of the work, the Internet Archaeology journal and the relevant URL/DOI are given.


