The Responsible Data newsletter: regular, curated links and updates from the responsible data community.

Open access, gone wrong

Vox reported this week that a group of Danish researchers scraped data on 70,000 users of the online dating site OKCupid, used it for their research and then uploaded it to the Open Science Framework. This is incredibly personal data, and at no point did OKCupid or any of its users consent to their data being used like this.

The researchers’ excuse? “The data was already public.” Answering critics on Twitter, the main researcher behind it reveals an astonishing lack of understanding of what they’ve actually done, and of the consequences. Issues like different levels of visibility and accessibility, the fact that people consented to giving their data for one particular purpose, and the consequences that might arise from uploading such personal content to the web all seem to have totally escaped him. He actually asked fellow Twitter users to “leave him out of ethics discussions” (because gosh, why would ethics be relevant here?) - and seems to think that using pseudonyms is enough to “anonymise” the data.
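For the technically curious, here’s a rough sketch of why swapping names for pseudonyms falls so far short of anonymisation: the quasi-identifiers left behind (age, location, occupation and so on) can be joined against other publicly available data to put names back on rows. Everything below - names, fields, values - is invented purely for illustration.

    # Toy example: "pseudonymised" research data still carries quasi-identifiers.
    released = [
        {"pseudonym": "user_1042", "age": 29, "city": "Aarhus", "occupation": "teacher"},
        {"pseudonym": "user_2177", "age": 34, "city": "Odense", "occupation": "nurse"},
    ]

    # Auxiliary data available publicly elsewhere (e.g. another profile site).
    public_profiles = [
        {"name": "Jane Doe", "age": 29, "city": "Aarhus", "occupation": "teacher"},
    ]

    def reidentify(released, public_profiles):
        """Link pseudonymised rows to named profiles via shared quasi-identifiers."""
        return [
            (row["pseudonym"], profile["name"])
            for row in released
            for profile in public_profiles
            if all(row[k] == profile[k] for k in ("age", "city", "occupation"))
        ]

    print(reidentify(released, public_profiles))
    # -> [('user_1042', 'Jane Doe')]: the pseudonym is undone by a simple join.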

I’m not linking to his tweets here because a small (yes, perhaps naive) part of me hopes that he sees sense, deletes his tweets, and issues a heartfelt apology, perhaps to look back in years to come and rue his thoughtlessness.

Above all, the lesson here is the importance of teaching ethics in computer and data science courses - like Anna Lauren’s Data and Ethics course. For a great summary of pedagogical approaches to data ethics, check out this report from the Council for Big Data, Ethics, and Society. Kate Crawford and Jacob Metcalf recently published a paper which touches on some of these issues, too: “Where are Human Subjects in Big Data Research?”

Transparency is a two-sided coin

As above, when we reveal information - even with the best intentions, wanting to be transparent or more open in what we do - this can sometimes be used in unexpected ways. This week, ProPublica revealed that their Prescriber Checkup database, which outlines the prescribing habits of hundreds of thousands of doctors across the United States, has been used in a way they hadn’t intended: to find doctors who are prescribing “widely abused drugs, many of them opioids” - which are addictive.

In their thoughtful post on the topic, they rightly highlight that this isn’t a new problem: misusing journalistic work for malicious intent has been happening for decades. But what I appreciate most about this case is the way they’ve dealt with it: 1) they weren’t required to tell anyone that the data was being used counter to their original intentions, and 2) they only noticed it through the internal analytics they see from database queries. They’ve added warning labels to the public-facing database, and been overwhelmingly open about the challenges they’re facing, saying they “believe it is responsible to discuss”. Refreshingly responsible 💚.

Trust us, we're professionals

On the rare occasions that we actually do read the Terms of Service of an app, we make one important assumption: that they’re telling the truth. Sadly, it turns out that this isn’t always the case. The Norwegian Consumer Council have an ongoing campaign based on a research report, Appfail, which analysed the terms of 20 mobile apps, looking for potential threats to consumer protection.

As part of that campaign, this week they revealed that the fitness app Runkeeper “tracks users and transmits personal data to a third party even when the app or handset is not in use”, and have subsequently lodged a complaint with the Data Protection Authority. Excuse me a second while I delete the app from my phone.

Thanks to Elinor Carmi for pointing me to the Appfail report recently!

It's conference season

...at least here in Berlin, it is. I’ve been keeping an eye out for talks with a responsible data focus to share, and have come up with these two, both from people I admire hugely. Kate Crawford keynoted re:publica last week with this talk on bias and discrimination in machine learning, big data and predictive analytics, packed with real-life examples as well as some really thoughtful analysis of what this could mean in the future.

At the same conference, long-time responsible data ally Mushon Zer-Aviv turned a post he wrote in preparation for the Responsible Data Visualisation event in January - “if everything is a network, nothing is a network” - into this great 25-minute talk. Even if you’ve read the post (do it), I really recommend the talk, too.



GIF copied from Mushon's post - among his many talents, he's great at finding them.

Relatedly (and not to toot my own horn), I had the pleasure of giving a talk at CSVConf on bridging the gap between technology and activism, with lots of responsible data-themed examples. While preparing for it, I came to the conclusion that lots of the responsible data work we do is far less about knowing the right answers - and more about asking the right questions.

Save the map

Indian activists have yet another fight on their hands, thanks to a new bill which threatens to regulate geospatial data. As Scroll.in explains, under the new draft Geospatial Information Regulation Bill 2016, anyone who depicts India in a map that “does not match the government’s official approved version, could be fined between Rs 10 lakh and Rs 100 crore, and/or sentenced to seven years in prison.”

Writing in Medianama, Nikhil Pahwa goes into more detail about just how ridiculous this idea is. In its current iteration, it would affect a huge range of people and services - i.e. anyone who plots the location of anything at all, whether that’s people (like sending your location to a friend via WhatsApp), taxis (as with Uber), or even anybody taking digital photographs which contain location metadata.
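To see why even holiday snaps get swept up: most smartphone cameras embed GPS coordinates in a photo’s EXIF metadata, and reading them back takes only a few lines. Here’s a minimal sketch using the Pillow library in Python - the filename is made up, and not every photo will actually carry GPS tags.

    from PIL import Image
    from PIL.ExifTags import GPSTAGS

    def gps_from_photo(path):
        """Return any GPS tags embedded in a JPEG's EXIF metadata."""
        exif = Image.open(path)._getexif() or {}   # legacy EXIF helper; None if no EXIF
        gps_raw = exif.get(34853, {})              # 34853 is the GPSInfo EXIF tag
        return {GPSTAGS.get(tag, tag): value for tag, value in gps_raw.items()}

    print(gps_from_photo("holiday.jpg"))
    # e.g. {'GPSLatitudeRef': 'N', 'GPSLatitude': (52.0, 31.0, 12.3), ...}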

It’s not the first time that the Indian government have been sensitive about their borders, either - back in 2010, they got into a fight with the Economist about the way Kashmir was depicted on a map. Harking back to the recent Save the Internet campaign, activists have now set up the Save the Map campaign - http://savethemap.in/ - to encourage citizens to write to the Indian government and protest against the law.

Responsible data food for thought

It was recently revealed that Google’s artificial intelligence company, DeepMind, has a data-sharing agreement with the UK’s National Health Service (NHS). For me, this reveals at least one thing that isn’t a great surprise: internally, the NHS don’t have the technical capacity to deal with all that data. The main controversy centres around the fact that the British public were unaware of the level of access granted to Google through this agreement - historical data as well as live data - coupled with a lack of clarity and transparency around what the aims of this partnership actually are.

Predictive analytics around health can open up a lot of tricky responsible data issues: so, what’s the best way of responsibly learning from the massive amounts of data that the NHS holds? Being optimistic, there is potential to provide better health services to the British public - but it’s unrealistic to expect the NHS to develop an advanced AI department internally. What needs to be done for these developments to happen responsibly?

Community updates

  • If you’re in India, you have until June 4th, 2016, to let the government know how ridiculous that Geospatial Information bill is. Send them an email.

  • Under a slightly different label, the newly revamped Digital Impact site has a lot of useful resources to help with thinking through more responsible uses of data. It’s focused largely on the use case of non-profits in the US, but there are some cross-cutting tools, too.

  • The Humanitarian Tech Festival 2016 is coming up soon on June 4th, and there are still a limited number of travel stipends available for those in need.

  • The White House recently released a new report on ‘the intersection of big data and civil rights’.

  • If you work with or in human rights, we’d love to hear about how you’re using data in your work: we have a very short (4 question) survey open at the moment: https://engn.it/hrsurvey (and in Spanish here). The answers from the survey will inform an upcoming writing sprint, during which we’ll be trying to create useful resources to help human rights defenders engage with new and emerging data streams. More on that soon!

As always: feedback, suggestions or links for the next newsletter are very much welcomed.
- Zara and the engine room team
This work is licensed under a Creative Commons Attribution 4.0 International License.
