PhD Progress & Thoughts

Using Online Data From a Third-Party Source: The Ethics of Security Breaches

Research Whisperer posted on Twitter earlier this year with the question, ‘would you use data provided through a security breach, made public on the internet?’

My simple answer was no, and I’ll briefly delve into why.

My research deals with real individuals who are held to exacting ethical standards. Every aspect of their work is subject to ethical and moral scrutiny. For many tasks they need to submit forms in triplicate to justify their decisions, just so there’s a clear paper trail in case something goes wrong. To reflect that culture, and because revealing a participant’s identity could seriously risk that participant’s career, I voluntarily subject myself to the same ethical screening process as everyone else. I applied for a Health Research clearance even though my research isn’t Health Research; I approach participants with a higher level of anonymity and confidentiality than is usually expected of someone in my field, and I go to extreme lengths to preserve that trust. I go hours out of my way to ensure that participants are respected, safe, and comfortable with the project before proceeding. This process has taken months – even years – of reading, hand-shaking, listening and preparation to facilitate. It’s not a task I take lightly.

And yet, certainly in this highly-connected world, there are individuals who do not see data the same way I do.

Conducting humanistic research, and research involving people, is highly charged. People volunteer their thoughts, feelings, and ideas to researchers and certain outlets under the assumption that those perspectives are secure and protected. So what happens when there’s a data breach and those confidential perspectives are leaked to the wider internet? Or when those perspectives are mined for data and sold for a price?

I say it’s highly unethical. Were I a journal reviewer or editor, I would immediately reject a paper with a questionable source of data, online or not.
I believe that the researcher cannot – and should not – be separated from the data collection process. They should be able to clearly state how and where they collected their data for analysis. If the ‘how’ and ‘where’ are sketchy and there isn’t a VERY good reason for that (such as research on drug cartels, where it’s not safe for the researcher to obtain the data through conventional channels), I’d give it an instant ‘No’.

If a researcher openly admits, or dances vaguely around the fact, that data was mined or paid for as a result of a security breach, then I’d have serious questions about the integrity of the research team – and the university at large – for allowing this research to proceed. Openly-published data is one thing – scraping public internet comments off a news site is one example – but data that was never meant to be in the hands of the public is entirely another. The participants never provided consent, they’re not aware of how their responses will be used, and in many cases this can’t be excused as a simple ‘double-blind’ treatment of the facts. It starts to get very close to blackmail and coercion.

Don’t believe me? Take a look at how these data breaches occur, and then tell me that a data breach couldn’t be orchestrated by a research department with enough money that wants access to information it cannot legally obtain. Think of a government site containing health information of private citizens, where the individuals trusted the site to collect their information for medical use only. A thorough researcher could retrieve that information ethically through various Freedom of Information or hospital database channels; growing impatient, or overconfident in the access that money and hacking can buy, strips the project of its integrity by resorting to unconscionable means. The official channels, in most cases, serve an important purpose. Accountability. Traceability of where the data is going, and to whom. And, perhaps most importantly, before data is handed over it’s usually scrubbed of identifying information first. Handled by a professional who knows how to protect patient identity. Not a random techie who dumps gigabytes of raw information onto a drive and leaves it for anyone to find. Properly accessed data is fit for purpose: it answers the specific request, and no extraneous information is included. If something goes wrong and the data is misused, there’s a name on a form detailing who is accountable for that error.

I take research ethics very seriously. After a few brushes with academics who were willing to openly participate in academic misconduct, I now scrutinize EVERY detail of where I’m getting my facts from, how I intend to use those facts, and whether my results are for the good of the stakeholders.

See, here’s the thing, everyone: if I conduct research, and my findings are going to hurt people, I simply don’t publish them. I scrap the project. The time and effort invested are not as important as the livelihoods, careers, and well-being of people. Real people.

Do I have the discretion to hide my data and results if I know that the publication of such results could have serious adverse impacts on my research population? Actually, yes. The considerate use of data goes beyond ensuring that it’s collected ethically – you have to use it ethically, too. Which is why I don’t trust studies based on breached and stolen data. With a handful of uncommon exceptions, stolen data is not accurate data. Stolen data is not the best data for research, and it certainly wasn’t collected in close design with your research project. Using stolen data means that the team retroactively changed the study aims and parameters to fit what they found, implying that the analysis could be skewed for their convenience as well. Or, perhaps worse, the research team designed the study anticipating the receipt of hacked information.

The conduct of those who are willing to engage with, and utilize, stolen information might well point ethics committees and academic conduct officers toward more unscrupulous practices in the department. As for me, I’m going to continue carefully curating my methods and sample so as to collect accurate, respectful information that can be consented to (within reason) for the benefit of the community. That way, no matter what I find, I know that I conducted myself in good faith with stakeholders.

And I’m never left wondering who, exactly, those stakeholders are.

