This piece was originally published on Medium.
This past weekend, years of advocacy and reporting from Carole Cadwalladr, Paul-Olivier Dehaye, and Max Schrems finally got their due on the front pages of the New York Times, the Guardian, and then, well, everywhere else. Without going into the minutiae, Cambridge Analytica, a voter manipulation firm, harvested the records of 50 million mostly US voters through a quasi-academic organization called Global Science Research. The theory is that they used that data and Facebook’s micro-targeting services to stoke division and, ultimately, elect Donald Trump, among others.
The reporting has painted this as a breach, though there’s argument over what exactly was breached, and commentators are highlighting a wide range of legal, ethical, and regulatory theories to seek accountability. While we brace for a lot of hearings and performative discipline, it’s important to recognize a bigger issue: technology platforms and data collectors share huge amounts of sensitive data with researchers with very little review or enforcement, all the time.
Cambridge Analytica knows that. In fact, in a Channel 4 documentary, their CEO offered to pose as an academic to get access to data. Sharing data with researchers is a feature, not a bug — but the failure of organizations that share data to build any kind of ethical review or enforcement infrastructure is proving to be a big problem.
The practice of sharing data “for good” has become so common that it’s launched an entire branch of charitable giving, called “Data Philanthropy.” Some of the world’s largest companies and governments — not to mention technology platforms — set up foundations explicitly for the purpose of collecting and sharing huge data sets. Unfortunately, like many other forms of philanthropy, data philanthropists don’t do a very good job of defining “for good.” Even where they do, they rarely build the governance and enforcement processes to ensure that data sharing achieves its intended purpose.
Even scarier, when they do share data, they rarely impose meaningful limitations on use, which is what opens the door to abuse. Most data philanthropists do what Facebook eventually did — rely on template contracts and researchers’ good will to protect incredibly sensitive data on hundreds of millions of people. They rarely develop any way to monitor what researchers actually do with shared data or take any practical steps to verify or enforce the conditions they impose. And those, unfortunately, are the organizations who sincerely mean well.
There are also some “big data for social good” initiatives with lots of internal, opaque processes that systematically violate the privacy of billions of people at a time — and then share the results with some of the world’s most dangerous governments. As danah boyd warned in 2016, Virginia Eubanks explored in Automating Inequality, and Safiya Umoja Noble underlined in Algorithms of Oppression — it’s often the people trying to do the most good who cause the largest harms. Facebook didn’t take this process or this problem seriously, opting for a liability fig leaf and the absurd hope that it would never go wrong.
But it does go wrong, so often that it’s hard to keep up. And we can’t keep pretending that punishment is a substitute for preventing predictable abuse in the first place.
As I and others have written, big data research lacks the processes or institutions to ethically evaluate access to data or its use. Perhaps most concerning, big data and algorithmic research rarely distinguishes experimentation from live application — meaning that untested approaches are commonly put into practice, long before we have any idea what their effects are. Worse, there’s a growing body of literature proving that the people who suffer the most are those who are already vulnerable. Big data research now is where medical human subjects research was at the end of the 19th century. For context, that’s when Bayer sold heroin to children.
But naming a problem isn’t the same thing as pointing to a way forward. And while there may be many, there’s already been quite a bit of excellent work, in- and outside of technology platform companies, showing us the way.
The way forward is independent, fiduciary governance of third-party data sharing between platforms and researchers. Here’s what that means, and might look like, in practice:
Let’s start with governance. At a recent conference at Santa Clara University, a number of the largest and most powerful technology companies presented the internal mechanisms they’ve built for a similarly complex problem: content moderation and safety. Namely, how do platforms decide what gets published and who gets banned? And while there are a lot of nuances, what emerged was that almost every company has developed an internal governance mechanism — they have teams that set rules like legislatures, execute those rules (or outsource execution) like executive branches, and settle disputes when users raise them, like courts. The biggest companies, like Facebook and Google, have huge internal teams — and smaller companies, like Reddit and Wikimedia, empower users to take action themselves with clear governance processes. That’s not to say that any of these mechanisms is working perfectly — or that these approaches don’t create their own problems. It is to say, though, that platforms are already building purpose-built governance systems to handle these nuanced and complicated challenges.
But if we still have so many problems, how could these systems be working?
Firstly, they’re working a lot better than you think. The amount of horrific content — from murders to child pornography — that platforms already filter is staggering. Some companies employ as many as 10,000 people to sort through it — and we should all be grateful for them, and worried about their mental health and well-being.
Secondly, these systems are still wholly managed inside of companies. So, if limiting data access or taking legal responsibility for the use of shared data imposes cost or reduces profitability, then it can be hard to balance with the legal duty to shareholder profit. Even where well-intentioned — and many platform companies are well-intentioned — they’re not independent, which means they’re hard to believe. Many technology companies underestimate how hard trust is to earn — and that no matter what your intentions, if you don’t invite transparency, accountability, and agency into your systems, earning it is almost impossible.
The third, and most important, answer is that we still need them to do better. As Harvard’s Jonathan Zittrain and Yale’s Jack Balkin suggest, it’s time to treat data collectors and technology platforms as the fiduciaries that they are. Legally, a fiduciary is someone who has a duty to protect the best interest of a person or client, in systems that have large power asymmetries. The most obvious examples are doctors and lawyers, but you may have encountered fiduciaries as insurance brokers, financial advisors, or real estate agents. Zittrain puts it even more simply: platforms should have to “be who they say they are.” In some legal systems, fiduciary duties can be imposed constructively — meaning that you can prove that someone had a duty to you, even if there’s no explicit law forcing them to protect your best interests.
There are three practical problems with regulatory and constructive approaches to fiduciary duty: (1) the specific rules, where they exist, vary by jurisdiction — and data collectors and platforms operate everywhere; (2) creating constructive fiduciary duties only happens after someone has violated one, meaning it makes the punishment more severe, but doesn’t prevent the offense in the first place; and (3) just about everyone is a data collector in 2018 — meaning that choosing one type of organization to regulate (i.e. social platforms) will end up being arbitrary and likely just shift risky behavior to other companies or sectors, like academia.
There is an easier way forward: Data Review Boards. Data Review Boards are easiest to build using legal trusts. Trusts are a legal tool that has been used to protect and govern shared resources for centuries. Data trusts are already becoming a prevalent way for governments, industry, civil society, and communities to protect and share data amongst themselves. Civic data trusts add a layer of governance, standardizing processes and helping data collectors, controllers, users, and subjects balance their interests. Just as importantly, civic data trusts are privately written contracts — meaning they can be written at the speed of industry, used globally, adopted voluntarily, built right now, and made inclusive of a wide range of actors.
Data Review Boards, here, are an application of civic data trusts. They’re intended to be independent governance bodies, modeled on the in-house systems that platform companies build for content moderation, applied to data sharing decisions. Like academia’s Institutional Review Boards (IRBs), Data Review Boards would be independent, fiduciary governance structures that review, monitor, and enforce the ways that collectors share data with third parties. Importantly, and unlike IRBs or content moderation and safety teams, Data Review Boards would need to create leverage for data subjects, ideally giving legal ownership of data to platform users (like most platforms do now). Then, they could broker conditional licenses to third-party users — like academic researchers — based on an ethical review that includes both industry and data subjects.
Importantly, Data Review Boards could charge for doing this as a service, monitor data use, and enforce conditions on researchers. And, lastly, because trusts are legal infrastructure that already exist — they wouldn’t need to go through the intensive, and sometimes subjective, process of designing regulation. When framed as a service, Data Review Boards also become certifications — as has happened with any number of standards bodies — and create fiduciary duties to data subjects.
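To make that workflow concrete, here is a toy sketch in Python of the review-license-monitor-enforce loop described above. Every name in it is hypothetical and invented for illustration (no real Data Review Board software exists); it simply shows how conditional licensing and revocation might be encoded as a process:

```python
from dataclasses import dataclass, field


@dataclass
class DataRequest:
    """A researcher's request to access certain data fields."""
    researcher: str
    purpose: str
    fields_requested: set


@dataclass
class License:
    """A conditional license brokered by the board."""
    researcher: str
    allowed_fields: set
    conditions: list
    revoked: bool = False


class DataReviewBoard:
    """Toy model of an independent body that reviews requests,
    issues conditional licenses, and revokes them on violation."""

    def __init__(self, permitted_fields):
        # Fields the board's ethical review has deemed shareable.
        self.permitted_fields = permitted_fields
        self.licenses = []

    def review(self, request):
        # Grant only the intersection of what was asked for
        # and what the ethical review permits.
        granted = request.fields_requested & self.permitted_fields
        if not granted:
            return None  # request denied outright
        lic = License(
            researcher=request.researcher,
            allowed_fields=granted,
            conditions=["no re-sharing", "delete after study"],
        )
        self.licenses.append(lic)
        return lic

    def enforce(self, lic, violation_found):
        # Monitoring step: revoke the license if a condition
        # was violated, rather than punishing after the fact.
        if violation_found:
            lic.revoked = True
```

In this sketch, a request for `{"age", "friends_graph"}` against a board that only permits `{"age", "region"}` would come back licensed for `{"age"}` alone, with re-sharing and retention conditions attached, and the board retains the power to revoke on violation.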
In this case, Facebook could have hired or built a Data Review Board to make decisions about data sharing with researchers, while creating real legal accountability and enforcement. That’s not to say that Data Review Boards would have easily prevented this — data sharing “for good” will be difficult to define for years to come — but that’s all the more reason to be building governance now. Those decisions shouldn’t be made inside single companies or governments. Active, independent, fiduciary governance is how we’ll be able to engage and make decisions together about our digital norms.
Research is meant to drive progress — and here, the abuse of the goodwill that Facebook afforded to researchers may well drive quite a bit of it. But if we don’t seize this momentum to build governance mechanisms into data systems, then we’ll keep needing tireless reporters, outraged politicians, and massively inefficient regulatory infrastructure to have any hope for ethical data sharing. Instead of outrage, let’s build public interest governance and services.
Data Review Boards won’t fix everything — but they can fix a lot of things. If Facebook and Cambridge Analytica have taught us anything, it’s that there’s plenty of work to do.
Image by Stijn te Strake via Unsplash (CC BY 2.0)