11 January 2016

What's Wrong with this Picture?

The Ethics Gap in Data Visualization

What do ethics have to do with data visualization? Over the years, researchers and lawyers have developed rules and best practices to guide the proper collection and use of data, with particular attention to human subject research. Questions about data collection go to the heart of what constitutes ethical research: Did the subjects give informed consent for how their personal data would be used? Does using, collecting, or publishing this data put anyone at risk? Is the data appropriately protected or anonymized? The rules continue to evolve and are not without gray areas and open questions. Many universities have review processes to provide guidance and to make sure the critical ethical questions are raised. In fact, U.S. law requires these ethical questions and review processes at research institutions that receive federal funding.

In contrast, ethical discussion and guidelines around data visualization, that rambunctious cousin of data, are less established. On January 15, 2016, organizers at the Responsible Data Forum will host a workshop that brings together artists, activists, academics, and practitioners to draw up recommendations on ethics in data visualization and distill a set of best practices.

Over the past six months, the need for clear guidelines around data visualization has repeatedly surfaced as colleagues and I at New York University have explored the intersection of data visualization and human rights. Our project is an innovative, cross-disciplinary collaboration between the Center for Human Rights and Global Justice and the Tandon School of Engineering, with the goal of developing and testing data visualization tools, insights, and principles that can strengthen human rights advocacy.

The field of human rights is about making change: moving from an unacceptable state of atrocity and abuse to one of humane treatment, justice, and dignity. To make that change, one must convince others of the nature of the problem and its solution, which makes persuasion key. An initial research project by the NYU team found that data presented in charts is more persuasive than data presented in tables when viewers do not have strong opinions on the subject. A 2014 study by Cornell researchers in the field of advertising found that even trivial graphs and formulas increase the persuasiveness of advertisements and consumers' belief in a product's efficacy. Understanding how to persuade and bring others on side could have profound ramifications for human rights advocates, but it carries the responsibility to use this knowledge with care and with attention to its potential ethical pitfalls.

One of these potential pitfalls is deception. Our research group also tested a number of common visualization techniques that may, sometimes unintentionally, distort data. These techniques, some of them found in the publications of well-respected human rights organizations, turned out to be dangerously deceptive. Techniques such as a truncated axis (where the y-axis does not start at zero) or using area to represent quantity (for instance, comparing the sizes of two adjacent circles) led viewers to inaccurate conclusions. Given how easily some visual devices can distort an underlying message, graphics producers have an ethical responsibility to pause and consider carefully how to design graphics that lead to accurate, faithful interpretations of the data and of the subjects the data represents. Of course, to do this they need proper guidelines, and that's exactly what we hope to develop.
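The arithmetic behind both distortions is simple enough to sketch. The Python below is illustrative only and is not the method used in our study; the function names and example values are hypothetical. It compares the ratio between two bars as drawn on a truncated axis with the true ratio in the data, computes Edward Tufte's "lie factor" (the size of an effect shown in a graphic divided by the size of the effect in the data), and shows why circles should be sized so that area, not radius, is proportional to the value.

```python
import math

def drawn_ratio(a, b, baseline=0.0):
    """Ratio of two bar heights as drawn when the y-axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Tufte's lie factor: relative change shown in the graphic divided by
    relative change in the underlying data (1.0 means a faithful graphic)."""
    shown_effect = (shown_second - shown_first) / shown_first
    data_effect = (data_second - data_first) / data_first
    return shown_effect / data_effect

def area_ratio_if_radius_scaled(a, b):
    """If radius grows in proportion to the value, drawn area grows with its square."""
    return (b / a) ** 2

def radius_for_value(value, max_value, max_radius):
    """Scale radius by the square root of the value so that area is proportional to it."""
    return max_radius * math.sqrt(value / max_value)

# Hypothetical example: two values, 50 and 55, a 10% difference.
true_ratio = drawn_ratio(50, 55)              # axis starts at zero
truncated = drawn_ratio(50, 55, baseline=45)  # axis starts at 45
print(f"true ratio {true_ratio:.2f}, drawn ratio on truncated axis {truncated:.2f}")
# On the truncated axis the bars rise 5 and 10 units above the baseline.
print(f"lie factor {lie_factor(50, 55, 5, 10):.1f}")
# Circles for 10 and 20: doubling the radius quadruples the area.
print(f"area ratio when radius is scaled {area_ratio_if_radius_scaled(10, 20):.1f}")
```

In this made-up case a 10% difference is drawn as a doubling (a lie factor of 10), and a circle whose radius is doubled to show twice the value looks four times as large; `radius_for_value` avoids that by taking the square root.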

We've already taken some steps in this direction. Our group recently reviewed the use of satellite imagery to display before-and-after analyses in human rights reports. In times of war, bombing, or mass migration, satellite images can provide insight into the human rights situation on the ground. For instance, researchers have used satellite images to identify large-scale destruction of housing and other infrastructure caused by bombing or demolition. To develop best practices for designing before-and-after images in human rights communications, we looked at label placement, annotation, color, and other attributes. An ethical tension arises when two satellite images of the same location, taken at different times, have different hues or color spaces, perhaps because of differences in lighting, weather, or image capture methods. Using the unaltered images stays true to the source data, while adjusting the color spaces to be more similar may help viewers understand what they are being shown and grasp the scale of the atrocities depicted.
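One simple way to make two captures more alike, offered here only as an illustration and not as the approach our review recommends, is to shift and rescale each color channel of one image so that its mean and standard deviation match those of the other. In the Python sketch below, images are assumed to be lists of (R, G, B) tuples with values from 0 to 255, and the function names are hypothetical; real imagery would normally be handled with an image-processing library, and more careful methods work in a perceptual color space rather than raw RGB.

```python
from statistics import mean, pstdev

def match_channel(source, reference):
    """Shift and scale one channel so its mean and standard deviation match
    those of the reference channel, clamping results to the 0-255 range."""
    s_mean, s_std = mean(source), pstdev(source)
    r_mean, r_std = mean(reference), pstdev(reference)
    # A flat (zero-variance) channel is only shifted, not scaled.
    scale = r_std / s_std if s_std > 0 else 1.0
    return [min(255.0, max(0.0, (v - s_mean) * scale + r_mean)) for v in source]

def match_colors(source_pixels, reference_pixels):
    """Return a color-adjusted copy of source_pixels; the input is not modified."""
    source_channels = zip(*source_pixels)  # (R values), (G values), (B values)
    reference_channels = zip(*reference_pixels)
    matched = [match_channel(s, r) for s, r in zip(source_channels, reference_channels)]
    return [tuple(pixel) for pixel in zip(*matched)]

# Hypothetical two-pixel "images" of the same place captured under different lighting.
before = [(10, 20, 30), (30, 40, 50)]
after = [(100, 100, 100), (140, 120, 110)]
adjusted_before = match_colors(before, after)
```

Because `match_colors` returns new pixels and leaves its input untouched, the unaltered image remains available alongside the adjusted one.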

While nearly all the maps we looked at included copyright and source information, they often lacked geocoordinates that would allow others to confirm and reproduce the findings. In addition to our design recommendations, we encourage the inclusion of geocoordinates for satellite images in human rights reports. The bread and butter of human rights organizations is their credibility, so human rights research must be rigorous and its methods transparent. This applies to data visualization used in human rights work as well. In a human rights context, it is particularly important to link to source data and to explain data collection, analysis, and visualization methodologies, especially given the likelihood that research will be closely scrutinized or even cross-examined in legal proceedings. Where possible, a data visualization should not just indicate the source of its data but also link to that source and even make the data available for download. Protection of the subject and respect for the viewer should both be guiding principles for data visualization, and here an ethical tension may arise: balancing transparency with anonymization.

Looking at a sample of human rights reports from the last 15 years, our team found growth in the use of data and data visualization in human rights advocacy: advocates increasingly used both quantitative and qualitative charts, graphs, and maps in their reports. The broader trend toward the "datafication" of human rights, however, runs up against challenges including limited access, incomplete data, and the motives of powerful people and institutions with an interest in hiding human rights abuses. For instance, marginalized groups may be undercounted in official data sets as a result of policy, structural racism, or language barriers. Groups may intentionally opt out of official censuses for fear of legal or other consequences, or because of a history of abuse at the hands of state actors. And data about the most egregious violations, including torture, enforced disappearances, and summary executions, is often hidden or actively falsified by abusive governments. Groups providing services to marginalized and abused people may collect data, but such convenience samples are inherently partial. As a result, human rights data is nearly always incomplete and frequently biased. Statistical models and techniques can sometimes correct for sampling bias. And data, as an abstraction, can help overcome discrimination and stereotypes by focusing the viewer on statistical facts. But data is an artifact of its collection. Even when data collection is automated, its parameters are shaped by human decisions and assumptions, which is why setting parameters for how the resulting data is used is so critical. For all these reasons and more, it is essential that data visualization come to the table with clear methodologies and rules of engagement, as well as better techniques for visualizing uncertainty and bias.
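To make the point about correcting sampling bias concrete, here is a minimal Python sketch of one common technique, post-stratification weighting. The group names, counts, and rates are invented for illustration, and the function names are hypothetical. Each group's observed rate is weighted by that group's share of the population rather than its share of the sample. The catch, as noted above, is that the correction is only as good as the population shares fed into it, and those are often exactly the figures that official data undercounts.

```python
def naive_rate(sample_counts, group_rates):
    """Pooled rate that implicitly weights each group by its size in the sample."""
    total = sum(sample_counts.values())
    return sum(sample_counts[g] * group_rates[g] for g in sample_counts) / total

def poststratified_rate(population_shares, group_rates):
    """Weight each group's observed rate by its (assumed known) population share."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(population_shares[g] * group_rates[g] for g in population_shares)

# Invented example: group B is half the population but only a fifth of the sample.
sample_counts = {"A": 80, "B": 20}
group_rates = {"A": 0.1, "B": 0.5}  # observed rate of some reported abuse
population_shares = {"A": 0.5, "B": 0.5}

print(f"naive estimate:           {naive_rate(sample_counts, group_rates):.2f}")
print(f"post-stratified estimate: {poststratified_rate(population_shares, group_rates):.2f}")
```

In this made-up case the undersampled group's higher rate is diluted in the naive estimate (0.18) and restored by weighting (0.30); with wrong population shares, the "corrected" figure would simply be wrong in a different way.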

Even with solid data, data visualization should be principled and rigorous. Misleading, incomprehensible, or implausible data visualization can jeopardize people's trust, goodwill, or faith in research and advocacy on vital human rights issues. Right now, this is a movement in the making: the discourse around ethics in data visualization is still in its early days, and there is much to hash out. The good news is that questions such as how to be inclusive, how to incorporate accessibility, and how to ensure the accountability of data visualization are ripe for ethical examination and discussion.