Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.

“Troves of Personal Data, Forbidden to Researchers”

May 21st, 2012 / in policy, Research News / by Erwin Gianchandani

This evening The New York Times posted a story by John Markoff on researchers’ access to personal data collected by companies:

When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.


At least that is how the system is supposed to work. But lately social scientists have come up against an exception that is, true to its name, huge.


It is “big data,” the vast sets of information gathered by researchers at companies like Facebook, Google and Microsoft from patterns of cellphone calls, text messages and Internet clicks by millions of users around the world. Companies often refuse to make such information public, sometimes for competitive reasons and sometimes to protect customers’ privacy. But to many scientists, the practice is an invitation to bad science, secrecy and even potential fraud.


The issue came to a boil last month at a scientific conference in Lyon, France, when three scientists from Google and the University of Cambridge declined to release data they had compiled for a paper on the popularity of YouTube videos in different countries.


The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience.


In February, Dr. Huberman had published a letter in the journal Nature [subscription required] warning that privately held data was threatening the very basis of scientific research. “If another set of data does not validate results obtained with private data,” he asked, “how do we know if it is because they are not universal or the authors made a mistake?”


He added that corporate control of data could give preferential access to an elite group of scientists at the largest corporations. “If this trend continues,” he wrote, “we’ll see a small group of scientists with access to private data repositories enjoy an unfair amount of attention in the community at the expense of equally talented researchers whose only flaw is the lack of right ‘connections’ to private data.”


Facebook and Microsoft declined to comment on the issue. Hal Varian, Google’s chief economist, said he sympathized with the idea of open data but added that the privacy issues were significant.


“This is one of the reasons the general pattern at Google is to try to release data to everyone or no one,” he said. “I have been working to get companies to release more data about their industries. The idea is that you can provide proprietary data aggregated in a way that poses no threats to privacy.”



Last year the National Science Foundation said that researchers who receive its funds would be “expected” to share data with other researchers.


Many scientists agree that this is as it should be.


“The obvious answer is that there needs to be more access to data,” said Alex Pentland, director of the Human Dynamics Laboratory at MIT. “That is beginning to happen as governments and industry realize that they need to better understand the promise and limits of big data; for instance, we will be announcing a huge, multicountry release of phone data soon.”

Read the full article here. And once you have, please take a moment to share your thoughts in the space below.

(Contributed by Erwin Gianchandani, CCC Director)
