Thwarting disinformation, defending democracy — scholar sees a new approach

If social media are required to share their data with researchers, we could learn how dangerous propaganda goes viral, says Berkeley scholar

By Edward Lempinen

February 11, 2022

Huge data platforms such as Facebook and YouTube are having a corrosive effect on politics in the U.S. and Europe. Proposed laws would require them to open their data to researchers, and that could yield insights into the hidden systems that enable and promote disinformation. (Illustration by Neil Freese/UC Berkeley)

Facebook, YouTube, Twitter: in the space of barely a decade, these massive data platforms and others have transformed society. But each is like a black box: While they are blamed for undermining public health and eroding democracy, and while their profits mount to tens of billions of dollars every year, their innermost operations are largely hidden from view.

Now, however, lawmakers in the U.S. and Europe are advancing measures that would give researchers new rights to access and analyze data from these powerful platforms. That could be an essential step for understanding how they help to spread misinformation and disinformation that imperil the nation’s well-being, says technology scholar Brandie Nonnecke, director of the CITRIS Policy Lab at UC Berkeley.

In an analysis published today (Feb. 11) in the journal Science, Nonnecke and co-author Camille Carlton detail the political measures that would force Facebook and other platforms to open access to the oceans of data they collect from billions of users. While the fate of the legislation is uncertain, Nonnecke said, the stakes are historic.

Brandie Nonnecke (Credit: Eric Nonnecke)

“These platforms are taking on an increasingly influential role in our social, economic and political interactions — with very little oversight,” Nonnecke said in an interview. “This is problematic. In the U.S., we saw after the election how platforms were used to manipulate public opinion and influence voting behavior.

“So, yes,” she added, “these bills are necessary to protect democracy — absolutely.”

Nonnecke is an influential scholar on information and communication technology, artificial intelligence and internet governance — and how all of these intersect with the public good. Her research has been published in top academic journals and widely featured in the news media, and she has consulted with policymakers in the U.S. and around the world.

Carlton is the communications manager at the Center for Humane Technology.

Ignoring risks in pursuit of profits

The proposed laws come at a time when criticism of the 21st century media titans is escalating across the political spectrum.

Investigations and news reports have detailed how a range of domestic and foreign actors used social media aggressively in an effort to manipulate the U.S. presidential elections in 2016 and 2020.

Then, last October, whistleblower Frances Haugen, a former data scientist at Facebook, testified before Congress that company leaders know their products sow political division and put children at risk, but have continued the practices because they’re massively profitable.

The workings of such influence rest on the use of advanced technology to manipulate readers and viewers. But a core concern, Nonnecke said, is how companies such as Facebook and YouTube (which is owned by Google) use “recommender systems” that assess users’ interests and steer them to content that’s provocative — but not necessarily reliable.

These systems “prioritize the sensational over the sensible because people engage with that type of content,” she explained. “We are hard-wired to rubberneck and focus on shocking content, and they know that it keeps eyes on screens.”

But how exactly do the companies work those processes?

“We have some idea, but much more research is needed,” Nonnecke said. “For example, as a researcher, it’s quite hard for me to see the virality of a misinformation and disinformation campaign.”

Some data on traffic and targeting is available by other legal means. “While some researchers have been able to peer into a platform’s recommender system through careful experiments and analysis, it’s difficult work,” she added. “It’s difficult for us to see how the recommender system has pushed forward a given narrative, or helped to make disinformation or misinformation go viral, and especially who’s behind it.

“As researchers, we’ve been asking for more access to platform data for years, knowing that, in order for us to understand the effects these platforms have on our society, we need to have access to that data.”

Using law to pry open the black box

The digital media giants haven’t blocked all data access. Nonnecke stressed that some platforms have made sustained efforts to build cooperative relationships with independent researchers. But the companies have at times released incomplete data, and much information remains off-limits.

In 2011, Facebook’s still-new data center in the high-desert town of Prineville, Oregon, covered 150,000 square feet. The complex has grown massively since then to handle the data generated by Facebook’s global communities. By the end of 2023, Facebook will have 11 data centers there, covering more than 4.5 million square feet, at a cost of $2 billion. (Credit: Tom Raftery/Wikimedia Commons)

Various legislative efforts have been mounted to regulate the media platforms, thus far with little success. The proposed laws in Europe and the U.S. would take great strides to make the companies more transparent, requiring them to work with researchers who want to probe their inner workings.

“There needs to be accountability,” Nonnecke said, “and accountability comes from transparency. It comes from allowing researchers to access the data … in the spirit of establishing appropriate oversight and guidance by law.”

The proposed laws, she said, could be “transformative.”

Europe: The European Commission has already approved a “Code of Practice on Disinformation” to support researchers’ access to data. Now, the Digital Services Act — passed by the European Parliament and under review by member countries — would require the biggest online platforms to become active partners in the fight against disinformation.

They would be required to assess systemic social, economic and political risks that arise from their systems, then implement strategies to limit the risks. Their assessments and related data would be open to audit by government bodies and researchers.

Specifically, the proposed law would require large platforms to provide access to data that relate to the risks posed by their operations; that reflect the working and accuracy of the algorithms that shape content recommendations; and that show the processes of moderating content and handling complaints.

The European bill appears to be on a trajectory to becoming law, Nonnecke said.

U.S.: Nonnecke and Carlton write that a bill introduced in Congress in December — with bipartisan support — is “the most comprehensive” ever proposed in the U.S. to require large platforms to make their data available for research and oversight.

Known as the “Platform Accountability and Transparency Act,” the measure sets a formal process by which the National Science Foundation (NSF) would evaluate proposed research that requires platform data. When projects are approved, the Federal Trade Commission (FTC) would work with the platforms to manage release of data to NSF-approved researchers.

The FTC also would have the authority to require the platforms to disclose data and other information that would help researchers, journalists and others to assess how the platforms might be harming society.

Controlling ‘the real, tangible, visible harms’

Both the European and U.S. measures seek to protect the identities of individual users, and both give companies mechanisms to protect some data related to trade secrets.

Many analysts have suggested that the violent insurrection at the U.S. Capitol on Jan. 6, 2021, was driven in part by disinformation about election fraud, COVID-19 vaccines and the workings of a shadowy deep state. (Photo courtesy of PBS/Frontline)

But Nonnecke flagged several possible shortcomings. One particular concern, she said, is that access might be limited to research institutions that have the advanced infrastructure needed to manage the data or meet cybersecurity requirements. That could be a barrier to smaller or less affluent institutions, she said, and thereby undermine the essential need for diversity among researchers.

Second, for data to be of high value for scientific inquiry, platforms should include metadata and other contextual information, such as how data were cleaned, transformed or modified before being handed off to researchers. “This better ensures that the data and research insights will be of higher quality and accuracy,” Nonnecke said.

How Facebook, YouTube and other large platforms will react to these measures remains to be seen. They could welcome the laws for providing a rational process and clear procedures, Nonnecke said.

Or they might resist.

“By and large, the companies want to protect themselves, and they want to protect shareholder value,” she explained. “So, they don’t want to make transparent to the world the skeletons in their closet.”

Nonnecke sees some similarity with efforts to regulate tobacco use in the 1950s and ’60s.

When confronted with scientific evidence that smoking causes cancer, Big Tobacco at first “denied, denied, denied,” she said. “Finally, Congress said, ‘No, the research is clear. We are putting these restrictions on you for the public’s health and well-being.’”

And yet, assessing the impact of big data platforms across society is more difficult, she said. “How do I do that with a platform that’s hard to evaluate? When I don’t understand its inner workings, and I don’t know how people are interacting with it?

“It’s clear that the effects are not the same as causing cancer,” she said, “but there are real, tangible, visible harms that platforms cause. … We see them, and we need more research to understand what’s happening and how to minimize the harms.”

The CITRIS Policy Lab is a research initiative of the Center for Information Technology Research in the Interest of Society and the Banatao Institute (CITRIS), a California Institute for Science and Innovation headquartered at UC Berkeley.