Since the launch of Donald Trump’s presidential campaign, reports of hate speech targeting various minority groups have risen dramatically. Although this surge has been widely reported, it remains difficult to quantify the magnitude of the problem or even to classify hate speech properly, let alone identify and measure its effects. Keyword searches and dictionary methods are often too imprecise and blunt to capture the nuance and complexity of hate speech. Without tools to identify, quantify, and classify hate speech, we cannot even begin to consider how to address its causes and consequences.
Zeynep Tufekci, an associate professor at the UNC School of Information and Library Science and author of Twitter and Tear Gas: The Power and Fragility of Networked Protest, discusses hate speech research being conducted at UC Berkeley through the Social Sciences D-Lab, focusing on corporate responsibility and the importance of preserving free speech.
Today, when people say “AI,” they’re usually talking about machine learning, says Tufekci. Rather than telling the computer what to do step by step, she says, you feed it lots of data. The computer works through all of that data and builds models that classify it or otherwise act on it.
“The other day, I was trying to explain this to someone, and a metaphor that comes up is this,” she says. “If you have a maze where you put the ball and it just goes ping, ping, ping, like a Plinko thing, and the ball comes out this way or that way, and there’s this big maze. That’s like a neural network. It just sort of has these many, many, many layers, and then you put all the inputs in. That maze has been created from eating lots of data. And then you put your new input in, and it goes ping, ping, ping, ping, ping, and it says this or that. It does its classification. Except rather than being like a small visual thing you can see, we’re talking about like a million by a million thing. We’re talking about these giant matrices. We’re talking about things that are very, very big, and it turns out these things can actually work fairly well for tons of things we’re doing.
“But here’s the kicker. It works in a way that we do not understand, because we did not program it. We just fed it a lot of data of what worked, or we trained on, we labeled, and then it creates these new ways of classifying things.”
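To make the idea of learning from labeled data concrete, here is a minimal sketch in Python. It is not a method described in the talk: it trains a single-layer perceptron, which is far smaller than the many-layered networks Tufekci describes, and the features, labels, and function names are all invented for illustration. The point it shows is that nobody writes the classification rule by hand; the weights come out of the labeled examples.

```python
# Toy illustration only: the data and names below are hypothetical.
# A perceptron is a single "layer", far simpler than a deep neural network.

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights and a bias from (features, label) pairs with labels 0 or 1."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # error is 0 when the guess is right, +1 or -1 when it is wrong
            error = label - predict(weights, bias, features)
            for i in range(n_features):
                weights[i] += lr * error * features[i]
            bias += lr * error
    return weights, bias

# Made-up labeled data: only examples where both features are "on" are labeled 1.
labeled_data = [
    ([0.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([1.0, 0.0], 0),
    ([1.0, 1.0], 1),
]

weights, bias = train_perceptron(labeled_data)

# Classify an input the model never saw during training.
new_input = [0.9, 0.8]
print(weights, bias, predict(weights, bias, new_input))
```

Even in this tiny case the learned weights are just numbers that happened to fit the examples. With millions of weights spread across many layers, reading a rule out of them becomes much harder, which is the difficulty the quote points to.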
This talk was the keynote lecture for the spring 2019 Digital Humanities Fair, which showcases recent scholarship in the digital humanities and hosts a campuswide conversation on the state of the field.