Berkeley Talks: The erosion of shared reality in the age of deepfakes
UC Berkeley digital forensics expert Hany Farid explores how generative AI and real-time deepfakes are undermining public trust and reshaping our ability to discern truth online.
Screenshot from lecture
April 6, 2026
Follow Berkeley Talks, a Berkeley News podcast that features lectures and conversations at UC Berkeley.
Two decades ago, when Hany Farid first began studying digital misinformation and manipulated media, fake content was easier to detect. Today, that landscape has shifted with a speed that he describes as “breathtaking.” In just the last year or two, he says, we’ve moved from an era where a computer takes seconds or minutes to produce a static file to “full-blown interactive deepfakes” that can hold a live conversation in real time.
In this Berkeley Talks episode, Farid, a digital forensics expert and professor at UC Berkeley’s School of Information, discusses the rapidly accelerating landscape of generative AI and the unique threat it poses to our collective understanding of the world.
Farid notes that tools once reserved for governments or well-funded organizations are now freely available, radically expanding the threat landscape. “We have taken a mechanism that was in the hands of state-sponsored actors and bad actors and given it to 8 billion people in the world,” he says. This democratization of powerful technology makes it much easier to create convincing false images, audio and video — and much harder to trust what we see online.
And he explains that human perception is no longer a reliable defense, as his research shows people are only slightly better than chance at identifying AI-generated content.
To reduce the damage to our shared sense of reality, Farid suggests solutions should focus on the systems that profit from harmful content, including platforms and ad networks that help it spread. He also gives a warning about news consumption: “Stop getting your news from social media. That’s not what it was designed for.”
Despite the rise of deepfakes and online deception, Farid says he rejects the idea that there is no truth or fact. He believes that, although it takes effort, people can still work together to understand what is happening in the world.
This lecture, which took place on March 13, was part of LNS 110: Brilliance of Berkeley, a course featuring distinguished researchers working on the world’s most pressing issues.
Watch Farid’s presentation (with slides) on YouTube.
(Music: “No One Is Perfect” by HoliznaCC0)
Anne Brice (intro): This is Berkeley Talks, a UC Berkeley News podcast from Strategic Communications at Berkeley. You can follow Berkeley Talks wherever you listen to your podcasts. We’re also on YouTube @BerkeleyNews. New episodes come out every other Friday. You can find all of our podcast episodes, with transcripts and photos, on UC Berkeley News at news.berkeley.edu/podcasts.
(Music fades out)
Chrystal Chern: All right, let’s get started. Who is excited to talk about deepfakes today? All right.
Professor Hany Farid is a professor at UC Berkeley in the School of Information. He’s also the co-founder and chief science officer at GetReal Security. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.
Over the past two decades, Farid has been a leading figure in the development of techniques used to detect manipulated images, videos and audio. His work has helped advance the scientific foundations of media authentication and has been applied in journalism, law enforcement and national security contexts. Farid frequently collaborates with technology companies, governments and media organizations to combat online misinformation and develop tools for identifying AI-generated or altered media. Please welcome Professor Hany Farid.
Hany Farid: This falls under the category of, now for something completely different that has nothing to do with proteins, except for my brother being our common bond here.
OK. I think everybody’s heard about deepfakes. You’ve all heard about generative AI. You all know what’s going on. I just want to level set a little bit and then talk about what I think is the landscape with respect to this content, with respect to us as individuals, societies, democracies, economies, and then what does the future look like to the extent that we can see a little bit into the future?
So first of all, we have been talking about deepfakes and generative AI for about 10 years, which seems hard to imagine. The term first started bubbling up around 2015, but the technology was really quite nascent for the first eight years. It’s really only in the last two years that you’ve seen really, really dramatic changes.
So broadly speaking, deepfake refers to fully AI-generated, machine-generated content: language, images, audio and video. I don’t deal with language, so I’m going to put that aside for a little bit, but you’ve all been using your favorite large language model. We’re going to talk about multimedia.
This, for example, is an image generated using a prompt. You literally type “a young woman making a heart with her hands,” and this is the image that came out: 100 percent fully AI-generated, no Photoshop, no graphics, nothing. It takes nothing to do this other than a laptop and an internet connection. And most of these are free to generate. So the days of the six fingers and the weird faces and the hair, all of that is over. These images are hyper, hyper-realistic. And I’ll tell you in a little bit how realistic we think they are.
Video, just in the last six to nine months I would say, has started to move through what we call the uncanny valley. That is, it’s so realistic you can’t tell the difference. I’m going to play a bit of this video for you. This video was also 100% fully AI-generated with a single text prompt: a YouTuber filming from their bedroom. So what you’re about to see, the audio, the video, everything was fully generated in seconds on a computer. So let me play this and then we’ll talk about it.
(Video plays)
AI-generated person: What’s going on, everybody? Welcome back to another video. Today, we’re going to be talking about something a little bit different.
Hany Farid: That’s insane that you can do that with a text prompt. If you had asked me a year ago, two years ago, I’d be like, “No, not for another 5, 10 years.” And I would have been wrong. I’m pretty consistently wrong about, by the way, when these things are coming. We know they’re coming, but they are accelerating at a pace that is unbelievable.
Now, the two things I showed you, the image and the video, were what we call an offline process. You type into the computer, it generates something, which takes a few seconds, and then you get it back and you can do whatever you want with that file. What we’ve started to see in the last few months is full-blown interactive deepfakes.
OK. So I’m going to play you a little video, which is a clip of me talking to what, I guess, we would probably call these days an AI agent.
So let me just set the stage for what’s happening. This is a commercial service. You connect through something that looks like Zoom. You can see that here; that’s me talking at the bottom. And in real time, the agent is listening to me, forming a response, generating a voice, and then synthesizing the video to speak back to me.
I’m going to play for you a little clip of it, but just for this demo purposes, we are playing 20 questions. OK? And you’re just going to hear a little snippet of it. And just understand that everything you see in the top there is fully AI generated. The text, the audio, the video in real time. I’m literally just recording my screen. OK. Let me play this.
(Video plays)
Hany Farid (in the video): Are they a famous actress?
AI agent: No. The person I’m thinking of is not a famous actress, but they are a pretty well-known figure in popular culture. Keep trying.
Hany Farid (in the video): Are they a musician?
AI agent: No. The person I’m thinking of is not a musician. You’re not even close. Try again.
Hany Farid (in the video): Are they a scientist?
Hany Farid: Getting snarky.
AI agent: No, the person I’m thinking of is not a scientist.
Hany Farid: One more.
AI agent: Think more like entertainment. Try again.
Hany Farid (in the video): Are they a musician?
AI agent: I already told you they’re not a musician, pay attention. You’re repeating yourself. Try something new.
Hany Farid: You’re repeating … First of all, if you’re going to get yelled at by an AI agent, they better have a British accent. That’s amazing. By the way, I did that on purpose. I wanted to see if she had memory and she did and I got scolded.
And by the way, I never figured it out. I was so distracted by the agent. It is amazing. This is one of half a dozen services now that you’re going to get on video calls with customer support, your doctor, your lawyer, and it’s going to be a full-blown AI agent. And you saw the delay was maybe half a second or so. It’s incredible.
It was a little muted because of the way I just recorded the screen, but it blows your mind that this is on here today. I mean, it’s not going anywhere. OK. So that’s sort of state of the art.
Things are really, really good and only accelerating. Only accelerating. Every few weeks, we see advances in image, audio, video, and now agentic AI. So this train has left the station. There is no slowing down. This is sort of our new reality and we’ll talk about what the implications are in a minute.
Now, the first thing you want to ask yourself, because I get this question a lot as well: “I look at stuff all the time. I think I can tell the difference. I know when I’m looking at an image that’s AI-generated, or a video, or a voice, or a person that I’m talking to.” You don’t. You’re actually quite bad at it. The only thing you’re good at is having a lot of confidence in it, but it turns out you’re quite bad at it. And let me tell you how bad you are.
So in addition to developing mathematical and computational techniques that I’ll be talking about, we do perceptual studies in my lab. We show people content and we ask them to judge real, fake, AI, or not.
So this is a study that we did about two and a half years ago with Sophie Nightingale, who was then a postdoc in my lab, now a professor at Liverpool in the UK. We showed people images: half were like the ones on the top, all fake faces, and half were real. We told them half were fake, half were real. We put them in a lab setting. We set them up for success. We told them exactly what the task was, and their average accuracy was between 55 and 65%. Chance, if you’re flipping a coin, is 50%. They’re basically slightly better than chance. OK? So it’s over, and these images were generated two and a half years ago. So this has only gotten harder at this point.
With voice, this was a joint study with Emily Cooper, who’s on the faculty here, and Sarah Barrington, who’s a Ph.D. student in my lab. Same thing. You listen to voices, half real, half fake. We tell you what you’re going to do. We set you up for success. We give you training. We have you listen to some examples. Chance, again, is 50%. Your accuracy is somewhere between 60 and 65%, slightly better than chance. It’s over.
And this was about a year ago now, so this has only improved. Full-blown video. This study is in review right now. Half the videos were real, like this one. Half of them were AI-generated with the same underlying content: a real barbershop and a fake barbershop, real water moving in a brook and fake water moving in a brook. The content was exactly the same. And here again, chance is 50%.
And here you’re a little bit better: 62 to 68%. But give it eight weeks, 12 weeks, and it’ll be over. You’re just really bad at this. I mean, sorry. Sorry, that sounds really judgmental. It’s just that we’re bad at it. Our visual system just has not evolved to do this task well. And we have to face that reality. And this is nothing. This is nothing compared to if you’re on social media.
Our visual system didn’t adapt to this. In millions of years of evolution, there was no pressure to tell whether something was real or not. This is not something we’re equipped to do. And it is nothing compared to being on social media, doomscrolling, seeing things that are emotionally charged and politically charged and partisan. There, you’re way worse than this.
So this is our new reality. The things that you read, see, and hear online, you’re having a really hard time knowing if they’re real or not. So think the current conflict in Iran, think Venezuela, think Minneapolis, think every major conflict in the last few years, a lot of what you are seeing online is just not real. And that’s a problem.
So let’s talk about the weaponization of this technology. There are positive implications of this. There’s no question about it. I wouldn’t deny that. But there is real weaponization of this.
So if you’d been reading the news over the last few months, you have seen that Grok AI was allowing people to create what are called non-consensual AI images, where they would take images, primarily of women and children, and nudify them. That is, render them without clothes. And it was doing that on its service, and then of course hosting those images on X. And it is brutally bad. It is an ugly, ugly part of this. In fact, it’s where the term deepfakes came from: It was the moniker of a Reddit user who used some of the early technology to do this. And this is everywhere. School kids are doing this. Adults are doing this.
Cyber criminals are doing this. They’re using it to extort children. They’re using it to weaponize against women — extort them, embarrass them, humiliate them and drive them off platforms. And it is rampant. It is rampant around the world and it is awful. It is an awful thing happening to individuals. And the companies are just profiting from this and doing nothing about it.
About two and a half years ago, about 20 minutes after the U.S. stock market opened on the East Coast, somebody posted a fake image of the Pentagon being bombed. And it wasn’t a very good fake, by the way. It was two and a half years ago, so the images were just so-so. In about 90 seconds, the stock market dropped half a trillion dollars before people figured out we weren’t under attack, and then it recovered.
By the way, there’s pretty good evidence that there were several things going on here. There was the AI image that was posted that made it look like we were being attacked. Then humans started responding to that. And then the AI trading bots responded to the humans, and there was this massive half-trillion-dollar selloff before people figured out what was going on. Now, I don’t know if somebody was trying to manipulate the market, but you damn well know somebody noticed that, and you are going to see market manipulations around earnings calls, around IPOs, around things like this. So there is a potential huge, massive impact to our economy.
You heard I’m the chief science officer over at GetReal. I talk to executives at Fortune 500 companies almost every day. And every day, I hear the same thing: They are losing millions, tens of millions of dollars to scams that are starting to hit the enterprise. Not just your parents and your grandparents and those stupid phishing scams that everybody gets. This is one where somebody was on a call with who they thought was their chief financial officer. It was a completely AI-generated person, impersonating the face and the voice. They were closing a $25 million deal. They sent them the wire instructions and $25 million went bye-bye in about 10 seconds.
This is not the first time. It is not the last time. And by the way, for every one of these that you see in the newspaper, I can tell you there’s 10 of them that you don’t because people don’t want to talk about it. It’s embarrassing.
So the enterprise is getting attacked at a scale that it has not seen before. This is an amazing thing. The FBI has now, over the last two years, released three reports about this. A lot of you are probably applying for jobs. And I guarantee you, all of you are doing interviews over Zoom, and in many cases, you’re doing them right away with agentic AI. You’re not even talking to a human in the beginning. And what’s starting to happen is that people are scamming this process.
So for example, the North Koreans drive their entire economy with fake IT workers. They have North Koreans applying for jobs here in the U.S., competing with you, by the way. They are masking their voices, masking their faces, masking their location. They’re penetrating these companies. In many cases, they’re just good workers collecting paychecks and giving it to the North Korean government.
In many cases, they’re installing malware. In many cases, they’re installing viruses. And in many cases, they’re stealing IP. We talked to a U.S. defense contractor who had five North Korean workers on its payroll. A U.S. defense contractor. Every single Fortune 500 company has experienced this.
There are also these bait-and-switches, where people will hire somebody to do an interview for them, and then that person gets the job and somebody else shows up to work. It’s a bait and switch. Nobody’s connecting the dots. So the enterprise is now getting attacked through human resources. And by the way, don’t get any ideas about that. I just saw somebody smiling back there: “That’s a good idea.” This isn’t an instruction manual.
This is geopolitics. In the early days of the Russian invasion of Ukraine, the mayors of Madrid, Berlin and Vienna were on a call with who they thought was the mayor of Kyiv. It wasn’t. It was the Russians, and they were extracting information about NATO movements. You’re now getting on real-time video calls. This has happened in the U.S.: Members of the Senate Foreign Relations Committee and members of this White House have been on calls with Russians when they thought they were talking to somebody else. So this now has geopolitical implications as well.
So this is our reality. Think about what I just enumerated — individuals, enterprises, societies, democracies. And I didn’t even mention the insane disinformation that is spreading online around elections, around conflicts, around protests, around candidates. This is our new reality. We don’t have a shared reality is our new reality. We are living in alternate realities, and that is exceedingly dangerous for us as individuals, institutions, societies and democracies.
So what do we do about it? I mean, there are days where I just think, “Well, the internet was an interesting experiment. Let’s just turn it off and let’s get on with our lives.” But I don’t think that’s going to happen.
And so what we spend most of our time doing is developing techniques that can help news agencies, law enforcement, national security and enterprise figure out what the hell is going on in their world. And I want to just give you a sampling of how we do this. It’s not going to be obviously an exhaustive list of things.
And one of the things that you have to understand about generative AI is that it is very, very good. You saw those images and videos. It is very good, but it actually doesn’t know about the physical world. It’s purely a statistical inference engine. It’s just extracting statistics from billions and billions of uploads, many of them from people in this room.
If you’ve uploaded anything online, it’s learning from you. But it doesn’t know about physics. It doesn’t know about geometry. It doesn’t know about cameras. It doesn’t know about optics. It doesn’t know anything except statistics. And turns out statistics are pretty powerful. So a big part of how we think about analyzing content is to look for deviations from physical reality, things that simply don’t make sense physically.
Let me give you a couple of examples of that. OK. Anybody who’s taken an art class and learned about two-point or three-point perspective knows what I’m about to tell you. So what I’m showing you here is an image of a flat surface, the floor, and on that floor are tiles. And those tiles, of course, if I was looking at them the way I’m looking down right now, would be square. But of course, when I image them, they’re not square.
They have these sort of trapezoidal shapes. Why? Because of linear perspective. The reason is that when things are further away from us, whether from a camera or from these things on my face, my eyes, they appear smaller in proportion to their distance. Oliver in the back of the room has a head that’s very small relative to this. It’s not because his head is actually small. Sorry, Oliver. He’s sort of my boss, too. I can’t believe I’m doing this. It’s because he’s far away from me, but my brain’s not confused by that. My brain understands perspective projection.
So if I take these parallel lines and I annotate them and extend them outwards, they intersect at a single point. It’s called a vanishing point. And that’s just a physical property of the physical world and linear perspective. And beautifully, mathematically beautifully, if I even take lines on parallel surfaces that are in the physical world parallel and I draw lines, they intersect at a common vanishing point.
And that has to be true for any manmade object that is both flat and has parallel lines in it. Just a really nice, simple geometric property that we’ve known about since the Renaissance painters.
Now, again, generative AI is statistical in nature and it doesn’t always get these things right, but your brain doesn’t notice. So this is a case we worked on a few months ago. This was before we invaded Iran, by the way. So this was purportedly showing an Iranian nuclear facility. I’m showing you four frames of it being bombed. And I’m not playing the video because it’s a little gross, so I’m just going to show you the four frames. And we love manmade objects because manmade objects are extremely regular. So here is one frame of the video, and you can see that I’ve annotated one, two, three, four lines, and you can see that when you extend them outwards, they don’t intersect.
There’s a physical implausibility. And we can do a perturbation analysis to ask: If our annotations are off by a pixel, does the conclusion change? How likely is this to happen within the constraints? And when you find these anomalies, it’s game over, right? Something’s wrong. Something’s wrong with these things. And now we can go back to the news outlet and tell them that it’s not real.
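The vanishing-point test he describes can be sketched in a few lines. This is a hypothetical illustration, not code from Farid’s lab: each annotated edge is given as two image points, every pair of extended lines is intersected, and a large spread among the intersection points flags the physical implausibility.

```python
# Hypothetical vanishing-point consistency sketch. Edges that are parallel
# on a flat man-made surface should, when extended, meet at one point.

def line_coeffs(p, q):
    """Coefficients (a, b, c) of the line ax + by + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersect(l1, l2):
    """Intersection of two lines in (a, b, c) form, or None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)

def vanishing_point_spread(edges):
    """For edges given as point pairs, intersect all pairs of extended
    lines and return (mean intersection point, max distance from mean).
    A large spread means no single vanishing point exists."""
    lines = [line_coeffs(p, q) for p, q in edges]
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            pt = intersect(lines[i], lines[j])
            if pt is not None:
                pts.append(pt)
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    spread = max(((px - mx) ** 2 + (py - my) ** 2) ** 0.5 for px, py in pts)
    return (mx, my), spread
```

On a real image you would also run the perturbation analysis he mentions, jittering the annotated endpoints by a pixel or two and checking that the spread stays large before declaring the geometry implausible.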
OK. Second technique. This is one of my favorite ones because I love shadows. I’m obsessed with them. Even when I walk outside, I always notice where shadows are. So go outside now, on a sunny day, stand there and look at where your shadow is. If it’s being cast over here, you know where the sun is: in the other direction. Why? Because if I take a point on the shadow and connect it to the corresponding point on the object and keep going, I must intersect the light source.
That’s the very definition of a shadow. And there’s a very simple constraint in the physical 3D world. It doesn’t matter if the light source is infinitely far away, like the sun. It doesn’t matter where the light is, and it doesn’t matter what the shadow is being cast onto.
Now, in the physical 3D world, I have a line that constrains three things: a point on the shadow, the corresponding point on the object, and the light source. Now, take an image of that virtual line, if you will, and it remains a line. Why? Because of linear perspective, unless I’ve got some weird lens distortion in my camera. So in the image, I also have a constraint that tells me the relationship between a shadow, an object and a light source. Now, if the object doesn’t have that kind of distinctive feature, what do I do?
I have this potato-like thing here. What do I do with that? So there’s a point on a shadow here. I know there’s a point on the object that it corresponds to; I don’t know exactly which point, but I know it’s somewhere on that object. So here I can draw this wedge-shaped constraint that says the light must be somewhere in that region. I don’t know exactly where it is, but it’s somewhere in there.
Now, if I’m outdoors and I take a photo and there’s the single dominant light source like the sun, what do I know? Everybody’s shadow better be consistent with that dominant light source. That’s the physical constraint. And if it’s not, something is wrong. Something is physically implausible. OK?
All right. So let me give you an example of that. I can’t show you the image I actually did for this because it was classified, but I’ll give you an example of that.
So this is an image of a bunch of soldiers walking down a path that I generated myself and it looks great, right? You don’t really see anything obvious. They’ve all got a shadow. In fact, this shadow is particularly beautiful here under the feet. You see all the soldiers have shadows, everything looks fine.
But if you start going in and analyzing those shadows very carefully (here’s one constraint, two constraints, three constraints), suddenly there are multiple light sources in here and something is wrong, right? Unless they’re on a different planet with more than one sun, which I suppose is possible, but I don’t care about that. Something is physically implausible. Yep. OK. All right. So those are two image techniques.
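A deliberately simplified version of the shadow check might look like the following. It assumes a single distant light (the sun) far outside the frame, so the shadow-to-object rays should be nearly parallel in the image; the function names and the angle threshold are illustrative, not from any production tool.

```python
import math

# Hypothetical shadow-consistency sketch. Each constraint is a pair
# (shadow_point, object_point) in image coordinates; the ray from the
# shadow point through the object point must head toward the light.

def light_direction(shadow, obj):
    """Unit vector from a shadow point toward its object point; with one
    dominant light, all such rays should roughly agree."""
    dx, dy = obj[0] - shadow[0], obj[1] - shadow[1]
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)

def max_pairwise_angle(constraints):
    """Largest angle (degrees) between any two shadow-to-object rays.
    A large angle means the shadows imply inconsistent light positions."""
    dirs = [light_direction(s, o) for s, o in constraints]
    worst = 0.0
    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            dot = dirs[i][0] * dirs[j][0] + dirs[i][1] * dirs[j][1]
            dot = max(-1.0, min(1.0, dot))  # clamp for acos
            worst = max(worst, math.degrees(math.acos(dot)))
    return worst
```

The wedge constraints Farid draws for ambiguous objects generalize this: instead of a single ray per shadow, each constraint becomes a region, and the question is whether all the regions share a common intersection.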
Let’s talk about video for a little bit. And I promised Oliver I’d leave time for questions, 1:30 p.m. OK. I think we’re good. All right. So this was a video made by Jordan Peele, who I just adore, and it was made, God, maybe five years ago. So really early on. It’s not the best deepfake, but I’m going to show it for a second to make a point. So go ahead and watch this and then we’ll talk about what it is.
(Video plays)
Deepfake of Obama: Now, you see, I would never say these things, at least not in a public address, but someone else would. Someone like Jordan Peele. This is a dangerous time. Moving forward, we need to be more vigilant with what we trust from the internet.
Hany Farid: OK. So first, he does a really good Obama. You’ve got to be impressed by that. So this is what’s called a lip-sync deepfake. Jordan Peele is speaking in Obama’s voice, of course. Today, that voice could be AI-generated; you couldn’t do that when he made this. And that’s an actual video of President Obama talking, and it’s a lip-sync deepfake where they replace just his mouth to be consistent with the new audio track. Think about, by the way, the power in that. You can take anybody’s video and make them say anything you want them to say, which is pretty striking.
So these things are quite common, and what’s so powerful about them is that the only thing manipulated is the mouth. Everything else is real, right? This is a very localized manipulation. So this is the work of Sarah Barrington, who you can actually see right there, and Maty Bohacek, who’s a student down at Stanford who works in our lab.
And this is a great story, by the way. I like telling this story to students, because this is how science is actually done. I was giving this talk, and I showed that Obama video, and there was a woman in the audience who was a lip-reader for the hearing impaired, and she came up to me afterwards. She said, “You know, if you don’t listen to the sound, the mouth is moving wrong. It’s wrong. If I’m lip-reading, he’s not saying the words I’m hearing.” I’m like, “Oh, that’s really cool, because I wouldn’t have noticed that. I’m not reading lips; I’m just listening to it.” And then we had this idea: What if we could detect that the mouth is not moving properly? That seemed like a really cool signal. So here’s what we do: We take the video, we throw away the audio and we do automatic lip-reading.
This is a computational technique. We didn’t invent it. It came out of the computer vision literature. And then we take the audio and we transcribe it, audio to text. So here, Sarah up there is saying, “I just think it’s a really feel good and excellent piece of cinema,” responding to what her favorite movie is.
But when we read her lips, what she’s saying is, “I just had its bread roll. It’s your preserve about the media.” It’s complete nonsense, gibberish. And you don’t notice. Your brain doesn’t notice because you’re doing sensory integration: You see the mouth, you hear the voice, and your brain fuses them. And by the way, you know this is true if you were like me and grew up on Sesame Street: When the puppets talk, they just sort of move their mouths like this and it looks fine.
It’s not like we’re that discriminating about how the mouth is moving. We don’t care. So what we do is simply take these two transcriptions and measure the distance between them using standard mathematical techniques. Fake audios are really different. Real audios are … I got it backwards: For real videos the distance is small, for fake ones the distance is high, and we just split the baby down the middle, right? We just see these differences. The mouth is not saying what the ear is hearing. A really, really simple technique. And you don’t notice it. You just don’t notice it, but we can measure it.
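The comparison he describes can be sketched like this. It assumes you already have the two transcripts, one from speech-to-text on the audio and one from an automatic lip-reading model run on the silent video (both are external models, not shown here); the 0.5 threshold is made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical lip-sync check: compare what the audio says with what the
# lips appear to say, and flag the video when the two transcripts diverge.

def transcript_distance(audio_text, lip_text):
    """Word-level distance between the two transcripts: 0.0 means
    identical, values near 1.0 mean almost nothing matches."""
    a = audio_text.lower().split()
    b = lip_text.lower().split()
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def looks_lip_synced(audio_text, lip_text, threshold=0.5):
    """True when the mouth is not saying what the ear is hearing.
    The threshold here is illustrative, not from the paper."""
    return transcript_distance(audio_text, lip_text) > threshold
```

For example, running the check on Sarah’s clip would compare the audio transcript (“I just think it’s a really feel good and excellent piece of cinema”) against the lip-read gibberish, and the large distance between them is the signal.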
Now, as we were playing around with these things with Obama, we started noticing things. So we were looking at a lot of videos of him and we started thinking about, could we build a more specific deepfake detector that was not anybody talking or any image or any video?
What if we could start learning characteristics of individuals, people like Obama, world leaders, CEOs, or let’s agree when they say something, implications are different than somebody like me says something. So I’m going to play you a series of very short clips of President Obama in the opening moments of his weekly address, his video address. See if you notice anything.
(Video plays)
President Obama (in video clips): Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody. Hi, everybody.
Hany Farid: Those are all different videos. And in every single one, he does the same thing: “Hi, everybody,” with this little head bob backwards. It’s just like this little tic. We all do it, by the way. We all do it when we talk; we just have different ones. By the way, I noticed when I talk to my brother, we have the same ones. That’s a little weird, but not surprising either. But we all have these specific mannerisms. And we thought it’d be really cool if we could learn those.
And for people like Obama and CEOs and prime ministers and kings and queens, there’s a lot of footage of them. It’s not hard to find. So here’s an example of that. On the top is an authentic Obama video. And what I’m measuring here: The horizontal axis is time, and the vertical axis shows two things.
Head rotation, up and down like this, and then whether he’s smiling or frowning, in orange. So those are the two plots. What do you notice? They’re correlated. So Obama has this other tic: When he smiles, he tilts his head up a little bit, and when he frowns, he tilts his head down a little bit, which I love. And you can see it. You can see it in his videos.
Now, think about the lip-sync deepfake that I showed you from Jordan Peele a few minutes ago. What’s the difference there? The mouth is doing what the fake is telling it to do. The head is doing what’s in the original. I’ve created a chimera, right? These things don’t know about each other and they’re decorrelated. Something’s wrong. This is not Obama. This is not a pattern consistent with him. And so we can learn these distinct patterns for individuals by making measurements of how the eyes move, how the mouth moves, all these different, what are called action units.
This young man right here worked on this last summer, I think, right? Charlie worked on this last summer for us. And then we can cluster them. So we’ve gotten these mannerisms for Obama, O’Rourke, Booker, Biden, go down the list here. And what’s important here is that all the Obama videos are here. Everybody else is somewhere else. And then interestingly, the deepfakes are here. This is not actually a deepfake Obama detector. This is an Obama detector. I don’t care if you’re deepfake Obama or Booker or Harris or Buttigieg, you’re not Obama.
And so we learn what’s called a one-class model that just says, this is the person, there is Obama. And it’s not face biometrics and it’s not voice biometrics; it’s mannerisms. How you talk and how you express yourself. And we’ve actually been able to also introduce hand motions into this as well because people talk with their hands in very distinct ways.
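The one-class idea can be sketched with a simple Mahalanobis-distance model over mannerism feature vectors. The real system learns from action-unit, head-pose and hand-motion features, but the structure is the same: fit only on the target person, then reject anything far from that cluster, whether it is a deepfake or simply a different speaker. All data and dimensions below are synthetic placeholders.

```python
import numpy as np

class OneClassMannerismModel:
    """Minimal one-class model over mannerism feature vectors.

    Fit only on clips known to be the target person; at test time,
    anything whose Mahalanobis distance from that cluster exceeds a
    threshold is 'not this person'. Feature extraction is assumed to
    happen upstream.
    """

    def fit(self, X, threshold_quantile=0.99):
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.inv_cov = np.linalg.inv(cov)
        # Threshold set so ~99% of authentic training clips pass.
        self.threshold = np.quantile(self._distance(X), threshold_quantile)
        return self

    def _distance(self, X):
        diff = X - self.mean
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.inv_cov, diff))

    def is_target(self, X):
        return self._distance(X) <= self.threshold

# Toy demo with synthetic 4-D mannerism features.
rng = np.random.default_rng(1)
target = rng.normal(0.0, 1.0, size=(500, 4))   # authentic clips
others = rng.normal(5.0, 1.0, size=(50, 4))    # impostors or fakes

model = OneClassMannerismModel().fit(target)
print(model.is_target(target).mean())   # most authentic clips accepted
print(model.is_target(others).mean())   # impostors rejected
```

The appeal of the one-class formulation is exactly what Farid notes: it never needs examples of fakes, only of the real person.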
So since I have a couple minutes … No, I’m going to skip this. I’m going to skip this. This is weird and don’t do it. And that’s the short answer. Don’t ask Grok. I’m going to condense this into five seconds. Don’t ask Grok if an image or a video is real or not. It doesn’t know. It’s unbelievably stupid. In fact, don’t use Grok at all. OK.
So I want to just finish with responsibility. I get this question a lot, which is, “We’re entering a weird time where this technology is being weaponized against us. You can’t deny that. Who’s responsible?” And the answer is: everybody is.
So let’s start with the people in the room. We are responsible. Stop, for the love of God, getting your news and information from social media. This is not what it was designed for. It’s not good at it. And you’re being lied to on a regular basis. This is not a place to spend your time. In fact, just get off of social media. It’s so bad for you. There’s overwhelming evidence that it’s bad for your mental health and physical health. And I swear to God, it drops your IQ 20 points. So we have a responsibility. Fake news works because we click on it and we share it. We are part of the problem, whether we want to admit it or not. So that’s on us.
Now, working upstream, the social media platforms, the Facebooks, the TikToks, the YouTubes, the Instagrams, all of those platforms have taken an incredibly cavalier attitude to online safety for the last 25 years and profited handsomely from it. They have a responsibility to make their platforms safer. The AI companies that are commercializing and monetizing these tools are not doing enough to keep them safe.
They are allowing their tools to create non-consensual intimate imagery. They’re allowing it to create child sexual abuse material. They’re allowing it to create fake explosions in Tehran. They’re allowing it to create fake videos of presidents talking, and they know that they’re doing it and they’re profiting from it and they’re not doing enough. That’s their job.
And then of course, upstream is regulators. Our regulators have to do better. Our congressmen, our congresswomen, our presidents, our leaders have to do better. This is not a partisan statement. I’m not particularly hopeful about where we are in this country with leadership. This is a White House that has made clear it has no interest in regulating AI, but there is some leadership coming out of the UK, coming out of the EU, coming out of Australia. And I’m hoping that we will eventually absorb some of that, but we need regulation at the highest levels.
And here’s how I know that: because when it comes to things like cars and airplanes and medicines and foods and pharmaceuticals and everything that we buy physically, we have incredibly high standards for safety. But somehow we’ve let Silicon Valley off the hook for 25 years. The game’s over. There is no more online and offline world. There’s one world and it has real consequences. When we don’t regulate Silicon Valley, we all pay for it. We have to do better. OK. I’m going to stop there and I’m happy to take some questions. Thanks, everybody.
(Applause)
Chrystal Chern: Wonderful. Questions.
Audience 1: That was fantastic. So my question is, how easy would it be to fix the images to overcome all of the … I mean, so in some sense, it’s like warfare …
Hany Farid: I got it. Yeah. Great question. It’s absolutely 100% the right question, which is: I’ve just told you a bunch of stuff we do; isn’t the adversary going to adapt and get better?
OK. Two things. First of all, I haven’t told you everything we do. I’m not stupid. We hold things back. We do things that we don’t talk about. But now let’s get to your question. It’s a two-part answer.
One is that it’s not actually that easy because this is 3D physics and 3D geometry, and these things are inherently 2D. When it’s rendering these things, it doesn’t know about the 3D world. So it would have to reason about the 3D world and put all of those things in. And that’s very, very hard to do, number one.
Number two, there’s actually not a lot of incentive. And here’s why. Because say what you will about OpenAI and Sam Altman and Midjourney and Gemini, they’re not trying to defeat me. They’re not actually my adversary. This isn’t like malware and ransomware and spamware — they really are adversarial. They’re just trying to make pretty pictures. And if the physics are wrong and the geometry’s wrong, they don’t actually care. As long as your brain doesn’t care, they don’t care. So there’s not a lot of incentive. And even if there was, it’s incredibly difficult because they have to do full-blown 3D rendering.
Now, is it possible that we’ll get there? Sure. And that’s why we have holdbacks and that’s why we do a lot of things that we don’t talk about. But then you’ve got to get everything right. You’ve got to get every single pixel and the statistics and the geom … I mean, now it’s getting hard, but is it possible? Sure.
I did a panel the other day where somebody was telling me, “I don’t understand why you work on these problems. You’re going to get defeated.” And I said, “Well, let me ask you this. Did you lock your front door when you left the house this morning?” And he said, “Yes.” And I’m like, “Well, then shut the hell up about it.” Because people can pick locks, you can batter doors. We do things that give reasonable precautions. Yeah? OK.
Audience 1: I have one follow-up.
Hany Farid: Please.
Audience 1: So do you imagine a world in which I can put on a pair of glasses that will immediately tell me whether what I’m looking at is real or fake?
Hany Farid: No. And here’s why. Well, not in my lifetime because we can’t operate at that scale.
If you think about the billions and billions and billions of uploads, forget about the glasses; even for a plugin on your browser, the scale at which that has to operate, the computational demands and the accuracy that you would demand are so unbelievably high. So I don’t see this coming down to the consumer level in the near future, which is why we work with AP and Reuters and Agence France-Presse and all the major news outlets, because the way we should get information is not to become investigative reporters ourselves. We are not capable of doing that. I know a lot of really, really smart investigative reporters; we can’t do it. So we rely on people who do that job really well, who talk to me. I think that’s the mechanism for getting the truth. I’ll let you pick who. There’s one here and one here.
Audience 2: Thank you very much. That was deeply interesting.
Hany Farid: I thought you were going to say disturbing, but OK.
Audience 2: Both. All of the above. Kind of bouncing off of that question and off of your answer, this ultimately sounds like a similar problem to, drawing a parallel, what’s going on in Iran with attritable drones versus high-value systems. It seems like a very similar problem, where the creation of deepfakes is easy, cheap, fast, reliable, and the detection of deepfakes is costly, expensive and difficult. So yeah, we can make sure that videos of Obama and Trump and Buttigieg are real, but if someone uploads a video of me saying something on Twitter, no one’s going to care …
Hany Farid: Yeah, you’re screwed.
Audience 2: … except for the people who know me, right?
Hany Farid: Yeah.
Audience 2: So is there a solution, do you think, in the near future to this attritable warfare, or are we just screwed?
Hany Farid: We’re pretty screwed, but let me try to give you a little bit more hope. So first of all, you’re 100% right, mis- and disinformation are cheap and reliable information is expensive, and that’s the reality of our world, right? And that’s why social media is littered with false information, plus you can make a lot of money by lying online, and that’s what people are doing.
So now let’s get to your question. Everything I’ve talked to you about, we call passive forensics, or reactive. We wait. So I wake up every morning, I’ve got a flood of email from every major news outlet, and we’re just like, “All right, what the hell’s going on? Is this a real drone? Is this an attack?” But we’re all responding after the fact, right? And to the question earlier: Can that work at scale? And the answer is: Not really, right? We’ve got to pick our battles.
So there’s a whole other approach that’s called active forensics. And the way this works is there’s an effort called the C2PA, the Coalition for Content Provenance and Authenticity. Now, this isn’t widely deployed today; there are a couple of cameras that do this. But the idea is that when you pick up this phone to record police misconduct, a human rights violation, a drone attack, whatever, this device will authenticate for you. So on chip, cryptographically signed, it will say, “OK, it is March 13 at 1:45 p.m. in Berkeley, California.” Maybe even my identity, if I want to give that up, and here’s what has been recorded. I’m going to cryptographically sign all of that, maybe put it on a blockchain, a decentralized ledger, so nobody can manipulate it. And then when I share this with the world or with a news outlet or with law enforcement, they can authenticate it.
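The hash-sign-verify loop described here can be sketched as follows. This is a toy illustration of the C2PA idea, not the actual standard: real Content Credentials use X.509 certificates and public-key signatures rooted in camera hardware, while the HMAC below is only a stand-in so the example runs with Python’s standard library. The key and metadata are hypothetical.

```python
import hashlib
import hmac
import json

def sign_capture(media_bytes, device_key, metadata):
    """Build and sign a provenance manifest at capture time.

    Hash the captured media, bundle the hash with capture metadata
    (time, place, optionally identity), and sign the bundle with a
    key held by the device.
    """
    manifest = dict(metadata,
                    media_sha256=hashlib.sha256(media_bytes).hexdigest())
    payload = json.dumps(manifest, sort_keys=True).encode()
    signature = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return manifest, signature

def verify_capture(media_bytes, device_key, manifest, signature):
    """Re-hash the media and re-check the signature.

    Any edit to the pixels or to the metadata breaks verification.
    """
    if hashlib.sha256(media_bytes).hexdigest() != manifest["media_sha256"]:
        return False
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

key = b"hypothetical-key-burned-into-the-camera"
video = b"...raw sensor data..."
meta = {"time": "2026-03-13T13:45-08:00", "place": "Berkeley, CA"}

manifest, sig = sign_capture(video, key, meta)
print(verify_capture(video, key, manifest, sig))            # authentic
print(verify_capture(video + b"edit", key, manifest, sig))  # tampered
```

This is also why the scheme scales where passive detection does not: verification is a cheap hash-and-compare rather than a forensic analysis of the content itself.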
That will work at scale. And if it really does get deployed to the tune of billions of devices, then when you don’t have that signature, you’re like, “All right, I don’t trust this.” So I think that could work, but it requires a phenomenal infrastructure.
Now the good news is there’s some effort. So Leica has a camera, Sony has a camera, all high-end cameras, not these things that are in our pocket. So if Apple and Samsung overnight decided to do it, anybody recording anything, we’d be able to authenticate very, very quickly. Not a lot of financial incentives for them to do it, by the way. So that’s sort of where the rub is.
But I think that there’s at least a technology that I could envision that could work to your question at scale, but the standard’s been around for about five, six years now and the penetration is very, very low. But I think as we get more desperate and more in need of it, I think we’ll start to see an uptick. Yep. OK.
Chrystal Chern: Next question.
Audience 3: So in light of the development of indistinguishable AI videos, what is one thing, one big change you think is going to happen that you think people don’t expect?
Hany Farid: Yeah. I think the real time … I mean, I think most people know when they go online to be a little cautious, even around video. I think people are not expecting you’re going to get a FaceTime call from what looks like your parents and it’s going to be a scam. I think people aren’t ready for that.
So I think that we have to … This is like a whole new level of phishing scams. It’s not going to be a text message saying, “This is your boss, your mom, your dad,” whatever. It’s going to be a FaceTime call and it’s going to sound like and look like your loved one and you’re not going to know. And I think people aren’t ready for that. By the way, I have a really simple analog solution to that. My wife and I have code words. Somebody calls and something is out of the ordinary, what’s the code word?
And that’s not hypothetical. This happened to me. I was working a really sensitive legal case with a lawyer last year. He got a phone call from my phone number, in my voice, talking about the case, and about three minutes into the call, he got suspicious, hung up, called me back, and it wasn’t me. So this is happening. And I think people are not ready for that real-time call at 2 a.m.: “Mom, Dad has been in an accident, you’ve got to send money.” This is happening now. People aren’t ready for that.
Audience 3: In terms of, if I may ask another one, in terms of like structural things, and this is quite radical, for example, non-mainstream media goes bankrupt because people can only trust the biggest one. You expect one change, like structurally?
Hany Farid: Yeah. First of all, I think journalism is in trouble again. You saw what happened at the Washington Post two weeks ago: 300 unbelievably talented reporters got eviscerated. I have mixed feelings about this. On the one hand, I grew up at a time when there were three news channels: ABC, CBS and NBC. And then the internet came, and CNN and 24/7 news, and the idea was that more information is better.
Turns out that’s actually not quite true. I think we were probably a better-informed public when we had three newscasts. I’m not saying we should necessarily go back to that, but there’s a lot of noise out there. And I’m not sure that more is better in this case. I would rather have less but higher quality, but I am worried about consolidation. You’re seeing tech oligarchs, multi-billionaires verging on trillionaires, swooping in and just buying everything up. That’s not going to be good for us.
You’re too young to know this, but back in the day, news was not meant to be profitable. It was the price that the companies paid for the airwaves. I actually think the death of journalism, and maybe that’s a strong word, but what has hurt journalism was CNN, when they said, “We can monetize news 24/7.” I don’t think news should be monetized. It’s here for the public good, and I think this monetization of news and information is dangerous, but I’m also not naive and I realize this is the way the world works.
Audience 4: Yeah. You were talking about social media and being very concerned about anything you see there or digest from there. What about regular media? How accurate is stuff that we’re seeing on networks and …
Hany Farid: Good. I think that’s the right question to ask. So here’s what I can tell you. Everybody makes mistakes. Mainstream media makes mistakes. New York Times, the Post, everybody makes mistakes, but here’s the difference.
First of all, they’re trying to get it right, and you can’t say that about Elon Musk and social media. People aren’t necessarily trying and there’s no consequence for getting it wrong every single day. So I have much, much more confidence in what I read in the large networks than I do what I see on social media.
Because first of all, they have standards, they have ethics, they have consequences, they have unbelievably smart people who work incredibly hard every day to bring you reliable information. Do they get it right 100% of the time? Of course not. So what do you do? You don’t just pick one newspaper. You read three of them and you wait.
You don’t need to get your news in the first 30 seconds of something happening, for God’s sake. This isn’t a race. We’re not sprinting. Take your time. And by the way, you’ve got to read past the first paragraph. This is for the young people in the audience. TikTok is not going to give you news about what’s happening in Gaza or in Iran. You’ve got to dig in. This is really complicated and it’s hard. And you’ve got to reserve judgment. You’ve got to set your biases aside.
But do I think they do better than social media? A hundred times better. A hundred times better. I would much rather get my information from BBC, NPR, New York Times, Washington Post, Wall Street Journal than anywhere else. I feel like I’m definitely better informed about what is going on in the world. Down here? She’s a journalist, by the way, just so you know.
Audience 5: I am. I am. Hi, Hany.
Hany Farid: Hi.
Audience 5: Thank you for being here. And thank you for this talk. I have a question about slopaganda being posted by our current administration.
Hany Farid: Yeah.
Audience 5: So the Trump administration posts these AI-generated videos, sometimes AI-altered photos of protesters. What impact do these AI-generated videos and slopaganda, this institutional shitposting, have on the consciousness of the American people?
Hany Farid: I hadn’t heard slopaganda. I like that. The shitposting’s pretty good, too. OK. Couple of things. One is, with misinformation, you don’t have to get people to believe your lies. That’s actually not what Russian state-sponsored propaganda does, necessarily. They just create noise. They create chaos. They create uncertainty. And that’s actually relatively easy to do. I don’t have to believe them. Just create noise, right? And then the signal gets buried. It’s a really effective strategy, in fact. Number one.
Number two: What you are referring to is that this White House and not just the Oval Office, but many of its agencies, is routinely posting fake images, fake videos. Some of it is clearly propaganda. Some of it is less obviously fake, which is disturbing, particularly coming out of Minneapolis. I think the problem with that … There’s a couple of problems with that. First of all, I think it just demeans the office of the White House and it’s beneath us. And I think that’s a dangerous precedent to set, number one.
Number two, I think the White House is not thinking through this very carefully, because there’s going to come a day where they’re going to post a video and they’re going to want us to believe it. And why should I trust them if half the things they post are fake and half of them are real? They’re eroding our trust in this government, which they may not care about today, but there’s going to come a day where they care about this. And I think that’s incredibly dangerous.
So I am bothered by the White House and the various agencies posting this, and they are unapologetic about it, by the way. They see no problem with this. They think it’s funny. They think it’s memes, but I think it’s an unbelievably dangerous precedent. And so I think the thing that you have been seeing is not just an erosion in the visual record, but it’s also an erosion in the trusted institutions.
What this administration has done is demonize people like you: really smart reporters, academics, scientists and institutions. And that’s an incredibly dangerous combination, because we, the journalists and the scientists and the academics, are the people who are trying to figure this out. We are the ones with the skills. When you erode trust in us, you can simply dismiss what we say. And now, what are we left with? We’re street gangs at this point, right? We’re clans. There’s no more shared reality. There’s no more trust. There’s no more truth. And I think that’s incredibly dangerous for a stable democracy. He’s been really patient over here raising his hand.
Audience 6: It’s OK. All right. Thank you. It’s very interesting.
Hany Farid: It’s one word for it.
Audience 6: But all the prevention that you mentioned is more on the consumer side. And what I learned is that prevention on the consumer side is two or three times harder than on the supply side. What type of, if you can regulate the supply side, what type of …
Hany Farid: Good. First of all, it’s not two or three times. It’s a thousand times harder. And the reason why it’s a thousand times harder: there are 8 billion people in the world, and there are, what, a couple of dozen AI companies, right? So I don’t disagree with you. I’m all for going up the chain to where the bottleneck is.
So when it comes to … Let me give you an example of that. When it comes to combating child sexual abuse material and non-consensual intimate imagery and that type of individual abuse, I don’t want to go after the individuals. You can’t arrest your way out of this. I mean, I would if I could, but you can’t. And you can’t even go after the platforms, right? You can’t go after X and Facebook and even Apple. These are too big to fail. You know where you go? To four companies: Visa, Mastercard, American Express and PayPal. These things are getting monetized. There are four financial institutions, and if I cut off your ability to monetize, it’s over, right?
And then the other one is ads. There are two ways you monetize on the internet: credit cards and ads. So ads, easy. It’s Google. Google controls like 80% of the market. Get Google to stop putting ads on these companies and get four financial companies to cut them off. And I’ll tell you why I know this works: because it has. Pornhub, a couple of years ago, one of the biggest adult content hosts, was hosting child sexual abuse material. They were hosting non-consensual intimate imagery. Nicholas Kristof discovered this, magically somehow, I don’t know where he’d been, but OK, and wrote an op-ed about it. And he called it out. He called out Visa, Mastercard, American Express. Overnight, they terminated their relationship with Pornhub.
Pornhub had to be sold, and that company had to completely change the way it does business. So, a light switch. You go upstream, you find the bottleneck. You want to go even higher up? Cloudflare, Google, Amazon: infrastructure, internet service providers. There are choke points. You can’t exist on the internet without the help of a huge amount of infrastructure. I can cut you off.
Now there’s something dangerous here, right, if we start deciding who gets to use these things. But this is illegal material. I don’t think this is particularly controversial. So I’m with you. Go upstream and just choke off the money. Everything falls from there. That’s the strategy. Oh, sorry. He’s going up. Sorry.
Audience 7: Thank you. I wanted to ask about your strategy of analyzing personal mannerisms as a way of determining what is real and what is fake. A concern that I would think of for that would be, it doesn’t seem so hard to train a model just to pick up on those mannerisms themselves. And so what separates mannerisms from any other statistical pattern which you would be able to pick up?
Hany Farid: It’s a great question. Here’s the difference. Time. Time is what’s to our advantage. When we measure mannerisms, we measure them over 10-second windows. That’s 300 frames at a normal 30-hertz frame rate. These synthesis engines work one to two frames at a time. Time is my superpower. You can’t synthesize by waiting 10 seconds. Why? Because you have to deliver it in real time. I have a huge advantage when I’m looking at a Zoom call. My adversary has to synthesize every 30th of a second. I can wait 10 seconds and then look to see what you did. I have a huge advantage. And even if it’s offline, doing synthesis where you are holding 300 frames in memory and … It’s impossible. We just can’t do it today computationally. So that’s my advantage with the mannerisms: time. Are we getting close?
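The windowing argument can be made concrete: mannerism statistics are computed over 10-second spans (300 frames at 30 fps), which a frame-at-a-time synthesis engine has no direct handle on. A minimal sketch, with an illustrative sine wave standing in for a real extracted signal:

```python
import numpy as np

def windowed_features(signal, fps=30, window_s=10):
    """Aggregate a per-frame mannerism signal over long windows.

    A 10-second window at 30 fps is 300 frames; statistics computed
    over that span (here just mean and standard deviation) describe
    behavior that a synthesis engine producing one or two frames at
    a time cannot easily target.
    """
    n = fps * window_s  # 300 frames per window
    windows = [signal[i:i + n]
               for i in range(0, len(signal) - n + 1, n)]
    return np.array([[w.mean(), w.std()] for w in windows])

# 60 seconds of a toy per-frame signal -> six 10-second windows.
feats = windowed_features(np.sin(np.linspace(0, 60, 1800)))
print(feats.shape)  # (6, 2): six windows, two statistics each
```

In a real detector, these window-level statistics (and cross-signal correlations over the same windows) would be the inputs to the one-class model, rather than any single frame.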
Audience 8: Thank you, Dr. Farid. I was particularly interested when you said that we seem to have lost truth and reality in the age of AI. It kind of relates to the philosopher Jean Baudrillard in his book, Simulacra and Simulation. He argued that in the 20th century, signs had already replaced reality. For example, even when we read BBC News, we don’t see reality. We see the signs, words, representations of reality. And so I think truth has long been lost.
Like when we read BBC News, we don’t see facts. We see facts rearranged by words and signs. And if one can manipulate storytelling and narrative, then one can manipulate what fact is. And so the same event, for example, the strike on Gaza, can be reported with very different narratives and have very different effects. And so I think AI just makes the situation even worse, but it’s not the first time we’re experiencing the loss of truth and reality.
Hany Farid: Good. OK. Two things. First of all, everything I’ve described is part of a continuum. It didn’t come out of nowhere. When the printing press was invented, we were pushing propaganda and lies. When radio came along, we did that, and when TV came along, too. This is part of a continuum.
But what is different, what is radically different, is that if you wanted to push propaganda and lies before the internet, you needed to have a radio station, a printing press, a publishing company, or a TV station. It was not in the hands of the average person. And we have taken a mechanism that was in the hands of state-sponsored actors and bad actors and given it to 8 billion people in the world. The threat vector has changed radically. A 15-year-old in Macedonia can interfere with our election in the United States of 350 million people, and that wasn’t true 10 years ago or 20 years ago. And so that is a radically different threat vector: who can do this kind of damage. Number one.
Number two is, I fundamentally disagree that there’s no truth and no facts. I agree that there are times where things are confusing and we don’t fully understand it, but there are also days when we know exactly what happened but people don’t want to believe it.
One hundred seventy-five people died in a school in Tehran. That is a fact, right? A Tomahawk missile blew that school up. That is a fact, right? We can dispute why it happened and how it happened and who’s responsible; that’s different. But there are facts, and then there are interpretations of those facts, and consequences, and how we deal with them, and what the root causes are. But I simply reject this notion that there is no truth or fact. I think it is hard, and I think it takes hard work and it takes effort, but we can come to an understanding of what is happening in the world in most situations, I would argue.
Chrystal Chern: Amazing. I think that’s the last question we have time for.
Hany Farid: OK.
Chrystal Chern: Thank you so much.
Hany Farid: Thank you. Take care.
(Applause)
(Music: “No One Is Perfect” by HoliznaCC0)
Anne Brice (outro): You’ve been listening to Berkeley Talks, a UC Berkeley News podcast from Strategic Communications at Berkeley. Follow us wherever you listen to your podcasts. You can find all of our podcast episodes, with transcripts and photos, on UC Berkeley News at news.berkeley.edu/podcasts.
(Music fades out)