Is the clip stupid or terrifying? I can't decide. To be honest, it's a bit of both.
"I just think I would love to get Ratatouilled," a familiar-sounding voice begins.
"Ratatouilled?" asks another recognizable voice.
"Like, have a little guy up there," the first voice replies. "You know, making me cook delicious meals."
It sounds like Joe Rogan and Ben Shapiro, two of podcasting's biggest, most recognizable voices, bantering over the potential real-world execution of the Pixar movie's premise. A circular argument ensues. What constitutes getting "Ratatouilled" in the first place? Do the rat's powers extend beyond the kitchen?
A friend recently sent me the audio of this mind-numbing exchange. I let out a belly laugh, then promptly texted it to several other people, including a guy who once sheepishly told me that he regularly listens to The Joe Rogan Experience.
"Is this real?" he texted back.
"They're AI voices," I told him.
"Whoa. That's insane," he said. "Politics is going to get wild."
I haven't stopped thinking about how right he is. The voices in that clip, while not perfect replicants of their subjects, are deeply convincing in an uncanny-valley sort of way. "Rogan" has real-world Joe Rogan's familiar inflection, his half-stoned curiosity. "Shapiro," for his part, is there with rapid-fire responses and his trademark scoff.
Last week, I reached out to Zach Silberberg, who created the clip using an online tool from the Silicon Valley start-up ElevenLabs. "Eleven brings the most compelling, rich and lifelike voices to creators and publishers seeking the ultimate tools for storytelling," the firm's website boasts. The word "storytelling" is doing a lot of work in that sentence. When does storytelling cross over into disinformation or propaganda?
I asked Silberberg if we could sit down in person to talk about the implications of his viral joke. Though he didn't engineer the product, he seemed to have already mastered it in a way few others had. Would bad actors soon follow his lead? Did he care? Was it his responsibility to care?
Silberberg is in his late 20s and works in television in New York City. On the morning of our meeting, he shuffled into a TriBeCa coffee shop in a tattered sweater with an upside-down Bart Simpson stitched on the front. He told me how he had been busy making other, in his words, "stupid clips." In one, an AI version of President Joe Biden informs his fellow Americans that, after watching the 2011 Cameron Crowe flop We Bought a Zoo, he, Biden, also bought a zoo. In another, AI Biden says he has yet to visit the site of the East Palestine, Ohio, train derailment because he got lost on the island from Lost. While neither piece of audio features Biden stuttering or word-switching, as he often does when speaking in public, both clips have the distinct Biden cadence, those familiar rises and falls. The scripts, too, have an unmistakable Biden folksiness to them.
"The reason I think these are funny is because you know they're fake," Silberberg told me. He said the Rogan-Shapiro conversation took him roughly an hour and a half to produce; it was meant to be a joke, not some well-crafted attempt at tricking people. When I informed him that my Rogan-listening friend initially thought the Ratatouille clip was authentic, Silberberg freaked out: "No! God, no!" he said with a cringe. "That, to me, is fucked up." He shook his head. "I'm trying to not fall into that, because I'm making it so outlandish," he said. "I don't ever want to create a thing that could be mistaken for real." Like so much involving AI these past few months, it seemed to already be too late.
What if, instead of a sitting president talking about how he regrets buying a zoo, a voice that sounded enough like Biden's was caught on tape saying something much more nefarious? Any number of Big Lie talking points would instantly drive a news cycle. Imagine a convincing AI voice talking about ballot harvesting, or hacked voting machines; voters who are conspiracy-minded would be validated, while others might simply be confused. And what if the accused public figure (Biden, or anyone, for that matter) couldn't immediately prove that a viral, potentially career-ending clip was fake?
One of the major political scandals of the past quarter century involved a sketchy recording of a disembodied voice. "When you're a star, they let you do it," future President Donald Trump proclaimed. (You know the rest.) That clip was real. Trump, being Trump, survived the scandal and went on to the White House.
But, given the arsenal of public-facing AI tools seizing the internet (including the voice generator that Silberberg and other shitposters have been playing around with), how easy would it be for a bad actor to create a piece of Access Hollywood-style audio in the run-up to the next election? And what if said clip were created with a TV writer's touch? Five years ago, Jordan Peele went viral with an AI video of then-President Barack Obama saying "Killmonger was right," "Ben Carson is in the sunken place," and "President Trump is a total and complete dipshit." The voice was close, but not that close. And because it was a video, the strange mouth movements were a dead giveaway that the clip was fake. AI audio clips are potentially much more menacing because the audience has fewer context clues to work with. "It doesn't take a lot, which is the scary thing," Silberberg said.
He discovered that the AI seems to produce more convincing work when processing just a few words of dialogue at a time. The Rogan-Shapiro clip was successful because of its "Who's on first?" back-and-forth. He downloaded existing audio samples from each podcast host's massive online archive (three from Shapiro, two from Rogan), uploaded them to ElevenLabs' website, then input his own script. This is the point where most amateurs will likely fail in their trolling. For a clip to land, even a clear piece of satire, the subject's diction has to be both believable and familiar. You need to nail the Biden-isms. The shorter the sentences, the less time the listener has to question the validity of the voice. Plus, Silberberg learned, the more you type, the more likely the AI voices will string phrases together with flawed punctuation or other awkward vocal flourishes. Sticking to quick snippets makes it easier to retry certain lines of the script to perfect the specific inflection, rather than having to trudge through a whole paragraph of dialogue. But this is just where we are today, 21 months before the next federal elections. It's going to get better, and scarier, very fast.
If it seems like AI is everywhere all at once right now, swallowing both our attention and the internet, that's because it is. While I was transcribing my interview with Silberberg in a Google Doc, Google's own AI began suggesting upcoming words in our conversation as I typed. Many of the fill-ins were close, but not entirely accurate; I ignored them. On Monday, Mark Zuckerberg said he was creating a new "top-level product group" at Meta focused on generative AI to "turbocharge our work in this area." This news came just weeks after Kevin Roose, of The New York Times, published a widely read story about how he had provoked Microsoft's Bing AI tool into making a range of unsettling, emotionally charged statements. A couple of weeks before that, the DJ David Guetta revealed that he had used an AI version of Eminem's voice in a live performance, rapping lyrics that the real-life Eminem had never rapped. Elsewhere last month, the editor of the science-fiction magazine Clarkesworld said he had stopped accepting submissions because too many of them appeared to be AI-generated texts.
This past Sunday, Sam Altman, the CEO of OpenAI, the company behind the ChatGPT AI tool, cryptically tweeted, "A new version of Moore's Law that could start soon: the amount of intelligence in the universe doubles every 18 months." Altman is 37 years old, meaning he's of the generation that remembers living some daily life without a computer. Silberberg's generation, the one after Altman's, does not, and that cohort is already embracing AI faster than the rest of us.
Like a lot of people, I first encountered a naturalistic AI voice when watching the otherwise excellent 2021 Anthony Bourdain documentary, Roadrunner. News of the filmmakers' curious decision to include a brief, fake voice-over from the late Bourdain dominated the media coverage of the movie and, for some viewers, made the film distracting to watch. (You may have found yourself always listening for "the moment.") They had so much material to work with, including hours of actual Bourdain narration. What did faking a brief moment really accomplish? And why didn't they disclose it to viewers?
"My opinion is that, blanket statement, the use of AI technology is pretty bleak," Silberberg said. "The way that it is headed is scary. And it is already replacing artists, and is already creating really fucked-up, gross scenarios."
A brief survey of those scenarios that have already come into existence: an AI version of Emma Watson reading Mein Kampf, an AI Bill Gates revealing that the coronavirus vaccine causes AIDS, an AI Biden attacking transgender individuals. Reporters at The Verge created their own AI Biden to announce the invasion of Russia and validate one of the most toxic conspiracy theories of our time.
The problem, essentially, is that far too many people find the cruel, nihilistic examples just as funny as Silberberg's absurd, low-stakes mastery of the form. He told me that as the Ratatouille clip began to go viral, he muted his own tweet, so he still doesn't know just how far and wide it has gone. A bot notified him that Twitter's owner, Elon Musk, liked the video. Shapiro, for his part, posted "LMFAO" and a laughing-crying emoji over another Twitter account's carbon copy of Silberberg's clip. As Silberberg and I talked about the implications of his work that morning, he seemed to grow more and more concerned.
"I'm already in weird ethical waters, because I'm using people's voices without their consent. But they're public figures, political figures, or public commentators," he said. "These are questions that I'm grappling with. These are things that I haven't fully thought through all the way to the end, where I'm like, 'Oh yeah, maybe I should not even have done this. Maybe I shouldn't have even touched these tools, because it's reinforcing the idea that they're useful.' Or maybe someone saw the Ratatouille video and was like, 'Oh, I can do this? Let me do this.' And I've exposed a bunch of right-wing Rogan fans to the idea that they can deepfake a public figure. And that to me is scary. That's not my goal. My goal is to make people chuckle. My goal is to make people have a little giggle."
Neither the White House nor ElevenLabs responded to my request for comment on the potential effects of these videos on American politics. Several weeks ago, after the first round of trolls used Eleven's technology for what the company described as "malicious purposes," Eleven responded with a lengthy tweet thread outlining steps it was taking to curb abuse. Although most of it was boilerplate, one notable change was restricting the creation of new voice clones to paid users only, on the theory that a person supplying a credit-card number is less likely to troll.
Near the end of our conversation, Silberberg took a stab at optimism. "As these tools progress, countermeasures will also progress to be able to detect these tools. ChatGPT started gaining popularity, and within days someone had written a thing that could detect whether something was ChatGPT," he said. But then he thought more about the future: "I think as soon as you're trying to trick someone, you're trying to take someone's job, you're trying to reinforce a political agenda. You know, you can satirize something, but the instant you're trying to convince someone it's real, it chills me. It shakes me to my very core."
On its website, Eleven still proudly advertises its "uncanny quality," bragging that its model "is built to grasp the logic and emotions behind words." Soon, the unsettling uncanny-valley element may be replaced by something indistinguishable from human intonation. And then even the funny stuff, like Silberberg's work, may stop making us laugh.