Perplexity’s grand theft AI

Vector collage of the Perplexity logo.
What, exactly, is Perplexity’s innovation? | Image: The Verge

In every hype cycle, certain patterns of deceit emerge. In the last crypto boom, it was “ponzinomics” and “rug pulls.” In self-driving cars, it was “just five years away!” In AI, it’s seeing just how much unethical shit you can get away with.

Perplexity, which is in ongoing talks to raise hundreds of millions of dollars, is trying to create a Google Search competitor. Perplexity isn’t trying to create a “search engine,” though — it wants to create an “answer engine.”

The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you. “Factfulness and accuracy is what we care about,” Perplexity CEO Aravind Srinivas told The Verge.

That means that Perplexity is basically a rent-seeking middleman on high-quality sources. The value proposition on search, originally, was that by scraping the work done by journalists and others, Google’s results sent traffic to those sources. But by providing an answer, rather than pointing people to click through to a primary source, these so-called “answer engines” starve the primary source of ad revenue and keep that revenue for themselves. Perplexity is among a group of vampires that includes Arc Search and Google itself.

But Perplexity has taken it a step further with its Pages product, which creates a summary “report” based on those primary sources. It’s not just quoting a sentence or two to directly answer a user’s question — it’s creating an entire aggregated article, and it’s accurate in the sense that it is actively plagiarizing the sources it uses.

Forbes discovered Perplexity was dodging the publication’s paywall in order to provide a summary of an investigation the publication did of former Google CEO Eric Schmidt’s drone company. Though Forbes has a metered paywall on some of its work, the premium work — like that investigation — is behind a hard paywall. Not only did Perplexity somehow dodge the paywall but it barely cited the original investigation and ganked the original art to use for its report. (For those keeping track at home, the art thing is copyright infringement.)

Aggregation is not a particularly new phenomenon — but the scale at which Perplexity can aggregate, along with the copyright violation of using the original art, is pretty, hmm, remarkable. In an attempt to calm everyone down, the company’s chief business officer went to Semafor to say Perplexity was developing revenue sharing plans with publications, and aw gee whiz, how come everyone was being so mean to a product still in development?

At this point, Wired jumped in, confirming a finding from Robb Knight: Perplexity’s scraping of Forbes’ work wasn’t an exception. In fact, Perplexity has been ignoring the robots.txt code that explicitly asks web crawlers not to scrape the page. Srinivas responded in Fast Company that actually, Perplexity wasn’t ignoring robots.txt; it was just using third-party scrapers that ignored it. Srinivas declined to name the third-party scraper and didn’t commit to asking that crawler to stop violating robots.txt.
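
For readers who have never looked at one, here is a minimal sketch of what honoring robots.txt involves, using Python's standard-library parser. The file contents, the `ExampleAnswerBot` user agent, and the `may_fetch` helper are all hypothetical and chosen for illustration; this is not Perplexity's, Wired's, or Forbes' actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# and every other crawler is only asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAnswerBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler runs a check like this before every request.
    print(may_fetch("ExampleAnswerBot", "https://example.com/story"))
    print(may_fetch("OtherBot", "https://example.com/story"))
```

Nothing in the file enforces anything. The check only matters if the crawler chooses to run it and to respect the result, which is why the arrangement depends on good faith.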

“Someone else did it” is a fine argument for a five-year-old. And consider the response further. If Srinivas wanted to be ethical, he had some options here. Option one is to terminate the contract with the third-party scraper. Option two is to try to convince the scraper to honor robots.txt. Srinivas didn’t commit to either, and it seems to me there’s a clear reason why. Even if Perplexity itself isn’t violating the code, it is reliant on someone else violating the code for its “answer engine” to work.

To add insult to injury, Perplexity plagiarized Wired’s article about it, even though Wired explicitly blocks Perplexity in its robots.txt file. The bulk of Wired’s article about the plagiarism is about legal remedies, but I’m interested in what’s going on here with robots.txt. It’s a good-faith agreement that has held up for decades now, and it’s falling apart thanks to unscrupulous AI companies (that’s right, Perplexity isn’t the only one) hoovering up just about anything that’s available in order to train their bullshit models. And remember how Srinivas said he was committed to “factfulness”? I’m not sure that’s true, either: Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.

We’ve seen a lot of AI giants engage in questionably legal and arguably unethical practices in order to get the data they want. In order to prove the value of Perplexity to investors, Srinivas built a tool to scrape Twitter by pretending to be an academic researcher using API access for research. “I would call my [fake academic] projects just like Brin Rank and all these kinds of things,” Srinivas told Lex Fridman on the latter’s podcast. I assume “Brin Rank” is a reference to Google co-founder Sergey Brin; to my ear, Srinivas was bragging about how charming and clever his lie was.

I’m not the one who’s telling you the foundation of Perplexity is lying to dodge established principles that hold up the web. Its CEO is. That’s clarifying about the actual value proposition of “answer engines.” Perplexity cannot generate actual information on its own and relies instead on third parties whose policies it abuses. The “answer engine” was developed by people who feel free to lie whenever it is more convenient, and that preference is necessary for how Perplexity works.

So that’s Perplexity’s real innovation here: shattering the foundations of trust that built the internet. The question is whether any of its users or investors care.