It has been a month since Google’s spectacular goof. Its new AI Overviews feature was supposed to “take the legwork out of searching,” offering up easy-to-read answers to our queries based on multiple search results. Instead, it told people to eat rocks and to glue cheese on pizza. You could ask Google what country in Africa starts with the letter “K”, and Google would say none of them. In fact, you can still get these wrong answers because AI search is a disaster.
This spring looked like a turning point for AI search, thanks to a couple of big announcements from major players in the space. One was that Google AI Overview update, and the other came from Perplexity, an AI search startup that’s already been labeled as a worthy alternative to Google. At the end of May, Perplexity launched a new feature called Pages that can create custom web pages full of information on one specific topic, like a smart friend who does your homework for you. Then Perplexity got caught plagiarizing. For AI search to work well, it seems, it has to cheat a little.
There’s a lot of ill will over AI search’s mistakes and missteps, and critics are mobilizing en masse. A group of online publishers and creators took to Capitol Hill on Wednesday to lobby lawmakers to look into Google’s AI Overviews feature and other AI tech that pulls content from independent creators. That came just a couple of days after the Recording Industry Association of America (RIAA) and a group of major record labels sued two AI companies that generate music from text for copyright infringement. And let’s not forget that several newspapers, including the New York Times, have sued OpenAI and Microsoft for copyright infringement for scraping their content in order to train the same AI models that power their search tools. (Vox Media, the company that owns this publication, meanwhile, has a licensing deal with OpenAI that allows our content to be used to train its models and by ChatGPT. Our journalism and editorial decisions remain independent.)
Generative AI technology is supposed to transform the way we search the web. At least, that’s the line we’ve been fed since ChatGPT exploded on the scene near the end of 2022, and now every tech giant is pushing its own brand of AI technology: Microsoft has Copilot, Google has Gemini, Apple has Apple Intelligence, and so forth. While these tools can do more than help you find things online, dethroning Google Search still seems to be the holy grail of AI. Even OpenAI, maker of ChatGPT, is reportedly building a search engine to compete directly with Google.
But despite many companies’ very public efforts, AI search won’t make finding answers online effortless any time soon, according to experts I spoke to.
It’s not just that AI search isn’t ready for primetime because of a few flaws; it’s that those flaws are so deeply woven into how AI search works that it’s unclear whether the technology can ever get good enough to replace Google.
“It’s a good addition, and there are times when it’s really great,” Chirag Shah, a professor of information science at the University of Washington, told me. “But I think we’re still going to need the traditional search around.”
Rather than going into all of AI search’s flaws here, let me highlight the two that were on display in the recent Google and Perplexity kerfuffles. The Google pizza glue incident shows just how stubborn generative AI’s hallucination problem is. Just a few days after Google launched AI Overviews, some users noticed that if you asked Google how to keep cheese from sliding off pizza, Google would suggest adding some glue. This particular answer appeared to come from an old Reddit thread that, for some reason, Google’s AI treated as an authoritative source, even though a human would quickly realize the Redditors were joking. Weeks later, The Verge’s Elizabeth Lopatto reported that Google’s AI Overviews feature was still recommending pizza glue. Google rolled back AI Overviews in May following the viral failures, so it’s now difficult to get one to appear at all.
The problem isn’t just that the large language models that power generative AI tools can hallucinate, or make up information in certain situations. They also can’t tell good information from bad — at least not right now.
“I don’t think we’ll ever be at a stage where we can guarantee that hallucinations won’t exist,” said Yoon Kim, an assistant professor at MIT who researches large language models. “But I think there’s been a lot of advancements in reducing these hallucinations, and I think we’ll get to a point where they’ll become good enough to use.”
The recent Perplexity drama highlights a different problem with AI search: It accesses and republishes content that it’s not supposed to. Perplexity, whose investors include Jeff Bezos and Nvidia, made a name for itself by providing deeper answers to search queries and showing its sources. You can give it a question and it will come back with a conversational answer, complete with citations from around the web, which you can refine by asking more questions.
When Perplexity launched its Pages feature, however, it became clear that its AI had an uncanny ability to rip off journalism. Perplexity even makes the Pages it generates look like a news section of its website. One such Page included summaries of some of Forbes’s exclusive, paywalled investigative reporting on Eric Schmidt’s drone project. Forbes accused Perplexity of stealing its content, and Wired later reported that Perplexity was scraping content from websites that have blocked the type of crawlers that do such scraping. The AI-powered search engine would even construct incorrect answers to queries based on details in URLs or metadata. (In an interview with Fast Company last week, Perplexity CEO Aravind Srinivas denied some of the findings of the Wired investigation and said, “I think there is a basic misunderstanding of the way this works.”)
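For context, the “blocking” Wired describes is usually done through a site’s robots.txt file, which well-behaved crawlers are expected to consult before fetching a page. The snippet below is a minimal sketch of that check using Python’s standard library; the bot name and URLs are placeholders, not any real crawler’s identity.

```python
# Minimal sketch of the robots.txt check a polite crawler performs before
# scraping a page. The bot name and URLs below are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's crawler rules

# A crawler that respects the protocol stops here if the answer is False.
allowed = rp.can_fetch("ExampleBot", "https://example.com/some-article")
print("allowed to fetch:", allowed)
```

The catch, as Wired’s reporting suggests, is that robots.txt is an honor system: nothing technically stops a scraper from ignoring it.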
The reasons why AI-powered search stinks at sourcing are both technical and simple, Shah explained. The technical explanation involves something called retrieval-augmented generation (RAG), which works a bit like a professor recruiting research assistants to go find out more about a specific topic when the professor’s personal library isn’t enough. RAG does address a couple of problems with how current large language models generate content, including how often they hallucinate, but it also creates a new one: It can’t distinguish good sources from bad. In its current state, AI lacks good judgment.
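To make that analogy concrete, here is a heavily simplified sketch of the RAG pattern in Python. Everything in it is illustrative rather than any product’s actual implementation: `retrieve` stands in for a real search index, and `llm` stands in for whatever text-generation model the search engine calls. The point is that retrieval ranks documents only by how relevant they look to the query; nothing in the loop judges whether a source is trustworthy.

```python
# Illustrative sketch of retrieval-augmented generation (RAG), not any
# particular product's implementation. `llm` stands in for a real model API.

def retrieve(query, corpus, k=3):
    # The "research assistants": rank documents by crude word overlap with
    # the query. Nothing here scores trustworthiness, so a joke Reddit
    # thread and a newspaper article are treated the same way.
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query, corpus, llm):
    # The "professor": stuff whatever came back into the prompt and let the
    # model write a summary, which it will do even if the sources are bad.
    sources = retrieve(query, corpus)
    prompt = (
        "Answer the question using only these sources:\n"
        + "\n---\n".join(sources)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)
```

In real systems the retriever is a full search engine and the prompt is far more elaborate, but the shape of the problem is the same: whatever gets retrieved gets summarized.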
When you or I do a Google search, we know that the long list of blue links will include high-quality links, like newspaper articles, and low-quality or unverified stuff, like old Reddit threads or SEO farm garbage. We can tell the good from the bad in a split second, thanks to years of experience honing our own Googling skills.
And then there’s some common sense that AI doesn’t have, like knowing whether or not it’s okay to eat rocks and glue.
“AI-powered search doesn’t have that ability just yet,” Shah said.
None of this is to say that you should turn and run the next time you see an AI Overview. But instead of thinking about it as an easy way to get an answer, you should think of it as a starting point. Kind of like Wikipedia. It’s hard to know how that answer ended up at the top of the Google search, so you might want to check the sources. After all, you’re smarter than the AI.