How Google taught AI to doubt itself

A graphic showing Bard’s logo with Gmail, Drive, Docs, and other apps. Image: Google

This is Platformer, a newsletter on the intersection of Silicon Valley and democracy from Casey Newton and Zoë Schiffer.

Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up.

From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive — making probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on.
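To make that concrete, here is a deliberately tiny sketch of the idea, with every word and probability invented for illustration: the model picks each next word in proportion to how likely it seems, and nothing in the loop ever consults a store of facts.

```python
import random

# Toy next-word table: invented numbers standing in for a real LLM's
# learned probabilities. Note there is no database of facts anywhere here.
next_word_probs = {
    ("Radiohead", "has"): {"won": 0.6, "released": 0.3, "recorded": 0.1},
    ("has", "won"): {"six": 0.4, "nine": 0.4, "numerous": 0.2},
}

def sample_next(prev_two):
    """Sample the next word in proportion to its estimated probability."""
    candidates = next_word_probs.get(prev_two)
    if candidates is None:
        return None  # the toy model has no continuation for this context
    words, weights = zip(*candidates.items())
    return random.choices(words, weights=weights)[0]

text = ["Radiohead", "has"]
while (word := sample_next(tuple(text[-2:]))) is not None:
    text.append(word)

# Fluent either way, and "nine" comes out just as readily as "six".
print(" ".join(text))
```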

As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT — not realizing that every single case had been fabricated out of whole cloth.

This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true — often defeating the purpose of using them at all.

A Bard answer with a pop-up reading “Double-check Bard’s response: This button helps you assess Bard’s responses by using Google Search to find content that’s likely similar or different. Click the highlighted statements in Bard’s response to learn more.”
Google highlights the new feature to check Bard’s responses. Screenshot: The Verge

When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden for determining what is true and false squarely on you.

Starting this week, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” the response. Here’s how the company explained it in a blog post:

When you click on the “G” icon, Bard will read the response and evaluate whether there is content across the web to substantiate it. When a statement can be evaluated, you can click the highlighted phrases and learn more about supporting or contradicting information found by Search.

Double-checking a query will turn many of the sentences within the response green or brown. Green-highlighted sentences are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted sentences indicate that Bard doesn’t know where the information came from, flagging a likely mistake.

When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words revealed that Google’s search had surfaced contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them.
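Google hasn’t said how double-check works under the hood, but the behavior it describes maps onto a simple loop: run each statement through search, then color it by whether anything corroborating comes back. Here is a hypothetical sketch of that flow, in which web_search and supports are invented stand-ins and a one-sentence corpus fills in for Google Search.

```python
# Hypothetical sketch of the double-check flow, not Google's implementation.
# web_search() and supports() are invented stand-ins for illustration only.

def web_search(statement: str) -> list[str]:
    """Stand-in retrieval: a one-page 'web' instead of Google Search."""
    corpus = ["Radiohead have won six Grammy Awards and zero Brit Awards."]
    return [doc for doc in corpus if "radiohead" in statement.lower()]

def supports(doc: str, statement: str) -> bool:
    """Toy corroboration test; a real system would use an entailment model."""
    words = statement.lower().strip(".").split()
    return all(word in doc.lower() for word in words)

def double_check(sentences: list[str]) -> list[tuple[str, str]]:
    """Label each sentence green (substantiated) or brown (unverified)."""
    labeled = []
    for sentence in sentences:
        hits = web_search(sentence)
        color = "green" if any(supports(d, sentence) for d in hits) else "brown"
        labeled.append((sentence, color))
    return labeled

print(double_check([
    "Radiohead have won six Grammy Awards.",  # green: corpus backs it up
    "Radiohead have won nine Brit Awards.",   # brown: nothing substantiates it
]))
```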


“I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week.

Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it and then double-checked the results to separate fact from fiction. It turns out that thoroughly cleaning the kitchen, which the chatbot had originally suggested, would not fix the problem. But placing bowls of baking soda around the house might help.

If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so was I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.)

A Bard answer about rainforest precipitation. Two lines are covered in green and two are in brown. One isn’t highlighted at all.
A Bard response showing lines that could be backed up with a Google search (green) and those that couldn’t (brown). Screenshot: The Verge

And while double-checking represents a clear step forward, it still often requires you to pull up all those citations and make sure Bard is interpreting the search results correctly. At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours.

Still, it’s a welcome development.

“We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry.

Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time.

For now, it’s limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web — it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.)

But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend I’ve been exchanging Gmail messages with for 20 years now, it showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.”

It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase.

Eventually, I imagine that third-party extensions will come to Bard, just as they previously have to ChatGPT. (They’re called plug-ins over there.) The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so.

The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves — and without us always having to ask for it.