When Apple first launched Siri in 2011 alongside the iPhone 4S, the company made a series of very compelling ads showing how you might use this newfangled voice assistant thing. In one, Zooey Deschanel asks her phone about delivering tomato soup; in another, John Malkovich asks for some existential life advice. There’s also one with Martin Scorsese shuffling his schedule from the back of a New York City taxi. They showed reminders, weather, alarms, and more. The point of the ads was that Siri was a useful, constant companion, one that could tackle whatever you needed. No apps or taps necessary. Just ask.
Siri was a big deal for Apple. At the launch event for the 4S, Apple’s Phil Schiller said Siri was the best feature of the new device. “For decades, technologists have teased us with this dream that you’re going to be able to talk to technology and it’ll do things for us,” he said. “But it never comes true!” All we really want to do, he said, is talk to our device any way we want and get information and help. In a moment of classic Apple bravado, Schiller proclaimed Apple had solved it.
Apple had not solved it. In the 13 years since that initial launch, Siri has become, for most people, either a way to set timers or a useless feature to be avoided at all costs. Siri has been bad for a long time, long enough that it has seemed for years that Apple either forgot about it or simply chose to pretend it didn’t exist.
But next week at WWDC, if the rumors and reports are true, we might be about to meet the real Siri for the first time — or at least something much closer to it. According to Bloomberg, The New York Times, and others, Apple is going to unveil a huge overhaul for the assistant, making Siri more reliable thanks to large language models but without much new functionality. Even that would be a win. But Apple also appears to be working on, and may be almost ready to launch, a version of Siri that will actually integrate inside of apps, meaning the assistant can take action on your device on your behalf. In theory, at least, anything you can do on your phone, Siri might soon be able to do for you.
This has obviously been the vision for Siri all along. You can even see it in those iPhone 4S commercials: these celebs are asking Siri for help, and Siri almost never actually finishes the job. It provides Deschanel with a list of restaurants that mention delivery but doesn’t offer to order anything or show her the menu. It tells Scorsese there’s traffic but doesn’t reroute him — and shouldn’t it already know he’s going to be late for his meeting? Siri tells Malkovich to be nice to people and read a good book but doesn’t offer any practical help. So far, using Siri is like having a virtual assistant whose only job is to Google stuff for you. Which is something! But it isn’t much.
Siri’s inabilities have been all the more frustrating because everything it needs to be useful is right there on your phone. When I want pizza, why can’t Siri check my email for the receipt from the last time I ordered, open DoorDash, enter the same order, pay with one of the cards in my Apple Wallet, and be done with it? If I have a Scorsese-level busy day, Siri seems to be right there next to all my contacts, my Slack, my email, and everything else it needs to quickly move stuff around on my behalf. If Siri could take over my phone like one of those remote access tools that let someone else move your computer’s cursor, it would be unstoppable.
There are really two reasons Siri never lived up to its potential in this way. The first is the simple one: the underlying technology wasn’t good enough. If you’ve used Siri, you know how frequently it mishears names, misunderstands commands, and falls back to “here’s some stuff I found on the web” when all you wanted was to play a podcast. This is where large language models are unequivocally exciting: we’ve seen how much better speech-to-text tools like Whisper are, and how much more broadly these models can understand language. They’re not perfect, but they’re a huge improvement over what we’ve had before, which is why Amazon is also pivoting Alexa to LLMs and Google Assistant is being overrun by Gemini.
The second reason Siri never quite worked is simply that neither Apple nor third-party developers ever figured out how it should work. How are you supposed to know what Siri can do or how to ask? How are developers supposed to integrate Siri? Even now, if you want to add a task to your to-do list app, Siri can’t just figure out which app you use. You have to say, “Hey Siri, remind me to water the grass in Todoist,” which is a weird sentence that makes no sense and, in my experience, fails half the time anyway. If you want to do a multistep action, your only option is to muck around in Shortcuts, which is a very powerful tool but falls just short of requiring you to write code. It’s too much for most people.
AI might also give Apple a chance to end-run the whole problem. If Siri can understand what’s on your screen the way a person does, it could operate apps directly instead of waiting for developers to build Siri integrations. Apple’s researchers published a paper earlier this year detailing a system called Ferret-UI, which uses an AI model to understand small details of an onscreen image. The researchers even detail how an overall app using Siri might work: OpenAI’s GPT-4 does a good job of broadly understanding what an image is, and then Ferret is able to understand small regions and details. In practice, that might mean one system says, “This is the Ticketmaster app!” and the other says, “That right there is the buy button.”
We should be skeptical about whatever claims Apple makes for Siri. More than a decade ago, Schiller stood onstage and proclaimed that Apple had built a better voice assistant, and it hadn’t. The same might be true now, as the hype for AI continues to move a lot faster than the actual technology. Humane, Rabbit, Google, and others are all working on similar ideas — “agent” is the buzzword of the summer in the AI world — and no one has demonstrated that it’s ready yet.
But if Apple has cracked something here, this could be the first time we ever get to see the real Siri — the Siri we were promised all those years ago. Maybe in the next commercial, Deschanel’s tomato soup will just magically appear at her house, and the Headspace app will fire up to bring Malkovich some inner peace. Maybe, finally, we’re going to get the Siri Apple always wanted to make.