Why CEO Matt Garman is willing to bet AWS on AI

Photo illustration of AWS CEO Matt Garman.
Photo illustration by The Verge / Photo: Amazon

AWS chief Matt Garman says Amazon is already seeing the benefits of its massive AI investments.

Today, I’m talking with Matt Garman, the CEO of Amazon Web Services, or AWS. Matt took over as CEO last June — you might recall that we had his predecessor, Adam Selipsky, on the show just over a year ago. That makes this episode terrific Decoder bait, since I love hearing how new CEOs decide what to change and what to keep once they’ve settled into their role.

Matt has a really interesting perspective for that kind of conversation since he’s been at AWS for 20 years — he started at Amazon as an intern and was AWS’s original product manager. He’s now the third CEO in just five years, and I really wanted to understand his broad view of both AWS and where it sits inside an industry that he had a pivotal role in creating.

You’ll hear Matt say that most companies are still barely in the cloud, and that opportunity remains massive for AWS, even though it’s been the market leader for years. If you’re a product manager or an aspiring product manager, you’ll catch Matt talking about these things exactly like the product manager he was from the start, only now with a broad view from the CEO chair.

But just acquiring new customers isn’t the game any longer: like every cloud provider, Amazon is reorienting its entire computing infrastructure for a world of generative AI. That includes more than $8 billion in funding for Anthropic, a huge push to build its own AI chips to compete with Nvidia, and even nuclear power investments as the energy demand for AI continues to grow. After Matt and I talked before the holidays, AWS announced an $11 billion investment to expand its data center operations in Georgia.

Matt’s perspective on AI as a technology and a business is refreshingly distinct from his peers, including those more incentivized to hype up the capabilities of AI models and chatbots. I really pushed Matt about Sam Altman’s claim that we’re close to AGI and on the precipice of machines that can do tasks any human could do. I also wanted to know when any of this is going to start returning — or even justifying — the tens of billions of dollars of investments going into it.

His answers on both subjects were pretty candid, and it’s clear Matt and Amazon are far more focused on how AI technology turns into real products and services that customers want to use and less about what Matt calls “puffery in the press.”

One note before we start — we recorded this episode just before the holidays, so I asked Matt about Netflix, one of AWS’s biggest customers, and whether it would hold up while streaming live events, especially the NFL games it streamed on Christmas. Turns out, Netflix did just fine with those, but the answers here were pretty interesting. Matt still checks in on his big customers, even as CEO.

Okay, AWS CEO Matt Garman. Here we go.

This transcript has been lightly edited for length and clarity. 

Matt Garman, you’re the CEO of Amazon Web Services (AWS). Welcome to Decoder.

Thanks for having me.

I am very excited to talk to you. You're like a perfect Decoder guest. You are, I believe, the first product manager at AWS; you started as an intern, and now you're the CEO. We have a lot of listeners who want to be on that journey, so there's lots to talk to you about just in that. 

You’re also the new CEO. We had your predecessor, Adam Selipsky, on the show just a little over a year ago. You’re about six months on the job now. So, there’s a lot of Decoder stuff in there — how you’re changing the organization and how you’re thinking about it. And then, obviously, we’re going to talk about AI. It’s going to happen. I hope you’re ready for it.

I’m ready for it. Shoot, fire away. I’m happy to go wherever you want.

All right. But I actually want to start with a very hot-button, deeply controversial topic. Are you ready?

Great. Fire away.

Okay, it’s Jake Paul. I want to start with Jake Paul. My understanding is Netflix is the prototypical AWS customer, right? They started on AWS, they made a big bet on AWS. They’re still a customer, right? They haven’t left AWS?

Yeah, Netflix is a great customer of ours. Absolutely.

They just had the live stream of Jake Paul fighting Mike Tyson. You can think anything you want about those two men fighting each other.

I was hoping Mike would win, honestly.

So was I.

I think most were, but that’s okay. It was fun to see him out there.

You’ve just set off a million more conspiracy theories about this fight. Anyhow, I told you it was controversial. All right, but the stream was pretty glitchy. I think everybody agrees on that. When I watched it, it degraded to 360p at some point for me. Netflix CEO Ted Sarandos was just on stage at a conference. Netflix said the demand was 108 million people globally, and here’s what Ted said about that stream: “We were stressing the limits of the internet itself that night. We had a control room up in Silicon Valley that was re-engineering the entire internet to keep it up during this fight because of the unprecedented demand that was happening.” 

You’re the CEO of AWS, you’re the internet. Did they have to re-engineer the internet for the Jake Paul fight?

You’ve got to ask Ted about that. I think where they were stressed was the [content delivery network] they run, and you can ask Ted about that too. Netflix has its own homegrown CDN that it uses, and that’s the part that I think was stressed. I don’t know the details of exactly where they were running into barriers, but it wasn’t in the AWS infrastructure; it was in the Netflix-controlled part of their infrastructure.

Yeah, their CDN is really fancy, right? They’ve got boxes in ISPs and everything. I was just curious because what we’re about to talk about, in a huge way, is how providers like AWS can meet the growing demand for compute everywhere and then get it to the people who need it. And it feels like most people in 2024 take video streaming for granted, but it’s still pretty hard.

It is. And I think in particular, there are a couple of things around that that are challenging, right? By the way, it’s a super hard thing that they did. Number one, it’s their first time doing a big, scaled live stream like that. The first time is actually what’s hard. Other people have done that before. We stream Thursday Night Football, and there are other places like that that have figured out how to do things at that scale, but it’s not their first time. So, I’m sure that the next time — I think they have a Christmas Day game — they’ll probably work out some of those kinks and figure that piece out.

The first time you do it you’ll find those bottlenecks. And it’s true about any compute system where you have an order of magnitude more [to figure out]. They obviously have shows that have streamed more, but they’re spread across more time. So it’s this single spike up where everybody comes in a 30-minute window, and if it’s outside of what you planned for … If they planned for — I don’t know what their numbers were — 150 million and they got 180 million, it was outside of what they thought their upper limit was. We’ve seen this before in AWS and we’ve seen this in Amazon. The first time we did Prime Day we probably had issues across that too, of just people hitting the website and other things. So the first time you do events like this, it’s a learning process.

I think it’s probably overstating it to say that they had to re-architect the whole internet, but it is that key spike where a lot of applications are just not … particularly when you own the infrastructure. And this is one of the benefits of the cloud, by the way: you get to ride on the law of large numbers, where any one spike doesn’t overwhelm everything else. Netflix obviously has a huge number of customers, and I guess they’ll be much more prepared next time. But it’s a good learning experience for anybody, even at a much smaller scale. When you’re planning an event that has the potential to be materially more than your average baseline, there are always risks that there are some scaling factors you don’t anticipate.

So it’s not a surprising problem to me. We’ve seen it over and over again and it’s one of those problems that the cloud helps to solve. But even in the cloud, planning is required and you have to think about how you scale ahead of it, and things like that.

When you were at home watching the fight, did your pager go off?

I was texting back and forth to our support team to make sure we were supporting the Netflix team as much as possible, yes.

How often does that happen to you as you use the internet and you think, “Boy, this is probably running on AWS. I had better make sure it’s going fast?”

It happened more back in the day when we were scaling and learning, back in 2007 and 2008. Today, we’re at a broad scale, and lots of things on the internet and around the world run on AWS. We usually run pretty reliably, so it comes up less than it used to, for sure.

Do you have Down Detector bookmarked on your laptop?

I don’t, no.

We’ve got to get the CEO of Down Detector on the show. That is a fascinating service across the board.

Let me ask the Decoder questions, because I think the theme here is that we are going to be more reliant on cloud infrastructure for compute in the world of AI, and that compute has got to reach all the people and hopefully make everybody some money and generate some useful products and services. And I think whether or not we can stream people punching each other and whether or not we can stream AI, the problems there are the same in the general sense.

But I want to ask the Decoder questions first so I can understand how you are solving those problems, having been at AWS for so long. You took over from Adam, who was on the show just a little over a year ago; he stepped down about six months ago. You’ve been there a long time. You started as the first product manager of AWS, which is a pretty wild place to begin a career and end up as a CEO. How are you thinking about AWS, the organization, right now?

There are a couple of things that I’m thinking about. One, I have been here for 18 years, so I’ve been fortunate to learn a lot of the different parts of the business and have seen it from the early days until where we are now. Over 18 years we’ve grown to be a $110 billion business growing at 19 percent, so that’s great, and we’re just at the early stages of what that business can be. I’m pushing the teams to consistently think about how we innovate faster. How do we think bigger? And how do we support our customers?

As we think about the potential of AWS being a $200 billion, $300 billion, $500 billion business, or whatever size it gets to, we want to continuously think: What are the organizational structures? What are the mechanisms we use? What are the ways that we supported customers, which worked to get us to $100 billion, and may not work at $200 or $300 billion?

Some of that is just thinking about how we scale those aspects. And how do we think about supporting customers in a great way? How do we think about scaling our services in a great way? How do we think about continuously innovating across many different paths? And as you think about it, we have to really innovate along our core — the thing that got us here around compute, databases, storage, and networking. But we also have to innovate around AI, around some higher-level capabilities, and analytics.

We also have to innovate around helping customers who might be less technically savvy, so they can take advantage of the cloud. They may not have Netflix-level sophistication (Netflix obviously has a very sophisticated technology team), but they want to take advantage of some of the cloud capabilities. I think we’re continuing to think about how we keep pushing that envelope to help more and more customers take advantage of what we have.

One of the things that I spend a lot of time thinking about is how we organize so that our teams don’t lose agility and speed as we get bigger. That’s some of what I’m thinking about, and it’s nothing that’s broken today. Instead, it’s about looking around corners: when the business is twice as big as it is today, how do we make sure that we continue to execute and run as fast as possible?

Can I ask about that piece of the puzzle? Where does the next new customer come from?

Sure.

When you started at AWS they were all new customers. Now, most huge companies at least have an idea of what they might do with the cloud, whether they’re using AWS or something else. We have a lot of CEOs who come on here and say, “Look, I need to have multiple clouds so that I can go do rate negotiations with all of them.” Fine. 

There is a new class of companies that assumes they don’t need any software support. They’re just going to hire a bunch of software as a service (SaaS) vendors, and they’ll run their business and use the SaaS products however they want to use them. And it seems very unlikely that they will become AWS customers themselves because they’ve outsourced a bunch of business functionality to a bunch of other software vendors. I’m just wondering if that’s a new class of potential customer, right? That kind of business didn’t exist until recently.

It’s true, and I think that there’s probably subtlety there. So I’ll take a couple of those, one at a time. Number one, we do have a lot of large customers that are running in AWS in the cloud today, and a huge number of them still have massive amounts of their estate on-premise. And so there’s a huge amount of growth available there. You can even take our largest customers, many of them only have 10, 20, 30, or 40 percent of their workloads in the cloud. There’s a massive amount of growth just helping them get to 70 or 80 percent, or whatever that number is going to be, and don’t even presume you get to a hundred. There’s a huge amount of business there.

I also think there’s a huge amount of business available with customers that only have one percent, or rounding to zero, of their estate in the cloud because they’re still running on-premise workloads, whether it’s IT or core business pieces. Some of it is running in data centers. Some of it is workloads that haven’t moved to a cloud world yet. Think telco networks, broadly. Most telcos still run on traditional network infrastructure. There are a handful of customers, like the Dish networks of the world, who have thought about it and have moved to building in the cloud. Since they got to start from zero, and have built it in the cloud, they get the benefits of that agility — but most haven’t.

Think about all of the compute that happens in a hospital today. It’s mostly in the hospital. Those are just examples of where there’s an enormous amount of compute that could take advantage of these broad-scale cloud systems but hasn’t yet moved there. So there’s a huge amount of potential in those additional businesses. And as you think about new customers, every single year a huge number of startups are created from scratch, and they all start in the cloud too. There’s still lots of greenfield opportunity for us.

I think your observation about companies leaning more into SaaS is super interesting and it’s why they’re such a focus for us. It’s why we focus on deep partnerships. How do we make sure that AWS is the best place to run SAP, it’s the best place to run Workday, it’s the best place to run ServiceNow, it’s the best place to run … Keep going down the list. And so, those SaaS independent software vendors (ISVs) have always been a really important customer base for us.

And increasingly, you see us build capabilities that make AWS even more powerful for SaaS vendors. At re:Invent, we announced a capability called Q Business Index where you can have all of your SaaS data pulled together into a single index that’s owned and controlled by the enterprise but shared across SaaS products. I think you’ll see more things like that, where we help customers so they’re not stuck saying, “Okay, my data’s in a bunch of these SaaS islands and I can’t get benefits across them.”

I don’t think those customers stop being AWS customers, because they’re still going to have a data lake of their own data, they’re still going to have their own applications, they’re still going to run their own websites. There are other things that customers are still going to want to do. And so I think more of their applications will be SaaS as opposed to self-managed software, for sure. But it’s hard to imagine many customers that won’t also have their own compute, storage, and database needs.

When Adam was on the show, I asked him, “What’s the point of the airport ads? Who doesn’t know about AWS?” And his answer basically tracked with what you’re saying. There are still a lot of customers who we need to get thinking about moving to the cloud, and that’s why there are Thursday Night Football ads.

Is that your answer? When you get off the plane and you see the AWS logo, you’re like, “I’m going to get that guy?”

I mean, look, you can make that argument for lots of ads. Like, who doesn’t know that Coca-Cola exists? But you still see Coca-Cola ads. And so some of it is keeping it top of mind. Some of it is also … If you think about the advertising that we do together with some of the sports networks — whether it’s NFL, F1, or others — a lot of what that does is help connect the dots. You may know that AWS exists, but seeing us in a context that you understand, which is football, F1, Bundesliga, or whatever the sport is, and seeing how we’re helping do analytics for that sport, is one of those things that helps customers connect the dots.

And so, it’s not just an ad that says, “Hey, AWS exists,” but it is connecting those dots that says, “Okay, if we’re able to do analytics that can see how fast a football player can run, or see what the chance is that an F1 car can pass,” it helps customers just connect the dots as to where we might be able to help their business too. It also opens the door for us to do that next deep dive where we can dive in and understand that. And we find that that connection point is quite valuable even if people know that AWS exists already.

I do love the idea of some CEO coming to you and saying, “I need a win probability meter for my team every minute of the day in real time.”

That’s great.

Let me ask you about telco for one second. Just because telecommunications has long been a particular fascination of mine. Dish started from scratch. They announced loudly that they were going to use AWS as their cloud provider, that they wanted to do all the compute they needed for 5G and all that stuff to run that network in the cloud. Compare and contrast that to the other telcos. 

When Verizon was launching 5G, for example, they told me that they were going to build a competitor to AWS because they needed the compute at the edge to run the network anyway. And they said they might as well just sell the excess capacity in their data centers to customers, saying it would have lower latency, or whatever you get from being very much at the edge. Did that pan out? Or are you saying, “Okay, that didn’t work, and I can go conquer those customers now. I can go get Verizon or AT&T or whoever else on the network?”

Well, Verizon was a little bit different. It was a partnership with us where we were talking about potentially selling some of that compute capacity together at the edge. I still think that there’s an interesting eventual win there, but I think the idea was a little bit ahead of the technology of really low-latency compute at the edge, mostly because a lot of that latency was taken up in the network, and so it’s hard to get the benefit of a small latency gap.

Look, if you go back 15 years, many companies were thinking that they would just go offer the cloud. It looked like it was easy. And then they said, “Oh, it’s just a hosting thing. I have a data center. I can sell that.” I think most companies today, outside of the handful of three or four companies that are really in the space, don’t think that they can provide a real cloud offering. It’s hard.

There are niche offerings in particular slices, but I think increasingly we view this as a partnership opportunity where we can add value together. So, I think our partnership with Verizon is great. We look at how we can add value together, and over time we’d love to run more of the broader network. Because if you look globally, you’re starting to see other telcos lean into this model of, “Okay, maybe more of the core can be run in AWS, in central data centers,” and so we’re starting to see more of the core move. And then you think about, “Can the radio access network (RAN) be run in AWS? Maybe. Yeah, it can.” And they’re starting to see that piece too.

I think it will be a transition over time. But I do think that as we add more value and show that we can give programmability to their networks, scale to the networks, and show benefits on patching and other things like that where there’s a lot more flexibility, you’ll see more and more telcos leaning into cloud-based deployments.

I’m sure your partners at the traditional telco companies appreciate your support in the retconning of their promises around 5G. You’re doing great. 

There’s a real split here. I hope people can hear it. We’re talking about still trying to get customers to come use cloud services. Step one: move some of your compute out of the basement of the hospital and into the cloud. And a lot of companies aren’t there yet, and it seems like you perceive that there’s still opportunity there. 

Then, in a minute, we’re going to talk about AI, which is the absolute cutting edge of, “How do we even run these companies? What do these computers even do? How does the cost work out?” How are you structuring the organization to deal with that split: “Don’t have your own servers in the basement” on one side, and “Turn your decision-making over to some agentic AI system that we’re going to run for you” on the other?

Well, in some ways it’s a much stronger carrot. If the pitch is, “Hey, run the exact same thing that you’re doing, but do it a little bit more efficiently and a little bit less expensively,” that is less of a value proposition than if you can do something that hasn’t been possible before. And so, I think that’s why many of the workloads that you’ve seen move to the cloud already are the super scalable ones, or the ones where they need lots of compute, or the ones where they have a really large footprint because they see the wins are enormous for those types of customers. For a server running in the basement of a hospital, maybe they can save a little bit of money, or maybe they can save a little bit of IT work or whatever, but the value proposition may not be there unless we can really deliver a lot of value.

You’re not going to be able to get a lot of the value that’s promised from AI from a server running in your basement, it’s just not possible. The technology won’t be there, the hardware won’t be there, the models won’t live there, et cetera. And so, in many ways, I think it’s a tailwind to that cloud migration because we see with customers, forget proof of concepts … You can run a proof of concept anywhere. I think the world has proven over the last couple of years you can run lots and lots and lots of proof of concepts, but as soon as you start to think about production, and integrating into your production data, you need that data in the cloud so the models can interact with it and you can have it as part of your system.

And I do think that that is going to be a tailwind over the next couple of years as people want to have these agentic systems. They want to have their data in a secure environment but integrated into an AI workflow. You can’t orchestrate an AI workflow by pointing it at a mainframe; it’s not going to be possible. And if you have data going back and forth to some model, there’s risk around security and control, around making sure that your intellectual property (IP) stays with you.

But if you move all of that data into a secure cloud environment, you’ll have a modern data lake that has all your data. Your application will work there, you’ll be colocated with where the model, all the controls, and the guardrails can run, and you can have a retrieval-augmented generation (RAG) index nearby to take advantage of all that data — that’s when you can really start integrating it into your production applications. And that’s where you’re going to see a lot of the really meaningful wins, not just a cool, “Hey, that’s neat that I can have a chatbot,” but really integrating it into how your workflows change and how you do business.
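To make the flow Garman is describing a bit more concrete, here is a minimal, illustrative sketch of a retrieval-augmented generation loop: documents live in an index colocated with the application, a query pulls the most relevant ones, and they are added to the model’s prompt. The function names and the keyword-overlap retrieval are invented for the example, and the model call is a stub; this is not any particular AWS service or API.

```python
# Minimal RAG sketch (illustrative only): index documents, retrieve the most
# relevant ones for a query, and augment the model prompt with that context.

def build_index(documents):
    """Tokenize each document once so retrieval is a cheap set overlap."""
    return [(doc, set(doc.lower().split())) for doc in documents]

def retrieve(index, query, k=2):
    """Return the k documents whose tokens overlap most with the query."""
    query_tokens = set(query.lower().split())
    ranked = sorted(index, key=lambda item: len(item[1] & query_tokens), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query, documents, call_model):
    """Augment the prompt with retrieved context before calling the model."""
    index = build_index(documents)
    context = "\n".join(retrieve(index, query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_model(prompt)

if __name__ == "__main__":
    docs = [
        "Invoice 1042 was paid on March 3.",
        "The Q3 roadmap prioritizes latency improvements.",
        "Support tickets spiked after the last deployment.",
    ]
    # Stand-in for a hosted model endpoint; in production this would be a
    # call to a managed model service.
    echo_model = lambda prompt: f"[model received {len(prompt)} characters of prompt]"
    print(answer("When was invoice 1042 paid?", docs, echo_model))
```

The point of the sketch is the colocation Garman emphasizes: the index, the data, and the model call all sit in the same environment, so production data never has to leave it.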

I have seen early signs that, to your question about organization, they’re very complementary. It’s not A or B; it’s all pushing in the same direction. So we’ll have to have different capabilities, and we’ll have to have different motions to help all of that. But I do think that move of getting your data into a cloud world is kind of a necessary condition for really successful, deeply integrated AI in your business processes.

So this leads right into the classic Decoder question: How is AWS structured now? What’s the org chart?

What do you mean? So say more about that. Just what is our org structure?

Yeah. How have you structured AWS? I mean you’re new, so I imagine you might change it, but how is it structured right now, and how are you thinking about changing it?

Well, I will say that an org structure, number one, is a living thing. So whatever I tell you today may not be true tomorrow, and I think you have to be agile there. But broadly, how we think about structuring our teams, I think, is pretty well documented in the industry around Amazon. We want single-threaded teams that can focus on a particular problem and move fast. And so what that means is you really want a team who can own a problem and not be matrixed across 10 different things where they have to coordinate a bunch.

In some ways, I think about it like a big monolithic computer program — it’s very efficient as long as that monolithic computer program is small. And as it gets bigger and you have multiple people working on that program, then you get a mainframe, and it’s very slow and you can’t iterate on it or move fast.

So what you do is decouple and build services that talk to each other through well-defined APIs. And then you continue to decouple those programs, you continue to refactor. That’s how to build modern technology systems. And you can think about containers as the current way of doing that, which are small, independently running systems that can talk to each other through APIs.
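As a toy illustration of the decoupling Garman describes, here is a sketch of two “services” that interact only through a small, well-defined interface, so either side can change its internals without coordinating with the other. The names are invented for the example; this is not how any actual AWS service is built.

```python
# Two independently owned components that communicate only through a
# well-defined contract (StorageAPI), mirroring the "services behind APIs"
# idea. Illustrative names only.
from typing import Protocol


class StorageAPI(Protocol):
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """One team owns this implementation and can rewrite it freely."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class ComputeJob:
    """Another team's component depends only on the StorageAPI contract."""

    def __init__(self, storage: StorageAPI) -> None:
        self.storage = storage

    def run(self, key: str) -> int:
        payload = self.storage.get(key)
        return len(payload)  # stand-in for real work


if __name__ == "__main__":
    storage = InMemoryStorage()
    storage.put("input", b"some workload data")
    print(ComputeJob(storage).run("input"))  # prints 18
```

Either team can ship changes behind its side of the contract without a coordination meeting, which is the property Garman maps onto team structure in what follows.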

Now, if you think about org structure, it’s not that dissimilar from that. How do you have teams that can run really fast? There is going to be coordination, but what you want to do is minimize that coordination tax as much as possible. And so, if you have a well-defined API between them, which is like, “I build a service over here, you build a service over here,” we can innovate independently. Occasionally our teams will get together and make sure that we broadly know what our vision is. We want to know what the thing is that we’re running towards. But then my service, my organization, or my feature can run independently and not have to have that coordination.

High level, if the Amazon Elastic Compute Cloud (EC2) team and the Amazon Simple Storage Service (S3) team had to talk every time they were going to launch a feature to make sure it worked together, we would move really, really slow. But we don’t, and so the teams can move really fast.

Then we make sure we have … It’s part of the leadership and the product leadership team to get together and say, “Okay, we think going after this space is super important. And some of that is customers are going after this use case, and so broadly we’re going to have to go after this thing,” but we can still then have the teams go out and run fast. That is an organizing principle that … And then there are other parts of the organization where we have teams that run the data centers and other global infrastructure, and some of those are separate teams. But if you think about the product, and organizing around the product and technology, that’s how we think about it.

This question is always bait for Amazon executives in particular because Amazon executives are raised in a culture to think exactly in this way and describe the company as a series of microservices. But how is AWS structured?

Just like that. I mean, even more so than Amazon.
