tech

The Cloud Is Fragile: Why Companies Are Deleting AWS

Is centralized cloud infrastructure becoming too fragile and expensive for the next era of computing?

Key Takeaways

1

The recent 15-hour AWS outage was caused by a chaotic "race condition" in the DynamoDB DNS management system, revealing critical digital fragility.

2

Companies like 37signals are moving off hyperscalers to self-hosted infrastructure for greater control, retiring cloud bills that peaked at $3.4M a year.

3

Oracle is funding its AI infrastructure expansion with a record $38 billion debt deal, signaling that compute capacity is the "new safe asset."

Total Duration

37 minutes 32 seconds

5 chapters

Digest Info

Published 10/28/2025

Category

tech

Chapters 5

This Week in Tech

Ch. 1

2m 50s

The Chaotic Race Condition That Crashed AWS

Experts detail the technical root cause of the recent 15-hour AWS outage: a chaotic 'race condition' within the DynamoDB DNS management system. This failure, where two automated processes conflicted, highlights the inherent fragility and single points of failure in our increasingly cloud-dependent digital world.

“The three biggest countries where reports originated were the U.S., the U.K., and Germany.”

warnings

market_analysis

0:00

Uh, so, show of hands: how many of you, for instance, had an Eight Sleep bed that wouldn't recline because of the Amazon outage? We learned a lesson this week, didn't we, about how much we are relying on AWS, and most especially US-East-1, because it's the original. So, you know...

0:22

Yeah, so I didn't notice it that much.

0:23

When was the outage?

0:24

It was 15 hours.

0:26

It was a long outage, according to Ars.

0:31

Do we know when it was?

0:32

It was... Ookla said its Downdetector service received more than 17 million reports of disrupted services offered by 3,500 organizations.

0:43

The three biggest countries where reports originated were the U.S., the U.K., and Germany.

0:49

Snapchat was down.

0:52

Roblox was down.

0:54

And AWS...

0:56

was down, of course, because that was an AWS outage.

1:01

Amazon now says the root cause, it's always DNS, isn't it?

1:06

If it's not BGP, it's DNS.

1:07

This was a weird database, right?

1:09

I thought it was Bravo Database or something like that.

1:12

It was DynamoDB.

1:14

Dynamo, yeah.

1:16

But it was two different instances of the DNS writer running at once because the first one ran too slowly, so it kicked off another one, and then the old one overwrote the stuff from the new one.

1:28

They had what is famously known in computer science as a race condition.

1:32

which is one of my favorite errors.

1:34

First of all, because you can't predict it.

1:37

It's kind of chaotic.

1:40

It's when two threads, two processes race each other and it's unpredictable who's going to win the race.

1:48

So the settings are conflicting.

1:52

And a race condition is a notoriously difficult thing to discover because unless you're really careful about locking each of the threads, it can happen.
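Editor's sketch: the failure described above (a slow writer finishing after a newer one and overwriting its result) can be reproduced deterministically in a few lines. This is a toy model, not AWS's code; `DnsRecord`, `apply_unchecked`, `apply_checked`, the version numbers, and the addresses are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DnsRecord:
    """Hypothetical stand-in for the live DNS entry of one endpoint."""
    plan_version: int = 0
    addresses: list = field(default_factory=list)


def apply_unchecked(record, version, addresses):
    # Last writer wins: whoever finishes last decides what is live.
    record.plan_version = version
    record.addresses = list(addresses)


def apply_checked(record, version, addresses):
    # Refuse any plan that is not newer than the one already live.
    if version <= record.plan_version:
        return False
    apply_unchecked(record, version, addresses)
    return True


# The slow writer holds plan 1. A second writer, started later, holds
# plan 2 and finishes first. Then the slow writer finally lands.
unchecked = DnsRecord()
apply_unchecked(unchecked, 2, ["10.0.0.2"])
apply_unchecked(unchecked, 1, ["10.0.0.1"])  # stale plan overwrites the new one

checked = DnsRecord()
apply_checked(checked, 2, ["10.0.0.2"])
stale_applied = apply_checked(checked, 1, ["10.0.0.1"])  # rejected as stale

print(unchecked.plan_version, checked.plan_version, stale_applied)
# prints: 1 2 False
```

In a genuinely concurrent system the version check and the write would also have to happen atomically (under a lock or a compare-and-swap); otherwise the check itself can race, which is the point about locking made above.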

2:02

So the DynamoDB DNS management system, which is used for the load balancing,

2:10

periodically creates, it's an automated system, new DNS configurations.

2:14

This is from Ars Technica based on reporting from AWS.

2:17

Creates new DNS configurations for endpoints within AWS like AWS East because two of them tried to do the same thing.

2:28

The race condition resided in the DNS enactor

2:36

It went cuckoo is the technical term.

2:41

The old instance erased the new entries and the new instance erased the older entries.

2:45

No, no.

2:45

There's no entries.

2:47

It's after you, Alphonse.

2:48

No, after you, Alphonse.

2:49

No, after you, Alphonse.

2:50

And nothing ever...

All-In with Chamath, Jason, Sacks & Friedberg

Ch. 2

19m 59s

Cloud Fragility Accelerates the Multi-Cloud Shift

The AWS outage is analyzed as a beneficial event for competitors like Microsoft Azure and Google Cloud, accelerating the inevitable shift toward multi-cloud dependency among large enterprises. Experts predict that risk management and public company disclosures will eventually drive the three major hyperscalers toward an equal market share split.

“The outage that happened this week, I think starts to highlight for folks that they can't and shouldn't have a dependency on a single cloud service provider.”

market_analysis

business_strategy

0:00

Let's talk about this Amazon outage.

0:01

Tough week for Amazon.

0:03

They had this huge outage in the beginning of the week, and then they had a bunch of leaked documents about their plans for...

0:10

Jobs.

0:11

And Monday, massive AWS outage.

0:14

2,000 companies, 4 million users unable to function on the internet for half a day, 15 hours, 20 hours.

0:20

And then on Tuesday, internal docs viewed by the New York Times showed Amazon plans to not hire 600,000 planned jobs because of robots by 2033.

0:34

So this isn't they're planning on laying off 600,000 workers, but rather they're just pulling back their hiring plans and ramping up their robotic plans, which you would expect.

0:45

And their goal, according to these internal leaked documents, is to automate 75% of warehouse operations.

0:51

We talked about this the last couple of weeks.

0:53

Friedberg, your thoughts on either of these two stories here?

0:57

I think the AWS story...

0:59

It's interesting in terms of its implications for the clouds.

1:02

There's effectively three major cloud vendors that compete with one another, AWS, Microsoft, and GCP or Google Cloud.

1:09

And I'll just give you these numbers.

1:10

Oracle also, by the way, coming on strong.

1:12

That's right.

1:13

But let's exclude the number four for now, Oracle.

1:15

But AWS has $124 billion revenue run rate.

1:20

Microsoft, 120 billion, and Google Cloud, 54 billion.

1:24

But AWS, which is slightly larger than Microsoft, is only growing 17% year over year.

1:30

Microsoft's 26% year over year, and Google Cloud is accelerating at 32% year over year, and some say getting closer to 40% growth rate.
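Editor's sketch: to see what the quoted run rates and growth rates imply, the short calculation below compounds them forward and reports the first year a challenger passes AWS. The assumption that each growth rate stays constant is hypothetical and made only for illustration, and `years_until_overtake` is an invented helper name.

```python
def years_until_overtake(challenger, challenger_growth, leader, leader_growth, limit=50):
    """Return the first whole year in which the challenger's run rate
    exceeds the leader's, assuming both growth rates stay constant."""
    for year in range(1, limit + 1):
        challenger *= 1 + challenger_growth
        leader *= 1 + leader_growth
        if challenger > leader:
            return year
    return None  # no crossover within the limit


# Run rates in billions of dollars and year-over-year growth, as quoted above.
microsoft_years = years_until_overtake(120, 0.26, 124, 0.17)
google_years = years_until_overtake(54, 0.32, 124, 0.17)

print(microsoft_years, google_years)
# prints: 1 7
```

Under those constant-growth assumptions Microsoft's figure passes AWS's within a year and Google Cloud's in about seven. Real growth rates drift, so this is a way to read the quoted numbers, not a forecast.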

1:39

The big thing I hear from partners and enterprise customers of these cloud services is that many of them, if not all of them, as they scale up, move to a multi-cloud model.

1:50

So none of them want to be dependent on a single cloud.

1:53

Many folks started on AWS because AWS was the OG.

1:57

Back in the day, when I was running Climate Corp, I was the largest EC2 user on AWS for about a year and a half.

2:03

which was their Elastic Compute Cloud Service.

2:05

We were running all these models back then.

2:06

So I knew that service very early on and it was very unique, it was very powerful.

2:10

And so a lot of companies that are old school established themselves in AWS very early on.

2:15

But the outage that happened this week, I think starts to highlight for folks that they can't and shouldn't have a dependency on a single cloud service provider and will only accelerate the diversification of companies into the other clouds.

2:28

And so I do think this is actually a very beneficial situation for Microsoft and GCP and, to your point, J-Cal, perhaps even Oracle, in terms of giving those sales teams, which are very aggressive, a hard story to go and sell with and say, guys, you don't want to just sit on AWS in case this happens again.

2:45

We've got better infrastructure.

2:47

We're more reliable, et cetera, than these other guys.

2:50

So come and move over to us.

2:51

That might be a little bit of a naive, simplistic, kind of reductive way to think about what happened this week.

2:56

But we are seeing the smaller competitors accelerate.

2:59

And I think that this might be another kind of moment of acceleration for those folks.

3:03

And multi-cloud, it's been around for a while, Chamath, when you're doing stuff with 8090.

3:08

Are the big companies already doing that?

3:11

Or do they assume, hey, there's going to be some downtime?

3:14

Yeah, it's okay to risk, or are they really thinking multi-cloud, neo-cloud, let's have some smart, intelligent routing and redundancy here?

3:23

I think there are two markets.

3:24

There's the AI market, then there's the non-AI market.

3:28

In the non-AI market, everybody has everything.

3:32

It all looks effectively the same.

3:34

There's certain products and services that are unique to Azure versus GCP versus...

3:40

AWS, but by and large, the market is big enough and important enough that you'd have to be pretty insane to take a single vendor approach.

3:53

And so what typically happens in these markets is that they start off really small.

3:58

One person has all the share.

4:00

And then as the market becomes very valuable and very big, everybody diversifies because it's a risk management thing.

4:06

And these things flow into the disclosures you have to make as a public company.

4:10

And if you didn't have that diversification and something bad happened and it impacted your business, you could get sued.

4:16

So there's all these reasons why eventually all these three big companies will converge effectively roughly a third, a third, a third.

4:24

We're going to debate the path to get there, but that's where they'll end up.

4:26

You know, there's this principle called the rule of three, where they say like all markets eventually mature to kind of a 60-30-10 split.

4:34

that you end up having your market leader at 60% market share.

4:38

Second place is usually half the size at 30.

4:40

And then there's always some balance in the market where there's some competitor that resolves to about 10%.

4:46

It's really interesting.

4:47

If you guys were to place a bet, who would you think is the 60, 30, 10?

4:50

I don't think that applies to you.

4:51

I think that's bullshit.

4:52

You think they're going to be a third, a third, a third?

4:54

I think it's all some idiot making something up.

4:57

But what do you think happens in cloud?

4:59

Like, do you think that these all converge to equal market share?

5:02

In non-AI, it's a third, a third, a third.

5:04

It will.

5:05

It'll take circuitous paths, but that's where we'll end up.

5:07

By the way, a good point to make is that this revenue number that I highlighted for Google Cloud, Microsoft, and Amazon actually include their applications.

5:15

So as you know, like Microsoft, GCP have pretty sizable enterprise application stacks that are built into that number, which gives them obviously the ability to drive cloud usage because they've got demand and sales relationships into those enterprises.

5:28

I think the way it works in AI is that you initially, right now we're in this early phase where there's two paths.

5:34

Path one is you need a specific model and it's relatively well integrated using a specific subsidized form of hardware on one of the hyperscalers.

5:43

But eventually you'll get more of that abstracted away as it gets pushed into the infrastructure so that you have less dependence on one model.

5:52

There's a lot of work that has to get done and a lot of in-memory

5:57

infrastructure that is not yet built that has to exist.

5:59

But once that exists, it'll be easier for all of us at the application level to view these models a little bit more fungibly.

6:07

And then at the bleeding edge, you'll have the folks that basically give you some form of a hypervisor or virtual machine or the bare metal.

6:15

And that's where the neo scalers are doing really well.

6:17

But I think my point is that in any important market,

6:23

in compute, in technology, where there really isn't much of a differentiation, I think you'll end up with these hyperscalers at a third, a third, a third.

6:32

Now, if one model is way, way better, and it's only on one of the clouds because Google writes a big check or Amazon writes a big check, I could see that swaying the AI share, but in the absence of that,

6:45

I think cheaper, faster, better is sort of the end destination for everybody.

6:49

What an extraordinary outcome for Amazon, where AWS is like 15% of their revenue right now, Friedberg, but it's 60% of their profits today.

6:59

And that was just a side hustle, like a little project they took out of nowhere, and it's having the same impact on Google and other places.

7:06

So side bets and side quests are just... You look at the Waymo side quest for Google, or even a lot of Sergey's other bets, like...

7:14

And Larry, flying cars, Loon, low-earth satellites, Google Fiber, all those X projects had so much potential.

7:23

TPU, DeepMind, TensorFlow, GFS.

7:27

Robotics.

7:28

It's pretty crazy.

7:29

Boston Dynamics.

7:30

They bought all those robotics companies.

7:31

Man, it's like somebody got to them and were like, yeah, you're seven, eight years into this.

7:36

It didn't happen.

7:37

The problem that Google has, unfortunately, is they have so much stuff.

7:41

Mm-hmm.

7:42

it's not really valued.

7:44

And so they're going to go through the same problem that everybody else who's a conglomerate has, which is this decision.

7:51

Now, Buffett, when he got to that decision said, I don't care, this is my life's work.

7:55

And so I'm just going to keep everything aggregated.

7:59

But now you're going to get to this thing where the intrinsic value

8:04

of everything they have will far exceed the actual value that it trades at.

8:09

And so there'll always be these fissures of pressure.

8:12

And then if one of these things requires a lot of money, there'll be pressure.

8:16

And that pressure will be segregate these things so that I can own one versus the other.

8:21

And that's always the thing that happens in public markets is you kind of swing back and forth.

8:25

So I suspect that this is gonna happen at Google.

8:28

This was what they set up to do with Alphabet was to be the holding company

8:32

And then to your point, they made that evolution, particularly in a company like Waymo, where they said, we can't be the sole funder.

8:38

They brought in Silver Lake.

8:39

They brought in all these other investors.

8:40

They did this actually with Verily.

8:42

They did this with a bunch of these, what they call other bets, is they made the conscious decision.

8:46

Because Chamath, on the flip side,

8:48

by bringing in outside capital and having an independent board for these subsidiaries, they were actually able to drive better outcomes because now there was governance and there was aligned interests that could then take management and say, guys, if you can deliver these results, and you have this kind of external pressure as opposed to the softness.

9:06

But it's that, it's something else.

9:07

There's no way somebody as smart as Silver Lake comes in if they think there's not a path to liquidity.

9:12

So the other thing they have to promise is, they're like, listen, we will take this company public and in return,

9:18

you will help us build a better company than we could build ourselves.

9:21

Well, it seems that Silver Lake has done their part of the bargain.

9:25

Now it's up to Google to live up to their part of the bargain because if it doesn't get liquid, it sets a very bad precedent for everybody that committed capital into that company.

9:33

Of course, yeah.

9:34

Waymo going public would be unbelievable next year, man.

9:37

If they did that, what would that look like in the public markets?

9:40

$250 billion?

9:41

No.

9:42

$100 billion?

9:43

Take it easy.

9:44

Stop.

9:44

Don't do that.

9:46

You don't think so?

9:47

I think it'd be huge.

9:48

Jason...

9:50

We all objected to talking yet again about AI-driven job loss, yet you insisted on putting this AI robot story from Amazon in.

9:58

I think you have something to say.

10:00

Thanks.

10:01

Let me take you through a presentation.

10:04

Well done.

10:05

You have slides?

10:07

What is this?

10:08

I'm working on a presentation based on a lot of stuff we've been talking about here.

10:11

I threaded it together.

10:12

We were just talking about Google and the size of the company.

10:16

Right now, they are in 2025 at 187,000.

10:17

They were at 190,000 people in 2022.

10:19

And their revenue has just gone from 283 to 350 billion in basically three years.

10:22

And when you look at this Amazon stuff that came out, I just wanted to point out a couple of things.

10:35

It's not just that they're not hiring these 600,000 jobs.

10:38

It's that they are in full-blown crisis preparation for this.

10:42

They have crisis teams writing up how to handle this and be a good corporate citizen.

10:48

And they're talking about having parades and paying for Toys for Tots.

10:53

And they're even trying to get the executives to say things like co-bots as opposed to robots.

10:58

Let's not call them that.

10:59

Let's call them co-workers and co-bots.

11:02

And when you look at this, just to open up the aperture here,

11:06

Right now, Walmart and Amazon are the number one and two employers in the US.

11:11

2.1 million people work at Walmart, over a million at Amazon, and 3 million people, as we know, work in taxis, Uber, DoorDash.

11:17

All those jobs are at risk.

11:19

And we talked about this back in June when Andy Jassy telegraphed all this in a blog post where he said, the next few years, we expect that this will reduce our total corporate workforce as we get efficiency gains from using AI extensively across the company.

11:34

They believe

11:36

that they're going to have significant job displacement.

11:42

Let's just use the more neutral term here as opposed to job loss or not hiring.

11:46

And when you look, I don't know if you saw it today, there were a bunch of MAGA people saying like, oh, these interlopers in the MAGA movement are not taking into account the bottom half of the MAGA movement, the workers, people who don't own equities.

11:58

And when we look at electricity,

12:02

You were on that story last week, Chamath, or maybe it was even two weeks ago now.

12:07

The energy department just said electricity costs for residential are going to go up 4.8% this winter.

12:14

And this is going to start this anti-AI boom.

12:19

counter.

12:20

And I tweeted about this.

12:22

And I thought I would maybe end here with Elon replied to my tweet and said, AI and robotics replace all jobs.

12:28

Working will be optional, like growing your own vegetables instead of buying them from the store.

12:33

And Senator Bernie Sanders came out and said, I don't often agree with Elon Musk, but I fear that he may be right when he says AI and robotics will replace all jobs.

12:41

So what happens to workers when they have no jobs or income?

12:43

AI and robotics must benefit all humanity and not just billionaires.

12:48

And I'll stop there.

12:50

Because this, I think, feeds into your story for the last two years on this podcast, Friedberg, which is the rise of socialism.

12:58

These things, and Bernie Sanders being the standard bearer for democratic socialism, these things are starting to come together.

13:04

They're starting in people's minds, whether it's the original MAGA guy saying, well, what's going to happen for American workers, right?

13:10

We know that the Trump 2.0 agenda...

13:12

is doing great, AI build out crypto, all this great stuff, trade.

13:17

But the bottom half that you keep talking about, Friedberg, is starting to connect on this issue.

13:24

I think that you are characterizing AI automation and technological progress as the core driver of the socialists

13:36

influence.

13:37

And what I would argue is that the actual core driver of the socialist influence is the fact that we put in place a lot of people into government passed a lot of laws that caused an increase in spending because we promised people that the government would do more for them over the last 40 years.

13:52

That is not possible in a true market based system.

13:55

Oh, I agree with that.

13:56

Yeah, I agree.

13:57

And so by telling everyone, hey, we're going to make sure you get better jobs, we're going to make sure you all get housing, we're going to make sure you get education, you cannot actually get a government to effectively do that.

14:07

Because what ends up happening is the government inflates the cost of those things, and the market doesn't actually work.

14:12

So the truth is, this is now, like all other things, a scapegoat for the true cause of the socialist movement, which is that government has become too big, too unwieldy, and its natural inefficiency has distorted markets to the point that there is maybe no point of return anymore.

14:30

And people will not see that.

14:31

They do not see it.

14:32

And they're going to look for reasons.

14:33

They're going to look for scapegoats.

14:34

And they're going to say, oh, my God, look over there.

14:36

There's a robot.

14:37

That's the reason I'm losing my job.

14:39

Oh my God, look over there.

14:41

There's a rich person that works at a pharmaceutical company.

14:43

That's the reason I can't get healthcare.

14:45

Or an immigrant took my job, right?

14:46

Is the one from the last 20 years.

14:48

And so fundamentally, I think that people aren't willing to, and they're not going to see the true cause because there's no one that runs to go work as a politician that is going to raise their hand and say, government is the problem.

15:00

No one says, I need to reduce government, elect me.

15:03

No one ever has gotten elected in a democracy doing that.

15:06

So the natural course of things over 250 years is that people raise their hand and they say, I'm gonna give you more and I'm gonna use the government to do it.

15:13

And then they go into the government, they make the government bigger.

15:15

And as a result of making the government bigger, the government is spending more, the dollar goes down, the performance of the services goes down, and fundamentally, we end up in a socialist spiral.

15:24

I think it's confirmation bias for you to see that story as confirming a point of view.

15:28

I mean, it confirms what I predicted last year, that Amazon would be cutting all these jobs for robots.

15:34

That's all.

15:35

It's not confirmation bias.

15:36

It's confirming my prediction.

15:37

They haven't cut one job.

15:38

They haven't cut one job.

15:41

Actually, they have less employees now than they did three years ago.

15:43

No, not true.

15:44

Yep.

15:46

It's actually not true.

15:46

The New York Times story doesn't even say that.

15:48

You've got these like hobby horses where you keep coming back to the job loss narrative, the copyright narrative.

15:53

And then there's one story in the New York Times, which was a leaked internal document from the automation department, which doesn't even mean that it's going to happen.

16:02

This is like their sales pitch.

16:05

The barber is trying to sell you a haircut.

16:08

And you read that and you're like, oh, it confirms everything I've been saying.

16:11

What the article actually says is that they've tripled their number of employees since 2018 and they're not planning on cutting jobs if it pans out.

16:19

If the program pans out, then the rate of hiring will simply be slower.

16:23

Yeah, it's interesting you picked 2018 as the point because the actual peak employment there was 1.6 million in 2021, and it's now 1.55 in 2025.

16:32

I didn't cherry-pick that.

16:35

It has actually been flat to down, which is fine.

16:39

I'm quoting the New York Times article, which is the source for this.

16:42

Yeah, yeah.

16:42

Amazon's US workforce has more than tripled since 2018 to almost 1.2 million.

16:48

You have to read these New York Times stories carefully because they want to make the headline as salacious as possible.

16:53

And then the echo chamber wants to make it even more salacious.

16:57

And they make it a story about job loss when it really is a story about operating leverage in their business, which is a slightly more nuanced take.

17:05

Yeah, no, there's definitely nuance here.

17:07

I would believe Andy Jassy when he says we're going to be reducing jobs.

17:10

And when this chart shows that they're flat to down over the last five years and that that same trend is just happening at Google, like I just showed, because there is a static team size or slightly down team size that's occurring at all these companies.

17:21

And it is notable.

17:22

And then on top of this, which has occurred in the rearview mirror for the past five years because of COVID, return to office, and efficiencies.

17:29

They're saying, hey, we've got to come up with a way to frame these robots coming into the factory as a good thing.

17:34

So Americans don't get really upset at us, and we need to buy more Toys for Tots.

17:38

So here's the problem.

17:40

First of all, I don't believe in this job loss narrative the way that you keep portraying it.

17:44

I think it's much more nuanced and complicated.

17:46

I think Friedberg does, too.

17:49

And every time there's a story, you want to bring it up and make a story of the week.

17:53

And it's all confirmation bias.

17:55

And my point is not that Amazon isn't seeking ways to improve its operating leverage and avoid hiring more people.

18:03

Obviously they are, but the headlines that this has been turned into are so exaggerated and salacious.

18:10

And the point is they don't say in this article that they are even going to be cutting jobs.

18:16

They're simply planning to double their sales volume over this time period and hoping to not have to double their workforce.

18:23

Obviously they want to get a lot more operating leverage.

18:25

By the way, this is not something that started since AI.

18:28

And look, I'm just quoting the New York Times story.

18:32

which is not even the most reliable narrator for this.

18:34

But what they say in the story is that Amazon's been using automation for over a decade when they acquired a major company to do automation.

18:42

They've had robots running around these factories for a long time.

18:45

Yeah, 100%, yeah.

18:46

Exactly.

18:47

They're the tip of the spear.

18:49

But this is just a continuation of a trend that's been going on for the last decade, as opposed to, oh, like AI is suddenly going to cut all the jobs.

18:56

Right.

18:56

It's effectively software.

18:57

You could argue software is a job loss creator.

19:00

I think you'd be underestimating exactly what's happened with LLMs being put into robots.

19:04

We've had these robots before, but they were very purpose built.

19:06

As you pointed out many times, Friedberg, they were able to do like one very simple thing very well.

19:12

Now we're going into general robotics, like the Optimus, like the Figure.

19:15

And those are designed to be able to learn anything.

19:19

And they're going to be absolutely a game changer.

19:22

They're going to be able to do a hundred times, a thousand times what the purpose-built robots do.

19:27

So I think that's where we're probably having a little bit of a disconnect here.

19:31

These little tiny Kiva bots, I'll show you, I'll just put an image in here so we have it.

19:35

These do one thing, the Kiva bots.

19:38

Those move packages around.

19:39

That's not an Optimus going around and packing the boxes and bringing them to your front step.

19:43

Optimus is going to be really cool.

19:45

And when it comes, it's going to be really interesting in terms of all the things it can do.

19:50

But right now, that's a narrative for the future.

19:53

And it's being portrayed as something that's already happening when the current round of automation has been going on for a decade.

This Week in Startups

Ch. 3

4m 31s

37signals Deletes AWS to Save Millions

37signals co-founder David Heinemeier Hansson (DHH) details their successful strategy of moving off cloud providers like AWS and Google Cloud to self-host. The move retires a cloud bill that peaked at $3.4 million a year, highlighting a critical trade-off for founders between the speed of cloud adoption and the long-term financial control of owning infrastructure.

“We were paying AWS $3.4 million. We've taken that money, invested it into our own gear and we own it ourselves and we're not down today.”

lessons

market_analysis

0:00

And one of our crack researchers will pull up this video that was just shared on Twitter, now x.com, of...

0:08

David Heinemeier Hansson, DHH, which is his Twitter handle as well, x.com slash DHH.

0:14

Really smart cat, friend of the pod.

0:15

We got to have him back on soon.

0:17

DHH just did an analysis and he open sourced it and told everybody of how much money they're saving by getting off of other clouds.

0:26

I don't know if they're Azure or AWS, but play this video and show the chart when we have it.

0:33

This is extraordinary.

0:35

I think this is like main character energy.

0:37

Some of my producers, I think we miss this main character energy.

0:41

But DHH.

0:43

I think I had the right clip right here, Jason.

0:45

Let me get this set up for us.

0:47

By the way, great six-hour conversation between Lex Fridman and David back in the day.

0:53

Oh, the clip that I have is him standing up talking.

0:55

No, that's the one I'm talking about.

0:57

I was just also giving a promo to my friend Lex Fridman.

1:00

All right, here is DHH talking about AWS spend.

1:03

We were paying AWS $3.4 million.

1:07

We've taken that money, invested it into our own gear and we own it ourselves and we're not down today.

1:14

It was not a major crisis when AWS was down and now we can nuke it.

1:20

So let's do it.

1:20

Ready?

1:22

Let's push delete.

1:24

They literally are deleting their AWS instance.

1:27

That is such good TV.

1:29

Yeah.

1:32

He shared the table of how much he spent year after year on cloud computing, and he estimated that they are probably saving, like I said, a million or two million a year, who knows?

1:40

Let's just say if they're saving a million dollars a year over 10 years, it's $10 million to the bottom line.

1:45

It does add complexity.

1:48

there is a back and forth that all founders go through.

1:51

Should we stand up infrastructure?

1:53

Should we use other people's infrastructure?

1:55

You can go faster when you use other people's infrastructure because you don't have to take on that responsibility and build out the team when you do have unlimited resources.

2:03

And, you know, the great example of this would be Elon with xAI.

2:08

And Elon knows more about factories and physical production of items than anybody on the planet with the possible...

2:16

exception or co-leader of Flexport maybe, right?

2:20

Not Flexport.

2:21

Yeah, Flexport produces- Ryan Peterson?

2:25

No, no, that's Flexport.

2:27

Flextronics, are they the ones who the iPhone goes-

2:31

They subcontract too.

2:32

Anyway, there's China- Singaporean American multinational manufacturing company that does end-to-end advanced manufacturing, Flextronics?

2:38

Yes, Flextronics.

2:39

So Flextronics are the people who build the iPhones.

2:42

They're based in Singapore, like some companies like TikTok are based in Singapore, but they're Chinese companies.

2:49

But if you look at this chart,

2:52

They have their AWS and Google bill in the total.

2:54

And at their peak, they're spending $3.4 million in 2019.

3:00

The total they spent between AWS and Google from 2017 to 2025, $21 million.
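Editor's sketch: the figures quoted in this chapter can be put side by side with a few lines of arithmetic. Only the $21 million total for 2017 to 2025, the $3.4 million peak, and the host's "a million dollars a year over 10 years" scenario come from the discussion; the variable names are illustrative, and the sketch does not account for the cost of the purchased hardware, which is not stated here.

```python
# Figures quoted in the discussion, in millions of US dollars.
total_cloud_spend = 21.0        # AWS + Google, 2017 through 2025
first_year, last_year = 2017, 2025
peak_annual_spend = 3.4         # peak annual bill, as quoted above

years = last_year - first_year + 1          # both endpoints included
average_annual_spend = total_cloud_spend / years

# The host's hypothetical: saving $1M a year, held for ten years.
assumed_annual_saving = 1.0
ten_year_saving = assumed_annual_saving * 10

print(years, round(average_annual_spend, 2), ten_year_saving)
# prints: 9 2.33 10.0
```

The average bill of roughly $2.3 million a year sits between the host's "a million or two" savings guess and the $3.4 million peak, which is why the quoted savings figures vary depending on which year is used as the baseline.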

3:08

And so there you have it, folks.

3:10

You could potentially save a lot of money doing this, but you will be slowed down.

3:14

So you've got to be thoughtful about it.

3:16

That extra million dollars they spent a year, let's say, probably well worth it if you're building, you know, I think they probably have 30, 40, 50 million dollars in revenue, 37signals.

3:26

So if they got 50 million in revenue, you don't sweat the million dollars in expense if you're trying to go fast and you've got competitors.

3:32

However, at some point, you may want to look at it and optimize.

3:35

This is why Oracle...

3:37

is being so aggressive.

3:38

Oracle's going to people saying, give us your cloud bill, whatever your cloud bill is, we're going to cut it in half.

3:44

And so I just, I know some of the people at Oracle and they are aggressively, like Google Cloud and Azure were very aggressive last couple of years, trying to increase their growth rate and compare it to AWS.

3:57

All of them are growing.

3:58

They're all incredible businesses.

3:59

Well, that's because the cloud is just doing insanely awesome things.

4:02

I mean, dear God, can you imagine?

4:04

What happened when an AWS region went down?

4:07

The entire world ground to a halt.

4:09

That's how much we depend on these guys.

4:10

And that's when you can be multi-cloud.

4:11

You can just go multi-cloud at some point and distribute a cloud and you can move from one to the other.

4:18

You can use your own internal infrastructure for some things that are cheaper.

4:22

And you can have hybrid cloud, right?

4:24

So there's a lot of options here.
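
The multi-cloud and hybrid options described above can be sketched as a simple failover choice: try each backend in priority order and route to the first one that reports healthy. This is a minimal illustration, not how any particular provider works; the backend names and the health-check callables are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each backend is a (name, health_check) pair. The names and checks here are
# hypothetical stand-ins, not real provider APIs.
Backend = Tuple[str, Callable[[], bool]]


def pick_backend(backends: Sequence[Backend]) -> Optional[str]:
    """Return the name of the first backend whose health check passes."""
    for name, is_healthy in backends:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated like an unhealthy backend.
            continue
    return None


# Simulated state: the primary cloud is down, the alternatives are up.
backends = [
    ("primary-cloud", lambda: False),
    ("secondary-cloud", lambda: True),
    ("own-datacenter", lambda: True),
]

chosen = pick_backend(backends)
print(f"Routing traffic to: {chosen}")
```

In practice the hard part is keeping data and configuration in sync across backends so that the switch is actually possible, which is the added complexity the speakers mention.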

4:25

Very important for founders, I think, to not take this on early.

4:29

I don't think it's an advantage in the first three or four years at all.

The AI Breakdown: Daily Artificial Intelligence News and Discussions

Ch. 4

2m 53s

Oracle's $38 Billion Bet: Compute as the New Safe Asset

Oracle secured a record $38 billion debt deal to fund two massive AI data centers for OpenAI, highlighting the intense financialization of compute infrastructure. This deal, structured with four-year maturities and high interest rates, signals that private credit markets view AI-driven demand and data center cash flows as the new stable, safe assets.

“A consortium of banks are putting together a $38 billion debt deal, which will be the largest financing deal to date for AI Infra.”

trends

business_strategy

0:00

We kick off today with an update in the AI infrastructure space.

0:08

Bloomberg reports a consortium of banks are putting together a $38 billion debt deal, which will be the largest financing deal to date for AI Infra.

0:16

The deal is split into two tranches, $23.25 billion associated with a project in Texas, and a $14.75 billion package that will fund a data center in Wisconsin.

0:26

Both data centers are being developed by Vantage Data Centers and will be operated by Oracle to provide compute for OpenAI.

0:32

The institutions underwriting the deal include a laundry list of the world's largest banks, including JPMorgan, Wells Fargo, Goldman Sachs, Société Générale, and Mitsubishi UFJ.

0:41

The banks will sell the debt on to high-net-worth clients, private credit firms, and pension funds.

0:46

Now, right now, data center debt is red hot, so there will likely be no shortage of buyers to snatch up the record-breaking deal.

0:52

Earlier this month, Meta closed a $27 billion deal with PIMCO as the major buyer, and that debt surged once it started trading on public markets, making PIMCO $2 billion in paper gains.

1:02

The deal also gives us some insight into how data center financing is being structured.

1:07

Both tranches have four-year maturities with two one-year extension options.

1:10

Sources said they're priced at 2.5 percentage points above the benchmark, so likely between 6.5 and 7% interest rates.
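
The deal terms quoted above can be checked with a little arithmetic. The sketch below confirms that the two tranches add up to $38 billion, backs out the benchmark rate implied by the 2.5-point spread, and estimates yearly interest. It assumes simple annual interest on the full amount, fully drawn, with no fees or rate resets, which is a simplification rather than the actual loan mechanics.

```python
from decimal import Decimal

# Figures quoted in the episode, in billions of US dollars.
texas_tranche = Decimal("23.25")
wisconsin_tranche = Decimal("14.75")
total_debt = texas_tranche + wisconsin_tranche

spread = Decimal("2.5")                                # points above benchmark
rate_low, rate_high = Decimal("6.5"), Decimal("7.0")   # quoted all-in range, %

# Benchmark rate implied by the quoted range and spread.
benchmark_low = rate_low - spread
benchmark_high = rate_high - spread

# Assumption: simple annual interest on the full amount, fully drawn,
# ignoring fees, amortization, and any floating-rate resets.
interest_low = total_debt * rate_low / 100
interest_high = total_debt * rate_high / 100

print(f"Total debt: ${total_debt}B")
print(f"Implied benchmark: {benchmark_low}% to {benchmark_high}%")
print(f"Annual interest: ${interest_low:.2f}B to ${interest_high:.2f}B")
```

On those assumptions, the implied benchmark is 4.0% to 4.5%, and the interest bill would be roughly $2.5 billion to $2.7 billion a year.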

1:17

Now, there are a lot of interpretations of this.

1:19

Of course, the people who want to see an AI bubble say, OMG, look at the size of that debt deal.

1:23

Debt is coming in.

1:24

That must mean it's bad.

1:26

There's a real tyranny of big numbers for those folks.

1:28

On the other hand are people who realize that, at least right now, there is enormous demand in the markets for this sort of debt.

1:35

Private credit just has a voracious appetite for this.

1:37

And so to them, in short, this is fine.

1:39

Endgame Macro notes that it is part and parcel of a larger paradigm shift.

1:43

They write, This is about the new arms race in AI infrastructure and who can lock in the physical and financial foundation of that ecosystem first.

1:50

Data centers have become the modern versions of oil fields.

1:53

Whoever controls the power, cooling, and fiber capacities controls the economy that runs on them.

1:57

Oracle's $38 billion debt sale is an attempt to seize that ground before it gets fully priced out.

2:02

They continue, Oracle is using leverage to buy its way to the front of the line, converting future AI workloads into guaranteed bond-financeable cash flows.

2:09

It's turning the data center business into a quasi-utility model with stable, contracted revenue in exchange for enormous upfront capex.

2:16

They also point out something else important.

2:18

There's a deeper macro signal here too.

2:20

While the government is issuing hundreds of billions in treasuries, private credit markets are happily absorbing corporate infrastructure debt like this.

2:26

That tells you investors are betting that AI-driven demand will hold up, even if the broader economy slows.

2:32

That the cash flows from training and hosting large models are the new safe assets.

2:36

Ultimately, that means, they write, the deal represents two overlapping forces, the financialization of compute and the monopolization of digital energy.

2:43

Oracle is trying to own the pipes and power sources that the next economy will run through.

2:47

And by funding it with record debt, it's making a massive bet that AI demand will become the backbone of global economic growth itself.

This Week in Startups

Ch. 5

7m 15s

The Anthropic-Google Compute Paradox

Anthropic, creator of the Claude LLM, struck a deal worth billions to purchase up to 1 million of Google's proprietary TPUs, despite Google's Gemini being a direct competitor. This paradoxical partnership reveals that in the current AI arms race, the immediate need for scarce, specialized compute power overrides traditional competitive rivalry.

“The company has tapped Google for an enormous compute purchase. They're going to get up to 1 million of Google's TPUs in a deal worth up to tens of billions of dollars.”

technology

predictions

0:00

The big news this week on the Anthropic front, Jason, is that the company has tapped Google for an enormous compute purchase.

0:07

They're going to get up to 1 million of Google's TPUs in a deal worth up to tens of

0:12

billions of dollars, and it's going to bring on more than a gigawatt of capacity to the company in 2026.

0:20

Now, reading this news, I know a lot of folks out there might not know what a TPU is.

0:25

So I went ahead and did a little prep work and asked producer Claude, what is that?

0:29

Well, it's an ASIC, or application-specific integrated circuit,

0:33

that is essentially designed just for these machine learning workloads.

0:38

Basically, they're designed to do a lot of mathematical operations very, very quickly.

0:43

And that's what powers the neural nets that we all love and know.
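
The "mathematical operations" in question are, for the most part, the multiply-and-add steps of matrix multiplication, which is the core of a neural network layer. The plain-Python sketch below shows that operation on a toy example; it illustrates the arithmetic only and says nothing about how a TPU is actually built. The input and weight values are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices using nested loops of multiply-accumulate steps."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate
            out[i][j] = acc
    return out


# Toy "layer": 2 input examples with 3 features each, and a 3x2 weight matrix.
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]

outputs = matmul(inputs, weights)
print(outputs)
```

Even this tiny example takes 12 multiply-accumulate steps; real models need vastly more per request, which is why hardware specialized for this one pattern is attractive.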

0:47

And just for a little background here, they've been working on the TPU since I think 2013, something around there.

0:52

So they've been in the game for a long time.

0:54

If you want to use Claude like we do, you can go to claude.ai/twist

0:58

to get started.

0:58

We have a 50% discount for three months.

1:00

But Jason, essentially Google TPUs are better.

1:02

They're more efficient.

1:03

They have lower costs.

1:04

And they're tailored for this kind of work.

1:06

But more to the point, I'm shocked that Anthropic is kind of putting aside its long-term deal with Amazon, its preferred partner for both training and inference, to go rack up with Google.

1:16

I was curious what you thought about why they made this choice.

1:20

I think any...

1:24

Any compute advantage these large language models can get, they're going to take.

1:31

What's fascinating about this one is that Gemini, Google, the Gemini large language model, Anthropic, and Claude, they're direct competitors.

1:44

So this is perplexing, interesting, confounding, strange bedfellows.

1:52

Yeah.

1:53

If you're Anthropic and you're giving money and doing this partnership with Google and you're both competing for customers, users, businesses, developers, and APIs, it's kind of interesting.

2:08

And as we say in our industry, no conflict, no interest.

2:12

What this signals to me is the Anthropic team and the Google team are collaborating.

2:18

They're in like with each other.

2:21

They might not be in love, but they're in like with each other.

2:24

Who else is in like with each other?

2:26

Satya Nadella and OpenAI were in like, were in love with each other.

2:30

I think they've dropped down to in like with each other.

2:32

And I noticed Satya Nadella and Elon are in like with each other.

2:37

So, and Nvidia is in love with everybody.

2:41

Jensen loves everybody equally.

2:43

But this is the place we are.

2:46

The other thing I think is super interesting is amongst the LLM companies, some have money printing machines.

2:56

So if you were to look at the foundation models, you've got Grok, OpenAI, Anthropic, Mistral.

3:05

Yeah, Mistral.

3:05

Mistral.

3:07

I guess I'm including them.

3:12

And you have Llama, what Zuck is working on.

3:16

I think that's probably your big six.

3:18

Am I missing anybody there?

3:20

I mean, we could bring up the Amazon and Microsoft models, but given that you and I always forget the names of those families of models.

3:26

Yeah, I don't see them coming up yet.

3:28

So if we just take those six, look at those six.

3:31

Of those six, which ones have money printing machines?

3:37

Which ones have money printing machines?

3:39

Profitable machines, earnings machines.

3:42

Well, Gemini.

3:43

Google, correct.

3:45

And Meta.

3:46

Okay.

3:47

This puts them at a significant long-term advantage.

3:52

Meta and Google can build infrastructure at a pace...

3:57

that the other four have to go raise money for.

4:01

So let's pause on this for a second.

4:04

Now, one would argue Gemini is probably in third or fourth place typically, and I think Llama is typically in sixth place or fifth or sixth place.

4:14

So the companies at the top are obviously OpenAI, and then I would say Claude and Grok, that's probably your one, two, and three.

4:23

And then your four, five, and six, Mistral, probably Llama's six, Mistral's five, and yeah.

4:30

So you can start to look at this, and I wonder if Google and Meta have a huge advantage in that they could start deploying hardware at a scale that the others cannot.

4:43

Now, the others are the belles of the ball right now because they have the best product, and people want to be on their cap table.

4:50

And then this makes me wonder why is Apple with their war chest and Microsoft with their war chest, not participating in this lunacy of building out huge data centers, huge infrastructure.

5:04

I guess Google is, I wonder if Microsoft too, but it's less speculative for them, Jason, because they talked about it in earnings and they've said, we just have the inference demand and we're just building against kind of like price.

5:14

proven bookings.

5:16

So for them, I'm not that concerned.

5:17

But I think that the dichotomy between the money printers and the money needers is very important because when you think about xAI and OpenAI, they are spending tens of billions of dollars on infra.

5:29

Anthropic isn't.

5:31

They're raising money from the major cloud companies.

5:33

Amazon, I think $8 billion.

5:35

Google, about $3 billion.

5:37

And they're getting access to compute.

5:38

So I kind of wonder, and I don't want to sound too nice to anyone here, but is Anthropic kind of nailing this because they're not taking on the infra side quest?

5:49

Yeah, partnerships, I think, are a way to win here.

5:54

If you want to go far, you go together.

5:56

If you want to go fast, you go alone.

5:59

So going alone, I think, will give you a massive speed advantage in the short term.

6:08

Of course, you might be inspiring the value to be

6:12

accreting to your partner, not you.

6:14

And this is the OpenAI Microsoft relationship personified.

6:19

Who's getting the value?

6:21

OpenAI has the brand, ChatGPT, that has the largest number of users, but their percentage of market share has been going down consistently.

6:29

Now they're still growing because the pie is getting bigger.

6:32

But if you look at, they were 98% of the market three years ago.

6:37

I don't know where they are now, but I think they're probably 80% of the market now.

6:41

in terms of consumers and developers using this stuff, they're going to keep going down on a percentage basis.
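
The point that a company can lose share and still grow is easy to see with numbers. In the sketch below, the 98% and 80% shares are the speaker's rough estimates, and the market sizes are purely hypothetical values chosen to show the effect of a growing pie.

```python
# Shares are the speaker's rough estimates; the market sizes are hypothetical,
# chosen only to illustrate a market that has tripled.
share_then_pct, share_now_pct = 98, 80
market_then, market_now = 100, 300  # arbitrary units of total users

users_then = share_then_pct * market_then / 100
users_now = share_now_pct * market_now / 100
growth = users_now / users_then

print(f"Users then: {users_then}, users now: {users_now}")
print(f"Share fell {share_then_pct - share_now_pct} points, "
      f"but users grew {growth:.2f}x")
```

With these made-up market sizes, an 18-point drop in share still leaves the leader with well over twice as many users as before.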

6:46

Just like Amazon Web Services,

6:49

had the game to themselves before Azure and... Quite literally.

6:52

Yeah, literally.

6:52

And before Azure and Google Cloud, Oracle came on strong.

6:57

So now you have four people competing for that, three major and Oracle coming on strong.

7:01

Oracle's really not going to take this lying down.

7:04

They see cloud computing as a big accelerant of their business, obviously.

7:10

So really interesting.

7:11

I do think these kind of partnerships mean more stability and...

Topics

Key players, companies, and concepts in this digest

AWS

company

Amazon Web Services, the market leader in cloud infrastructure, recently experienced a major 15-hour outage due to a technical flaw.

DynamoDB

concept

A proprietary NoSQL database service from AWS whose DNS management system failure caused the widespread outage via a race condition.

37signals

company

A software company that successfully moved off hyperscale cloud infrastructure to self-hosting, saving millions and gaining reliability.

David Heinemeier Hansson (DHH)

person

Co-founder of 37signals who publicly detailed the multi-million dollar cost savings achieved by leaving AWS and Google Cloud.

Oracle

company

A major cloud competitor aggressively pursuing market share by offering deep discounts and securing record debt deals to fund massive AI infrastructure for partners like OpenAI.

Anthropic

company

A leading AI lab and competitor to Google and OpenAI, forced to rely on Google's proprietary TPUs for billions in compute power due to scarcity.

Google TPUs

technology

Google's proprietary application-specific integrated circuits (ASICs), designed specifically for machine learning workloads, representing a key bottleneck in the AI compute arms race.

Multi-Cloud Strategy

concept

The enterprise strategy of utilizing multiple cloud providers to mitigate risk, which is accelerating rapidly following recent major outages.

AI Infrastructure Debt

concept

The financialization of data center construction through massive debt deals (e.g., Oracle's $38B) where compute capacity is viewed as a stable, utility-like asset.

About this digest

Release notes

We remix the strongest podcast storytelling into a tight, twice-weekly digest. These notes highlight when this edition shipped and how to reference it.

Published: 10/28/2025
Last updated: 10/28/2025
Category: tech
Chapters: 5
Total listening time: 38 minutes
Keywords: the cloud infrastructure wars: cost, fragility, and compute power