1
00:00:00,040 --> 00:00:02,520
Quick note, this episode isn't 
sponsored. 

2
00:00:02,680 --> 00:00:06,760
I'm building a new kind of IDE 
for the AI era called Rex.

3
00:00:07,000 --> 00:00:09,440
If it's interesting, the link is
in the description. 

4
00:00:09,720 --> 00:00:12,400
OK, let's unpack this. 
You hear serverless, right? 

5
00:00:12,840 --> 00:00:17,160
And the pitch is it's basically 
magic, right? 

6
00:00:17,200 --> 00:00:19,640
No servers to manage, infinite
scaling. 

7
00:00:19,640 --> 00:00:23,120
And the best part, the part
that gets everyone to sign up is

8
00:00:23,240 --> 00:00:26,760
you only pay for what you use. 
Sounds, well, perfect. 

9
00:00:26,760 --> 00:00:29,000
It does sound perfect until the 
end of the month rolls around. 

10
00:00:29,120 --> 00:00:31,640
Exactly. 
The magic fades pretty fast when

11
00:00:31,640 --> 00:00:35,240
you open that AWS invoice and 
realize pay for what you use 

12
00:00:35,240 --> 00:00:38,040
actually means pay for every 
millisecond you didn't realize 

13
00:00:38,040 --> 00:00:40,120
you were wasting. 
That is usually how the story 

14
00:00:40,120 --> 00:00:42,080
goes. 
You start with this, this dream 

15
00:00:42,080 --> 00:00:45,800
of efficiency, and you end up 
with a bill that makes your CFO 

16
00:00:45,800 --> 00:00:49,720
want to have a very serious, 
very uncomfortable conversation 

17
00:00:49,720 --> 00:00:52,400
with you. 
So that is the mission for this 

18
00:00:52,400 --> 00:00:54,640
deep dive. 
We are strictly looking at the 

19
00:00:54,640 --> 00:00:56,240
bottom line. 
Today we've pulled together a 

20
00:00:56,240 --> 00:00:59,600
stack of technical reports, the 
latest AWS pricing guides, and 

21
00:00:59,600 --> 00:01:03,040
some really rigorous 
architectural analysis from 

22
00:01:03,040 --> 00:01:05,560
experts like CloudZero, Edge
Delta, and Lumigo. 

23
00:01:05,960 --> 00:01:09,360
We want to stop guessing how AWS
Lambda charges work and start 

24
00:01:09,360 --> 00:01:13,280
engineering them to be cheaper. 
And to do that, you have to 

25
00:01:13,280 --> 00:01:16,800
adopt a specific mindset.
It's not just about writing code

26
00:01:16,800 --> 00:01:18,520
anymore. 
It's about understanding the 

27
00:01:18,520 --> 00:01:21,840
machinery underneath the code. 
It's about, you know, looking at

28
00:01:21,840 --> 00:01:25,240
your logs, seeing where the 
money is bleeding out, and

29
00:01:25,240 --> 00:01:29,440
sometimes realizing the best way
to save money on Lambda is to 

30
00:01:29,440 --> 00:01:32,400
not use Lambda at all.
I love that. The best Lambda

31
00:01:32,400 --> 00:01:35,320
is no Lambda. 
But OK, before we get to the 

32
00:01:35,320 --> 00:01:37,040
philosophy, we have to start 
with the paradox. 

33
00:01:37,280 --> 00:01:39,920
There is this core tension in 
the research we looked at. 

34
00:01:40,560 --> 00:01:44,160
It's this idea that making your 
function smaller, you know, 

35
00:01:44,160 --> 00:01:47,120
giving it less memory, should 
logically save you money. 

36
00:01:47,400 --> 00:01:49,800
But we're finding that might 
actually double your bill. 

37
00:01:49,800 --> 00:01:52,760
It's a classic trap. 
It is intuitively correct, less 

38
00:01:52,760 --> 00:01:57,080
resources equals less money. 
But in the world of Lambda it is

39
00:01:57,760 --> 00:02:00,040
technically wrong. 
We are going to dig into that 

40
00:02:00,040 --> 00:02:02,320
because it kind of breaks my 
brain a little bit, but first we

41
00:02:02,320 --> 00:02:05,160
have to eat our vegetables. 
We need to look at the billing 

42
00:02:05,160 --> 00:02:08,520
equation itself because if you 
don't understand the unit of 

43
00:02:08,520 --> 00:02:10,680
measurement, you can't optimize 
it. 

44
00:02:10,680 --> 00:02:11,760
You
have to know the rules of the

45
00:02:11,760 --> 00:02:14,240
game to win it. 
So lay it out for me. When I look

46
00:02:14,240 --> 00:02:16,800
at that bill, what are the
actual levers? 

47
00:02:17,080 --> 00:02:18,920
There are essentially two main
levers. 

48
00:02:19,000 --> 00:02:22,120
First, you've got requests. 
This is just a flat fee. 

49
00:02:22,120 --> 00:02:23,680
OK? 
Every time your Lambda wakes up 

50
00:02:23,680 --> 00:02:26,160
to do something, whether it 
succeeds, fails,

51
00:02:26,160 --> 00:02:28,160
or times out.
That's one request. 

52
00:02:28,680 --> 00:02:33,080
Currently the going rate is like
$0.20 per million requests.

53
00:02:33,080 --> 00:02:35,640
Which, to be honest, sounds 
incredibly cheap. 

54
00:02:35,920 --> 00:02:38,320
I mean, $0.20 for a million
invocations?

55
00:02:38,320 --> 00:02:39,920
I feel like I could just ignore 
that. 

56
00:02:39,960 --> 00:02:42,680
For many people you can. 
I mean, unless you're operating 

57
00:02:42,680 --> 00:02:48,080
at massive, massive scale, like ad
tech levels or something, that

58
00:02:48,080 --> 00:02:50,120
request cost is often just 
noise. 

59
00:02:50,240 --> 00:02:52,840
So where's the real money? 
The real money, the place where 

60
00:02:52,840 --> 00:02:56,000
budgets go to die, is the second
lever: duration.

61
00:02:56,080 --> 00:02:57,560
The time the code is actually 
running. 

62
00:02:57,800 --> 00:02:59,960
Precisely. 
But here is where it gets a 

63
00:02:59,960 --> 00:03:02,600
little nuanced. 
You aren't just paying for time.

64
00:03:02,600 --> 00:03:05,480
You don't pay per second.
You are paying for a compound 

65
00:03:05,480 --> 00:03:08,720
unit called GB-seconds.
OK, GB-seconds. Unpack that for

66
00:03:08,720 --> 00:03:10,640
me. 
It's basically a multiplication 

67
00:03:10,640 --> 00:03:12,640
game. 
You pay for the amount of memory

68
00:03:12,640 --> 00:03:15,840
you allocated to the function,
multiplied by how long the code

69
00:03:15,840 --> 00:03:17,680
runs. 
So let's say you have a function

70
00:03:17,680 --> 00:03:20,840
configured with 1 gig of RAM. 
OK, if you run that for one 

71
00:03:20,840 --> 00:03:23,040
second, the price is, let's call
it X. 

72
00:03:23,600 --> 00:03:25,920
But if you configure that 
function with two gigs of RAM 

73
00:03:26,080 --> 00:03:29,120
and run it for that same second,
the price is 2X. 

74
00:03:29,120 --> 00:03:32,800
So you're paying for the size of
the container and the time it 

75
00:03:32,800 --> 00:03:33,920
exists. 
Exactly. 

76
00:03:33,920 --> 00:03:38,280
It's volume times duration, and 
there's a crucial detail here 

77
00:03:38,280 --> 00:03:42,800
that changed fairly recently. 
It used to be that AWS rounded 

78
00:03:42,800 --> 00:03:46,040
your duration up to the nearest 
100 milliseconds. 

79
00:03:46,040 --> 00:03:48,920
Oh right, I remember this. 
So if my code finished in like 

80
00:03:49,280 --> 00:03:51,600
12 milliseconds, I was paying 
for 100. 

81
00:03:51,600 --> 00:03:54,400
You paid for 100. 
If it ran for 101 milliseconds, 

82
00:03:54,400 --> 00:03:56,800
you paid for 200. 
There was so much waste. 

83
00:03:57,040 --> 00:03:59,080
That feels like the old mobile 
phone plans,

84
00:03:59,080 --> 00:04:02,480
where you paid by the minute.
It was exactly like that, but 

85
00:04:02,480 --> 00:04:05,560
now billing is rounded to the 
nearest single millisecond. 

86
00:04:05,760 --> 00:04:08,360
Wow, so that changes the 
incentive structure completely? 

87
00:04:08,440 --> 00:04:10,880
It does. 
It means micro-optimizations

88
00:04:10,880 --> 00:04:14,680
actually pay off now. 
Before, shaving 20 milliseconds

89
00:04:14,680 --> 00:04:16,959
off your code was just vanity. It
didn't change the bill.

90
00:04:17,120 --> 00:04:20,920
Now it's direct savings. 
Every millisecond you cut is 

91
00:04:20,920 --> 00:04:25,000
money kept in your pocket. 
However, nothing is ever purely 

92
00:04:25,000 --> 00:04:27,680
good news, is it? 
There was a scary note in that 

93
00:04:27,680 --> 00:04:31,960
Edge Delta report about a new 
tax involving the initialization

94
00:04:31,960 --> 00:04:34,480
phase. 
Yes, the init billing change.

95
00:04:34,760 --> 00:04:36,880
This kicked in around August
2025. 

96
00:04:36,920 --> 00:04:38,920
Right, so walk me through the 
life cycle. 

97
00:04:38,920 --> 00:04:40,600
A request comes in. 
What happens?

98
00:04:40,600 --> 00:04:42,760
The Lambda life cycle has two
distinct parts. 

99
00:04:42,760 --> 00:04:45,040
First you have the init.
This is the cold start. 

100
00:04:45,040 --> 00:04:48,920
OK, the container has to spin 
up, the OS loads, it downloads 

101
00:04:48,920 --> 00:04:50,400
your code, and it starts the 
runtime. 

102
00:04:50,840 --> 00:04:53,800
Then once that's ready, you 
enter the invoke phase where 

103
00:04:53,800 --> 00:04:55,440
your actual handler function 
runs. 

104
00:04:55,920 --> 00:04:58,600
For the longest time AWS didn't 
charge for that first part,

105
00:04:58,600 --> 00:05:00,320
right?
The init was on the house.

106
00:05:00,320 --> 00:05:03,240
It was a free lunch, and developers
loved it, especially those using

107
00:05:03,240 --> 00:05:06,680
heavy languages like Java or C#.
You could have a massive Spring 

108
00:05:06,680 --> 00:05:10,160
Boot application that took 5 or
6 seconds just to wake up and 

109
00:05:10,320 --> 00:05:12,480
AWS just ate that cost. 
But that's gone now. 

110
00:05:12,760 --> 00:05:14,600
Gone. 
You now pay for the 

111
00:05:14,600 --> 00:05:17,200
initialization phase too. 
So if you're running a heavy 

112
00:05:17,200 --> 00:05:20,480
Java app that takes 5 seconds to
wake up, yeah, you are paying 

113
00:05:20,480 --> 00:05:23,840
for those 5 GB seconds every 
single time a cold start 

114
00:05:23,840 --> 00:05:24,680
happens. 
Ouch. 

115
00:05:25,160 --> 00:05:27,280
So looking at your logs becomes 
critical here. 

116
00:05:27,280 --> 00:05:30,480
You need to see how long that 
init phase is actually taking. 

117
00:05:30,560 --> 00:05:32,960
Exactly. 
If you check your CloudWatch

118
00:05:32,960 --> 00:05:36,240
logs and see a massive spike in 
duration during cold starts, you

119
00:05:36,240 --> 00:05:38,320
have a problem. 
You might need to look at 

120
00:05:38,320 --> 00:05:40,360
mitigation strategies like 
SnapStart.

121
00:05:40,640 --> 00:05:42,720
Which is like a memory snapshot.
Right, right. 

122
00:05:42,720 --> 00:05:44,880
It resumes instantly. 
Or you could use provisioned

123
00:05:44,880 --> 00:05:46,840
concurrency. 
That's where you pay to keep 

124
00:05:46,840 --> 00:05:48,480
them warm. 
You can, yes. 

125
00:05:48,920 --> 00:05:51,440
That's basically paying AWS to 
keep a certain number of 

126
00:05:51,440 --> 00:05:53,120
environments warm and ready to 
go. 

127
00:05:53,200 --> 00:05:56,000
That sounds like the solution. 
Just pay to keep the lights on. 

128
00:05:56,120 --> 00:06:00,120
It solves the latency issues, 
sure, but be very, very careful.

129
00:06:00,520 --> 00:06:04,160
You are effectively moving from 
a serverless pay-per-use model

130
00:06:04,440 --> 00:06:07,040
back to a server model where 
you're paying per hour. 

131
00:06:07,800 --> 00:06:10,160
So if your traffic drops at 
night, but you're paying for 

132
00:06:10,160 --> 00:06:12,680
provisioned concurrency, you're
just burning cash. 

133
00:06:12,680 --> 00:06:14,960
So it's a double-edged sword.
OK, so we have the billing 

134
00:06:14,960 --> 00:06:16,840
basics down. 
Request counts are cheap, 

135
00:06:16,840 --> 00:06:20,160
duration is where the pain is, 
and we're paying for init time 

136
00:06:20,160 --> 00:06:21,520
now. 
That's the foundation. 

137
00:06:21,680 --> 00:06:24,080
Now I want to circle back to 
that paradox you teased at the 

138
00:06:24,080 --> 00:06:27,880
beginning. 
The aha moment regarding memory.

139
00:06:27,880 --> 00:06:30,560
The memory duration paradox. 
This is where most people get 

140
00:06:30,560 --> 00:06:33,040
Lambda optimization completely 
wrong. 

141
00:06:33,040 --> 00:06:35,040
So play devil's advocate with 
me. 

142
00:06:35,160 --> 00:06:37,240
I'm a developer. 
I'm looking at the console. 

143
00:06:37,240 --> 00:06:41,320
I see a slider for memory. 
Logic tells me I'm paying for GB

144
00:06:41,320 --> 00:06:44,200
seconds. 
If I cut my memory from 1 gig to

145
00:06:44,200 --> 00:06:47,440
512, I'm paying half the rate 
per second, therefore I save 

146
00:06:47,440 --> 00:06:49,600
50%. 
Why is that wrong? 

147
00:06:49,640 --> 00:06:54,040
It's wrong because in AWS Lambda
you cannot decouple memory from 

148
00:06:54,040 --> 00:06:56,200
CPU. 
Memory equals power. 

149
00:06:56,400 --> 00:06:59,080
Memory equals power. 
When you slide that memory 

150
00:06:59,080 --> 00:07:01,640
toggle in the console, you 
aren't just giving the function 

151
00:07:01,640 --> 00:07:05,160
more RAM to store variables, you
are proportionally allocating 

152
00:07:05,160 --> 00:07:07,200
more CPU cycles and network 
bandwidth. 

153
00:07:07,360 --> 00:07:10,640
Oh interesting, so a 128-megabyte
function isn't just small 

154
00:07:10,640 --> 00:07:14,720
memory, it's weak CPU. 
It is incredibly weak. 

155
00:07:14,800 --> 00:07:17,400
You are getting a tiny sliver of
a processor. 

156
00:07:17,840 --> 00:07:21,480
In fact, the sources highlight a
very specific magic number you 

157
00:07:21,480 --> 00:07:26,240
want to remember. 
It's roughly 1,769 megabytes.

158
00:07:26,440 --> 00:07:30,400
That is a suspiciously specific 
number, 1769. 

159
00:07:30,520 --> 00:07:33,320
It is. At around 1.8 gigs of
memory,

160
00:07:34,120 --> 00:07:37,920
AWS allocates your function the
equivalent of one full vCPU.

161
00:07:37,920 --> 00:07:40,280
And below that. 
Below that you are fighting for 

162
00:07:40,280 --> 00:07:44,240
fractional CPU time. 
At 128 megabytes you might only

163
00:07:44,240 --> 00:07:48,480
be getting like 7% of a core. 
So let's play that out in a real

164
00:07:48,480 --> 00:07:51,080
scenario. 
Say I have a CPU-intensive task

165
00:07:51,080 --> 00:07:54,440
like resizing an image or 
parsing a massive JSON file.

166
00:07:54,720 --> 00:07:56,800
If I put that on the 128 MB
setting...

167
00:07:56,840 --> 00:07:59,240
It's going to struggle, yeah. 
It might take 10 seconds to 

168
00:07:59,240 --> 00:08:01,400
process because the CPU is just 
totally bottlenecked. 

169
00:08:01,400 --> 00:08:03,880
You're paying a low rate, sure, 
but you're paying it for 10 long

170
00:08:03,880 --> 00:08:04,400
seconds. 
What if

171
00:08:04,400 --> 00:08:05,840
I bump that memory up to two
gigs?

172
00:08:05,840 --> 00:08:09,320
Then you have a full vCPU.
That same task might finish in 

173
00:08:09,320 --> 00:08:11,720
what, 200 milliseconds? 
Wait, so let me do the mental 

174
00:08:11,720 --> 00:08:13,240
math. 
The rate per second went up 

175
00:08:13,760 --> 00:08:16,920
maybe 10 or 15 times, but the 
duration dropped by like 50 

176
00:08:16,920 --> 00:08:18,040
times. 
Exactly. 

177
00:08:18,080 --> 00:08:20,840
You are paying a higher rate for
a much shorter time. 

178
00:08:21,240 --> 00:08:24,880
When you multiply it out, the
total cost, the GB-seconds, is

179
00:08:24,880 --> 00:08:27,240
actually lower. 
At the higher memory setting you

180
00:08:27,240 --> 00:08:28,720
get the job done faster and 
cheaper. 

181
00:08:28,880 --> 00:08:32,120
That is wild. 
It completely flips the cloud 

182
00:08:32,120 --> 00:08:35,360
optimization mindset of turn it 
off or turn it down. 

183
00:08:35,480 --> 00:08:37,360
It does, yeah, but there is a 
catch. 

184
00:08:37,720 --> 00:08:41,919
This works for CPU-bound tasks.
If your function is just, you 

185
00:08:41,919 --> 00:08:45,680
know, waiting for a database to 
respond, adding more CPU doesn't

186
00:08:45,680 --> 00:08:47,760
make the database faster. 
Right, you're just paying more 

187
00:08:47,760 --> 00:08:49,440
to wait. 
You're just paying more to wait.

188
00:08:49,440 --> 00:08:50,400
So
how do I know?

189
00:08:50,400 --> 00:08:51,920
I mean, I can look at the logs, 
right? 

190
00:08:52,000 --> 00:08:55,000
Yes, start with your logs. 
Look for two things. 

191
00:08:55,240 --> 00:08:59,240
First, look at Max Memory Used.
If you provision 2 gigs but your

192
00:08:59,240 --> 00:09:02,880
function only ever uses 100 megs,
you might be over-provisioned.

193
00:09:02,880 --> 00:09:05,120
But again, remember the CPU link
right? 

194
00:09:05,360 --> 00:09:09,200
Second, look for timeouts.
If you see your function timing 

195
00:09:09,200 --> 00:09:11,880
out at the lower memory 
settings, it means it's just not

196
00:09:11,880 --> 00:09:13,520
powerful enough to finish the 
job. 

197
00:09:13,960 --> 00:09:17,920
You are paying for the execution
up to the timeout and getting zero

198
00:09:17,920 --> 00:09:21,320
value from it. 
So do I just guess? Let's try one

199
00:09:21,320 --> 00:09:23,080
and a half gigs today and see what
happens? 

200
00:09:23,560 --> 00:09:26,520
Please do not guess. You'll go
crazy trying to manually 

201
00:09:26,520 --> 00:09:29,680
benchmark this. 
There's a fantastic open source 

202
00:09:29,680 --> 00:09:34,720
tool mentioned in the reports 
called AWS Lambda Power Tuning.

203
00:09:34,880 --> 00:09:36,880
Power Tuning?
Is that an AWS service?

204
00:09:36,880 --> 00:09:40,160
It's a community tool, but it 
uses AWS Step Functions.

205
00:09:40,520 --> 00:09:42,840
You deploy it into your
account, you point it at your

206
00:09:42,840 --> 00:09:45,960
function, and you tell it:
test this function at 128

207
00:09:45,960 --> 00:09:51,800
MB, 256, 512, 1 GB, and 2 GB.
And it just runs them all. 

208
00:09:51,800 --> 00:09:54,840
It runs them all in parallel, 
measures the execution time and 

209
00:09:54,840 --> 00:09:57,360
the cost for each, and then, this
is the best part,

210
00:09:57,720 --> 00:10:01,000
it plots a curve on a graph.
You can visually see exactly 

211
00:10:01,000 --> 00:10:03,800
where the sweet spot is. 
That seems like a no-brainer.

212
00:10:03,800 --> 00:10:06,280
Every engineering team should be
running that as part of their 

213
00:10:06,280 --> 00:10:08,160
CI/CD pipeline.
Absolutely. 

214
00:10:08,160 --> 00:10:11,320
It takes 5 minutes to set up and
it can save you 20-30% on your

215
00:10:11,320 --> 00:10:13,600
bill permanently. 
Speaking of easy wins, we have 

216
00:10:13,600 --> 00:10:16,560
to talk about hardware. The reports
mentioned that if you're running

217
00:10:16,560 --> 00:10:19,280
on the default settings, you're 
probably on old hardware. 

218
00:10:19,280 --> 00:10:22,200
Likely yes. 
By default many functions still 

219
00:10:22,200 --> 00:10:25,040
run on x86 architecture.
You know, think Intel 

220
00:10:25,040 --> 00:10:28,720
processors, but AWS has their 
own custom silicon called 

221
00:10:28,720 --> 00:10:30,960
Graviton. 
The ARM based processors. 

222
00:10:31,200 --> 00:10:35,320
Right, Graviton2 or Graviton3.
For almost all interpreted

223
00:10:35,320 --> 00:10:39,200
languages, Python, Node.js,
Ruby, switching to Graviton is

224
00:10:39,400 --> 00:10:42,280
literally just changing a drop-
down menu in the settings. 

225
00:10:42,400 --> 00:10:45,360
And the benefit? 
Usually about 20% better price 

226
00:10:45,360 --> 00:10:48,400
performance instantly. 
It's cheaper per millisecond and

227
00:10:48,400 --> 00:10:50,520
often faster. 
And no code rewrite. 

228
00:10:50,800 --> 00:10:53,000
For those scripting languages, 
usually zero.

229
00:10:53,640 --> 00:10:56,640
If you're using compiled 
languages like Rust or Go, you 

230
00:10:56,640 --> 00:10:59,320
do have to recompile for ARM, 
but for most people it's the 

231
00:10:59,320 --> 00:11:00,800
easiest money they'll save all 
year. 

232
00:11:01,040 --> 00:11:04,920
So step one, check your logs and
verify memory with power tuning.

233
00:11:04,920 --> 00:11:08,640
Step 2, switch to Graviton. 
Now let's get into the code 

234
00:11:08,640 --> 00:11:12,000
itself, because you can have the
best hardware in the world, but 

235
00:11:12,000 --> 00:11:14,440
if you write bad code, it's 
still going to cost you. 

236
00:11:14,560 --> 00:11:16,560
True. 
And when we talk about Lambda 

237
00:11:16,560 --> 00:11:18,480
code efficiency, we have to talk
about scope. 

238
00:11:18,600 --> 00:11:20,880
Scope. 
You mean like variable scope,

239
00:11:20,960 --> 00:11:24,080
global versus local?
Exactly. This is the number one

240
00:11:24,080 --> 00:11:26,080
coding mistake I see in 
serverless functions. 

241
00:11:26,240 --> 00:11:28,840
Remember we talked about the 
init phase and the invoke phase?

242
00:11:28,920 --> 00:11:31,880
Right, init is the startup,
invoke is the handler running. 

243
00:11:32,040 --> 00:11:35,360
The handler function is what 
runs on every single request, 

244
00:11:35,880 --> 00:11:38,480
but any code you write outside 
the handler function runs only 

245
00:11:38,480 --> 00:11:41,320
once during the init phase. 
OK, so this is about where you 

246
00:11:41,320 --> 00:11:43,880
put your heavy lifting. 
Give me a concrete example. 

247
00:11:43,880 --> 00:11:45,640
Let's say you need to connect to
a database. 

248
00:11:45,880 --> 00:11:49,440
A classic mistake is putting the
db.connect() line inside the

249
00:11:49,440 --> 00:11:51,720
handler function. 
I can see why people do that. 

250
00:11:52,040 --> 00:11:54,640
I received a request. 
I need the database, let me

251
00:11:54,640 --> 00:11:57,720
connect. 
Logical, but so expensive. 

252
00:11:58,080 --> 00:12:01,680
If you do that, your code has to
open a new fresh connection to 

253
00:12:01,680 --> 00:12:05,120
the database every single time a
user makes a request. 

254
00:12:05,680 --> 00:12:08,400
It has to do the handshake,
authenticate, establish the 

255
00:12:08,400 --> 00:12:09,640
socket. 
Which takes time. 

256
00:12:09,640 --> 00:12:12,480
It takes time, latency you pay
for, and it puts massive stress
for and it puts massive stress 

257
00:12:12,480 --> 00:12:14,320
on the database. 
So what's the fix? 

258
00:12:14,480 --> 00:12:17,600
You move that connection logic 
to the global scope outside the 

259
00:12:17,600 --> 00:12:19,800
handler. 
You initialize your database 

260
00:12:19,800 --> 00:12:23,960
clients, your AWS SDKs, your
secrets, all of it at the top of

261
00:12:23,960 --> 00:12:25,560
the file. 
So it runs during the init 

262
00:12:25,560 --> 00:12:29,920
phase. 
Yes, and here's the magic: AWS

263
00:12:29,960 --> 00:12:32,880
reuses containers. 
If a second request comes in 

264
00:12:32,880 --> 00:12:36,600
five seconds later, AWS will 
likely use the same container. 

265
00:12:36,800 --> 00:12:40,160
Your handler runs, but it sees 
the DB client variables already 

266
00:12:40,160 --> 00:12:42,720
populated. 
It skips the connection and goes

267
00:12:42,720 --> 00:12:45,040
straight to business. 
That's the warm start. 

268
00:12:45,200 --> 00:12:47,440
Exactly, you are caching the 
connection. 

269
00:12:47,440 --> 00:12:49,960
It's faster and cheaper. 
But you mentioned stress on the 

270
00:12:49,960 --> 00:12:51,800
database. 
I feel like I've heard horror 

271
00:12:51,800 --> 00:12:55,840
stories about Lambda and 
relational databases like MySQL 

272
00:12:55,840 --> 00:12:58,600
or PostgreSQL specifically. 
Well, the stories are real. 

273
00:12:58,680 --> 00:13:01,840
It's the connection storm. 
See, Lambda scales pretty much 

274
00:13:01,840 --> 00:13:04,640
infinitely. If your marketing
team sends a push notification

275
00:13:04,840 --> 00:13:07,000
and 10,000 users open your app
at once,

276
00:13:07,000 --> 00:13:09,600
Lambda spins up 10,000 
concurrent functions. 

277
00:13:10,080 --> 00:13:12,840
And if each one tries to open a 
connection to my poor little 

278
00:13:12,840 --> 00:13:15,840
PostgreSQL instance, boom. 
The database runs out of memory,

279
00:13:15,960 --> 00:13:19,640
rejects connections and crashes.
And your service goes down and 

280
00:13:19,640 --> 00:13:22,080
you still pay for the 10,000 
lambdas that failed. 

281
00:13:22,280 --> 00:13:24,880
Nightmare scenario. 
So what's the fix? 

282
00:13:24,880 --> 00:13:28,800
Do we just not use relational 
databases with serverless? 

283
00:13:28,960 --> 00:13:31,160
You can, but you need a
mediator.

284
00:13:31,920 --> 00:13:35,440
That's where RDS Proxy comes in.
It's a managed service that sits

285
00:13:35,440 --> 00:13:38,120
between Lambda and the database.
Like a bouncer. 

286
00:13:38,280 --> 00:13:41,360
Exactly like a bouncer, it pools
the connections. 

287
00:13:41,680 --> 00:13:46,320
So even if 10,000 lambdas wake 
up, RDS Proxy might only keep 50

288
00:13:46,320 --> 00:13:49,680
efficient connections open to 
the database and just route the 

289
00:13:49,680 --> 00:13:51,720
traffic through them. 
That's amazing. 

290
00:13:51,720 --> 00:13:54,720
It prevents the database from 
dying and it reduces the time 

291
00:13:54,720 --> 00:13:57,200
your Lambda sits idle waiting 
for a handshake. 

292
00:13:57,400 --> 00:14:00,160
OK, so we've optimized the 
memory, the code scope, 

293
00:14:00,160 --> 00:14:03,320
protected the database, but one 
of the most provocative ideas in

294
00:14:03,320 --> 00:14:06,640
the research was this concept of
Lambda-less architectures.

295
00:14:06,640 --> 00:14:09,640
This is my favorite part.
It requires really thinking 

296
00:14:09,640 --> 00:14:12,160
outside the box.
The cheapest millisecond of 

297
00:14:12,160 --> 00:14:14,200
compute is the one you don't 
use. 

298
00:14:14,440 --> 00:14:17,520
Which sounds like a riddle.
It means we often use Lambda as 

299
00:14:17,520 --> 00:14:20,640
glue where we don't need to. 
Like let's say you have an API 

300
00:14:20,640 --> 00:14:23,720
endpoint that just receives a 
contact us form and saves it to 

301
00:14:23,720 --> 00:14:25,880
DynamoDB.
Sure, standard operating 

302
00:14:25,880 --> 00:14:28,520
procedure. 
API Gateway triggers a Lambda.

303
00:14:28,520 --> 00:14:31,360
The Lambda parses the JSON, 
maybe validates it and writes it

304
00:14:31,360 --> 00:14:32,920
to Dynamo. 
But why? 

305
00:14:33,320 --> 00:14:36,680
Why pay for compute just to move
data from point A to point B? 

306
00:14:37,240 --> 00:14:40,320
API Gateway is perfectly capable
of writing directly to

307
00:14:40,320 --> 00:14:42,960
DynamoDB.
Wait, really? Without any code in

308
00:14:42,960 --> 00:14:45,160
the middle? 
Without any Lambda code, you use

309
00:14:45,160 --> 00:14:48,800
something called VTL templates, 
Velocity Template Language or 

310
00:14:48,800 --> 00:14:50,440
newer direct service 
integrations. 

311
00:14:50,760 --> 00:14:53,880
You configure API Gateway to say
take the body of this request 

312
00:14:54,080 --> 00:14:56,000
and put it in this DynamoDB
table. 

313
00:14:56,040 --> 00:14:58,240
So you cut the Lambda out 
entirely. 

314
00:14:58,240 --> 00:15:01,440
Completely. The request hits API
Gateway. 

315
00:15:01,640 --> 00:15:04,440
API Gateway talks to the 
database and responds.

316
00:15:04,960 --> 00:15:06,840
You paid $0.00 for Lambda
compute. 

317
00:15:07,160 --> 00:15:10,000
You also get lower latency 
because there is no cold start. 

318
00:15:10,120 --> 00:15:13,240
I have to admit though, VTL, I've
seen it, it looks a bit nasty. 

319
00:15:13,560 --> 00:15:16,160
It is not developer friendly. 
I will grant you that. It has a

320
00:15:16,160 --> 00:15:20,400
learning curve, but for high-volume,
simple endpoints it is

321
00:15:20,400 --> 00:15:23,440
worth its weight in gold. 
So the mindset shift is, is this

322
00:15:23,440 --> 00:15:26,240
function adding value or is it 
just transport? 

323
00:15:26,240 --> 00:15:27,960
Exactly. 
If it's just transport, delete 

324
00:15:27,960 --> 00:15:30,160
the code. 
I love that. Now, another pattern

325
00:15:30,160 --> 00:15:33,640
that came up for cost saving is 
using queues, specifically SQS. 

326
00:15:33,960 --> 00:15:35,760
But again, this feels 
counterintuitive. 

327
00:15:35,760 --> 00:15:38,120
I'm adding more infrastructure, 
a queue to save money. 

328
00:15:38,320 --> 00:15:40,520
It sounds wrong, but it goes back
to batching. 

329
00:15:40,520 --> 00:15:43,080
Let's look at the math. Without a
queue,

330
00:15:43,600 --> 00:15:47,640
if 1,000 requests come in
simultaneously, you invoke 1000 

331
00:15:47,640 --> 00:15:50,960
lambdas. 
You pay for 1,000 inits, 1,000

332
00:15:50,960 --> 00:15:54,200
execution overheads. 
But if you put an SQS queue in the

333
00:15:54,200 --> 00:15:56,640
middle, the requests pile up in 
the buffer. 

334
00:15:57,320 --> 00:15:59,880
Then Lambda can wake up and say 
give me a batch of messages. 

335
00:15:59,880 --> 00:16:03,120
How big of a batch? 
Up to 10 or even 10,000 with 

336
00:16:03,120 --> 00:16:04,720
some configurations, but let's 
say 100. 

337
00:16:05,160 --> 00:16:09,320
So now you process 100 user 
requests with 1 Lambda 

338
00:16:09,320 --> 00:16:10,840
invocation. 
Oh, I see. 

339
00:16:10,920 --> 00:16:14,880
You amortize that startup cost, 
that expensive init and network

340
00:16:14,880 --> 00:16:17,440
setup across 100 records instead
of 1. 

341
00:16:17,440 --> 00:16:21,440
Exactly, it is drastically more 
efficient, but there was a 

342
00:16:21,440 --> 00:16:23,360
historical gotcha here.
Which was?

343
00:16:23,480 --> 00:16:26,360
Partial failures. 
Let's say you grab 100 items. 

344
00:16:26,600 --> 00:16:30,240
You process 99 of them 
perfectly, but item number 42 is

345
00:16:30,240 --> 00:16:33,560
malformed and causes an error. 
Does the whole batch fail? 

346
00:16:33,560 --> 00:16:36,960
Do you have to rerun everything?
In the old days, yes, the whole 

347
00:16:36,960 --> 00:16:39,400
batch would fail, the messages 
would go back to the queue and 

348
00:16:39,400 --> 00:16:41,240
you'd reprocess the 99 good ones
again. 

349
00:16:41,720 --> 00:16:43,280
Wasted money. 
That sounds messy. 

350
00:16:43,440 --> 00:16:47,160
But AWS fixed this with a 
feature called

351
00:16:47,160 --> 00:16:50,440
ReportBatchItemFailures.
Basically your Lambda can return

352
00:16:50,440 --> 00:16:55,400
a specific response saying, hey, I
processed 99 of these perfectly,

353
00:16:55,600 --> 00:16:57,440
but here is the ID of the one 
that failed. 

354
00:16:57,800 --> 00:17:01,240
And the queue handles the rest. 
The queue deletes the successful

355
00:17:01,240 --> 00:17:03,720
ones and only keeps the bad one 
for a retry. 

356
00:17:04,000 --> 00:17:07,839
That is huge. So you don't waste
money reprocessing successful 

357
00:17:07,839 --> 00:17:09,720
work. 
Precisely. And if you want to

358
00:17:09,720 --> 00:17:12,000
take filtering a step further, 
maybe you don't even want the 

359
00:17:12,000 --> 00:17:14,520
data to reach the queue. 
You should look at EventBridge

360
00:17:14,520 --> 00:17:15,720
Pipes. 
Pipes. 

361
00:17:15,920 --> 00:17:18,040
That's a newer service, right? 
Relatively new. 

362
00:17:18,400 --> 00:17:21,240
Imagine you have a stream of 
data coming in, maybe 

363
00:17:21,240 --> 00:17:24,599
transactions, but you only care 
about transactions over $100. 

364
00:17:25,119 --> 00:17:27,240
Historically, you'd trigger a 
Lambda for every single 

365
00:17:27,240 --> 00:17:29,840
transaction. 
Right, code to check: is amount over

366
00:17:29,840 --> 00:17:33,360
100? And if not, just exit.
But you still paid for the 

367
00:17:33,360 --> 00:17:35,160
invocation just to say no, 
right? 

368
00:17:35,200 --> 00:17:36,640
Exactly. 
You paid to say no. 

369
00:17:37,240 --> 00:17:40,080
With EventBridge Pipes,
you can put a filter rule before

370
00:17:40,080 --> 00:17:42,040
the Lambda. 
The pipe checks the data 

371
00:17:42,040 --> 00:17:44,240
payload. 
If it's under $100, it drops it 

372
00:17:44,240 --> 00:17:46,600
instantly. 
Your Lambda never wakes up. 

373
00:17:46,760 --> 00:17:48,760
You never pay. 
That connects back to the 

374
00:17:48,760 --> 00:17:52,320
cheapest code is no code idea. 
It really does filter at the 

375
00:17:52,360 --> 00:17:54,680
infrastructure level, not the 
application level. 
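As a rough sketch, the over-$100 rule could be written as an EventBridge filter pattern like the one below. It assumes the pipe's source is an SQS queue whose message body is JSON with a numeric `amount` field; both the field name and the local `matches_locally` helper are illustrative and not from the episode.

```python
import json

# Assumption: SQS source, JSON message body with a numeric `amount` field.
# Only events matching this pattern are passed on to the Lambda target.
FILTER_PATTERN = {"body": {"amount": [{"numeric": [">", 100]}]}}


def matches_locally(message_body: dict) -> bool:
    """Local stand-in for this one rule, for illustration only.

    In a real pipe the matching is done by the service, not by your code.
    """
    amount = message_body.get("amount")
    return isinstance(amount, (int, float)) and amount > 100


# The JSON string is what you would supply as the pipe's filter pattern.
print(json.dumps(FILTER_PATTERN))
```

The point of the design is that the comparison lives in configuration, so no function is invoked for the events that would have been rejected anyway.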

376
00:17:54,840 --> 00:17:57,600
OK, I want to pivot to a hidden 
cost that seems to catch 

377
00:17:57,600 --> 00:18:00,800
everyone off guard. 
The sources call it the VPC 

378
00:18:00,800 --> 00:18:02,880
trap. 
Oh, this is a painful one. 

379
00:18:02,960 --> 00:18:06,400
Yeah, I have seen grown 
engineers cry over this bill. 

380
00:18:06,440 --> 00:18:11,360
So the scenario is, I'm a 
responsible engineer, I want 

381
00:18:11,360 --> 00:18:14,960
security, so I put my Lambda 
inside a VPC. 

382
00:18:15,160 --> 00:18:16,600
It feels like the right thing to
do. 

383
00:18:16,600 --> 00:18:17,880
It does. 
It feels secure. 

384
00:18:18,360 --> 00:18:22,200
But here is the catch. 
By default a Lambda inside a 

385
00:18:22,200 --> 00:18:24,440
private VPC cannot talk to the 
Internet. 

386
00:18:25,080 --> 00:18:28,320
It is locked in a padded room. 
OK, but often your code needs to

387
00:18:28,320 --> 00:18:32,520
call a third-party API or even
public AWS services like

388
00:18:32,520 --> 00:18:35,160
DynamoDB or S3.
Yeah, to get out of that private

389
00:18:35,160 --> 00:18:37,400
room, it needs a door. 
And that door is. 

390
00:18:37,400 --> 00:18:39,240
A NAT gateway.
And I'm guessing NAT gateways

391
00:18:39,240 --> 00:18:40,400
aren't free. 
Far from it. 

392
00:18:40,400 --> 00:18:42,840
They're one of the most 
expensive line items on many AWS

393
00:18:42,920 --> 00:18:45,120
bills. 
You pay an hourly charge just 

394
00:18:45,120 --> 00:18:48,040
for it to exist. 
Roughly $30 to $40 a month per

395
00:18:48,040 --> 00:18:50,440
availability zone. 
So for high availability, that's

396
00:18:50,440 --> 00:18:53,440
over 100 bucks a month before 
you even send anything, right? 

397
00:18:54,000 --> 00:18:58,600
But the killer is the data
processing fee. You pay per GB of

398
00:18:58,600 --> 00:19:00,680
data that passes through that 
gateway. 

399
00:19:01,160 --> 00:19:04,520
So if I'm processing a lot of 
data, say downloading images 

400
00:19:04,520 --> 00:19:07,520
from S3 or reading from DynamoDB,
and I'm routing it through

401
00:19:07,520 --> 00:19:10,520
this NAT...
You're paying a tax on every single

402
00:19:10,520 --> 00:19:12,520
byte. 
I have seen bills where the 

403
00:19:12,520 --> 00:19:16,480
actual Lambda compute was $50 
and the NAT gateway charges were

404
00:19:16,480 --> 00:19:18,160
$500.
That is brutal. 
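To make the arithmetic concrete, here is a back-of-the-envelope sketch. The two rates are assumptions (roughly the published us-east-1 list prices at the time of writing, $0.045 per gateway-hour and $0.045 per GB processed); check the current pricing page for your region before relying on them.

```python
HOURLY_RATE_USD = 0.045   # assumed: per NAT gateway, per hour
PER_GB_RATE_USD = 0.045   # assumed: per GB of data processed
HOURS_PER_MONTH = 730     # average hours in a month


def nat_monthly_cost(gateways: int, gb_processed: float) -> float:
    """Estimate a monthly NAT gateway bill in USD under the assumed rates."""
    fixed = gateways * HOURLY_RATE_USD * HOURS_PER_MONTH  # paid just to exist
    variable = gb_processed * PER_GB_RATE_USD             # the per-byte "tax"
    return round(fixed + variable, 2)


# One idle gateway, then the same gateway pushing about 10 TB in a month.
print(nat_monthly_cost(1, 0))
print(nat_monthly_cost(1, 10_000))
```

Under these assumed rates, an idle gateway lands in the $30 to $40 range mentioned above, and around 10 TB of traffic pushes the total close to the $500 figure.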

405
00:19:18,160 --> 00:19:21,080
So what is the fix? 
Do we take the Lambda out of the

406
00:19:21,080 --> 00:19:23,720
VPC? 
That is the simplest fix if you 

407
00:19:23,720 --> 00:19:25,400
don't strictly need private 
networking. 

408
00:19:25,400 --> 00:19:29,000
If you aren't connecting to a 
private database, just take the 

409
00:19:29,000 --> 00:19:31,920
Lambda out of the VPC. 
It gets public Internet access 

410
00:19:31,920 --> 00:19:34,120
for free. 
But what if I do need the VPC? 

411
00:19:34,120 --> 00:19:35,880
What if corporate security 
requires it? 

412
00:19:35,880 --> 00:19:38,040
Then use VPC endpoints. 
How do those help?

413
00:19:38,120 --> 00:19:40,000
A VPC endpoint is like a secret
tunnel. 

414
00:19:40,240 --> 00:19:44,200
It allows you to talk to AWS 
services like S3 or DynamoDB

415
00:19:44,480 --> 00:19:46,760
without leaving the AWS private 
network. 

416
00:19:47,240 --> 00:19:50,440
The traffic never hits the NAT
gateway. And the cost? For S3 and

417
00:19:50,440 --> 00:19:53,160
DynamoDB specifically, the
gateway endpoints are completely

418
00:19:53,160 --> 00:19:56,280
free. $0.00.
Free is my favorite price, so 

419
00:19:56,320 --> 00:19:58,760
check if you're routing traffic 
through a NAT that doesn't need

420
00:19:58,760 --> 00:20:00,040
to be there. 
Absolutely. 

421
00:20:00,040 --> 00:20:01,920
It's one of the first things 
I look for in a cost audit.
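For reference, a gateway endpoint is mostly a matter of configuration. The sketch below only builds the request parameters for the EC2 `create_vpc_endpoint` API; the helper name and the example IDs are hypothetical, and the actual boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
def gateway_endpoint_request(vpc_id, route_table_ids, region, service):
    """Build parameters for a gateway endpoint (S3 or DynamoDB only)."""
    if service not in ("s3", "dynamodb"):
        raise ValueError("gateway endpoints exist only for S3 and DynamoDB")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        # Routes to the service are added to these route tables.
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical IDs, for illustration only.
params = gateway_endpoint_request("vpc-0abc", ["rtb-0def"], "us-east-1", "s3")
print(params)

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("ec2").create_vpc_endpoint(**params)
```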

422
00:20:02,040 --> 00:20:05,040
We have covered a lot. 
Memory math, code scope,

423
00:20:05,040 --> 00:20:08,680
queues, networking traps.
There is one last area I want to

424
00:20:08,680 --> 00:20:11,640
touch on: orchestration.
What about when we need to wait 

425
00:20:11,640 --> 00:20:13,520
for things? 
This is the golden rule. 

426
00:20:14,400 --> 00:20:19,240
Lambda is terrible at waiting. 
Because we pay for duration. 

427
00:20:19,280 --> 00:20:24,440
Right, if you write code that 
says sleep 1000 or pause for 10

428
00:20:24,440 --> 00:20:27,480
seconds because you are waiting 
for an API response, you are 

429
00:20:27,480 --> 00:20:31,120
literally burning money while 
the CPU does absolutely nothing.

430
00:20:31,120 --> 00:20:33,720
You're paying for a taxi to sit 
in the driveway with the meter 

431
00:20:33,720 --> 00:20:34,680
running. 
Exactly. 

432
00:20:34,760 --> 00:20:37,960
So what should you use instead? 
AWS Step Functions. 

433
00:20:38,360 --> 00:20:40,040
This is the state machine 
service. 

434
00:20:40,040 --> 00:20:42,760
Yes, it allows you to coordinate
steps visually. 

435
00:20:42,760 --> 00:20:45,080
You can have a state that says 
wait for 10 seconds or wait for 

436
00:20:45,080 --> 00:20:46,560
a callback from this external
system. 

437
00:20:46,600 --> 00:20:49,320
And the billing there. 
For standard workflows, you pay 

438
00:20:49,320 --> 00:20:53,040
per state transition, basically 
per step you take, but the time

439
00:20:53,040 --> 00:20:55,040
spent in the state, the
waiting time, is free.

440
00:20:55,040 --> 00:20:57,440
You can wait for a year and it 
costs $0.00. 
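A minimal sketch of what that looks like as a state machine definition, written as a Python dict and serialized to Amazon States Language JSON. The state names and the two Lambda function names (`submit-request`, `check-result`) are hypothetical; the 10-second `Wait` state is the part that replaces sleeping inside a function.

```python
import json

STATE_MACHINE = {
    "Comment": "Wait in Step Functions instead of sleeping in Lambda",
    "StartAt": "SubmitRequest",
    "States": {
        "SubmitRequest": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            # Hypothetical function name, for illustration only.
            "Parameters": {"FunctionName": "submit-request", "Payload.$": "$"},
            "Next": "WaitTenSeconds",
        },
        # No function is running, and none is billed, while this state waits.
        "WaitTenSeconds": {"Type": "Wait", "Seconds": 10, "Next": "CheckResult"},
        "CheckResult": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            # Hypothetical function name, for illustration only.
            "Parameters": {"FunctionName": "check-result", "Payload.$": "$"},
            "End": True,
        },
    },
}

print(json.dumps(STATE_MACHINE, indent=2))
```

Each function does its short piece of work and exits; the pause between them is a state transition rather than billed compute time.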

441
00:20:57,440 --> 00:21:00,080
That is a massive difference 
compared to paying per 

442
00:21:00,080 --> 00:21:02,480
millisecond. 
It is. So the architectural

443
00:21:02,480 --> 00:21:05,960
pattern is: use Lambda for
compute, transforming data, 

444
00:21:05,960 --> 00:21:08,760
calculating things. 
Use Step Functions for flow,

445
00:21:08,760 --> 00:21:10,720
waiting, retrying, branching 
logic. 

446
00:21:10,840 --> 00:21:12,920
Don't mix them up. 
I like that distinction. 

447
00:21:12,920 --> 00:21:15,320
Lambda for compute, Step
Functions for flow.

448
00:21:15,520 --> 00:21:18,920
And just a quick note, if you 
have high volume really fast 

449
00:21:18,920 --> 00:21:22,160
workflows, look at Express 
Workflows in Step Functions.

450
00:21:22,840 --> 00:21:24,800
They're cheaper for high
throughput, though they

451
00:21:24,800 --> 00:21:28,360
don't have the wait forever for
free benefit. But for most

452
00:21:28,360 --> 00:21:30,960
orchestration, get the logic out
of the Lambda. 

453
00:21:31,080 --> 00:21:32,640
This has been incredibly 
comprehensive. 

454
00:21:32,640 --> 00:21:35,320
Let's try to summarize this into
a checklist before people go 

455
00:21:35,320 --> 00:21:38,480
audit their AWS accounts. 
Sure, let's break it down. #1:

456
00:21:39,160 --> 00:21:40,240
right-size
your memory.

457
00:21:40,520 --> 00:21:44,320
Don't assume smaller is cheaper.
Use the AWS Lambda Power Tuning 

458
00:21:44,320 --> 00:21:48,240
tool to find the sweet spot. 
Remember that 1.8 GB threshold.

459
00:21:48,240 --> 00:21:51,320
#2: hardware.
Switch to Arm-based Graviton

460
00:21:51,320 --> 00:21:53,560
processors. 
It's an almost guaranteed 20% 

461
00:21:53,560 --> 00:21:56,800
saving for most languages. #3:
think outside the box with

462
00:21:56,800 --> 00:21:59,120
batching. 
Put an SQS queue in front of your

463
00:21:59,120 --> 00:22:02,960
Lambda. Process messages in bulk
to amortize the startup costs. 

464
00:22:02,960 --> 00:22:05,520
And the architecture stuff. 
Use direct integrations. 

465
00:22:06,080 --> 00:22:10,120
If you are just moving data, see
if you can skip Lambda entirely 

466
00:22:10,400 --> 00:22:13,080
and please watch out for that 
NAT gateway tax.

467
00:22:13,400 --> 00:22:15,800
Use VPC endpoints wherever 
possible. 

468
00:22:15,800 --> 00:22:19,480
And finally stop waiting inside 
your functions. Use Step

469
00:22:19,480 --> 00:22:20,400
Functions.
Exactly. 

470
00:22:20,400 --> 00:22:23,120
Don't pay the taxi to wait. 
You know what strikes me about 

471
00:22:23,120 --> 00:22:26,240
all this is that cost 
optimization in serverless isn't

472
00:22:26,240 --> 00:22:29,160
really about writing cheaper 
code in the traditional sense. 

473
00:22:29,360 --> 00:22:32,160
It's not about writing a more 
efficient sorting algorithm. 

474
00:22:32,160 --> 00:22:34,880
That's the key insight. 
In a traditional server 

475
00:22:34,880 --> 00:22:37,880
environment, efficient code 
meant using less RAM or fewer 

476
00:22:37,880 --> 00:22:40,520
cycles.
In serverless, efficient code is

477
00:22:40,520 --> 00:22:43,200
often about architecture. 
It's about knowing when not to 

478
00:22:43,200 --> 00:22:45,600
use the compute service. 
Which leads to our final thought

479
00:22:45,600 --> 00:22:47,840
for you, the listener. 
We've given you a lot of 

480
00:22:47,840 --> 00:22:50,560
tactics, but what's the big 
strategic takeaway?

481
00:22:50,760 --> 00:22:53,360
I'd leave you with this. 
The ultimate goal of serverless 

482
00:22:53,360 --> 00:22:55,160
optimization is to delete your 
code. 

483
00:22:55,640 --> 00:22:58,600
Every line of code you write is 
a line you have to debug, 

484
00:22:58,600 --> 00:23:02,600
maintain, and pay to execute. 
If AWS has a service that can do

485
00:23:02,600 --> 00:23:05,560
the job for you, use it. 
That is a great question to chew

486
00:23:05,560 --> 00:23:07,600
on. 
Are you reinventing the wheel 

487
00:23:07,600 --> 00:23:10,960
and paying for the privilege? 
Ask yourself, are you writing 

488
00:23:10,960 --> 00:23:12,800
code that AWS has already 
written for you? 

489
00:23:13,040 --> 00:23:15,280
Thank you so much for breaking 
this down. 

490
00:23:15,280 --> 00:23:17,800
This was a true deep dive. 
My pleasure. 

491
00:23:17,880 --> 00:23:20,080
And to our listener, good luck 
with those bills. 

492
00:23:20,080 --> 00:23:22,600
Go crack the code. 
We'll see you on the next deep 

493
00:23:22,600 --> 00:23:22,960
dive.