1
00:00:00,120 --> 00:00:02,040
OK, let's unpack this. 
So if you're building software 

2
00:00:02,040 --> 00:00:06,840
today, you've got this
fundamental tension, right? 

3
00:00:06,840 --> 00:00:09,320
Businesses are demanding 
constant updates, daily 

4
00:00:09,320 --> 00:00:12,960
deployments, maybe even more. 
But the code going in, it's 

5
00:00:12,960 --> 00:00:16,800
often half baked, untested, 
potentially really broken. 

6
00:00:16,800 --> 00:00:19,560
Yeah, it's that classic clash:
speed versus stability. 

7
00:00:19,560 --> 00:00:22,840
You want continuous deployment, 
that ability to just push 

8
00:00:22,840 --> 00:00:26,640
features out instantly. 
But if you're doing things like 

9
00:00:26,640 --> 00:00:28,920
trunk-based development,
everyone merging code all the

10
00:00:28,920 --> 00:00:32,000
time into the main branch, well,
things get unstable fast. 

11
00:00:32,360 --> 00:00:33,880
How do you stop that new 
feature,

12
00:00:34,000 --> 00:00:36,640
let's call it Feature X, the one
that's not quite ready from 

13
00:00:36,640 --> 00:00:37,920
hitting customers? 
Right. 

14
00:00:37,920 --> 00:00:41,120
And that tension, it used to be 
a huge headache. The old way,

15
00:00:41,120 --> 00:00:42,360
and the sources are pretty
clear

16
00:00:42,360 --> 00:00:44,560
this isn't the way anymore,
was, you know, keeping your code

17
00:00:44,560 --> 00:00:49,360
totally separate on a branch for
weeks, months, sometimes waiting

18
00:00:49,360 --> 00:00:51,880
until it was absolutely, 
positively perfect before 

19
00:00:51,880 --> 00:00:53,840
merging. 
Which led straight into what

20
00:00:53,840 --> 00:00:57,760
developers absolutely hate:
integration hell, or merge hell.

21
00:00:58,160 --> 00:01:02,360
You know, spending days, maybe 
weeks, just fighting with code 

22
00:01:02,360 --> 00:01:04,360
conflicts because everything's 
diverged so much. 

23
00:01:04,760 --> 00:01:07,720
You basically threw out 
continuous integration just to 

24
00:01:07,720 --> 00:01:11,240
avoid deployment risk and well, 
that's just not sustainable now.

25
00:01:11,520 --> 00:01:13,360
OK. 
So continuous integration is non-

26
00:01:13,360 --> 00:01:14,680
negotiable.
We need it. 

27
00:01:14,680 --> 00:01:18,080
So we need some kind of 
technical trick, something to

28
00:01:18,080 --> 00:01:21,440
split the risk apart and that 
right there, that's what we're 

29
00:01:21,440 --> 00:01:24,160
diving into today. 
Feature flags, sometimes called 

30
00:01:24,160 --> 00:01:25,080
feature toggles. 
Yeah. 

31
00:01:25,200 --> 00:01:26,560
Exactly. 
They're, well, they're a really 

32
00:01:26,560 --> 00:01:30,080
elegant fix, actually. 
Put simply, they're just boolean

33
00:01:30,080 --> 00:01:34,120
variables like a simple true or 
false, and their whole job in 

34
00:01:34,120 --> 00:01:38,440
this continuous deployment world
is to stop unfinished code from 

35
00:01:38,440 --> 00:01:40,640
actually running. 
So the code gets committed and

36
00:01:40,720 --> 00:01:43,080
integration keeps happening, but the
feature itself is basically 

37
00:01:43,080 --> 00:01:45,960
switched off, sleeping.
OK, so that explains the why:

38
00:01:45,960 --> 00:01:49,080
saving us from that merge hell. 
But how does this tiny little 

39
00:01:49,080 --> 00:01:51,640
switch actually work? 
How does it hold back what could

40
00:01:51,640 --> 00:01:54,320
be like thousands of lines of 
new code in a huge system? 

41
00:01:54,320 --> 00:01:55,880
It's surprisingly 
straightforward. 

42
00:01:56,280 --> 00:01:59,400
While the developer is working 
on the feature, the flag, let's 

43
00:01:59,400 --> 00:02:02,200
stick with Feature X, is set to
false everywhere. 

44
00:02:02,400 --> 00:02:06,080
And then wherever that new 
unfinished code lives, it gets 

45
00:02:06,080 --> 00:02:08,400
wrapped in a little check like 
an if statement. 

46
00:02:08,479 --> 00:02:09,919
Think of it like a protective
bubble.

47
00:02:10,479 --> 00:02:15,440
OK, so the code effectively says
if feature X is true, run this 

48
00:02:15,440 --> 00:02:17,840
new stuff, otherwise just skip 
it. 

49
00:02:18,120 --> 00:02:20,800
Carry on like normal with the 
old stable code path. 

50
00:02:21,000 --> 00:02:23,920
So even though the new code is 
technically on the servers 

51
00:02:24,000 --> 00:02:27,120
deployed everywhere, it doesn't 
execute for any real users.

52
00:02:27,280 --> 00:02:28,840
Right.
And the sources give a really 

53
00:02:28,840 --> 00:02:31,360
good example to picture this. 
Imagine you're redoing a whole 

54
00:02:31,360 --> 00:02:33,600
web page. 
You definitely don't want users 

55
00:02:33,600 --> 00:02:36,680
seeing, you know, half-built
elements or weird errors. 

56
00:02:36,680 --> 00:02:39,000
No, definitely not. 
So you'd guard the whole page 

57
00:02:39,000 --> 00:02:41,200
rendering. 
You keep the old page code, you 

58
00:02:41,200 --> 00:02:43,440
write the new page code, and 
then you put the new version 

59
00:02:43,440 --> 00:02:45,040
inside a flag check, maybe 
called new page. 

60
00:02:45,240 --> 00:02:47,760
If new page is true, boom, they 
see the shiny new version. 

61
00:02:47,960 --> 00:02:51,000
If it's false, the system just 
shows the old reliable page 

62
00:02:51,000 --> 00:02:53,000
instead. Like flipping a switch.
Exactly. 

63
00:02:53,000 --> 00:02:56,320
It's an instant toggle. 
But what's interesting is the 

64
00:02:56,320 --> 00:02:58,360
trade-off this creates, at least
for a while. 

65
00:02:58,720 --> 00:03:01,960
You end up with code duplication
for some period. 

66
00:03:01,960 --> 00:03:05,240
Both the old page and the new 
page logic are sitting there in 

67
00:03:05,240 --> 00:03:07,360
the code base together. 
Duplication. 

68
00:03:07,720 --> 00:03:10,160
That sounds a bit messy, like 
technical debt. 

69
00:03:10,280 --> 00:03:13,440
Even if it's temporary. 
Is there a rule for how long 

70
00:03:13,520 --> 00:03:15,920
that's OK? 
The keyword is definitely 

71
00:03:15,920 --> 00:03:18,840
temporary. 
The idea is, once that new 

72
00:03:18,840 --> 00:03:24,000
feature is totally done, tested,
rolled out, and critically, once

73
00:03:24,000 --> 00:03:27,240
you're sure that old code isn't 
needed at all, then you clean it

74
00:03:27,240 --> 00:03:28,840
up. 
You remove the old code and the 

75
00:03:28,840 --> 00:03:31,360
feature flag itself. 
It's often called flag hygiene 

76
00:03:31,800 --> 00:03:33,920
and yeah, it's a really 
important step people sometimes 

77
00:03:33,920 --> 00:03:34,680
forget. 
Gotcha. 

78
00:03:34,680 --> 00:03:36,600
So the beauty is decoupling 
those two things:

79
00:03:36,760 --> 00:03:39,840
merging code, which happens
constantly, and activating code 

80
00:03:39,840 --> 00:03:41,800
for users, which is the actual 
release. 

81
00:03:41,800 --> 00:03:42,960
They're not tied together 
anymore.

82
00:03:43,000 --> 00:03:46,760
Exactly. Total separation.
OK, but then the sources say 

83
00:03:46,760 --> 00:03:50,000
companies often keep these flags
around even after the feature's

84
00:03:50,000 --> 00:03:51,400
out and working. 
They don't remove them 

85
00:03:51,400 --> 00:03:55,000
immediately, they use them for 
other things strategically. 

86
00:03:55,280 --> 00:03:58,120
You got it. 
That's where the flag evolves 

87
00:03:58,120 --> 00:04:01,360
from just being a safety net 
into a really powerful business 

88
00:04:01,360 --> 00:04:04,360
lever. 
And the first big strategic use 

89
00:04:04,360 --> 00:04:06,520
is something called a canary
release. 

90
00:04:07,120 --> 00:04:10,000
Ah, the canary.
Like testing the waters, right? 

91
00:04:10,440 --> 00:04:13,600
So the feature's deployed, hidden
behind its flag, but instead of 

92
00:04:13,600 --> 00:04:16,720
turning it on for everyone, you 
switch it on for just a tiny 

93
00:04:16,720 --> 00:04:20,200
fraction of users first, like 
maybe 5%. 

94
00:04:20,200 --> 00:04:22,079
Exactly. 
It's all about minimizing risk. 

95
00:04:22,440 --> 00:04:26,040
If something goes badly wrong, a
major bug, performance tanks,

96
00:04:26,040 --> 00:04:28,280
you've only affected that small
5% group. 

97
00:04:28,280 --> 00:04:30,640
You get the warning signs 
immediately, you flip the flag 

98
00:04:30,640 --> 00:04:32,840
back off for them and the damage
is contained. 

99
00:04:32,840 --> 00:04:36,080
Most users never saw a thing. 
And the name canary, that comes

100
00:04:36,080 --> 00:04:38,280
from the old coal mining 
practice, doesn't it? 

101
00:04:38,360 --> 00:04:41,280
Miners took a canary down, and if
the air got bad, the bird would 

102
00:04:41,280 --> 00:04:42,840
collapse first, warning them to 
escape. 

103
00:04:42,840 --> 00:04:46,040
So the flag acts like that early
warning, testing the safety on a

104
00:04:46,040 --> 00:04:47,880
small scale before you go all 
in. 

105
00:04:48,120 --> 00:04:51,000
Perfect analogy. 
And once that initial 5% looks 

106
00:04:51,000 --> 00:04:54,040
good, the team managing the 
flags can slowly dial it up. 

107
00:04:54,280 --> 00:04:57,960
Maybe go to 10%, watch the 
metrics, then 25%, 50%,

108
00:04:58,360 --> 00:05:02,360
all the way to 100%. 
If any step shows problems, you 

109
00:05:02,360 --> 00:05:04,600
just dial it back by switching 
the flag off again

110
00:05:04,600 --> 00:05:07,040
for that group. It's controlled
exposure. 

111
00:05:07,200 --> 00:05:09,640
Which seems like it flows 
naturally into the next big use:

112
00:05:10,000 --> 00:05:12,440
A/B tests.
How do flags help run those 

113
00:05:12,440 --> 00:05:14,320
experiments? 
Well, with A/B testing, you

114
00:05:14,320 --> 00:05:17,120
typically have two versions of 
something, version A and version

115
00:05:17,120 --> 00:05:19,360
B. 
Both are probably stable and 

116
00:05:19,360 --> 00:05:21,360
ready. 
The feature flag in this case 

117
00:05:21,360 --> 00:05:23,960
isn't just on/off, it's used to
route different users to 

118
00:05:23,960 --> 00:05:26,520
different versions. 
It directs specific segments of 

119
00:05:26,520 --> 00:05:29,960
your users to see either A or B.
OK, so the flag decides which

120
00:05:29,960 --> 00:05:32,680
version a particular user sees, 
not if they see the feature at 

121
00:05:32,680 --> 00:05:34,400
all. 
You could send, say, half your 

122
00:05:34,400 --> 00:05:37,920
traffic to the old way, half to 
the new idea, B, and then you

123
00:05:37,920 --> 00:05:39,520
measure which one actually 
performs better.

124
00:05:39,520 --> 00:05:41,000
More clicks, more purchases, 
whatever. 

125
00:05:41,240 --> 00:05:44,720
Exactly right. 
And connecting this back, these 

126
00:05:44,720 --> 00:05:48,000
techniques, canaries and A/B tests,
they let companies move beyond 

127
00:05:48,000 --> 00:05:50,960
just guesswork. Decisions about
whether to fully launch 

128
00:05:50,960 --> 00:05:53,960
something become data-driven. 
You see the real impact on 

129
00:05:53,960 --> 00:05:56,920
users, on revenue, on engagement
before committing. 

130
00:05:57,040 --> 00:06:01,120
OK, that makes a lot of sense. 
But the logistics, managing all

131
00:06:01,120 --> 00:06:04,160
these switches, it must get 
complicated fast. 

132
00:06:04,400 --> 00:06:07,320
And that Canadian university 
study on the Chrome browser 

133
00:06:07,480 --> 00:06:08,960
really highlights the scale, 
doesn't it? 

134
00:06:09,080 --> 00:06:11,160
Absolutely. 
The numbers they found are kind 

135
00:06:11,160 --> 00:06:14,000
of staggering. 
They looked at 39 Chrome releases

136
00:06:14,000 --> 00:06:18,360
over five years from 2010 to 
2015 and they identified over, 

137
00:06:18,440 --> 00:06:22,680
wait for it, 2400 separate 
distinct feature flags just in 

138
00:06:22,680 --> 00:06:25,760
the Chrome code base during that
time. 2400 flags. 

139
00:06:25,760 --> 00:06:27,520
Wow. 
That's not just a few switches, 

140
00:06:27,520 --> 00:06:29,840
that's a massive control panel. 
And the growth is huge too, 

141
00:06:29,840 --> 00:06:32,600
right? From a couple hundred
initially to over 2400 by the 

142
00:06:32,640 --> 00:06:33,880
end. 
Yeah, huge growth. 

143
00:06:34,200 --> 00:06:36,040
But it wasn't just accumulating 
flags. 

144
00:06:36,320 --> 00:06:39,920
There's constant churn. 
The data showed that on average,

145
00:06:39,920 --> 00:06:44,640
each new Chrome release added 
about 73 new flags, but it also 

146
00:06:44,640 --> 00:06:48,240
removed around 43 old ones as 
part of that cleanup, that flag 

147
00:06:48,240 --> 00:06:50,440
hygiene we mentioned. 
So there's this continuous 

148
00:06:50,440 --> 00:06:53,360
cycle. 
Add, test, maybe graduate, then

149
00:06:53,360 --> 00:06:56,240
remove. 
That constant activity led to 

150
00:06:56,440 --> 00:06:58,560
the big net increase. 
We should probably linger on 

151
00:06:58,560 --> 00:07:01,880
that churn for a second. Adding
73 flags per release. 

152
00:07:01,880 --> 00:07:04,840
That means constantly defining 
new guards, implementing new 

153
00:07:04,840 --> 00:07:07,640
code paths, and then having the
discipline to clean up almost 

154
00:07:07,640 --> 00:07:09,720
half of them later. 
That sounds like a massive 

155
00:07:09,720 --> 00:07:12,600
maintenance effort. 
If you forget to remove one, it 

156
00:07:12,600 --> 00:07:15,920
just sits there, dead code.
That's exactly the risk, and 

157
00:07:15,920 --> 00:07:18,360
it's the core challenge of 
managing flags at scale. 

158
00:07:18,440 --> 00:07:20,920
Flag hygiene becomes critical 
when you're dealing with 

159
00:07:20,920 --> 00:07:23,520
hundreds or like Chrome, 
thousands of these things. 

160
00:07:23,720 --> 00:07:26,120
You can't just use the absolute 
simplest way of implementing 

161
00:07:26,120 --> 00:07:27,960
them. And the
simplest way is what, just

162
00:07:27,960 --> 00:07:30,920
defining it in the code itself? 
Yeah, just having like boolean 

163
00:07:30,920 --> 00:07:33,640
feature X equals false right 
there in the source code. 

164
00:07:34,320 --> 00:07:37,520
The problem is, if that flag is 
false and you need to change it 

165
00:07:37,520 --> 00:07:41,120
to true to start, say, a canary
release, you have to actually 

166
00:07:41,120 --> 00:07:43,760
edit the code file, commit the 
change, rebuild the entire 

167
00:07:43,760 --> 00:07:45,440
application and then redeploy 
everything. 

168
00:07:45,600 --> 00:07:47,200
Right, which kind of defeats the
purpose. 

169
00:07:47,200 --> 00:07:50,240
If you have to do a full 
redeploy just to flip a switch, 

170
00:07:50,600 --> 00:07:53,520
you've lost that speed and 
separation the flag was supposed

171
00:07:53,520 --> 00:07:55,480
to give you. 
You're back to waiting on 

172
00:07:55,480 --> 00:07:57,600
deployments. 
Precisely. So,

173
00:07:57,600 --> 00:08:01,080
for big systems with lots of
flags, the solution is usually 

174
00:08:01,080 --> 00:08:04,040
external management. 
They use dedicated libraries or 

175
00:08:04,040 --> 00:08:07,360
systems basically like a big 
configuration database or table 

176
00:08:07,600 --> 00:08:11,320
that stores the current state 
true or false of every single 

177
00:08:11,320 --> 00:08:13,640
flag outside of the application 
code itself. 

178
00:08:13,720 --> 00:08:16,160
And the big win there is. 
Complete decoupling. 

179
00:08:16,800 --> 00:08:19,560
The running application code 
reads the flag state from this 

180
00:08:19,560 --> 00:08:23,400
external source at runtime. 
So the operations team or maybe 

181
00:08:23,400 --> 00:08:27,200
product managers can change a 
flag state, start an A/B test,

182
00:08:27,480 --> 00:08:30,720
roll back a problematic feature
using a dashboard or an API 

183
00:08:30,720 --> 00:08:33,200
call. 
No code changes, no recompiling,

184
00:08:33,200 --> 00:08:35,880
no redeploying. 
The change takes effect almost 

185
00:08:35,880 --> 00:08:39,520
instantly. Essential when you
have 2400 switches to manage. 

186
00:08:39,520 --> 00:08:41,360
OK. 
So that makes sense for managing

187
00:08:41,360 --> 00:08:43,760
the complexity. 
Now stepping back a bit, we 

188
00:08:43,760 --> 00:08:46,640
should probably clarify that 
feature flag is kind of a broad 

189
00:08:46,640 --> 00:08:48,600
term. 
The ones we've mostly talked 

190
00:08:48,600 --> 00:08:51,280
about for safe deployment, 
canaries, A/B tests, those have a

191
00:08:51,280 --> 00:08:54,800
specific name, right? Release flags.
That's correct. 

192
00:08:54,800 --> 00:08:57,240
Release flags are all about 
managing the software delivery

193
00:08:57,240 --> 00:09:00,160
life cycle: risk reduction,
gradual rollouts. 

194
00:09:00,400 --> 00:09:02,720
But there's another major 
category, one that's less about 

195
00:09:02,720 --> 00:09:06,600
technical risk and more about, 
well, the business model, and 

196
00:09:06,600 --> 00:09:09,160
those are called business flags.
Business flags. How are they

197
00:09:09,160 --> 00:09:11,120
different from release flags in
practice?

198
00:09:11,400 --> 00:09:14,680
They're used to essentially 
create different versions of the

199
00:09:14,680 --> 00:09:18,440
software for different users, 
but all from the same deployed 

200
00:09:18,440 --> 00:09:20,720
code base. 
Think user permissions,

201
00:09:20,720 --> 00:09:23,200
subscription tiers, maybe
regional differences. 

202
00:09:23,400 --> 00:09:26,360
The code for all features is 
deployed everywhere, but the 

203
00:09:26,360 --> 00:09:29,560
business flag controls who is 
actually allowed to use certain 

204
00:09:29,560 --> 00:09:32,720
features. 
OK, so that's how a company runs

205
00:09:32,720 --> 00:09:36,000
like a free version and a paid 
version of their app from the 

206
00:09:36,000 --> 00:09:38,800
same underlying code.
Exactly that, it's the classic 

207
00:09:38,800 --> 00:09:41,720
freemium model example. 
The code for the premium paid 

208
00:09:41,720 --> 00:09:45,200
features exists on the free 
user's installation too, but

209
00:09:45,200 --> 00:09:48,160
there's a business flag check,
maybe isPremiumUser, wrapped

210
00:09:48,160 --> 00:09:50,280
around it.
If you're not paying, that flag 

211
00:09:50,280 --> 00:09:52,800
is false and the feature is 
hidden or disabled. 

212
00:09:53,440 --> 00:09:56,200
If you upgrade, the flag flips 
to true and the feature 

213
00:09:56,200 --> 00:09:58,920
magically appears, often without
needing any new software 

214
00:09:58,920 --> 00:09:59,880
install. 
Wow. 

215
00:09:59,880 --> 00:10:02,440
OK, that's... yeah, that's
powerful. 

216
00:10:02,440 --> 00:10:05,000
It really makes you think. 
Feature flags aren't just a 

217
00:10:05,000 --> 00:10:07,680
deployment tactic anymore, they 
kind of transform the code 

218
00:10:07,680 --> 00:10:09,520
itself. 
It's not this fixed, monolithic 

219
00:10:09,520 --> 00:10:11,520
thing anymore. 
It becomes fluid, adaptable. 

220
00:10:11,520 --> 00:10:14,600
You can control not just if 
something runs, but who it runs 

221
00:10:14,600 --> 00:10:18,800
for based on business rules. 
It really is the technology that

222
00:10:18,800 --> 00:10:23,680
underpins how modern software 
can be deployed continuously and

223
00:10:23,680 --> 00:10:27,040
be flexible enough to meet 
complex business needs. 

224
00:10:27,360 --> 00:10:30,040
Well, that pretty much wraps up 
our deep dive into feature 

225
00:10:30,040 --> 00:10:32,320
flags. 
They really are this clever

226
00:10:32,320 --> 00:10:35,240
mechanism that lets developers 
move fast without, you know, 

227
00:10:35,440 --> 00:10:38,200
breaking things or driving 
everyone into that dreaded merge

228
00:10:38,200 --> 00:10:40,600
hell. 
And importantly, keeping you, 

229
00:10:40,680 --> 00:10:43,400
the user, safe from seeing stuff
that just isn't ready. 

230
00:10:43,400 --> 00:10:46,160
Yeah, they're fundamental now. 
So thank you for joining us for 

231
00:10:46,160 --> 00:10:46,880
this deep dive.
