1
00:00:00,000 --> 00:00:02,200
Hey, a quick message
for those of you who are

2
00:00:02,200 --> 00:00:04,300
listening to this episode on 
Spotify. 

3
00:00:04,600 --> 00:00:07,500
I have a small favor to ask.
Spotify

4
00:00:07,500 --> 00:00:10,100
now allows mobile users to rate
podcasts. 

5
00:00:10,400 --> 00:00:13,300
I would really appreciate it
if you can take a quick pause

6
00:00:13,300 --> 00:00:16,100
to go to the Tech Lead Journal
podcast page and leave your

7
00:00:16,100 --> 00:00:18,800
favorite show
your best rating on Spotify.

8
00:00:19,200 --> 00:00:21,800
It will help me a lot to get 
this podcast to reach more 

9
00:00:21,800 --> 00:00:24,300
people on the platform. 
Thanks a lot. 

10
00:00:24,800 --> 00:00:28,900
Observability is a technique for
ensuring that you can 

11
00:00:29,000 --> 00:00:31,500
understand
novel problems in your system.

12
00:00:31,900 --> 00:00:34,200
That's why it was a natural
transition for me to go from

13
00:00:34,200 --> 00:00:36,000
working on SRE to working on
observability.

14
00:00:36,500 --> 00:00:39,700
So, when we think about defining
observability, it's basically 

15
00:00:39,700 --> 00:00:42,600
this question. 
Can you understand what's 

16
00:00:42,600 --> 00:00:46,200
happening in your system and why
without having to push new

17
00:00:46,200 --> 00:00:48,900
code? 
And to do so, very quickly by 

18
00:00:48,900 --> 00:00:52,600
slicing and dicing existing
data that you already have, in 

19
00:00:52,600 --> 00:00:55,100
terms of telemetry signals that 
are coming out of your system. 

20
00:01:00,000 --> 00:01:02,700
Hey everyone. 
My name is Henry Suryawirawan,

21
00:01:02,700 --> 00:01:06,000
and you're listening to the

22
00:01:06,000 --> 00:01:09,300
Tech Lead Journal podcast,
the show where I'll be bringing

23
00:01:09,300 --> 00:01:12,400
you the greatest technical 
leaders, practitioners, and

24
00:01:12,400 --> 00:01:16,200
thought leaders in the industry
to discuss their journeys,

25
00:01:16,400 --> 00:01:20,900
ideas and practices that we all 
can learn and apply to build a 

26
00:01:20,900 --> 00:01:24,400
highly performing technical team
and to make an impact in your 

27
00:01:24,400 --> 00:01:27,700
personal work. 
So let's dive into our Journal. 

28
00:01:32,900 --> 00:01:35,100
Hello to all of you, my friends 
and listeners. 

29
00:01:35,300 --> 00:01:38,300
Welcome to the Tech Lead
Journal podcast, the show where you

30
00:01:38,300 --> 00:01:41,300
can learn about technical 
leadership and excellence from

31
00:01:41,300 --> 00:01:44,800
my conversations with great
thought leaders. I'm very happy

32
00:01:44,800 --> 00:01:48,300
today to present the 88th 
episode of the podcast. 

33
00:01:48,600 --> 00:01:51,100
Thank you for tuning in and
listening to this episode.

34
00:01:51,300 --> 00:01:53,800
If this is your first time 
listening to Tech Lead Journal,

35
00:01:54,000 --> 00:01:57,400
subscribe and follow the show on
your podcast app and social 

36
00:01:57,400 --> 00:02:01,300
media on LinkedIn, Twitter, and
Instagram. If you are a regular

37
00:02:01,300 --> 00:02:04,800
listener and enjoy listening to 
the episodes, support me by 

38
00:02:04,800 --> 00:02:08,000
subscribing as a patron at
techleadjournal.dev/patron.

39
00:02:08,000 --> 00:02:11,200
My guest for today's episode is

40
00:02:11,200 --> 00:02:13,600
Liz Fong-Jones.
Liz is the

41
00:02:13,600 --> 00:02:17,400
Principal Developer Advocate for
SRE and observability at

42
00:02:17,400 --> 00:02:19,000
Honeycomb.
And recently,

43
00:02:19,000 --> 00:02:21,600
she just published a book
titled Observability

44
00:02:21,600 --> 00:02:24,700
Engineering, which she
co-authored with her colleagues,

45
00:02:24,700 --> 00:02:26,800
Charity Majors and George
Miranda.

46
00:02:27,400 --> 00:02:30,500
In this episode,
Liz shared in depth about the

47
00:02:30,500 --> 00:02:33,800
concept of observability and why
it is becoming an important 

48
00:02:33,800 --> 00:02:35,800
practice in the industry 
nowadays. 

49
00:02:36,200 --> 00:02:39,400
She started by explaining the 
fundamentals of observability 

50
00:02:39,700 --> 00:02:43,100
and how it differs from 
traditional monitoring and how 

51
00:02:43,100 --> 00:02:46,700
observability can help us to run
more reliable and stable

52
00:02:46,700 --> 00:02:51,000
production systems, including its
relation with DevOps and SRE

53
00:02:51,000 --> 00:02:53,700
practices. 
She explained some interesting 

54
00:02:53,700 --> 00:02:57,300
concepts such as the core
analysis loop, cardinality and

55
00:02:57,300 --> 00:03:00,300
dimensionality,
and debugging from first

56
00:03:00,300 --> 00:03:04,000
principles. In the later
part of the conversation, Liz

57
00:03:04,000 --> 00:03:07,200
shared her view of the current 
state of observability, 

58
00:03:07,500 --> 00:03:11,200
including the proliferation of 
vendor and open source tools and

59
00:03:11,200 --> 00:03:15,700
how we engineers can improve our
systems' observability by doing

60
00:03:15,700 --> 00:03:18,800
observability-driven
development and improving our

61
00:03:18,800 --> 00:03:22,500
practices based on the proposed
observability maturity model

62
00:03:22,600 --> 00:03:25,700
found in the book. 
I really enjoyed my conversation

63
00:03:25,700 --> 00:03:29,900
with Liz, diving deep into Tea
and understanding the different 

64
00:03:29,900 --> 00:03:32,300
nuances of observability 
concepts.

65
00:03:32,700 --> 00:03:35,400
If you are interested in this 
topic, I would also highly 

66
00:03:35,400 --> 00:03:39,100
recommend reading further the
Observability Engineering book,

67
00:03:39,300 --> 00:03:42,900
for which Honeycomb has kindly
provided the e-book for free on 

68
00:03:42,900 --> 00:03:45,300
their website. 
Find the link provided in

69
00:03:45,300 --> 00:03:47,600
the show notes to get your free 
copy. 

70
00:03:48,100 --> 00:03:50,900
And if you also enjoyed this 
episode and find it useful, 

71
00:03:51,200 --> 00:03:53,000
share it with your friends and 
colleagues. 

72
00:03:53,100 --> 00:03:56,500
who may also benefit from
listening to this episode. Leave

73
00:03:56,500 --> 00:03:58,200
a rating and review on your
podcast

74
00:03:58,400 --> 00:04:01,700
app and share your comments or
feedback about this episode on 

75
00:04:01,700 --> 00:04:04,300
social media. 
It is my ultimate mission to 

76
00:04:04,300 --> 00:04:06,800
make this podcast available to 
more people. 

77
00:04:07,100 --> 00:04:09,700
And I need your help to support 
me towards fulfilling my 

78
00:04:09,700 --> 00:04:11,800
mission. 
Before we continue to the 

79
00:04:11,800 --> 00:04:14,400
conversation,
let's hear some words from our

80
00:04:14,400 --> 00:04:16,700
sponsor. 
Today's episode is proudly 

81
00:04:16,700 --> 00:04:20,300
sponsored by Skills Matter,
the global community and events

82
00:04:20,300 --> 00:04:24,800
platform with more than 100,000 
software professionals. Here,

83
00:04:25,100 --> 00:04:28,100
members can organize their
learning experiences around the

84
00:04:28,300 --> 00:04:31,100
technology topics
they care about most. You get

85
00:04:31,100 --> 00:04:34,800
on-demand access to their latest
content, thought leadership

86
00:04:34,800 --> 00:04:38,400
insights, as well as the 
exciting schedule of tech events

87
00:04:38,400 --> 00:04:42,200
running across all time zones. 
So whether DevOps or data

88
00:04:42,200 --> 00:04:46,300
science is your buzz, or you're a
fan of functional programming or

89
00:04:46,300 --> 00:04:49,900
all things cloud, you can make
real connections with people who

90
00:04:49,900 --> 00:04:54,000
share your interests. Head on
over to skillsmatter.com to

91
00:04:54,000 --> 00:04:56,800
become part of the tech 
community that matters most to 

92
00:04:56,800 --> 00:04:58,200
you. 
It's free to join. 

93
00:04:58,500 --> 00:05:01,400
And you will find it easy to 
keep up with the latest tech 

94
00:05:01,400 --> 00:05:05,900
trends.
Hello everyone, welcome back to 

95
00:05:05,900 --> 00:05:08,200
another new episode of the
Tech Lead Journal podcast.

96
00:05:08,200 --> 00:05:10,300
Today,
I'm very excited to finally meet

97
00:05:10,300 --> 00:05:12,500
someone who I have been 
following for quite a while. 

98
00:05:12,800 --> 00:05:16,300
Liz Fong-Jones is here with us.
So today, we'll be talking a lot

99
00:05:16,300 --> 00:05:20,100
about SRE and observability,
which are the topics and the

100
00:05:20,100 --> 00:05:23,500
trends that are up and coming
these days as well. Liz actually

101
00:05:23,500 --> 00:05:26,200
is the Principal Developer
Advocate at Honeycomb.

102
00:05:26,300 --> 00:05:28,200
She has been working in
SRE

103
00:05:28,300 --> 00:05:32,100
and observability for maybe 16
years by her bio. It really is a

104
00:05:32,100 --> 00:05:35,000
pleasure to meet you today and 
I'm looking forward to learning 

105
00:05:35,000 --> 00:05:37,200
everything about SRE and
observability today. 

106
00:05:38,100 --> 00:05:40,300
I never really get to 
everything, but we can certainly

107
00:05:40,300 --> 00:05:42,300
do a good chunk of it. 
Hi. 

108
00:05:42,300 --> 00:05:43,500
Thanks for having me on the 
show. 

109
00:05:44,300 --> 00:05:46,800
So Liz, for people who may not
know you yet. 

110
00:05:46,900 --> 00:05:48,900
Maybe you can help to
introduce yourself by

111
00:05:48,900 --> 00:05:51,400
telling us more about your 
highlights or turning points in 

112
00:05:51,400 --> 00:05:55,600
your career. 
Yeah, so I started working as a 

113
00:05:55,600 --> 00:06:02,200
systems engineer in 2004. 
I've been doing all kinds of 

114
00:06:02,200 --> 00:06:04,800
work on reliability and making 
systems better and

115
00:06:04,800 --> 00:06:08,000
easier to operate. That
includes spending a number of 

116
00:06:08,000 --> 00:06:11,000
years working at a game studio, a
number of years working at 

117
00:06:11,000 --> 00:06:12,500
Google. 
That's where I spent most of my 

118
00:06:12,500 --> 00:06:16,400
career, as a site reliability
engineer at Google, and then my

119
00:06:16,400 --> 00:06:18,700
career took a turn towards 
thinking about:

120
00:06:18,700 --> 00:06:22,400
How do I teach not just my team 
to work on better systems, but

121
00:06:22,400 --> 00:06:24,300
how can
all teams across the software

122
00:06:24,300 --> 00:06:26,800
industry do better? That's
when I became a developer

123
00:06:26,800 --> 00:06:29,800
advocate, and I switched over to
working at Honeycomb.

124
00:06:30,500 --> 00:06:33,600
So, yeah, I've been following 
you with all your SRE content.

125
00:06:33,800 --> 00:06:36,800
I started probably with the SRE 
content, mostly, and then it

126
00:06:36,800 --> 00:06:38,600
took a turn since you joined
Honeycomb.

127
00:06:38,600 --> 00:06:41,700
And now we are talking a lot 
more about observability and you

128
00:06:41,700 --> 00:06:44,700
have an upcoming book which is 
titled Observability

129
00:06:44,700 --> 00:06:46,700
Engineering.
So, this is something that you 

130
00:06:46,700 --> 00:06:49,700
wrote with Charity Majors and
one of your colleagues at

131
00:06:49,700 --> 00:06:50,700
Honeycomb.
Yes. 

132
00:06:50,700 --> 00:06:53,200
And our friend and peer
there, George Miranda.

133
00:06:53,300 --> 00:06:57,400
Observability is definitely one
of the hottest topics these days

134
00:06:57,400 --> 00:06:59,400
especially in the technology 
world.

135
00:06:59,500 --> 00:07:02,600
Maybe you can start by helping 
us to understand what

136
00:07:02,600 --> 00:07:05,800
observability actually is, because
this term is commonly used these

137
00:07:05,800 --> 00:07:08,900
days. 
Let's contextualize in terms of 

138
00:07:08,900 --> 00:07:11,900
both SRE and observability.
Not all of your listeners will 

139
00:07:11,900 --> 00:07:13,600
necessarily have heard of
SRE either.

140
00:07:13,900 --> 00:07:18,600
So when we start with thinking
about what SREs do, the goal of

141
00:07:18,600 --> 00:07:24,000
a site reliability engineer is
to try to make systems easier to

142
00:07:24,000 --> 00:07:27,700
operate for the people that are
working on them, whether it's

143
00:07:27,700 --> 00:07:30,200
a team of SREs that's on call,
or whether it's the software

144
00:07:30,200 --> 00:07:32,500
engineering team that is kind of
building and operating and

145
00:07:32,500 --> 00:07:36,100
running the system. 
One of those key considerations 

146
00:07:36,100 --> 00:07:40,400
in terms of ensuring the system 
is highly available is thinking

147
00:07:40,400 --> 00:07:43,000
about what are your service 
level objectives? 

148
00:07:43,200 --> 00:07:45,800
What are your reliability goals 
for the system? 

149
00:07:46,200 --> 00:07:50,500
On the flip side, how do we think
about trying to achieve those 

150
00:07:50,500 --> 00:07:52,500
goals? 
What happens if there is an 

151
00:07:52,600 --> 00:07:54,300
incident? Does it take 5 minutes
to fix? 

152
00:07:54,300 --> 00:07:56,900
Or does it take three hours to 
fix? To me,

153
00:07:56,900 --> 00:08:00,100
that's where observability comes
in because observability is a 

154
00:08:00,100 --> 00:08:04,800
technique for ensuring that you 
can understand novel problems in

155
00:08:04,800 --> 00:08:07,100
your system. 
That's why it was a natural

156
00:08:07,108 --> 00:08:09,400
transition for me to go from
working on SRE to working

157
00:08:09,400 --> 00:08:12,300
on observability.
So, when we think about defining

158
00:08:12,300 --> 00:08:14,700
observability, it's basically 
this question. 

159
00:08:14,900 --> 00:08:18,500
Can you understand what's 
happening in your system and why

160
00:08:19,100 --> 00:08:21,000
without having to push new
code?

161
00:08:21,000 --> 00:08:24,000
And to do so very quickly
by slicing and dicing

162
00:08:24,000 --> 00:08:27,500
existing data that you already 
have in terms of telemetry 

163
00:08:27,500 --> 00:08:29,200
signals that are coming out of 
your system. 

164
00:08:30,000 --> 00:08:31,800
So you mentioned a couple of key
points

165
00:08:31,800 --> 00:08:35,500
when you described observability
here. The first: without pushing

166
00:08:35,500 --> 00:08:37,799
new code. 
If I imagine last time, I used 

167
00:08:37,799 --> 00:08:40,000
to be a software developer,
whenever I wanted to troubleshoot

168
00:08:40,000 --> 00:08:42,299
or debug something,
sometimes I introduced new code.

169
00:08:42,600 --> 00:08:44,200
Yes.
Even printf debug statements;

170
00:08:44,200 --> 00:08:47,000
all of us are guilty of it. 
So, tell us more about this 

171
00:08:47,000 --> 00:08:49,800
without pushing new code. 
How can we actually do this? 

172
00:08:50,600 --> 00:08:54,400
Yeah, so the way that we
think about observability and 

173
00:08:54,400 --> 00:08:58,200
kind of that first step of 
instrumentation is as you're 

174
00:08:58,200 --> 00:09:00,900
developing your application,
you should add instrumentation to

175
00:09:00,900 --> 00:09:04,100
help you understand what's
happening inside of your code, so

176
00:09:04,100 --> 00:09:06,200
that you don't want to be caught
in the bat Lair. 

177
00:09:06,600 --> 00:09:09,400
That doesn't mean you have to 
predict every single failure

178
00:09:09,400 --> 00:09:11,900
that's going to happen in
advance, but you kind of have to

179
00:09:11,900 --> 00:09:14,400
leave yourself the
breadcrumbs that you would need

180
00:09:14,400 --> 00:09:18,800
to debug in the future. 
So this starts with, in my view 

181
00:09:18,800 --> 00:09:23,700
at least, kind of having some
form of tracing to know when did

182
00:09:23,700 --> 00:09:26,200
each request start and stop in 
each service. 

183
00:09:26,400 --> 00:09:27,700
Where did that request come 
from? 

184
00:09:27,800 --> 00:09:31,200
Which other service called you
in turn? And maybe some other

185
00:09:31,200 --> 00:09:33,600
properties,
like what user is it? The more

186
00:09:33,600 --> 00:09:37,500
you enrich your traces with
information that may or may not 

187
00:09:37,500 --> 00:09:40,900
seem relevant at the time,
but if you can freely add that

188
00:09:40,900 --> 00:09:44,600
detail of what feature flags are
on, which user ID is it,

189
00:09:44,600 --> 00:09:47,200
which browser are they using?
Which language are they using? 

190
00:09:47,400 --> 00:09:49,500
Then
it means, down the road,

191
00:09:49,600 --> 00:09:52,400
you don't
have to have pre-aggregated by any of

192
00:09:52,400 --> 00:09:55,100
those dimensions.
You can kind of see whether

193
00:09:55,100 --> 00:09:56,900
any of those dimensions are
relevant. 

194
00:09:57,600 --> 00:10:01,100
So instrumentation, tracing: all
these are being mentioned these

195
00:10:01,100 --> 00:10:02,900
days. 
And people also normally 

196
00:10:02,900 --> 00:10:06,200
categorize it with these three 
pillars of observability, called

197
00:10:06,200 --> 00:10:09,700
logs, metrics, tracing.
So tell us more: are these three

198
00:10:09,700 --> 00:10:12,300
the most important things that 
we need to implement for 

199
00:10:12,300 --> 00:10:16,400
observability?
I think that having high quality

200
00:10:16,400 --> 00:10:19,400
signals does matter. You're not
going to get great 

201
00:10:19,400 --> 00:10:21,400
observability, unless the data 
is there. 

202
00:10:22,000 --> 00:10:25,300
However, you can't just throw a 
bunch of data into a data

203
00:10:25,300 --> 00:10:28,600
lake and say that you're done.
So I think that's kind of why I 

204
00:10:28,600 --> 00:10:32,000
pushed back against the three 
pillars narrative, because one 

205
00:10:32,000 --> 00:10:34,700
or two really high-quality
signals is enough.

206
00:10:34,800 --> 00:10:38,300
You don't necessarily need kind
of the voluminous debug logs if

207
00:10:38,300 --> 00:10:41,500
you have tracing, that is 
representative of what your

208
00:10:41,500 --> 00:10:45,900
requests are doing.
Likewise, metrics can be useful in

209
00:10:46,000 --> 00:10:49,500
high-volume situations, but the 
problem with metrics is that they're

210
00:10:49,500 --> 00:10:52,100
pre-aggregated.
You can't re-slice the data

211
00:10:52,100 --> 00:10:54,300
once it's been pre-aggregated at the
source, right?

212
00:10:54,300 --> 00:10:58,500
So I think each signal has
some benefits and drawbacks that

213
00:10:58,500 --> 00:11:01,100
you need to evaluate and you 
don't need to collect all three 

214
00:11:01,100 --> 00:11:03,100
of them. 
In fact, there are new and 

215
00:11:03,100 --> 00:11:05,600
emerging signals like
continuous profiling. 

216
00:11:05,800 --> 00:11:08,700
So there are not really even three
pillars or lenses or whatever

217
00:11:08,700 --> 00:11:10,700
you want to call them.
There are many different 

218
00:11:10,700 --> 00:11:13,400
telemetry types that we can
think about utilizing

219
00:11:13,500 --> 00:11:16,700
as we kind of build a
capability that's really not a

220
00:11:16,700 --> 00:11:20,600
technical capability
but instead a sociotechnical

221
00:11:20,600 --> 00:11:22,800
capability. 
What can your engineers 

222
00:11:22,800 --> 00:11:26,100
accomplish with the system? 
Not what are you measuring on a 

223
00:11:26,108 --> 00:11:28,000
system,
but can you actually analyze it?

224
00:11:28,000 --> 00:11:30,800
Can you actually get that result of
I can understand, figure out 

225
00:11:30,800 --> 00:11:33,200
what's happening. 
So one thing that is also 

226
00:11:33,200 --> 00:11:36,200
interesting for me when I read 
and study about observability,

227
00:11:36,200 --> 00:11:38,200
right? 
It mentions that the goal of 

228
00:11:38,200 --> 00:11:41,300
observability is actually to 
provide a level of introspection

229
00:11:41,300 --> 00:11:45,500
or details that helps you
understand the internal state of the

230
00:11:45,500 --> 00:11:47,900
system. 
So the key word here is internal

231
00:11:47,900 --> 00:11:49,900
state.
Tell us more about this. 

232
00:11:50,300 --> 00:11:52,900
How does this differ from
something that monitors external

233
00:11:52,900 --> 00:11:54,500
state?
Yeah. 

234
00:11:54,500 --> 00:11:59,500
So I think that when you're 
measuring external state, you

235
00:11:59,500 --> 00:12:03,000
are measuring things
like what's the CPU utilization?

236
00:12:03,000 --> 00:12:06,100
What's the memory utilization? 
Those are things that are 

237
00:12:06,100 --> 00:12:10,000
potentially useful but they 
don't give you an idea of what 

238
00:12:10,000 --> 00:12:14,300
was the application doing at the
time that this given request

239
00:12:14,300 --> 00:12:17,000
executed. 
So I think that kind of detail 

240
00:12:17,000 --> 00:12:20,700
into what was the code actually 
doing and not, what are the side

241
00:12:20,700 --> 00:12:23,100
effects of the code. 
I think that's kind of what 

242
00:12:23,100 --> 00:12:26,600
differentiates observing it from
the inside out, rather than from

243
00:12:26,600 --> 00:12:30,800
the outside in.
The other thing that people in the industry

244
00:12:30,800 --> 00:12:33,500
are used to, traditionally, we call
it monitoring. 

245
00:12:33,800 --> 00:12:36,900
So when we talk about system 
administrators, last time we 

246
00:12:36,900 --> 00:12:39,200
used to talk about monitoring 
and in the past, I don't know

247
00:12:39,200 --> 00:12:42,000
how many years, recently we
started to shift from monitoring

248
00:12:42,000 --> 00:12:45,000
to observability.
So is there any significant 

249
00:12:45,000 --> 00:12:48,000
difference between monitoring and
observability, or is it just another

250
00:12:48,000 --> 00:12:51,200
term being coined to brand
the same thing with a new term?

251
00:12:52,100 --> 00:12:55,300
To me,
observability is a superset of

252
00:12:55,300 --> 00:12:57,500
the capabilities that monitoring
would provide. 

253
00:12:57,600 --> 00:13:01,500
And so the two are not at all
synonymous because with 

254
00:13:01,500 --> 00:13:04,900
monitoring, you're just trying 
to figure out when something has

255
00:13:04,900 --> 00:13:09,500
broken, but it won't necessarily
answer why, which is why people 

256
00:13:09,500 --> 00:13:12,800
had considered for a long time. 
You know, I use my monitors to 

257
00:13:12,800 --> 00:13:15,100
tell me something has gone
wrong and I use my logs to 

258
00:13:15,100 --> 00:13:16,800
figure out what happened.
Guess what? 

259
00:13:16,800 --> 00:13:19,800
In a complex distributed system 
logging is not going to save you

260
00:13:19,800 --> 00:13:21,200
anymore. 
It takes forever to search

261
00:13:21,200 --> 00:13:24,400
through, and it doesn't give you
causality of what caused what.

262
00:13:25,000 --> 00:13:28,200
So when we think about the
relationship of monitoring and

263
00:13:28,200 --> 00:13:30,500
observability. 
I think this is why I mentioned 

264
00:13:30,500 --> 00:13:32,000
service
level objectives earlier.

265
00:13:32,500 --> 00:13:35,600
To me, service
level objectives are kind of

266
00:13:35,600 --> 00:13:39,300
monitoring 2.0. They're
the answer to the "tell me when

267
00:13:39,300 --> 00:13:41,700
something has gone
wrong" role that monitoring played before,

268
00:13:42,000 --> 00:13:46,200
whereas observability kind of
maps more closely, to me, to kind

269
00:13:46,200 --> 00:13:48,700
of replacing logging, and
also to replacing the metrics

270
00:13:48,700 --> 00:13:52,200
you used for monitoring in the past.
So what do I mean? Well, when

271
00:13:52,200 --> 00:13:54,000
you have your service level
objective,

272
00:13:54,000 --> 00:13:57,500
you define what an acceptable
level of success is.

273
00:13:57,600 --> 00:14:01,300
So for instance, at Honeycomb,
we aim for our ingest pipeline to

274
00:14:01,300 --> 00:14:06,500
be 99.99 percent reliable. 
That means no more than 1 in 

275
00:14:06,500 --> 00:14:09,500
10,000 packets of incoming
telemetry are dropped.

276
00:14:10,000 --> 00:14:13,900
But if we start seeing an 
elevated error rate, if we have 

277
00:14:13,900 --> 00:14:18,400
say we're allowed to drop 1
million requests per month and

278
00:14:18,400 --> 00:14:21,200
we start dropping 100,000 per 
hour. 

279
00:14:21,600 --> 00:14:25,500
We're going to exhaust that 
error budget for the month in 10

280
00:14:25,500 --> 00:14:27,500
hours. To us,
that's an emergency.

281
00:14:27,700 --> 00:14:31,200
I'll go and fix it right away. 
But if we have like a thousand 

282
00:14:31,200 --> 00:14:33,200
bad requests that are just being
dropped per hour,

283
00:14:33,600 --> 00:14:35,100
that's going to
take a thousand hours.

284
00:14:35,200 --> 00:14:37,900
That's not an emergency. 
So kind of it differentiates 

285
00:14:37,900 --> 00:14:41,700
between expected levels of 
errors and unexpected levels of

286
00:14:41,700 --> 00:14:45,200
errors and is not susceptible to
the same problem that people 

287
00:14:45,200 --> 00:14:48,500
have had with monitoring in the 
past of, oh my God. 

288
00:14:48,500 --> 00:14:51,400
It's 2 a.m. 1 out of 1 requests 
failed. 

289
00:14:51,500 --> 00:14:53,700
That's 100 percent error rate. 
Wake up everyone. 

290
00:14:54,100 --> 00:14:56,900
No one likes to be paged at 2
a.m. for something that flaps

291
00:14:56,900 --> 00:14:57,900
and goes right
away.

292
00:14:58,100 --> 00:15:01,800
So like peanut butter and jelly,
they go better together, and then

293
00:15:01,800 --> 00:15:04,400
together they tend to provide
a superset of the kind of

294
00:15:04,400 --> 00:15:06,600
monitoring and logging
that people used to do in the past.

295
00:15:07,200 --> 00:15:09,900
So one thing, when I read your 
book as well, in preparation of 

296
00:15:09,900 --> 00:15:12,800
this conversation is that you 
mentioned traditional monitoring

297
00:15:12,800 --> 00:15:14,900
is more reactive. 
You mentioned about getting 

298
00:15:14,900 --> 00:15:17,800
alerted, you only see side 
effects of certain things 

299
00:15:17,800 --> 00:15:20,900
happening while observability is
more investigative. 

300
00:15:21,100 --> 00:15:24,000
So tell us more about this
investigative manner. When an

301
00:15:24,000 --> 00:15:26,600
incident happens,
what actually does

302
00:15:26,700 --> 00:15:27,900
observability
give you?

303
00:15:28,800 --> 00:15:34,000
Yeah, so observability really 
helps you form and test 

304
00:15:34,000 --> 00:15:37,800
hypotheses really quickly 
because when I am trying to 

305
00:15:37,800 --> 00:15:40,200
debug an incident, my error
budget has gone off. 

306
00:15:40,400 --> 00:15:44,300
The number one thing I'm
asking is what's special about

307
00:15:44,300 --> 00:15:47,800
the requests that are failing?
Instead of looking at this wall

308
00:15:47,800 --> 00:15:50,300
of graphs to figure out which
lines wiggle at the same time,

309
00:15:51,100 --> 00:15:54,500
what observability gives me is
this power to have

310
00:15:54,500 --> 00:15:57,800
correlation,
the power to see, on requests that

311
00:15:57,800 --> 00:16:00,700
failed, to correlate which
properties were most associated

312
00:16:00,700 --> 00:16:01,000
with
them.

313
00:16:01,000 --> 00:16:02,600
Well, maybe it's one
customer ID.

314
00:16:02,600 --> 00:16:04,800
Maybe it's one build ID.
Or maybe it's a combination of

315
00:16:04,808 --> 00:16:08,000
those two things and the really 
neat thing is making somebody 

316
00:16:08,000 --> 00:16:09,600
going to be. 
So we've implemented this as a

317
00:16:09,600 --> 00:16:11,700
feature called BubbleUp, where
you can draw a box around the 

318
00:16:11,700 --> 00:16:15,000
anomaly and we'll tell you what's
special about that anomaly.

319
00:16:15,000 --> 00:16:17,700
You're telling us what
the anomaly is; we're not

320
00:16:17,700 --> 00:16:20,200
trying to guess based off of two
Sigma or whatever. 

321
00:16:20,200 --> 00:16:22,700
Rather, we're
just telling you: here's the

322
00:16:22,700 --> 00:16:24,400
difference between your control 
and experiment. 

323
00:16:24,600 --> 00:16:26,700
The cool thing is, from there
you can basically go in and

324
00:16:26,700 --> 00:16:29,900
refine. You can filter only
to this population of events

325
00:16:30,300 --> 00:16:34,300
and then repeat the process, or
group by this field, or

326
00:16:34,300 --> 00:16:37,600
group by a combination of fields
and to kind of confirm that 

327
00:16:37,600 --> 00:16:39,700
hypothesis. 
So instead of saying I'm like, 

328
00:16:39,700 --> 00:16:41,400
oh my God, like, how do I set up
this

329
00:16:41,400 --> 00:16:44,300
query? How do I write it in this
obtuse query

330
00:16:44,300 --> 00:16:45,400
language?
Oh God,

331
00:16:45,400 --> 00:16:46,900
I hope I got it
right, because it's going to take

332
00:16:46,900 --> 00:16:48,500
two minutes to run over all these
logs. 

333
00:16:48,800 --> 00:16:51,300
Instead, you get that feedback 
instantly. 

334
00:16:51,600 --> 00:16:55,400
So going back to our SLOs,
99.5% of Honeycomb queries

335
00:16:55,500 --> 00:16:57,400
complete within 10 seconds. 
There is

336
00:16:57,500 --> 00:17:00,600
not a cost to messing up.
You can feel free to just try 

337
00:17:00,600 --> 00:17:03,500
and experiment to analyze and 
slice your data a particular way.

338
00:17:03,500 --> 00:17:05,300
You get a result within 10
seconds. 

339
00:17:05,300 --> 00:17:07,599
If it doesn't work out, try it 
another query.

340
00:17:07,599 --> 00:17:09,700
If it does work out, you now
have the clue

341
00:17:09,700 --> 00:17:12,800
that you need to run your next
query. And then ultimately, at the end

342
00:17:12,800 --> 00:17:16,599
of the day you can visualize 
this in either kind of aggregated

343
00:17:16,599 --> 00:17:18,099
metrics,
like what's the distribution of

344
00:17:18,099 --> 00:17:21,599
latencies, or you can go and look
at the raw data and see it kind 

345
00:17:21,599 --> 00:17:24,599
of in a tabular, log-like
format, with each of

346
00:17:24,599 --> 00:17:27,400
the fields and the relevant
queries attached, or you

347
00:17:27,599 --> 00:17:29,400
can look at it as a trace
waterfall.

348
00:17:29,600 --> 00:17:31,600
OK, I
think that kind of allows you to

349
00:17:31,600 --> 00:17:34,500
slice and visualize the data
any way that you need in order

350
00:17:34,500 --> 00:17:36,500
to understand
where the slowness or failure is

351
00:17:36,500 --> 00:17:38,400
coming from. 
So it starts with the forming of

352
00:17:38,400 --> 00:17:41,000
a hypothesis, narrowing down the
data and then looking at the 

353
00:17:41,008 --> 00:17:43,400
data to confirm your hypothesis.
Yeah. 

354
00:17:43,400 --> 00:17:45,600
So if I understand correctly, 
the one that you described just 

355
00:17:45,600 --> 00:17:48,400
now is actually what you
call the core analysis loop.

356
00:17:48,400 --> 00:17:51,200
So when an incident happens, 
this is the sequence of

357
00:17:51,200 --> 00:17:54,200
events that is going to happen. 
So you look at a bunch of data 

358
00:17:54,200 --> 00:17:56,200
that you load, initially, then 
you figure out. 

359
00:17:56,200 --> 00:17:57,400
Okay. 
This looks interesting. 

360
00:17:57,500 --> 00:18:00,400
Maybe you use BubbleUp
in Honeycomb, right,

361
00:18:00,500 --> 00:18:03,400
where you maybe narrow down the
search results. 

362
00:18:03,700 --> 00:18:05,400
And then from there you 
correlate and then you start 

363
00:18:05,400 --> 00:18:08,100
again, and then you iterate 
until you actually find the root

364
00:18:08,100 --> 00:18:11,200
cause. Do I understand correctly?
Yeah, maybe not necessarily the 

365
00:18:11,200 --> 00:18:14,900
root cause, but like a proximate
set of triggers that may have tipped

366
00:18:14,900 --> 00:18:18,500
the system over the edge. There
is usually no one root cause in

367
00:18:18,500 --> 00:18:20,900
a complex system.
The other thing I want to point 

368
00:18:20,900 --> 00:18:22,800
out
is that the thinking about what

369
00:18:22,800 --> 00:18:26,000
a core analysis loop is, that's
work from long before I came to

370
00:18:26,000 --> 00:18:28,300
Honeycomb.
So when I was working at Google,

371
00:18:28,300 --> 00:18:30,700
I began to formulate this
hypothesis about the core

372
00:18:30,700 --> 00:18:34,100
analysis loop and how to make it
faster, and then I realized that

373
00:18:34,100 --> 00:18:36,500
Honeycomb seemed to me to be the kind of
manifestation of

374
00:18:36,500 --> 00:18:39,300
how do we bring this technology
not just to Google employees,

375
00:18:39,300 --> 00:18:42,300
but to the rest of the world. 
Nice, nice. 

376
00:18:42,600 --> 00:18:45,400
So one thing that you mentioned 
when we do this core analysis

377
00:18:45,400 --> 00:18:48,300
Loop is actually slicing and 
dicing data, right? 

378
00:18:48,300 --> 00:18:51,800
So I think one of the maybe 
commonly mentioned

379
00:18:51,800 --> 00:18:55,200
requirements of observability is
the high cardinality and 

380
00:18:55,200 --> 00:18:58,700
dimensionality. Without this, it is
very difficult for you to slice

381
00:18:58,700 --> 00:19:01,600
and dice the data, because there's
just not something interesting 

382
00:19:01,600 --> 00:19:03,500
that you can narrow down. 
So tell us more. 

383
00:19:03,500 --> 00:19:07,200
What do you mean by cardinality 
and also dimensionality and why 

384
00:19:07,200 --> 00:19:10,000
it has to be high enough. 
Yeah. 

385
00:19:10,600 --> 00:19:14,100
So earlier I referred to this 
idea, when you're instrumenting

386
00:19:14,108 --> 00:19:18,500
your code that you should feel 
free to add as many attributes, 

387
00:19:18,700 --> 00:19:22,700
add as many key value pairs as 
you like, to really explain,

388
00:19:22,700 --> 00:19:24,900
as you go along,
what's going on in your code, to

389
00:19:24,900 --> 00:19:26,000
kind of leave these
breadcrumbs.

390
00:19:26,200 --> 00:19:28,500
It's almost like adding tests or
writing comments, right?

391
00:19:28,500 --> 00:19:31,200
I think it's just a sensible way
of leaving your future self this

392
00:19:31,200 --> 00:19:34,300
annotation that says:
what is this code doing, what was

393
00:19:34,300 --> 00:19:36,900
this code thinking?
The reason why people have 

394
00:19:36,900 --> 00:19:39,400
hesitated to do that in the
past at least with kind of 

395
00:19:39,400 --> 00:19:43,300
traditional monitoring systems, 
is that monitoring systems bill you

396
00:19:43,700 --> 00:19:47,800
by the number of distinct key 
value pairs that occur on each 

397
00:19:47,800 --> 00:19:50,700
metric that you're reporting. 
So that's caused people to

398
00:19:50,700 --> 00:19:53,500
either emit these key-value
pairs to logs, where they get

399
00:19:53,500 --> 00:19:58,200
lost, or to not record them
at all. So, when we think

400
00:19:58,200 --> 00:20:00,800
about high dimensionality, what
we're encouraging you to do is 

401
00:20:00,800 --> 00:20:03,300
to sprinkle these annotations 
throughout your code.

402
00:20:03,300 --> 00:20:07,300
And to kind of encode as many 
keys as you like, but there's a 

403
00:20:07,308 --> 00:20:08,700
problem. 
The problem

404
00:20:08,700 --> 00:20:12,900
is that even if you reduce the
number of keys in a metrics 

405
00:20:12,900 --> 00:20:16,600
based system, it turns out that 
if you have a lot of distinct 

406
00:20:16,600 --> 00:20:20,100
values, like user ID, right?
Let's suppose you have millions 

407
00:20:20,100 --> 00:20:23,200
of users. 
It turns out that your metric 

408
00:20:23,200 --> 00:20:26,000
system has to create a time
series for each distinct user

409
00:20:26,000 --> 00:20:28,800
and track it forever,
even if that user has only appeared

410
00:20:28,800 --> 00:20:32,400
once. So there is this
amortization that a metrics

411
00:20:32,400 --> 00:20:36,600
system expects, that your keys
and values will get reused

412
00:20:36,600 --> 00:20:38,300
often. 
And therefore there is a high 

413
00:20:38,300 --> 00:20:41,000
upfront cost to having a new key
and value.

414
00:20:41,600 --> 00:20:44,900
Therefore, metrics systems penalize
you for having a high 

415
00:20:44,900 --> 00:20:47,800
cardinality dimension. 
So to sum this up, 

416
00:20:47,800 --> 00:20:50,200
dimensionality is about the 
number of distinct keys. 

417
00:20:50,300 --> 00:20:53,500
And cardinality is about the 
number of distinct values per key.
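A minimal sketch of the two definitions just summed up, assuming telemetry events are plain Python dicts. The field names, sample values, and function names are hypothetical and are not from the episode.
```python
# Hypothetical sample events; keys and values are made up for illustration.
events = [
    {"user_id": "u1", "endpoint": "/cart", "status": 200},
    {"user_id": "u2", "endpoint": "/cart", "status": 500},
    {"user_id": "u3", "endpoint": "/home", "status": 200},
    {"user_id": "u1", "endpoint": "/home", "status": 200},
]
def dimensionality(events):
    """Number of distinct keys seen across all events."""
    return len({key for event in events for key in event})
def cardinality(events):
    """Number of distinct values observed for each key."""
    values = {}
    for event in events:
        for key, value in event.items():
            values.setdefault(key, set()).add(value)
    return {key: len(seen) for key, seen in values.items()}
def series_count(events):
    """Distinct key-value combinations; a metrics-style store would track one time series for each."""
    return len({tuple(sorted(event.items())) for event in events})
print(dimensionality(events))
print(cardinality(events))
print(series_count(events))
```
In this toy data a single high-cardinality key (user_id) already pushes the number of distinct combinations close to the number of events, which is the cost problem described for metrics-based systems.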

418
00:20:53,800 --> 00:20:57,300
The ideal system that supports
observability should allow

419
00:20:57,500 --> 00:21:00,700
basically unlimited
cardinality and a very, very high

420
00:21:00,700 --> 00:21:04,500
amount of dimensionality. 
So, for people who are trying to

421
00:21:04,600 --> 00:21:07,000
understand this concept of
cardinality and dimensionality,

422
00:21:07,000 --> 00:21:09,100
let me just repeat again
what was just said.

423
00:21:09,200 --> 00:21:12,800
So cardinality refers to the 
number of unique values that 

424
00:21:12,800 --> 00:21:16,300
you're storing in your metric 
system while dimensionality 

425
00:21:16,300 --> 00:21:19,500
refers to the number of unique 
keys that you're sending to your

426
00:21:19,500 --> 00:21:22,500
metric systems. 
Another thing that I learned about

427
00:21:22,500 --> 00:21:24,200
observability that you mentioned
in the book:

428
00:21:24,200 --> 00:21:27,000
I mentioned in the beginning 
that I used to go to the server 

429
00:21:27,000 --> 00:21:28,800
to make
changes to do debugging.

430
00:21:29,100 --> 00:21:31,800
But in your book, you actually 
mention, if you use 

431
00:21:31,800 --> 00:21:35,100
observability, you're actually 
doing debugging from first

432
00:21:35,100 --> 00:21:37,700
principle. 
Tell us more about this concept 

433
00:21:37,700 --> 00:21:40,000
because I'm still trying to
understand this part. 

434
00:21:40,600 --> 00:21:42,200
Yes. 
So we talked earlier about the 

435
00:21:42,200 --> 00:21:44,600
core analysis Loop, which kind 
of sets us up well to think

436
00:21:44,600 --> 00:21:47,700
about this. 
When we have a really 

437
00:21:47,700 --> 00:21:51,200
well-functioning core analysis 
Loop where you're able to 

438
00:21:51,200 --> 00:21:56,500
rapidly form hypotheses and test
them, you no longer need to make

439
00:21:56,500 --> 00:22:01,000
these kind of leaps of intuition
to the one magically right

440
00:22:01,000 --> 00:22:04,700
hypothesis and instead you can 
test a lot of different 

441
00:22:04,700 --> 00:22:08,200
hypotheses and kind of narrow 
down your search space and 

442
00:22:08,200 --> 00:22:10,500
eliminate red herrings and dead
ends. 

443
00:22:11,200 --> 00:22:13,800
Whereas in the previous ways of 
debugging,

444
00:22:14,000 --> 00:22:16,700
it used to be that you would
kind of have this one person who

445
00:22:16,700 --> 00:22:19,700
knew the system really well 
in their head, who could immediately

446
00:22:19,700 --> 00:22:22,300
jump to knowing what the right 
answer was. 

447
00:22:22,500 --> 00:22:25,100
Oh, I saw this two months ago. 
It's X, right?

448
00:22:25,300 --> 00:22:27,300
That's why I think first 
principles

449
00:22:27,500 --> 00:22:31,500
debugging is much better because
it enables anyone on your team 

450
00:22:31,600 --> 00:22:35,200
who has learned how to do first 
principles debugging to step

451
00:22:35,200 --> 00:22:39,000
into any unfamiliar
situation and to figure it out

452
00:22:39,000 --> 00:22:43,100
in a predictable amount of
time. Sure, your expert, who

453
00:22:43,100 --> 00:22:44,600
has seen
this a dozen times, who's been at the

454
00:22:44,600 --> 00:22:47,500
company for 15 years,
might be able to solve the

455
00:22:47,500 --> 00:22:51,900
issue in two minutes, but my 
goal is to make it so that the 

456
00:22:51,900 --> 00:22:55,500
worst case scenario of someone 
who's inexperienced, but knows

457
00:22:55,500 --> 00:22:57,300
first principles debugging, they
should

458
00:22:57,400 --> 00:23:00,400
take 10 minutes or 30 minutes
at most, rather than

459
00:23:00,400 --> 00:23:04,100
three hours, five hours, or 24
hours. Or worse, what happens if

460
00:23:04,100 --> 00:23:06,400
that person who's been at the
company for 15 years leaves the

461
00:23:06,400 --> 00:23:08,900
company or retires? That happens
these days.

462
00:23:09,200 --> 00:23:12,700
So we talk a lot at Honeycomb
about this idea of bringing

463
00:23:12,700 --> 00:23:14,900
everyone on your team
to the level of the best

464
00:23:14,900 --> 00:23:16,800
debugger,
so you don't need these magical

465
00:23:16,800 --> 00:23:19,300
flashes of intuition,
and instead

466
00:23:19,300 --> 00:23:21,900
everyone has kind of that baseline
ability.

467
00:23:22,400 --> 00:23:25,900
This is really personal to me 
because I started at Google in

468
00:23:25,900 --> 00:23:31,900
2008 and I left Google in 2018, 11
years. I had been on 12

469
00:23:31,900 --> 00:23:33,500
different teams at Google in
those 11 years.

470
00:23:34,000 --> 00:23:37,700
I was changing teams on average 
slightly faster than once per 

471
00:23:37,700 --> 00:23:39,500
year. 
I'd stay two years on one

472
00:23:39,500 --> 00:23:41,800
team, but I'd also stay just six
months on another team.

473
00:23:42,200 --> 00:23:46,700
I never got that amount of time 
on any one team to really be

474
00:23:46,800 --> 00:23:50,400
the deep expert on a system,
but I was still a really

475
00:23:50,400 --> 00:23:54,300
great and valued team member 
because I had gotten really good

476
00:23:54,300 --> 00:23:57,300
at first principles debugging, at
understanding

477
00:23:57,400 --> 00:23:59,900
what software is available in
the Google tool stack to

478
00:23:59,900 --> 00:24:04,400
understand any unfamiliar kind
of service. Because Google had

479
00:24:04,400 --> 00:24:07,200
this kind of standardization,
every service would use

480
00:24:07,200 --> 00:24:10,500
the same tracing,
the same metrics system, the same

481
00:24:10,500 --> 00:24:14,200
logging system, and therefore
it was possible for me to walk

482
00:24:14,200 --> 00:24:17,600
into a completely unfamiliar 
situation and figure it out in 

483
00:24:17,600 --> 00:24:20,500
half an hour or two. 
That was kind of surprising to 

484
00:24:20,500 --> 00:24:22,700
people. 
Because if you spent like two or

485
00:24:22,700 --> 00:24:24,600
three years on a system, but not
10 years. 

486
00:24:24,700 --> 00:24:27,000
It might take you several hours
to figure out what's going on. 

487
00:24:27,500 --> 00:24:30,200
You don't quite have that expert
level knowledge of the system. 

488
00:24:30,400 --> 00:24:33,600
But also you haven't grown that 
muscle yet. 

489
00:24:33,700 --> 00:24:36,600
Developed that muscle of how to
walk into any unfamiliar

490
00:24:36,600 --> 00:24:38,100
situation. Whereas
for me,

491
00:24:38,300 --> 00:24:40,400
I was always somewhere between
day

492
00:24:40,400 --> 00:24:43,000
one on that new team and day 180
on that team.

493
00:24:43,300 --> 00:24:46,100
I would have to figure things 
out on the Fly for myself. 

494
00:24:46,200 --> 00:24:49,700
That was my survival. 
So when you mention first 

495
00:24:49,700 --> 00:24:52,600
principles, for those who are
not familiar with this term, 

496
00:24:52,700 --> 00:24:54,400
what do you mean by first 
principle? 

497
00:24:54,400 --> 00:24:57,100
That's the first thing and tell 
us more about what are the 

498
00:24:57,100 --> 00:25:00,100
skills or maybe techniques that 
you categorize as first 

499
00:25:00,100 --> 00:25:02,600
principle debugging techniques. 
Yeah. 

500
00:25:02,700 --> 00:25:06,100
So in the field of engineering 
as a whole, when we talk about 

501
00:25:06,100 --> 00:25:09,100
going back to First principles, 
what we mean is throw out 

502
00:25:09,100 --> 00:25:12,100
everything you know. Let's
start from understanding the 

503
00:25:12,108 --> 00:25:15,700
system from the laws of physics 
or the laws of mathematics 

504
00:25:15,800 --> 00:25:18,000
instead of trying to read the 
manual

505
00:25:18,200 --> 00:25:20,500
to see what this machine says it
does. 

506
00:25:20,900 --> 00:25:23,700
The machine is ultimately made 
up of a bunch of levers and 

507
00:25:23,700 --> 00:25:27,000
wires and so forth. 
So you can kind of Trace 

508
00:25:27,000 --> 00:25:31,100
everything back to understand. 
Okay, what does this lever do?

509
00:25:31,300 --> 00:25:32,600
Not
what does the manual say this

510
00:25:32,600 --> 00:25:34,300
lever does,
or what does this person

511
00:25:34,300 --> 00:25:35,200
remember that
this lever

512
00:25:35,200 --> 00:25:38,500
does. You go and look at the
machinery that's connected to the

513
00:25:38,500 --> 00:25:40,400
back of this lever, to 
understand what it does. 

514
00:25:40,700 --> 00:25:43,000
I think that's the definition
of first principles. To

515
00:25:43,008 --> 00:25:47,800
me, it is to throw out the book
and use the laws of the system,

516
00:25:47,800 --> 00:25:49,700
the laws
of reality, to understand it. When

517
00:25:49,700 --> 00:25:53,100
this comes to computer systems,
what this fundamentally meant to

518
00:25:53,100 --> 00:25:56,100
me, initially in my time at Google
and then in my time at

519
00:25:56,100 --> 00:26:00,100
Honeycomb, is, like, I start
by looking at the flow of

520
00:26:00,100 --> 00:26:03,900
execution of one request,
to try to understand:

521
00:26:03,900 --> 00:26:05,800
what does, for instance, a normal
request

522
00:26:05,800 --> 00:26:09,100
look like, and what does an
abnormal request look like? Where do they

523
00:26:09,100 --> 00:26:11,100
differ? 
What services are they passing 

524
00:26:11,100 --> 00:26:12,100
through? 
Where are they spending the 

525
00:26:12,100 --> 00:26:14,600
time?
This is kind of where I find this

526
00:26:14,600 --> 00:26:18,000
idea of exemplars
very useful,

527
00:26:18,200 --> 00:26:21,900
meaning example traces that
exemplify both the slow path and

528
00:26:21,900 --> 00:26:24,400
the fast path, and just pulling
them up on my screen,

529
00:26:24,400 --> 00:26:26,900
comparing them to try to
understand why one is fast and

530
00:26:26,900 --> 00:26:31,100
one slow. That will bring to
mind hypotheses, ideas about

531
00:26:31,100 --> 00:26:33,800
why might these two things be 
different and then I can start 

532
00:26:33,800 --> 00:26:35,900
testing them. 
So that's what I mean

533
00:26:35,900 --> 00:26:37,400
when I talk about first principles
debugging.
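A rough sketch of the exemplar comparison described above, assuming each trace has been flattened to a mapping from span name to duration in milliseconds. The span names, numbers, and the span_differences helper are hypothetical, not taken from any real tracing tool.
```python
# Hypothetical exemplar traces: span name -> duration in milliseconds (made-up numbers).
fast_trace = {"auth": 5, "db_query": 20, "render": 10}
slow_trace = {"auth": 6, "db_query": 480, "render": 11, "retry": 90}
def span_differences(fast, slow):
    """Return (span, extra_ms) pairs, largest slowdown first; ties are ordered by span name."""
    names = set(fast) | set(slow)
    deltas = [(name, slow.get(name, 0) - fast.get(name, 0)) for name in names]
    return sorted(deltas, key=lambda pair: (-pair[1], pair[0]))
for name, extra_ms in span_differences(fast_trace, slow_trace):
    print(f"{name}: {extra_ms:+d} ms")
```
The spans at the top of the list are only candidates for a hypothesis to test next; the sketch does not decide what the cause is.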

534
00:26:37,700 --> 00:26:39,700
It's not knowing magically
what's the right

535
00:26:39,700 --> 00:26:43,100
dashboard to look at, but instead
spending a little bit of time

536
00:26:43,100 --> 00:26:44,900
puzzling out
what's the difference between

537
00:26:44,900 --> 00:26:47,700
these two things, based off of
what I can observe about

538
00:26:47,700 --> 00:26:48,700
them
from the outside.

539
00:26:49,600 --> 00:26:52,200
Thanks for that explanation. 
So we've been talking a lot 

540
00:26:52,200 --> 00:26:54,000
about the techniques,
the internals,

541
00:26:54,200 --> 00:26:56,500
what observability is.
Tell us more: why has

542
00:26:56,500 --> 00:26:58,400
observability
now become very hot?

543
00:26:58,500 --> 00:27:00,800
Is it
because it's just a trend, with so

544
00:27:00,800 --> 00:27:04,300
many tools available, or is it 
actually solving a real problem?

545
00:27:05,100 --> 00:27:09,300
I think the answer,
unfortunately, is yes to both, and

546
00:27:09,300 --> 00:27:13,900
it is a non overlapping set. 
So in terms of the definition of

547
00:27:13,900 --> 00:27:16,800
observability I gave earlier, of
kind of understanding complex

548
00:27:16,800 --> 00:27:20,500
systems from first principles and
being able to debug

549
00:27:20,500 --> 00:27:23,500
novel problems.
The reason why we need this 

550
00:27:23,500 --> 00:27:28,000
today is because we are building
cloud-native microservices where

551
00:27:28,400 --> 00:27:33,100
you can no longer use a 
traditional logging system, that

552
00:27:33,100 --> 00:27:36,700
collects logs from one host
where your entire query runs, or

553
00:27:36,700 --> 00:27:39,200
where you can do APM
along the entire host to

554
00:27:39,200 --> 00:27:40,600
understand
what are the slow paths on

555
00:27:40,600 --> 00:27:42,600
just this
one host. That doesn't work

556
00:27:42,600 --> 00:27:44,700
anymore
if you have microservices, a

557
00:27:44,700 --> 00:27:47,400
request that needs to flow
through services maintained by

558
00:27:47,400 --> 00:27:50,300
more than one
team. At that point, you kind

559
00:27:50,300 --> 00:27:53,500
of now have these squishy 
boundaries where no one person 

560
00:27:53,500 --> 00:27:56,300
holds that information about 
what's going on in their head. 

561
00:27:56,700 --> 00:28:00,500
So I think the motivation is 
that the complexity of systems 

562
00:28:00,700 --> 00:28:04,400
has meant that we can no longer 
debug systems based off of

563
00:28:04,400 --> 00:28:07,700
the known unknowns, the metrics
that we thought to create in

564
00:28:07,700 --> 00:28:10,600
advance, or kind of these
magical flashes of insight by

565
00:28:10,600 --> 00:28:15,000
the expert, in that there is no
longer an expert on every aspect

566
00:28:15,000 --> 00:28:17,000
of your system. You might be an
expert on only, like, one or

567
00:28:17,000 --> 00:28:19,300
two aspects of your system,
not how all the pieces fit

568
00:28:19,300 --> 00:28:20,900
together.
That's the motivation.

569
00:28:20,900 --> 00:28:23,500
That's why observability is
crucial and needed. 

570
00:28:23,800 --> 00:28:27,500
The problem is that with
observability, as you mentioned the

571
00:28:27,500 --> 00:28:31,800
three pillars earlier people who
are already selling you 

572
00:28:31,800 --> 00:28:34,600
solutions to one or two or maybe
even all three of these 

573
00:28:34,600 --> 00:28:37,000
so-called pillars are trying to 
persuade you. 

574
00:28:37,000 --> 00:28:40,600
that what they do satisfies
this requirement of being able

575
00:28:40,600 --> 00:28:42,200
to understand your complex 
systems. 

576
00:28:42,700 --> 00:28:45,900
So basically, it's "three pillars
of observability: you can have

577
00:28:45,900 --> 00:28:47,900
observability
if you just buy our logging,

578
00:28:48,100 --> 00:28:49,700
tracing, and metrics
solutions."

579
00:28:49,800 --> 00:28:53,400
I think that's why you've seen so
much marketing buzz and noise

580
00:28:53,400 --> 00:28:56,600
about it:
everyone in the industry

581
00:28:56,600 --> 00:28:59,000
has started calling
what they do observability, even

582
00:28:59,000 --> 00:29:02,100
if it isn't. What's hilarious is,
like, you see all these companies

583
00:29:02,100 --> 00:29:04,800
that, you know, make data
observability, or that are like

584
00:29:04,800 --> 00:29:06,600
source code observability, and
wait.

585
00:29:07,000 --> 00:29:08,500
Does this word mean anything
anymore? 

586
00:29:09,000 --> 00:29:11,700
So I think, basically, going through
the history, Honeycomb first

587
00:29:11,700 --> 00:29:15,200
started using the word observability
in 2017, and it actually

588
00:29:15,200 --> 00:29:17,900
predated us by a little bit.
I think the Twitter observability

589
00:29:18,100 --> 00:29:21,200
team predates us by one or
two years in terms of using the

590
00:29:21,200 --> 00:29:23,400
systems control word
observability to spread this

591
00:29:23,400 --> 00:29:27,400
idea of understanding systems,
and you'll see this explosion of

592
00:29:27,400 --> 00:29:30,800
people, like, starting to call it
observability from 2019 onwards,

593
00:29:30,800 --> 00:29:33,500
really, where there's kind of this
plethora of companies that say,

594
00:29:33,500 --> 00:29:37,500
oh, we do observability too, and
it's like, really? Okay.

595
00:29:37,600 --> 00:29:41,000
Well, let's see. 
Can you actually understand your

596
00:29:41,000 --> 00:29:44,000
systems altogether or is this 
just a rebranding of your 

597
00:29:44,000 --> 00:29:45,800
existing monitoring and logging 
Solutions? 

598
00:29:47,000 --> 00:29:49,300
Yeah, sometimes this is also my 
confusion. 

599
00:29:49,400 --> 00:29:52,300
So, when you see some brands or 
some products call themselves

600
00:29:52,300 --> 00:29:55,000
observability, most of them are
like, maybe white labeling,

601
00:29:55,000 --> 00:29:57,400
so to speak, right? Where
you can actually see logs,

602
00:29:57,400 --> 00:29:59,300
metrics, and traces.
Sometimes they are not

603
00:29:59,300 --> 00:30:00,700
integrated,
in fact, right?

604
00:30:00,700 --> 00:30:03,400
So you'll see three different 
features and three different 

605
00:30:03,400 --> 00:30:06,200
things and you will correlate 
using human intuition.

606
00:30:06,200 --> 00:30:08,800
Yeah, and what's worse is 
they're making you pay because 

607
00:30:08,800 --> 00:30:10,500
you're storing the data three to
four times, right?

608
00:30:10,500 --> 00:30:12,600
Like, they're selling you three
different SKUs.

609
00:30:12,800 --> 00:30:15,600
This should be one set of data,
so that there aren't any

610
00:30:15,600 --> 00:30:16,600
mismatches,
you can jump

611
00:30:16,800 --> 00:30:19,700
between them, and so it doesn't
cost you an arm and a leg. 

612
00:30:20,300 --> 00:30:22,700
Yeah, the cost part
I agree, because it can cost

613
00:30:22,700 --> 00:30:24,400
you a lot
when you send a lot of data.

614
00:30:24,500 --> 00:30:27,100
I don't know, in your personal 
experience when working with 

615
00:30:27,100 --> 00:30:30,800
honeycomb or when you're meeting
developers out there in my 

616
00:30:30,800 --> 00:30:34,200
experience, even though we have 
this microservices, cloud native,

617
00:30:34,300 --> 00:30:37,000
there are still people who
actually, believe it or not,

618
00:30:37,100 --> 00:30:41,200
do not implement any traces,
do not implement any metrics, or

619
00:30:41,200 --> 00:30:44,500
have very little logging
implemented in their system.

620
00:30:44,900 --> 00:30:46,600
Tell us more.
Why is this still happening,

621
00:30:46,800 --> 00:30:49,700
even though we see vast trends
about observability and all that?

622
00:30:50,400 --> 00:30:55,600
And the reason is that it works
until it doesn't. System

623
00:30:55,600 --> 00:30:58,100
complexity is a very sneaky
thing. 

624
00:30:58,600 --> 00:31:01,500
You think you can understand 
everything in your head, and that

625
00:31:01,500 --> 00:31:03,900
you're that magical person
that's able to debug everything

626
00:31:03,900 --> 00:31:06,900
within 5 minutes, until you get
the thing that stumps you, that

627
00:31:06,900 --> 00:31:10,100
takes 3 hours or until when you 
go on vacation and someone is 

628
00:31:10,100 --> 00:31:12,200
calling you because they can't 
figure out the system that you 

629
00:31:12,200 --> 00:31:15,100
built. 
So I think that's the reason why

630
00:31:15,100 --> 00:31:18,100
people may not necessarily
invest in observability when

631
00:31:18,100 --> 00:31:21,300
they should is because the
cost of inadequate

632
00:31:21,300 --> 00:31:24,900
observability sneaks up on you.
It's a form of technical debt

633
00:31:24,900 --> 00:31:28,400
to have a lack of observability.
We know how good the industry is

634
00:31:28,400 --> 00:31:29,800
about paying down technical
debt.

635
00:31:30,000 --> 00:31:33,600
So I think that's what it is. 
I think the other, unfortunately,

636
00:31:33,600 --> 00:31:37,400
sad reason is, like, it feels
really good to be the person who

637
00:31:37,400 --> 00:31:41,000
magically debugged the problem.
It feels like job security, feels

638
00:31:41,000 --> 00:31:43,700
like accomplishing something.
If you're the most senior

639
00:31:43,700 --> 00:31:46,600
engineer on the team, you know
everything about the system, and

640
00:31:46,700 --> 00:31:48,300
you're the one making decisions about
the system,

641
00:31:48,400 --> 00:31:51,800
you may not necessarily care
quite as much as the engineer

642
00:31:51,800 --> 00:31:53,600
who's new to the
team and is struggling to

643
00:31:53,600 --> 00:31:55,100
understand how it all fits
together.

644
00:31:56,200 --> 00:31:59,800
Now, if people have listened to 
this episode and they want to start

645
00:31:59,800 --> 00:32:02,400
implementing observability, how
can they do it?

646
00:32:02,400 --> 00:32:05,800
Is it like, go and buy or
subscribe to solutions that

647
00:32:05,800 --> 00:32:09,700
are available out there, or install
open source? Maybe tell us more

648
00:32:09,700 --> 00:32:12,800
how you should start. 
I think there should be just to 

649
00:32:12,800 --> 00:32:14,300
let. 
I think there's kind of the 

650
00:32:14,300 --> 00:32:17,700
foundational understanding of 
what observability is and isn't,

651
00:32:17,900 --> 00:32:20,700
and you mentioned our book,
Observability Engineering. We

652
00:32:20,700 --> 00:32:23,600
can drop a link in the show 
notes to a way to download a 

653
00:32:23,600 --> 00:32:25,300
free copy of the Missouri, the 
lady in here. 

654
00:32:25,500 --> 00:32:28,100
Here, so I would start by 
reading that book or at least 

655
00:32:28,100 --> 00:32:31,100
reading the first couple 
chapters just to make sure that 

656
00:32:31,100 --> 00:32:33,700
you have that language to 
explain to your team

657
00:32:34,100 --> 00:32:36,400
why you want them to change
their practices? 

658
00:32:36,900 --> 00:32:38,300
You know, you can have all
the best tools in the world,

659
00:32:38,300 --> 00:32:39,300
right? 
This is what I discovered in 

660
00:32:39,308 --> 00:32:42,000
Google. 
We've had really good tracing at

661
00:32:42,000 --> 00:32:42,900
Google. 
Since 2000. 

662
00:32:42,900 --> 00:32:46,700
Maybe no one used it. 
No one used it because it was 

663
00:32:46,700 --> 00:32:50,200
hard to use, and no one could see
why they needed to use it.

664
00:32:50,400 --> 00:32:52,900
There was no easy linkage to
people's existing debugging

665
00:32:52,900 --> 00:32:54,800
workflows.
So adding a new tool doesn't 

666
00:32:54,800 --> 00:32:55,700
necessarily
change things

667
00:32:55,700 --> 00:32:58,400
unless people understand the
motivation and understand how

668
00:32:58,400 --> 00:33:01,800
they are going to use it.
Once you have that buy-in, the

669
00:33:01,800 --> 00:33:06,500
next step is to add the
OpenTelemetry SDK to your

670
00:33:06,500 --> 00:33:10,100
application.
So what is OpenTelemetry?

671
00:33:10,100 --> 00:33:15,500
OpenTelemetry is a vendor-neutral,
open source SDK that allows you

672
00:33:15,500 --> 00:33:18,300
to generate this telemetry
data,

673
00:33:18,600 --> 00:33:21,100
whether it's metrics formatted,
log formatted, or trace

674
00:33:21,100 --> 00:33:24,400
formatted.
The idea is that it is a common

675
00:33:24,400 --> 00:33:28,000
language for being able to
produce this data, to transmit

676
00:33:28,000 --> 00:33:31,700
it around, and to propagate that
trace context between

677
00:33:31,700 --> 00:33:35,000
different microservices.
So when you add OpenTelemetry to

678
00:33:35,000 --> 00:33:37,600
your application, you're not 
locking yourself into any 

679
00:33:37,600 --> 00:33:41,600
particular solution. You're making an
investment in your future, in the

680
00:33:41,600 --> 00:33:43,700
same way that adding a
new test framework to

681
00:33:43,700 --> 00:33:45,600
your application is an
investment in your future. 

682
00:33:46,300 --> 00:33:49,000
You still might need to add some
tests, or you might need to add

683
00:33:49,000 --> 00:33:50,500
some manual instrumentation, but
OpenTelemetry

684
00:33:50,500 --> 00:33:53,800
in general does a good job
of handling that automatic

685
00:33:53,800 --> 00:33:57,100
generation of trace data for your
application, based on the

686
00:33:57,100 --> 00:34:00,900
frameworks that you're using, so
you can have kind of this rich

687
00:34:00,900 --> 00:34:03,500
data about what requests are 
going in and out with this brain

688
00:34:03,500 --> 00:34:07,000
work either relationship. 
Then you'll have to pick a place

689
00:34:07,000 --> 00:34:10,000
to send that data. 
And there are a wide number of

690
00:34:10,000 --> 00:34:12,300
options. 
You can certainly use open 

691
00:34:12,300 --> 00:34:14,900
source solutions like Jaeger.
The trouble with

692
00:34:14,900 --> 00:34:19,100
open source solutions is that
they are great for visualizing

693
00:34:19,100 --> 00:34:22,800
individual traces, but it's
not necessarily going to provide

694
00:34:22,800 --> 00:34:25,300
a comprehensive replacement for
your

695
00:34:25,400 --> 00:34:28,800
existing monitoring workflows,
because people often need to

696
00:34:28,800 --> 00:34:31,500
understand what's the average
view of my system as a whole,

697
00:34:31,500 --> 00:34:34,100
and then zoom in to that trace
and get the full

698
00:34:34,100 --> 00:34:36,300
image of the trace.
So at that point,

699
00:34:36,300 --> 00:34:38,800
you might want to look at vendor
solutions, and you know,

700
00:34:38,800 --> 00:34:40,600
certainly Honeycomb is out
there. 

701
00:34:41,000 --> 00:34:45,199
I also think very highly of
Lightstep, but basically any

702
00:34:45,199 --> 00:34:48,699
backend that supports OpenTelemetry
is going to be a place that

703
00:34:48,699 --> 00:34:51,400
you can send that data to, and
the bonus with OpenTelemetry

704
00:34:51,400 --> 00:34:53,199
is that
it's vendor neutral,

705
00:34:53,199 --> 00:34:56,300
and it supports teeing the data:
you can send it to more than

706
00:34:56,300 --> 00:34:59,300
one provider at the same time and
see which one you like, which I

707
00:34:59,300 --> 00:35:01,700
think is really great. 
I think competition is better 

708
00:35:01,700 --> 00:35:04,200
for the market. 
It's great for everyone and it 

709
00:35:04,200 --> 00:35:06,800
really incentivizes
us vendors to do the right thing

710
00:35:06,800 --> 00:35:08,200
by you.
Therefore,

711
00:35:08,200 --> 00:35:10,900
I think that's why I recommend
OpenTelemetry, because it kind of

712
00:35:10,900 --> 00:35:14,200
handles a lot of what would
otherwise be rote work:

713
00:35:14,200 --> 00:35:17,700
creating trace spans, propagating
and transmitting them around.

714
00:35:17,700 --> 00:35:20,000
It doesn't lock you in, it gives
you that freedom of choice. 

715
00:35:21,100 --> 00:35:24,100
So thanks for the tip of
opting for OpenTelemetry.

716
00:35:24,100 --> 00:35:26,600
OpenTelemetry is, like, the new
standard. Previously,

717
00:35:26,600 --> 00:35:30,200
it was called OpenTracing.
It's the merger of OpenCensus and

718
00:35:30,200 --> 00:35:32,300
OpenTracing.
We had two standards, and when we

719
00:35:32,300 --> 00:35:35,700
worked on it, we actually
have managed to deprecate both

720
00:35:35,700 --> 00:35:37,300
OpenCensus
and OpenTracing. We're not

721
00:35:37,300 --> 00:35:41,900
doing that XKCD comic thing of
now there are 23 standards. All right,

722
00:35:41,900 --> 00:35:43,700
if you choose vendor
solutions,

723
00:35:43,700 --> 00:35:47,800
normally they would charge you
by the number of data being sent

724
00:35:47,800 --> 00:35:51,400
to the systems. 
So tell us a bit more about practical

725
00:35:51,400 --> 00:35:53,800
choices here.
How do you assess solutions, first

726
00:35:53,800 --> 00:35:55,600
of all,
if, let's say, we want to go with

727
00:35:55,600 --> 00:35:59,100
vendor SaaS-based products?
Yeah, I think that you have to 

728
00:35:59,100 --> 00:36:01,600
think about the cost of that;
that is, how much is it going to

729
00:36:01,600 --> 00:36:03,900
cost you and what are the 
benefits that you get? 

730
00:36:04,100 --> 00:36:06,400
So you may not necessarily want 
to go with the lowest cost 

731
00:36:06,400 --> 00:36:09,300
vendor, because the lowest-cost
vendor might provide only a very

732
00:36:09,300 --> 00:36:11,600
primitive ability to analyze the
data.

733
00:36:11,900 --> 00:36:14,400
I think a lot of the return
on investment from

734
00:36:14,400 --> 00:36:18,300
observability comes from saving
your engineers' time,

735
00:36:18,300 --> 00:36:20,900
improving your customer outcomes,
and decreasing customer

736
00:36:20,900 --> 00:36:24,000
churn. That is far more important
than the cost of the

737
00:36:24,000 --> 00:36:26,400
solution. For instance,
Jaeger is free.

738
00:36:26,500 --> 00:36:27,900
But how much time is it going to
save

739
00:36:27,900 --> 00:36:29,100
you in debugging,
really?

740
00:36:29,300 --> 00:36:31,800
So I think there's kind of this
continuum, and it's important to

741
00:36:31,800 --> 00:36:37,000
focus on what your evaluation
criteria are and to specify up front:

742
00:36:37,000 --> 00:36:41,400
we want to be able to understand
issues within half an hour, or

743
00:36:41,400 --> 00:36:44,700
we want to be able to measure our
service level objectives in the first

744
00:36:44,700 --> 00:36:46,700
place to even understand where
we're going wrong.

745
00:36:46,900 --> 00:36:50,000
So I think that's kind of one
dimension, and I think that, to go

746
00:36:50,000 --> 00:36:51,900
along with that:
how intuitive is it?

747
00:36:52,000 --> 00:36:54,000
Is everyone on my team adopting
it?

748
00:36:54,300 --> 00:36:56,900
So after we choose all of these,
OpenTelemetry

749
00:36:56,900 --> 00:36:59,800
and which vendor solution,
I think at the end of the day

750
00:36:59,800 --> 00:37:02,600
the developers themselves need
to instrument the code.

751
00:37:02,900 --> 00:37:05,200
That's correct.
Automatic instrumentation can only

752
00:37:05,200 --> 00:37:07,400
get you so far.
Automatic instrumentation is

753
00:37:07,400 --> 00:37:10,000
never going to capture things
like POST bodies, because

754
00:37:10,000 --> 00:37:13,400
that's how you get
data leaks, right? So you have to

755
00:37:13,400 --> 00:37:16,000
pick and choose which attributes
you want to send along, and the

756
00:37:16,000 --> 00:37:18,100
answer is anything non-sensitive,
basically.

757
00:37:18,500 --> 00:37:20,600
But automatic instrumentation is not
going to be able to do that

758
00:37:20,600 --> 00:37:22,200
automatically.
Yeah.

759
00:37:22,400 --> 00:37:25,200
So auto-instrumentation can
do that automatically

760
00:37:25,400 --> 00:37:28,000
up to a certain level, but
I think at the end of the day,

761
00:37:28,000 --> 00:37:30,700
developers need to make a
conscious decision to instrument

762
00:37:30,700 --> 00:37:32,800
the code. I think everybody would
laugh

763
00:37:32,800 --> 00:37:34,900
if someone said, "I can
automatically create your tests

764
00:37:34,900 --> 00:37:37,100
for you," or "I can automatically
comment your code for you."

765
00:37:37,100 --> 00:37:41,200
It's like, no, you can't.
Yeah, which brings us to the

766
00:37:41,200 --> 00:37:42,700
technique that you mentioned in the
book.

767
00:37:42,800 --> 00:37:45,300
We have heard this kind of term
a lot of times: test-driven

768
00:37:45,300 --> 00:37:46,000
development,
right?

769
00:37:46,000 --> 00:37:49,000
So you coined this
observability-driven development

770
00:37:49,000 --> 00:37:51,100
and shifting left for
observability.

771
00:37:51,400 --> 00:37:54,100
So, tell us more about this 
technique, this concept, why is 

772
00:37:54,100 --> 00:37:56,800
it important? 
Yeah, I think it's super 

773
00:37:56,800 --> 00:38:01,000
important to understand:
what is this code going to look

774
00:38:01,000 --> 00:38:04,300
like in production?
So the best way is to add that

775
00:38:04,300 --> 00:38:07,500
telemetry, to add those key-value
pairs, into your code

776
00:38:07,500 --> 00:38:10,500
as you're developing it. That
way, you can test it inside your

777
00:38:10,500 --> 00:38:13,200
test suites. That way,
inside of your dev box,

778
00:38:13,200 --> 00:38:15,900
you can emit that telemetry
data to a tracing tool and

779
00:38:15,900 --> 00:38:18,800
see what those traces
look like.

780
00:38:18,800 --> 00:38:22,400
Hey, you might discover why
your tests are failing not by

781
00:38:22,400 --> 00:38:25,600
running a debugger, but
instead by looking at this

782
00:38:25,600 --> 00:38:28,400
telemetry data.
So the earlier you adopt

783
00:38:28,400 --> 00:38:30,700
observability in your development
lifecycle,

784
00:38:31,300 --> 00:38:35,100
the more your developers will use
it at all stages of development,

785
00:38:35,100 --> 00:38:38,600
not just in production, and the
sooner you'll catch bugs.

786
00:38:38,900 --> 00:38:41,400
So the argument is that
it's really synergistic with

787
00:38:41,400 --> 00:38:44,600
test-driven development,
but you're also exercising your

788
00:38:44,600 --> 00:38:46,700
observability code during your
tests.

789
00:38:47,600 --> 00:38:49,200
Yeah. 
So if I imagine being a

790
00:38:49,200 --> 00:38:53,000
developer again, as and when
I try to develop a feature, I

791
00:38:53,008 --> 00:38:55,300
have to think about what kind of
things I want to be

792
00:38:55,400 --> 00:38:58,900
able to observe or maybe look at
in my observability tools, so

793
00:38:58,900 --> 00:39:02,200
that when an issue happens, I
actually have that data, and

794
00:39:02,200 --> 00:39:04,900
then I implement or
instrument it in my code.

795
00:39:05,000 --> 00:39:07,200
I think that's really key. 
Another thing that you 

796
00:39:07,200 --> 00:39:10,400
mentioned, if we want developers to
make a more conscious decision about

797
00:39:10,400 --> 00:39:13,100
adding instrumentation, is
putting them on call

798
00:39:13,100 --> 00:39:15,500
for their own system.
So that's why this is actually

799
00:39:15,500 --> 00:39:17,300
very important. 
Yeah. 

800
00:39:17,400 --> 00:39:21,200
So I think that it is important
for developers to have some

801
00:39:21,200 --> 00:39:23,600
stake in the production running
of their application.

802
00:39:24,200 --> 00:39:27,100
However, that does not
necessarily have to take the

803
00:39:27,100 --> 00:39:29,800
form of on-call for every
engineer, but I think at the

804
00:39:29,800 --> 00:39:32,800
team level it is important for teams
to build and run their own

805
00:39:32,800 --> 00:39:35,900
software.
So in practice, a lot of teams

806
00:39:35,900 --> 00:39:38,000
struggle with that because they
just say, "You know, here's the

807
00:39:38,000 --> 00:39:41,400
pager, congratulations," as
opposed to, "We're going to

808
00:39:41,400 --> 00:39:44,800
support you with high-quality
observability, with SLOs,

809
00:39:44,800 --> 00:39:46,300
with
this methodology of how you

810
00:39:46,300 --> 00:39:48,600
operate your service.
Now that we've trained you,

811
00:39:48,600 --> 00:39:51,100
here's that pager."
I think that if a lot of

812
00:39:51,100 --> 00:39:54,000
developers understood the
pain that ops goes through,

813
00:39:54,100 --> 00:39:57,500
and the lessons
that we formed from years of

814
00:39:57,500 --> 00:40:00,800
being subjected to early-morning pages, and
kind of almost the masochism

815
00:40:00,800 --> 00:40:02,900
that requires,
they might have a little bit

816
00:40:02,900 --> 00:40:05,000
more empathy, but it would take
time.

817
00:40:05,100 --> 00:40:08,000
You can't just throw someone who
is unprepared directly into that.

818
00:40:08,200 --> 00:40:09,400
Instead:
"We cleaned it up a little bit

819
00:40:09,400 --> 00:40:11,600
before handing it over to you.
It's safer that way."

820
00:40:11,900 --> 00:40:15,200
So I think that kind of handoff
process can be really helpful

821
00:40:15,200 --> 00:40:19,200
for encouraging ownership, but
ownership and responsibility

822
00:40:19,200 --> 00:40:22,300
need to be accompanied by
giving people the tools that

823
00:40:22,300 --> 00:40:24,300
they need to be able to be
successful.

824
00:40:24,600 --> 00:40:28,000
You wouldn't assign a brand-new
developer to design the

825
00:40:28,000 --> 00:40:29,400
architecture of your future
system.

826
00:40:29,400 --> 00:40:32,600
So why are you asking people who
have never operated a system

827
00:40:32,600 --> 00:40:33,900
before to go swim in the deep
end?

828
00:40:33,900 --> 00:40:36,900
I think you kind of have to
offer this graceful path, and I

829
00:40:36,900 --> 00:40:40,200
think that observability really
is this methodology for kind of

830
00:40:40,207 --> 00:40:43,800
bridging the ops and dev worlds.
The reason why people

831
00:40:43,800 --> 00:40:47,400
did monitoring before was that
monitoring is outside-in

832
00:40:47,400 --> 00:40:50,900
observation, or outside-in
measurement, because often

833
00:40:50,900 --> 00:40:53,500
operators didn't really
necessarily have the ability to

834
00:40:53,500 --> 00:40:56,900
make code changes to the system, so they
were limited only to:

835
00:40:57,100 --> 00:40:59,800
what can my APM agent measure, or
what can my monitoring tool

836
00:40:59,800 --> 00:41:01,800
see?
So I think that when you have 

837
00:41:01,800 --> 00:41:05,000
this shared responsibility of
developing the telemetry

838
00:41:05,000 --> 00:41:08,000
and observing and looking at the
telemetry, it all works

839
00:41:08,000 --> 00:41:11,400
much better, because you're
having the instrumentation being

840
00:41:11,400 --> 00:41:14,000
added in this virtuous cycle
along with looking at the

841
00:41:14,000 --> 00:41:17,800
results of the instrumentation.
So you mentioned all this "build

842
00:41:17,800 --> 00:41:21,400
and run it." It comes back to
normally the DevOps culture and

843
00:41:21,400 --> 00:41:25,200
SRE culture, right?
So does observability also

844
00:41:25,300 --> 00:41:29,000
support the implementation of
that particular DevOps and SRE

845
00:41:29,000 --> 00:41:31,300
culture?
Yes, I think so. When you

846
00:41:31,308 --> 00:41:33,800
originally saw some of
my videos on SRE and DevOps,

847
00:41:34,100 --> 00:41:38,900
I highlighted that it is a key
responsibility of SREs to make

848
00:41:38,900 --> 00:41:43,900
the system debuggable, to make the
system have SLOs, and to monitor

849
00:41:43,900 --> 00:41:47,200
those SLOs. At Google, we
certainly didn't use the word

850
00:41:47,200 --> 00:41:51,200
observability until 2019.
But what we were doing was

851
00:41:51,200 --> 00:41:54,300
probably closer to observability
than monitoring, in that

852
00:41:54,300 --> 00:41:57,700
we were doing this practice of
root cause analysis and being able to

853
00:41:57,700 --> 00:42:00,900
understand the systems.
So for people who have been 

854
00:42:00,900 --> 00:42:03,800
implementing this, how do they
know they have done it right,

855
00:42:03,800 --> 00:42:06,000
or where they are lacking?
I see in your book

856
00:42:06,000 --> 00:42:09,000
you mentioned this term called
the observability maturity model.

857
00:42:09,300 --> 00:42:12,800
Is there a way to gauge where 
people are at this point in time

858
00:42:12,800 --> 00:42:15,000
and what else can they aspire to
achieve? 

859
00:42:15,500 --> 00:42:17,000
Yeah. 
I think that there are these 

860
00:42:17,000 --> 00:42:21,000
kind of five key areas in that
observability maturity model of

861
00:42:21,100 --> 00:42:23,300
where observability is or isn't
helping you.

862
00:42:23,600 --> 00:42:25,100
We talked about
observability-driven

863
00:42:25,300 --> 00:42:27,600
development and kind of getting
the code right the first time

864
00:42:27,600 --> 00:42:30,600
and exercising that
instrumentation. So really that's

865
00:42:30,600 --> 00:42:33,300
kind of one area of maturity.
The other area of maturity that

866
00:42:33,300 --> 00:42:36,900
we actually didn't touch on yet
is software delivery. For me,

867
00:42:36,900 --> 00:42:40,200
observability has a "you must be
this tall to ride," which is:

868
00:42:40,200 --> 00:42:42,700
you can ship code every couple
of weeks at least, because if

869
00:42:42,700 --> 00:42:45,300
not, does it matter that you
decrease your time to resolve

870
00:42:45,300 --> 00:42:47,400
issues from three hours to five
minutes

871
00:42:47,400 --> 00:42:50,300
if you can only ship code every
three months, or if you can only

872
00:42:50,300 --> 00:42:53,300
ship new instrumentation every
three months? No, it doesn't

873
00:42:53,300 --> 00:42:55,100
make sense.
Focus on making your delivery

874
00:42:55,300 --> 00:42:57,500
cycle faster.
So, kind of, if you are

875
00:42:57,500 --> 00:43:01,800
struggling with your code and being
able to ship it, start by

876
00:43:01,800 --> 00:43:04,100
observing your delivery
pipeline, not your code.

877
00:43:04,400 --> 00:43:06,200
So that's kind of another
maturity area.

878
00:43:06,200 --> 00:43:08,600
It's: do we understand why our
builds suddenly started

879
00:43:08,600 --> 00:43:11,700
taking longer? In our case,
at 15 minutes, 15 minutes at the

880
00:43:11,700 --> 00:43:13,500
high end, we were like, "Oh my God,
this is too slow."

881
00:43:13,800 --> 00:43:16,500
So we try to keep our builds
under 10 minutes so we can

882
00:43:16,500 --> 00:43:18,500
deploy to production once an
hour or faster.

883
00:43:18,800 --> 00:43:21,300
So kind of keeping that software
delivery cycle snappy,

884
00:43:21,600 --> 00:43:24,000
or getting it to be
snappy, is kind of another key

885
00:43:24,000 --> 00:43:27,500
area. And then third, we talked a
lot about the breaks-in-production

886
00:43:27,500 --> 00:43:30,100
resilience workflow.
But I think also, it's really

887
00:43:30,100 --> 00:43:33,400
important to think about the
user analytics: to what extent

888
00:43:33,400 --> 00:43:35,200
are people making use of your
product?

889
00:43:35,500 --> 00:43:37,800
If no one is using it,
does it really matter that you

890
00:43:37,800 --> 00:43:40,700
shipped that feature?
And then finally, I think to tie

891
00:43:40,700 --> 00:43:43,000
it all together:
technical debt and managing that

892
00:43:43,000 --> 00:43:45,000
technical debt is really
important, and observability

893
00:43:45,008 --> 00:43:47,000
can surface it:
"Hey, you have this single point

894
00:43:47,000 --> 00:43:49,200
of failure," or "Hey, you have this
circular dependency."

895
00:43:49,500 --> 00:43:52,800
So I think that those are some
of the key responsibilities that

896
00:43:52,800 --> 00:43:54,800
you have when you're kind of
thinking about how to add

897
00:43:54,800 --> 00:43:56,500
observability
to your software

898
00:43:56,500 --> 00:43:59,600
delivery practice.
Thanks for explaining about this

899
00:43:59,600 --> 00:44:01,900
maturity model. 
I think you explained it much 

900
00:44:01,900 --> 00:44:04,200
deeper in the book as well,
for people who want to

901
00:44:04,200 --> 00:44:06,400
understand
where the areas are that you

902
00:44:06,400 --> 00:44:09,100
need to invest time in.
So make sure to check the book. We

903
00:44:09,200 --> 00:44:10,700
will put it in the show links
later.

904
00:44:10,900 --> 00:44:13,700
So I find myself having had a
crash course about

905
00:44:13,700 --> 00:44:15,500
observability.
So, thank you so much for

906
00:44:15,500 --> 00:44:18,800
explaining all these concepts,
all these implementations.

907
00:44:18,900 --> 00:44:21,500
Really, thank you for that. 
But unfortunately, due to time, 

908
00:44:21,500 --> 00:44:24,100
we need to wrap up soon. 
But before I let you go, I 

909
00:44:24,107 --> 00:44:26,800
normally ask one last question 
for every guest that I have in 

910
00:44:26,800 --> 00:44:30,200
the show, which is to share three
technical leadership wisdom for

911
00:44:30,200 --> 00:44:33,000
people to learn from your
experience or from your journey.

912
00:44:33,200 --> 00:44:35,400
So maybe you can share with us:
what is your wisdom?

913
00:44:36,500 --> 00:44:37,700
Sure. 
I'd like to keep it brief and

914
00:44:37,700 --> 00:44:42,800
just limit it to one piece of
wisdom, which is that it is

915
00:44:42,800 --> 00:44:46,700
much, much more important to
make the social piece work than

916
00:44:46,707 --> 00:44:49,700
the technical piece.
It's really important to get

917
00:44:49,700 --> 00:44:51,900
people talking to each other.
Software

918
00:44:51,900 --> 00:44:54,900
delivery is much more about
people communicating and being

919
00:44:54,900 --> 00:44:57,900
on the same page and kind of
having a shared working model

920
00:44:58,500 --> 00:45:02,300
than it is about what tools you use.
So don't think first about

921
00:45:02,300 --> 00:45:06,000
introducing new tools.
Think first about aligning people

922
00:45:06,000 --> 00:45:08,500
on the outcome and making sure 
everyone is agreed on the 

923
00:45:08,500 --> 00:45:12,000
outcome and then you can worry 
about what tools you're going to

924
00:45:12,008 --> 00:45:14,300
use to implement it. 
Wow. 

925
00:45:14,300 --> 00:45:17,200
That's really wonderful. 
So social first rather than the 

926
00:45:17,200 --> 00:45:20,500
technology aspect and try to 
align people because yeah, I can

927
00:45:20,500 --> 00:45:23,100
see people implementing 
different Technologies, even 

928
00:45:23,100 --> 00:45:24,400
when you talk about micro 
service, right? 

929
00:45:24,400 --> 00:45:27,400
It's like everyone just wants to
have their own services, managed by

930
00:45:27,400 --> 00:45:30,500
themselves, and they don't really
care whether it integrates well

931
00:45:30,500 --> 00:45:33,400
and serves the customers well.
So, thank you so much for

932
00:45:33,408 --> 00:45:36,400
this, for your time. For people
who want to follow you or learn

933
00:45:36,400 --> 00:45:39,300
more about observability or
Honeycomb, where can they reach

934
00:45:39,300 --> 00:45:42,200
you online?
Yeah, I am

935
00:45:42,300 --> 00:45:44,100
lizthegrey pretty much
everywhere.

936
00:45:44,500 --> 00:45:46,900
So that's on Twitter.
That's on GitHub.

937
00:45:47,100 --> 00:45:49,800
That's how to look me up.
I will also drop those links

938
00:45:49,800 --> 00:45:53,300
into the show notes.
Why is it "the grey," if I may ask?

939
00:45:53,700 --> 00:45:56,800
Teenage fangirling about Gandalf
from Lord of the Rings.

940
00:45:57,300 --> 00:45:59,500
Alright, so, thank you. 
Let's hope you have a great day 

941
00:45:59,500 --> 00:46:00,900
today. 
Thank you. 

942
00:46:01,000 --> 00:46:05,600
Cheers. 
Thank you for listening to this 

943
00:46:05,600 --> 00:46:09,000
episode and for staying right
until the end. If you highly

944
00:46:09,000 --> 00:46:11,700
enjoyed it, I would appreciate 
if you share it with your 

945
00:46:11,700 --> 00:46:14,700
friends and colleagues who you 
think would also benefit from 

946
00:46:14,700 --> 00:46:17,300
listening to this episode. 
And if you are new to the 

947
00:46:17,300 --> 00:46:20,300
podcast, make sure to subscribe 
and leave me your valuable 

948
00:46:20,300 --> 00:46:23,300
review and feedback. 
It helps me a lot in order to 

949
00:46:23,300 --> 00:46:26,500
grow this podcast better. 
You can also find the full show 

950
00:46:26,500 --> 00:46:30,000
notes of this conversation on
the episode page at the techleadjournal.dev

951
00:46:30,000 --> 00:46:32,800
website, including the full
transcript,

952
00:46:33,200 --> 00:46:36,700
interesting quotes, and links to
the resources mentioned from the

953
00:46:36,700 --> 00:46:39,600
conversation. 
And lastly, make sure to 

954
00:46:39,600 --> 00:46:43,200
subscribe to the show's mailing
list on techleadjournal.dev to get

955
00:46:43,200 --> 00:46:45,500
notified for any future 
episodes. 

956
00:46:45,900 --> 00:46:47,400
Stay tuned for the next
Tech Lead Journal

957
00:46:47,400 --> 00:46:49,500
episode.
And until then,

958
00:46:49,700 --> 00:46:50,300
Goodbye.
