1
00:00:00,040 --> 00:00:03,120
In the fast-paced world of 
software development, how do we 

2
00:00:03,480 --> 00:00:07,120
really know if the
code we're building is reliable?

3
00:00:07,120 --> 00:00:10,160
How do we get that confidence? 
Today on the Deep Dive, we're 

4
00:00:10,160 --> 00:00:12,880
cutting through some of the 
noise to unpack a really 

5
00:00:12,880 --> 00:00:15,160
fundamental concept. 
Test coverage. 

6
00:00:15,200 --> 00:00:18,600
We're looking at insights from 
Software Engineering: A Modern

7
00:00:18,600 --> 00:00:22,360
Approach, by Marco Tulio Valente.
Our mission here is

8
00:00:22,360 --> 00:00:25,760
simple. 
Get a real grip on what test 

9
00:00:25,760 --> 00:00:29,520
coverage means, why it matters 
for software quality, and, you know,

10
00:00:29,520 --> 00:00:31,320
how to think about using it 
practically. 

11
00:00:31,320 --> 00:00:33,080
Exactly. 
And understanding metrics like 

12
00:00:33,080 --> 00:00:35,920
test coverage, it's not just 
academic, it's fundamental for 

13
00:00:35,920 --> 00:00:38,320
building software that actually 
works reliably. 

14
00:00:38,600 --> 00:00:42,080
It's how you avoid those, well, 
those common pitfalls and bugs 

15
00:00:42,080 --> 00:00:44,240
that can really sink a project. 
It's about building that 

16
00:00:44,240 --> 00:00:46,520
assurance. 
OK, let's unpack this then. 

17
00:00:46,560 --> 00:00:48,440
When we talk about test 
coverage, it sounds fairly 

18
00:00:48,440 --> 00:00:50,720
straightforward, but what 
exactly are we measuring? 

19
00:00:50,720 --> 00:00:52,360
Is it just like a count of 
lines? 

20
00:00:52,400 --> 00:00:55,920
You're on the right track. 
Think of it maybe as a checkup

21
00:00:55,920 --> 00:00:58,640
for your tests. 
Test coverage is a metric. 

22
00:00:59,360 --> 00:01:02,320
It tells us the percentage of 
your program's code, the 

23
00:01:02,320 --> 00:01:05,080
executable statements that are 
actually run when your tests 

24
00:01:05,080 --> 00:01:08,040
execute. 
So, in simpler terms, the number of

25
00:01:08,040 --> 00:01:11,000
lines your tests hit divided by 
the total number of executable 

26
00:01:11,000 --> 00:01:13,120
lines. 
Each statement is basically an 

27
00:01:13,120 --> 00:01:14,680
instruction. 
Right, an instruction the 

28
00:01:14,680 --> 00:01:16,920
computer runs. 
Precisely, the more statements 

29
00:01:16,920 --> 00:01:19,400
your tests touch, the higher your
coverage number goes. 

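A minimal sketch of the formula just described, assuming statements are identified by line number (a simplification; tools such as JaCoCo count bytecode instructions instead). The function name `statement_coverage` is our own illustration, not any tool's API:

```python
def statement_coverage(executed: set[int], executable: set[int]) -> float:
    """Percentage of executable statements (identified here by line number) that the tests ran."""
    if not executable:
        raise ValueError("no executable statements to measure")
    # Only count executed lines that are actually executable
    # (comments, blank lines, etc. never appear in `executable`).
    hit = executed & executable
    return 100 * len(hit) / len(executable)
```

For example, with hypothetical line numbers, tests that hit lines 3, 4 and 6 of a function whose executable lines are 3 to 6 give `statement_coverage({3, 4, 6}, {3, 4, 5, 6})`, which is 75.0.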
30
00:01:19,640 --> 00:01:23,200
That makes sense. 
So how do teams actually, you 

31
00:01:23,240 --> 00:01:25,800
know, see this? 
Do they calculate it by hand or 

32
00:01:25,800 --> 00:01:26,440
are there
tools?

33
00:01:26,440 --> 00:01:28,440
Oh, definitely tools.
You wouldn't do this manually.

34
00:01:28,800 --> 00:01:32,080
Not efficiently anyway. 
Our source mentions tools like 

35
00:01:32,120 --> 00:01:36,120
JaCoCo, Java Code Coverage.
Very common. And these tools give

36
00:01:36,120 --> 00:01:39,120
you great visual feedback, like 
a traffic light system almost 

37
00:01:39,200 --> 00:01:41,240
showing what's covered. 
A traffic light. 

38
00:01:41,240 --> 00:01:43,240
OK, tell me more. 
What do those colors mean? 

39
00:01:43,640 --> 00:01:45,040
Green. 
Yellow. 

40
00:01:45,160 --> 00:01:46,160
Red. 
Yeah, pretty much. 

41
00:01:46,160 --> 00:01:48,560
It's quite intuitive. 
Green means good. 

42
00:01:48,840 --> 00:01:52,360
Those lines are covered by your
tests. A yellow background, that

43
00:01:52,360 --> 00:01:55,000
flags a branch statement. 
Think of an if statement. 

44
00:01:55,280 --> 00:01:57,720
Yellow means maybe only one path
got tested. 

45
00:01:57,960 --> 00:02:01,240
Like only the true path maybe? 
Exactly, or only the false one.

46
00:02:01,240 --> 00:02:04,840
It's partial coverage, a warning
sign, and then red. 

47
00:02:04,960 --> 00:02:07,720
That's the critical one. 
Red means that code wasn't 

48
00:02:07,720 --> 00:02:10,560
touched by any tests. 
That's a definite red flag. 

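The traffic-light rule can be sketched in a few lines. This assumes a hypothetical per-line record (`LineReport`, our own name) holding whether the line ran and how many of its branch outcomes were taken; real tools such as JaCoCo derive the colors from their own instruction and branch counters:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineReport:
    """Hypothetical per-line data a coverage tool might collect."""
    executed: bool         # did any test run this line?
    branches: int = 0      # branch outcomes on this line (e.g. 2 for an if)
    branches_hit: int = 0  # how many of those outcomes the tests exercised


def highlight(line: LineReport) -> str:
    """Map a line's coverage data to a traffic-light color."""
    if not line.executed:
        return "red"     # no test touched this line
    if line.branches and line.branches_hit < line.branches:
        return "yellow"  # the line ran, but some branch outcome was never taken
    return "green"       # the line ran and every branch outcome (if any) was taken
```

So an if statement whose tests only ever take the true path, `LineReport(True, 2, 1)`, comes out yellow.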
49
00:02:10,720 --> 00:02:13,840
That visual feedback sounds 
incredibly useful for spotting 

50
00:02:13,840 --> 00:02:16,600
gaps quickly. 
The source had the stack test 

51
00:02:16,600 --> 00:02:19,800
example, right, where they got
100% coverage because the tests 

52
00:02:19,800 --> 00:02:23,040
hit everything executable. 
But what happens when tests are 

53
00:02:23,040 --> 00:02:24,880
missing? 
What's the immediate impact? 

54
00:02:24,960 --> 00:02:27,040
That's exactly the point. 
The metric shows a gap 

55
00:02:27,040 --> 00:02:30,040
immediately. 
The source shows if you leave 

56
00:02:30,040 --> 00:02:32,880
out a key test like checking 
what happens when you pop an 

57
00:02:32,880 --> 00:02:35,640
empty stack, maybe called
testEmptyStackException.

58
00:02:35,880 --> 00:02:39,120
Well, your coverage drops 
instantly. In their example, it

59
00:02:39,120 --> 00:02:42,440
went from 100% down to, was it,
92.9%.

60
00:02:42,440 --> 00:02:45,520
From one missing test?
Because suddenly a few 

61
00:02:45,520 --> 00:02:47,600
instructions weren't executed by
any test. 

62
00:02:47,840 --> 00:02:50,840
Of 56 total
instructions, only 52

63
00:02:50,840 --> 00:02:52,560
were covered.
It shows a direct link.

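Using the instruction counts quoted for the Stack example, the reported figure checks out:

$$
\text{coverage} = \frac{\text{covered instructions}}{\text{total instructions}} \times 100\% = \frac{52}{56} \times 100\% \approx 92.9\%
$$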
64
00:02:52,680 --> 00:02:55,240
OK, so we know what it is, how 
it's measured, how tools show 

65
00:02:55,240 --> 00:02:56,440
it. 
But this leads to the big 

66
00:02:56,440 --> 00:02:59,840
question, right? 
For you, the listener, building 

67
00:02:59,840 --> 00:03:03,920
software or just curious, is 
there a magic number, a perfect 

68
00:03:03,920 --> 00:03:06,960
percentage we should aim for 
like 100% all the time? 

69
00:03:07,440 --> 00:03:11,040
The million-dollar question. And the
short answer is no, there's no

70
00:03:11,040 --> 00:03:15,640
universal target. 
It really varies hugely project 

71
00:03:15,640 --> 00:03:17,600
to project. 
I mean, think about it: testing

72
00:03:17,600 --> 00:03:20,840
software for a pacemaker versus 
testing a simple website. 

73
00:03:21,360 --> 00:03:23,480
Completely different needs. 
Different stakes. 

74
00:03:23,600 --> 00:03:26,040
Exactly. 
Complexity, how critical the 

75
00:03:26,040 --> 00:03:28,920
project is, even the team's 
attitude towards risk. 

76
00:03:28,960 --> 00:03:33,280
It all factors in. And going for
100% isn't always the best

77
00:03:33,280 --> 00:03:34,760
goal. 
Sometimes it's not even 

78
00:03:34,760 --> 00:03:36,600
desirable.
You often have really simple 

79
00:03:36,600 --> 00:03:39,680
methods like basic getters and 
setters just setting or getting 

80
00:03:39,680 --> 00:03:41,800
a value. 
Right, boilerplate code almost. 

81
00:03:41,840 --> 00:03:43,720
Pretty much. Testing those
exhaustively. 

82
00:03:43,720 --> 00:03:46,080
It can be overkill, a waste of 
effort, plus some things are 

83
00:03:46,080 --> 00:03:49,280
just inherently harder to test. 
Well, user interfaces, complex 

84
00:03:49,280 --> 00:03:51,240
interactions, anything 
asynchronous. 

85
00:03:51,360 --> 00:03:52,840
Where timing messes everything 
up. 

86
00:03:53,080 --> 00:03:56,880
Yeah, forcing 100% coverage 
there can lead to really brittle

87
00:03:56,880 --> 00:04:02,120
tests or just massive effort for
tiny gains, diminishing returns.

88
00:04:02,360 --> 00:04:04,840
OK, that makes perfect sense. 
So chasing a rigid number is 

89
00:04:04,840 --> 00:04:07,080
out. 
What should teams do instead? 

90
00:04:07,200 --> 00:04:09,000
How do you approach this 
strategically? 

91
00:04:09,280 --> 00:04:11,560
The better approach is usually 
to watch the trend. 

92
00:04:12,240 --> 00:04:14,640
Monitor how your coverage 
changes over time. 

93
00:04:14,640 --> 00:04:19,160
Is it going up, staying stable 
or, oh, is it slowly dropping?

94
00:04:19,800 --> 00:04:22,480
That tells you about the team's
commitment to testing. So it's

95
00:04:22,480 --> 00:04:25,080
more about the direction than
the destination. 

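Watching the trend rather than a fixed target might look like the sketch below. The function names, the tolerances, and the build history are all hypothetical; the point is only to flag sudden drops between builds and the net direction over time:

```python
def coverage_drops(history: list[float], tolerance: float = 0.5) -> list[int]:
    """Indices of builds whose coverage fell more than `tolerance` points below the previous build."""
    return [i for i in range(1, len(history)) if history[i - 1] - history[i] > tolerance]


def overall_trend(history: list[float], tolerance: float = 1.0) -> str:
    """Classify the net change from the first to the latest build as rising, stable, or falling."""
    if len(history) < 2:
        return "stable"  # not enough data points to call a direction
    net = history[-1] - history[0]
    if net > tolerance:
        return "rising"
    if net < -tolerance:
        return "falling"
    return "stable"
```

With a made-up history of `[71.2, 72.0, 71.8, 69.5, 70.1]`, `coverage_drops` flags build 3 and `overall_trend` reports "falling": no single number is alarming, but the direction is worth a conversation.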
96
00:04:25,200 --> 00:04:28,760
In a way, yes. But critically,
you also need to look closely at

97
00:04:28,760 --> 00:04:31,320
what isn't covered. 
Those red lines we talked about,

98
00:04:31,320 --> 00:04:34,440
you need to assess them. 
Are they noncritical? Genuinely

99
00:04:34,440 --> 00:04:36,600
hard to test? 
Or is it just an oversight? 

100
00:04:36,840 --> 00:04:39,120
It's about making conscious 
decisions, not just hitting a 

101
00:04:39,120 --> 00:04:40,880
number. 
That's a much more nuanced way 

102
00:04:40,880 --> 00:04:43,920
to look at it, but are there 
like general rules of thumb, 

103
00:04:44,040 --> 00:04:46,320
industry benchmarks maybe? 
There are some general 

104
00:04:46,320 --> 00:04:48,520
observations, yeah. 
Teams that really focus on 

105
00:04:48,520 --> 00:04:51,840
testing often land somewhere 
around 70% coverage. 

106
00:04:51,840 --> 00:04:54,440
OK, 70%.
And if you see coverage dip below

107
00:04:54,440 --> 00:04:56,480
50%, well, that usually raises
eyebrows. 

108
00:04:57,280 --> 00:04:59,760
It suggests large parts of the
system aren't tested. 

109
00:04:59,800 --> 00:05:01,800
That seems low. 
It is concerning, yeah. 

110
00:05:02,320 --> 00:05:05,000
And even with really rigorous 
methods like Test Driven 

111
00:05:05,000 --> 00:05:09,960
Development, TDD, coverage often
gets above 90%, but it rarely 

112
00:05:09,960 --> 00:05:12,760
actually hits 100%. 
There's usually that last bit 

113
00:05:12,760 --> 00:05:14,720
where the cost outweighs the 
benefit. 

114
00:05:14,960 --> 00:05:16,960
Right, that point of diminishing
returns again. 

115
00:05:16,960 --> 00:05:19,680
Now this is interesting. 
What about the big players like 

116
00:05:19,680 --> 00:05:23,520
Google? 
Do they aim for 100% with all 

117
00:05:23,520 --> 00:05:25,160
their resources? 
That's a great question. 

118
00:05:25,160 --> 00:05:26,440
We actually have some data on 
that. 

119
00:05:26,760 --> 00:05:30,120
Google shared stats back in 2014
at a developer conference. 

120
00:05:30,640 --> 00:05:35,080
Their median statement coverage 
across systems, it was 78%. 78,

121
00:05:35,240 --> 00:05:37,680
not 100. 
Nope, and the recommended target

122
00:05:37,680 --> 00:05:41,280
for most systems was 85%, but 
even that wasn't a hard rule. 

123
00:05:41,280 --> 00:05:43,200
Wow, OK.
They also mentioned it varied by

124
00:05:43,200 --> 00:05:45,720
language. 
C++ projects average a bit under

125
00:05:45,720 --> 00:05:49,640
60%, Python a bit over 80%. 
So even Google doesn't mandate a

126
00:05:49,640 --> 00:05:51,040
single number. 
Exactly. 

127
00:05:51,560 --> 00:05:54,080
The takeaway seems to be:
monitor consistently. 

128
00:05:54,080 --> 00:05:56,800
Focus on critical areas, but 
don't obsess over hitting an 

129
00:05:56,800 --> 00:05:59,400
arbitrary percentage everywhere.
That's really insightful. 

130
00:05:59,800 --> 00:06:02,240
OK. 
So we've focused heavily on 

131
00:06:02,600 --> 00:06:05,920
statement coverage, counting 
lines, or C0 as it's called.

132
00:06:06,240 --> 00:06:08,200
Are there other ways to measure 
coverage? 

133
00:06:08,640 --> 00:06:10,200
Other angles? 
Absolutely. 

134
00:06:10,200 --> 00:06:14,040
Statement coverage, or C0, is
the most common, definitely, but

135
00:06:14,040 --> 00:06:15,600
there are other useful 
definitions. 

136
00:06:15,960 --> 00:06:19,880
There's function coverage. 
A simpler idea: what percentage of

137
00:06:19,880 --> 00:06:23,440
your actual functions or methods
got called by any test? 

138
00:06:23,880 --> 00:06:25,880
So just, did we run this function at
all? 

139
00:06:25,880 --> 00:06:28,680
Pretty much, the basic check.
Then there's function call

140
00:06:28,680 --> 00:06:30,280
coverage. That's a bit more
specific. 

141
00:06:30,680 --> 00:06:33,320
It looks at the places in your 
code where one function calls 

142
00:06:33,320 --> 00:06:35,120
another. 
It measures what percentage of 

143
00:06:35,120 --> 00:06:37,560
those specific calls were 
exercised by tests. 

144
00:06:37,600 --> 00:06:40,920
So not just if the function ran,
but if we tested the part where 

145
00:06:40,920 --> 00:06:42,120
it gets called from somewhere 
else. 

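To make the difference between function coverage and function call coverage concrete, here is a rough Python sketch. The module under test (`parse`, `validate`, `handle`) and every helper name are hypothetical; it uses the standard `ast` module to list call sites statically and `sys.settrace` to record which functions and call sites the tests actually reached. Treating a call site as a (line, callee) pair is our simplification, not how any particular tool defines it:

```python
import ast
import sys

# Hypothetical module under test, kept as a string so its call sites can be parsed.
SOURCE = """
def parse(text):
    return text.strip()

def validate(text):
    return len(text) > 0

def handle(text, strict):
    cleaned = parse(text)
    if strict:
        return validate(cleaned)
    return True
"""
FILENAME = "<module_under_test>"


def inventory(source):
    """Statically list defined functions and (line, callee) call sites between them."""
    tree = ast.parse(source)
    functions = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    call_sites = {
        (n.lineno, n.func.id)
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id in functions
    }
    return functions, call_sites


def run_traced(test):
    """Load the module, run `test` against it, and record executed functions and call sites."""
    namespace = {}
    exec(compile(SOURCE, FILENAME, "exec"), namespace)
    called, sites = set(), set()

    def tracer(frame, event, arg):
        if event == "call" and frame.f_code.co_filename == FILENAME:
            called.add(frame.f_code.co_name)
            caller = frame.f_back
            # Only calls made from inside the module count as module call sites.
            if caller is not None and caller.f_code.co_filename == FILENAME:
                sites.add((caller.f_lineno, frame.f_code.co_name))
        return None  # no per-line tracing needed

    sys.settrace(tracer)
    try:
        test(namespace)
    finally:
        sys.settrace(None)
    return called, sites


def report(test):
    """Return (function coverage %, function call coverage %) for one test run."""
    functions, call_sites = inventory(SOURCE)
    called, sites = run_traced(test)
    function_cov = 100 * len(called & functions) / len(functions)
    call_cov = 100 * len(sites & call_sites) / len(call_sites)
    return function_cov, call_cov
```

A single test such as `report(lambda ns: ns["handle"]("  hi ", False))` reaches `handle` and `parse` but never `validate`, so function coverage is about 66.7% and call coverage is 50%: the call to `validate` inside `handle` was never exercised.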
146
00:06:42,120 --> 00:06:43,720
Exactly. 
Testing the connections, if you

147
00:06:43,720 --> 00:06:45,440
like. 
And another really important 

148
00:06:45,440 --> 00:06:48,640
one, branch coverage, also 
called C1 coverage. 

149
00:06:48,640 --> 00:06:52,080
This looks at decision points.
For example, an if statement creates two

150
00:06:52,080 --> 00:06:54,400
branches, right? 
One for true, one for false. 

151
00:06:54,960 --> 00:06:57,600
Branch coverage measures the 
percentage of all possible 

152
00:06:57,600 --> 00:07:01,000
branches or paths through your 
code that the tests actually 

153
00:07:01,000 --> 00:07:03,000
took. 
It's generally seen as stricter

154
00:07:03,000 --> 00:07:05,640
than just statement coverage. 
Stricter because you might hit 

155
00:07:05,640 --> 00:07:08,600
all the lines but miss a 
specific decision path. 

156
00:07:09,040 --> 00:07:13,280
Precisely. You could execute every
line of an if that has no else, but

157
00:07:13,280 --> 00:07:16,920
only ever take the true path,
never the implicit false one.

158
00:07:16,920 --> 00:07:19,880
Statement coverage might be 100%, but
branch coverage would show the 

159
00:07:19,880 --> 00:07:21,760
gap. 
Can you give a quick example 

160
00:07:21,760 --> 00:07:23,480
just to make that distinction 
clearer? 

161
00:07:23,560 --> 00:07:26,480
Sure. Think about a simple
function to get the absolute 

162
00:07:26,480 --> 00:07:29,960
value of a number, like abs(x).
Inside, it probably has something

163
00:07:29,960 --> 00:07:33,200
like if x < 0, then it changes the
sign. 

164
00:07:33,440 --> 00:07:35,720
Makes sense.
Now, if your only test is abs(-1),

165
00:07:35,720 --> 00:07:38,160
that test runs through all
the lines, right? 

166
00:07:38,160 --> 00:07:40,480
It checks the condition, finds 
it true, and it changes the sign.

167
00:07:40,480 --> 00:07:44,160
So 100% statement coverage. 
Correct, 100% statement 

168
00:07:44,160 --> 00:07:47,640
coverage, but branch coverage 
only 50% because you only tested

169
00:07:47,640 --> 00:07:50,640
the true path of that if.
You never tested what happens

170
00:07:50,640 --> 00:07:53,880
when x is not less than 0.
I see, so you'd need another 

171
00:07:53,880 --> 00:07:54,560
test. 
Right. 

172
00:07:54,560 --> 00:07:57,760
Now you need a test like abs(1)
to exercise the false path.

173
00:07:58,160 --> 00:08:00,160
Then you'd have 100% branch 
coverage too. 

174
00:08:00,400 --> 00:08:02,880
It forces you to test both sides
of the decision. 

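The abs example can be reproduced with a small Python sketch. It uses `sys.settrace` line events to record which lines ran and which line-to-line arcs were taken, treating the two arcs leaving the `if` as its branches, roughly in the spirit of arc-based branch measurement in tools like coverage.py. The name `my_abs` and the arc bookkeeping are our own illustration, not a specific tool's API:

```python
import sys


def my_abs(x):
    if x < 0:  # decision point: true branch negates, false branch falls through
        x = -x
    return x


# Derive line numbers from the function's own position in the file.
FIRST = my_abs.__code__.co_firstlineno
IF_LINE, NEGATE_LINE, RETURN_LINE = FIRST + 1, FIRST + 2, FIRST + 3
EXECUTABLE = {IF_LINE, NEGATE_LINE, RETURN_LINE}
# Each branch is an arc leaving the `if`: true -> negate, false -> return.
BRANCH_ARCS = {(IF_LINE, NEGATE_LINE), (IF_LINE, RETURN_LINE)}


def measure(inputs):
    """Run my_abs on each input and return (statement %, branch %)."""
    lines, arcs = set(), set()
    prev = None

    def local_tracer(frame, event, arg):
        nonlocal prev
        if event == "line":
            lines.add(frame.f_lineno)
            if prev is not None:
                arcs.add((prev, frame.f_lineno))
            prev = frame.f_lineno
        return local_tracer

    def global_tracer(frame, event, arg):
        nonlocal prev
        if event == "call" and frame.f_code is my_abs.__code__:
            prev = None  # start a fresh arc chain for every call
            return local_tracer
        return None

    sys.settrace(global_tracer)
    try:
        for x in inputs:
            my_abs(x)
    finally:
        sys.settrace(None)

    statement = 100 * len(lines & EXECUTABLE) / len(EXECUTABLE)
    branch = 100 * len(arcs & BRANCH_ARCS) / len(BRANCH_ARCS)
    return statement, branch
```

Running only `measure([-1])` gives `(100.0, 50.0)`: every line ran, but only the true arc was taken. Adding the second input, `measure([-1, 1])`, gives `(100.0, 100.0)`.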
175
00:08:02,960 --> 00:08:06,400
That clarifies it perfectly. 
So understanding test coverage, 

176
00:08:06,400 --> 00:08:09,280
it's really not just about the 
number itself, it's about making

177
00:08:09,280 --> 00:08:11,040
smart choices. 
Exactly. 

178
00:08:11,040 --> 00:08:14,240
It's about knowing what your 
tests are actually doing, what 

179
00:08:14,240 --> 00:08:17,680
paths they're taking, and using 
that information to build more 

180
00:08:17,680 --> 00:08:21,120
robust, more reliable software. 
It's about informed confidence. 

181
00:08:21,600 --> 00:08:23,520
Well, thank you for joining us 
on this deep dive.