1
00:00:00,160 --> 00:00:03,040
Welcome to the debate. 
Today we're looking at a 

2
00:00:03,040 --> 00:00:07,680
controversy that sits right at 
the bleeding edge of 

3
00:00:07,680 --> 00:00:11,120
transportation technology. 
It's a dispute that really 

4
00:00:11,120 --> 00:00:14,360
divides the engineering world 
right down the middle, and the 

5
00:00:14,360 --> 00:00:19,760
outcome is going to determine 
how and frankly if our vehicles 

6
00:00:19,760 --> 00:00:21,640
drive themselves in the next 
decade. 

7
00:00:22,000 --> 00:00:25,440
We are talking about the war 
between vision only systems and 

8
00:00:25,440 --> 00:00:28,840
sensor fusion, specifically 
regarding the use of Lidar. 

9
00:00:29,120 --> 00:00:31,480
And this isn't just a 
theoretical argument anymore, is

10
00:00:31,480 --> 00:00:33,640
it? 
We're looking at recent and, 

11
00:00:33,640 --> 00:00:37,000
well, quite alarming reports 
about the deployment of Tesla's 

12
00:00:37,000 --> 00:00:39,960
robotaxi fleet. 
The headline data suggests a 

13
00:00:39,960 --> 00:00:43,080
crash rate that's significantly 
higher than human drivers. 

14
00:00:43,320 --> 00:00:45,760
Some reports are saying up to 
four times higher. 

15
00:00:46,360 --> 00:00:48,760
It just raises a very 
uncomfortable question. 

16
00:00:49,160 --> 00:00:52,840
Can a system that relies only on
cameras ever truly match the 

17
00:00:52,840 --> 00:00:55,920
reliability of a system that 
uses active laser sensors? 

18
00:00:56,160 --> 00:00:58,040
And that's really the core of 
it. 

19
00:00:58,280 --> 00:01:01,040
Can a computer see the world 
well enough with just video 

20
00:01:01,040 --> 00:01:04,640
feeds to navigate safely? 
Or does excluding depth sensing 

21
00:01:04,640 --> 00:01:07,920
hardware like Lidar present an 
insurmountable barrier? 

22
00:01:08,600 --> 00:01:11,680
I'm the advocate. 
My position is that visual input

23
00:01:11,680 --> 00:01:15,160
is, well, theoretically 
sufficient because it mimics the

24
00:01:15,160 --> 00:01:17,240
biological model. 
It mimics us. 

25
00:01:17,640 --> 00:01:20,600
I also suspect the current crash
data is being heavily 

26
00:01:20,600 --> 00:01:23,920
misinterpreted because of some 
serious reporting biases. 

27
00:01:24,360 --> 00:01:27,640
And I'm the dissenter. 
My position is that abandoning 

28
00:01:27,640 --> 00:01:31,080
Lidar is a dangerous cost 
cutting measure that ignores the

29
00:01:31,080 --> 00:01:34,080
kind of redundancy you 
absolutely need for safety 

30
00:01:34,080 --> 00:01:37,000
critical systems. 
We're seeing higher crash rates 

31
00:01:37,000 --> 00:01:40,120
not because of a reporting bias,
but because of a fundamental 

32
00:01:40,120 --> 00:01:43,040
hardware deficit. 
When you take away the sensor 

33
00:01:43,040 --> 00:01:46,080
that tells you exactly how far 
away an object is, you are 

34
00:01:46,080 --> 00:01:48,960
introducing a level of risk that
just shouldn't be on public 

35
00:01:48,960 --> 00:01:50,760
roads. 
OK, so let's get into the 

36
00:01:50,760 --> 00:01:53,160
machinery of this. 
We really need to start with the

37
00:01:53,160 --> 00:01:56,600
first principles argument, 
because this is the hill the 

38
00:01:56,600 --> 00:01:59,440
vision only proponents are 
willing to die on. 

39
00:01:59,720 --> 00:02:02,360
There's a perspective shared by 
a contributor to our source 

40
00:02:02,360 --> 00:02:06,400
material, Retroviridae 6, that's
basically an existence proof. 

41
00:02:07,040 --> 00:02:09,360
Ah, the "humans do it" argument. 
Exactly. 

42
00:02:09,360 --> 00:02:12,280
It's the biological argument. 
You and I drove here today. 

43
00:02:12,280 --> 00:02:14,240
We navigated traffic. 
We merged. 

44
00:02:14,400 --> 00:02:17,280
We avoided pedestrians. 
We did all of that using two 

45
00:02:17,360 --> 00:02:20,840
passive optical sensors, our 
eyes, and a biological neural 

46
00:02:20,840 --> 00:02:23,160
network, our brain. 
We don't have Lidar in our 

47
00:02:23,160 --> 00:02:25,960
foreheads, we don't emit laser 
pulses to measure time of 

48
00:02:25,960 --> 00:02:29,320
flight, we don't have radar. 
We rely entirely on optical flow

49
00:02:29,320 --> 00:02:31,040
and pattern recognition. 
Sure. 

50
00:02:31,160 --> 00:02:35,760
So Retroviridae 6's point is 
that if a biological neural net 

51
00:02:35,960 --> 00:02:39,960
can drive a car using only 
passive optical sensors, then it

52
00:02:39,960 --> 00:02:42,640
is physically possible for a 
synthetic neural net to do the 

53
00:02:42,640 --> 00:02:44,680
same thing. 
The physics allows it. 

54
00:02:45,120 --> 00:02:49,160
Therefore, the argument that 
Lidar is required is just false.

55
00:02:49,600 --> 00:02:52,760
Lidar might be a shortcut, but 
it isn't a necessity. 

56
00:02:53,040 --> 00:02:56,480
The photons entering the camera 
contain all the information you 

57
00:02:56,480 --> 00:02:58,920
need to drive. 
I'm sorry, but I just don't buy 

58
00:02:58,920 --> 00:03:00,960
that. 
Let me tell you why that is a 

59
00:03:00,960 --> 00:03:04,040
huge category error. 
You're conflating potential with 

60
00:03:04,040 --> 00:03:06,560
execution. 
Just because humans can drive 

61
00:03:06,560 --> 00:03:08,880
with their eyes doesn't mean 
robots should drive without 

62
00:03:08,880 --> 00:03:11,600
laser precision. 
But why not if the goal is to 

63
00:03:11,600 --> 00:03:14,840
replicate human capability? 
Biology is full of flaws we're 

64
00:03:14,840 --> 00:03:16,960
trying to engineer out of the 
system. 

65
00:03:17,240 --> 00:03:20,920
The promise of autonomy isn't to
drive as well as a distracted 

66
00:03:20,920 --> 00:03:24,840
ape, it's to drive perfectly. 
And to drive perfectly, you need

67
00:03:24,840 --> 00:03:27,800
data that the human eye simply 
cannot provide. 

68
00:03:28,000 --> 00:03:29,560
But we're not talking about 
fatigue. 

69
00:03:29,760 --> 00:03:32,840
We're talking about the sensory 
input required to build a model 

70
00:03:32,840 --> 00:03:34,800
of the world. 
But you cannot compete with 

71
00:03:34,800 --> 00:03:37,880
Lidar using only visual 
cameras when it comes to what 

72
00:03:37,880 --> 00:03:41,560
I'd call ground truth. 
As another observer, Wiggly Worm

73
00:03:41,560 --> 00:03:44,840
pointed out in the materials, 
Lidar provides absolute depth 

74
00:03:44,840 --> 00:03:46,600
data. 
So maybe we should break that 

75
00:03:46,600 --> 00:03:48,920
down a bit for anyone who isn't 
a robotics engineer. 

76
00:03:49,200 --> 00:03:52,240
Right. 
A camera is a passive sensor. 

77
00:03:52,560 --> 00:03:55,640
It takes in light and it creates
a flat 2D image. 

78
00:03:56,200 --> 00:04:00,440
To figure out how far away a car
is, the software has to analyze 

79
00:04:00,440 --> 00:04:03,560
the size of that car in the 
image, compare it to what it 

80
00:04:03,560 --> 00:04:07,000
thinks a car looks like, and 
then infer the distance. It's 

81
00:04:07,000 --> 00:04:09,600
guessing. 
It's a very educated guess, but 

82
00:04:09,600 --> 00:04:11,920
it is a guess. 
It's inference based on 

83
00:04:11,920 --> 00:04:16,399
perspective and parallax, yeah. 
Lidar is active. It shoots out a

84
00:04:16,399 --> 00:04:20,000
laser pulse, it hits a car, and 
it measures exactly how long it 

85
00:04:20,000 --> 00:04:21,399
takes for the light to bounce 
back. 

86
00:04:21,800 --> 00:04:24,840
It's simple physics. 
Distance equals time multiplied 

87
00:04:24,840 --> 00:04:27,880
by the speed of light. 
It doesn't guess, it knows. 

88
00:04:27,960 --> 00:04:31,520
It says there is an object 12.4 
meters away. 

89
00:04:32,160 --> 00:04:35,240
Wiggly Worm's point is that when
you remove that sensor, you're 

90
00:04:35,240 --> 00:04:37,560
forcing the computer to 
hallucinate depth. 

91
00:04:37,960 --> 00:04:41,240
And when you look at the stats, 
Tesla's robotaxi is reportedly 

92
00:04:41,240 --> 00:04:43,720
crashing at a rate four times 
higher than humans. 

93
00:04:44,160 --> 00:04:47,040
That isn't just the learning 
curve, that is a failure of 

94
00:04:47,040 --> 00:04:49,800
perception. 
I think that statistic, the four

95
00:04:49,800 --> 00:04:52,880
times higher crash rate, is 
doing a lot of heavy lifting in 

96
00:04:52,880 --> 00:04:56,400
your argument, and I want to 
contextualize it. 

97
00:04:56,600 --> 00:04:59,960
We need to be very careful about
comparing apples to oranges 

98
00:04:59,960 --> 00:05:02,520
here. 
A crash is a crash, isn't it? 

99
00:05:02,680 --> 00:05:07,120
Not necessarily. 
Another analyst, Eskrove 2, 

100
00:05:07,120 --> 00:05:10,800
noted that this specific 
headline is based on a really 

101
00:05:10,800 --> 00:05:15,000
small sample size: five incidents 
in Austin in a single month. 

102
00:05:15,240 --> 00:05:17,880
But if you look at the 
granularity of those incidents, 

103
00:05:18,120 --> 00:05:21,760
the data includes really minor 
events like a tire touching a 

104
00:05:21,760 --> 00:05:25,720
parking sign or bumping a curb 
while parking. Five incidents in a 

105
00:05:25,720 --> 00:05:27,960
month for a small fleet is still
high. 

106
00:05:28,120 --> 00:05:31,920
But think about human behavior. 
If I scrape my rim on a curb 

107
00:05:31,920 --> 00:05:35,120
while I'm parallel parking, or 
if I tap a plastic bollard at 

108
00:05:35,120 --> 00:05:37,040
one mile per hour, do I call the
police? 

109
00:05:37,040 --> 00:05:39,080
Do I call my insurance company? 
No. 

110
00:05:39,120 --> 00:05:41,240
It never enters the statistical 
record. 

111
00:05:41,520 --> 00:05:44,520
It just vanishes. 
Right, it's unreported. 

112
00:05:44,520 --> 00:05:49,280
Exactly, but for a robotaxi, 
every single sensor reading is 

113
00:05:49,280 --> 00:05:51,960
logged. 
Every thump is a reported 

114
00:05:51,960 --> 00:05:54,160
incident. 
The system self-reports 

115
00:05:54,160 --> 00:05:57,120
everything. 
So you're comparing reported 

116
00:05:57,160 --> 00:06:01,120
autonomous incidents where every
scratch is scrutinized against 

117
00:06:01,120 --> 00:06:04,800
reported human accidents, which 
are usually only the ones 

118
00:06:04,800 --> 00:06:06,720
severe enough to require a tow 
truck. 

119
00:06:07,480 --> 00:06:11,080
As contributors UX Test and 
Jerkletos pointed out, you're 

120
00:06:11,080 --> 00:06:13,920
comparing a microscope to a 
telescope and then claiming the 

121
00:06:13,920 --> 00:06:17,000
microscope sees more dirt. 
I understand the reporting bias 

122
00:06:17,000 --> 00:06:19,600
argument. 
It's valid to a point, but I 

123
00:06:19,600 --> 00:06:22,440
also think it's a convenient way
to wave away failure. 

124
00:06:22,600 --> 00:06:25,360
You call it rubbing a curb. 
I call it a failure of object 

125
00:06:25,360 --> 00:06:27,280
permanence. 
That feels like a bit of a 

126
00:06:27,280 --> 00:06:29,800
stretch for a scratched rim. 
Is it? 

127
00:06:30,240 --> 00:06:33,560
Contrast this with Waymo. 
We have user experiences from 

128
00:06:33,560 --> 00:06:37,240
San Francisco contributors like 
Turbo Encapsulator and Luda lol 

129
00:06:37,480 --> 00:06:41,120
who describe Waymo as flawless 
in the same complex urban 

130
00:06:41,120 --> 00:06:43,960
environments where these vision 
based systems are struggling. 

131
00:06:44,440 --> 00:06:47,360
Waymo uses Lidar. 
They have that spinning bucket 

132
00:06:47,360 --> 00:06:49,520
on the roof. 
They aren't scraping rims. 

133
00:06:49,680 --> 00:06:52,640
They aren't bumping signs. 
They're also driving in a 

134
00:06:52,640 --> 00:06:54,760
fishbowl. 
They're driving in San Francisco.

135
00:06:54,760 --> 00:06:57,680
That's hardly a fishbowl. 
It's a geofenced, pre-mapped 

136
00:06:57,680 --> 00:07:01,480
environment. Waymo knows 
exactly where every curb is 

137
00:07:01,480 --> 00:07:04,680
because it has a high definition
map stored in its hard drive. 

138
00:07:05,040 --> 00:07:07,240
It's not seeing the curb, it's 
remembering it. 

139
00:07:07,720 --> 00:07:10,760
Tesla's vision approach is 
trying to do something much, much

140
00:07:10,760 --> 00:07:13,560
harder. 
Drive anywhere on any road 

141
00:07:13,600 --> 00:07:15,480
without a map, just like a 
human. 

142
00:07:15,920 --> 00:07:17,680
Of course it's going to be 
clumsier in the beginning. 

143
00:07:18,000 --> 00:07:19,920
It's learning general 
intelligence, not just 

144
00:07:19,920 --> 00:07:22,640
memorizing a map. 
But that clumsiness has real 

145
00:07:22,640 --> 00:07:25,560
world consequences. 
The reports of these vision only

146
00:07:25,560 --> 00:07:28,800
cars hitting stationary objects 
like parking signs? 

147
00:07:29,000 --> 00:07:30,760
That indicates a fundamental 
flaw. 

148
00:07:31,080 --> 00:07:33,760
If a vision system cannot 
calculate the distance to a 

149
00:07:33,760 --> 00:07:36,800
concrete bollard well enough to 
avoid hitting it, how can we 

150
00:07:36,800 --> 00:07:40,000
possibly trust it to calculate 
the velocity of a child running 

151
00:07:40,000 --> 00:07:42,520
into the street? 
Because the neural networks are 

152
00:07:42,520 --> 00:07:45,640
weighted differently for those 
tasks. The system is likely 

153
00:07:45,640 --> 00:07:48,520
hyper cautious around 
pedestrians, but has a higher 

154
00:07:48,520 --> 00:07:52,640
tolerance for static objects to 
facilitate, say, parking. 

155
00:07:52,800 --> 00:07:56,320
That is an assumption. 
Lidar solves the static object 

156
00:07:56,320 --> 00:07:59,600
problem instantly. 
It doesn't need to infer or weight

157
00:07:59,600 --> 00:08:01,880
anything. 
It hits the bollard with a laser

158
00:08:01,880 --> 00:08:04,800
and it knows it's there. 
The fact that these vision based

159
00:08:04,800 --> 00:08:07,680
cars are hitting stationary 
objects suggests that the 

160
00:08:07,680 --> 00:08:10,760
software is hallucinating free 
space where there is solid 

161
00:08:10,760 --> 00:08:12,880
matter. 
That is terrifying. 

162
00:08:13,280 --> 00:08:15,920
It implies the car literally 
does not know the physical 

163
00:08:15,920 --> 00:08:17,320
boundaries of its own 
environment. 

164
00:08:17,560 --> 00:08:20,800
I will concede that static 
object detection is a hurdle 

165
00:08:20,920 --> 00:08:25,600
right now, but identifying these
edge cases is exactly how you 

166
00:08:25,600 --> 00:08:28,120
train the network. 
Every time it hits a parking 

167
00:08:28,120 --> 00:08:31,760
sign at 2 mph, it uploads that 
failure and the entire fleet 

168
00:08:31,760 --> 00:08:35,039
learns not to do it again. 
And this brings us directly to 

169
00:08:35,039 --> 00:08:38,280
the concept of systemic risk. 
We have to talk about the 

170
00:08:38,280 --> 00:08:40,640
multiplier effect. 
Explain how you view that. 

171
00:08:40,640 --> 00:08:43,799
This was articulated very well 
by the source contributor Becker

172
00:08:43,799 --> 00:08:46,000
Hollow. 
The argument is all about error 

173
00:08:46,000 --> 00:08:48,160
scaling. 
When a human makes a mistake, it

174
00:08:48,160 --> 00:08:51,160
causes 1 accident. 
Human error is stochastic. 

175
00:08:51,160 --> 00:08:53,240
It's random. 
You might get distracted by a 

176
00:08:53,240 --> 00:08:55,200
text. 
I might drop my coffee. 

177
00:08:55,200 --> 00:08:58,280
It's isolated. 
Sure, individual variance. 

178
00:08:58,400 --> 00:09:01,040
But when a vision based software
has a flaw in its programming, 

179
00:09:01,040 --> 00:09:04,040
say a specific inability to 
distinguish a white truck 

180
00:09:04,040 --> 00:09:07,440
against a bright sky, that error
is replicated across every 

181
00:09:07,440 --> 00:09:09,920
single device on the road. 
The centralized bug. 

182
00:09:10,120 --> 00:09:12,720
Exactly. 
If the software misinterprets a 

183
00:09:12,720 --> 00:09:15,760
specific shadow or glare, 
thousands of cars become 

184
00:09:15,760 --> 00:09:18,480
dangerous simultaneously in the 
exact same way. 

185
00:09:18,880 --> 00:09:21,320
You aren't dealing with one bad 
driver, you're dealing with a 

186
00:09:21,320 --> 00:09:23,640
fleet of clones all sharing the 
same blind spot. 

187
00:09:24,120 --> 00:09:27,400
That is a systemic risk profile 
that we have never, ever dealt 

188
00:09:27,400 --> 00:09:30,080
with in automotive history. 
That's an interesting point, 

189
00:09:30,080 --> 00:09:32,040
though I would frame it 
differently. 

190
00:09:32,280 --> 00:09:35,720
That logic flips both ways. 
It's actually the strongest 

191
00:09:35,720 --> 00:09:39,320
argument for autonomous systems.
Yes, an error is distributed, 

192
00:09:39,520 --> 00:09:42,600
but so is the solution. 
If they catch it in time. 

193
00:09:42,720 --> 00:09:45,920
Think about it, when a human 
driver is bad at merging, 

194
00:09:45,920 --> 00:09:47,760
they're usually bad at merging 
forever. 

195
00:09:47,920 --> 00:09:51,080
You can't patch their brain, but
if you solve the edge case in 

196
00:09:51,080 --> 00:09:54,680
software, if you fix that white 
truck against the sky bug, you 

197
00:09:54,720 --> 00:09:56,840
instantly fix every car on the 
road. 

198
00:09:57,080 --> 00:10:00,120
You can upgrade the safety of 
the entire fleet overnight with 

199
00:10:00,120 --> 00:10:02,920
an over the air update. 
The multiplier effect applies to

200
00:10:02,920 --> 00:10:05,000
safety even more than it applies
to error. 

201
00:10:05,200 --> 00:10:07,920
You're leveraging the collective
learning of millions of miles. 

202
00:10:08,080 --> 00:10:10,120
But 
until that fix arrives, the risk

203
00:10:10,120 --> 00:10:12,360
is distributed to the public 
without their consent. 

204
00:10:12,720 --> 00:10:15,440
The public roads are becoming a 
beta testing environment. 

205
00:10:15,960 --> 00:10:19,160
We're seeing a "move fast and 
break things" mentality applied 

206
00:10:19,160 --> 00:10:23,640
to two ton metal projectiles. 
Becker Hollow's logic holds: the 

207
00:10:23,680 --> 00:10:26,760
error rate is a normal human 
error multiplied by the number 

208
00:10:26,760 --> 00:10:29,880
of devices using the program. 
If the program is flawed, the 

209
00:10:29,880 --> 00:10:32,600
carnage is scalable. 
I see why you think that, but 

210
00:10:32,720 --> 00:10:34,720
let me give you a different 
perspective on the technical 

211
00:10:34,720 --> 00:10:37,680
reliability piece. 
You keep going back to Lidar as 

212
00:10:37,680 --> 00:10:41,560
this source of truth, but Lidar 
has its own failure modes. 

213
00:10:41,560 --> 00:10:43,800
It does, but they are different 
from cameras. 

214
00:10:44,120 --> 00:10:48,000
Lidar struggles with heavy rain.
The laser pulses scatter off the

215
00:10:48,000 --> 00:10:50,480
water droplets. 
It struggles with fog. 

216
00:10:50,680 --> 00:10:53,280
It can get confused by 
interference from other Lidar 

217
00:10:53,280 --> 00:10:55,160
units. 
It's not magic. 

218
00:10:55,160 --> 00:10:57,720
And this is where the whole 
sensor fusion argument gets 

219
00:10:57,720 --> 00:10:58,600
tricky. 
Go on. 

220
00:10:58,680 --> 00:11:02,600
When you have a camera and a 
Lidar, they will often disagree.

221
00:11:02,920 --> 00:11:05,920
The camera sees a plastic bag 
blowing across the road and 

222
00:11:05,920 --> 00:11:09,880
thinks it's nothing. 
Lidar sees an object and says 

223
00:11:10,000 --> 00:11:13,640
"Obstacle, emergency brake." 
Now the computer has to decide 

224
00:11:13,640 --> 00:11:16,880
which sensor to trust. 
This is the sensor fusion 

225
00:11:16,880 --> 00:11:20,000
conflict. 
By removing Lidar, Tesla is 

226
00:11:20,000 --> 00:11:21,880
arguing that you remove the 
noise. 

227
00:11:22,120 --> 00:11:25,240
You force the neural net to 
resolve the visual data just 

228
00:11:25,240 --> 00:11:28,440
like a human does, without 
getting confused by conflicting 

229
00:11:28,440 --> 00:11:30,240
signals. 
That sounds like a very 

230
00:11:30,240 --> 00:11:33,080
convenient engineering 
rationalization for saving 

231
00:11:33,080 --> 00:11:35,840
money. 
Cameras are cheap, Lidar is 

232
00:11:35,840 --> 00:11:38,400
expensive. 
It's definitely cheaper, but 

233
00:11:38,400 --> 00:11:41,760
Retroviridae 6 mentioned, 
we went from the horse and buggy

234
00:11:41,760 --> 00:11:43,600
to the moon in just a few 
decades. 

235
00:11:44,040 --> 00:11:46,600
Assuming that computer vision 
can't bridge the gap just 

236
00:11:46,600 --> 00:11:48,320
because it hasn't yet is 
premature. 

237
00:11:48,800 --> 00:11:51,720
The issue isn't that the camera 
is blind, it's that the 

238
00:11:51,720 --> 00:11:54,080
processing isn't yet 
sophisticated enough. 

239
00:11:54,560 --> 00:11:57,520
But processing power is scaling 
exponentially. 

240
00:11:57,680 --> 00:12:01,240
Software cannot conjure photons 
where there are none. 

241
00:12:01,360 --> 00:12:04,680
That's the physics problem. 
But it can interpret context. 

242
00:12:04,880 --> 00:12:08,000
Let's talk about those photons. 
Cameras are passive. 

243
00:12:08,080 --> 00:12:10,520
They need light. 
What happens when you drive 

244
00:12:10,520 --> 00:12:12,560
directly into the sunset? 
We've all done it. 

245
00:12:12,560 --> 00:12:14,200
The visor goes down. 
You squint. 

246
00:12:14,200 --> 00:12:16,800
You can barely see. 
Cameras get blinded by sun 

247
00:12:16,800 --> 00:12:18,480
glare. 
They get obscured by mud. 

248
00:12:18,720 --> 00:12:22,080
We have reports that Tesla is 
having to employ trailing chase 

249
00:12:22,080 --> 00:12:26,080
cars with human safety monitors 
for their autonomous taxis. 

250
00:12:26,360 --> 00:12:29,120
If the system is so 
theoretically sound, why does it

251
00:12:29,120 --> 00:12:31,680
need a human babysitter in a 
separate vehicle? 

252
00:12:31,920 --> 00:12:35,240
Every developmental technology 
has safety protocols during 

253
00:12:35,240 --> 00:12:37,800
testing. 
But this is being sold as a future 

254
00:12:37,800 --> 00:12:41,080
that is just around the corner. 
The reliance on cameras 

255
00:12:41,080 --> 00:12:43,800
introduces a fragility that 
Lidar solves. 

256
00:12:44,280 --> 00:12:47,840
Lidar cuts through sun glare. 
It works in total darkness. 

257
00:12:48,160 --> 00:12:52,040
It is a second layer of truth. 
If the camera sees a shadow and 

258
00:12:52,040 --> 00:12:55,480
thinks it's a hole in the road, 
the Lidar says no, the ground is

259
00:12:55,480 --> 00:12:58,080
flat. 
Removing that sensor removes a 

260
00:12:58,080 --> 00:13:01,080
layer of survival. 
It's engineering hubris to 

261
00:13:01,080 --> 00:13:04,560
believe you can derive 100% 
certainty from a sensor that is 

262
00:13:04,560 --> 00:13:06,600
susceptible to optical 
illusions. 

263
00:13:06,840 --> 00:13:09,960
I'm not convinced by that line 
of reasoning because it assumes 

264
00:13:09,960 --> 00:13:12,280
we can't solve optical illusions
with better AI. 

265
00:13:12,920 --> 00:13:15,440
But let's pivot to the 
consequences of this, because 

266
00:13:15,440 --> 00:13:18,400
the legal aspect is fascinating.
It's a nightmare. 

267
00:13:18,680 --> 00:13:21,280
This leads us to the inevitable 
question of accountability. 

268
00:13:21,840 --> 00:13:25,120
When these systems do fail, 
whether it's a clumsy bump or a 

269
00:13:25,120 --> 00:13:27,360
serious collision, who is 
responsible? 

270
00:13:27,480 --> 00:13:30,760
This is the question posed by 
Shifty Mennonite in our source 

271
00:13:30,760 --> 00:13:33,000
threads. 
Who is going to be held 

272
00:13:33,000 --> 00:13:35,480
accountable when these things 
mow people down? 

273
00:13:35,800 --> 00:13:39,160
It is a legal quagmire. 
Is it the driver, which in this 

274
00:13:39,160 --> 00:13:42,680
case is the software? 
Is it the manufacturer or is it 

275
00:13:42,680 --> 00:13:44,680
the limitations of the sensor 
suite itself? 

276
00:13:44,920 --> 00:13:48,120
I think we need to distinguish 
between a software bug and a 

277
00:13:48,120 --> 00:13:50,680
design choice. 
This is crucial. 

278
00:13:51,120 --> 00:13:54,400
If a car crashes because of a 
line of bad code, that's one 

279
00:13:54,400 --> 00:13:57,040
thing. 
But if a manufacturer knowingly 

280
00:13:57,040 --> 00:14:00,720
removes a safety sensor like 
Lidar, a sensor that is industry

281
00:14:00,720 --> 00:14:04,240
standard for competitors like 
Waymo, and that removal leads to

282
00:14:04,240 --> 00:14:07,320
a crash because the camera 
couldn't estimate depth, that 

283
00:14:07,320 --> 00:14:09,720
feels very distinct from a mere 
coding error. 

284
00:14:09,880 --> 00:14:13,120
You're suggesting negligence. 
I'm saying it borders on it. 

285
00:14:13,280 --> 00:14:17,560
There's this sentiment expressed
by user Beneficial Soup 3699 

286
00:14:17,720 --> 00:14:21,240
regarding blatant fraud. 
While that is, you know, strong 

287
00:14:21,240 --> 00:14:23,800
language, the core sentiment is 
valid. 

288
00:14:24,000 --> 00:14:26,680
If you claim a camera is 
sufficient and the physics 

289
00:14:26,680 --> 00:14:30,160
suggest it isn't, and the data 
shows it crashing, at what point

290
00:14:30,160 --> 00:14:33,280
does adherence to a vision only 
philosophy become liability? 

291
00:14:33,640 --> 00:14:36,600
But that assumes Lidar would have 
prevented that specific crash. 

292
00:14:36,720 --> 00:14:40,400
We don't know that. 
Like I said, Lidar isn't a magic

293
00:14:40,400 --> 00:14:43,240
bullet. 
It's not magic, it's redundancy.

294
00:14:43,480 --> 00:14:46,960
In aviation, we don't fly with 
one altimeter, we have three. 

295
00:14:47,120 --> 00:14:49,280
Why on earth should we drive 
with one type of eye? 

296
00:14:49,600 --> 00:14:53,280
If the camera fails due to glare
or a bug or mud, there's nothing

297
00:14:53,280 --> 00:14:56,000
to catch the car. 
It is a single point of failure 

298
00:14:56,000 --> 00:14:58,480
system. 
But there's an economic argument

299
00:14:58,480 --> 00:15:01,080
here too. 
If you require Lidar, you make 

300
00:15:01,080 --> 00:15:05,040
autonomous cars cost $100,000. 
They become toys for the rich. 

301
00:15:05,280 --> 00:15:09,200
If you can solve it with vision,
the hardware costs $500. 

302
00:15:09,240 --> 00:15:13,200
You can put it in every car. 
A vision based system that's 99%

303
00:15:13,200 --> 00:15:17,280
safe and available to everyone 
might save more total lives than

304
00:15:17,280 --> 00:15:21,960
a Lidar system that's 99.9% 
safe but only 1,000 people can 

305
00:15:21,960 --> 00:15:24,680
afford it. 
That is a utilitarian calculus 

306
00:15:24,680 --> 00:15:27,600
that works on a spreadsheet, but
it doesn't work when you're the 

307
00:15:27,600 --> 00:15:30,920
one crossing the street. 
The public road should not be a 

308
00:15:30,920 --> 00:15:34,200
testing ground for cost cutting 
measures disguised as innovation. 

309
00:15:35,000 --> 00:15:37,880
The experiences of users in San 
Francisco and Austin show a 

310
00:15:37,880 --> 00:15:40,720
clear divide. 
Waymo with Lidar is providing a 

311
00:15:40,720 --> 00:15:44,200
flawless service while vision 
only systems are struggling with

312
00:15:44,200 --> 00:15:47,560
basic static objects. 
But again, Waymo is on rails. 

313
00:15:47,600 --> 00:15:50,840
It's a local maximum. 
It's great for San Francisco, 

314
00:15:50,840 --> 00:15:52,720
but it doesn't scale to the rest
of the world. 

315
00:15:53,040 --> 00:15:56,200
I'd rather have a safe local 
maximum than a dangerous global 

316
00:15:56,200 --> 00:15:58,880
beta test. 
Until vision systems can match 

317
00:15:58,880 --> 00:16:01,880
the redundancy and depth 
accuracy of Lidar, the safety 

318
00:16:01,880 --> 00:16:04,760
consequences are real and they 
are statistically proven. 

319
00:16:05,160 --> 00:16:08,480
We cannot verify the safety of a
black box neural net without 

320
00:16:08,480 --> 00:16:11,600
ground truth sensors. 
It ultimately comes down to that

321
00:16:11,600 --> 00:16:13,440
multiplier effect we talked 
about. 

322
00:16:13,440 --> 00:16:15,600
It does. 
We are at a crossroads. 

323
00:16:16,040 --> 00:16:19,600
We can take the safe, expensive 
route with Lidar, which might 

324
00:16:19,600 --> 00:16:22,800
limit the scalability of the 
technology but provides that 

325
00:16:22,800 --> 00:16:26,640
warm blanket of redundancy. 
Or we can push for the vision 

326
00:16:26,640 --> 00:16:31,040
solution, which, if it works, 
multiplies safety exponentially 

327
00:16:31,040 --> 00:16:34,040
across the globe and solves 
general intelligence. 

328
00:16:34,280 --> 00:16:36,680
But if it fails, it multiplies 
error. 

329
00:16:36,960 --> 00:16:40,200
It multiplies the risk of a 
single software blind spot into 

330
00:16:40,200 --> 00:16:43,480
a nationwide 
hazard. And that is the gamble. 

331
00:16:44,040 --> 00:16:47,120
Is the current risk worth the 
future reward? 

332
00:16:47,840 --> 00:16:51,640
I tend to believe that without 
taking that risk, we stagnate. 

333
00:16:51,920 --> 00:16:55,560
We'd still be driving horses if 
we waited for the perfect car. 

334
00:16:55,800 --> 00:16:59,160
And I would argue that safety is
not a place for gambling when 

335
00:16:59,160 --> 00:17:02,120
you're moving 2 tons of steel at
60 mph. 

336
00:17:02,320 --> 00:17:05,359
Pretty good isn't good enough. 
You need absolute truth and 

337
00:17:05,359 --> 00:17:08,800
cameras just don't provide that.
A fundamental disagreement on 

338
00:17:08,800 --> 00:17:12,280
the philosophy of engineering. 
Thank you for listening to the 

339
00:17:12,280 --> 00:17:14,880
debate. 
We hope this exchange has 

340
00:17:14,880 --> 00:17:17,440
illuminated the complexities 
behind the sensors. 

341
00:17:17,599 --> 00:17:20,720
Drive safe, everyone, and watch 
out for the robots. 

342
00:17:20,920 --> 00:17:21,560
Goodbye.