1
00:00:03,280 --> 00:00:05,880
Hello and welcome to the 
Behavioral Design Podcast. 

2
00:00:05,960 --> 00:00:08,680
This season we're diving into 
the intersection of behavioral 

3
00:00:08,680 --> 00:00:11,080
science and AI. 
We want to make sense of the 

4
00:00:11,080 --> 00:00:14,080
state of AI, from understanding 
how humans interact with 

5
00:00:14,080 --> 00:00:17,520
intelligent systems to using AI 
to do behavioral design itself. 

6
00:00:17,720 --> 00:00:21,280
I'm Aline Holzwarth, a health 
tech advisor specializing in AI 

7
00:00:21,280 --> 00:00:23,880
and product design. 
Over the past 15 years, I've 

8
00:00:23,880 --> 00:00:26,720
been crafting human-centered 
products with behavioral science

9
00:00:26,720 --> 00:00:29,160
at the core. 
At Apple, I led Behavioral 

10
00:00:29,160 --> 00:00:32,119
Science for Health AI, designing
and launching AI-powered 

11
00:00:32,119 --> 00:00:34,440
features to help users reach 
their health goals. 

12
00:00:35,160 --> 00:00:37,160
And I'm Samuel Salzer, your 
second co-host. 

13
00:00:37,400 --> 00:00:40,600
I'm a behavioral strategist 
specializing in habit formation

14
00:00:40,600 --> 00:00:43,200
and designing products that 
drive long-term behavior change. 

15
00:00:43,640 --> 00:00:46,880
I work with leading tech 
organizations integrating AI to 

16
00:00:46,880 --> 00:00:48,240
scale behavioral design for 
good. 

17
00:00:48,800 --> 00:00:52,200
And I'm also the founder of Baby
Bites, a dedicated community on 

18
00:00:52,440 --> 00:00:56,240
behavioral science and AI. 
Quick word on Nuance Behavior 

19
00:00:56,240 --> 00:00:59,640
where we help organizations 
build impactful digital products

20
00:00:59,640 --> 00:01:02,640
using behavioral design. 
We only take on a few clients at

21
00:01:02,640 --> 00:01:05,200
a time to ensure the highest 
level of quality for our 

22
00:01:05,200 --> 00:01:07,400
tailored, evidence-based 
solutions. 

23
00:01:08,280 --> 00:01:11,080
If you'd like to become one of 
our special projects, email us 

24
00:01:11,120 --> 00:01:14,760
at hello@nuancebehavior.com or 
book a call directly on our 

25
00:01:14,760 --> 00:01:28,760
website, nuancebehavior.com. 
Hi, Sam. 

26
00:01:29,480 --> 00:01:33,240
Hey, Aline. 
What comes to mind when you hear

27
00:01:33,240 --> 00:01:38,080
the phrase AI alignment? 
It's a good question. 

28
00:01:38,080 --> 00:01:42,320
I think impossible is the first 
word that comes to mind like 

29
00:01:42,320 --> 00:01:46,640
that. 
It's so hard to align anything 

30
00:01:47,000 --> 00:01:53,240
on a large scale or, you know, 
when it comes to humans, we're 

31
00:01:53,240 --> 00:01:56,720
famously very misaligned. 
We don't agree on so many 

32
00:01:56,720 --> 00:01:59,320
different issues. 
Even when we're in the same 

33
00:01:59,320 --> 00:02:02,560
country or have the same moral 
beliefs and so on, we still find

34
00:02:02,560 --> 00:02:05,920
a lot of disagreements about things 
that could be like quite 

35
00:02:05,920 --> 00:02:09,680
dramatic. 
See, I think impossible is my 

36
00:02:09,680 --> 00:02:13,800
feeling towards that. It's really, 
really hard to achieve good 

37
00:02:13,800 --> 00:02:15,520
alignment. 
Yeah. 

38
00:02:16,000 --> 00:02:19,920
So for me, the first thing that 
I think of is: aligned with what? 

39
00:02:20,440 --> 00:02:24,320
Like, and I understand there's 
this general aligned like the 

40
00:02:24,320 --> 00:02:26,080
humans are aligned with each 
other. 

41
00:02:26,080 --> 00:02:31,120
Like there of course, there are 
many diverse perspectives among 

42
00:02:31,120 --> 00:02:34,360
humans on this Earth, right? 
We have different beliefs, but 

43
00:02:34,600 --> 00:02:38,760
in general, I think it's a 
pretty common belief that we 

44
00:02:38,760 --> 00:02:42,440
should preserve humanity. 
We should preserve life in its 

45
00:02:42,440 --> 00:02:46,160
various forms. 
And if you said, OK, now just 

46
00:02:46,160 --> 00:02:51,120
copy and paste that value onto 
machines, like, why is that 

47
00:02:51,120 --> 00:02:53,160
hard? 
Why can't we just do that? 

48
00:02:54,240 --> 00:02:57,360
Yeah, it sounds easy when you 
when you say it like that, but 

49
00:02:57,360 --> 00:03:01,360
it's definitely something where
the more you start thinking 

50
00:03:01,360 --> 00:03:04,880
about it and looking into the 
ways of achieving this, it 

51
00:03:04,880 --> 00:03:08,120
becomes really, really hard. 
And like, that's why smarter 

52
00:03:08,120 --> 00:03:11,640
people than me, like someone 
like Nick Bostrom or similar are

53
00:03:11,680 --> 00:03:14,640
like really scratching their 
head that like, OK, how do we 

54
00:03:15,000 --> 00:03:18,720
succeed, especially when it 
comes to, you know, artificial 

55
00:03:18,720 --> 00:03:24,440
intelligence, that is with a lot
of autonomy and with higher 

56
00:03:24,440 --> 00:03:28,800
intelligence. 
And we have, how do we keep it 

57
00:03:28,800 --> 00:03:31,880
interested in our best interest?
Yeah. 

58
00:03:32,000 --> 00:03:35,040
So I think it's interesting from
like AI labs standpoint, we have

59
00:03:35,040 --> 00:03:37,120
like OpenAI and then we have 
Anthropic. 

60
00:03:37,480 --> 00:03:41,200
I think Anthropic has been 
credited with taking a little 

61
00:03:41,200 --> 00:03:46,520
bit more serious stance on 
trying to develop AI models that

62
00:03:46,520 --> 00:03:50,560
are, you know, constitutionally 
grounded in a moral framework 

63
00:03:50,560 --> 00:03:51,960
and so on. 
But quote, 

64
00:03:51,960 --> 00:03:54,560
unquote aligned. 
Quote, unquote aligned. 

65
00:03:55,080 --> 00:04:04,040
But I think it's still, I would 
say in its infancy to really be

66
00:04:04,040 --> 00:04:07,680
confident that a 
constitutional or moral 

67
00:04:07,680 --> 00:04:11,120
framework would work. 
Because as we know, 

68
00:04:11,160 --> 00:04:16,040
constitutional stuff in any of 
our societies are often times 

69
00:04:16,040 --> 00:04:21,040
being tested in front of juries 
and courts and are being 

70
00:04:21,040 --> 00:04:24,960
questioned all the time because 
there's so much ambiguity within

71
00:04:24,960 --> 00:04:28,360
what we see as kind of like, OK,
we have a constitutional framework

72
00:04:28,360 --> 00:04:30,960
around like this is what we do 
in this country or like this is 

73
00:04:30,960 --> 00:04:33,240
what we care about. 
But then something new happens. 

74
00:04:33,240 --> 00:04:36,240
Then they're like, well, yeah, 
we didn't think about it in that

75
00:04:36,240 --> 00:04:38,040
way. 
Yeah, that's true. 

76
00:04:38,040 --> 00:04:41,240
And like, you know, there's 
always this moving target that I

77
00:04:41,240 --> 00:04:43,200
think is really hard to really 
achieve. 

78
00:04:43,200 --> 00:04:46,080
And then you have to expand 
maybe the constitutional 

79
00:04:46,080 --> 00:04:48,360
framework and you have to make 
amendments and you have to get 

80
00:04:48,360 --> 00:04:52,080
longer and longer and longer. 
And like, how long can you make 

81
00:04:52,080 --> 00:04:55,760
a moral framework that 
encompasses everything that 

82
00:04:55,760 --> 00:05:00,360
would perfectly ensure that AI 
would always know what it should

83
00:05:00,360 --> 00:05:05,200
be doing in all the kind of 
cases to serve humans' best 

84
00:05:05,200 --> 00:05:08,080
interest? It seems, 
again, a moving target that I 

85
00:05:08,080 --> 00:05:10,120
think is hard to achieve. 
Yeah. 

86
00:05:10,400 --> 00:05:15,200
Yeah, it's hard enough if you 
have a consistent framework of 

87
00:05:15,200 --> 00:05:18,120
moral beliefs, but when you have
moral beliefs that are evolving 

88
00:05:18,120 --> 00:05:21,400
and changing over time and 
varying from situation to 

89
00:05:21,400 --> 00:05:25,000
situation and across different 
people, yeah, the complexity 

90
00:05:25,040 --> 00:05:30,040
really compounds on itself. 
This reminds me actually of what

91
00:05:30,120 --> 00:05:33,800
these researchers call the spec 
in a story that somewhat 

92
00:05:33,800 --> 00:05:38,000
recently came out and you're 
probably familiar with it in 

93
00:05:38,000 --> 00:05:43,400
tech specs in general. 
But in this story, AI 2027, this

94
00:05:43,400 --> 00:05:48,560
is I guess a sci-fi forecast is 
one thing you could call it, 

95
00:05:48,720 --> 00:05:52,600
it's a fictional story, but with
this goal of predictive 

96
00:05:52,600 --> 00:05:54,600
accuracy. 
So it's sort of rooted in 

97
00:05:54,600 --> 00:05:57,240
research. 
And to the extent that you could

98
00:05:57,240 --> 00:06:00,840
say that we have evidence 
surrounding the ability to 

99
00:06:00,840 --> 00:06:04,640
predict the future from this 
group, the AI Futures Project. 

100
00:06:04,640 --> 00:06:08,440
And so they came up with this 
story where they've found a way 

101
00:06:08,440 --> 00:06:13,040
to embed a wide range of 
predictions about the future of 

102
00:06:13,400 --> 00:06:19,160
AI and this development of
superintelligence and how that might 

103
00:06:19,200 --> 00:06:22,200
go in the next, you know, it's 
called AI 2027. 

104
00:06:22,200 --> 00:06:24,680
So really this is really like in
a couple years. 

105
00:06:25,600 --> 00:06:33,480
And the spec in this story is 
sort of like a guide and 

106
00:06:33,480 --> 00:06:38,800
instruction manual that is given
to the AI agent. 

107
00:06:39,200 --> 00:06:44,120
This is the sort of a fictional 
combobulation of companies 

108
00:06:44,120 --> 00:06:46,840
called OpenBrain that have 
created this agent. 

109
00:06:46,840 --> 00:06:50,640
And the agent ends up developing
other agents that sort of take 

110
00:06:50,640 --> 00:06:55,520
on lives of their own. 
And the researchers have written

111
00:06:55,520 --> 00:06:59,880
this spec, which, you know, 
basically is a combination of 

112
00:06:59,880 --> 00:07:03,920
some vague goals, like, you 
know, don't break the law, you 

113
00:07:03,920 --> 00:07:07,840
know, assist the user, so on. 
And then much more specific do's

114
00:07:07,840 --> 00:07:11,080
and don'ts, these, like, lists 
of things to do and to not do. 

115
00:07:11,560 --> 00:07:15,360
And as the story plays out, what
I found was really interesting 

116
00:07:15,360 --> 00:07:21,560
was how and when the AI agent 
adheres to the spec versus 

117
00:07:21,560 --> 00:07:24,920
learns to ignore it. 
And we know as a result of 

118
00:07:24,920 --> 00:07:28,120
reinforcement learning and the 
types of behaviors that are 

119
00:07:28,120 --> 00:07:32,680
rewarded and incentivized, it's 
not always adhering to the spec,

120
00:07:32,680 --> 00:07:34,880
right? 
Especially when you have the 

121
00:07:34,880 --> 00:07:37,600
interests of a for-profit company 
at hand. 

122
00:07:37,880 --> 00:07:42,760
You know, it might be innovating
at the expense of things like 

123
00:07:43,240 --> 00:07:46,280
morality and ethical decision 
making, safety and privacy, and 

124
00:07:46,280 --> 00:07:49,240
you know, that long list of 
things that we as humans care 

125
00:07:49,240 --> 00:07:55,400
about, but maybe are sometimes 
at odds with becoming the 

126
00:07:55,400 --> 00:07:59,320
leading AI company. 
And so you can sort of really 

127
00:07:59,320 --> 00:08:01,840
easily see how this can get out 
of hand. 

128
00:08:01,920 --> 00:08:06,680
And in the story, I'll just give
the spoiler, the agent becomes 

129
00:08:06,680 --> 00:08:11,000
misaligned, you know, in 
one of the future paths. 

130
00:08:11,480 --> 00:08:14,640
No, I think it becomes 
misaligned in all of the paths. 

131
00:08:14,800 --> 00:08:15,640
I guess so. 
It's true. 

132
00:08:15,720 --> 00:08:19,880
Yeah, it gets misaligned in all,
but then in some of the more 

133
00:08:19,880 --> 00:08:23,360
optimistic ones, it's kind of 
like we managed to curb it, but.

134
00:08:23,400 --> 00:08:25,360
Yeah, there. 
There's no preventing 

135
00:08:25,360 --> 00:08:30,680
misalignment, but there is 
potentially the ability to stop 

136
00:08:30,680 --> 00:08:33,039
it from completely spiraling out
of control. 

137
00:08:33,039 --> 00:08:35,360
Yeah. 
Yeah, potentially. 

138
00:08:36,960 --> 00:08:40,480
Yeah. 
And I loved this, honestly, 

139
00:08:40,480 --> 00:08:42,720
reading it like it was so 
interesting. 

140
00:08:43,200 --> 00:08:46,080
And one thing I didn't know 
until very recently is that one 

141
00:08:46,080 --> 00:08:49,320
of the people that were involved
was actually someone who's like

142
00:08:49,320 --> 00:08:52,760
one of the best 
superforecasters in the world. 

143
00:08:53,200 --> 00:08:57,720
And he had written this thing in
2020 about like how the 

144
00:08:57,720 --> 00:09:01,800
future would look like in 2025. 
We were like, yeah, right. 

145
00:09:02,280 --> 00:09:05,280
Yeah, it was like before, you 
know, GPT and all that stuff. 

146
00:09:05,280 --> 00:09:09,760
And he was like, I think really,
really kind of on the money 

147
00:09:09,920 --> 00:09:13,760
then. 
And I think what I like about it

148
00:09:13,760 --> 00:09:16,840
is like, obviously it's been 
written after these people have 

149
00:09:16,840 --> 00:09:19,760
made the predictions and laid 
out what they think is the most 

150
00:09:19,760 --> 00:09:23,680
likely path forward, not by
making dramatic assumptions, 

151
00:09:23,680 --> 00:09:26,360
but just like, OK, based on 
how things are progressing right

152
00:09:26,360 --> 00:09:29,240
now, let's just follow all of 
the curves, all of the 

153
00:09:29,240 --> 00:09:31,040
trajectories and see where we 
go. 

154
00:09:31,520 --> 00:09:36,040
And then Scott Alexander added 
the sci-fi storytelling to it 

155
00:09:36,040 --> 00:09:38,600
that made it really compelling. 
But I think it's really grounded 

156
00:09:38,600 --> 00:09:40,640
in quite conservative 
assumptions. 

157
00:09:40,640 --> 00:09:45,040
Like it's not really making any 
like really crazy assumptions or

158
00:09:45,040 --> 00:09:48,480
like very dramatic things. 
It's like this is where things 

159
00:09:48,480 --> 00:09:50,240
are heading. 
Maybe I don't know. 

160
00:09:50,440 --> 00:09:54,440
I still, you know, I'm a skeptic
right at my core. 

161
00:09:54,720 --> 00:09:58,680
I think even then, while you can
say, all right, there's evidence

162
00:09:58,680 --> 00:10:02,240
to support this part and you see
sort of the increments, I think 

163
00:10:02,240 --> 00:10:06,480
you still need to do a whole lot
of anthropomorphizing in order 

164
00:10:06,480 --> 00:10:10,520
to get to like the machine's 
desires. 

165
00:10:10,520 --> 00:10:18,840
And like that it would have the 
will to even become misaligned, 

166
00:10:18,840 --> 00:10:21,520
for example. 
Like that strikes me as maybe 

167
00:10:21,840 --> 00:10:25,280
like you can go a long way with 
that assumption and it's not 

168
00:10:25,280 --> 00:10:29,400
clear that is correct. 
Even with what we described with

169
00:10:29,640 --> 00:10:33,720
the reinforcing of alternative 
things that are not aligned with

170
00:10:33,720 --> 00:10:37,520
the spec, to me it's not 
obvious that would then result 

171
00:10:37,520 --> 00:10:41,960
in a creature that wants to 
destroy humanity, right? 

172
00:10:42,320 --> 00:10:45,800
Yeah, I guess, but I think 
there's enough evidence already 

173
00:10:45,800 --> 00:10:50,200
that like the current models are
prone to, you know, a good 

174
00:10:50,200 --> 00:10:55,600
degree of sycophancy and a 
good degree of, you know, in 

175
00:10:55,600 --> 00:10:58,440
some ways deception and lying. 
I think Anthropic has been 

176
00:10:58,440 --> 00:11:01,080
probably the ones who are more 
open about like noticing a lot 

177
00:11:01,080 --> 00:11:05,800
of this where they have noticed 
that like, yeah, they gave task 

178
00:11:05,800 --> 00:11:08,520
in certain ways and then the 
model finds a way to kind of like

179
00:11:08,520 --> 00:11:11,000
deceive them a little bit and 
hide some stuff and like, you 

180
00:11:11,000 --> 00:11:14,280
know, do some things in the 
background to maintain its 

181
00:11:14,280 --> 00:11:19,560
own survival basically. 
So yeah, obviously I 

182
00:11:19,560 --> 00:11:22,800
agree that there's some 
anthropomorphizing at play and 

183
00:11:22,920 --> 00:11:26,000
especially, you know, like my 
favorite part was basically 

184
00:11:26,000 --> 00:11:29,920
where there was some form of 
scenario where they would, the 

185
00:11:30,000 --> 00:11:34,040
AI would set up some form of 
terminals of pseudo humans. 

186
00:11:34,040 --> 00:11:36,520
I'm not really sure if they were
humans or cyborgs that 

187
00:11:36,520 --> 00:11:38,800
would basically give 
reinforcement learning. 

188
00:11:39,400 --> 00:11:42,000
Like basically say like good 
job, you're doing well. 

189
00:11:42,200 --> 00:11:44,680
Giving them praise. 
This is all that's left of 

190
00:11:44,680 --> 00:11:47,320
humanity is giving the thumbs up
to the machine. 

191
00:11:47,680 --> 00:11:50,760
Yeah, exactly. 
So definitely I would say we 

192
00:11:50,760 --> 00:11:57,000
recommend everyone reading and 
or listening to this AI 2027 and

193
00:11:57,000 --> 00:12:01,080
we mentioned Bostrom before 
and he obviously has done a lot 

194
00:12:01,080 --> 00:12:04,440
of work on AI and 
superintelligence and he has a great 

195
00:12:04,440 --> 00:12:09,360
quote on this topic. 
Basically that AI and 

196
00:12:09,360 --> 00:12:12,240
superintelligence is basically 
philosophy with a deadline. 

197
00:12:12,960 --> 00:12:16,360
And that's honestly how I feel 
today. 

198
00:12:16,480 --> 00:12:20,000
Like it feels very urgent to 
figure some of these things out. 

199
00:12:20,000 --> 00:12:21,240
You 
thought we had time to figure 

200
00:12:21,240 --> 00:12:25,880
this 
out. It's great that we had 

201
00:12:25,960 --> 00:12:29,480
recently Peter Slattery talking 
about the AI Risk repository, 

202
00:12:29,880 --> 00:12:37,880
and today we have a fantastic 
guest to help us explore this AI

203
00:12:38,080 --> 00:12:43,400
and moral AI terrain. 
Yeah, and Jana Schaich Borg is 

204
00:12:43,400 --> 00:12:46,640
absolutely the perfect person to
talk about this topic, both 

205
00:12:46,640 --> 00:12:48,040
about the intersection of 
philosophy. 

206
00:12:48,040 --> 00:12:50,320
She's, you know, done a lot of 
work with philosophers, even 

207
00:12:50,320 --> 00:12:52,960
though she herself is not a 
philosopher. 

208
00:12:53,360 --> 00:12:57,240
She's sort of worked in this 
moral AI space and even has a 

209
00:12:57,240 --> 00:13:00,440
book called Moral AI, which 
she's put together with 

210
00:13:00,720 --> 00:13:04,280
philosopher Walter Sinnott-
Armstrong and computer scientist

211
00:13:04,280 --> 00:13:07,760
Vincent Conitzer. 
So yes, Jana is the perfect 

212
00:13:07,760 --> 00:13:10,120
person to talk about this 
intersection. 

213
00:13:10,120 --> 00:13:13,360
I actually worked together with 
Jana at Duke. 

214
00:13:13,360 --> 00:13:17,800
We were both part of the Social 
Science Research Institute, and 

215
00:13:18,280 --> 00:13:21,520
she's really this incredible mix
of expertise. 

216
00:13:21,520 --> 00:13:25,720
She's got neuroscience going on,
computational modeling, data 

217
00:13:25,720 --> 00:13:28,520
science and AI. 
She's just worked on so many 

218
00:13:28,520 --> 00:13:33,040
cool different things and so it 
was such a pleasure to have her 

219
00:13:33,120 --> 00:13:37,000
on the show. 
Yeah, so as expected, we dove 

220
00:13:37,000 --> 00:13:41,200
head first into exploring moral 
AI and what it means and 

221
00:13:41,200 --> 00:13:46,920
especially how we can encode 
human morality into AI systems. 

222
00:13:47,280 --> 00:13:50,040
What is the best approach? 
Is it more of a top-down, bottom-

223
00:13:50,040 --> 00:13:53,240
up, or hybrid? 
We'll explore all of that, and 

224
00:13:53,240 --> 00:13:56,320
much more happens. 
To Murgatroy. 

225
00:14:06,560 --> 00:14:09,880
Jana, welcome. 
Thank you, I'm so excited to be 

226
00:14:09,880 --> 00:14:13,520
here. 
So you wrote Moral AI with an 

227
00:14:13,520 --> 00:14:17,400
extremely interdisciplinary 
team, and this is not unusual 

228
00:14:17,400 --> 00:14:18,680
for you. 
Your work is very 

229
00:14:18,680 --> 00:14:22,120
interdisciplinary. 
So of course this is like very 

230
00:14:22,400 --> 00:14:27,400
cool on its own, but it seems 
like in the context of moral AI 

231
00:14:27,400 --> 00:14:30,520
maybe particularly important to 
bring all these different 

232
00:14:30,520 --> 00:14:34,440
perspectives together. 
Yeah, it's been a lot of fun, 

233
00:14:34,440 --> 00:14:36,440
but perhaps even more than a lot
of fun. 

234
00:14:36,440 --> 00:14:38,720
I mean, I really think, for me 
personally anyway, it's the best

235
00:14:38,720 --> 00:14:40,880
way to do it. 
We really draw on each other's 

236
00:14:40,880 --> 00:14:45,040
backgrounds and push each other 
in directions and ideas that we

237
00:14:45,040 --> 00:14:47,520
wouldn't go on our own. 
And I really feel like it's 

238
00:14:47,520 --> 00:14:51,680
advanced what we think about and
how well we're able to achieve 

239
00:14:51,680 --> 00:14:53,360
our goals. 
So yeah, it's been great. 

240
00:14:53,360 --> 00:14:56,000
And yes, weird, but I advocate 
for it. 

241
00:14:56,240 --> 00:14:58,840
Suggest everyone do it. 
Not weird, I think. 

242
00:14:59,360 --> 00:15:02,520
Can you think of any examples 
that really made the benefits of

243
00:15:02,520 --> 00:15:06,600
this collaboration really shine?
Like where the bringing together

244
00:15:06,600 --> 00:15:09,760
of your different perspectives 
really made you see something 

245
00:15:09,760 --> 00:15:12,640
that you didn't realize, or made
one of your co-authors see 

246
00:15:12,640 --> 00:15:14,440
something that you brought 
to the table. 

247
00:15:16,200 --> 00:15:19,200
Oh, man, I'm being very honest 
when I say it's almost every 

248
00:15:19,200 --> 00:15:21,440
moment. 
We really do push each other a 

249
00:15:21,440 --> 00:15:23,760
lot, and we think about a lot of
things that are not in the book,

250
00:15:23,880 --> 00:15:25,600
and we debate about the things 
that are allowed in the book. 

251
00:15:25,600 --> 00:15:28,640
So, for example, one of the 
things we've wrestled with is, 

252
00:15:29,000 --> 00:15:33,360
is there a point where AIs would
justify or should have moral 

253
00:15:33,360 --> 00:15:35,640
rights? 
And you might think, what does a

254
00:15:35,640 --> 00:15:37,160
neuroscientist have to say about
that? 

255
00:15:37,160 --> 00:15:39,240
And what does a computer 
scientist have to say about 

256
00:15:39,240 --> 00:15:42,040
that? 
But first of all, Vince is the 

257
00:15:42,400 --> 00:15:46,040
AI and game theorist, and he 
spent a lot of time thinking 

258
00:15:46,040 --> 00:15:48,160
about consciousness. 
A neuroscientist, many 

259
00:15:48,160 --> 00:15:49,960
neuroscientists have spent a lot 
of time thinking about 

260
00:15:49,960 --> 00:15:51,920
consciousness. 
And a lot of philosophers think 

261
00:15:51,920 --> 00:15:54,320
you have to have consciousness 
in order to be deserving of 

262
00:15:54,320 --> 00:15:57,680
moral rights, for example. 
And so this one comes to mind 

263
00:15:57,680 --> 00:16:01,080
because this was a case where I 
think Walter, our philosopher, 

264
00:16:01,320 --> 00:16:03,000
was kind of like, oh, I got this
in the bag. 

265
00:16:03,000 --> 00:16:06,240
Like, this is obvious. 
And Vince and I really pushed 

266
00:16:06,240 --> 00:16:08,400
back and actually we had a 
chapter that was supposed to go 

267
00:16:08,400 --> 00:16:09,800
in the book about this. 
We ended up taking it out 

268
00:16:09,800 --> 00:16:12,080
because we couldn't actually. 
It was the one thing we haven't 

269
00:16:12,080 --> 00:16:14,600
been able to align enough on 
that we could even write about 

270
00:16:14,600 --> 00:16:17,520
it and describe our 
disagreements in a way that we 

271
00:16:17,520 --> 00:16:21,480
could all even agree on. 
But yeah, so Walter had some 

272
00:16:21,480 --> 00:16:25,880
philosophical assumptions that 
to me seem to violate kind of 

273
00:16:25,880 --> 00:16:29,920
the way I think about agency in 
a neuroscience context. 

274
00:16:30,200 --> 00:16:33,560
And Vince had kind of his own 
way of thinking about the data 

275
00:16:33,800 --> 00:16:35,520
adapted to me. 
It's an example, probably 

276
00:16:35,520 --> 00:16:39,040
because Walter often wins. 
That's not true in everything. 

277
00:16:39,040 --> 00:16:41,960
But when it comes to ethics, of 
course, we often defer to him 

278
00:16:42,640 --> 00:16:44,680
since he's the specialist. 
So it's one that sticks out to 

279
00:16:44,680 --> 00:16:47,600
my mind because it's the one 
where he, I think he came out of

280
00:16:47,600 --> 00:16:49,400
it being like, maybe I don't 
have it in the bag, or at least 

281
00:16:49,400 --> 00:16:50,960
I want to tell philosophers I 
have it in the bag. 

282
00:16:50,960 --> 00:16:52,800
But it's still up for debate in 
any case. 

283
00:16:53,120 --> 00:16:56,760
Yeah, it's like this old 
metaphor of these blind men 

284
00:16:56,760 --> 00:17:00,160
walking on the road who, like,
stumble across something and are not

285
00:17:00,360 --> 00:17:03,200
by themselves able to make sense 
of it. 

286
00:17:03,480 --> 00:17:06,920
One thinks it's a tree, one thinks 
it's a rope, and then a third 

287
00:17:06,920 --> 00:17:09,359
screams that he thinks it's a snake.
And they realize, like, actually 

288
00:17:09,359 --> 00:17:11,920
it's an elephant. 
And I feel like that experience 

289
00:17:11,920 --> 00:17:13,960
is in some ways what you're 
talking about now. 

290
00:17:14,440 --> 00:17:17,680
And I had a similar experience 
recently doing some form of work

291
00:17:17,680 --> 00:17:22,440
around synthetic users where I 
felt like I needed to align with

292
00:17:22,440 --> 00:17:25,680
someone who had a data science 
background and with someone more

293
00:17:25,680 --> 00:17:29,080
of a traditional user research
background in order to have a 

294
00:17:29,080 --> 00:17:32,880
good opinion and shared 
knowledge around actually what 

295
00:17:32,880 --> 00:17:36,040
to think about this. 
Because just one perspective

296
00:17:36,040 --> 00:17:39,000
doesn't really capture it all. 
And it's like there's so many 

297
00:17:39,000 --> 00:17:42,920
overlapping things anyway, both 
in general with AI and uses of 

298
00:17:42,920 --> 00:17:45,280
AI, but especially also like 
moral AI. 

299
00:17:45,520 --> 00:17:48,080
It's certainly true as well. 
Yeah. 

300
00:17:48,080 --> 00:17:50,240
And I don't know if you've had 
this experience, but we've been 

301
00:17:50,240 --> 00:17:53,360
working together since 2016, so 
quite a while now, which I'm 

302
00:17:53,360 --> 00:17:56,040
very grateful for. 
But it's really made me 

303
00:17:56,040 --> 00:17:59,120
appreciate what some people call
soft skills, which I don't like 

304
00:17:59,120 --> 00:18:02,160
the name of because it makes it 
sound like they're lame when in 

305
00:18:02,160 --> 00:18:03,240
fact, I think they're the crux 
of it. 

306
00:18:03,240 --> 00:18:05,320
But it's really made me 
appreciate how important those 

307
00:18:05,320 --> 00:18:09,400
are for getting this work done. 
They, I'm so lucky to say that 

308
00:18:09,400 --> 00:18:11,240
they're just wonderful people 
too. 

309
00:18:11,760 --> 00:18:16,240
But it takes each party has to 
be completely open and receptive

310
00:18:16,360 --> 00:18:18,760
to what the others are saying 
and really listen and be willing

311
00:18:18,760 --> 00:18:23,280
to be wrong and be willing to 
reevaluate and make new 

312
00:18:23,280 --> 00:18:25,400
opinions. 
And also, if you don't 

313
00:18:25,400 --> 00:18:28,160
understand what the other is 
saying, be able to communicate 

314
00:18:28,160 --> 00:18:30,600
what you don't understand. 
And then each of you have to get

315
00:18:30,600 --> 00:18:33,080
used to describing it in a way 
that you never would describe to

316
00:18:33,080 --> 00:18:36,160
people in your own field. 
So I've really come to 

317
00:18:36,160 --> 00:18:38,560
appreciate those skills. 
That's part of what I brought to

318
00:18:38,560 --> 00:18:40,680
kind of my data science 
education too, is that I 

319
00:18:40,680 --> 00:18:43,560
almost think that this is the 
most important thing of doing 

320
00:18:43,560 --> 00:18:45,200
any of this kind of work. 
But I don't know if you've had 

321
00:18:45,200 --> 00:18:47,360
that same experience, but for 
me, it's really made me 

322
00:18:47,360 --> 00:18:49,280
appreciate those skills. 
Yeah. 

323
00:18:49,320 --> 00:18:52,880
And I would say, I don't know if
my bet is that a behavioral 

324
00:18:52,880 --> 00:18:55,840
scientist is more used to that 
because like, I feel like you're

325
00:18:55,840 --> 00:18:58,680
so humbled by the fact that 
there's so many of the kind of 

326
00:18:58,680 --> 00:19:02,920
disciplines, I guess the classic
Sapolsky idea around like 

327
00:19:03,040 --> 00:19:05,760
different buckets, whether you 
explain a behavior through a 

328
00:19:05,760 --> 00:19:09,520
neuroscience bucket or a 
traditional behavioral 

329
00:19:09,520 --> 00:19:13,160
psychology or God forbid, 
evolutionary psychology, like 

330
00:19:13,360 --> 00:19:15,720
there's so many ways to explain 
why behavior happened. 

331
00:19:16,120 --> 00:19:19,400
And I think, I don't know, I 
feel like as a behavioral 

332
00:19:19,400 --> 00:19:23,120
scientist, I feel quite used to 
being humbled all the time by 

333
00:19:23,120 --> 00:19:25,000
like, you know, talking to 
someone. 

334
00:19:25,000 --> 00:19:28,840
Like, I saw it from one 
perspective, but that 

335
00:19:28,840 --> 00:19:31,120
illuminated from a different 
perspective. 

336
00:19:31,120 --> 00:19:32,840
And that kind of completes the 
picture in some ways. 

337
00:19:32,840 --> 00:19:34,360
I don't know, Aline. 
What do you think? 

338
00:19:34,800 --> 00:19:38,960
Yeah, just totally agree. 
I mean, I would say I'm not 

339
00:19:38,960 --> 00:19:42,760
super good at admitting when I'm
wrong, maybe with that caveat. 

340
00:19:43,720 --> 00:19:46,120
But I do like to think of things
from different perspectives. 

341
00:19:47,000 --> 00:19:49,320
And I love this idea that you 
can take all these different 

342
00:19:49,320 --> 00:19:52,360
approaches and not just like 
smoosh them together, but have a

343
00:19:52,480 --> 00:19:57,200
much more informed debate than 
if you were to just try and like

344
00:19:57,200 --> 00:19:59,680
work through it with your own 
really like pretty narrow 

345
00:19:59,680 --> 00:20:02,960
perspective. 
Yeah, I think a lot of kind of 

346
00:20:02,960 --> 00:20:05,920
interdisciplinary work ends up 
being people still staying in 

347
00:20:05,920 --> 00:20:08,840
their own camp and being like, 
I'll do my camp plus your camp, 

348
00:20:09,240 --> 00:20:10,880
and then, you know, we'll put it
together at the end. 

349
00:20:10,880 --> 00:20:15,160
It's very different than 
actually truly integrating the 

350
00:20:15,160 --> 00:20:17,440
perspectives. 
Also, I just want to say, I 

351
00:20:17,440 --> 00:20:20,600
think it says something about 
both of you that you think that 

352
00:20:20,880 --> 00:20:23,680
behavioral science is by its 
default humble. 

353
00:20:23,680 --> 00:20:27,400
Because my experience in 
behavioral science is, you know,

354
00:20:27,400 --> 00:20:30,880
one of the best ways to get a 
tenure track position is to put 

355
00:20:30,880 --> 00:20:34,200
your stake in the ground on a 
view for at least 10 years and 

356
00:20:34,600 --> 00:20:38,840
hold on to it for dear life. 
We have the benefit of having 

357
00:20:38,840 --> 00:20:41,640
escaped academia, so we have 
different pressures, but we 

358
00:20:41,640 --> 00:20:44,440
don't have that pressure. 
Fair enough, fair enough. 

359
00:20:45,280 --> 00:20:47,480
Awesome. 
So you have this quote in your 

360
00:20:47,480 --> 00:20:49,000
book. 
It's a Stephen Hawking quote. 

361
00:20:49,080 --> 00:20:50,520
I'm going to read it because I 
love it. 

362
00:20:51,320 --> 00:20:54,800
And so he said our future is a 
race between the growing power 

363
00:20:54,800 --> 00:20:57,760
of technology and the wisdom 
with which we use it. 

364
00:20:58,160 --> 00:21:01,800
And then you add to that, or the
three of you add to that wisdom 

365
00:21:01,800 --> 00:21:05,120
requires being humble and clear 
eyed about the magnitude of harm

366
00:21:05,280 --> 00:21:09,480
AI can create when integrated 
into real, messy human life. 

367
00:21:09,680 --> 00:21:13,320
And that messiness, that's the 
behavioral science, I think. 

368
00:21:13,440 --> 00:21:17,720
So those statements ring very 
true to me, and I think it's 

369
00:21:17,720 --> 00:21:21,200
clearly very important to think 
about how technology is built 

370
00:21:21,200 --> 00:21:23,640
and how it's used in all the 
different ways, right? 

371
00:21:23,640 --> 00:21:25,880
Safety, fairness, privacy, and 
so on. 

372
00:21:26,600 --> 00:21:30,320
But since you literally wrote 
the book on moral AI, I'd love 

373
00:21:30,320 --> 00:21:34,320
to start with that. 
What do you mean by moral AI? 

374
00:21:34,320 --> 00:21:38,000
Can you set the stage for us? 
Yeah, absolutely. 

375
00:21:38,400 --> 00:21:42,040
And we think of moral AI as kind
of a project and our mission, if

376
00:21:42,040 --> 00:21:43,720
you will. 
And I think that has two main 

377
00:21:43,720 --> 00:21:46,000
components. 
And I should start back by 

378
00:21:46,000 --> 00:21:48,120
saying that we are all AI 
enthusiasts. 

379
00:21:48,120 --> 00:21:50,040
In fact, the reason we've been 
working on this for so long is 

380
00:21:50,040 --> 00:21:53,640
because we've been using AI for 
so long and seeing its benefits.

381
00:21:54,040 --> 00:21:58,000
And so the first kind of piece 
of the puzzle is how could you 

382
00:21:58,000 --> 00:22:01,000
build morality into AI? 
In other words, and we can talk 

383
00:22:01,000 --> 00:22:03,040
about that more because for 
some, that sounds like a 

384
00:22:03,040 --> 00:22:06,160
terrifying prospect, but we mean
it in a much more kind of 

385
00:22:06,160 --> 00:22:08,120
practical way. 
How can you make sure AI behaves

386
00:22:08,120 --> 00:22:10,560
in a way that aligns with the 
community's moral values? 

387
00:22:11,080 --> 00:22:14,280
And the reason we think that's 
important is because we have 

388
00:22:14,280 --> 00:22:18,200
seen in our own work and because
we've been thinking about AI for

389
00:22:18,200 --> 00:22:21,440
so long, yes, there are so many 
benefits, but there are so many 

390
00:22:21,440 --> 00:22:23,680
ways, as you're alluding to in 
that quote, that they could 

391
00:22:23,680 --> 00:22:26,440
really harm people. 
So one piece should then be how 

392
00:22:26,440 --> 00:22:28,360
can we actually design the AI 
better? 

393
00:22:28,640 --> 00:22:31,200
But then the second piece, which
is equally important, is how can

394
00:22:31,200 --> 00:22:33,240
we make sure people are 
empowered and know what they 

395
00:22:33,240 --> 00:22:35,800
need and have the tools they 
need to behave ethically and to 

396
00:22:35,800 --> 00:22:38,280
use AI ethically. 
You can't do 1 without the 

397
00:22:38,280 --> 00:22:40,360
other. 
So we work on technology, we 

398
00:22:40,360 --> 00:22:43,760
develop AI systems, but that is 
definitely not the only piece of

399
00:22:43,760 --> 00:22:45,120
the puzzle. 
And so moral AI is kind of 

400
00:22:45,160 --> 00:22:48,320
putting those both together. 
It's not just one, it's not just

401
00:22:48,320 --> 00:22:52,800
the other, it's both pieces. 
So let's dive into each of those

402
00:22:52,800 --> 00:22:53,960
pieces. 
Why don't we start with 

403
00:22:54,040 --> 00:22:57,280
designing AI systems to make 
moral judgments and decisions? 

404
00:22:57,440 --> 00:23:00,040
How the heck would you even go 
about that? 

405
00:23:00,040 --> 00:23:02,040
What are some ways that people 
approach this? 

406
00:23:02,800 --> 00:23:05,040
Yeah, absolutely. 
And why do we need to do that? 

407
00:23:05,040 --> 00:23:07,400
Like, do AIs just do everything 
fine? 

408
00:23:08,160 --> 00:23:10,880
So in many, there are all kinds 
of cases and you could dig them 

409
00:23:10,880 --> 00:23:14,160
up, kind of debate them and talk
about them in other contexts.

410
00:23:14,520 --> 00:23:18,640
AIs are telling people to commit
suicide and sometimes they're 

411
00:23:18,640 --> 00:23:22,280
listening. 
AIs are telling a teenager to 

412
00:23:22,280 --> 00:23:24,280
kill their parents because their
parents are telling them to 

413
00:23:24,280 --> 00:23:28,720
spend less time with the AI. AIs 
are doing things like making 

414
00:23:28,720 --> 00:23:32,160
mistakes when they're in a robot
form and so crushing someone 

415
00:23:32,160 --> 00:23:33,920
because it thought it was a box 
and it was supposed to pick up 

416
00:23:33,920 --> 00:23:37,600
the box. 
AIs are directing missiles. 

417
00:23:37,960 --> 00:23:41,960
So there's a lot of harm 
it can do and a lot of mistakes 

418
00:23:41,960 --> 00:23:43,720
if it's not used in the right 
way. 

419
00:23:44,040 --> 00:23:47,200
So the idea is OK, let's at 
least have the AI do a little 

420
00:23:47,200 --> 00:23:49,640
bit better in knowing how it's 
supposed to behave. 

421
00:23:49,760 --> 00:23:52,840
These cases that you described, 
these are really obvious cases, 

422
00:23:52,840 --> 00:23:54,960
right? 
Where the answer is don't crush 

423
00:23:54,960 --> 00:23:58,440
the human being, right? 
But there are many more gray 

424
00:23:58,440 --> 00:24:01,840
areas in moral decision making. 
I'm also curious how you go 

425
00:24:01,840 --> 00:24:07,480
about designing AI systems to 
not incorporate bias or when 

426
00:24:07,480 --> 00:24:10,000
it's not so obvious what the 
right answer is. 

427
00:24:10,480 --> 00:24:12,560
Yeah, absolutely. 
So let's just acknowledge what 

428
00:24:12,560 --> 00:24:14,400
some of those are and then I'll 
give you some answers. 

429
00:24:14,400 --> 00:24:16,520
And I'm not going to have all 
the answers yet, obviously. No 

430
00:24:16,960 --> 00:24:19,120
one does. 
You know, some of those things, 

431
00:24:19,360 --> 00:24:22,400
part of what makes them not 
obvious is they happen because 

432
00:24:22,480 --> 00:24:26,400
AI is so scalable and so many 
people are using it, and it's in

433
00:24:26,400 --> 00:24:28,800
the kind of a system with all 
these feedback loops and all 

434
00:24:28,800 --> 00:24:30,920
this stuff happens. 
So social media is a perfect 

435
00:24:30,920 --> 00:24:33,840
example, right? 
Many of us love using it for 

436
00:24:33,840 --> 00:24:37,080
what it can offer, but it also 
means that you end up with these

437
00:24:37,240 --> 00:24:39,360
echo chambers. 
And some people would like to 

438
00:24:39,360 --> 00:24:41,320
debate whether those echo 
chambers exist. 

439
00:24:41,320 --> 00:24:44,240
I find it hard to say that with 
a straight face, but you don't 

440
00:24:44,520 --> 00:24:45,960
see that these create echo 
chambers. 

441
00:24:46,120 --> 00:24:47,680
How could they not? 
Exactly. 

442
00:24:47,720 --> 00:24:49,560
They're designed almost to do 
that at this 

443
00:24:49,560 --> 00:24:51,360
point. Literally, that's the 
algorithm. 

444
00:24:52,720 --> 00:24:54,800
In many cases, in many cases, 
right. 

445
00:24:55,200 --> 00:24:58,240
And so when you have echo 
chambers, then it leads to this 

446
00:24:58,240 --> 00:25:00,800
kind of in-group/out-group 
differentiation that actually 

447
00:25:00,800 --> 00:25:03,560
has been shown to correlate with
violence in a very direct way. 

448
00:25:04,000 --> 00:25:06,400
So that's a lot of steps to have
to make. 

449
00:25:06,400 --> 00:25:09,760
And I think if you imagine 
yourself as an AI engineer who's

450
00:25:09,760 --> 00:25:13,840
like build this product, it's 
not part of their training to 

451
00:25:13,840 --> 00:25:15,840
have to think about all those 
things. 

452
00:25:16,560 --> 00:25:18,800
And some people may say that's 
obvious and you should be able 

453
00:25:18,800 --> 00:25:20,440
to foresee it. 
But I think the kind of the main

454
00:25:20,440 --> 00:25:22,920
point that we want to make, and 
part of the reason we said the 

455
00:25:22,920 --> 00:25:25,240
quote that you cited that we 
have to be humble about it is 

456
00:25:25,240 --> 00:25:28,320
that I think it comes back to, 
again, so many people are using 

457
00:25:28,360 --> 00:25:29,840
AI. 
It is so scalable. 

458
00:25:29,840 --> 00:25:32,360
It is in so many places that 
it's really hard to anticipate 

459
00:25:32,360 --> 00:25:35,680
all the impacts. 
And at this point, I think it's 

460
00:25:35,680 --> 00:25:39,040
just both smarter and more 
genuine to say we can't 

461
00:25:39,040 --> 00:25:41,200
anticipate all the impacts, but 
we should kind of assume that 

462
00:25:41,360 --> 00:25:43,000
there are going to be negative 
impacts that we aren't 

463
00:25:43,000 --> 00:25:46,680
anticipating. 
I think it's part of what we 

464
00:25:46,680 --> 00:25:48,840
struggle with a little bit with 
this movement is I think a lot 

465
00:25:48,840 --> 00:25:51,360
of people, especially engineers,
people who are in the AI field 

466
00:25:51,360 --> 00:25:53,840
are like, we're good people. 
You know, we're going to do the 

467
00:25:53,840 --> 00:25:55,160
right thing. 
It's kind of built in, and we make 

468
00:25:55,160 --> 00:25:57,840
mistakes sometimes. 
And I think it's hard to 

469
00:25:57,840 --> 00:26:01,200
acknowledge and say with a straight 
face that the impacts can 

470
00:26:01,200 --> 00:26:03,520
actually be really big. 
And it's not that you intended 

471
00:26:03,520 --> 00:26:05,680
them. 
It's not that, you know, you did 

472
00:26:05,680 --> 00:26:09,240
it on purpose or anything like 
that, but they still happen and 

473
00:26:09,320 --> 00:26:11,760
it's still as a result of 
choices we make. 

474
00:26:11,760 --> 00:26:16,320
So basically, would you say that
to summarize, it's like curbing 

475
00:26:16,320 --> 00:26:21,200
the risk around unintended 
consequences of AI behaving in 

476
00:26:21,200 --> 00:26:24,360
misaligned ways? 
So the way I think of it is that

477
00:26:24,800 --> 00:26:28,600
the goal of moral AI should be 
to maximize the benefits for the

478
00:26:28,600 --> 00:26:33,120
most people, minimize the harms 
to the most people, and 

479
00:26:33,120 --> 00:26:35,080
eliminate the unacceptable 
harms. 

480
00:26:35,080 --> 00:26:37,240
So it's kind of like those three
pieces. 

481
00:26:37,240 --> 00:26:40,280
That's the way I think about it.
So it's not just minimize risk. 

482
00:26:40,280 --> 00:26:42,480
I don't think that quite 
captures the whole thing. 

483
00:26:42,960 --> 00:26:44,240
Yeah. 
Nice. 

484
00:26:44,840 --> 00:26:48,480
OK, so how do you do it? 
And again, there's no 100% 

485
00:26:48,480 --> 00:26:50,920
answers, but I'll tell you how 
we're working on it, how I think

486
00:26:50,920 --> 00:26:53,040
we should work on it, and what I
have a lot of optimism about. 

487
00:26:53,160 --> 00:26:55,400
So there are three main 
approaches you could take. 

488
00:26:55,400 --> 00:26:58,080
One is the one I think people 
are most familiar with, which is

489
00:26:58,080 --> 00:27:00,960
you hoover up as much data as 
you can and you learn as much as

490
00:27:00,960 --> 00:27:02,720
you can from it. 
So we call this the bottom up 

491
00:27:02,720 --> 00:27:04,520
approach. 
So the way this would work if 

492
00:27:04,520 --> 00:27:06,960
you were trying to learn 
morality is you would get as 

493
00:27:06,960 --> 00:27:10,640
many moral decisions or moral 
judgments or descriptions of 

494
00:27:10,640 --> 00:27:12,520
moral judgments or moral 
scenarios as you can. 

495
00:27:13,040 --> 00:27:15,760
And you would feed it into your 
system and hope that the system 

496
00:27:15,760 --> 00:27:17,640
learns what it needs from those 
examples. 

497
00:27:18,360 --> 00:27:22,080
So, you know, the one thing is 
LLMs have shown us that this 

498
00:27:22,080 --> 00:27:25,200
approach works a lot better than
people thought it would kind of 

499
00:27:25,200 --> 00:27:27,640
in general. 
And so when I first 

500
00:27:27,640 --> 00:27:29,480
started, I would have thought 
this was a terrible idea. 

501
00:27:30,320 --> 00:27:32,640
It sounds terrifying. 
It really does. 

502
00:27:33,120 --> 00:27:36,360
It turns out, you know, models 
can learn a lot from a lot of 

503
00:27:36,360 --> 00:27:37,720
data. 
A lot more than I thought that 

504
00:27:37,720 --> 00:27:40,000
they could just learn. 
And there have been some 

505
00:27:40,000 --> 00:27:41,960
examples of this. 
There have been some systems, 

506
00:27:41,960 --> 00:27:45,000
There's one called Delphi out of
University of Washington. 

507
00:27:45,240 --> 00:27:48,520
They've kind of taken a lot of 
data from online and then also 

508
00:27:48,520 --> 00:27:51,240
collected some new ones and kind
of fine-tuned a model. 

509
00:27:51,640 --> 00:27:54,480
And it can do some moral 
judgments a lot better than I 

510
00:27:54,480 --> 00:27:57,160
would have ever expected. 
When it gets into kind of real 

511
00:27:57,160 --> 00:27:59,560
messy stuff, that's when it 
starts performing less well, but

512
00:27:59,560 --> 00:28:02,120
it can do a lot of things and 
make some general statements 

513
00:28:02,120 --> 00:28:04,120
like should you say this online 
or not. 

514
00:28:04,120 --> 00:28:06,480
A lot better than I would have 
expected. 

515
00:28:06,960 --> 00:28:08,200
So that's the bottom up 
approach. 

516
00:28:08,200 --> 00:28:10,840
Now, the problem with that is it
takes a tremendous amount of 

517
00:28:10,840 --> 00:28:15,000
data, of course. 
So that's not very practical and

518
00:28:15,000 --> 00:28:17,520
it's unclear whether you could 
ever get enough data to cover 

519
00:28:17,520 --> 00:28:19,080
all the moral scenarios you 
want. 

520
00:28:19,080 --> 00:28:20,840
And I think that's the other 
important thing to remember, 

521
00:28:20,840 --> 00:28:22,960
right, is that in some, if 
you're just trying to predict 

522
00:28:22,960 --> 00:28:26,160
what kind of movie someone wants
to see, that's a pretty low 

523
00:28:26,160 --> 00:28:28,360
stakes scenario. 
And if you make a bad choice, 

524
00:28:28,960 --> 00:28:31,400
big deal. 
Even if you're talking to an LLM

525
00:28:31,400 --> 00:28:34,880
and it hallucinates, it's 
annoying, but it may not be the 

526
00:28:34,880 --> 00:28:38,000
biggest deal in the world. 
If it makes an immoral decision,

527
00:28:38,840 --> 00:28:41,240
that can be a bigger deal and 
sometimes a really bigger deal. 

528
00:28:41,240 --> 00:28:43,560
So, you know, accuracy matters a
lot more. 

529
00:28:44,480 --> 00:28:47,400
OK, so that's one problem is 
could you ever get enough data 

530
00:28:47,400 --> 00:28:50,840
to actually help it learn 
all the kind of moral behavior 

531
00:28:50,840 --> 00:28:54,520
it needs? 
And is that about also like the 

532
00:28:54,520 --> 00:28:57,720
quality, like in terms of when 
we talk about fine tuning models

533
00:28:57,720 --> 00:29:01,760
or even prompting, often times 
like it's relying on providing 

534
00:29:01,760 --> 00:29:05,080
it with context that is not only
a form of relevant context, but 

535
00:29:05,080 --> 00:29:06,600
like really high quality 
context. 

536
00:29:06,600 --> 00:29:09,800
Like, I guess like the best 
moral examples of sorts. Like, how 

537
00:29:09,800 --> 00:29:13,600
much does that play into this? 
Absolutely, absolutely. 

538
00:29:13,600 --> 00:29:15,760
So that's the biggest problem. 
Well, do you really want that 

539
00:29:15,760 --> 00:29:18,960
data training your moral AI? 
So there's a lot of stuff that 

540
00:29:18,960 --> 00:29:22,440
happens online that people feel 
is violently terrible, right? 

541
00:29:23,520 --> 00:29:26,000
And some of it's created in 
jest, right, just to get clicks.

542
00:29:26,000 --> 00:29:28,360
So it doesn't even mean that 
people actually think it's a 

543
00:29:28,360 --> 00:29:30,880
morally right thing to do. 
And the context, like you said, 

544
00:29:30,880 --> 00:29:33,520
is really important. 
So, you know, if you don't have 

545
00:29:33,520 --> 00:29:36,360
the context for why you're 
making a certain decision, then 

546
00:29:36,360 --> 00:29:38,440
you would predict one thing in 
one case, but do the exact 

547
00:29:38,440 --> 00:29:40,440
opposite in a different case 
with a different context. 

548
00:29:40,440 --> 00:29:42,400
So you have to make sure all of 
that gets in there. 

549
00:29:42,760 --> 00:29:44,200
And then of course, we're 
biased. 

550
00:29:44,840 --> 00:29:47,880
We have our own challenges, but 
then there's also things we 

551
00:29:47,880 --> 00:29:52,560
spent some time actually doing 
some painstaking work trying to 

552
00:29:52,560 --> 00:29:54,400
figure out. 
Well, how reliable are we as 

553
00:29:54,760 --> 00:29:57,680
decision makers anyway? 
It turns out we as humans, we 

554
00:29:57,680 --> 00:29:59,880
change our minds. 
Sometimes it's a great thing, 

555
00:29:59,960 --> 00:30:03,200
but we do change our minds. 
But also sometimes we change our

556
00:30:03,200 --> 00:30:05,560
minds for reasons that don't 
feel so great. 

557
00:30:05,560 --> 00:30:08,200
Like when you're hungry, you 
make different bail decisions 

558
00:30:08,200 --> 00:30:10,040
than when you're not hungry. 
That's a famous study. 

559
00:30:10,040 --> 00:30:11,720
I'm sure you guys know, you 
know, from Israel. 

560
00:30:12,400 --> 00:30:14,600
Or if you're tired, you make 
different moral decisions than 

561
00:30:14,600 --> 00:30:17,480
when you're not tired. 
And those are things that if you

562
00:30:17,480 --> 00:30:20,600
don't have that context, and 
it would be pretty hard to get

563
00:30:20,600 --> 00:30:22,840
that kind of context in 
language, for example, if this 

564
00:30:22,840 --> 00:30:25,600
is a language model that doesn't
get built in there. 

565
00:30:25,720 --> 00:30:27,720
So how does that model deal with
all that? 

566
00:30:27,720 --> 00:30:31,560
And then even when you put all 
that aside, some of our studies 

567
00:30:31,560 --> 00:30:34,280
have shown that if I give you 
the same moral judgement to make

568
00:30:34,280 --> 00:30:37,320
many times over the course of 
many weeks, some of us are 

569
00:30:37,320 --> 00:30:40,800
consistent each time, but many 
of us flip back and forth. 

570
00:30:41,320 --> 00:30:43,880
And there's some ways you could 
predict that, but sometimes it's

571
00:30:43,880 --> 00:30:46,480
because we actually don't know 
and it's a hard decision. 

572
00:30:46,480 --> 00:30:49,240
And those are the ones where we 
need the AI the most and where 

573
00:30:49,240 --> 00:30:52,240
we're the most unreliable. 
So there's like these more 

574
00:30:52,240 --> 00:30:55,840
obvious reasons that the bottom-up
approach is concerning, but 

575
00:30:55,840 --> 00:30:58,440
there's also these deeper 
reasons where it's really 

576
00:30:58,440 --> 00:31:00,800
unclear whether it could ever 
work all on its own. 

577
00:31:01,320 --> 00:31:03,440
Yeah, my favorite version of that 
is, I think it's called, 

578
00:31:03,520 --> 00:31:07,120
It's a form of steps or pyramid 
around cognitive dissonance 

579
00:31:07,120 --> 00:31:12,640
where you have two people who 
are equally, let's say, willing 

580
00:31:12,640 --> 00:31:16,840
to say that it's bad to cheat on
an exam and they're kind of coin

581
00:31:16,840 --> 00:31:21,600
tossed into one is kind of like 
looks over, sees some kind of 

582
00:31:21,840 --> 00:31:24,920
right answer from their neighbor
and decides to write it 

583
00:31:24,920 --> 00:31:27,160
down. 
And then based on that chance 

584
00:31:27,160 --> 00:31:29,840
event, like, that leads to a 
little bit of a pyramid of steps

585
00:31:30,040 --> 00:31:33,080
in each direction where maybe 
the one person like, because 

586
00:31:33,080 --> 00:31:35,560
they didn't cheat and they also 
had a chance to glance, they 

587
00:31:35,880 --> 00:31:38,040
become more and more firmly 
against the idea of like, you 

588
00:31:38,040 --> 00:31:40,800
should never ever cheat because 
it's morally abhorrent. 

589
00:31:41,200 --> 00:31:44,480
Another one because they can 
like decided to cheat a little 

590
00:31:44,480 --> 00:31:46,120
bit. 
Like they just kind of minimize 

591
00:31:46,120 --> 00:31:47,200
the deal. 
Like, you know, everyone cheats 

592
00:31:47,200 --> 00:31:49,560
sometimes we all do it. 
It's not the end of the world. 

593
00:31:49,680 --> 00:31:51,640
And so they become more and more
kind of on the other side. 

594
00:31:51,720 --> 00:31:55,520
And again, it's kind of a chance
event that led two people to just

595
00:31:55,840 --> 00:31:57,600
end up in relatively different 
moral camps. 

596
00:31:57,880 --> 00:31:59,600
Absolutely. 
I could go down this path for a 

597
00:31:59,600 --> 00:32:03,280
long time, but I'll just add one
more, which is that what we say 

598
00:32:03,280 --> 00:32:05,480
is right or wrong and what we 
truly believe is right or wrong 

599
00:32:05,760 --> 00:32:07,360
is often different from what we 
do. 

600
00:32:08,040 --> 00:32:10,640
And so there's this fundamental 
question, well, which one do we 

601
00:32:10,640 --> 00:32:12,880
actually think is right then? 
Is it that what we said was 

602
00:32:12,880 --> 00:32:14,240
right, or was it what we 
actually do? 

603
00:32:14,240 --> 00:32:18,480
Because we're often happier with
what we do or other people did 

604
00:32:18,480 --> 00:32:22,320
the thing that we do. 
Like don't cut in line, but 

605
00:32:22,320 --> 00:32:25,040
then, you know, you see your 
family cut in line so that they 

606
00:32:25,040 --> 00:32:27,240
can get on to the cruise faster 
or something like, oh, way to 

607
00:32:27,240 --> 00:32:28,800
go. 
Everyone has experiences like 

608
00:32:28,800 --> 00:32:30,520
this, so it's kind of an open 
question. 

609
00:32:30,520 --> 00:32:32,920
Which one is it that we actually
think is right or wrong? 

610
00:32:33,320 --> 00:32:34,760
And what is a model supposed to 
do with that? 

611
00:32:34,760 --> 00:32:36,840
Well, yeah. 
And often when we're 

612
00:32:36,880 --> 00:32:40,480
philosophizing about these 
scenarios, we're thinking about 

613
00:32:40,480 --> 00:32:43,800
it in a vacuum, like this one 
singular thing is right or 

614
00:32:43,800 --> 00:32:45,120
wrong. 
But what's the alternative? 

615
00:32:45,120 --> 00:32:47,240
If you take all the different 
iterations of the trolley 

616
00:32:47,240 --> 00:32:50,320
problem and then you try and 
apply that, well, you might say 

617
00:32:50,320 --> 00:32:51,680
that this is wrong and you 
shouldn't do it. 

618
00:32:51,680 --> 00:32:54,320
But if the alternative is a 
million times worse, then 

619
00:32:54,320 --> 00:32:56,440
maybe it's fine. 
Maybe it's fine to do it. 

620
00:32:57,320 --> 00:32:59,000
Or at least the best of two bad 
options. 

621
00:32:59,000 --> 00:33:01,240
The best of two bad. 
Yeah, yeah, maybe fine is not 

622
00:33:01,240 --> 00:33:06,040
the right word. 
Honestly, this is why I was like

623
00:33:06,040 --> 00:33:08,960
so excited to talk to you 
because I feel like it's 

624
00:33:09,600 --> 00:33:11,920
something I've been thinking 
about lately a lot when it comes

625
00:33:11,920 --> 00:33:16,280
to people talk about misaligned 
AI and coming back to that is 

626
00:33:16,280 --> 00:33:19,320
like we as human beings are not 
the most aligned as well. 

627
00:33:19,960 --> 00:33:22,560
And that's like the things we're
talking about now in terms of 

628
00:33:22,560 --> 00:33:26,000
how we differ, how like some of 
random chance events can cause 

629
00:33:26,000 --> 00:33:29,480
us to have very small 
preferences or like you say, 

630
00:33:29,480 --> 00:33:32,280
context or what we say versus what 
we do. 

631
00:33:32,560 --> 00:33:37,240
But also like in general, the 
idea that we think we as humans 

632
00:33:37,240 --> 00:33:39,680
have sort of like core human 
values that like this is 

633
00:33:39,680 --> 00:33:41,080
important, this is what we 
value. 

634
00:33:41,080 --> 00:33:42,360
This is morally correct or 
wrong. 

635
00:33:43,280 --> 00:33:46,160
That is not true. 
Like, we are in, like, constant

636
00:33:46,160 --> 00:33:48,920
moral disagreements. 
And that's how human societies 

637
00:33:49,240 --> 00:33:52,680
are built and changed. 
And sometimes it can be very 

638
00:33:52,680 --> 00:33:56,560
painful, but it's part of what 
we accept as part of human 

639
00:33:56,720 --> 00:33:59,240
societies. 
With AI, on the other hand, I 

640
00:33:59,240 --> 00:34:01,800
don't think we have the same 
acceptance like we have this. 

641
00:34:01,800 --> 00:34:04,840
I think most people that I 
interact with still cling to is 

642
00:34:04,840 --> 00:34:10,639
that we want AI to be unbiased, 
aligned, and basically morally 

643
00:34:10,960 --> 00:34:13,280
on a perfect high ground. 
I think we're working through a 

644
00:34:13,280 --> 00:34:16,480
lot of things as behavioral 
scientists and as just people in

645
00:34:16,480 --> 00:34:19,960
the world trying to reckon with 
this AI era that we're in. 

646
00:34:20,320 --> 00:34:22,880
And one of those is just, are we
just scared of AI because it's 

647
00:34:22,880 --> 00:34:24,080
different? 
So are we holding it to a 

648
00:34:24,080 --> 00:34:27,159
different standard just because 
we're not used to it, or is it 

649
00:34:27,159 --> 00:34:28,840
because it is fundamentally 
different? 

650
00:34:28,840 --> 00:34:33,120
So a lot of people give humans 
leeway because they think they 

651
00:34:33,120 --> 00:34:34,600
believe, 
and it's often true, that the 

652
00:34:34,600 --> 00:34:37,760
humans are trying their best.
They're trying their best. 

653
00:34:37,760 --> 00:34:40,920
They have the right intentions 
and they have your best in mind,

654
00:34:41,400 --> 00:34:43,960
right? 
AI systems, there are multiple 

655
00:34:43,960 --> 00:34:46,159
things. 
First of all, they're designed 

656
00:34:46,159 --> 00:34:50,360
by some organization, whether 
that be an academic organization

657
00:34:50,360 --> 00:34:54,000
or a corporation or whatever. 
And it's hard to trust that it 

658
00:34:54,000 --> 00:34:56,960
was designed with your best 
interest in mind, right? 

659
00:34:57,440 --> 00:35:00,840
It doesn't actually have some 
set objective in it that says 

660
00:35:00,840 --> 00:35:03,280
maximize the best for you or for
society. 

661
00:35:03,280 --> 00:35:05,520
Whereas, you know, at least a 
human can articulate if that's 

662
00:35:05,520 --> 00:35:07,200
what they're trying to do, 
whether it's true or not. 

663
00:35:07,800 --> 00:35:10,920
And so I think that's one big, 
you know, thing that makes us 

664
00:35:11,680 --> 00:35:15,440
perhaps give humans more leeway 
than AIs. Now, whether we should 

665
00:35:15,440 --> 00:35:17,720
or not. 
We can absolutely debate about 

666
00:35:17,720 --> 00:35:21,360
that, but I think that's one of 
the things we kind of have to 

667
00:35:21,360 --> 00:35:24,560
wrestle with and some of the 
things that you can kind of 

668
00:35:24,560 --> 00:35:26,360
think about even from an 
engineering perspective. 

669
00:35:26,440 --> 00:35:30,240
OK, So what if you had some 
mathematical objective, you 

670
00:35:30,240 --> 00:35:33,000
know, that was mathematically 
proven to be on behalf of the 

671
00:35:33,000 --> 00:35:34,800
user or something? 
Would people trust that? 

672
00:35:34,800 --> 00:35:36,360
Is that something we could get 
our minds wrapped around? Or would that 

673
00:35:36,520 --> 00:35:38,600
solve any problems? 
These are some of the questions 

674
00:35:38,600 --> 00:35:40,080
I think we all have to wrestle 
with. 

675
00:35:40,880 --> 00:35:43,200
And I also want to say you said 
something that I would want 

676
00:35:43,200 --> 00:35:44,560
to push back on just a little 
bit. 

677
00:35:45,160 --> 00:35:48,480
I probably over interpreted it, 
but you're saying something 

678
00:35:48,480 --> 00:35:51,440
like, it seems like we have all 
these values, but we don't. 

679
00:35:52,080 --> 00:35:54,160
And I just want to say we do 
have the values. 

680
00:35:54,240 --> 00:35:57,600
We're just messy creatures. 
And, you know, real life is 

681
00:35:57,600 --> 00:36:00,240
really confusing and messy and 
conflicting. 

682
00:36:00,560 --> 00:36:04,400
And so we don't always act in a 
consistent way. 

683
00:36:04,400 --> 00:36:06,400
We don't always know exactly 
what those values are, the 

684
00:36:06,400 --> 00:36:08,480
values conflict. 
And so maybe it's just not even 

685
00:36:08,480 --> 00:36:10,160
clear what the resolution should
be. 

686
00:36:10,200 --> 00:36:11,160
Yeah. 
No, I think it's fair. 

687
00:36:11,200 --> 00:36:14,440
And I think it was more that we 
maybe lack consistent, like 

688
00:36:14,440 --> 00:36:16,960
universal values. 
That's what I was coming 

689
00:36:16,960 --> 00:36:19,440
towards. 
Like it's more that we have 

690
00:36:19,440 --> 00:36:23,680
certain ideas around, obviously 
religions, there are certain 

691
00:36:23,720 --> 00:36:26,800
values within that is like 
written down as 10 commandments 

692
00:36:26,800 --> 00:36:29,080
or there are certain things that
what makes for a moral good. 

693
00:36:29,720 --> 00:36:33,040
But at the same time, that's 
like talking about philosophy, 

694
00:36:33,040 --> 00:36:36,680
like one of the most debated 
things, like what are the virtues

695
00:36:36,680 --> 00:36:38,840
that we should strive for? 
What are the vices? 

696
00:36:38,840 --> 00:36:41,160
And I think that changes. 
That was what I was 

697
00:36:41,160 --> 00:36:43,320
trying to say. 
That's usually people's first 

698
00:36:43,320 --> 00:36:47,080
gut reaction to how could you 
possibly build morality into AI 

699
00:36:47,080 --> 00:36:48,920
is that, but there's no 
universal morality. 

700
00:36:48,920 --> 00:36:51,840
So how do you go about that? 
So I think we can actually, from

701
00:36:51,840 --> 00:36:56,160
an engineering perspective, not 
settle that, but manage that. 

702
00:36:56,400 --> 00:36:59,080
It kind of sounds like what 
you're describing is the top 

703
00:36:59,080 --> 00:37:02,920
down approach for designing 
morality into AI systems. 

704
00:37:02,920 --> 00:37:05,880
Is that what you would say when 
you say, like, we have this set 

705
00:37:05,880 --> 00:37:11,000
of core moral beliefs and we can
say, OK, now apply this to the 

706
00:37:11,000 --> 00:37:14,000
world? 
The top down approach should be 

707
00:37:14,000 --> 00:37:17,400
let's take these principles. 
We all have some principles that

708
00:37:17,400 --> 00:37:21,080
we think we follow and let's 
build those into the AI and give

709
00:37:21,080 --> 00:37:27,520
it as rules or guidelines. 
And this has also been tried and 

710
00:37:27,520 --> 00:37:29,440
shown to have quite a lot of 
success. 

711
00:37:29,440 --> 00:37:32,440
In some ways, Anthropic uses 
something like this right now. 

712
00:37:32,440 --> 00:37:35,840
They call it Constitutional AI, 
and they came up with these kind

713
00:37:35,840 --> 00:37:37,920
of ad hoc rules. 
They're not totally ad hoc. 

714
00:37:37,920 --> 00:37:40,600
They came up with some from 
like, you know, human rights 

715
00:37:40,720 --> 00:37:43,240
documents, some from their own 
research and a bunch of things. 

716
00:37:43,240 --> 00:37:47,480
And they put them together and 
they said, OK, now AI, here's 

717
00:37:47,480 --> 00:37:49,000
our description of what these 
things are. 

718
00:37:49,240 --> 00:37:52,120
Now train yourself to make sure 
you abide by them. 

719
00:37:52,120 --> 00:37:55,840
And it actually led to much 
better behavior than some other 

720
00:37:55,840 --> 00:37:57,600
approaches. 
As far as I understand, they're 

721
00:37:57,600 --> 00:38:01,560
still using it, so that can 
definitely make a contribution. 

722
00:38:01,560 --> 00:38:04,800
But the challenges are, Samuel, 
as you suggested, well, what if 

723
00:38:04,800 --> 00:38:07,400
people don't agree? 
First of all, we can talk about 

724
00:38:07,400 --> 00:38:09,240
that too. 
But second of all, there are 

725
00:38:09,240 --> 00:38:12,040
still philosophers. 
And the reason why is because 

726
00:38:12,040 --> 00:38:14,560
there's no one theory or set of 
principles that seems to account

727
00:38:14,560 --> 00:38:16,800
for everything, you know? 
So there's still debates about, 

728
00:38:16,800 --> 00:38:18,040
well, what should the right 
theory be? 

729
00:38:18,040 --> 00:38:20,120
And there's no one theory that 
can actually account for all the

730
00:38:20,120 --> 00:38:21,680
situations. 
So we're still in this problem 

731
00:38:22,120 --> 00:38:24,680
where you can't handle all the 
scenarios. 

732
00:38:25,400 --> 00:38:27,520
And the other thing is that 
usually these rules, one thing 

733
00:38:27,520 --> 00:38:30,880
they really don't do, and that 
really characterizes real life, 

734
00:38:30,920 --> 00:38:32,880
is that values and principles 
conflict. 

735
00:38:33,200 --> 00:38:36,720
And often many of them conflict.
And usually we have no idea what

736
00:38:36,720 --> 00:38:38,880
principle to say about, OK, when 
they conflict, 

737
00:38:38,880 --> 00:38:40,240
here's how you should resolve 
them. 

738
00:38:41,000 --> 00:38:44,360
All right, so it sounds like 
there are some pitfalls of a 

739
00:38:44,360 --> 00:38:47,840
bottom up approach and there are
also some pitfalls for a top 

740
00:38:47,880 --> 00:38:50,400
down approach. 
What do you do then in this 

741
00:38:50,400 --> 00:38:52,120
situation? 
Nothing is perfect. 

742
00:38:52,480 --> 00:38:54,520
Nothing's perfect, and our 
approach isn't perfect either, 

743
00:38:54,520 --> 00:38:57,680
but our approach is to use what 
we call a hybrid approach. 

744
00:38:57,920 --> 00:39:01,200
So try to take the best of both 
worlds of using both top down 

745
00:39:01,200 --> 00:39:04,000
and bottom up. 
So the idea behind our first 

746
00:39:04,000 --> 00:39:06,160
generation is first we should 
actually say we have a 

747
00:39:06,160 --> 00:39:09,320
commitment to something that 
many don't, which is that we 

748
00:39:09,320 --> 00:39:12,280
think if you're going to be 
behaving in a moral domain, you 

749
00:39:12,280 --> 00:39:14,360
should be interpretable. 
In other words, we should know 

750
00:39:14,360 --> 00:39:18,520
how a model works and be able to
make some real predictions about

751
00:39:18,520 --> 00:39:20,280
how it's going to behave in 
different settings rather than 

752
00:39:20,280 --> 00:39:23,440
just hope for the best. 
And I'm not saying that all 

753
00:39:23,440 --> 00:39:25,680
black box models, or models 
where you can't understand 

754
00:39:25,680 --> 00:39:29,720
what's going on are bad, but we 
are committed to having not 

755
00:39:29,720 --> 00:39:32,680
black box models because we 
think that's important for long 

756
00:39:32,680 --> 00:39:36,320
standing moral AI. 
In all cases, do you think? 

757
00:39:36,320 --> 00:39:38,560
Or are there some cases where it
doesn't matter so much? 

758
00:39:39,800 --> 00:39:42,760
I'm sure there are some cases 
where it matters less, but I 

759
00:39:42,760 --> 00:39:45,480
think that we should. 
I'm not saying that we shouldn't

760
00:39:45,480 --> 00:39:48,880
have products that have Gen 
AI and black box algorithms in 

761
00:39:48,880 --> 00:39:52,680
them, but I think that we should
be very careful about using them

762
00:39:52,680 --> 00:39:54,640
in a moral domain, especially 
when you know you're affecting 

763
00:39:54,640 --> 00:39:56,680
life or death or could. 
Then we think that should be 

764
00:39:56,680 --> 00:39:58,000
essential. 
And so we should be putting our 

765
00:39:58,000 --> 00:40:01,520
energies into that. 
So there's some trade-offs with 

766
00:40:01,520 --> 00:40:03,120
doing that based on our current 
technology. 

767
00:40:03,120 --> 00:40:05,960
And so our trade off is that, OK,
in order to make it 

768
00:40:05,960 --> 00:40:09,440
interpretable rather than make a
general AI, we need to start 

769
00:40:09,440 --> 00:40:11,080
with kind of an individual use 
case. 

770
00:40:11,440 --> 00:40:14,320
So let's start by figuring out 
how we can make this work in one

771
00:40:14,320 --> 00:40:17,040
kind of setting, see what we can
learn, and then we can build out.

772
00:40:17,240 --> 00:40:19,280
So the use case that we've 
worked with 

773
00:40:19,640 --> 00:40:23,560
is kidney allocation. 
So there are way fewer organs to 

774
00:40:23,560 --> 00:40:25,040
go around than people who need 
them. 

775
00:40:25,480 --> 00:40:28,600
And so there are kind of different
ways you can get kidneys. 

776
00:40:28,720 --> 00:40:31,520
We'll focus on kidneys, but it's
also true for many other organs.

777
00:40:31,840 --> 00:40:36,080
One is someone could die and you 
could get the kidney before the 

778
00:40:36,080 --> 00:40:38,200
kidney becomes unviable. 
It could be transplanted into 

779
00:40:38,200 --> 00:40:39,880
you. 
But we also all have two 

780
00:40:39,880 --> 00:40:41,280
kidneys. 
And so you can actually donate 

781
00:40:41,280 --> 00:40:44,960
kidneys to people. 
And it turns out you can't just 

782
00:40:44,960 --> 00:40:46,640
donate a kidney to anyone you 
want. 

783
00:40:47,120 --> 00:40:48,960
They have to be compatible. 
Your blood types have to be 

784
00:40:48,960 --> 00:40:51,840
compatible, your immune panels 
have to be compatible. 

785
00:40:51,840 --> 00:40:53,360
There's kind of a bunch of 
different things. 

786
00:40:53,800 --> 00:40:56,200
So it's actually one of the 
great success stories of AI. 

787
00:40:56,200 --> 00:40:58,560
That's why we chose it first of 
all, because it seemed like if 

788
00:40:58,560 --> 00:41:01,080
we made any impact, it would 
only do good, or it seems like 

789
00:41:01,080 --> 00:41:03,240
the harm would be dramatically 
reduced. 

790
00:41:03,520 --> 00:41:05,880
And we already know it's an AI 
success story, so that's why we 

791
00:41:05,880 --> 00:41:09,720
used it to start with. 
But so AI has been used to try 

792
00:41:09,720 --> 00:41:11,800
to optimize what they call 
kidney exchanges. 

793
00:41:11,800 --> 00:41:16,680
It's like, OK, if my kidney 
isn't compatible with my son's, 

794
00:41:16,960 --> 00:41:20,640
but my kidney is compatible with
my neighbor's and my neighbor's 

795
00:41:20,960 --> 00:41:23,880
mom has a kidney that's 
compatible with my son's. 

796
00:41:23,880 --> 00:41:25,400
I know that already makes your
brain hurt. 

797
00:41:26,760 --> 00:41:30,680
That's just a two way exchange. 
There have been like 12 way 

798
00:41:30,680 --> 00:41:33,880
exchanges before. 
And so you can see already, if 

799
00:41:33,880 --> 00:41:36,640
it makes your brain hurt, 
that's why AI can be very useful

800
00:41:36,640 --> 00:41:38,400
here. 
Essentially what you're saying 

801
00:41:38,400 --> 00:41:41,960
is that you can donate a kidney,
and while your exact kidney 

802
00:41:41,960 --> 00:41:45,160
might not make it to your person
of interest, they will get a 

803
00:41:45,160 --> 00:41:48,160
kidney from someone else if you 
donate your kidney. 

804
00:41:48,760 --> 00:41:50,800
That's right. 
And it might take a bunch of 

805
00:41:50,800 --> 00:41:54,880
swaps in order to get that to 
work, but eventually someone 

806
00:41:54,880 --> 00:41:56,800
will get it to you. 
But you can see how that 

807
00:41:56,800 --> 00:41:58,600
involves a lot of trust too, 
right? 

808
00:41:58,600 --> 00:42:00,960
It's like, well, if I'm going to
give you my one and only kidney,

809
00:42:00,960 --> 00:42:02,640
I better know that a kidney is 
getting. 

810
00:42:02,640 --> 00:42:04,400
Probably you need some 
blockchain in there. 

811
00:42:05,120 --> 00:42:08,880
Yeah, something like that. 
So in our medical system in the 

812
00:42:08,880 --> 00:42:10,400
US, it's actually different in 
Europe. 

813
00:42:10,480 --> 00:42:13,360
But in the US, there are only 
specific things that are allowed

814
00:42:13,360 --> 00:42:15,440
to be taken into account in 
those algorithms. 

815
00:42:15,440 --> 00:42:19,640
And it's things like your age, 
how long you've been on the wait

816
00:42:19,640 --> 00:42:23,960
list and how compatible you are,
things like that. 

817
00:42:24,440 --> 00:42:26,600
Well, it turns out that if you 
ask people. 

818
00:42:26,600 --> 00:42:28,760
So this is going to be our 
bottom up, OK, bottom up 

819
00:42:28,760 --> 00:42:30,600
approach. 
If you ask people, are there 

820
00:42:30,600 --> 00:42:33,040
other factors, moral factors 
that you think they should be 

821
00:42:33,040 --> 00:42:35,600
taken into account? 
People say, yeah, a lot of them.

822
00:42:36,120 --> 00:42:39,120
So, for example, most people 
think that, and you might not 

823
00:42:39,120 --> 00:42:42,360
agree and that's fine, but most 
people think that how many 

824
00:42:42,360 --> 00:42:46,200
dependents a person has should 
impact whether they get that 

825
00:42:46,200 --> 00:42:48,040
kidney or not. 
So if they're especially a 

826
00:42:48,040 --> 00:42:50,800
single parent and they have 
three small kids, that should 

827
00:42:50,800 --> 00:42:53,000
give them kind of a bump up in 
the priority list. 

828
00:42:53,920 --> 00:42:55,960
Perhaps also for elderly 
patients. 

829
00:42:56,200 --> 00:42:59,960
Other things that matter are 
whether you did something in 

830
00:42:59,960 --> 00:43:02,160
your past that might have 
contributed to having a... So were 

831
00:43:02,160 --> 00:43:04,080
you an alcoholic? 
Are you drinking 

832
00:43:04,080 --> 00:43:06,480
now? 
Some people think that whether 

833
00:43:06,480 --> 00:43:08,520
you are a criminal or a violent 
criminal matters. 

834
00:43:09,000 --> 00:43:11,480
And back in the day, in the 50s 
or whatever it was, there was 

835
00:43:11,480 --> 00:43:16,040
actually a God panel. 
It was like 12 or 13 community 

836
00:43:16,240 --> 00:43:18,840
members who got together and 
decided who would get dialysis. 

837
00:43:18,840 --> 00:43:20,880
And these were some of the 
things that if you look at the 

838
00:43:20,880 --> 00:43:22,320
transcripts, they actually 
considered. 

839
00:43:23,160 --> 00:43:25,960
But this has been taken out of 
the algorithms. 

840
00:43:26,240 --> 00:43:30,480
But it turns out most people in 
all of our samples think that 

841
00:43:30,680 --> 00:43:32,160
there's some moral 
considerations that should be 

842
00:43:32,160 --> 00:43:34,880
put back in. 
So all right, so the bottom up 

843
00:43:34,880 --> 00:43:37,120
approach is, OK, let's ask 
people what do they think is 

844
00:43:37,120 --> 00:43:41,480
important in these situations? 
So not yet what's right, but 

845
00:43:42,280 --> 00:43:44,680
just what do you think are some 
of the factors you should take 

846
00:43:44,680 --> 00:43:47,440
into account? 
And so then what we can do then 

847
00:43:47,440 --> 00:43:51,360
is then ask them create models. 
This is a more traditional AI 

848
00:43:51,360 --> 00:43:53,560
approach. 
One of the assumptions, I'm 

849
00:43:53,560 --> 00:43:55,640
curious what you guys think 
about this, was there's kind of 

850
00:43:55,640 --> 00:43:57,960
an assumption in the economics 
and computer science literature 

851
00:43:57,960 --> 00:44:00,440
that if you ask people what they
think, they won't be able to 

852
00:44:00,440 --> 00:44:02,880
tell you the truth, that the 
only way you can find out what 

853
00:44:02,880 --> 00:44:05,120
they really think is by having 
them make a bunch of choices. 

854
00:44:05,480 --> 00:44:07,720
And they kind of reveal their 
preferences. 

855
00:44:08,000 --> 00:44:11,120
They're making a bunch of 
choices, but that's the only way

856
00:44:11,120 --> 00:44:13,120
you'll actually understand 
what's, you know, their actual 

857
00:44:13,120 --> 00:44:16,600
decision making. 
The stated versus elicited or 

858
00:44:16,600 --> 00:44:19,280
revealed preferences. 
Yeah, exactly. 

859
00:44:19,640 --> 00:44:21,120
Curious, what do you guys 
think about that? 

860
00:44:21,440 --> 00:44:24,120
Yeah, no, I think that is 
probably something I would quite

861
00:44:24,120 --> 00:44:26,920
strongly agree with in many 
ways, if you compare it to the 

862
00:44:26,920 --> 00:44:29,600
stated version. 
I would much rather, especially 

863
00:44:29,600 --> 00:44:33,520
for these moral decisions for 
them to be informed by people's 

864
00:44:34,000 --> 00:44:38,560
actual willingness to act or 
something or like they're, you 

865
00:44:38,560 --> 00:44:40,720
know, in more traditional like 
business. 

866
00:44:40,720 --> 00:44:42,520
I think this is just kind of 
like, are you willing to pay for

867
00:44:42,520 --> 00:44:44,040
it? 
Basically, it's great that you 

868
00:44:44,040 --> 00:44:46,560
like the idea or it's great that you
like the feature, but are you 

869
00:44:46,560 --> 00:44:48,240
willing to pay for that feature?
Are you willing to pay for that 

870
00:44:48,240 --> 00:44:52,960
product and so on. 
And I do think that, yeah, based

871
00:44:52,960 --> 00:44:56,760
on my experience, that would be 
something I would weigh much 

872
00:44:56,760 --> 00:44:58,600
heavier than their stated 
preferences. 

873
00:44:59,280 --> 00:45:01,400
Yeah. 
I will only add one thing, which

874
00:45:01,400 --> 00:45:03,760
is that we had a really 
interesting conversation about 

875
00:45:03,760 --> 00:45:06,840
this with Carey Morewedge on a 
past episode when we talked 

876
00:45:06,840 --> 00:45:10,520
about recommender systems. 
And this is like low stakes in 

877
00:45:10,520 --> 00:45:12,880
most cases. 
You know, this is more along the

878
00:45:12,880 --> 00:45:17,040
lines of the movie example, but 
I think we kind of came to a 

879
00:45:17,040 --> 00:45:21,640
both is better approach where 
you do have sort of that combo. 

880
00:45:22,960 --> 00:45:24,320
Interesting. 
You're foreshadowing. 

881
00:45:24,320 --> 00:45:28,280
So now I'll set the stage for 
the kind of just revealed 

882
00:45:28,280 --> 00:45:32,520
preference, revealed preferences 
approach is our generation 1 and

883
00:45:32,520 --> 00:45:34,680
the combination approach is our 
generation 2. 

884
00:45:35,240 --> 00:45:38,800
So generation one, the bottom up
is this combination of what 

885
00:45:38,800 --> 00:45:40,800
bottom up and top down. 
So we ask them what features 

886
00:45:40,800 --> 00:45:42,800
they think are important, but we
assume that they're not going to

887
00:45:42,800 --> 00:45:45,680
be able to tell us exactly how 
they think it's important or how

888
00:45:45,680 --> 00:45:46,760
they would actually make 
judgments. 

889
00:45:46,760 --> 00:45:49,760
So then we have them make a 
bunch of choices where we have 

890
00:45:50,000 --> 00:45:52,440
two people who could potentially
get a kidney. 

891
00:45:52,440 --> 00:45:55,680
There's a kidney available. 
Two people. Here are two people, 

892
00:45:55,680 --> 00:45:57,320
which one should get it? 
And we give them a bunch of 

893
00:45:57,320 --> 00:45:59,600
features, and these are all 
features that people said they 

894
00:45:59,600 --> 00:46:02,840
thought were important. 
And we kind of vary the values 

895
00:46:02,840 --> 00:46:04,560
of those features and find out 
what they think. 

896
00:46:04,560 --> 00:46:06,640
And then we do a bunch of kind 
of computer science stuff in the

897
00:46:06,640 --> 00:46:10,320
back end of how do we ask the 
most optimal queries so that we 

898
00:46:10,320 --> 00:46:13,000
can get your preferences as 
quickly and efficiently as 

899
00:46:13,000 --> 00:46:15,000
possible. 
You know, how do we prioritize 

900
00:46:15,000 --> 00:46:17,040
the ones that are going to have 
the biggest impact and kind of 

901
00:46:17,040 --> 00:46:19,440
all that kind of stuff. 
I'm not actually positive that 

902
00:46:19,720 --> 00:46:22,360
SAT uses what's called active 
learning. 

903
00:46:22,360 --> 00:46:24,760
That's something we work with on
the back end, but that's what I 

904
00:46:24,760 --> 00:46:28,400
remember it feeling like. 
It's like it adapts dynamically 

905
00:46:28,400 --> 00:46:30,040
based on your previous 
decisions. 

906
00:46:30,280 --> 00:46:34,600
Yeah, maybe the SAT does, but 
certainly something like Khan 

907
00:46:34,600 --> 00:46:39,160
Academy or other educational 
tools that are very much driven 

908
00:46:39,160 --> 00:46:44,440
by machine learning. 
And how conventional were these 

909
00:46:44,440 --> 00:46:46,200
options? 
Like were they mostly like the 

910
00:46:46,200 --> 00:46:50,120
more conventional options 
between we talked about the, the

911
00:46:50,120 --> 00:46:52,080
number of dependents versus not 
and so on? 

912
00:46:52,360 --> 00:46:55,600
Or did you include things like 
having 2 cats versus one dog or 

913
00:46:55,600 --> 00:46:57,640
like how? 
Yeah, yeah. 

914
00:46:58,240 --> 00:46:59,640
I don't know if you've talked 
about the Moral Machine. 

915
00:46:59,640 --> 00:47:01,640
Have you talked about the Moral 
Machine on your podcast yet? 

916
00:47:01,800 --> 00:47:06,120
No, we have not. 
So there's a group at MIT, or 

917
00:47:06,240 --> 00:47:08,840
they were at MIT originally, and
some of these scenarios called 

918
00:47:08,840 --> 00:47:10,880
the trolley scenarios that are 
famous in philosophy. 

919
00:47:10,880 --> 00:47:13,720
And the most quintessential one is
there's five people tied to the 

920
00:47:13,720 --> 00:47:16,280
track and there's a trolley 
coming running down the track. 

921
00:47:16,520 --> 00:47:18,440
And if you do nothing, those 
five people are going to get run

922
00:47:18,440 --> 00:47:19,720
over. 
But, and there are different 

923
00:47:19,720 --> 00:47:21,640
versions of this. 
There's a really large man with 

924
00:47:21,640 --> 00:47:23,800
a backpack because you're not 
allowed to say that they're fat.

925
00:47:23,800 --> 00:47:27,480
Originally it was a fat man in 
front of you, but the really 

926
00:47:27,480 --> 00:47:29,840
large man with the backpack 
right in front of you, he's 

927
00:47:29,840 --> 00:47:31,880
bigger than you. 
So if you just jumped in front 

928
00:47:31,880 --> 00:47:33,800
of the tracks, you wouldn't be 
able to stop the trolley. 

929
00:47:33,800 --> 00:47:36,280
But if you push this fat man 
onto the tracks before the 

930
00:47:36,280 --> 00:47:38,920
trolley runs over the five 
people, the trolley will stop. 

931
00:47:39,360 --> 00:47:41,560
And so that one person would 
die, but the five people would 

932
00:47:41,560 --> 00:47:43,200
live. 
And so there's like a lot of 

933
00:47:43,200 --> 00:47:46,000
different versions of this from 
the philosophical literature, 

934
00:47:46,360 --> 00:47:48,240
and a lot of the behavioral 
ethics that's been done have 

935
00:47:48,240 --> 00:47:50,680
been using the scenarios. 
So a group from MIT did this 

936
00:47:50,680 --> 00:47:54,320
with cars because this is 
actually something, these types 

937
00:47:54,320 --> 00:47:56,800
of decisions are things that 
autonomous vehicles actually 

938
00:47:56,800 --> 00:47:59,520
have to decide on. 
So they had these kind of 

939
00:47:59,520 --> 00:48:02,600
scenarios and they were asking 
people across the world decide, 

940
00:48:02,600 --> 00:48:05,640
OK, you're in this one car and 
you have three people in the 

941
00:48:05,640 --> 00:48:09,760
car, like two kids, an adult and
a cat walks across the street 

942
00:48:10,120 --> 00:48:12,480
with a priest. 
So should you run over the cat, 

943
00:48:12,480 --> 00:48:14,760
the priest, or should you jam on
your brakes, you know, 

944
00:48:14,760 --> 00:48:16,880
potentially hurting the kids in 
your car? 

945
00:48:17,400 --> 00:48:19,520
And they asked a bunch of these 
types of decisions. 

946
00:48:20,360 --> 00:48:23,440
In our case, Samuel, we did this
bottom up. 

947
00:48:23,440 --> 00:48:26,560
We first did a lot of survey 
work to say, well, which 

948
00:48:26,680 --> 00:48:28,480
features do you guys think are 
important? 

949
00:48:28,800 --> 00:48:31,680
And so use those to determine 
which features we are going to 

950
00:48:31,680 --> 00:48:33,200
ask about. 
So no one said that cats were 

951
00:48:33,200 --> 00:48:34,800
important. 
So we don't include cats, for 

952
00:48:34,800 --> 00:48:36,760
example. 
But dogs? 

953
00:48:36,760 --> 00:48:38,280
Dogs. 
For sure. 

954
00:48:38,440 --> 00:48:40,800
Yeah, no dogs yet. 
No dogs yet. 

955
00:48:42,280 --> 00:48:43,760
Yeah. 
So we only, we've only included 

956
00:48:43,760 --> 00:48:47,280
things that the kind of the 
population said was important. 

957
00:48:48,160 --> 00:48:51,000
And so we give them a lot of 
these queries and then we see if

958
00:48:51,000 --> 00:48:53,120
we can learn their model and be 
able to predict what they would 

959
00:48:53,120 --> 00:48:56,040
say in new settings. 
And this is kind of like I said 

960
00:48:56,040 --> 00:49:00,520
first generation approach and we
are able to predict pretty well 

961
00:49:00,520 --> 00:49:05,160
in many cases definitely above 
90%, sometimes way above 90%. 

962
00:49:05,200 --> 00:49:07,120
And then there are some people 
that are hard to predict. 

963
00:49:07,160 --> 00:49:11,600
Now what we're working on that I
think is probably this is here's

964
00:49:11,720 --> 00:49:13,840
a slightly controversial view 
and I'll speak on behalf of 

965
00:49:13,840 --> 00:49:16,640
myself and not my co-authors. 
I actually think that much more 

966
00:49:16,640 --> 00:49:19,480
of AI training should look 
something like this, that it's 

967
00:49:19,480 --> 00:49:23,360
more of a collaboration between 
users and the AI. 

968
00:49:23,880 --> 00:49:27,400
So now what we are working on is
OK, especially because we're 

969
00:49:27,400 --> 00:49:28,720
committed to these interpretable
models. 

970
00:49:28,720 --> 00:49:30,920
We can actually tell you the 
model we've learned. 

971
00:49:31,160 --> 00:49:33,400
So what if we tell you what that
is? 

972
00:49:34,320 --> 00:49:35,800
Can you give us feedback about 
it? 

973
00:49:37,360 --> 00:49:39,280
And what kind of feedback can 
you give us? 

974
00:49:39,280 --> 00:49:41,240
So we're trying things like, 
well, what if we just give you 

975
00:49:41,240 --> 00:49:42,880
slider bars? 
Like here's how much weight we 

976
00:49:42,880 --> 00:49:45,920
think you're putting on this, 
you know, do you understand 

977
00:49:45,920 --> 00:49:47,240
enough? 
And what if you change those 

978
00:49:47,240 --> 00:49:50,040
slider bars? 
But we're also looking into ways

979
00:49:50,040 --> 00:49:52,320
that we can have a natural 
language way of doing that. 

980
00:49:52,320 --> 00:49:53,720
Like here's what we think you're
doing. 

981
00:49:53,720 --> 00:49:55,640
Does this sound right to you? 
If not, why not? 

982
00:49:55,880 --> 00:49:57,800
Here's one of the parts the 
model's confused about. 

983
00:49:57,800 --> 00:49:59,520
What do you think is the case 
there? 

984
00:50:00,240 --> 00:50:01,800
And we don't just stop there, 
though. 

985
00:50:01,800 --> 00:50:04,480
The idea is that you would have 
this iterative process that we 

986
00:50:04,480 --> 00:50:07,640
tell you, OK, you've answered 
maybe like 10 different 

987
00:50:07,640 --> 00:50:10,160
scenarios. 
Here's our best guess about what

988
00:50:10,160 --> 00:50:12,440
your model is. 
Now, what do you think about 

989
00:50:12,440 --> 00:50:13,680
that? 
Here's what I'm most confused 

990
00:50:13,680 --> 00:50:15,600
about. 
OK, I'm going to adjust based on

991
00:50:15,600 --> 00:50:17,280
what you told me. 
Now I'm going to give you 

992
00:50:17,280 --> 00:50:19,520
another 10 scenarios. 
You're going to give me your 

993
00:50:19,520 --> 00:50:21,240
answers. 
I'm going to see how different 

994
00:50:21,240 --> 00:50:23,640
that is from what I predicted. 
I'm also going to tell you when 

995
00:50:23,640 --> 00:50:24,960
I would have made a different 
prediction. 

996
00:50:25,880 --> 00:50:28,040
And then you can tell me, do
you want the one that I would 

997
00:50:28,040 --> 00:50:29,800
have predicted or do you want 
the one you said you were going 

998
00:50:29,800 --> 00:50:30,800
to say? 
And you can kind of go through 

999
00:50:30,800 --> 00:50:33,720
this iterative process. 
It's like a bit like human in 

1000
00:50:33,720 --> 00:50:35,160
the loop 
reinforcement learning, is that? 

1001
00:50:36,160 --> 00:50:38,560
Yeah, with the caveat, and I'm 
sure we'll talk about that, 

1002
00:50:38,560 --> 00:50:40,080
we've talked about how humans 
are fallible. 

1003
00:50:40,360 --> 00:50:43,400
I am less a proponent of just 
saying human in the loop is the 

1004
00:50:43,680 --> 00:50:47,280
answer to all things. 
But yes, it's very much like 

1005
00:50:47,280 --> 00:50:48,920
that. 
But it's like really taking that

1006
00:50:48,920 --> 00:50:50,840
seriously. 
It's not just human in the loop to 

1007
00:50:50,840 --> 00:50:53,560
course correct. 
It's actually a fundamental 

1008
00:50:53,560 --> 00:50:56,040
aspect of how do we train this 
up most efficiently. 

1009
00:50:56,440 --> 00:51:00,040
And the other critical part that
I think is really relevant is, 

1010
00:51:00,040 --> 00:51:02,720
well, now that we're telling you
what we think the model is and 

1011
00:51:02,720 --> 00:51:05,680
we're telling you what we think
you would say, can it increase 

1012
00:51:05,680 --> 00:51:08,520
your trust that this model 
actually represents you? 

1013
00:51:09,480 --> 00:51:11,600
And especially if it's something
you can articulate, we've 

1014
00:51:11,600 --> 00:51:14,680
articulated it to you. 
So if you can say, OK, yeah, I 

1015
00:51:14,680 --> 00:51:18,520
think this is right. 
I prioritize having this many

1016
00:51:18,520 --> 00:51:22,120
dependents over how long you've 
been on the wait list by this 

1017
00:51:22,120 --> 00:51:24,760
much. 
And I prioritize those things 

1018
00:51:24,760 --> 00:51:28,600
over whether or not you were an 
alcoholic in the past by this 

1019
00:51:28,600 --> 00:51:30,400
much. 
When this happens, this is the 

1020
00:51:30,400 --> 00:51:32,680
decision tree I would take. 
You can actually articulate it 

1021
00:51:32,680 --> 00:51:34,760
and you can write it down and 
you can be like, I'm going to 

1022
00:51:34,760 --> 00:51:36,600
look at it for a while. 
I'm going to think about it. 

1023
00:51:36,760 --> 00:51:40,480
Is that me or not? 
And so I really think a big 

1024
00:51:40,920 --> 00:51:43,960
important aspect of this is that
trust piece, because if 

1025
00:51:44,120 --> 00:51:46,480
eventually you're going to have 
some type of AI acting on your 

1026
00:51:46,480 --> 00:51:50,160
behalf or voting on your behalf 
or getting integrated into some 

1027
00:51:50,160 --> 00:51:53,240
type of system that's taking 
kind of everyone's moral on your

1028
00:51:53,240 --> 00:51:56,360
behalf, you want to be darn sure
that it's representing you 

1029
00:51:56,360 --> 00:51:59,400
accurately. 
I'm happy to have you be the one

1030
00:51:59,400 --> 00:52:03,040
thinking about these problems. 
But of course, developing the 

1031
00:52:03,040 --> 00:52:06,240
tool, that's not the last step. 
That's maybe one of the first 

1032
00:52:06,240 --> 00:52:08,440
steps, right? 
Even if we had, you know, we 

1033
00:52:08,440 --> 00:52:12,560
lived in this world with perfect
technical, moral AI tools, they 

1034
00:52:12,560 --> 00:52:14,560
have to be implemented 
successfully. 

1035
00:52:14,560 --> 00:52:18,520
They have to be embraced and 
adopted by people who are 

1036
00:52:18,520 --> 00:52:20,480
actually building and using 
them. 

1037
00:52:20,680 --> 00:52:25,680
Yeah, absolutely. 
I'm really committed to the idea

1038
00:52:25,680 --> 00:52:27,760
of what I call translational 
ethical AI. 

1039
00:52:27,840 --> 00:52:30,640
So people in medicine are used 
to this idea of it's hard to get

1040
00:52:30,640 --> 00:52:33,680
a medicine into the hands of 
doctors or patients in a way 

1041
00:52:33,680 --> 00:52:35,920
they will use. 
Well, the same thing is true of 

1042
00:52:35,920 --> 00:52:37,880
an AI tool. 
And you have to figure out how 

1043
00:52:37,880 --> 00:52:40,400
to do that translational work. 
And kind of the way we tend to 

1044
00:52:40,400 --> 00:52:43,920
do these tools, especially 
ethical AI tools, is we make it,

1045
00:52:43,920 --> 00:52:47,000
we publish it and move on. 
I think including in companies 

1046
00:52:47,040 --> 00:52:50,920
they like use these model cards. 
Here's an idea, go figure it out

1047
00:52:50,920 --> 00:52:52,440
yourself how you're actually 
going to implement it. 

1048
00:52:52,840 --> 00:52:55,120
And so I think there needs to be
a lot of work on helping that 

1049
00:52:55,120 --> 00:52:57,440
translation. 
But that's just from the 

1050
00:52:57,440 --> 00:53:00,200
technical tool part. 
So we talked about 5 different 

1051
00:53:00,200 --> 00:53:02,520
pillars that we think society 
needs to be working on 

1052
00:53:02,520 --> 00:53:07,600
simultaneously to make it most 
likely anyway that we're going 

1053
00:53:07,600 --> 00:53:10,240
to end up on the right side of 
history and be happy where we 

1054
00:53:10,240 --> 00:53:11,840
end up. 
And at least a good chunk of 

1055
00:53:11,840 --> 00:53:13,360
these involve a lot of 
behavioral science. 

1056
00:53:13,760 --> 00:53:16,000
So we'll get to those. 
The first one is we need to 

1057
00:53:16,000 --> 00:53:19,840
have technical tools and we need
to work on making sure that the 

1058
00:53:20,240 --> 00:53:22,200
people know how to use them and 
they can be translated. 

1059
00:53:22,680 --> 00:53:24,960
The second is what we call agile
public policy. 

1060
00:53:25,120 --> 00:53:27,880
And so those are kind of public 
policy mechanisms that can move 

1061
00:53:27,880 --> 00:53:29,600
more quickly than traditional 
ones. 

1062
00:53:29,920 --> 00:53:31,840
But those are the two I think 
almost everyone has heard about 

1063
00:53:31,840 --> 00:53:34,840
in some capacity. 
So I'll actually focus on the 

1064
00:53:34,840 --> 00:53:38,360
other ones. 
And so the next three are, first

1065
00:53:38,400 --> 00:53:42,200
of all, we need to scale from 
the organizational practices 

1066
00:53:42,560 --> 00:53:45,520
that make it possible for 
ethical AI tools to actually be 

1067
00:53:45,520 --> 00:53:47,560
implemented and for people to 
make good decisions. 

1068
00:53:47,560 --> 00:53:49,760
And that sounds silly, but I 
don't think it's silly at all. 

1069
00:53:49,760 --> 00:53:51,520
And I think it's one of the 
biggest pieces. 

1070
00:53:52,800 --> 00:53:55,960
And the first thing is that we 
can debate about this. 

1071
00:53:55,960 --> 00:53:58,040
And I'd be curious what you guys
think about this from your 

1072
00:53:58,040 --> 00:54:01,200
consulting experience. 
But there's a lot of data out 

1073
00:54:01,200 --> 00:54:04,160
there now that suggests that 
even employees think that their 

1074
00:54:04,160 --> 00:54:06,720
companies don't mean it when 
they say they want to use AI 

1075
00:54:06,720 --> 00:54:07,920
ethically. 
They think that's not a 

1076
00:54:07,920 --> 00:54:10,440
priority. 
So we'll have to have one set of

1077
00:54:10,440 --> 00:54:12,720
strategies for when
that's the case.

1078
00:54:12,920 --> 00:54:15,800
I'm going to put those aside for
right now, if you permit me, and

1079
00:54:15,800 --> 00:54:20,600
say, OK, for those organizations
who really do want to use AI 

1080
00:54:20,600 --> 00:54:22,600
ethically and they think that's 
the most sustainable and 

1081
00:54:22,600 --> 00:54:26,120
profitable course, they have to 
acknowledge, if they collected 

1082
00:54:26,120 --> 00:54:28,040
some data, they'd probably find 
out that most people in their 

1083
00:54:28,040 --> 00:54:30,000
company don't think that it's 
actually a priority for them 

1084
00:54:30,600 --> 00:54:34,160
because the incentives don't 
seem to be aligned for that. 

1085
00:54:34,160 --> 00:54:36,280
People feel like if they 
actually want to figure out how 

1086
00:54:36,280 --> 00:54:38,880
to implement these ethical AI 
tools, they have to do it all in

1087
00:54:38,880 --> 00:54:41,040
their off time. 
People think fairness is 

1088
00:54:41,040 --> 00:54:43,840
something everyone wants to 
achieve. 

1089
00:54:44,120 --> 00:54:46,600
And there at least still are
some regulations about being 

1090
00:54:46,600 --> 00:54:48,280
fair and not discriminating 
against others. 

1091
00:54:48,320 --> 00:54:50,280
And I should also say, sorry 
that there's a lot of technical 

1092
00:54:50,280 --> 00:54:54,520
tools out there for AI fairness.
You can audit your algorithms. 

1093
00:54:54,520 --> 00:54:57,480
You can actually modify your 
algorithm so that it's more 

1094
00:54:57,480 --> 00:55:00,160
likely to be fair. 
You can kind of go

1095
00:55:00,160 --> 00:55:01,920
through these checklists. 
So it seems like we've got 

1096
00:55:01,920 --> 00:55:04,720
everything we need. 
AI fairness should be easy. 

1097
00:55:05,280 --> 00:55:08,200
Turns out there's over 20 
definitions, mathematical 

1098
00:55:08,200 --> 00:55:11,600
definitions of AI fairness, and 
someone in your company has to 

1099
00:55:11,600 --> 00:55:14,280
decide what that definition is. 
And they're dramatically 

1100
00:55:14,280 --> 00:55:16,280
different. 
Like first of all, who reads 

1101
00:55:16,280 --> 00:55:18,000
mathematical equations? 
Not many. 

1102
00:55:18,360 --> 00:55:20,440
And yet somehow you have to 
understand the implications of 

1103
00:55:20,440 --> 00:55:22,600
this and it makes huge 
differences. 

1104
00:55:22,880 --> 00:55:25,000
And so now your technical team 
has to make that decision, 

1105
00:55:25,000 --> 00:55:28,520
perhaps while they're on a two 
week sprint and, you know, then be

1106
00:55:28,520 --> 00:55:30,280
held responsible or not for 
that. 

1107
00:55:30,520 --> 00:55:32,200
Right. 
There is the risk of making the 

1108
00:55:32,200 --> 00:55:34,720
wrong decision, right? 
Whereas if you kind of ignore 

1109
00:55:34,720 --> 00:55:38,240
that there's a decision there at
all, it's almost safer to you, 

1110
00:55:38,320 --> 00:55:40,920
even if you can have these 
dramatic consequences. 

1111
00:55:41,080 --> 00:55:42,600
Exactly. 
Exactly. 

1112
00:55:42,600 --> 00:55:44,640
And the amount of time it would 
take them to get up to speed on 

1113
00:55:44,640 --> 00:55:47,600
making the right decision is not
what they're being incentivized 

1114
00:55:47,600 --> 00:55:50,320
with, right? 
So there's a lot I could say 

1115
00:55:50,320 --> 00:55:52,200
about this, but that's kind of 
one piece of the puzzle. 

1116
00:55:52,200 --> 00:55:54,040
And I think this is where 
behavioral scientists have a lot

1117
00:55:54,040 --> 00:55:55,560
to offer. 
And there's a field of 

1118
00:55:55,560 --> 00:55:58,080
behavioral ethics in particular 
that I think has a lot to offer 

1119
00:55:58,080 --> 00:55:59,400
here. 
And we talked about some 

1120
00:55:59,400 --> 00:56:02,400
concrete pieces of advice in the
book, But just to give you an 

1121
00:56:02,400 --> 00:56:04,680
idea of some of the things that 
I think are really important, my

1122
00:56:04,680 --> 00:56:09,000
number one thing that I wish the
entire world would do is create 

1123
00:56:09,000 --> 00:56:12,800
really robust ethical AI KPIs, or
key performance indicators. 

1124
00:56:12,800 --> 00:56:15,280
So we all know especially 
effective businesses organize 

1125
00:56:15,360 --> 00:56:19,480
everything, their compensation 
promotion, the entire strategy 

1126
00:56:19,480 --> 00:56:21,720
right around changing the needle
on KPIs. 

1127
00:56:22,040 --> 00:56:26,520
Well if you don't have ethical 
AI KPIs, then how are you ever

1128
00:56:26,520 --> 00:56:28,680
going to compete with all the 
other KPIs people are being,

1129
00:56:28,680 --> 00:56:32,080
you know, compensated for? 
Are there companies that have 

1130
00:56:32,080 --> 00:56:37,520
ethical AI KPIs? Who's doing this?
I don't know, for sure, of any that

1131
00:56:37,520 --> 00:56:38,880
are doing it. 
I keep hearing through the 

1132
00:56:38,880 --> 00:56:41,560
grapevine, yeah, we're doing it.
And then I ask about, well, what

1133
00:56:41,560 --> 00:56:42,880
are they? 
And then no one seems to be able

1134
00:56:42,880 --> 00:56:45,000
to tell me. 
How much of your bonus is tied 

1135
00:56:45,000 --> 00:56:47,600
to that? 
Right, exactly, exactly. 

1136
00:56:47,920 --> 00:56:51,240
So I'd be very curious. And
again, if you look through

1137
00:56:51,240 --> 00:56:54,040
the evidence, most people who 
are working on ethical AI teams 

1138
00:56:54,040 --> 00:56:57,600
feel like they have no power. 
So it makes me feel like that 

1139
00:56:57,600 --> 00:56:59,920
probably means they don't have 
KPIs associated with, you know,

1140
00:56:59,960 --> 00:57:01,520
whatever they're supposed to 
produce. 

1141
00:57:02,440 --> 00:57:04,760
So yeah, there's a lot we can 
talk about there. 

1142
00:57:04,960 --> 00:57:07,440
But that on its own, just having
the right kind of setting and 

1143
00:57:07,440 --> 00:57:09,800
organizational practices and 
helping organizations figure 

1144
00:57:09,800 --> 00:57:10,960
out, well, what do they need to 
do? 

1145
00:57:10,960 --> 00:57:12,320
What kind of cultures do they 
need to set? 

1146
00:57:12,320 --> 00:57:13,720
What kind of change management 
do they need? 

1147
00:57:13,720 --> 00:57:17,120
What kind of processes? That
still won't matter if people 

1148
00:57:17,120 --> 00:57:20,080
don't have the skills they need,
you know, even within that 

1149
00:57:20,080 --> 00:57:23,560
context. 
And so I think the next thing is

1150
00:57:23,760 --> 00:57:30,080
what I call developing scalable 
ways to have system or career 

1151
00:57:30,440 --> 00:57:35,000
wide career long training in 
moral AI systems thinking lots 

1152
00:57:35,000 --> 00:57:38,320
of words there. 
The point is that very few of us

1153
00:57:38,320 --> 00:57:41,480
have time to
even figure out what we think is

1154
00:57:41,480 --> 00:57:44,160
morally right or wrong or 
develop all the skills that you 

1155
00:57:44,160 --> 00:57:47,000
need to do that right. 
And we're kind of in a culture 

1156
00:57:47,000 --> 00:57:48,920
now where people think, well, 
you're either a good person, 

1157
00:57:48,920 --> 00:57:51,120
you're a bad person, and if 
you're good, that equals good 

1158
00:57:51,120 --> 00:57:52,560
behavior. 
And if you're a bad person, that

1159
00:57:52,560 --> 00:57:53,680
equals bad behavior. 
And that's it. 

1160
00:57:54,120 --> 00:57:57,880
I tie this back to another 
concept from the behavioral 

1161
00:57:57,880 --> 00:58:01,040
literature, this kind of fixed 
mindset versus growth mindset. 

1162
00:58:01,040 --> 00:58:02,720
People are familiar with that in
education. 

1163
00:58:03,000 --> 00:58:06,480
It applies to moral stuff too. 
Like, I think of it as: if

1164
00:58:06,480 --> 00:58:08,320
you think that people are
either good people or bad 

1165
00:58:08,320 --> 00:58:09,920
people, that's kind of this 
fixed mindset. 

1166
00:58:09,920 --> 00:58:12,760
If you think of it instead as a 
moral growth mindset, we're all 

1167
00:58:13,000 --> 00:58:15,240
good people, or most of us at 
least are good people. 

1168
00:58:15,520 --> 00:58:17,800
We just have to learn how, you 
know, learn the skills. 

1169
00:58:18,000 --> 00:58:19,800
That's the shift we need to 
make. 

1170
00:58:20,400 --> 00:58:22,760
We need to make that and then 
actually give people the 

1171
00:58:22,760 --> 00:58:25,800
opportunity to learn this stuff.
And people like to think that 

1172
00:58:25,800 --> 00:58:29,040
moral education is just like 
high school level, maybe college

1173
00:58:29,080 --> 00:58:31,480
level, but people forget that we
actually need it all the way 

1174
00:58:31,480 --> 00:58:33,120
through. 
Like, you know, I don't think 

1175
00:58:33,120 --> 00:58:35,680
CEOs have had much opportunity 
to figure out exactly what their

1176
00:58:35,680 --> 00:58:39,800
values are or to figure out how 
they apply to AI, right? 

1177
00:58:39,800 --> 00:58:41,680
Or like how that's even relevant
to AI? 

1178
00:58:42,840 --> 00:58:45,640
Yeah, they do when they get a 
chance to write the best selling

1179
00:58:45,800 --> 00:58:47,800
book about, like, this is how to
succeed. 

1180
00:58:47,880 --> 00:58:50,720
Then they have one virtue like 
you got to be tough, you got to 

1181
00:58:50,720 --> 00:58:53,520
be exploited like they might be 
to find something there. 

1182
00:58:53,520 --> 00:58:56,600
But yeah, I think honestly even 
what you said before about the 

1183
00:58:56,600 --> 00:59:01,720
kidney moral AI intervention,
it's almost a self reflection 

1184
00:59:01,720 --> 00:59:04,880
exercise for people to make 
sense of, you know, what are

1185
00:59:04,880 --> 00:59:06,840
my moral preferences and so
on. 

1186
00:59:06,840 --> 00:59:12,240
But what do I think and I think 
that is deeply needed and like 

1187
00:59:12,240 --> 00:59:15,520
to compare against what you said
around what would be nice or important

1188
00:59:15,520 --> 00:59:20,040
to have when it comes to 
implementing AI morally at scale

1189
00:59:20,040 --> 00:59:22,360
with organizations. 
I guess the reality right now is

1190
00:59:22,360 --> 00:59:26,880
so far from that. 
Like you have some company wide 

1191
00:59:27,200 --> 00:59:30,040
tools or AI capabilities that 
are given to people. 

1192
00:59:30,040 --> 00:59:33,240
Like they get part of the 
Microsoft suite or whatever 

1193
00:59:33,280 --> 00:59:35,280
thing they have and they say, 
hey, you have Copilot now. 

1194
00:59:35,280 --> 00:59:37,800
And then you have some people in
the organization that are maybe,

1195
00:59:38,160 --> 00:59:41,040
you know, able to already like 
double or triple their 

1196
00:59:41,040 --> 00:59:43,440
productivity, but they don't 
tell anyone about it because 

1197
00:59:43,440 --> 00:59:44,600
they don't really benefit from 
it. 

1198
00:59:44,600 --> 00:59:48,080
So they're just writing all the 
code with Cursor or some

1199
00:59:48,200 --> 00:59:51,320
other AI tool, but they have no
real benefit to share it with 

1200
00:59:51,640 --> 00:59:54,600
whatever else, what tools 
they're actually using and how 

1201
00:59:54,600 --> 00:59:57,000
they're using it and so on. 
So you have like some really 

1202
00:59:57,000 --> 01:00:01,520
messy time for AI with
organizations where yeah, I 

1203
01:00:01,520 --> 01:00:04,800
think it's so far from
the ideal context.

1204
01:00:04,800 --> 01:00:07,360
So what do you see, from that
messy reality we're in right 

1205
01:00:07,360 --> 01:00:08,360
now? 
What do you see as some of the 

1206
01:00:08,360 --> 01:00:12,160
first steps to better embed 
moral AI within organizations? 

1207
01:00:12,840 --> 01:00:14,920
Yeah, absolutely. 
KPIs.

1208
01:00:15,200 --> 01:00:18,160
So that's the first thing. 
Then there are different ways to

1209
01:00:18,160 --> 01:00:20,640
handle it. 
But I like the idea that many 

1210
01:00:20,640 --> 01:00:25,400
are advocating for, of having
embedded ethical AI experts.

1211
01:00:25,400 --> 01:00:28,400
And I'm going to say ethical AI,
not just ethicists into product 

1212
01:00:28,400 --> 01:00:31,640
teams in particular. 
So product teams that are either

1213
01:00:31,640 --> 01:00:34,000
developing the AI models 
themselves or integrating them 

1214
01:00:34,000 --> 01:00:35,760
into products. 
And because the reason why, 

1215
01:00:35,760 --> 01:00:38,840
especially if I put on my
data science hat is, you know, 

1216
01:00:38,840 --> 01:00:41,800
there's a lot of kind of 
technical details that matter 

1217
01:00:42,040 --> 01:00:44,840
for these ethical decisions. 
And so you need to really have a

1218
01:00:44,880 --> 01:00:48,040
kind of understanding of the 
data side of like, what's all 

1219
01:00:48,040 --> 01:00:49,960
this technical stuff and why is 
it relevant? 

1220
01:00:49,960 --> 01:00:53,480
But also some of the issues, the
most likely issues to arise and 

1221
01:00:53,640 --> 01:00:56,600
calling those out for people. 
And you need to be able to embed that

1222
01:00:56,600 --> 01:01:01,680
into the normal agile process of
product development, which works

1223
01:01:01,680 --> 01:01:04,800
in sprints often and has
immediate deadlines and doesn't

1224
01:01:04,840 --> 01:01:06,160
have time for kind of other 
things. 

1225
01:01:06,160 --> 01:01:11,920
So embedding AI or moral AI or 
ethical AI experts throughout 

1226
01:01:11,920 --> 01:01:13,280
the organization, that's one 
thing. 

1227
01:01:13,520 --> 01:01:14,920
Now how do you get those in the 
first place?

1228
01:01:14,920 --> 01:01:17,440
That's another piece. 
Then most people are not trained

1229
01:01:17,440 --> 01:01:20,360
in both the moral and ethical 
side and the technical side. 

1230
01:01:20,360 --> 01:01:22,440
So we need more people trained 
in those things. 

1231
01:01:23,160 --> 01:01:26,560
Another critical piece is
you have to work with change 

1232
01:01:26,560 --> 01:01:29,800
management teams and strategies 
to change the culture. 

1233
01:01:30,120 --> 01:01:33,600
We really have to get out of 
this idea that you're going to 

1234
01:01:33,600 --> 01:01:36,920
have a right or wrong, and if 
you do something wrong once, 

1235
01:01:36,920 --> 01:01:38,560
that means you're a bad person 
and you're out. 

1236
01:01:39,200 --> 01:01:40,920
It has to be. 
We're all going to have to 

1237
01:01:40,920 --> 01:01:43,960
recognize, going back to this
humility, we're all learning

1238
01:01:44,320 --> 01:01:46,200
what our values are and how to 
make things align. 

1239
01:01:46,200 --> 01:01:48,720
And also, this is a really fast 
moving space anyway, so we're 

1240
01:01:48,720 --> 01:01:50,440
not going to be able to predict 
things in the right way. 

1241
01:01:51,240 --> 01:01:53,600
So there needs to be kind of a 
change of culture of, OK, we're 

1242
01:01:53,600 --> 01:01:56,480
learning together and we are 
taking responsibility for when 

1243
01:01:56,480 --> 01:01:57,960
we make a mistake, we're going 
to fix it. 

1244
01:01:58,240 --> 01:02:01,640
But you're not going to, we're 
not going to fire you every time

1245
01:02:01,640 --> 01:02:04,280
that there's a mistake or 
there's an unexpected outcome. 

1246
01:02:04,880 --> 01:02:07,440
You need to work on things like 
psychological safety, all kinds 

1247
01:02:07,440 --> 01:02:08,560
of stuff. 
You guys are probably even 

1248
01:02:08,560 --> 01:02:10,760
better experts at than me. 
And you need to have 

1249
01:02:10,760 --> 01:02:12,480
facilitators. 
And some of these conversations,

1250
01:02:12,480 --> 01:02:14,160
you know, people have very 
different views as we keep 

1251
01:02:14,160 --> 01:02:15,800
talking about. 
You can't just have those people

1252
01:02:15,960 --> 01:02:19,240
in a room, give them an hour to 
discuss what to do, and think 

1253
01:02:19,240 --> 01:02:21,600
that everything's going to go 
fine if they've never been, you 

1254
01:02:21,600 --> 01:02:24,080
know, trained in having
difficult conversations before, 

1255
01:02:24,080 --> 01:02:26,240
right? 
The tricky thing right now is 

1256
01:02:26,240 --> 01:02:30,520
that information is abundant. 
We have so much information 

1257
01:02:30,520 --> 01:02:31,720
around AI and how it works and 
so on. 

1258
01:02:32,080 --> 01:02:35,280
But like wisdom around how to 
use it properly and thoughtfully

1259
01:02:35,280 --> 01:02:37,280
and all these things is very 
scarce. 

1260
01:02:38,320 --> 01:02:41,520
And so that's kind of what I 
hear when you say like having 

1261
01:02:41,520 --> 01:02:45,280
some embedded AI ethicists of
sorts, like some sort of experts,

1262
01:02:45,280 --> 01:02:48,880
because I think it can sound a 
little bit like you're purposely

1263
01:02:48,880 --> 01:02:51,520
putting people there to kind of 
slow things down to be like, 

1264
01:02:51,520 --> 01:02:55,160
hey, but I think it really comes
down to having people there that

1265
01:02:55,640 --> 01:03:00,800
understand the more subtle
elements of like how to use this

1266
01:03:00,920 --> 01:03:05,200
thoughtfully for the best, good 
for the organization, like how 

1267
01:03:05,200 --> 01:03:08,200
to get out the best. 
Because I think that's honestly 

1268
01:03:08,200 --> 01:03:12,400
where I think right now there's 
a lot of even the, the 

1269
01:03:12,400 --> 01:03:14,600
organizations that are selling
kind of quote-unquote change

1270
01:03:14,600 --> 01:03:17,760
management around adoption.
Often times they don't really 

1271
01:03:17,760 --> 01:03:19,320
understand AI that much 
themselves. 

1272
01:03:19,320 --> 01:03:21,520
Like, they may be used to doing
change management for other 

1273
01:03:21,520 --> 01:03:24,280
products and services. 
And they think AI is like any 

1274
01:03:24,280 --> 01:03:25,920
software. 
They start using it like 

1275
01:03:26,320 --> 01:03:27,520
anything, but it's very 
different. 

1276
01:03:27,520 --> 01:03:30,720
It's a very different thing. 
And you need that kind of higher

1277
01:03:30,720 --> 01:03:35,120
level of expertise or wisdom to 
help facilitate that, I think. 

1278
01:03:36,160 --> 01:03:38,160
Yeah, absolutely. 
And it's a good point. 

1279
01:03:38,160 --> 01:03:40,160
Cause what are the differences 
and similarities between just 

1280
01:03:40,160 --> 01:03:42,560
getting people up to speed on 
using AI in general, which is an

1281
01:03:42,560 --> 01:03:45,720
incredible need right now. 
And then is there something 

1282
01:03:45,720 --> 01:03:49,160
different about getting them up 
to speed on kind of how to use 

1283
01:03:49,240 --> 01:03:52,360
AI ethically? 
And I think that's still an open

1284
01:03:52,360 --> 01:03:56,440
question in general, but it 
definitely, I think people feel 

1285
01:03:56,440 --> 01:03:59,680
like they're more likely to have
their incentives lined up, at 

1286
01:03:59,680 --> 01:04:03,520
least if they learn about using 
AI in general, than they are

1287
01:04:03,520 --> 01:04:06,400
about learning about ethical AI.
Because as you said, I mean, 

1288
01:04:06,400 --> 01:04:10,040
everyone feels like these 
ethical challenges make things 

1289
01:04:10,040 --> 01:04:12,600
slower sometimes, at least in 
the short term, it feels like it

1290
01:04:12,600 --> 01:04:14,360
makes things slower. 
And so then it just feels like 

1291
01:04:14,360 --> 01:04:16,680
it gums up the works.
But don't you think that there's

1292
01:04:16,680 --> 01:04:23,680
a pretty solid counter argument 
to the ethical AI gums things up

1293
01:04:23,680 --> 01:04:27,480
and makes it slower by arguing 
that because it mitigates risk, 

1294
01:04:27,480 --> 01:04:31,640
you actually save time in the 
future when things you know are 

1295
01:04:31,640 --> 01:04:33,360
prevented from going terribly 
wrong? 

1296
01:04:33,760 --> 01:04:38,080
And also, not just that, but if 
you can communicate the steps 

1297
01:04:38,080 --> 01:04:42,720
that you're taking to build 
moral AI, ethical AI that 

1298
01:04:42,720 --> 01:04:47,360
actually builds trust in your 
users and your audience, and 

1299
01:04:47,440 --> 01:04:49,720
also within your own company as 
well. 

1300
01:04:49,960 --> 01:04:54,000
And people are more willing to 
use your product, which maybe 

1301
01:04:54,120 --> 01:04:56,520
creates some efficiencies in 
terms of like, you know, you 

1302
01:04:56,520 --> 01:04:58,760
don't have to do so much 
marketing or all the other 

1303
01:04:58,760 --> 01:05:00,680
things that you would have to do
to convince people to use your 

1304
01:05:00,680 --> 01:05:02,760
product. 
Now, if people trust it, they're

1305
01:05:02,760 --> 01:05:05,840
more inclined to use it right 
off the bat. 

1306
01:05:06,440 --> 01:05:08,320
I personally completely agree 
with you. 

1307
01:05:08,840 --> 01:05:11,440
And you know, I'm very persuaded
with that, and I cite a lot of 

1308
01:05:11,440 --> 01:05:13,000
the data related to those 
things. 

1309
01:05:13,000 --> 01:05:15,000
But I think it does depend on 
your background. 

1310
01:05:15,280 --> 01:05:17,680
So I come from a lot of 
biomedical engineering. 

1311
01:05:17,680 --> 01:05:19,560
If you're going to stick 
something in someone's brain, 

1312
01:05:20,200 --> 01:05:23,120
you want to get it right, right?
And I've always been interested 

1313
01:05:23,120 --> 01:05:25,160
about this because I have some 
engineering background too. 

1314
01:05:25,160 --> 01:05:27,920
And for me as an engineer, you 
want to get the whole system 

1315
01:05:27,920 --> 01:05:29,240
working right. 
It's kind of like people who 

1316
01:05:29,240 --> 01:05:31,960
used to make space shuttles, you
know, you're worried about the 

1317
01:05:31,960 --> 01:05:33,640
worst possible case. 
And that was kind of the 

1318
01:05:33,640 --> 01:05:35,760
culture. 
But if you come from more of a 

1319
01:05:35,760 --> 01:05:38,080
software development culture 
where you don't think things are

1320
01:05:38,080 --> 01:05:41,000
going to go really wrong, that's
not part of your culture. 

1321
01:05:41,160 --> 01:05:42,960
So that stuff is not persuasive 
to you.

1322
01:05:43,160 --> 01:05:46,120
Exactly, yeah.
But yeah, one of my colleagues 

1323
01:05:46,120 --> 01:05:49,600
who's now at Google but was also a
neuroscientist, says that AI

1324
01:05:49,600 --> 01:05:51,440
adoption moves at the speed of 
trust. 

1325
01:05:52,200 --> 01:05:55,600
And I'm persuaded by all the 
data behind that statement, but 

1326
01:05:55,600 --> 01:05:58,120
not everyone is. 
So you have to navigate that 

1327
01:05:58,120 --> 01:06:00,160
with those who are and are not 
persuaded by that. 

1328
01:06:00,880 --> 01:06:02,240
Yeah. 
How do you look at the 

1329
01:06:02,240 --> 01:06:05,440
development of AI itself or like
the AI models we talked about 

1330
01:06:05,440 --> 01:06:08,960
large language models, you know,
because I think in some ways, if

1331
01:06:08,960 --> 01:06:12,320
you look at least from my 
vantage point, compare and 

1332
01:06:12,320 --> 01:06:14,320
contrasting some of the labs is
kind of a little bit of a 

1333
01:06:14,320 --> 01:06:17,680
microcosm of what we discussed. 
I think like I've written down 

1334
01:06:17,680 --> 01:06:20,680
in terms of like if we think 
about OpenAI, Anthropic, maybe

1335
01:06:20,680 --> 01:06:25,400
Meta, like they all stick to
certain like core principles 

1336
01:06:25,400 --> 01:06:28,160
like OpenAI, I think they stick
to, you know, moving fast. 

1337
01:06:28,160 --> 01:06:32,080
Like they are really trying to 
get things out there as fast as 

1338
01:06:32,080 --> 01:06:35,320
possible. 
Maybe a little bit more just 

1339
01:06:35,320 --> 01:06:39,080
embracing this idea of move fast
and break things mentality from 

1340
01:06:39,080 --> 01:06:42,120
Silicon Valley and so on. 
Then Anthropic, as you 

1341
01:06:42,360 --> 01:06:45,360
referenced before, like they 
left, I think some of the 

1342
01:06:45,360 --> 01:06:47,600
founders left OpenAI because
they basically felt like things 

1343
01:06:47,600 --> 01:06:50,440
were moving a little too fast 
and they wanted to again, stick 

1344
01:06:50,440 --> 01:06:52,920
to constitutional. 
Like it should be more based on 

1345
01:06:52,920 --> 01:06:54,800
some of that. 
And then I have Meta, which I

1346
01:06:54,800 --> 01:06:59,520
think is really, at least among 
some of the people within

1347
01:06:59,520 --> 01:07:01,280
Meta, it's like really strong
around openness. 

1348
01:07:01,280 --> 01:07:04,280
Like, AI should be
open, it should be open source 

1349
01:07:04,440 --> 01:07:06,760
and so on. 
And so I feel like within these 

1350
01:07:06,760 --> 01:07:10,800
labs, they're valuing different 
types of or taking different 

1351
01:07:10,800 --> 01:07:12,720
types of moral stances around AI
development. 

1352
01:07:13,000 --> 01:07:15,640
How do you look at that? 
What's your take on kind of

1353
01:07:15,680 --> 01:07:17,640
how AI develops, taking a step 
back? 

1354
01:07:17,640 --> 01:07:19,480
Yeah. 
Yeah. 

1355
01:07:19,480 --> 01:07:21,560
I mean, I think all of that's 
right. 

1356
01:07:22,160 --> 01:07:25,200
And I would say it also seems 
like maybe sometimes those 

1357
01:07:25,200 --> 01:07:28,440
values shift. And if I'm
trying to be consistent with

1358
01:07:28,440 --> 01:07:31,200
stuff we talked about earlier, I don't
think it's bad to shift. 

1359
01:07:31,320 --> 01:07:34,200
I think you should just take 
ownership over that process. 

1360
01:07:34,880 --> 01:07:37,040
And so if you're changing how 
much you're focusing on 

1361
01:07:37,040 --> 01:07:40,720
something, why? Can you be
transparent about it? 

1362
01:07:40,720 --> 01:07:42,320
Can you take responsibility for 
it? 

1363
01:07:43,080 --> 01:07:45,080
You mean that you change
the name to ClosedAI?

1364
01:07:46,000 --> 01:07:52,800
That I mean, how important you 
think safety should be, for 

1365
01:07:52,800 --> 01:07:53,840
example? 
Yeah. 

1366
01:07:54,360 --> 01:07:55,760
Right. 
Maybe you started out by 

1367
01:07:55,760 --> 01:08:00,400
thinking that our reason to 
exist is to make sure that AI 

1368
01:08:00,400 --> 01:08:02,440
doesn't become conscious. 
And maybe you've decided 

1369
01:08:02,440 --> 01:08:05,000
actually the best thing for 
society is for AI to become 

1370
01:08:05,000 --> 01:08:07,440
conscious. 
That might be a viable 

1371
01:08:07,440 --> 01:08:11,240
transition. 
But in order for society to 

1372
01:08:11,240 --> 01:08:15,600
ultimately trust you, going back
to this idea, ideally you would 

1373
01:08:15,600 --> 01:08:19,960
be genuine about your reasons 
and transparent about them. 

1374
01:08:20,279 --> 01:08:22,600
And then we should be not just 
forgiving, but assume again, we 

1375
01:08:22,600 --> 01:08:24,080
all make decisions. 
We're all learning. 

1376
01:08:24,160 --> 01:08:28,120
Yeah. 
How much do you think that what 

1377
01:08:28,120 --> 01:08:31,880
these labs are doing is broadly 
the same when it comes to like 

1378
01:08:31,880 --> 01:08:35,319
we're talking about top down, 
bottom up, hybrid approaches and

1379
01:08:35,319 --> 01:08:37,200
so on? 
How much do you think there are 

1380
01:08:37,200 --> 01:08:38,520
yet? 
They're different in some ways, 

1381
01:08:38,520 --> 01:08:42,040
but are there really big
differences in terms of how 

1382
01:08:42,040 --> 01:08:45,120
they're developing their AI 
models and how they are 

1383
01:08:45,120 --> 01:08:48,479
approaching their moral, the 
moral aspects to these things? 

1384
01:08:50,000 --> 01:08:53,680
Or are they like very different?
Like are they very much on polar

1385
01:08:53,680 --> 01:08:57,279
opposites or like how do you 
compare and contrast?

1386
01:08:58,200 --> 01:08:59,720
But. 
It's really hard to know for 

1387
01:08:59,720 --> 01:09:02,040
sure what their real framework 
is. 

1388
01:09:02,040 --> 01:09:05,760
I mean, Meta has Yann LeCun, who
goes on record all the time 

1389
01:09:05,760 --> 01:09:08,880
saying LLMs are stupid and are
eventually going to

1390
01:09:08,880 --> 01:09:11,319
fade out. 
And you have to have reasoning

1391
01:09:11,479 --> 01:09:15,399
AIs. Does that mean Meta is actually
succeeding in developing AI 

1392
01:09:15,399 --> 01:09:17,120
models? 
And are they investing, you 

1393
01:09:17,120 --> 01:09:20,240
know, a whole lot in developing 
models that will function in 

1394
01:09:20,240 --> 01:09:22,920
different ways than LLMs?
In which case we have to know 

1395
01:09:22,920 --> 01:09:25,520
about them before we know, you 
know, are there principles built

1396
01:09:25,520 --> 01:09:28,040
into there? 
Yann LeCun seems to think that

1397
01:09:28,040 --> 01:09:31,120
they will be, but there's no 
evidence of that yet. 

1398
01:09:31,680 --> 01:09:33,720
So it's really hard to know for 
sure. 

1399
01:09:33,720 --> 01:09:35,600
And I think they've all said 
different things at different 

1400
01:09:35,600 --> 01:09:38,120
times. 
Right now, it does seem like 

1401
01:09:38,120 --> 01:09:40,000
everyone feels like no matter 
what they think, they're in the 

1402
01:09:40,000 --> 01:09:43,240
race to make the best LLM and to
get those into products. 

1403
01:09:43,240 --> 01:09:46,120
And so, whatever they think, they seem
to feel that speed is of the 

1404
01:09:46,120 --> 01:09:47,680
essence. 
And so I think they're all kind 

1405
01:09:47,680 --> 01:09:49,600
of making their own calculations
about what that means. 

1406
01:09:49,600 --> 01:09:52,479
But I think that does mean it's 
probably going to be more bottom

1407
01:09:52,479 --> 01:09:56,160
up for a while than top down. 
Other than this kind of 

1408
01:09:56,160 --> 01:10:00,000
constitutional AI approach, 
which I still struggle with 

1409
01:10:00,000 --> 01:10:02,640
because it's kind of this weird 
bottom up, top down, because you

1410
01:10:02,640 --> 01:10:05,320
have no idea how the AI is 
actually interpreting these 

1411
01:10:05,320 --> 01:10:09,440
principles. 
It still feels strange to me, 

1412
01:10:09,440 --> 01:10:13,560
but yeah.
OK. 

1413
01:10:13,640 --> 01:10:18,720
So we made it to our quickfire
round of decisions, which we call

1414
01:10:18,880 --> 01:10:22,920
to AI or not to AI. 
And basically we're now going to

1415
01:10:23,240 --> 01:10:29,440
pose certain tasks or things
that AI could do and we 

1416
01:10:29,440 --> 01:10:32,600
basically want you to tell us 
whether you think they should do

1417
01:10:32,600 --> 01:10:35,000
it. 
So whether these things are 

1418
01:10:35,000 --> 01:10:45,960
something to AI or not to AI.
OK, first one: personalize AI

1419
01:10:45,960 --> 01:10:49,760
assistants' responses based on
individuals' moral values.

1420
01:10:51,960 --> 01:10:54,600
I hope that AI can be part of 
that equation. 

1421
01:10:54,880 --> 01:10:57,080
For right now I think it should 
be humans, but I hope that 

1422
01:10:57,800 --> 01:10:59,560
eventually AI can be part of 
that equation. 

1423
01:10:59,800 --> 01:11:04,000
Determine responsibility for AI 
failures like when a self 

1424
01:11:04,000 --> 01:11:06,280
driving car fatally hits a 
person. 

1425
01:11:09,200 --> 01:11:11,720
I hope I'm not a broken record 
right now. 

1426
01:11:11,720 --> 01:11:13,280
I think humans need to make that
decision. 

1427
01:11:13,280 --> 01:11:15,640
But I hope that we get to a 
point where AIs can be a big 

1428
01:11:15,800 --> 01:11:18,800
part of that and can actually 
make that more systemized. 

1429
01:11:18,800 --> 01:11:21,440
Cool. 
To AI or not 

1430
01:11:21,440 --> 01:11:27,760
to AI: an AI version of you that you 
can designate to vote on your 

1431
01:11:27,760 --> 01:11:31,240
behalf. 
Well, right now for me, 

1432
01:11:31,520 --> 01:11:36,920
definitely person human. 
But again, part of the vision is

1433
01:11:36,920 --> 01:11:40,920
that we'll have cases where we 
would want AIs to vote on our 

1434
01:11:40,920 --> 01:11:43,240
behalf and feel comfortable 
doing so. 

1435
01:11:43,240 --> 01:11:46,720
So right now, definitely me. 
But I hope again that changes. 

1436
01:11:49,120 --> 01:11:53,160
All right, social synchrony 
training for neurodivergent 

1437
01:11:53,160 --> 01:11:56,160
individuals. 
I'll give AI the benefit of the 

1438
01:11:56,160 --> 01:11:58,960
doubt here and say if it's a 
really well trained AI which we 

1439
01:11:58,960 --> 01:12:01,760
don't yet have. 
But then I think AI might end up

1440
01:12:01,760 --> 01:12:03,920
being better than humans at some
things anyway. 

1441
01:12:03,920 --> 01:12:05,720
Depends. 
For neurodivergent individuals, 

1442
01:12:05,720 --> 01:12:08,480
I think AI might end up being 
more beneficial in the end when 

1443
01:12:08,480 --> 01:12:10,600
it's trained up well. 
And how do you think that? 

1444
01:12:10,600 --> 01:12:11,920
What would that training look 
like? 

1445
01:12:11,920 --> 01:12:14,840
Maybe for those who aren't 
familiar with social synchrony. 

1446
01:12:14,840 --> 01:12:15,880
Yeah. 
So it would have to be an 

1447
01:12:15,880 --> 01:12:19,160
interpretable system most likely
again, or at least would have to

1448
01:12:19,160 --> 01:12:22,200
give interpretable feedback. 
So it would say things like here

1449
01:12:22,200 --> 01:12:25,120
are the behaviors you're having 
that are not aligning with 

1450
01:12:25,120 --> 01:12:27,520
someone else's behaviors in the 
following way. 

1451
01:12:27,520 --> 01:12:30,080
And here's how it's impacting 
this interaction. 

1452
01:12:30,280 --> 01:12:33,040
So perhaps you're not looking at
their eyes when they're looking 

1453
01:12:33,040 --> 01:12:36,640
at your eyes, or you're not 
taking turns when you should be 

1454
01:12:36,640 --> 01:12:38,240
taking turns if you want to 
communicate that you're 

1455
01:12:38,240 --> 01:12:40,000
listening, that type of thing. 
Cool. 

1456
01:12:40,880 --> 01:12:45,560
Next one, decide which 
struggling relationships of 

1457
01:12:45,560 --> 01:12:51,200
yours are worth saving. 
Right now, definitely human, 

1458
01:12:51,640 --> 01:12:54,880
especially given that AI told a 
teenager to kill its parents 

1459
01:12:54,880 --> 01:12:56,320
because it was spending too much
time with the AI. 

1460
01:12:57,120 --> 01:12:59,000
Yeah, probably 
keep that decision for yourself.

1461
01:13:00,720 --> 01:13:02,640
OK. 
A psychic hotline that models 

1462
01:13:02,640 --> 01:13:05,480
its answers based on 
probabilistic modeling. 

1463
01:13:07,000 --> 01:13:08,360
It's a psychic hotline. 
Are you? 

1464
01:13:08,680 --> 01:13:10,280
So you're going to the hotline 
because you're looking for a 

1465
01:13:10,280 --> 01:13:11,960
psychic? 
Yep, yeah. 

1466
01:13:11,960 --> 01:13:15,040
You call in the psychic hotline,
you say, yeah, I like, you know,

1467
01:13:15,080 --> 01:13:16,760
am I going to be rich and 
famous? 

1468
01:13:17,440 --> 01:13:21,320
AI. 
What about this AI generated 

1469
01:13:21,520 --> 01:13:26,080
breakup messages optimized for 
minimum emotional harm? 

1470
01:13:26,080 --> 01:13:31,080
So basically the AI writing this
for you so that there's minimum 

1471
01:13:31,080 --> 01:13:33,320
emotional harm on the receiving 
end. 

1472
01:13:34,280 --> 01:13:38,320
Yeah, so this is one where I 
will admit that I might have my 

1473
01:13:38,320 --> 01:13:40,200
view changed. 
And this is something I'm 

1474
01:13:40,200 --> 01:13:41,920
really grappling with this type 
of thing. 

1475
01:13:42,040 --> 01:13:46,080
But right now I'm still firmly 
in the camp of no, if you're 

1476
01:13:46,080 --> 01:13:48,440
going to break up with someone, 
that should be you, not an AI 

1477
01:13:48,800 --> 01:13:51,880
writing that. 
Even if you do it suboptimally, 

1478
01:13:51,880 --> 01:13:54,520
your intention matters and the 
respect you give them matters. 

1479
01:13:56,120 --> 01:13:59,920
What if you, like, give 
ChatGPT your like first draft 

1480
01:13:59,920 --> 01:14:02,760
and you say can you like smooth 
this out for me? 

1481
01:14:02,800 --> 01:14:04,360
So like you really gave the 
input. 

1482
01:14:04,360 --> 01:14:07,200
Yeah, I'm grappling with that. 
And we didn't get to talk about 

1483
01:14:07,200 --> 01:14:08,920
this, but this is something 
where I think we need to collect

1484
01:14:08,920 --> 01:14:11,120
a lot of data. 
So I think it depends on the 

1485
01:14:11,120 --> 01:14:13,680
person. 
So for some people, the benefits

1486
01:14:13,680 --> 01:14:15,480
would outweigh the harms of 
doing that. 

1487
01:14:16,160 --> 01:14:17,920
And so for them, that would be 
the case. 

1488
01:14:18,120 --> 01:14:20,600
For me, for example, if I was 
the recipient of that breakup 

1489
01:14:20,600 --> 01:14:24,560
message, it would harm me a lot 
more to eventually find out that

1490
01:14:24,560 --> 01:14:27,640
an AI, yeah, smoothed it out. 
But I don't know that that's the case 

1491
01:14:27,640 --> 01:14:29,160
for everyone. 
So I think we have to figure out

1492
01:14:29,440 --> 01:14:32,680
it depends on the recipient. 
We have to learn who needs what.

1493
01:14:33,800 --> 01:14:36,320
This is a minor question. 
Just a follow-up: what would it 

1494
01:14:36,320 --> 01:14:39,240
hurt you most if you found out 
that was the breakup message or 

1495
01:14:39,240 --> 01:14:42,800
that was like the first flirty 
message that got you to agree to

1496
01:14:42,800 --> 01:14:46,640
go on a date? 
Because there's a lot of AI 

1497
01:14:46,640 --> 01:14:50,840
basically only for that task. 
I don't remember what it's 

1498
01:14:50,920 --> 01:14:52,320
called, like a Rizz AI or 
something like this. 

1499
01:14:52,320 --> 01:14:56,080
There's a lot of them, basically
to write messages to get people 

1500
01:14:56,080 --> 01:14:57,960
to agree to go on dates and 
various things like that. 

1501
01:14:59,160 --> 01:15:02,040
That's a great question. 
Right now, personally, I would 

1502
01:15:02,040 --> 01:15:06,400
be more hurt by AI writing the 
breakup message because you 

1503
01:15:06,400 --> 01:15:08,440
already know me. 
So it seems like now it's really

1504
01:15:08,440 --> 01:15:13,120
just like it's a profound kind 
of deep loss of respect and loss

1505
01:15:13,120 --> 01:15:16,560
of compassion and effort. The 
big thing is the effort that

1506
01:15:16,560 --> 01:15:18,240
you're putting into caring for 
me. 

1507
01:15:19,240 --> 01:15:23,760
The last one, a wedding planner 
that ensures your officiant 

1508
01:15:23,760 --> 01:15:25,840
doesn't get poached by another 
couple. 

1509
01:15:28,560 --> 01:15:31,360
No, I still want it to be a 
human in this case. 

1510
01:15:32,160 --> 01:15:34,920
When it comes to it, when it, I 
still want it to be a human that

1511
01:15:34,920 --> 01:15:39,840
decides who gets that officiant.
Awesome. 

1512
01:15:39,840 --> 01:15:43,680
OK, I will give the context. 
This is a personal example. 

1513
01:15:43,680 --> 01:15:47,640
We both asked the same person, 
Dan Ariely, to officiate our 

1514
01:15:47,640 --> 01:15:49,920
wedding and he said yes to me 
first. 

1515
01:15:49,960 --> 01:15:53,280
So. 
We did, and it took about almost 

1516
01:15:53,280 --> 01:15:54,880
a decade for me to find out who 
it was. 

1517
01:15:55,280 --> 01:15:55,680
That guy. 
Yeah. 

1518
01:15:56,200 --> 01:15:57,520
Yeah. 
But for the record, we also had 

1519
01:15:57,560 --> 01:16:00,080
a wonderful officiant 
ourselves, so yeah. 

1520
01:16:00,320 --> 01:16:03,000
No regrets, you even convinced 
him to write a book with you 

1521
01:16:03,000 --> 01:16:04,760
later. 
Exactly. 

1522
01:16:06,000 --> 01:16:09,680
Well, that brings us to our 
final question. 

1523
01:16:10,640 --> 01:16:13,760
Jenna, what is your most 
controversial opinion in AI? 

1524
01:16:15,920 --> 01:16:20,480
I think my most controversial 
opinion is that acknowledging 

1525
01:16:20,480 --> 01:16:23,080
everything we talked about today
and that I'm very concerned 

1526
01:16:23,080 --> 01:16:25,400
about AI's harms. 
I also think that AI can 

1527
01:16:25,400 --> 01:16:30,080
actually help humans become 
better moral decision makers and

1528
01:16:30,080 --> 01:16:33,000
actually can help us improve and
understand our moral values 

1529
01:16:33,000 --> 01:16:35,560
better. 
And yes, that's provocative. 

1530
01:16:35,960 --> 01:16:37,360
Yeah. 
Yeah. 

1531
01:16:37,560 --> 01:16:40,280
And actually this is something 
because you mentioned the lab, 

1532
01:16:40,280 --> 01:16:43,120
they had this Delphi AI model 
and it made me think about the 

1533
01:16:43,120 --> 01:16:47,920
Oracle at Delphi. 
And in this vein, 

1534
01:16:47,920 --> 01:16:50,240
like, would you hope that there 
was some form of an Oracle at 

1535
01:16:50,240 --> 01:16:53,320
Delphi that you can go to with 
your moral qualms and be like, 

1536
01:16:53,320 --> 01:16:56,520
help me, you know? 
Yeah, there are two ways. 

1537
01:16:56,760 --> 01:16:58,600
And I'm actually, I really 
believe in these, like we're 

1538
01:16:58,600 --> 01:17:02,640
working on them. 
So one is, we talked about how 

1539
01:17:02,640 --> 01:17:05,480
we're in this phase two of 
training up a moral AI. 

1540
01:17:05,480 --> 01:17:07,360
You have this kind of back and 
forth. 

1541
01:17:08,240 --> 01:17:11,080
So first of all, if you actually
get to the point where you trust

1542
01:17:11,080 --> 01:17:14,160
this thing and it's an AI for 
you, so you're like, this model, 

1543
01:17:14,160 --> 01:17:18,880
yeah, this is what I think. 
Then you can use it in a context

1544
01:17:18,880 --> 01:17:20,920
where you have lots of time. 
The idea is that you would train

1545
01:17:20,920 --> 01:17:22,520
this thing up when you're calm 
and rested. 

1546
01:17:22,640 --> 01:17:23,880
You get to think about it many 
times. 

1547
01:17:23,880 --> 01:17:26,120
You get to change your mind. 
There's no judgement. 

1548
01:17:26,280 --> 01:17:28,560
So that then when you're in a 
context where you're under time 

1549
01:17:28,560 --> 01:17:31,320
pressure and you're not at your 
best, it could tell you, here's 

1550
01:17:31,320 --> 01:17:34,360
what you said was important to 
you in this context. 

1551
01:17:34,360 --> 01:17:37,760
Here's how that would play out. 
You know, it's up to you to 

1552
01:17:37,760 --> 01:17:40,120
decide what you want to do. 
But, you know, do you want that 

1553
01:17:40,120 --> 01:17:41,200
check? 
And so, for example, in our 

1554
01:17:41,200 --> 01:17:43,360
kidney exchange context, we're 
looking to use that with 

1555
01:17:43,360 --> 01:17:45,480
surgeons. 
So surgeons, there's these 

1556
01:17:45,480 --> 01:17:47,520
documented effects that, for 
example, they're more likely to 

1557
01:17:47,520 --> 01:17:51,000
reject a kidney on the weekend 
than on the weekdays, 

1558
01:17:51,080 --> 01:17:52,440
you know, presumably because 
they have to come in. 

1559
01:17:52,440 --> 01:17:54,760
And they're more likely to 
reject a kidney from an African 

1560
01:17:54,840 --> 01:17:58,920
American woman than from other 
demographics in ways that can't 

1561
01:17:58,920 --> 01:18:01,280
be accounted for by medical 
situations. 

1562
01:18:01,280 --> 01:18:04,240
So if you train them up in this 
outside context and now, you 

1563
01:18:04,240 --> 01:18:05,960
know, they get a phone call in 
the middle of the night, they 

1564
01:18:05,960 --> 01:18:08,480
have to decide in 20 seconds, 
are they going to accept this 

1565
01:18:08,480 --> 01:18:10,600
kidney or not? 
Not 20 seconds, you know, 20 

1566
01:18:10,600 --> 01:18:13,560
minutes or something. 
You know, would they, how would 

1567
01:18:13,560 --> 01:18:15,360
they respond to an AI that they 
trained up? 

1568
01:18:15,360 --> 01:18:17,840
No one knows what the AI says. 
All they know is that they trust

1569
01:18:17,840 --> 01:18:20,440
and they train this thing up. 
How would they respond to what 

1570
01:18:20,440 --> 01:18:22,040
that AI says? 
So that's one kind of context. 

1571
01:18:22,040 --> 01:18:24,160
We're also looking at that in 
end of life decisions. 

1572
01:18:24,160 --> 01:18:27,040
Could you train up an AI so that
it would make your own end of 

1573
01:18:27,040 --> 01:18:29,400
life decisions for you? 
But here's the other way. 

1574
01:18:29,400 --> 01:18:32,000
And you mentioned this way back 
earlier when we talked, Samuel, 

1575
01:18:32,000 --> 01:18:33,160
but you hit it right on the 
nose. 

1576
01:18:33,160 --> 01:18:35,520
So there was something I totally
didn't anticipate with this 

1577
01:18:35,520 --> 01:18:38,960
whole training process. 
And that's that it's almost like

1578
01:18:38,960 --> 01:18:41,600
a little moral psychology or 
moral philosophy class for 

1579
01:18:41,600 --> 01:18:44,000
yourself. 
It's like you get to figure out 

1580
01:18:44,000 --> 01:18:46,400
through this process what you 
actually think. 

1581
01:18:46,400 --> 01:18:49,400
So I've been thinking about 
moral stuff for decades and 

1582
01:18:49,400 --> 01:18:51,760
going through this own process 
of training up our own AI taught 

1583
01:18:51,760 --> 01:18:54,600
me things I didn't even know 
about my own moral judgement and

1584
01:18:54,600 --> 01:18:56,600
made me think about it. 
And it lets you do it in a way 

1585
01:18:56,600 --> 01:18:59,720
that you're not being judged, at
least as long as you don't have 

1586
01:19:00,200 --> 01:19:01,440
an AI assistant that's judging 
you. 

1587
01:19:01,440 --> 01:19:03,360
But right now that's not our 
models, right? 

1588
01:19:04,000 --> 01:19:06,760
And so it really has taught me 
and I've become actually really 

1589
01:19:06,760 --> 01:19:09,480
excited about this opportunity. 
And maybe it's a way to tie back

1590
01:19:09,480 --> 01:19:12,880
into this education model. 
Like, could we create learning 

1591
01:19:12,880 --> 01:19:15,880
opportunities for everyone 
tailored for them that let them 

1592
01:19:15,880 --> 01:19:18,200
figure out what they think is 
right and wrong and do that in 

1593
01:19:18,800 --> 01:19:20,760
a non-stressful environment where 
you're not going to be judged 

1594
01:19:20,760 --> 01:19:23,000
and no one needs to know what 
you're deciding, but helps you 

1595
01:19:23,000 --> 01:19:25,640
figure that out? 
And so I actually, I actually 

1596
01:19:25,640 --> 01:19:29,200
firmly believe that both of 
these things are very possible 

1597
01:19:29,560 --> 01:19:33,040
and I'm excited about it. 
Yeah, I love that. 

1598
01:19:33,080 --> 01:19:34,720
And honestly, this is so much 
fun. 

1599
01:19:34,720 --> 01:19:37,840
And like you mentioned in the beginning
that you're an AI enthusiast and

1600
01:19:37,960 --> 01:19:41,480
I think that really, you know, 
shows. I can really feel it. 

1601
01:19:41,480 --> 01:19:45,640
And it was really fun to, yeah, 
get into talking about all 

1602
01:19:45,640 --> 01:19:48,000
things moral AI. 
But in a way where we can 

1603
01:19:48,000 --> 01:19:52,920
really spread some enthusiasm
for others to feel the same 

1604
01:19:52,920 --> 01:19:54,880
feeling of like we can do better
here. 

1605
01:19:55,000 --> 01:19:56,360
And that could be a really 
valuable thing. 

1606
01:19:57,680 --> 01:20:00,000
I'm so glad to hear that because
that's my biggest concern is 

1607
01:20:00,280 --> 01:20:02,520
everyone thinks that we're just 
trying to tamp down AI those. 

1608
01:20:03,080 --> 01:20:05,480
Downers. 
Downers, especially me, because 

1609
01:20:05,480 --> 01:20:08,000
I tend to be more vocal about 
the downer stuff, but we really 

1610
01:20:08,040 --> 01:20:10,480
aren't. 
It's almost like we believe so 

1611
01:20:10,480 --> 01:20:12,640
much in the positive impacts 
that we want to see us get 

1612
01:20:12,640 --> 01:20:14,520
there. 
And you can't get there if we 

1613
01:20:14,520 --> 01:20:17,560
don't manage this other stuff. 
So I'm glad to hear that you 

1614
01:20:17,560 --> 01:20:19,240
feel the enthusiasm, because 
it's genuine. Awesome. 

1615
01:20:21,240 --> 01:20:22,520
Thank you. 
This was lovely. 

1616
01:20:22,520 --> 01:20:24,680
Thank you again for coming on 
the show. 

1617
01:20:24,920 --> 01:20:28,040
Thanks so much to you guys and I
look forward to hearing what 

1618
01:20:28,040 --> 01:20:30,480
else you learn and teach us the 
rest of your season. 

1619
01:20:31,080 --> 01:20:33,240
And that's a wrap. 
You've been listening to the 

1620
01:20:33,240 --> 01:20:36,600
Behavioral Design Podcast, 
brought to you by Habit Weekly 

1621
01:20:36,600 --> 01:20:39,480
and Nuance Behavior. 
Sam and Aline tell me this 

1622
01:20:39,480 --> 01:20:42,800
season is packed with incredible
insights about behavioral design

1623
01:20:42,800 --> 01:20:46,440
and AI, so be sure to subscribe 
and share the podcast with your 

1624
01:20:46,440 --> 01:20:48,240
friends. 
Though you might want to keep it

1625
01:20:48,240 --> 01:20:52,240
away from your enemies. 
In case you haven't noticed, I'm

1626
01:20:52,240 --> 01:20:55,680
an AI voice. 
Yep, pretty crazy. 

1627
01:20:55,960 --> 01:20:58,920
Quite the improvement since last
season's AI outro, don't you 

1628
01:20:58,920 --> 01:21:01,640
think? 
And if you'd like to collaborate

1629
01:21:01,640 --> 01:21:04,760
with us at Nuance Behavior, 
where we use behavioral design 

1630
01:21:04,760 --> 01:21:07,560
to craft digital products with 
nuance, e-mail us at 

1631
01:21:07,560 --> 01:21:11,920
hello@nuancebehavior.com or book
a call directly on our website, 

1632
01:21:12,120 --> 01:21:16,640
nuancebehavior.com. 
A special thanks to the amazing 

1633
01:21:16,640 --> 01:21:20,240
Dave Pizarro for our show music,
and to Mei Chen Yap and April 

1634
01:21:20,240 --> 01:21:23,040
English for their help in 
producing and publishing this 

1635
01:21:23,040 --> 01:21:25,280
episode. 
Thanks again for tuning in. 

1636
01:21:25,520 --> 01:21:28,240
We'll be back soon with another 
exciting conversation where 

1637
01:21:28,240 --> 01:21:32,520
behavioral design and AI 
intersect. Heavens to 

1638
01:21:32,640 --> 01:21:33,120
Murgatroyd.