1
00:00:00,040 --> 00:00:03,640
We all know that icky feeling 
when your privacy is violated, 

2
00:00:04,000 --> 00:00:07,240
people feel like their phone is 
spying on them, and all these 

3
00:00:07,240 --> 00:00:09,600
things are happening, right? 
A lot of times when we talk 

4
00:00:09,600 --> 00:00:11,920
about privacy, there are a lot 
of technologies that could 

5
00:00:11,920 --> 00:00:15,880
expose us to this privacy risk.
We heard a lot in news, data

6
00:00:15,880 --> 00:00:18,440
breaches, data leaks, people's 
privacy. 

7
00:00:18,960 --> 00:00:21,120
Cambridge Analytica. 
Katharine Jarmul is the

8
00:00:21,120 --> 00:00:24,480
Principal Data Scientist at
ThoughtWorks and the author of 

9
00:00:24,480 --> 00:00:27,320
Practical Data Privacy. 
If you don't know what data you 

10
00:00:27,320 --> 00:00:30,880
have, if you don't know where it
lives, then you kind of end up 

11
00:00:30,880 --> 00:00:33,960
in a place where privacy is not 
possible. 

12
00:00:34,280 --> 00:00:36,360
You don't know what you're going
to use the data for. 

13
00:00:36,560 --> 00:00:39,760
You're probably just amassing 
large cloud computing fees for 

14
00:00:39,760 --> 00:00:42,640
no reason. 
Why can't we hold a higher bar 

15
00:00:42,840 --> 00:00:47,040
for organizations to create 
secure and private by design 

16
00:00:47,040 --> 00:00:49,880
services? 
Maybe you can help explain what is

17
00:00:49,880 --> 00:00:52,600
privacy by design? 
Privacy by design is actually a 

18
00:00:52,600 --> 00:00:55,920
quite old concept. 
There's seven design principles.

19
00:00:56,360 --> 00:00:59,480
You see a lot of software 
thinking allowing user choice, 

20
00:00:59,560 --> 00:01:04,200
allowing transparency, but also 
by default building things that 

21
00:01:04,239 --> 00:01:07,320
respect the user. 
And a lot of people associate 

22
00:01:07,440 --> 00:01:10,920
privacy with PII, personally 
identifiable information. 

23
00:01:11,240 --> 00:01:14,640
Do you think that privacy 
relates just to PII or is it 

24
00:01:14,640 --> 00:01:19,200
more than that? 
Privacy is a lot about having 

25
00:01:19,200 --> 00:01:37,000
the. 
Hey guys, welcome back to 

26
00:01:37,000 --> 00:01:39,320
another new episode of the 
Tech Lead Journal podcast.

27
00:01:39,320 --> 00:01:42,720
Today we are going to cover a 
topic that is becoming trendy in

28
00:01:42,720 --> 00:01:44,680
most parts of the world, data 
privacy. 

29
00:01:44,960 --> 00:01:48,360
I have Katharine Jarmul here.
She is the author of a book 

30
00:01:48,360 --> 00:01:52,520
titled Practical Data Privacy, 
Enhancing Privacy and Security 

31
00:01:52,520 --> 00:01:55,200
in Data. 
So Katharine has a lot of

32
00:01:55,200 --> 00:01:57,960
background expertise in machine 
learning and data science.

33
00:01:57,960 --> 00:02:00,840
But today we'll try to cover the
topics in more like generalist

34
00:02:00,840 --> 00:02:04,880
terms and maybe we can discuss
the importance of data privacy 

35
00:02:04,880 --> 00:02:07,200
and maybe some potential risks 
that you need to be aware of 

36
00:02:07,200 --> 00:02:09,160
whenever you build your projects
or products. 

37
00:02:09,280 --> 00:02:10,759
So, Katharine, welcome to the
show. 

38
00:02:11,680 --> 00:02:13,880
Thanks so much, Henry. 
I'm excited to be here. 

39
00:02:14,680 --> 00:02:17,640
Right, Katharine, I'd love to
ask my guests to maybe share 

40
00:02:17,640 --> 00:02:20,320
some turning points in your 
career that you think we all can

41
00:02:20,320 --> 00:02:23,120
learn from?
I think one of the turning 

42
00:02:23,120 --> 00:02:27,240
points in my career, the early 
ones, is I started my job in 

43
00:02:27,240 --> 00:02:31,200
technology as a data journalist,
or what we would today call

44
00:02:31,200 --> 00:02:34,520
data journalists. 
Kind of like how do we use data 

45
00:02:34,520 --> 00:02:38,360
and interactives to tell stories
and to support investigative 

46
00:02:38,360 --> 00:02:41,240
journalism. 
I was at the Washington Post 

47
00:02:41,240 --> 00:02:43,720
doing that. 
There was certainly a large 

48
00:02:43,720 --> 00:02:47,720
turning point where I left kind 
of the field of media and 

49
00:02:47,720 --> 00:02:50,040
journalism and I went into 
startups. 

50
00:02:50,320 --> 00:02:53,520
And at the time I went into 
startups that were focused on 

51
00:02:53,720 --> 00:02:57,040
media companies as a customer 
base. 

52
00:02:57,440 --> 00:03:00,600
And that led me into what I 
would today call large scale 

53
00:03:00,800 --> 00:03:04,840
natural language processing or 
very early models compared to 

54
00:03:04,840 --> 00:03:08,640
what we now call AI models. 
But still a lot of exposure to 

55
00:03:08,640 --> 00:03:12,560
how do we do language processing
at scale and a lot of exposure

56
00:03:12,560 --> 00:03:15,480
to thinking through things like 
parallel computation. 

57
00:03:15,840 --> 00:03:18,760
Those were the days of Hadoop 
and all these other types of 

58
00:03:18,760 --> 00:03:22,160
problems that came with how do 
we process large scale 

59
00:03:22,160 --> 00:03:26,360
document stores and use them
for some sort of other service. 

60
00:03:27,040 --> 00:03:32,760
And then around 2014 is when I 
moved from Los Angeles, California,

61
00:03:32,760 --> 00:03:36,680
where I was based to Berlin, 
Germany, where I now live since 

62
00:03:36,680 --> 00:03:40,600
then. 
And this was also the time when 

63
00:03:40,640 --> 00:03:44,880
NLP, so the space of language
processing was moving from more 

64
00:03:44,880 --> 00:03:47,960
simple model designs or what we 
would today maybe call more 

65
00:03:47,960 --> 00:03:52,680
simple model designs into deep 
learning, which basically fuels 

66
00:03:52,840 --> 00:03:55,160
a lot of the architectures that 
we talk about today when we

67
00:03:55,160 --> 00:03:59,440
talk about AI. 
And I basically took a year off 

68
00:03:59,440 --> 00:04:04,840
of working and just studied deep
learning at that time to kind of

69
00:04:04,840 --> 00:04:07,240
update. 
I already knew a lot of the math

70
00:04:07,240 --> 00:04:09,960
behind it because thankfully I 
always loved math. 

71
00:04:10,520 --> 00:04:14,040
But to really upskill myself 
from how do we think of the 

72
00:04:14,040 --> 00:04:16,160
problem from a statistical point
of view? 

73
00:04:16,160 --> 00:04:18,880
Statistical learning at the end 
of the day, deep learning is 

74
00:04:18,880 --> 00:04:21,760
also statistical learning. 
But how do we move from that 

75
00:04:21,760 --> 00:04:28,120
space to maybe a more kind of 
notion of applying concepts of 

76
00:04:28,120 --> 00:04:31,320
calculus and linear algebra to a
learning model? 

77
00:04:31,800 --> 00:04:34,640
And that was a good idea 
because, yeah, kind of deep 

78
00:04:34,640 --> 00:04:39,160
learning then ate most of the 
world of machine learning. 

79
00:04:39,720 --> 00:04:43,920
Then the final career change was
maybe about 3 or 4 years after 

80
00:04:43,920 --> 00:04:48,080
that of doing deep learning. 
I started thinking or became 

81
00:04:48,080 --> 00:04:51,440
enmeshed in the problem of kind 
of how do we think about what we

82
00:04:51,440 --> 00:04:55,240
would today call trustworthy
and responsible AI development? 

83
00:04:55,760 --> 00:04:59,560
Then most of the time we use the
concept of ethical machine 

84
00:04:59,560 --> 00:05:03,680
learning, and in looking into 
that problem, I became very 

85
00:05:03,680 --> 00:05:08,040
interested in data privacy and 
data security of machine 

86
00:05:08,040 --> 00:05:11,520
learning systems and models 
themselves. 

87
00:05:11,960 --> 00:05:15,800
And that kind of fueled what 
eventually led to writing the 

88
00:05:15,800 --> 00:05:19,920
book and obviously my current 
work, which is as a specialist 

89
00:05:19,920 --> 00:05:22,200
in privacy and security of machine
learning models. 

90
00:05:23,040 --> 00:05:24,600
Well, thank you for sharing your
story. 

91
00:05:24,600 --> 00:05:27,640
I find it quite interesting the 
journey that you had, right? 

92
00:05:27,920 --> 00:05:31,200
And I think it's very unique 
that you took a career break 

93
00:05:31,200 --> 00:05:34,000
simply to study deep learning 
and all that, right? 

94
00:05:34,280 --> 00:05:37,320
So most people maybe took a 
break to do something else, you 

95
00:05:37,320 --> 00:05:39,600
know, beyond work or some did go
study. 

96
00:05:39,600 --> 00:05:42,800
But I think it's still pretty 
rare to go deep into a certain 

97
00:05:42,800 --> 00:05:46,040
area and just to study and do 
maybe more things to get expert 

98
00:05:46,040 --> 00:05:48,520
on that. 
This episode is brought to you 

99
00:05:48,520 --> 00:05:52,720
by Swimm.io, and I'm excited
to have its CTO and co-founder

100
00:05:52,920 --> 00:05:56,360
Omer Rosenbaum with me today to
tell you more about Swimm.

101
00:05:56,440 --> 00:05:59,040
Hi Henry, very nice to meet you 
and thank you for having me. 

102
00:05:59,360 --> 00:06:02,160
So tell us a little bit more, 
what is Swimm.io?

103
00:06:02,320 --> 00:06:06,040
At Swimm, we want to help
companies understand their code 

104
00:06:06,040 --> 00:06:09,000
bases. 
We combine static code analysis 

105
00:06:09,040 --> 00:06:12,960
with generative AI to create 
comprehensive documents that 

106
00:06:12,960 --> 00:06:16,120
help you navigate the code base.
As an engineer myself, I 

107
00:06:16,120 --> 00:06:19,960
wouldn't want engineers to
spend so much time understanding

108
00:06:19,960 --> 00:06:22,400
existing code. 
I would want them to spend time 

109
00:06:22,600 --> 00:06:26,240
creating and building new stuff.
When you have code that has 

110
00:06:26,240 --> 00:06:30,840
accumulated over decades, and 
especially in legacy languages 

111
00:06:31,160 --> 00:06:36,640
that not many people are adept at
nowadays, then the problem is 

112
00:06:36,640 --> 00:06:40,080
even bigger. 
Swimm.io is specializing in

113
00:06:40,080 --> 00:06:42,600
helping mainframe developers to 
understand their code base. 

114
00:06:42,680 --> 00:06:45,640
Why mainframes? 
We actually didn't start there. 

115
00:06:45,720 --> 00:06:51,320
COBOL had been considered by some people
obsolete for a few years, and I 

116
00:06:51,320 --> 00:06:54,720
discovered that it's not really 
obsolete, not at all. 

117
00:06:54,720 --> 00:06:58,840
There are more than 800 billion 
lines of COBOL code that are in 

118
00:06:58,840 --> 00:07:01,680
production and they drive lots 
of the business in the world. 

119
00:07:02,360 --> 00:07:07,800
And we got more and more 
requests from customers to help 

120
00:07:07,800 --> 00:07:10,880
them understand the legacy code 
bases that were written decades

121
00:07:10,880 --> 00:07:14,360
ago and got accumulated over a 
very long period of time. 

122
00:07:14,560 --> 00:07:17,920
So from your customers so far, 
what are some of the success

123
00:07:17,920 --> 00:07:21,320
stories that you can share? 
So we worked with an analyst who

124
00:07:21,560 --> 00:07:25,960
shared with us that it took them
a year to document a single 

125
00:07:25,960 --> 00:07:29,240
mainframe application, and using
Swimm, they were able to document

126
00:07:29,240 --> 00:07:31,800
a similar application in a 
matter of hours. 

127
00:07:32,280 --> 00:07:36,160
So saving that amount of time 
enables them to focus on other 

128
00:07:36,400 --> 00:07:39,280
tasks. 
Thanks, Omer, for sharing with us

129
00:07:39,280 --> 00:07:42,040
about Swimm today.
To learn more about Swimm, check

130
00:07:42,040 --> 00:07:44,120
out their website at
swimm.io.

131
00:07:45,440 --> 00:07:48,520
So today's topic we are going to
cover about data privacy. 

132
00:07:48,520 --> 00:07:52,920
So I think in lots of parts of
the world, this topic or this 

133
00:07:52,920 --> 00:07:56,520
kind of thing has become like 
the mainstream topics, right? 

134
00:07:56,840 --> 00:08:01,200
We heard a lot in news, data 
breaches, you know, data leaks, 

135
00:08:01,600 --> 00:08:05,080
people's privacy, you know, the 
Cambridge Analytica kind of 

136
00:08:05,080 --> 00:08:07,360
thing, also as part of the 
privacy thing. 

137
00:08:07,720 --> 00:08:10,880
So maybe if you can give us an 
overview first, what is the 

138
00:08:10,880 --> 00:08:15,200
current landscape of the data 
privacy thing and what should 

139
00:08:15,200 --> 00:08:17,360
people know about it? 
Yeah. 

140
00:08:17,360 --> 00:08:22,080
I mean, a lot of times I try to 
orient people first with the

141
00:08:22,080 --> 00:08:27,600
idea of kind of personal or we 
can think of like sociological 

142
00:08:27,600 --> 00:08:31,840
privacy because sometimes people
hear data privacy and they just 

143
00:08:31,840 --> 00:08:35,440
think, oh, that's for lawyers. 
Like that's those people are 

144
00:08:35,440 --> 00:08:37,360
responsible. 
They studied privacy law.

145
00:08:37,679 --> 00:08:39,640
Those people are really, really 
important in the field of 

146
00:08:39,640 --> 00:08:41,320
privacy. 
I'm not trying to diminish the 

147
00:08:41,320 --> 00:08:45,760
importance of law and regulation
in the field of privacy, but I 

148
00:08:45,760 --> 00:08:49,640
think we can all relate to 
privacy because each one of us 

149
00:08:50,080 --> 00:08:53,960
probably has some sort of 
personal or what we might call 

150
00:08:53,960 --> 00:08:57,000
individual understanding of our 
own privacy. 

151
00:08:57,520 --> 00:09:02,040
And often that's informed by 
whatever cultural influences we 

152
00:09:02,040 --> 00:09:06,360
grew up around whatever society.
Like where did privacy play or 

153
00:09:06,360 --> 00:09:11,160
where did trust play in whatever
societal bonds we kind of grew 

154
00:09:11,160 --> 00:09:13,920
up with and were socialized
with or maybe even where we live

155
00:09:13,920 --> 00:09:16,200
now. 
So like, for example, now I live

156
00:09:16,200 --> 00:09:20,040
in Germany, which has a very different
relationship with privacy than 

157
00:09:20,280 --> 00:09:24,560
where I come from, California. 
And so we can also be influenced

158
00:09:24,560 --> 00:09:29,600
by kind of shifts or changes 
along our life that allow us to 

159
00:09:29,600 --> 00:09:34,280
see the problem differently. 
But essentially privacy is a lot

160
00:09:34,280 --> 00:09:40,280
about having the autonomy and 
the control to decide who I want

161
00:09:40,280 --> 00:09:44,840
to show up as and what I want to
share with whom under what 

162
00:09:44,840 --> 00:09:48,680
circumstances or contexts. 
And I think we all know that 

163
00:09:48,680 --> 00:09:52,800
there's been times probably in 
our lives when information that 

164
00:09:52,800 --> 00:09:56,160
we didn't want to share with a 
particular person or with groups

165
00:09:56,160 --> 00:10:00,280
of persons got out some which 
way, sometimes through 

166
00:10:00,280 --> 00:10:02,920
technology, sometimes through a 
person, maybe both. 

167
00:10:03,560 --> 00:10:07,520
And we all know that icky 
feeling when your privacy is 

168
00:10:07,520 --> 00:10:11,440
violated. 
And that's how it can also teach

169
00:10:11,440 --> 00:10:15,440
us how much it is about trust, 
how much it is about context, 

170
00:10:15,720 --> 00:10:18,960
how much it is about us having a
choice. 

171
00:10:18,960 --> 00:10:23,760
And I think that you can see 
kind of the social constructs in

172
00:10:23,760 --> 00:10:26,040
a lot of the regulation that you
then read. 

173
00:10:26,040 --> 00:10:29,840
Because then of course, we have 
regulatory or what I would call 

174
00:10:29,840 --> 00:10:32,640
like judicial legal 
understanding of privacy. 

175
00:10:32,960 --> 00:10:36,320
And then we have a technical 
understanding of privacy. 

176
00:10:36,320 --> 00:10:40,280
And kind of the best part is 
when all of those work together,

177
00:10:40,280 --> 00:10:43,400
when the technology is 
reflecting not only what we 

178
00:10:43,400 --> 00:10:47,400
legally need to do, but also 
what we socially and culturally 

179
00:10:47,400 --> 00:10:51,640
understand, and knowing that 
that can shift depending on all 

180
00:10:51,640 --> 00:10:55,200
sorts of different things down 
to the individual level. 

181
00:10:56,080 --> 00:10:57,960
I like the three things that you
mentioned, right? 

182
00:10:57,960 --> 00:11:00,960
First, it maybe relates to the
context and society, right? 

183
00:11:00,960 --> 00:11:04,360
So privacy in one particular 
area might be different and 

184
00:11:04,440 --> 00:11:07,240
maybe the norm of the culture in
some other parts of the world. 

185
00:11:07,560 --> 00:11:10,760
I also like the trust thing because
like most of the time when we 

186
00:11:10,760 --> 00:11:13,320
use software, right, the concept
of trust is a little bit 

187
00:11:13,320 --> 00:11:14,520
abstract. 
It's vague, right? 

188
00:11:14,600 --> 00:11:18,200
I know that I submit my details,
but that's not necessarily a trust

189
00:11:18,200 --> 00:11:20,720
being created or established 
when I use the software. 

190
00:11:21,120 --> 00:11:22,880
And the last one is about 
choice, right? 

191
00:11:22,880 --> 00:11:26,880
I do have a choice to actually 
maybe take out my data, delete 

192
00:11:26,880 --> 00:11:28,360
my data or whatever that is, 
right? 

193
00:11:28,360 --> 00:11:31,400
So I think those are the key 
terms that I just picked from 

194
00:11:31,400 --> 00:11:34,120
you what you explained just now.
And a lot of people actually 

195
00:11:34,120 --> 00:11:36,880
associate privacy with PII, 
right? 

196
00:11:36,960 --> 00:11:39,000
Personally identifiable 
information. 

197
00:11:39,280 --> 00:11:43,040
Do you think that privacy 
relates just to PII or is it 

198
00:11:43,040 --> 00:11:46,360
more than that? 
Yeah, I mean, PII is a great 

199
00:11:46,360 --> 00:11:50,680
place to start. 
I don't wanna PII bash here, but

200
00:11:51,000 --> 00:11:57,040
I think PII is a very small view
of data that we might think of 

201
00:11:57,040 --> 00:11:59,840
as what I like to call sensitive
data. 

202
00:12:00,400 --> 00:12:03,480
Within the field of sensitive 
data, of course we have PII. 

203
00:12:03,480 --> 00:12:05,880
That's things just for people 
who don't know that term. 

204
00:12:05,880 --> 00:12:09,520
That's things like e-mail 
addresses, birth dates, names, 

205
00:12:09,920 --> 00:12:13,040
things that we would say might 
be unique to an individual and 

206
00:12:13,040 --> 00:12:16,440
certainly in combination are 
unique to an individual. 

207
00:12:17,000 --> 00:12:20,680
And then we might have what I 
tend to call person related 

208
00:12:20,680 --> 00:12:23,040
data. 
Not everybody calls it that. 

209
00:12:23,040 --> 00:12:24,840
Some people want to call 
personal data. 

210
00:12:24,840 --> 00:12:28,160
Some people call it, yeah, data 
related to persons. 

211
00:12:28,160 --> 00:12:30,600
Anyways, there's all sorts of 
terms for it, especially when 

212
00:12:30,600 --> 00:12:33,560
you get into the legal realm. 
But I think of person related 

213
00:12:33,560 --> 00:12:38,080
data as like things like what I 
click on, what I buy can be 

214
00:12:38,080 --> 00:12:42,200
person related, what I enter
into search terms can be person 

215
00:12:42,200 --> 00:12:44,520
related. 
Because when we think of all 

216
00:12:44,520 --> 00:12:48,200
these things in combination with
each other, probably when we 

217
00:12:48,200 --> 00:12:52,680
combine them, those can also 
point to uniquely identifiable 

218
00:12:52,680 --> 00:12:56,400
characteristics of a person. 
And you already mentioned 

219
00:12:56,400 --> 00:12:59,200
Cambridge Analytica. 
Some of the original research 

220
00:12:59,200 --> 00:13:03,120
was focused on can we use things
like Facebook likes or very 

221
00:13:03,120 --> 00:13:07,760
short Facebook surveys or those 
combinations to decide how this 

222
00:13:07,760 --> 00:13:11,520
person might vote, whether 
they're easily influenced in 

223
00:13:11,520 --> 00:13:14,280
voting, and these types of 
characteristics which we 

224
00:13:14,280 --> 00:13:16,400
probably would think of as 
sensitive. 

225
00:13:17,120 --> 00:13:20,120
And then we also have other 
sensitive data that isn't really

226
00:13:20,120 --> 00:13:24,840
person related that we might 
call proprietary or confidential

227
00:13:24,840 --> 00:13:28,680
data. 
And a lot of times we don't 

228
00:13:28,680 --> 00:13:30,720
think of that when I think we 
think of privacy, because 

229
00:13:30,720 --> 00:13:33,240
obviously it doesn't really have
to do with privacy. 

230
00:13:33,520 --> 00:13:38,560
And yet the types of protections
that we use for privacy can also

231
00:13:38,560 --> 00:13:42,720
be applied to things like 
corporate secrets, other 

232
00:13:42,720 --> 00:13:46,000
proprietary confidential 
information that we also don't 

233
00:13:46,000 --> 00:13:48,760
want to share outside of a 
particular context. 

234
00:13:49,040 --> 00:13:52,680
And so these all kind of can 
fall under some idea of data 

235
00:13:52,680 --> 00:13:56,760
that we might need to protect in
a different way than other data.

236
00:13:57,880 --> 00:13:58,840
Nice. 
Thanks for all the 

237
00:13:58,840 --> 00:14:01,600
classifications of different 
person related, right? 

238
00:14:01,600 --> 00:14:04,200
It could be the sensitive, 
highly sensitive ones, it could 

239
00:14:04,200 --> 00:14:07,080
be just the person related, it 
could be like confidential data.

240
00:14:07,120 --> 00:14:09,160
So I think that totally makes 
sense, right? 

241
00:14:09,480 --> 00:14:12,120
So not necessarily just like 
person related, it could be 

242
00:14:12,120 --> 00:14:15,360
entity related. 
And I think a lot of times when 

243
00:14:15,360 --> 00:14:17,240
we talk about privacy these 
days, there are a lot of 

244
00:14:17,240 --> 00:14:20,960
technologies that could expose 
us to this privacy risk, right? 

245
00:14:21,280 --> 00:14:23,760
I mean, just to mention a few 
things like social media, 

246
00:14:23,760 --> 00:14:25,760
definitely. 
And then you have the web 

247
00:14:25,760 --> 00:14:28,680
technologies like cookies, you 
know, when you browse a certain 

248
00:14:28,680 --> 00:14:31,160
things, right? 
It tends to follow you from one 

249
00:14:31,160 --> 00:14:34,360
page, one website to the other. 
You have the connected device. 

250
00:14:34,440 --> 00:14:37,360
You know, we talk about, you 
know, this Alexa or, you know, 

251
00:14:37,440 --> 00:14:41,520
the Google Home thing, right? 
And lately also AI, right? 

252
00:14:41,520 --> 00:14:44,440
So people just, you know, ask AI
about certain stuff. 

253
00:14:44,440 --> 00:14:47,480
Sometimes it could take your 
data away. Maybe looking at

254
00:14:47,560 --> 00:14:50,360
those technologies,
what will be your advice for

255
00:14:50,360 --> 00:14:52,520
people when they use those 
technologies, right? 

256
00:14:52,520 --> 00:14:54,400
Because it's such a ubiquitous 
thing now. 

257
00:14:54,760 --> 00:14:58,360
What should people care about in
terms of their privacy? 

258
00:14:59,320 --> 00:15:02,920
Yeah, that's, that's a hard 
question because obviously 

259
00:15:02,920 --> 00:15:05,960
everybody might be different. 
And again, like we said, like 

260
00:15:05,960 --> 00:15:10,080
there's also different levels of
trust that people might have 

261
00:15:10,080 --> 00:15:14,040
with different services, right. 
So you might really, really like

262
00:15:14,040 --> 00:15:17,520
Copilot or whatever it is that 
you use every day and you might

263
00:15:17,520 --> 00:15:21,280
decide, you know what, the
usefulness of this is enough for

264
00:15:21,280 --> 00:15:25,480
whatever trade off that I have. 
At the end of the day, a lot of 

265
00:15:25,480 --> 00:15:29,840
my work is focused around 
informing organizations what 

266
00:15:29,840 --> 00:15:34,360
they should be doing to better 
protect the people's privacy 

267
00:15:34,360 --> 00:15:36,680
that use their services and 
products. 

268
00:15:36,680 --> 00:15:41,000
And I think we run into this 
problem and I think this happens

269
00:15:41,000 --> 00:15:45,240
a lot also in information 
security or cybersecurity where 

270
00:15:45,240 --> 00:15:50,120
we kind of decide that is a user
problem and that like it's your 

271
00:15:50,120 --> 00:15:54,360
job to figure out like what you 
think is the most secure e-mail 

272
00:15:54,360 --> 00:15:57,400
service and what is like the 
best thing. 

273
00:15:57,400 --> 00:16:01,120
And like, oh, you're bad if you 
use WhatsApp because WhatsApp's

274
00:16:01,120 --> 00:16:03,680
not secure, which is not 
necessarily true, right? 

275
00:16:04,000 --> 00:16:07,920
But this kind of culture, I
call it kind of like blaming the

276
00:16:07,920 --> 00:16:13,400
victim type of culture, where 
why is it that I, if I want to 

277
00:16:13,400 --> 00:16:18,360
talk to a person and they only 
use one particular messaging 

278
00:16:18,360 --> 00:16:23,320
application, why is it my job to
make sure to try to inform 

279
00:16:23,320 --> 00:16:26,560
people what is the most secure 
or not most secure? 

280
00:16:26,560 --> 00:16:32,160
And this why can't we hold a 
higher bar for organizations to 

281
00:16:32,160 --> 00:16:37,840
create secure and private by 
design services so that people 

282
00:16:37,840 --> 00:16:39,560
can choose whatever one they 
like. 

283
00:16:39,560 --> 00:16:41,120
If they like the colors, I don't
care. 

284
00:16:41,120 --> 00:16:44,280
They like the interface, they 
like whatever product they like.

285
00:16:44,280 --> 00:16:46,840
If they want to buy a package 
deal from a certain cloud 

286
00:16:46,840 --> 00:16:50,800
provider. If we're looking at
privacy more holistically, it's 

287
00:16:50,800 --> 00:16:54,040
less about like you must use 
this software. 

288
00:16:54,360 --> 00:16:59,080
Then we need to hold software 
companies and product companies 

289
00:16:59,080 --> 00:17:03,560
to a higher standard so that 
people can choose whatever it is

290
00:17:03,560 --> 00:17:07,560
they like, which I'm like very 
pro-choice on that. 

291
00:17:07,560 --> 00:17:12,800
One of obviously there's lots of
good advice out there how to 

292
00:17:12,800 --> 00:17:16,480
inform yourself, doing things 
like reading through the boring 

293
00:17:16,480 --> 00:17:20,280
privacy policies and figuring 
out how to make it a fun game 

294
00:17:20,280 --> 00:17:22,839
for yourself. 
There's plenty of good advice to

295
00:17:22,839 --> 00:17:25,800
inform yourself for how do you 
make those choices? 

296
00:17:26,200 --> 00:17:32,040
But I think it almost focuses 
like the blame on the individual

297
00:17:32,040 --> 00:17:34,800
who has to use things. 
And at the end of the day, I 

298
00:17:34,800 --> 00:17:38,640
want people to, yeah, be able to
connect and use technology in 

299
00:17:38,640 --> 00:17:44,480
cool ways and not have this very
privileged burden of reading 

300
00:17:44,480 --> 00:17:47,440
through every single privacy 
notice to figure out which best 

301
00:17:47,440 --> 00:17:49,280
aligns. 
And, oh, by the way, then it 

302
00:17:49,280 --> 00:17:51,960
gets updated six months later. 
They got to read through and 

303
00:17:51,960 --> 00:17:56,720
compare all of them over again. 
Mozilla has a great resource, by

304
00:17:56,720 --> 00:18:00,400
the way, for people who might be
looking for what criteria might 

305
00:18:00,400 --> 00:18:03,640
I use called Privacy Not 
Included. 

306
00:18:04,160 --> 00:18:07,920
And that reviews things like I 
think they did one on autos, 

307
00:18:07,920 --> 00:18:11,600
like automobiles. 
I think they did one on like 

308
00:18:11,600 --> 00:18:14,360
connected devices, home
assistants like you mentioned,

309
00:18:14,760 --> 00:18:16,960
and a few other things. 
And then at least you can look 

310
00:18:16,960 --> 00:18:20,320
like what criteria did they use 
and maybe start to build your 

311
00:18:20,320 --> 00:18:24,720
own criteria to inform your 
choices should you have the time

312
00:18:24,720 --> 00:18:28,160
and energy to do so. 
And there's no shame if you 

313
00:18:28,160 --> 00:18:31,160
don't because at the end of the 
day, we got to hold the orgs 

314
00:18:31,160 --> 00:18:33,200
accountable. 
Right. 

315
00:18:33,200 --> 00:18:36,040
So very interesting the way you 
explain that, right. 

316
00:18:36,040 --> 00:18:39,280
So it used to be, you know, like
it's a user's problem, right? 

317
00:18:39,280 --> 00:18:42,920
So the one who is responsible to
take care about the data that 

318
00:18:42,920 --> 00:18:45,400
you share. 
And I think we can read all this

319
00:18:45,400 --> 00:18:48,800
privacy policy and all that 
sometimes in most of in, I don't

320
00:18:48,800 --> 00:18:51,600
know most of it, but many 
software, but we don't have a 

321
00:18:51,600 --> 00:18:54,120
choice. 
We simply just have to check

322
00:18:54,400 --> 00:18:57,840
agree and you know, like, you 
know, whatever language that 

323
00:18:57,840 --> 00:18:59,480
they provide, we just agree, 
right. 

324
00:18:59,480 --> 00:19:01,720
Otherwise we can't use anything 
from the software at all. 

325
00:19:02,040 --> 00:19:05,480
So I think that's a very yeah, 
that's a very good segue now to 

326
00:19:05,480 --> 00:19:09,320
actually move into towards the 
organizations or the teams who 

327
00:19:09,320 --> 00:19:10,920
build the software and the 
products, right. 

328
00:19:11,240 --> 00:19:15,280
So the first thing is I came 
from some, you know, belief, 

329
00:19:15,280 --> 00:19:18,000
probably right last time, 
especially in the era of big 

330
00:19:18,000 --> 00:19:21,800
data, we have to collect as many
data as possible, you know, from

331
00:19:21,800 --> 00:19:25,880
the users so that we can derive 
insights, we can analyse things.

332
00:19:26,280 --> 00:19:29,160
And now this seems to be 
clashing, you know, with data 

333
00:19:29,160 --> 00:19:32,240
privacy. 
So in this era, these days, what

334
00:19:32,240 --> 00:19:35,320
would you advise people in terms
of balance, right, about 

335
00:19:35,320 --> 00:19:38,640
collecting as many data as 
possible versus, you know, 

336
00:19:39,000 --> 00:19:43,720
treating privacy more seriously?
I think, first off, I want to say

337
00:19:43,720 --> 00:19:46,440
like, I'm not trying to get 
anybody to lose their job. 

338
00:19:46,560 --> 00:19:50,680
So if any of the advice I gave 
you think would make you lose 

339
00:19:50,680 --> 00:19:54,720
your job, don't do it right. 
So there's some mini disclaimer.

340
00:19:56,240 --> 00:19:59,480
But I think I think as you 
mentioned, I think there's, 

341
00:19:59,680 --> 00:20:05,200
there's starting to be I think a
more popular shift into even 

342
00:20:05,200 --> 00:20:09,800
just consumers, so to speak, the
average person thinking more 

343
00:20:09,800 --> 00:20:13,360
about privacy. 
Namely because I think this 

344
00:20:13,360 --> 00:20:17,960
trend, as you mentioned, of the 
big data era of just collecting,

345
00:20:17,960 --> 00:20:22,000
you know, at will and doing all 
sorts of things that at the end 

346
00:20:22,000 --> 00:20:24,960
of the day make people feel 
creeped out, you know, make 

347
00:20:24,960 --> 00:20:28,800
people feel like their phone is 
spying on them and all these 

348
00:20:28,800 --> 00:20:31,360
things are happening, right? 
And in some cases that might be 

349
00:20:31,360 --> 00:20:34,760
true, and in other cases it 
might be a fact that we have so 

350
00:20:34,760 --> 00:20:39,200
much data that we can actually 
infer sensitive things about 

351
00:20:39,200 --> 00:20:43,080
people without their knowledge 
and also without our intention, 

352
00:20:43,200 --> 00:20:44,480
right? 
When we think about 

353
00:20:44,920 --> 00:20:48,560
recommendation algorithm 
development, search response 

354
00:20:48,560 --> 00:20:52,680
algorithm development, so search
algorithms and content 

355
00:20:52,680 --> 00:20:56,360
algorithms, when we think about 
a lot of this, there can be 

356
00:20:56,360 --> 00:20:58,840
latent, what we call latent 
variables. 

357
00:20:58,840 --> 00:21:01,920
And what I mean by that is, you 
know, because you follow and you

358
00:21:01,920 --> 00:21:06,360
like these five things, we kind 
of have learned your gender or 

359
00:21:06,360 --> 00:21:09,800
your age or your family status 
or these other things. 

360
00:21:10,160 --> 00:21:12,520
We didn't explicitly try to 
learn them, but we kind of 

361
00:21:12,520 --> 00:21:16,120
learned them by amassing, as you
say, so much data and then 

362
00:21:16,120 --> 00:21:19,840
putting an algorithm on top and 
not controlling for anything 

363
00:21:19,840 --> 00:21:22,480
like privacy. 
And then you get super creepy 

364
00:21:22,480 --> 00:21:26,720
ads and you're like convinced 
your phone is spying on you and 

365
00:21:26,800 --> 00:21:29,280
then you must hide it or you 
know what I mean? 

366
00:21:29,280 --> 00:21:32,000
Like, I don't want people to be 
afraid of, I don't think any of 

367
00:21:32,000 --> 00:21:33,920
us want people to be afraid of 
technology. 

368
00:21:33,920 --> 00:21:37,320
I also don't personally want to 
be afraid to have a mobile 

369
00:21:37,720 --> 00:21:39,920
phone. 
It's a nice thing to have. 

370
00:21:40,400 --> 00:21:45,320
And so when I think about like 
this shift that's happening, 

371
00:21:45,600 --> 00:21:48,480
maybe one of the things that you
can start talking about in an 

372
00:21:48,480 --> 00:21:52,080
organization that may or may not
already have a culture around 

373
00:21:52,080 --> 00:21:56,760
thinking about privacy is how do
we reason about the trust that 

374
00:21:56,760 --> 00:21:59,800
we're creating? 
And I think this is great for 

375
00:21:59,800 --> 00:22:04,000
tech leads, as you aptly point 
out, and product people, because

376
00:22:04,000 --> 00:22:06,720
at the end of the day, when 
you're like a technical lead or 

377
00:22:06,720 --> 00:22:10,920
a principal or a senior person 
in the type of architecture 

378
00:22:10,920 --> 00:22:15,360
decisions, product decisions, 
data decisions that get made and

379
00:22:15,360 --> 00:22:19,360
you're a top product person, 
those people can have a real 

380
00:22:19,360 --> 00:22:23,120
conversation about, do we want 
to talk with our customers about

381
00:22:23,120 --> 00:22:25,560
this problem of privacy? 
Do we want to talk with our 

382
00:22:25,560 --> 00:22:28,600
customers about what's the 
mental model that they think 

383
00:22:28,840 --> 00:22:32,480
about how our service works? 
What parts of this are up for 

384
00:22:32,480 --> 00:22:34,360
debate? 
What parts of this are not? 

385
00:22:34,360 --> 00:22:37,880
Because there are like indeed 
businesses that by default, 

386
00:22:38,720 --> 00:22:42,520
probably privacy is not going to
play a huge role, right? 

387
00:22:43,080 --> 00:22:45,840
But you can at least start that 
conversation. 

388
00:22:46,320 --> 00:22:49,320
And what we often talk about in 
the field of technical privacy 

389
00:22:49,320 --> 00:22:53,680
is balancing between the most 
privacy we can offer, which 

390
00:22:53,680 --> 00:22:58,000
would be collecting no data, and 
utility, right? 

391
00:22:58,000 --> 00:23:01,800
That's what we think of as the 
privacy-utility balance or trade-offs. 

392
00:23:02,240 --> 00:23:04,880
And utility 
there we might think of as this 

393
00:23:04,880 --> 00:23:08,880
old idea of we're going to 
collect every single mouse 

394
00:23:08,880 --> 00:23:14,600
movement or single character you
type in, in the hopes of getting 

395
00:23:14,600 --> 00:23:18,560
some sort of insight. 
Which I would also argue as a 

396
00:23:18,560 --> 00:23:21,280
data person, if you don't know 
what you're going to use the 

397
00:23:21,280 --> 00:23:25,440
data for, you're probably just 
amassing large cloud computing 

398
00:23:25,440 --> 00:23:29,200
fees for no reason. 
And so it's better to have 

399
00:23:29,200 --> 00:23:33,640
smaller experiments with less 
data and then to grow it over 

400
00:23:33,640 --> 00:23:35,760
time. 
Again, if you will lose your job

401
00:23:35,760 --> 00:23:39,760
for doing that, I'm sorry. 
And maybe you can find a new job

402
00:23:39,760 --> 00:23:43,520
one day. 
Right. 

403
00:23:43,840 --> 00:23:46,320
So I think you mentioned 
something very important, right?

404
00:23:46,440 --> 00:23:49,880
And many, many privacy experts 
also say the same thing, right? 

405
00:23:49,920 --> 00:23:53,320
If you don't know what you 
collect that data for, then 

406
00:23:53,320 --> 00:23:55,440
don't do it, right? 
Don't store it, don't do it, 

407
00:23:55,440 --> 00:23:58,240
don't ask for it, right? 
So I think that's also very, 

408
00:23:58,240 --> 00:24:01,440
very important, right, for 
everyone who builds software 

409
00:24:01,440 --> 00:24:03,480
these days, right? 
Especially the technologists, 

410
00:24:03,480 --> 00:24:05,200
right? 
And also when you build an AI 

411
00:24:05,200 --> 00:24:08,040
model, right? 
I mean, last time we also simply

412
00:24:08,040 --> 00:24:11,720
say for machine learning, it 
needs as many variables 

413
00:24:11,720 --> 00:24:14,000
as possible, right? 
So that it can train the model 

414
00:24:14,000 --> 00:24:15,560
better. 
But again, like, if you don't 

415
00:24:15,560 --> 00:24:18,320
know how you will want to use 
the data, maybe it's better not 

416
00:24:18,320 --> 00:24:20,360
to collect it. 
And this comes back to this 

417
00:24:20,360 --> 00:24:23,840
thing called privacy by design. 
I think it's mentioned a lot of 

418
00:24:23,840 --> 00:24:26,160
times, right? 
In security, there's secure by

419
00:24:26,160 --> 00:24:27,880
design. 
In privacy now there's the 

420
00:24:27,920 --> 00:24:30,880
privacy by design. 
So maybe you can help explain 

421
00:24:30,880 --> 00:24:33,600
what is privacy by design and 
how we should use it. 

422
00:24:34,480 --> 00:24:38,320
Yeah, Privacy by Design is 
actually like a quite old 

423
00:24:38,320 --> 00:24:41,760
concept, which is pretty cool. 
I think it was developed first 

424
00:24:41,760 --> 00:24:46,840
in the early 2000s by a 
researcher in Canada, Cavoukian, I

425
00:24:46,840 --> 00:24:48,440
don't know if I pronounced the 
name right. 

426
00:24:48,440 --> 00:24:53,880
Anyways, and she developed these
principles as part of research 

427
00:24:53,880 --> 00:24:56,840
I think she was working on, as 
somebody that both understood 

428
00:24:56,840 --> 00:25:01,800
some of the privacy regulatory 
aspects and then also the 

429
00:25:01,800 --> 00:25:04,240
software aspects. 
And when you read the original 

430
00:25:04,240 --> 00:25:07,280
privacy by design, I think 
there's seven design principles.

431
00:25:07,840 --> 00:25:11,360
You see a lot of software 
thinking. This is like really

432
00:25:11,360 --> 00:25:16,440
pre-data, pre-algorithms as like a 
normal part of software design. 

433
00:25:16,840 --> 00:25:19,560
And a lot of it talks about the 
same types of themes we're 

434
00:25:19,560 --> 00:25:23,920
talking about now, allowing user
choice, allowing transparency, 

435
00:25:24,280 --> 00:25:29,200
but also by default building 
things that respect the user, 

436
00:25:29,640 --> 00:25:33,360
that respect the fact that 
privacy does have all these, you

437
00:25:33,360 --> 00:25:38,160
know, is informed by context. 
And so if I put something in a 

438
00:25:38,160 --> 00:25:41,320
form, it doesn't necessarily 
mean I want that to be used to 

439
00:25:41,320 --> 00:25:45,320
train an algorithm, right? 
That we understand like how the 

440
00:25:45,320 --> 00:25:47,960
user is imagining what it is 
they're doing. 

441
00:25:48,280 --> 00:25:51,080
And that we're building 
technology that hopefully 

442
00:25:51,080 --> 00:25:54,680
mirrors that or if not, makes it
more obvious what is happening. 

443
00:25:55,520 --> 00:26:00,120
And then security principles, so
basic, not only application 

444
00:26:00,120 --> 00:26:02,480
security, but also 
infrastructure security 

445
00:26:02,480 --> 00:26:07,480
principles so that we don't end 
up exposing something that we've

446
00:26:07,480 --> 00:26:11,080
collected in trust, that we are
even using responsibly, 

447
00:26:11,320 --> 00:26:16,040
but then open the system up for 
potential attacks or other ways 

448
00:26:16,040 --> 00:26:20,240
of exfiltrating that data and 
using it for another purpose. 

449
00:26:20,800 --> 00:26:23,720
And this is where I think we can
take privacy by design. 

450
00:26:23,720 --> 00:26:26,080
We can use that to architect and
build our systems. 

451
00:26:26,400 --> 00:26:30,040
And then we can also evolve that
thinking for new things like 

452
00:26:30,320 --> 00:26:33,960
building large scale data 
systems, algorithmic systems. 

453
00:26:34,360 --> 00:26:38,280
And then we have a huge problem 
now around third party data 

454
00:26:38,280 --> 00:26:42,480
systems, you know, which 
I don't think are by any means 

455
00:26:42,600 --> 00:26:45,520
private by design. 
And so when we think of third 

456
00:26:45,520 --> 00:26:49,560
party, it can either be services
we're using and we are the 

457
00:26:49,560 --> 00:26:53,880
intermediary then between our 
users who maybe we have a direct

458
00:26:53,880 --> 00:26:56,200
relationship with and direct 
trust with. 

459
00:26:56,560 --> 00:26:59,880
And then we want to use a third 
party system, which is fine as 

460
00:26:59,880 --> 00:27:03,160
long as we're really clear and 
as long as we're actually 

461
00:27:03,160 --> 00:27:06,680
reviewing the third party 
vendors that we use for things 

462
00:27:06,680 --> 00:27:10,840
like privacy or the other way 
around where we're actually a 

463
00:27:10,840 --> 00:27:14,800
third party vendor, and we have zero 
access to the users. 

464
00:27:14,800 --> 00:27:17,600
And this is a lot of us that 
work in the B2B space. 

465
00:27:17,760 --> 00:27:21,240
We don't actually, I mean, 
usually the companies that we're

466
00:27:21,240 --> 00:27:24,360
interacting with, they might 
have customers or they might 

467
00:27:24,360 --> 00:27:26,640
even have customers who have 
customers, right? 

468
00:27:26,640 --> 00:27:30,760
And so how many layers until we 
get to a human who can make a 

469
00:27:30,760 --> 00:27:33,880
choice about whether they want 
their data used that way or not.

470
00:27:34,200 --> 00:27:38,120
And do we have a good review 
process to make sure that that 

471
00:27:38,200 --> 00:27:43,120
kind of chain of trust is 
verified and that it's mentally 

472
00:27:43,120 --> 00:27:45,720
sound both the way that we're 
dealing with these trust 

473
00:27:45,720 --> 00:27:49,440
relationships and also how we're
eventually communicating all the

474
00:27:49,440 --> 00:27:53,560
way down to a person who can 
say, you know what, I'd actually

475
00:27:53,560 --> 00:27:57,560
rather not. If I could opt out, 
I'd rather use this service, but

476
00:27:57,560 --> 00:28:00,960
without the interface to 
ChatGPT, for example. 

477
00:28:01,280 --> 00:28:04,120
And I think we kind of see this 
in some of the new products that

478
00:28:04,120 --> 00:28:09,280
come out that there's a little 
bit more transparency of, hey, 

479
00:28:09,280 --> 00:28:12,720
we built it this way, here's the
services we're using. 

480
00:28:13,000 --> 00:28:16,040
Do you want to use this 
particular feature or not? 

481
00:28:16,160 --> 00:28:19,240
At least, 
I see that a lot in the way that

482
00:28:19,240 --> 00:28:23,080
Apple is choosing to design some
pieces of Apple Intelligence and

483
00:28:23,080 --> 00:28:25,640
so forth, 
trying to have that 

484
00:28:25,640 --> 00:28:28,400
conversation in the open. 
Does it always go well? 

485
00:28:28,400 --> 00:28:31,840
Maybe not, but at least trying 
to start that conversation with 

486
00:28:31,840 --> 00:28:36,240
users in the open about how they
can better understand and 

487
00:28:36,240 --> 00:28:39,560
control the way that the data 
flows through the systems. 

488
00:28:39,560 --> 00:28:43,000
And I would call that like 
today's updated version of 

489
00:28:43,000 --> 00:28:44,760
thinking through privacy by 
design. 

490
00:28:45,880 --> 00:28:49,400
So after you mentioned about, 
you know, third parties and how 

491
00:28:49,400 --> 00:28:52,880
there are data intermediaries, 
or even people these days, we 

492
00:28:52,880 --> 00:28:55,400
use a lot of SaaS, right? 
I mean software product teams, 

493
00:28:55,400 --> 00:28:57,200
right? 
We use a lot of SaaS, you know, 

494
00:28:57,200 --> 00:29:00,120
maybe a managed database or 
serverless database and things 

495
00:29:00,120 --> 00:29:01,960
like that. 
So that kind of like implies the

496
00:29:01,960 --> 00:29:05,280
data can be stored in many 
other places and you know, 

497
00:29:05,280 --> 00:29:08,400
beyond our premises, right? 
So I think this also speaks 

498
00:29:08,400 --> 00:29:11,160
very, very true. 
The fact that sometimes if your 

499
00:29:11,160 --> 00:29:14,560
organization grows really large, 
it's very difficult to actually 

500
00:29:14,560 --> 00:29:17,920
know what data you have 
collected, where it resides, 

501
00:29:18,640 --> 00:29:21,920
the retention, you know, which 
people get access to it, right? 

502
00:29:22,120 --> 00:29:24,800
So tell us about the importance 
of this, you know, data 

503
00:29:24,800 --> 00:29:27,880
governance, because in many 
conversations I was in 

504
00:29:27,880 --> 00:29:31,280
about data privacy, I think the 
first step is always to know the

505
00:29:31,280 --> 00:29:34,200
inventory of your data, right? 
Govern that particular 

506
00:29:34,200 --> 00:29:36,400
inventory. 
So maybe tell us the importance 

507
00:29:36,400 --> 00:29:38,320
of data governance. 
Yeah. 

508
00:29:38,320 --> 00:29:41,800
And I think data governance is 
so critical even outside of 

509
00:29:41,800 --> 00:29:44,880
privacy. 
But you said it perfectly. 

510
00:29:44,880 --> 00:29:47,720
I mean, if you don't know what 
data you have, if you don't know

511
00:29:47,720 --> 00:29:51,160
where it lives, if you don't 
know what pathways it goes 

512
00:29:51,160 --> 00:29:53,400
through, like when we talk about
data engineering, if you don't 

513
00:29:53,400 --> 00:29:56,440
know where it's being processed,
how it's being processed, by 

514
00:29:56,440 --> 00:30:01,760
whom along the way, then you 
kind of end up in a place where 

515
00:30:01,760 --> 00:30:05,800
privacy is not possible to some 
degree, right? 

516
00:30:06,120 --> 00:30:10,800
Hopefully you've kind of, again,
I'll re-mention having a strong 

517
00:30:10,880 --> 00:30:15,160
practice of vendor review and 
vendor assessment because I 

518
00:30:15,160 --> 00:30:19,720
think like thinking through risk
always means can we create a 

519
00:30:19,720 --> 00:30:23,840
reasonable way that we as an 
organization want to think about

520
00:30:23,840 --> 00:30:26,520
what third parties we work with,
what intermediaries we work with,

521
00:30:26,520 --> 00:30:28,360
what SaaS, as you point out, we 
work with. 

522
00:30:28,840 --> 00:30:31,960
And there's going to be no 
magical right choice, but there 

523
00:30:31,960 --> 00:30:35,480
needs to be a commitment to some
sort of choice and perhaps some 

524
00:30:35,480 --> 00:30:38,720
thinking through of why it is 
that you choose one vendor over 

525
00:30:38,720 --> 00:30:42,160
another that relates to 
principles like privacy and data

526
00:30:42,160 --> 00:30:45,480
security. 
But then beyond that, if we 

527
00:30:45,480 --> 00:30:48,080
don't understand the data flows 
that are going through our 

528
00:30:48,080 --> 00:30:52,400
systems, if we don't have data 
properly tagged, like some 

529
00:30:52,400 --> 00:30:56,080
people might use tagging, other 
people use categorization, other

530
00:30:56,080 --> 00:30:58,960
people use all sorts of things. 
But if we don't have a 

531
00:30:58,960 --> 00:31:03,120
reasonable way of organizing the
quote-unquote data catalogs 

532
00:31:03,120 --> 00:31:07,360
that we have and data stores 
that we have, then we also don't

533
00:31:07,360 --> 00:31:10,520
have any reasonable way to think
through retention schedules. 

534
00:31:10,920 --> 00:31:14,840
What is a retention schedule? 
It means that we say that we 

535
00:31:14,840 --> 00:31:19,080
collect data for a certain 
purpose for a given timeline. 

536
00:31:19,960 --> 00:31:23,680
Maybe we collect purchase data 
for every customer, right? 

537
00:31:23,680 --> 00:31:25,480
They go to our site, they buy 
something. 

538
00:31:25,840 --> 00:31:28,080
Or maybe we hold other people's 
purchase data. 

539
00:31:28,080 --> 00:31:31,040
Doesn't matter. 
The purchase data exists, right?

540
00:31:31,240 --> 00:31:34,640
And maybe we say we retain 
purchase data for as long as 

541
00:31:34,640 --> 00:31:39,120
you're a customer or for up to 
two years after you close your 

542
00:31:39,120 --> 00:31:42,760
account or something like this. 
Well, how do we even enforce 

543
00:31:42,760 --> 00:31:44,680
that? 
If we haven't bothered to like 

544
00:31:44,960 --> 00:31:48,560
connect, then somebody has to 
write like some really nasty SQL

545
00:31:48,560 --> 00:31:51,000
query. 
It may or may not work if 

546
00:31:51,000 --> 00:31:55,360
somebody did bad data entry. 
And this all relates to better 

547
00:31:55,360 --> 00:31:58,520
data governance, which relates 
to things like data cataloging, 

548
00:31:58,760 --> 00:32:03,760
but also things like thinking 
through data quality metrics and

549
00:32:03,760 --> 00:32:07,040
how do we ensure that we're 
actually getting value out of 

550
00:32:07,040 --> 00:32:10,840
the data we collect. 
And what I will say is if you 

551
00:32:10,840 --> 00:32:14,000
improve trust relationships with
your customers, you will get 

552
00:32:14,000 --> 00:32:17,400
better data quality. 
Like that's by default proven 

553
00:32:17,400 --> 00:32:20,800
time and time again. 
And so if you're just spamming, 

554
00:32:21,040 --> 00:32:24,520
collecting whatever you can, 
especially through some sort of 

555
00:32:24,520 --> 00:32:28,120
intermediary, you're probably 
getting pretty low quality data.

556
00:32:28,480 --> 00:32:32,680
If the signal that you're trying
to measure is something like is 

557
00:32:32,680 --> 00:32:37,320
this customer interested in ABC 
and you have a good relationship

558
00:32:37,320 --> 00:32:40,920
with a customer, then you could 
just directly ask them, right? 

559
00:32:41,160 --> 00:32:44,280
And then you have both high data
quality and high trust, you've 

560
00:32:44,280 --> 00:32:48,080
created a good privacy and 
security relationship with your 

561
00:32:48,080 --> 00:32:51,440
customer, and you know that the 
data that you're storing is 

562
00:32:51,440 --> 00:32:55,160
actually accurate. 
So I think in the end, these all

563
00:32:55,160 --> 00:32:58,280
end up supporting each other. 
And I just want to say it is not

564
00:32:58,280 --> 00:33:03,320
a trivial undertaking to do data
governance at scale and most 

565
00:33:03,320 --> 00:33:06,160
organizations are not doing it 
very well. 

566
00:33:06,160 --> 00:33:09,320
They're just kind of starting 
right now, I think, for a lot of

567
00:33:09,320 --> 00:33:13,880
organizations. 
So it's OK to decide, OK, we 

568
00:33:13,880 --> 00:33:16,960
want our next few years to be a 
goal of having better data 

569
00:33:16,960 --> 00:33:21,760
governance in which we can also 
have better privacy and data 

570
00:33:21,760 --> 00:33:23,760
quality. 
Yeah. 

571
00:33:23,760 --> 00:33:26,400
So I could imagine companies 
that have been around for years,

572
00:33:26,400 --> 00:33:29,920
right, having production data 
serving lots of customers. 

573
00:33:30,120 --> 00:33:32,600
Definitely it will be a 
challenge, right? Especially, 

574
00:33:33,000 --> 00:33:34,880
you know, these days also people
work remotely. 

575
00:33:34,880 --> 00:33:37,640
They work, you know, using the 
Internet, you know, something 

576
00:33:37,640 --> 00:33:41,440
got downloaded from the system 
into the device, sent over to 

577
00:33:41,440 --> 00:33:44,560
WhatsApp or something like that.
Or you have data pipelines, you 

578
00:33:44,560 --> 00:33:47,440
know, data warehouse where data 
moves from one side to the 

579
00:33:47,440 --> 00:33:49,120
other. 
I think it's really, really 

580
00:33:49,120 --> 00:33:50,880
challenging. 
I would, I would just empathize.

581
00:33:51,040 --> 00:33:54,320
I also have this kind of problem
as well to catalog knowing the 

582
00:33:54,320 --> 00:33:57,880
lineage, right, govern whether 
it's sensitive, not sensitive 

583
00:33:57,960 --> 00:34:00,680
parts of it, whether it can be 
shared with many other people 

584
00:34:00,680 --> 00:34:02,400
or not. 
But I think it's definitely a 

585
00:34:02,400 --> 00:34:04,320
challenge, but I think it will 
take some time. 

586
00:34:04,400 --> 00:34:07,720
If we build awareness, I'm sure 
maybe we can improve the data 

587
00:34:07,720 --> 00:34:10,560
governance that we have. 
So maybe let's go into the 

588
00:34:10,560 --> 00:34:12,800
techniques. 
How can we actually improve our 

589
00:34:12,800 --> 00:34:15,760
privacy practice, right? 
I think the first that is always

590
00:34:15,840 --> 00:34:18,639
mentioned is about, you know, 
pseudonymization or 

591
00:34:18,639 --> 00:34:20,639
anonymization or data masking, 
right? 

592
00:34:21,040 --> 00:34:23,840
So tell us maybe for people who 
are not familiar with these 

593
00:34:23,840 --> 00:34:26,199
terms, what are those 
techniques, right? 

594
00:34:26,320 --> 00:34:28,520
And how can we apply that 
practically? 

595
00:34:29,320 --> 00:34:32,480
Yeah. 
I mean, I want to add a mention 

596
00:34:32,480 --> 00:34:36,639
here that is maybe kind of also 
helps bridge data governance to 

597
00:34:36,639 --> 00:34:39,080
like actually implementing what 
we might call as like 

598
00:34:39,080 --> 00:34:43,520
mitigations or controls. 
And that is, these are not 

599
00:34:43,520 --> 00:34:47,800
decisions to be made in a vacuum
of like what controls get 

600
00:34:47,800 --> 00:34:51,600
applied where or how do we 
decide what we want to do with 

601
00:34:51,600 --> 00:34:53,719
what data? 
How do we even decide what 

602
00:34:54,000 --> 00:34:58,800
sensitivity a data has? 
Hopefully your organization has 

603
00:34:58,800 --> 00:35:04,160
the ability to create a data 
governance board or practice or 

604
00:35:04,160 --> 00:35:08,240
even a data privacy practice 
where there could be multiple 

605
00:35:08,240 --> 00:35:10,600
disciplines of people in the 
room. 

606
00:35:10,680 --> 00:35:13,200
Because I think you have to have
people that are business 

607
00:35:13,200 --> 00:35:16,360
informed that know what are the 
business goals of why we're 

608
00:35:16,360 --> 00:35:20,440
trying to do this thing. 
Data informed, software and 

609
00:35:20,440 --> 00:35:25,360
tech, tech informed, and then of
course, regulatory informed and 

610
00:35:25,360 --> 00:35:27,720
privacy informed. 
So you can also hire privacy 

611
00:35:27,720 --> 00:35:31,760
professionals who are not legal 
experts, but instead focus on 

612
00:35:31,760 --> 00:35:33,760
the topic of like privacy by 
design. 

613
00:35:34,040 --> 00:35:36,520
So if you have these people 
together, maybe also somebody 

614
00:35:36,520 --> 00:35:40,440
from Infosec, you can start to 
say, you know what we think the 

615
00:35:40,440 --> 00:35:44,160
biggest challenges for our data 
governance or our data privacy 

616
00:35:44,160 --> 00:35:48,280
or our data security, or all 
three, are these top three, some 

617
00:35:48,280 --> 00:35:50,120
new systems that we want to 
develop. 

618
00:35:50,520 --> 00:35:53,560
So we're going to put that on 
kind of our collective road map 

619
00:35:53,560 --> 00:35:56,320
for the year and we're going to 
prioritize those. 

620
00:35:56,840 --> 00:36:00,200
And I think then you start 
thinking once you can understand

621
00:36:00,200 --> 00:36:03,360
what's the risk space that you 
have, so what are the biggest 

622
00:36:03,360 --> 00:36:07,640
problems that you have with the 
data privacy, then you can start

623
00:36:07,640 --> 00:36:09,960
thinking about, oh, do we need 
to pseudonymize? 

624
00:36:09,960 --> 00:36:13,520
Do we need to anonymise? 
Do we need to run some other 

625
00:36:13,520 --> 00:36:16,440
type of setup? 
And then we get into the fun, 

626
00:36:16,440 --> 00:36:20,320
fun bits, which would all fall
under kind of these types of 

627
00:36:20,320 --> 00:36:24,760
privacy controls or start to go 
into what we would call privacy 

628
00:36:24,760 --> 00:36:28,040
enhancing technologies, which 
are different technologies that 

629
00:36:28,040 --> 00:36:31,600
we can use to help meet these 
goals of privacy at our 

630
00:36:31,600 --> 00:36:35,040
organization. 
And most of us have seen 

631
00:36:35,040 --> 00:36:39,440
pseudonymization at some point 
in time. That can take many 

632
00:36:39,440 --> 00:36:42,640
forms. 
It can be simple masking, like 

633
00:36:42,640 --> 00:36:46,200
maybe you've used one of the 
popular libraries, Microsoft 

634
00:36:46,200 --> 00:36:52,600
Presidio to try to identify 
things like PII and mask them. 

635
00:36:53,080 --> 00:36:56,960
Masking can take several forms. 
You probably already use some 

636
00:36:56,960 --> 00:37:01,440
hashing mechanisms or something 
like this, maybe a one way hash 

637
00:37:01,440 --> 00:37:04,120
or maybe a two way like a 
reversible hash. 

638
00:37:04,400 --> 00:37:07,200
These can be ways to pseudonymize 
information. 

639
00:37:07,680 --> 00:37:10,960
You've probably 
already used redaction. 

640
00:37:11,240 --> 00:37:15,800
So just simply like removing 
certain types of sensitive data 

641
00:37:15,800 --> 00:37:21,520
from, let's say, a reporting tool or
a BI dashboard to say, OK, now 

642
00:37:21,520 --> 00:37:26,120
this is ready for organization 
wide consumption or this is 

643
00:37:26,120 --> 00:37:28,920
ready for our marketing report 
or whatever it is. 

644
00:37:28,920 --> 00:37:32,440
So like the data is scheduled 
for release publicly or at least

645
00:37:32,600 --> 00:37:38,200
semi publicly within an org and 
any of these things are small 

646
00:37:38,200 --> 00:37:42,480
privacy mitigations that you do.
And then those can lead up to 

647
00:37:42,480 --> 00:37:47,680
more serious privacy mitigations
like anonymization or even 

648
00:37:47,680 --> 00:37:51,160
thinking through things like 
local first data processing, 

649
00:37:51,160 --> 00:37:54,720
distributed data or encrypted 
computation. 

650
00:37:54,720 --> 00:37:58,280
So it gets very space-age the 
further you want to go. 

651
00:37:58,600 --> 00:38:01,320
And those only fit certain types
of problems right. 

652
00:38:01,320 --> 00:38:04,640
So first you have to shape the 
problem so you can know as every

653
00:38:04,640 --> 00:38:06,160
developer knows it, not to say 
it. 

654
00:38:06,360 --> 00:38:08,760
If you don't know what problem 
you're trying to do, then just 

655
00:38:08,760 --> 00:38:11,840
putting a technology in it is 
unfortunately not going to solve

656
00:38:11,840 --> 00:38:14,720
the problem. 
Yeah, that's very, very good 

657
00:38:14,720 --> 00:38:16,760
advice, right. 
So I like that in the beginning 

658
00:38:16,760 --> 00:38:19,680
you mentioned that it's not 
something for, like, one 

659
00:38:19,680 --> 00:38:22,960
particular team or you know, 
person to decide, right? 

660
00:38:22,960 --> 00:38:25,800
So it's like a multi 
departmental kind of effort, 

661
00:38:25,800 --> 00:38:27,920
right? 
And there are many departments 

662
00:38:27,920 --> 00:38:30,520
that should be involved, right? 
You mentioned a few that are 

663
00:38:30,560 --> 00:38:33,080
very important, right? 
It could be the privacy, you 

664
00:38:33,080 --> 00:38:35,400
know, department, the 
technology, definitely the 

665
00:38:35,400 --> 00:38:38,320
product, right? 
And maybe the legal side, InfoSec,

666
00:38:38,320 --> 00:38:40,800
definitely. Data and 
security are kind of like 

667
00:38:40,800 --> 00:38:43,080
interrelated with each other and
there could be many other 

668
00:38:43,080 --> 00:38:44,960
parties, right? 
And especially in the 

669
00:38:45,040 --> 00:38:48,160
organization, I think it's very 
important to have this so-called

670
00:38:48,160 --> 00:38:50,560
decision making process to 
actually identify whether 

671
00:38:50,560 --> 00:38:52,240
something is sensitive or not 
sensitive. 

672
00:38:52,840 --> 00:38:55,280
So I think during the 
explanation just now, you 

673
00:38:55,280 --> 00:38:58,160
mentioned this term that is 
quite trendy these days, privacy

674
00:38:58,160 --> 00:39:02,160
enhancing technologies. 
I think still, in tech, we all love, 

675
00:39:02,160 --> 00:39:04,840
you know, to know, like what are
the technologies out there so 

676
00:39:04,840 --> 00:39:07,160
that we can kind of like apply 
and try it. 

677
00:39:07,360 --> 00:39:10,480
So what are some of the privacy 
enhancing technologies that are,

678
00:39:10,560 --> 00:39:13,280
you know, like quite modern 
these days that people are using

679
00:39:13,280 --> 00:39:17,040
to solve this privacy thing? 
I think there's been some really

680
00:39:17,040 --> 00:39:20,280
cool developments, I would say, 
over the past 10 years. 

681
00:39:20,760 --> 00:39:25,040
And I think the ones I'm most 
excited about are also the ones 

682
00:39:25,040 --> 00:39:28,600
that show up in the book that I 
know you're now a bit familiar 

683
00:39:28,600 --> 00:39:33,200
with, which is technologies like
differential privacy, which is a

684
00:39:33,200 --> 00:39:37,280
way that we can reason 
technically about privacy. 

685
00:39:37,600 --> 00:39:42,320
I like to call it for people the
idea of measurable privacy or 

686
00:39:42,320 --> 00:39:46,040
rigorous privacy, because we're 
using a scientific way of 

687
00:39:46,040 --> 00:39:49,000
thinking about or trying to 
shape the problem and then 

688
00:39:49,000 --> 00:39:53,080
trying to decide that. 
And that works well for things 

689
00:39:53,080 --> 00:39:56,520
like if you have to release data
publicly, if you want to think 

690
00:39:56,520 --> 00:39:58,560
through something like 
anonymization. 

691
00:39:59,080 --> 00:40:02,320
And it can work in many 
different types of use cases, 

692
00:40:02,320 --> 00:40:06,560
but it can never work if you're 
trying to say customer A wants 

693
00:40:06,560 --> 00:40:09,560
B. 
It has to work in some form of 

694
00:40:09,560 --> 00:40:12,680
aggregation of an idea or a 
person. 

695
00:40:13,120 --> 00:40:16,920
But of course, when we think of 
anonymization, we cannot ever 

696
00:40:16,920 --> 00:40:20,080
say that we release something 
anonymized that could be tracked

697
00:40:20,080 --> 00:40:23,800
back to one individual, because 
then it's not anonymized, right,

698
00:40:23,800 --> 00:40:27,000
By definition. 
So I think differential privacy 

699
00:40:27,000 --> 00:40:30,720
is really cool technology, lots 
of different approaches, lots of

700
00:40:30,720 --> 00:40:34,640
different algorithms to think 
through, and also an increasing 

701
00:40:34,640 --> 00:40:37,480
number of really cool open 
source libraries that allow you 

702
00:40:37,480 --> 00:40:40,600
to think through differential 
privacy from a development 

703
00:40:40,640 --> 00:40:44,200
context. 
Then probably the next most 

704
00:40:44,200 --> 00:40:48,560
famous one of recent years has 
been Federated learning or 

705
00:40:48,560 --> 00:40:52,000
Federated analytics. 
Sometimes I like to call it more

706
00:40:52,000 --> 00:40:55,840
distributed learning, and it 
also relates to what I would 

707
00:40:55,840 --> 00:40:58,640
call the field of local first 
software. 

708
00:40:59,000 --> 00:41:02,200
I don't know if you've already 
had local first software experts

709
00:41:02,200 --> 00:41:06,800
on your show, but Martin 
Kleppmann, who's famous for 

710
00:41:06,800 --> 00:41:08,720
the book? 
Nice. 

711
00:41:09,880 --> 00:41:11,360
Martin Kleppmann's book is very 
famous. 

712
00:41:11,880 --> 00:41:16,080
Sorry. 
Data-Intensive Applications. 

713
00:41:16,080 --> 00:41:18,760
Something like that. 
Yes, yes, yes. 

714
00:41:18,880 --> 00:41:24,360
Thank you. 
And he and a group of folks that

715
00:41:24,360 --> 00:41:27,120
have kind of started this 
movement, I think it probably 

716
00:41:27,120 --> 00:41:29,720
started before them, but have 
popularized this movement of 

717
00:41:29,720 --> 00:41:34,120
local first data, which is more 
thinking through kind of exactly

718
00:41:34,120 --> 00:41:38,520
what we were talking about 
before, which is if we can now 

719
00:41:38,520 --> 00:41:41,720
we have devices, we have edge 
devices, we also have edge 

720
00:41:41,720 --> 00:41:46,680
compute like mobile devices. 
And certainly within whatever 

721
00:41:46,680 --> 00:41:49,560
AWS cluster you run, you have 
plenty of compute. 

722
00:41:50,160 --> 00:41:56,600
So can we push processing as far
down to the edge as we need? 

723
00:41:56,840 --> 00:42:00,200
And therefore can we push the 
data as far down to the edge as 

724
00:42:00,200 --> 00:42:03,640
we need? 
So this might mean what we call 

725
00:42:03,640 --> 00:42:07,480
cross silo learning or cross 
silo analytics, which says 

726
00:42:07,760 --> 00:42:11,320
you're a multinational company 
or you're in some sort of B to B

727
00:42:11,320 --> 00:42:15,760
situation and you allow every 
company to at least house their 

728
00:42:15,760 --> 00:42:20,000
data within their own, you know,
realm premise, so to speak. 

729
00:42:20,000 --> 00:42:22,560
So within their own boundaries 
of their cloud system or 

730
00:42:22,560 --> 00:42:24,000
whatever it is that they're 
using. 

731
00:42:24,360 --> 00:42:28,200
And they don't actually exchange
data, they exchange some sort of

732
00:42:28,200 --> 00:42:31,240
analysis or output. 
So this could be machine 

733
00:42:31,240 --> 00:42:33,360
learning processing. 
This could just be simple 

734
00:42:33,360 --> 00:42:36,280
analytics, but we kind of try to
push it there. 
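
The cross-silo idea of exchanging an analysis instead of the data can be sketched roughly as follows. This is a single-process illustration under stated assumptions: the class and function names and the sample numbers are hypothetical, and a real deployment would run each local step inside a different organization's own infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """The only thing a silo shares: an aggregate, never raw rows."""
    total: float
    count: int


def summarize_locally(values) -> LocalSummary:
    """Runs inside each organization's own boundary."""
    values = list(values)
    return LocalSummary(total=sum(values), count=len(values))


def combine(summaries) -> float:
    """Runs at the coordinator, which only ever sees summaries."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no data reported by any silo")
    return total / count


# Hypothetical per-company values that never leave each silo.
silo_a = [10.0, 20.0, 30.0]
silo_b = [40.0, 50.0]
global_mean = combine([summarize_locally(silo_a), summarize_locally(silo_b)])
print(global_mean)  # 30.0
```

The coordinator gets the same global mean it would have computed from pooled data, but it never holds the underlying rows.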

735
00:42:36,720 --> 00:42:40,160
Or it could go all the way down 
to I'm building a mobile 

736
00:42:40,160 --> 00:42:43,680
application, I'm going to keep 
all of the data local and I'm 

737
00:42:43,680 --> 00:42:46,040
only going to send certain 
artifacts 

738
00:42:46,080 --> 00:42:49,120
that I'm going to push back up 
to some sort of centralized 

739
00:42:49,120 --> 00:42:52,840
compute or some sort of 
redundancy compute, right? 

740
00:42:53,160 --> 00:42:56,480
And depending on how that works,
you could even offer things like

741
00:42:56,480 --> 00:42:59,280
end to end encryption and then 
other types of things. 

742
00:42:59,720 --> 00:43:03,000
Why would you go to this bother?
Well, you go to this bother 

743
00:43:03,000 --> 00:43:07,360
because, A, you're greatly 
reducing your exposure risk, 

744
00:43:07,480 --> 00:43:09,200
right? 
You actually don't have 

745
00:43:09,200 --> 00:43:12,240
centralized data. 
You just have some centralized 

746
00:43:12,240 --> 00:43:15,440
insights. 
So how attractive is it to try 

747
00:43:15,440 --> 00:43:19,640
to hack your system? Much less so, 
the less data that you have. 

748
00:43:19,920 --> 00:43:23,560
And B, if you can get the same 
insights, you're also saving a 

749
00:43:23,560 --> 00:43:26,920
huge amount on cloud compute 
because again, if you're like, 

750
00:43:27,160 --> 00:43:30,880
if every ping, every two seconds
I open it, you're just checking 

751
00:43:30,880 --> 00:43:34,680
whether I'm still at my house. 
Like, cool. 

752
00:43:35,000 --> 00:43:37,680
I mean, not so cool. 
I don't think so. But you know, 

753
00:43:37,760 --> 00:43:43,440
you're wasting a whole bunch of 
network compute and storage to 

754
00:43:43,680 --> 00:43:47,840
just find out that, guess what, 
between the hours of, you know, 

755
00:43:47,840 --> 00:43:51,120
9:00 PM and whatever time in the
morning, I'm at my house, right?

756
00:43:51,360 --> 00:43:54,440
When in aggregate analytics, you 
probably could have already 

757
00:43:54,440 --> 00:43:57,480
figured out that is, you know, 
80% of your user base or 

758
00:43:57,480 --> 00:43:59,320
something like this. 
I don't really know why you need

759
00:43:59,320 --> 00:44:02,280
to know that, but I just give 
you an example, a rough example 

760
00:44:02,800 --> 00:44:06,800
of how differently we could 
engineer, architect our systems 

761
00:44:07,040 --> 00:44:09,680
if we thought through again this
question of what data you 

762
00:44:09,680 --> 00:44:13,000
actually need. 
And can I do that in some sort 

763
00:44:13,000 --> 00:44:17,280
of privacy respecting manner so 
that most of the data stays far 

764
00:44:17,280 --> 00:44:20,640
away from some sort of 
centralized setup and certainly 

765
00:44:20,640 --> 00:44:24,280
far away from some sort of data 
sharing setup where companies 

766
00:44:24,280 --> 00:44:27,280
are just exchanging sensitive 
data at scale. 

767
00:44:28,040 --> 00:44:31,360
And then that leads to the third
type of privacy enhancing 

768
00:44:31,360 --> 00:44:33,800
technology I'm personally really
excited about, which is 

769
00:44:33,800 --> 00:44:37,320
encrypted computation or people 
might have heard about that 

770
00:44:37,320 --> 00:44:41,880
under terms like homomorphic 
encryption or secure multi 

771
00:44:41,880 --> 00:44:45,960
party computation. 
And this essentially enables a 

772
00:44:45,960 --> 00:44:50,920
whole bunch of use cases based 
on cryptography, which allows 

773
00:44:50,920 --> 00:44:54,960
you to compute insights on data 
without decrypting it. 

774
00:44:55,360 --> 00:44:58,200
Of course, then eventually 
somebody will decrypt it. 

775
00:44:58,200 --> 00:45:01,840
So the insight gets decrypted 
and used at some point in time, 

776
00:45:02,440 --> 00:45:07,480
but the actual processing of the
data can be done in an encrypted

777
00:45:07,480 --> 00:45:10,960
state. 
And that provides a quality that

778
00:45:10,960 --> 00:45:14,800
we like to call secrecy, which 
is different than privacy. 

779
00:45:15,200 --> 00:45:19,360
But secrecy we can use to say, 
you know what, in our cloud 

780
00:45:19,360 --> 00:45:24,480
compute we only ever computed in
encrypted form, and then we move back 

781
00:45:24,480 --> 00:45:28,160
to your device where you 
personally decrypted it. 

782
00:45:28,400 --> 00:45:33,560
And so by the design that we 
have offered, we actually never 

783
00:45:33,560 --> 00:45:37,600
saw the data in an unencrypted 
state except once it was on 

784
00:45:37,600 --> 00:45:41,280
device or in a data sharing 
landscape. 

785
00:45:41,520 --> 00:45:45,560
We actually only ever process 
data with another company by 

786
00:45:45,560 --> 00:45:49,720
keeping that data encrypted so 
that neither company could learn

787
00:45:49,720 --> 00:45:52,600
anything other than the final 
released results. 
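
One way to picture "nobody learns anything other than the final released result" is additive secret sharing, a basic building block of secure multi-party computation. The sketch below is an assumption-laden toy: it simulates three parties in a single process, the modulus and function names are chosen only for illustration, and it is not homomorphic encryption or a production protocol.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic; an assumption of this sketch


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    n = len(private_inputs)
    # Each party splits its input and sends share j to party j.
    all_shares = [share(v, n) for v in private_inputs]
    # Party j adds up the shares it received; each partial sum looks random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the combined partial sums reveal anything: the final total.
    return sum(partial_sums) % PRIME


# Hypothetical private figures held by three companies.
print(secure_sum([120, 45, 300]))  # 465
```

Each individual share is a uniformly random number, so the intermediate values stay under the "secrecy blanket"; only the agreed final total is reconstructed.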

788
00:45:52,600 --> 00:45:56,720
So any intermediary results, 
anything that we think might 

789
00:45:56,720 --> 00:46:01,280
leak an individual level 
privacy, we can kind of cover up

790
00:46:01,280 --> 00:46:04,280
with a secrecy blanket. 
And then at the end, when we 

791
00:46:04,280 --> 00:46:08,040
decrypt and we pull off that 
secrecy blanket, we've already 

792
00:46:08,040 --> 00:46:12,680
ensured that the end result has 
enough privacy that we can 

793
00:46:12,680 --> 00:46:16,360
remove the secrecy. 
I hope that was understandable. 

794
00:46:16,600 --> 00:46:20,480
You tell me. 
Well, I think definitely sounds 

795
00:46:20,480 --> 00:46:23,080
really cool, all these advanced 
technologies, I may not be 

796
00:46:23,080 --> 00:46:25,880
exposed to many of those 
technologies, but it sounds 

797
00:46:25,880 --> 00:46:28,440
really promising, right? 
Especially I think the local 

798
00:46:28,440 --> 00:46:31,120
first concept thing will be very
important, right? 

799
00:46:31,120 --> 00:46:34,280
Especially if you don't want to 
get exposed to many sensitive 

800
00:46:34,280 --> 00:46:38,200
data at your organization. 
Encryption, I think still kind 

801
00:46:38,200 --> 00:46:41,520
of like the go to kind of like 
strategy for securing the data, 

802
00:46:41,520 --> 00:46:43,440
right? 
It could be encryption at rest, 

803
00:46:43,440 --> 00:46:46,160
it could be in transit, right? 
It could be just like what you 

804
00:46:46,160 --> 00:46:48,600
mentioned, maybe just now 
homomorphic encryption and 

805
00:46:48,600 --> 00:46:51,200
things like that. 
These days cloud providers also 

806
00:46:51,200 --> 00:46:54,560
come up with, I don't know, like
secure chip, you know, where you

807
00:46:54,560 --> 00:46:58,240
can actually do more secure 
computing on the virtual 

808
00:46:58,240 --> 00:47:00,560
machine. 
So definitely a space that maybe

809
00:47:00,560 --> 00:47:02,680
for those people who are 
interested, right, you can 

810
00:47:02,960 --> 00:47:06,760
follow the technology trends. 
So you mentioned about all this,

811
00:47:06,760 --> 00:47:09,120
right? 
Definitely one aspect for 

812
00:47:09,120 --> 00:47:13,000
engineers or product teams, it 
makes it really, really more 

813
00:47:13,000 --> 00:47:16,040
complicated to actually, you 
know, come up with the solution,

814
00:47:16,040 --> 00:47:18,720
right? 
How can we actually, you know, 

815
00:47:19,120 --> 00:47:21,960
talk to the stakeholders that we
need to, you know, improve our 

816
00:47:21,960 --> 00:47:24,920
solution by implementing all 
this complex thing that might 

817
00:47:24,920 --> 00:47:27,320
increase the effort and also the
amount of resources that we 

818
00:47:27,320 --> 00:47:29,480
need, you know, with all these 
technologies, with the amount of

819
00:47:29,480 --> 00:47:31,200
development effort and things 
like that. 

820
00:47:32,080 --> 00:47:35,160
Yeah, yeah, yeah, yeah. 
One thing that I think that your

821
00:47:35,160 --> 00:47:39,560
podcast has done well so far is 
like talk about lean principles,

822
00:47:39,600 --> 00:47:43,600
talk about creating buy in 
over time, developing like a 

823
00:47:43,600 --> 00:47:47,040
practice and eventually 
developing platforms, right? 

824
00:47:47,040 --> 00:47:49,760
Because I think that's kind of 
the evolution that you have to 

825
00:47:49,760 --> 00:47:53,160
work with. 
If you're starting with even 

826
00:47:53,160 --> 00:47:56,280
just basic data governance, 
you're not going to tomorrow 

827
00:47:56,280 --> 00:47:59,480
deploy homomorphic encryption, 
like that's not going to happen 

828
00:47:59,480 --> 00:48:02,160
for you. 
So what I think you have to 

829
00:48:02,160 --> 00:48:04,600
start with is like you have to 
start with some of the building 

830
00:48:04,600 --> 00:48:07,680
blocks. 
But try, I think from a kind of 

831
00:48:07,680 --> 00:48:12,040
a lean product thinking, try to 
think through what types of 

832
00:48:12,040 --> 00:48:15,320
these more advanced technologies
actually fit the problems that 

833
00:48:15,320 --> 00:48:17,880
we have. 
You know, maybe it is exploring 

834
00:48:17,880 --> 00:48:21,480
local first development or 
something and then pulling in 

835
00:48:21,600 --> 00:48:26,720
very small use cases, very small
product features or maybe even 

836
00:48:26,720 --> 00:48:30,480
new product launches and saying,
you know what, we're going to 

837
00:48:30,800 --> 00:48:35,600
cordon off an extra 20% of the 
planned development time to 

838
00:48:35,600 --> 00:48:39,040
think through an experiment. 
Could we use this new technology

839
00:48:39,040 --> 00:48:44,600
to help expedite future products
so that they are launched in a 

840
00:48:44,600 --> 00:48:48,520
private by design or secure by 
default type of mentality? 

841
00:48:49,320 --> 00:48:52,880
And what you're doing there is 
a, you're giving people who are 

842
00:48:52,880 --> 00:48:56,440
interested in the topic enough 
space so that they can actually 

843
00:48:56,440 --> 00:49:00,280
start exploring, right? 
And as you know from any new 

844
00:49:00,360 --> 00:49:03,800
development technology, new type
of thinking around development, 

845
00:49:04,040 --> 00:49:06,520
if you give people a little bit 
of space and they already have 

846
00:49:06,520 --> 00:49:09,800
some interest or motivation, 
they can actually go pretty far.

847
00:49:10,080 --> 00:49:12,640
And you're also developing this 
culture of learning and 

848
00:49:12,640 --> 00:49:14,720
experimentation, which we all 
need. 

849
00:49:15,280 --> 00:49:18,680
And then B, you're also like 
boxing it enough that it doesn't

850
00:49:18,680 --> 00:49:23,360
become a blocker so that there's
some sort of backup technology 

851
00:49:23,360 --> 00:49:28,000
that you've always used that's 
there and that you're kind of 

852
00:49:28,400 --> 00:49:32,000
giving, you know, you're growing
the experience and knowledge. 

853
00:49:32,240 --> 00:49:35,960
And I think as soon as you start
focusing on the process rather 

854
00:49:35,960 --> 00:49:40,360
than the outcomes, then you're 
going to see the rewards happen 

855
00:49:40,360 --> 00:49:42,920
over time. 
And you asked like, how do we 

856
00:49:42,920 --> 00:49:44,680
explain to the business 
stakeholders? 

857
00:49:44,680 --> 00:49:48,440
Well, maybe you don't even 
yourself have to explain. 

858
00:49:48,440 --> 00:49:52,840
It will become evident by 
allowing time in the process 

859
00:49:52,840 --> 00:49:57,000
for that. You can say as simple 
as we need to stay up to date. 

860
00:49:57,000 --> 00:49:59,960
We need to modernize the way 
that we think about privacy. 

861
00:50:00,400 --> 00:50:04,240
And I think this is going to pay
off in the long run by allowing 

862
00:50:04,240 --> 00:50:07,960
us to build platforms that allow
us to do privacy at scale, which

863
00:50:07,960 --> 00:50:11,360
is the end goal, right? 
But along that way, we're going 

864
00:50:11,360 --> 00:50:15,080
to add some extra time and 
cushion so that we can build in 

865
00:50:15,080 --> 00:50:18,320
these new exciting technologies 
that I promise you, you will be 

866
00:50:18,320 --> 00:50:21,320
able to talk about in some press
release one day, right? 

867
00:50:21,640 --> 00:50:24,280
And show that we're forward 
thinking and show that we're 

868
00:50:24,320 --> 00:50:28,200
kind of advancing this space. 
But to start out and allow that.

869
00:50:28,200 --> 00:50:31,960
And I think the advocates that 
will come out of allowing some 

870
00:50:31,960 --> 00:50:35,360
of your developers or your data 
people or other technologists at

871
00:50:35,360 --> 00:50:39,080
your org to learn more deeply 
about this stuff is going to pay

872
00:50:39,080 --> 00:50:41,800
off in and of itself. 
Those people will end up being 

873
00:50:41,800 --> 00:50:45,880
the advocates that educate the 
other people in the org about 

874
00:50:45,880 --> 00:50:48,920
how this stuff works. 
And they will find, I promise 

875
00:50:48,920 --> 00:50:52,640
you, they will find clever ways 
of explaining it if you give 

876
00:50:52,640 --> 00:50:54,640
them the time and space to be 
able to learn. 

877
00:50:55,720 --> 00:50:57,480
Yeah. 
So I think that's a very, you 

878
00:50:57,480 --> 00:51:00,160
know, important advice, right? 
So start small, right? 

879
00:51:00,320 --> 00:51:02,880
Focus on the process, not 
necessarily just the outcome. 

880
00:51:03,360 --> 00:51:06,040
And I think everyone can build 
awareness, I think within the 

881
00:51:06,040 --> 00:51:08,840
team, within the organization, 
right, about the privacy, right,

882
00:51:08,840 --> 00:51:12,440
treat privacy more seriously. 
Obviously, one easy way to 

883
00:51:12,440 --> 00:51:14,920
explain to stakeholders is about
regulations, right? 

884
00:51:14,920 --> 00:51:18,160
I'm sure in Europe it's easier 
to actually mention about 

885
00:51:18,160 --> 00:51:21,280
privacy because there's the GDPR
that is very strict, right? 

886
00:51:21,680 --> 00:51:24,560
In some parts of the world, they
are starting also to adopt these

887
00:51:24,560 --> 00:51:28,440
kind of stringent rules about 
personal data law, personal data

888
00:51:28,440 --> 00:51:31,800
protection and things like that.
So let's go to that legal 

889
00:51:31,800 --> 00:51:34,040
aspect. 
So what do you see the trends 

890
00:51:34,040 --> 00:51:36,760
these days? 
Especially what I know is that 

891
00:51:36,760 --> 00:51:39,640
many countries are now starting to 
come up with this kind of law, 

892
00:51:39,640 --> 00:51:42,400
this kind of GDPR like 
regulations. 

893
00:51:42,840 --> 00:51:46,320
So is this something that we are
going to see going forward, like

894
00:51:46,320 --> 00:51:49,440
all countries will have their 
own kind of like regulations? 

895
00:51:49,440 --> 00:51:52,960
And I think it will be, it will 
become very complicated if let's

896
00:51:52,960 --> 00:51:55,560
say you build a product that 
works in multiple jurisdictions, 

897
00:51:55,560 --> 00:51:57,200
right? 
So maybe tell us a little bit 

898
00:51:57,200 --> 00:51:59,200
your insights on this. 
Yeah. 

899
00:51:59,200 --> 00:52:02,160
I mean, I think it's already 
pretty complicated. 

900
00:52:02,760 --> 00:52:05,920
It's probably gonna get more 
rather than less. 

901
00:52:05,920 --> 00:52:10,600
I mean, that's why I think a lot
of times when I talk about this,

902
00:52:10,600 --> 00:52:13,000
especially with high level 
stakeholders, like when we're 

903
00:52:13,000 --> 00:52:17,920
talking with C level of some 
sort of multinational or company

904
00:52:17,920 --> 00:52:21,080
that wants to become 
multinational, we have to think 

905
00:52:21,080 --> 00:52:25,000
about it as like future proofing
the data strategy, right? 

906
00:52:25,000 --> 00:52:30,720
Because if we just kind of 
developed data or AI strategy 

907
00:52:31,200 --> 00:52:35,480
within what we know today, we're
not going to be prepared for the

908
00:52:35,480 --> 00:52:38,240
future. 
And even with what we know today

909
00:52:38,240 --> 00:52:43,000
is we have a very fragmented 
legal jurisdiction set up around

910
00:52:43,000 --> 00:52:45,800
almost everything data, but 
certainly data privacy. 

911
00:52:46,040 --> 00:52:48,200
But that also applies to data 
security. 

912
00:52:48,440 --> 00:52:51,160
It also applies to data 
governance protections that 

913
00:52:51,160 --> 00:52:53,000
different organizations are 
looking at. 

914
00:52:53,400 --> 00:52:56,200
And one of the big trends that 
unfortunately I don't see going 

915
00:52:56,200 --> 00:53:00,560
away anytime soon is the data 
sovereignty laws as well. 

916
00:53:01,200 --> 00:53:04,720
And these data sovereignty laws 
are kind of around the 

917
00:53:04,720 --> 00:53:06,880
jurisdictional control of the 
data. 

918
00:53:06,880 --> 00:53:11,000
So for example, that data cannot
leave a certain nation state or 

919
00:53:11,000 --> 00:53:15,680
a group of nation states or that
data cannot travel across these 

920
00:53:15,680 --> 00:53:20,240
untrusted nation states. 
And if you already work in the 

921
00:53:20,240 --> 00:53:23,040
government or you work in 
government adjacent, you 

922
00:53:23,040 --> 00:53:25,680
probably already know this pain 
and you're like, you're not 

923
00:53:25,680 --> 00:53:28,880
saying anything new, Catherine. 
Well, now the pain is coming to 

924
00:53:28,880 --> 00:53:32,480
you, whether or not you're 
government adjacent, whether or 

925
00:53:32,480 --> 00:53:36,240
not you deal with kind of highly
secure systems or critical 

926
00:53:36,240 --> 00:53:39,480
infrastructure. 
I think it's kind of ballooning 

927
00:53:39,480 --> 00:53:42,960
out to other areas. 
And again, at the end of the 

928
00:53:42,960 --> 00:53:46,480
day, a lot of this is a policy 
problem internally at an 

929
00:53:46,480 --> 00:53:50,640
organization to decide, OK, 
what's our stance towards 

930
00:53:50,640 --> 00:53:53,160
regulation? 
Where do we exist? 

931
00:53:53,320 --> 00:53:57,120
Like what's important to us? 
And then obviously the legal 

932
00:53:57,120 --> 00:54:00,760
experts, not the technology 
experts, get to decide, OK, 

933
00:54:00,760 --> 00:54:04,760
here's our risk posture. 
Here's like the policies that we

934
00:54:04,760 --> 00:54:08,360
want to see related to what we 
know about the data, how the 

935
00:54:08,360 --> 00:54:11,120
data is stored, where it's stored, 
what it's used for. 

936
00:54:11,560 --> 00:54:15,760
And then we kind of get to step 
in as technologists at kind of 

937
00:54:15,760 --> 00:54:21,760
like that policy and principles 
level, I would say and say this 

938
00:54:21,760 --> 00:54:23,560
is possible, this is not 
possible. 

939
00:54:23,560 --> 00:54:26,400
This is like a good idea. 
This is not such a good idea 

940
00:54:26,400 --> 00:54:30,360
because and then we can talk 
about things like architectures,

941
00:54:30,960 --> 00:54:33,840
clouds that we use, 
infrastructure questions 

942
00:54:34,080 --> 00:54:36,960
alongside cool stuff like 
privacy technologies. 

943
00:54:36,960 --> 00:54:41,000
And I would say there's a non 
trivial number of legal 

944
00:54:41,000 --> 00:54:44,600
professionals and privacy 
professionals that want to also 

945
00:54:44,600 --> 00:54:48,000
learn more about technology or 
might even already know a lot 

946
00:54:48,000 --> 00:54:50,760
about technology. 
And if you build those 

947
00:54:50,760 --> 00:54:55,640
bridges, you can have a really 
useful conversation around what 

948
00:54:55,640 --> 00:54:59,520
does this actually look like. 
And yeah, I think that can be 

949
00:55:00,000 --> 00:55:02,560
probably the most healthy 
approach rather than just 

950
00:55:02,560 --> 00:55:05,320
avoiding the topic because it's 
scary and you don't want to talk

951
00:55:05,320 --> 00:55:07,160
with the lawyers because they 
might say no. 

952
00:55:07,520 --> 00:55:10,320
And then five years down the 
line they're like, why did we 

953
00:55:10,320 --> 00:55:14,440
architect it this way? 
And there's like no answer other

954
00:55:14,440 --> 00:55:16,680
than we didn't have the 
conversation early enough. 

955
00:55:17,560 --> 00:55:20,600
Yeah, I think you mentioned 
something really, really, you 

956
00:55:20,600 --> 00:55:23,520
know, very important, right to 
have now probably legal team, 

957
00:55:23,520 --> 00:55:25,640
legal aspect when you build the 
products, right? 

958
00:55:25,640 --> 00:55:28,440
When you architect the solution,
when you architect your data 

959
00:55:28,440 --> 00:55:31,680
storage and things like that, I 
think we are used to discuss 

960
00:55:31,680 --> 00:55:34,400
about whenever we choose a cloud
provider, right? 

961
00:55:34,400 --> 00:55:36,480
It's the data center in the 
particular country. 

962
00:55:36,480 --> 00:55:37,880
Now I think that's not enough, 
right? 

963
00:55:37,880 --> 00:55:41,520
Because the data transfer, you 
have to also look at like if 

964
00:55:41,520 --> 00:55:44,400
let's say you use a SaaS provider
that resides in another country,

965
00:55:44,400 --> 00:55:48,200
another jurisdiction, maybe the 
data sovereignty law doesn't 

966
00:55:48,200 --> 00:55:50,760
allow you to do that, right? 
And maybe you can have a 

967
00:55:50,760 --> 00:55:52,760
trouble, right, when 
transferring data over. 

968
00:55:53,040 --> 00:55:56,640
So I think a lot of countries 
also implement this differently.

969
00:55:56,960 --> 00:55:58,560
I think it's very hard to keep 
up. 

970
00:55:58,600 --> 00:56:01,040
And that's why probably the 
legal team is also still the 

971
00:56:01,040 --> 00:56:04,840
best person to kind of like help
advise the team to actually how 

972
00:56:05,000 --> 00:56:07,600
to implement the better solution
about this privacy. 

973
00:56:08,040 --> 00:56:11,960
So Catherine, we have spoken a 
lot about privacy technologies 

974
00:56:11,960 --> 00:56:14,040
and all that. 
Is there something else that you

975
00:56:14,040 --> 00:56:16,840
think the listeners here should 
know about data privacy? 

976
00:56:17,720 --> 00:56:21,480
I think lately I get a lot of 
questions around AI and data 

977
00:56:21,480 --> 00:56:25,480
privacy. 
So I just wanted to note that 

978
00:56:25,480 --> 00:56:28,000
there's a lot of new 
developments and thinking 

979
00:56:28,000 --> 00:56:32,960
through, A, how do we build 
machine learning or AI systems 

980
00:56:32,960 --> 00:56:37,400
that can utilize private data 
without problems? 

981
00:56:37,400 --> 00:56:41,640
But B, how do we, if we don't 
build the AI models ourselves, 

982
00:56:41,640 --> 00:56:45,040
of which most organizations 
still don't at this point in 

983
00:56:45,040 --> 00:56:49,520
time, how do we actually build 
kind of a protective blanket 

984
00:56:49,520 --> 00:56:54,360
around these interfaces to, 
again, these like third party AI

985
00:56:54,360 --> 00:56:59,640
systems or AI APIs? 
And I just want to suggest out 

986
00:56:59,640 --> 00:57:02,560
there, obviously, I think my 
book is a great resource. 

987
00:57:03,240 --> 00:57:04,920
I also have a newsletter on the 
topic. 

988
00:57:04,920 --> 00:57:09,080
But to just kind of start to 
inform yourself, because what 

989
00:57:09,080 --> 00:57:13,080
I'm seeing in the circles that 
I'm in is that this is a growing

990
00:57:13,080 --> 00:57:18,560
trend and a growing problem. 
And I think that it will come. 

991
00:57:19,400 --> 00:57:21,800
It will come for everyone. 
It will come for people that are

992
00:57:21,800 --> 00:57:25,360
not machine learning experts. 
Soon enough. 

993
00:57:25,360 --> 00:57:28,480
If it's not already a problem 
that you're facing where you're 

994
00:57:28,480 --> 00:57:32,440
sitting there with some other 
team and you want to use a third

995
00:57:32,440 --> 00:57:36,040
party API and all of a sudden 
this questionnaire comes in from

996
00:57:36,040 --> 00:57:38,720
the legal department or how is 
it going to look? 

997
00:57:38,720 --> 00:57:41,960
Or how do you actually build 
that with privacy or with 

998
00:57:41,960 --> 00:57:44,440
security, right? 
If you're using proprietary 

999
00:57:44,440 --> 00:57:49,480
documents, let's say you want to
build like a RAG system and you 

1000
00:57:49,480 --> 00:57:52,000
want to have access to a bunch 
of sensitive documents that the 

1001
00:57:52,000 --> 00:57:55,240
company has, like how do we make
sure that we build that 

1002
00:57:55,240 --> 00:57:59,600
responsibly? 
So I just want to point to the 

1003
00:57:59,600 --> 00:58:04,000
fact that this is like a growing
field of practice and you don't 

1004
00:58:04,000 --> 00:58:08,280
have to learn it like to the 
greatest depth, but to just kind

1005
00:58:08,280 --> 00:58:13,280
of start to, when it pleases 
you, start to inform yourself 

1006
00:58:13,280 --> 00:58:17,160
lightly like, hmm, like what 
might that look like if we built

1007
00:58:17,160 --> 00:58:20,240
this RAG system with privacy or 
security? 

1008
00:58:20,480 --> 00:58:24,240
How could that look different? 
Do we have the capabilities to 

1009
00:58:24,240 --> 00:58:26,840
do that now? 
If not, what capabilities would 

1010
00:58:26,840 --> 00:58:31,120
we need to grow and to just, you
know, you're tech leads, right, to 

1011
00:58:31,120 --> 00:58:36,160
just start to allow that topic 
to percolate in your head a bit 

1012
00:58:36,160 --> 00:58:40,280
because I think it's a growing 
importance, I think in how we 

1013
00:58:40,280 --> 00:58:44,200
think about integrating machine 
learning into normal workflows.
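
A very small version of the "protective blanket" around a third-party AI API might look like the sketch below: redact obvious identifiers from retrieved documents and from the question before any prompt leaves the organization. Everything here is an assumption for illustration, including the function names, the placeholder tokens, and the deliberately naive regular expressions. No real AI API is called.

```python
import re

# Deliberately simple patterns; real PII detection needs far more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace obvious identifiers with placeholders before text leaves the org."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a RAG-style prompt from redacted pieces only."""
    context = "\n".join(redact(doc) for doc in retrieved_docs)
    return f"Context:\n{context}\n\nQuestion: {redact(question)}"


docs = ["Contact Ana at ana@example.com or +49 30 1234567 about the contract."]
print(build_prompt("Who do I call about the contract?", docs))
```

Note that the name "Ana" passes through untouched. Pattern-based redaction is only a first layer, which is part of why this is described as a growing field of practice.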

1014
00:58:45,080 --> 00:58:48,120
Yeah, I could imagine many teams
now are kind of like scrambling 

1015
00:58:48,120 --> 00:58:51,040
to implement some sort of AI 
into their systems, right? 

1016
00:58:51,040 --> 00:58:54,720
And especially when you do that,
sometimes we don't think further

1017
00:58:54,720 --> 00:58:56,640
ahead, right? 
Especially if you're dealing 

1018
00:58:56,640 --> 00:58:58,480
with, I don't know, like 
customer support, you know, 

1019
00:58:58,480 --> 00:59:01,640
customer confidential data. 
You think just by implementing 

1020
00:59:01,640 --> 00:59:04,560
AI it's going to make life 
easier, but you are not sure 

1021
00:59:04,560 --> 00:59:06,120
that they will expose certain 
things. 

1022
00:59:06,440 --> 00:59:09,800
Also, when you use some third 
party systems these days, you 

1023
00:59:09,800 --> 00:59:13,600
know, when, whenever these 
companies adopt AI, sometimes 

1024
00:59:13,800 --> 00:59:17,240
the choice by default is to 
actually enable AI training, 

1025
00:59:17,240 --> 00:59:19,520
right, using your data 
sometimes. 

1026
00:59:20,040 --> 00:59:22,520
So be aware of that. 
And also like, I think those 

1027
00:59:22,520 --> 00:59:25,080
things need to be more 
transparent and explicit so that

1028
00:59:25,080 --> 00:59:27,800
we can build trust with the 
third party, the systems that we

1029
00:59:27,800 --> 00:59:29,840
use. 
So Catherine, I think it's been 

1030
00:59:29,840 --> 00:59:32,240
a great conversation. 
I learned a lot about data 

1031
00:59:32,240 --> 00:59:35,760
privacy, although it's kind of 
scares me a little bit on how to

1032
00:59:35,760 --> 00:59:38,720
build a more private by design 
system. 

1033
00:59:39,680 --> 00:59:42,080
Unfortunately, we reached the 
end of our conversation. 

1034
00:59:42,280 --> 00:59:44,880
So one thing I would like to ask
you to end the conversation is 

1035
00:59:44,880 --> 00:59:47,000
what I call the three technical 
leadership wisdom. 

1036
00:59:47,320 --> 00:59:49,520
So you can think of them just 
like an advice. 

1037
00:59:49,800 --> 00:59:52,560
So maybe if you can share your 
version to the listeners here. 

1038
00:59:53,360 --> 00:59:57,080
Yeah, I mean, I hope maybe I've 
sparked some people who might be

1039
00:59:57,080 --> 00:59:59,800
interested in learning more 
about data privacy, data 

1040
00:59:59,800 --> 01:00:03,040
security topics. 
But the number one piece of 

1041
01:00:03,040 --> 01:00:06,000
advice is like, even if you're 
not inspired to learn. 

1042
01:00:06,440 --> 01:00:10,080
I ask that you empower people on
your team or in your 

1043
01:00:10,080 --> 01:00:14,120
organization to learn and that's
because as you can probably tell

1044
01:00:14,120 --> 01:00:20,400
from my advice today, the field 
is actually quite deep and very 

1045
01:00:20,400 --> 01:00:24,480
intensive and it's not 
necessarily a field that a lot 

1046
01:00:24,480 --> 01:00:28,680
of people specialize in because 
they studied it right. 

1047
01:00:28,680 --> 01:00:33,120
So you're going to want to open 
doors for people to grow and to 

1048
01:00:33,120 --> 01:00:36,640
learn and to foster kind of that
ability as a specialized, like, 

1049
01:00:36,640 --> 01:00:41,240
culture of learning and for you 
all as tech leads, like if it's 

1050
01:00:41,240 --> 01:00:44,320
coming from you, it's going to 
be much more powerful than like 

1051
01:00:44,520 --> 01:00:47,680
a junior who just joined the 
team and who's like, oh, wow, 

1052
01:00:47,680 --> 01:00:49,320
cool. 
Like I learned about 

1053
01:00:49,320 --> 01:00:51,880
differential privacy in college,
because sometimes they do that 

1054
01:00:51,880 --> 01:00:54,480
now, like I want to try out this 
thing. 

1055
01:00:54,480 --> 01:00:57,400
Like if you can give that person
space, if you can kind of 

1056
01:00:57,400 --> 01:01:00,400
shield them and give them some 
ability to learn, they're going 

1057
01:01:00,400 --> 01:01:03,960
to level up the entire org 
eventually, right? 

1058
01:01:03,960 --> 01:01:07,480
And so certainly level up your 
team, but eventually this kind 

1059
01:01:07,480 --> 01:01:11,000
of spreads and creates these 
cultural changes that you need. 

1060
01:01:11,360 --> 01:01:15,120
So foster that learning, you 
know, support your people in 

1061
01:01:15,120 --> 01:01:17,480
learning. 
Create those spaces for 

1062
01:01:17,480 --> 01:01:20,200
learning. 
Try to give some back pressure 

1063
01:01:20,200 --> 01:01:24,280
to delivery deadlines to allow 
for some of that learning 

1064
01:01:24,280 --> 01:01:27,000
cushion. 
Another thing that I want to 

1065
01:01:27,000 --> 01:01:32,280
share is I don't think all risk 
conversations have to be scary. 

1066
01:01:33,320 --> 01:01:37,280
I think it's totally natural. 
And like I, I empathize so much 

1067
01:01:37,600 --> 01:01:42,680
with it being scary because it 
is a big job and it is like a 

1068
01:01:42,680 --> 01:01:46,560
serious job to talk about the 
risk of people's data and that 

1069
01:01:46,560 --> 01:01:50,280
trust relationship. 
And yet at the same time, if we 

1070
01:01:50,280 --> 01:01:55,080
can normalize conversations 
around risk, if we can kind of 

1071
01:01:55,080 --> 01:02:00,160
allow for some of that space to 
have like scary topics, be part 

1072
01:02:00,160 --> 01:02:03,120
of our roadmapping, be part of 
our planning, where we can 

1073
01:02:03,120 --> 01:02:06,240
actually kind of just regularly 
do privacy risk reviews, 

1074
01:02:06,240 --> 01:02:09,280
security risk reviews, auditing 
reviews. 

1075
01:02:09,320 --> 01:02:12,760
If we can kind of normalize that
as a normal thing, like we do 

1076
01:02:12,760 --> 01:02:16,680
with like testing for bugs or 
figuring out if something is 

1077
01:02:16,680 --> 01:02:21,120
deployed properly, then we also 
make it like more tangible. 

1078
01:02:21,120 --> 01:02:24,880
And I think that it becomes less
scary by default and we make it 

1079
01:02:24,880 --> 01:02:27,040
less uncertain. 
And I think most of the 

1080
01:02:27,040 --> 01:02:29,880
scariness comes from this 
uncertainty problem. 

1081
01:02:30,320 --> 01:02:32,640
So try to focus on that, 
right? 

1082
01:02:33,320 --> 01:02:36,080
And then finally, like that 
relates to both of these relate 

1083
01:02:36,080 --> 01:02:39,280
to like creating a team culture 
and hopefully an organization 

1084
01:02:39,280 --> 01:02:44,040
culture of psychological safety.
And I think by knowing that 

1085
01:02:44,040 --> 01:02:46,440
we're not going to get 
everything 100% right all the 

1086
01:02:46,440 --> 01:02:50,480
time, we can foster both 
learning, we can foster risk 

1087
01:02:50,480 --> 01:02:53,160
discussions. 
And we can certainly build much,

1088
01:02:53,160 --> 01:02:57,080
much more private and secure 
systems by allowing us to have 

1089
01:02:57,080 --> 01:03:01,080
real conversations, by allowing 
us to like escalate problems 

1090
01:03:01,080 --> 01:03:05,560
when they exist, when we see 
them, and by empowering people 

1091
01:03:05,560 --> 01:03:09,800
to kind of have reporting up and
down about what do we think 

1092
01:03:09,800 --> 01:03:11,880
about problems like privacy and 
security. 

1093
01:03:11,880 --> 01:03:15,400
So up the hierarchical chain of 
the org and down. 

1094
01:03:15,720 --> 01:03:19,840
And by fostering this kind of 
safety conversation around these

1095
01:03:19,840 --> 01:03:23,800
types of problems, I think we 
end up creating by default 

1096
01:03:23,800 --> 01:03:27,000
better software, systems, products
in terms of privacy and 

1097
01:03:27,000 --> 01:03:29,400
security. 
Wow, very lovely. 

1098
01:03:29,400 --> 01:03:31,480
And you know, it's all 
interrelated with each other, 

1099
01:03:31,480 --> 01:03:32,960
right? 
I especially love the second 

1100
01:03:32,960 --> 01:03:35,200
one. 
You know, not all risk needs to 

1101
01:03:35,200 --> 01:03:37,640
be scary, right? 
Sometimes it's just because of the

1102
01:03:37,640 --> 01:03:40,600
uncertainty, like we don't know 
about that particular risk. 

1103
01:03:40,960 --> 01:03:43,680
But I think if you dive deep, if
you learn right and you allow 

1104
01:03:43,680 --> 01:03:46,840
people to experiment and try to 
solve the problem, psychological

1105
01:03:46,840 --> 01:03:49,480
safety here. 
So I think maybe we can all 

1106
01:03:49,480 --> 01:03:51,760
improve together. 
So thanks for that beautiful 

1107
01:03:51,760 --> 01:03:54,200
wisdom. 
So, Catherine, for people who 

1108
01:03:54,200 --> 01:03:56,680
love to talk more about privacy,
is there a place where they can 

1109
01:03:56,680 --> 01:03:59,600
find you online or maybe 
resources where they can learn 

1110
01:03:59,600 --> 01:04:03,360
more about privacy? 
Yeah, I mean, I have a 

1111
01:04:03,360 --> 01:04:05,560
newsletter, it's called Probably
Private. 

1112
01:04:05,560 --> 01:04:08,280
So if you're more of like an 
email newsletter person, you 

1113
01:04:08,280 --> 01:04:11,480
can find me there. 
I'm starting kind of to produce 

1114
01:04:11,480 --> 01:04:13,800
also some YouTube videos. 
They're mainly around machine 

1115
01:04:13,800 --> 01:04:16,560
learning systems and privacy in
machine learning systems. 

1116
01:04:16,960 --> 01:04:21,160
So you can find me there. 
My book is mainly written for 

1117
01:04:21,200 --> 01:04:25,320
data people, but I hope or I 
heard some feedback from also 

1118
01:04:25,320 --> 01:04:29,520
software and architects that it 
can be useful parts of my book. 

1119
01:04:29,920 --> 01:04:34,520
And then there's such an amazing
wealth of resources online 

1120
01:04:34,520 --> 01:04:37,720
around privacy. 
I would even like even just 

1121
01:04:37,720 --> 01:04:41,240
start with your cloud provider, 
like your main cloud provider or

1122
01:04:41,240 --> 01:04:43,720
a few of your main service 
providers if you have major 

1123
01:04:43,720 --> 01:04:47,960
services and just start looking 
around of like what settings do 

1124
01:04:47,960 --> 01:04:51,280
they have around privacy? 
What do they make available to 

1125
01:04:51,280 --> 01:04:53,200
you? 
What knobs can you turn? 

1126
01:04:53,960 --> 01:04:56,960
Even just starting there or even
having like somebody on your 

1127
01:04:56,960 --> 01:04:59,640
team, like the cloud 
architect, cloud engineer on 

1128
01:04:59,640 --> 01:05:04,000
your team focus on that could 
end up paying dividends later of

1129
01:05:04,000 --> 01:05:07,240
just starting to know, hey, it'd
be really easy for us to turn 

1130
01:05:07,240 --> 01:05:09,560
on, you know, this or turn off 
this. 

1131
01:05:09,840 --> 01:05:12,240
And that would provide better 
privacy. 

1132
01:05:12,240 --> 01:05:15,040
So I think those are some easy 
ways to get started and make 

1133
01:05:15,040 --> 01:05:18,000
friends with privacy people at 
your org. 

1134
01:05:18,400 --> 01:05:21,480
Be nice to them. 
Just reach out, set up a one-to-one 

1135
01:05:21,480 --> 01:05:27,360
15-minute coffee because they 
could be huge resources if you 

1136
01:05:27,360 --> 01:05:29,520
develop that or maybe you 
already have the relationship, 

1137
01:05:29,520 --> 01:05:32,240
but if you can develop that 
relationship, that can be a 

1138
01:05:32,240 --> 01:05:35,800
quick person that you can just 
double check your thinking with 

1139
01:05:35,800 --> 01:05:39,120
or think out loud with private, 
if they have a legal background,

1140
01:05:39,120 --> 01:05:41,280
they're never going to say yes 
or no, which is fine. 

1141
01:05:41,360 --> 01:05:44,560
That's what they're trained to 
do, good job for them, but 

1142
01:05:44,560 --> 01:05:46,480
they're going to give you 
advice, they're going to give 

1143
01:05:46,480 --> 01:05:48,640
you steering ideas, they're 
going to give you guiding 

1144
01:05:48,640 --> 01:05:51,920
questions that are going to help you. 
So be friendly with them. 

1145
01:05:52,680 --> 01:05:54,880
Well, thanks for the plug. 
I'm sure if there are legal 

1146
01:05:54,880 --> 01:05:57,280
people listening to this, they 
will feel happy. 

1147
01:05:58,160 --> 01:06:01,280
So I think, yeah, make friends 
with the legal compliance team, 

1148
01:06:01,280 --> 01:06:04,160
security team, right? 
So those people are not there to 

1149
01:06:04,160 --> 01:06:07,040
make the job harder for you, but
actually they help you and the 

1150
01:06:07,040 --> 01:06:09,840
organization to improve. 
So, Catherine, I love this 

1151
01:06:09,840 --> 01:06:11,400
conversation. 
Thank you so much for your time.

1152
01:06:11,400 --> 01:06:14,400
I hope the listeners here learn 
a lot about data privacy 

1153
01:06:14,400 --> 01:06:15,680
today. 
So thanks again. 

1154
01:06:16,160 --> 01:06:17,040
Thanks so much, Henry.
