1
00:00:02,000 --> 00:00:06,200
I feel in that sense a PhD is 
actually, you know, in some 

2
00:00:06,200 --> 00:00:07,800
sense,
it's leading you to become a

3
00:00:07,800 --> 00:00:11,300
data scientist because that's 
what we do day in and day

4
00:00:11,300 --> 00:00:14,700
out. 
One of the challenges I feel is 

5
00:00:14,700 --> 00:00:17,700
that in the industry itself,
the expectation to reach a

6
00:00:17,708 --> 00:00:21,300
solution, the time frame, is
very small. 

7
00:00:39,200 --> 00:00:42,800
Hello and welcome to Data
Chatter, the podcast on all

8
00:00:42,800 --> 00:00:46,100
things data. 
This podcast is a series of 

9
00:00:46,100 --> 00:00:49,100
conversations with experts and 
industry leaders in data.

10
00:00:49,300 --> 00:00:52,100
And each week,
we aim to unpack a different

11
00:00:52,100 --> 00:00:53,900
compartment of the data 
suitcase. 

12
00:00:54,600 --> 00:00:57,000
I am your host,
Karthik Shashidhar. I'm a

13
00:00:57,008 --> 00:01:00,900
blogger, newspaper columnist,
book author and a former data

14
00:01:00,900 --> 00:01:04,200
and strategy consultant. I
currently head analytics and

15
00:01:04,200 --> 00:01:07,400
business intelligence for 
Delhivery, one of India's largest

16
00:01:07,400 --> 00:01:10,700
logistics companies. 
You can follow me on Twitter at 

17
00:01:10,700 --> 00:01:16,900
karthiks, that is K-A-R-T-H-I-
K-S, and read my blog at

18
00:01:16,900 --> 00:01:19,800
noenthuda.com.
That is N-O-E-

19
00:01:20,300 --> 00:01:26,400
N-T-H-U-D-A.com. All opinions
expressed in this podcast belong to

20
00:01:26,408 --> 00:01:29,500
me and my podcast guests, and
do not reflect the views of any 

21
00:01:29,500 --> 00:01:32,100
organizations
we might be associated with. Nothing

22
00:01:32,100 --> 00:01:33,600
discussed in
this podcast should be taken as

23
00:01:33,600 --> 00:01:44,300
financial or legal advice.
When I was graduating college in

24
00:01:44,300 --> 00:01:49,100
the mid-2000s, the word in job
descriptions that most commonly

25
00:01:49,100 --> 00:01:51,900
appeared alongside data was 
analytics. 

26
00:01:52,500 --> 00:01:56,700
However, in 2008, the phrase
"data science" got coined and took

27
00:01:56,700 --> 00:01:58,400
over the world in the next five 
years. 

28
00:01:59,100 --> 00:02:01,400
Nowadays,
it seems everyone wants to be a

29
00:02:01,400 --> 00:02:05,100
data scientist. 
Where is the science in data

30
00:02:05,100 --> 00:02:07,300
science?
And why are so many people

31
00:02:07,300 --> 00:02:11,000
with PhDs in pure sciences
moving to data science? To

32
00:02:11,000 --> 00:02:13,900
understand this better,
I bring back one of my old

33
00:02:13,900 --> 00:02:15,400
guests on Data
Chatter.

34
00:02:15,800 --> 00:02:18,700
Dhanya Parameswaran is an
aerospace engineer turned

35
00:02:18,700 --> 00:02:20,900
neuroscientist turned data 
scientist. 

36
00:02:20,900 --> 00:02:24,500
She's co-founder of Messy
Fractals and Kabaddi Adda, and a

37
00:02:24,500 --> 00:02:27,600
researcher at Sapien Labs. Dhanya
talks about her journey from

38
00:02:27,600 --> 00:02:31,100
neuroscience to data science,
why a PhD is good training for

39
00:02:31,100 --> 00:02:33,700
data science. 
And what the science in data 

40
00:02:33,700 --> 00:02:37,500
science is all about. So, to get
us started,

41
00:02:37,500 --> 00:02:39,400
can you just take me through
your career

42
00:02:39,400 --> 00:02:41,300
evolution?
I mean, pretty much the

43
00:02:41,300 --> 00:02:43,500
last time,
the last I know is that you were

44
00:02:43,500 --> 00:02:45,600
doing a B.Tech in aerospace
engineering.

45
00:02:45,600 --> 00:02:49,400
So what happened after? Yeah,
like you mentioned, right,

46
00:02:49,500 --> 00:02:54,600
I did my B.Tech in aerospace
engineering at IIT Madras and 

47
00:02:54,700 --> 00:02:57,100
that's where the PhD bug
bit me.

48
00:02:57,400 --> 00:03:01,900
During my third year at
IIT itself,

49
00:03:02,300 --> 00:03:08,100
I was introduced to this
psychiatrist at VHS Hospital, which

50
00:03:08,100 --> 00:03:11,700
is in Taramani.
And his name is E. S.

51
00:03:11,700 --> 00:03:16,500
Krishnamoorthy, and he was doing
some fMRI work in neuroscience.

52
00:03:17,000 --> 00:03:20,700
And one of my seniors, Sridharan
Devarajan,

53
00:03:21,100 --> 00:03:24,200
who is in fact a professor at
IISc today,

54
00:03:24,900 --> 00:03:28,100
he introduced me to Dr.
Krishnamoorthy.

55
00:03:28,500 --> 00:03:31,500
And that was my first stint
with neuroscience.

56
00:03:31,500 --> 00:03:35,700
So that's where I started looking at
neuroscience data, getting

57
00:03:35,700 --> 00:03:40,500
some fMRI recordings in that
sense, right? And post this, I

58
00:03:40,500 --> 00:03:46,200
continued with my PhD immediately
after my B.Tech, at NCBS in

59
00:03:46,200 --> 00:03:50,000
Bangalore. NCBS stands for
National Centre for Biological

60
00:03:50,000 --> 00:03:54,000
Sciences, right?
And here I did my PhD with Dr.

61
00:03:54,000 --> 00:03:58,000
Upinder Singh Bhalla.
In fact, he himself is a 

62
00:03:58,100 --> 00:04:01,700
physicist who moved into 
neuroscience and those were the 

63
00:04:01,700 --> 00:04:03,000
transitions which were 
happening,

64
00:04:03,000 --> 00:04:05,400
I guess, a decade or two earlier.
Sure. 

65
00:04:06,400 --> 00:04:11,900
And during my PhD, I worked with
another collaborator. Her name is

66
00:04:11,900 --> 00:04:18,300
Tara Thiagarajan, and she is a
neuroscientist, and she's

67
00:04:18,300 --> 00:04:20,100
from Stanford. 
She was a neuroscientist from 

68
00:04:20,100 --> 00:04:23,100
Stanford, but due to some 
personal situation,

69
00:04:23,100 --> 00:04:26,900
she had to come back to India.
Her father had, in fact,

70
00:04:26,900 --> 00:04:30,800
passed away and she had to take 
over her father's microfinance

71
00:04:30,800 --> 00:04:33,500
company. 
So, she became a neuroscientist 

72
00:04:33,500 --> 00:04:37,200
and an entrepreneur at that time.
So, she was working with us at

73
00:04:37,200 --> 00:04:40,200
NCBS and I worked very closely
with her for three years,

74
00:04:40,400 --> 00:04:45,100
the last three years of my PhD.
And once I finished my PhD, she 

75
00:04:45,100 --> 00:04:49,300
said, why don't you kind of 
help me solve some questions

76
00:04:49,300 --> 00:04:51,300
at Madura Microfinance
also?

77
00:04:51,400 --> 00:04:55,500
And so, the questions were simple
questions, right?

78
00:04:55,500 --> 00:04:59,700
Like, whom do I best lend loans
to, right?

79
00:04:59,700 --> 00:05:04,300
How do I cluster villages where
I give loans? And they were not

80
00:05:04,300 --> 00:05:07,100
neuroscience questions, but they were
very interesting questions

81
00:05:07,100 --> 00:05:12,800
nevertheless. And I guess that's
where I would have first taken 

82
00:05:12,800 --> 00:05:15,400
up
what we call data

83
00:05:15,400 --> 00:05:18,500
science today, right,
trying to address these problems.

84
00:05:18,900 --> 00:05:22,500
In fact, the very first question
I addressed was this one:

85
00:05:22,700 --> 00:05:27,400
whom do we lend loans to? 
And at that time we built a 

86
00:05:27,400 --> 00:05:30,700
simple logistic regression
model, and the results were very

87
00:05:30,700 --> 00:05:33,400
interesting because it was the
mother's education which kind

88
00:05:33,400 --> 00:05:35,600
of finally decided whether the
person would repay

89
00:05:35,700 --> 00:05:41,100
the loan or not, right?
And so that's how I got into 

90
00:05:41,100 --> 00:05:45,200
data science. And in
my particular case,

91
00:05:45,200 --> 00:05:49,700
I won't say it's a 100% shift from
neuroscience to data science,

92
00:05:50,100 --> 00:05:55,400
because today, I continue
working with Tara as a part

93
00:05:55,400 --> 00:05:57,900
of this company called Sapien 
Labs. 

94
00:05:58,200 --> 00:06:02,700
So Sapien Labs is essentially a
not-for-profit neuroscience

95
00:06:02,700 --> 00:06:05,600
company where we are trying to
collect data, EEG data,

96
00:06:05,800 --> 00:06:10,400
from thousands of people across
the world, you know, and we're

97
00:06:10,400 --> 00:06:12,300
trying to make sense out of that
data. 

98
00:06:12,600 --> 00:06:17,100
So a part of my life is still 
stuck to academia in that sense.

99
00:06:17,200 --> 00:06:21,000
And another part of my life is 
where I do more of my data 

100
00:06:21,000 --> 00:06:23,600
science related things today. 
Interesting. 

101
00:06:23,600 --> 00:06:26,300
Interesting. 
So, okay, so, but that

102
00:06:26,300 --> 00:06:29,400
neuroscience academia part is not,
strictly speaking...

103
00:06:29,400 --> 00:06:31,800
It's not like you work for a
university or something like

104
00:06:31,800 --> 00:06:33,800
that. 
It's more like Neuroscience. 

105
00:06:33,800 --> 00:06:36,100
I can't hear. 
Such. 

106
00:06:36,100 --> 00:06:39,900
Yeah, exactly. 
Okay, so it was a sort of soft 

107
00:06:39,900 --> 00:06:41,100
move in some sense for you, 
right? 

108
00:06:41,100 --> 00:06:43,400
It was like you moved from 
Neuroscience to something 

109
00:06:43,400 --> 00:06:46,200
adjacent, like.
So, if I may ask, why is it that

110
00:06:46,200 --> 00:06:50,000
you just sort of stuck on in 
industry and business for the 

111
00:06:50,000 --> 00:06:54,800
large part rather than sort of
pursuing a career within, say,

112
00:06:54,800 --> 00:06:57,400
core academia and so on?
Because my understanding is that,

113
00:06:57,400 --> 00:07:00,700
like, correct me if I'm
wrong, but my understanding is

114
00:07:00,700 --> 00:07:04,600
that if somebody does a PhD,
it's with a strong intent to

115
00:07:04,600 --> 00:07:08,200
become an academic.
So what were, then,

116
00:07:08,200 --> 00:07:11,300
the particular reasons why you
stuck to business?

117
00:07:13,000 --> 00:07:19,300
No, I am sure even when I 
started off my PhD, I guess the 

118
00:07:19,300 --> 00:07:21,700
aim was probably to become an
academician,

119
00:07:22,100 --> 00:07:24,700
become a professor at some
university and, you know, follow

120
00:07:24,700 --> 00:07:28,700
it up from there. 
But as you do your PhD, I think

121
00:07:28,700 --> 00:07:32,500
the number one trait anyone needs
to complete their PhD is

122
00:07:32,500 --> 00:07:37,400
perseverance because it is a 
grueling sort of affair. 

123
00:07:37,500 --> 00:07:40,800
Right? 
And today, things have changed a

124
00:07:40,800 --> 00:07:45,000
bit when it comes to getting an
academic job, right?

125
00:07:45,300 --> 00:07:49,700
Because most of us need to 
follow up PhDs with a postdoc

126
00:07:50,300 --> 00:07:55,200
and postdoc is not a degree as 
such but it is more of the same 

127
00:07:55,200 --> 00:07:57,900
stuff that you do during a
PhD. 

128
00:07:58,400 --> 00:08:01,600
And essentially, when you 
finally get your job, you'll be 

129
00:08:01,600 --> 00:08:04,900
easily around 32 years old, I
would say, right?

130
00:08:04,900 --> 00:08:09,300
So it's a long wait before you
actually get your tenure track 

131
00:08:09,300 --> 00:08:11,000
position, right? 
Because that would take another 

132
00:08:11,000 --> 00:08:15,300
three to four years
in a university. I think that was

133
00:08:15,300 --> 00:08:20,800
one of the deterrents, and the
second thing,

134
00:08:20,800 --> 00:08:23,600
I mean, I would say, is
monetary.

135
00:08:23,600 --> 00:08:28,700
Also, after a PhD, I
don't know if it really makes

136
00:08:28,700 --> 00:08:32,100
that much sense, right?
Because if you look at

137
00:08:32,100 --> 00:08:36,200
your pay scales and your growth in
terms of career opportunities, 

138
00:08:36,600 --> 00:08:38,700
you grow fast.
I mean, you grow

139
00:08:38,700 --> 00:08:42,200
as you do your degrees, right?
If I do my B.Tech, I get

140
00:08:42,200 --> 00:08:43,299
something.
I do my M.Tech,

141
00:08:43,299 --> 00:08:46,700
I get a little more. If I do
my PhD, my wages

142
00:08:46,700 --> 00:08:48,100
come down.
Yes. 

143
00:08:48,100 --> 00:08:48,700
Yes. 
Yes. 

144
00:08:48,700 --> 00:08:49,100
Yes. 
Yes. 

145
00:08:49,500 --> 00:08:54,300
And I'm a bit old for many of
the jobs also, so that would be 

146
00:08:54,300 --> 00:08:58,300
some of the reasons, I would say.
But in my case, it was also 

147
00:08:58,300 --> 00:09:02,400
serendipitous, like my husband
Arvind,

148
00:09:02,500 --> 00:09:04,700
he was running a startup called
OfficeYes

149
00:09:04,700 --> 00:09:06,100
at that time, which was an

150
00:09:06,100 --> 00:09:11,400
e-commerce startup, and there
was a lot of operational

151
00:09:11,400 --> 00:09:15,200
optimization which needed to
be done, and he used to obviously

152
00:09:15,200 --> 00:09:18,800
reach out to me, and we used to 
work quite a bit together at 

153
00:09:18,800 --> 00:09:23,700
that particular time point to 
solve, you know, supply chain 

154
00:09:23,700 --> 00:09:26,400
problems,
operations problems for this

155
00:09:26,400 --> 00:09:29,900
particular startup, and that's 
when both of us, kind of, you 

156
00:09:29,900 --> 00:09:32,000
know, thought about it. 
Why don't we kind of work 

157
00:09:32,000 --> 00:09:35,100
together because you bring 
business value. 

158
00:09:35,100 --> 00:09:38,100
I can solve the problems and 
that seemed to make a lot of

159
00:09:38,100 --> 00:09:42,000
sense. That's when we created Messy
Fractals in 2015,

160
00:09:42,700 --> 00:09:44,700
three years after I finished my
PhD. 

161
00:09:45,700 --> 00:09:49,300
And the goal of Messy Fractals
was centered around data science.

162
00:09:50,200 --> 00:09:53,500
Okay, and then from there, like, you
guys decided to sort of focus a 

163
00:09:53,500 --> 00:09:56,900
bit more on the, on the sports 
bit, I guess, which is where you

164
00:09:56,900 --> 00:09:59,400
sort of got into Kabaddi Adda and
things like that.

165
00:09:59,400 --> 00:10:00,700
I have no idea. 
Exactly. 

166
00:10:01,300 --> 00:10:02,500
Exactly. 
Yeah. 

167
00:10:02,500 --> 00:10:04,900
Initially, we started
working with startups.

168
00:10:04,900 --> 00:10:09,100
We wanted to solve data science
problems for startups, and

169
00:10:09,100 --> 00:10:12,500
sports was a hobby.
So we started working with

170
00:10:12,500 --> 00:10:15,900
Saina Nehwal, Prakash Padukone at
that time as a hobby, I would
that time as a hobby, I would 

171
00:10:15,900 --> 00:10:18,400
say because it was just an 
interesting problem to solve. 

172
00:10:19,300 --> 00:10:23,600
But it eventually kept getting
bigger and bigger and snowballed

173
00:10:23,600 --> 00:10:28,200
into more sports analytics and
less of other startup stuff.

174
00:10:29,600 --> 00:10:31,500
Coming back to your PhD days and
so on. 

175
00:10:31,500 --> 00:10:34,200
So what about your peers during
your PhD program?

176
00:10:34,200 --> 00:10:37,700
Have many of them
also moved into something sort

177
00:10:37,700 --> 00:10:40,900
of similar to data science, or 
have a lot of them sort of 

178
00:10:41,200 --> 00:10:45,200
stayed within academia?
If I have to think correctly, 

179
00:10:45,800 --> 00:10:49,200
there were around ten of us who
were doing a PhD at roughly

180
00:10:49,200 --> 00:10:53,200
around the same time. 
And I would say, three to four 

181
00:10:53,200 --> 00:10:55,400
of them. 
Three to four of us have moved 

182
00:10:55,500 --> 00:10:59,900
out of academia itself.
Yeah. The rest of them,

183
00:10:59,900 --> 00:11:02,200
they are still pursuing it, hoping
to get into academia.

184
00:11:02,200 --> 00:11:04,100
Some of them are still in their 
postdoc. 

185
00:11:04,100 --> 00:11:07,700
So, but I believe
if they get into their postdoc,

186
00:11:07,700 --> 00:11:11,400
at least they're quite clear 
that they want to end up in 

187
00:11:11,400 --> 00:11:12,600
Academia. 
Yeah. 

188
00:11:12,600 --> 00:11:13,000
Yeah. 
Yeah. 

189
00:11:13,200 --> 00:11:14,900
To me personally, that's a very 
strong sign. 

190
00:11:14,900 --> 00:11:18,800
I mean because I think the 
easiest best exit out of 

191
00:11:18,900 --> 00:11:21,700
academia into industry is
the moment you get your PhD, I

192
00:11:21,700 --> 00:11:25,900
guess, because once you get
into a postdoc, you're going to,

193
00:11:25,900 --> 00:11:29,500
you signal to the market,
among other things, that you

194
00:11:29,500 --> 00:11:32,800
want to become an academic
rather than be in the industry, I

195
00:11:32,808 --> 00:11:35,500
guess. 
That's exactly how it is. 

196
00:11:36,300 --> 00:11:38,800
Yeah, so, okay. 
So now coming back like that. 

197
00:11:38,900 --> 00:11:41,500
So, what's your, in your opinion,
rather,

198
00:11:41,500 --> 00:11:44,200
what's the definition of data
science, then?

199
00:11:44,200 --> 00:11:47,100
Because I asked a lot of my 
guests this because this is a 

200
00:11:47,100 --> 00:11:49,600
very, very highly abused term,
right? 

201
00:11:49,600 --> 00:11:52,700
Like, lots of different people 
use it to mean different things.

202
00:11:52,700 --> 00:11:54,300
So what, in your opinion, is data
science?

203
00:11:55,000 --> 00:11:57,600
Okay. 
So my opinions, I guess they'll 

204
00:11:57,600 --> 00:12:03,500
be biased by my experience
of doing my PhD, right?

205
00:12:03,600 --> 00:12:08,000
So and that's how I address all 
the problems which I face. 

206
00:12:08,500 --> 00:12:11,800
So I go in with the thing that

207
00:12:11,800 --> 00:12:15,200
I try to design a good question
when I face a data science

208
00:12:15,200 --> 00:12:17,700
problem, right? 
It's often not easy to design

209
00:12:17,700 --> 00:12:21,000
the right question. 
And once I design the right

210
00:12:21,000 --> 00:12:25,700
question, I try to hypothesize 
on what could be the possible 

211
00:12:25,700 --> 00:12:29,900
solutions. 
And most importantly, I feel,

212
00:12:31,300 --> 00:12:36,100
understand the context within
which this question or problem

213
00:12:36,100 --> 00:12:37,700
is
there, right?

214
00:12:37,700 --> 00:12:41,500
For instance, in badminton,
the context would be how

215
00:12:41,500 --> 00:12:45,600
players understand,
you know, how players will

216
00:12:46,300 --> 00:12:49,200
learn the sport of badminton. 
What are the different 

217
00:12:49,200 --> 00:12:52,200
dimensions when it comes to 
badminton itself, right? 

218
00:12:52,400 --> 00:12:55,600
For instance, height is one 
dimension, which I never thought

219
00:12:55,600 --> 00:12:59,500
of until I spoke to people who
play badminton: the height at which

220
00:12:59,500 --> 00:13:02,700
they get the shuttle.
So creating context around the

221
00:13:02,700 --> 00:13:07,100
question I think, is one of the 
most important things, and once 

222
00:13:07,100 --> 00:13:09,200
we do this, we can design 
solutions.

223
00:13:09,200 --> 00:13:12,700
And that's where I think the
abused term of data science comes

224
00:13:12,700 --> 00:13:15,400
in today, right,
where I design solutions and

225
00:13:15,400 --> 00:13:17,900
write code to address those
solutions.

226
00:13:19,000 --> 00:13:21,700
Oh, so what you're
saying is like, I mean, if I

227
00:13:22,000 --> 00:13:25,400
might paraphrase, correct
me if I'm wrong, that the bigger

228
00:13:25,400 --> 00:13:28,700
problem, the bigger challenge, is in
figuring out the right question

229
00:13:28,700 --> 00:13:32,000
to ask rather than just
solving the thing,

230
00:13:32,000 --> 00:13:33,600
I mean, the problem,
I'm guessing.

231
00:13:33,600 --> 00:13:37,600
Yeah, exactly. 
So, more than anything,

232
00:13:37,900 --> 00:13:41,200
designing the question and 
understanding the context. 

233
00:13:41,700 --> 00:13:43,900
Yes. 
Yes, because I know for example,

234
00:13:43,900 --> 00:13:47,600
recently, I put up this sort of
LinkedIn post where I said

235
00:13:47,600 --> 00:13:51,300
that anybody can write three lines
of scikit-learn code to build any

236
00:13:51,300 --> 00:13:54,500
machine learning model.
What's critical is to

237
00:13:54,700 --> 00:13:58,900
figure out what the context is 
and structure the problem 

238
00:13:58,900 --> 00:14:02,100
correctly. So the
real skill is in structuring the

239
00:14:02,100 --> 00:14:05,500
problem rather than sort of
solving it. Really, solving it

240
00:14:05,700 --> 00:14:09,500
to some extent is sort of
democratized nowadays, I'm guessing.

241
00:14:10,100 --> 00:14:11,300
correct. 
Correct. 

242
00:14:11,300 --> 00:14:12,800
Exactly. 
Solving

243
00:14:12,800 --> 00:14:16,600
it is more or less democratized
with the kind of code and the

244
00:14:16,608 --> 00:14:20,300
packages that you get, right?
It's pretty easy to write

245
00:14:20,300 --> 00:14:22,700
it
finally, if you've got the

246
00:14:22,700 --> 00:14:27,500
structure correct and well
thought through. And I guess

247
00:14:27,500 --> 00:14:29,700
that's what will be good data
science.

248
00:14:29,900 --> 00:14:35,600
And that is precisely the
reason, I feel, a lot of PhDs end

249
00:14:35,600 --> 00:14:38,500
up doing data science, because
that is a skill

250
00:14:38,500 --> 00:14:42,200
we are trained in during our
PhDs, whether we like it or not,

251
00:14:42,200 --> 00:14:44,500
right?
There is obviously a specialized

252
00:14:44,500 --> 00:14:47,900
question we go in trying to
address. In my case,

253
00:14:47,900 --> 00:14:51,800
it was... sorry, I was trying to
address a question, but

254
00:14:52,200 --> 00:14:56,400
what I ended up doing in my 
first two years of my PhD was 

255
00:14:56,900 --> 00:15:01,400
design the question, which has
not been solved before, and

256
00:15:01,400 --> 00:15:04,900
understand the context in which
that question is there, by

257
00:15:04,900 --> 00:15:07,200
reading a lot of research 
papers. 

258
00:15:07,600 --> 00:15:10,100
And, you know, seeing if 
something which I think about 

259
00:15:10,100 --> 00:15:14,100
has already been done or not, 
and then the next two years, it 

260
00:15:14,100 --> 00:15:17,000
was about collecting data,
designing experiments to collect

261
00:15:17,000 --> 00:15:19,900
data,
so I could collect as much data

262
00:15:19,900 --> 00:15:23,800
as I wanted or as much data as I
thought made sense to solve this

263
00:15:23,800 --> 00:15:27,300
problem. 
And the final step is where you 

264
00:15:27,300 --> 00:15:31,000
kind of write that piece of code
which solves it, build a model, 

265
00:15:31,500 --> 00:15:34,300
which addresses that particular 
problem that you're solving. 

266
00:15:34,900 --> 00:15:39,600
So I feel in that sense a PhD is
actually, you know, in some 

267
00:15:39,600 --> 00:15:41,200
sense,
it's leading you to become a

268
00:15:41,208 --> 00:15:44,700
data scientist because that's
what we do day in and day

269
00:15:44,700 --> 00:15:46,000
out. 
Yeah. 

270
00:15:46,000 --> 00:15:49,500
So are there certain
PhDs where this is more

271
00:15:49,500 --> 00:15:52,300
prominent because what I see is 
like, especially what I see is 

272
00:15:52,600 --> 00:15:54,400
that, maybe not so much in
India,

273
00:15:54,400 --> 00:15:56,600
but at least abroad, like, you
have a lot of data scientists who

274
00:15:56,608 --> 00:16:01,600
come from, in my limited data
gathering, physics or biology

275
00:16:01,600 --> 00:16:05,100
backgrounds rather than other
sorts of academic backgrounds.

276
00:16:05,100 --> 00:16:07,900
So are there certain kinds of 
phds that are more suited 

277
00:16:07,900 --> 00:16:11,200
towards data science? 
And, like, I mean, or rather,

278
00:16:11,500 --> 00:16:13,600
like if you want to get into the
industry you get into data 

279
00:16:13,600 --> 00:16:16,000
science instead of getting into 
something else and so on. 

280
00:16:16,900 --> 00:16:23,900
I believe, right, the way
people think about research,

281
00:16:25,200 --> 00:16:29,300
it is, almost any PhD kind of
goes through a similar

282
00:16:29,300 --> 00:16:31,800
process, which I just told you,
which is: hypothesize,

283
00:16:31,800 --> 00:16:37,300
collect data, solve the problem.
In fact, experiments

284
00:16:37,300 --> 00:16:41,100
as a way of solving problems
crept into the way science is

285
00:16:41,100 --> 00:16:43,600
happening,
I think, only after Newton's

286
00:16:43,600 --> 00:16:47,500
times, right? 
So it's quite the way that has 

287
00:16:47,500 --> 00:16:52,700
happened. 
So, I believe that a PhD

288
00:16:52,700 --> 00:16:57,200
in any field will be equally
good at solving data science

289
00:16:57,200 --> 00:17:00,000
problems.
The only thing is the fear of

290
00:17:00,000 --> 00:17:03,900
coding, which is stopping many 
people from taking up that as an

291
00:17:03,900 --> 00:17:08,400
industry alternative. 
But otherwise, I don't think it 

292
00:17:08,400 --> 00:17:10,400
should make a difference. 
In fact, today,

293
00:17:10,400 --> 00:17:14,300
what I see is the reversal of
the process in some sense. That

294
00:17:14,300 --> 00:17:17,300
is, people who are data
scientists, right,

295
00:17:17,300 --> 00:17:20,599
who at the end of their B.Tech are
interested in machine learning,

296
00:17:20,599 --> 00:17:22,500
AI,
who've done a bunch of courses,

297
00:17:23,200 --> 00:17:27,099
the research
labs are inviting them, because

298
00:17:27,099 --> 00:17:30,400
all research papers
go through this: experiment, data,

299
00:17:30,400 --> 00:17:33,300
presentation of data. 
And now there's a new section 

300
00:17:33,300 --> 00:17:36,600
which is added in all research 
papers, which is a model to 

301
00:17:36,600 --> 00:17:40,300
address that data. And to build
this model, they're looking out

302
00:17:40,300 --> 00:17:43,800
for data scientists to come in.
So they're inviting data

303
00:17:43,800 --> 00:17:47,200
scientists to get into the
research lab and you know, solve

304
00:17:47,200 --> 00:17:52,400
problems, which may be of, you 
know, greater interest in that 

305
00:17:52,400 --> 00:17:53,700
sense. 
Like this could be like a 

306
00:17:53,700 --> 00:17:56,400
research in pretty much any
subject, not like research in

307
00:17:56,400 --> 00:17:57,900
machine learning or anything
like that?

308
00:17:59,200 --> 00:18:00,900
Not necessarily. For instance,
right,

309
00:18:00,900 --> 00:18:04,500
I will give you one example. 
I'm working on a project where 

310
00:18:04,500 --> 00:18:07,500
we are looking at cognitive 
development of three-year-old 

311
00:18:07,500 --> 00:18:11,400
children. 
And so there is a scale. Scale in

312
00:18:11,400 --> 00:18:13,000
the sense,
there's a questionnaire which is

313
00:18:13,000 --> 00:18:16,500
given to three-year-old children,
around 80 to 85 questions, and they

314
00:18:16,500 --> 00:18:18,100
have to say which shape is
bigger,

315
00:18:18,100 --> 00:18:20,400
which shape is smaller,
does this fit there, and those

316
00:18:20,400 --> 00:18:23,900
sort of things and you'll get a 
cognitive score of where they 

317
00:18:23,900 --> 00:18:25,800
are. 
It's called the Bayley test.

318
00:18:25,900 --> 00:18:27,400
Okay. 
So very popular test. 

319
00:18:28,100 --> 00:18:32,200
And now we are doing EEGs to
collect data from the brain.

320
00:18:32,600 --> 00:18:35,900
And so now, the obvious thing is
to say, okay, what are the EEG

321
00:18:35,900 --> 00:18:38,900
signatures which predict my
Bayley score?

322
00:18:39,300 --> 00:18:44,400
So I want a data scientist or a
person who is good at machine 

323
00:18:44,400 --> 00:18:46,400
learning to come and build this 
model for me. 

324
00:18:46,900 --> 00:18:51,700
And so, we invited one person 
from Harvard and he was an 

325
00:18:51,700 --> 00:18:55,100
undergrad, and he's the one who 
actually built out the model. 

326
00:18:55,100 --> 00:18:57,600
And it took him three to four weeks
to build out this model,

327
00:18:58,100 --> 00:19:02,000
with some iterations and
so on, but he was invited to

328
00:19:02,000 --> 00:19:04,200
build the model. 
So it is a reversal of process 

329
00:19:04,200 --> 00:19:07,900
in some sense, where I'm calling
in data scientists to quickly

330
00:19:07,900 --> 00:19:11,400
solve this model for me, because
I have got all the data,

331
00:19:11,400 --> 00:19:14,400
I've got all the context, I've got
everything sorted for you. Now

332
00:19:14,400 --> 00:19:17,300
just build the model and give it
to me, right? 

333
00:19:17,500 --> 00:19:20,000
Which I think kind of brings me 
to the sort of classic 

334
00:19:20,000 --> 00:19:22,700
definition of a data scientist as
somebody who's like sort of 

335
00:19:22,900 --> 00:19:26,700
given a well-defined problem 
statement and then like writes 

336
00:19:26,700 --> 00:19:30,000
the model for it. 
Exactly.

337
00:19:30,100 --> 00:19:31,400
Exactly. 
Exactly. 

338
00:19:32,900 --> 00:19:34,400
Okay. 
So, changing track a

339
00:19:34,408 --> 00:19:36,400
little bit,
we were talking about how a

340
00:19:36,600 --> 00:19:39,000
PhD sort of helps you in your 
data science career, because of 

341
00:19:39,000 --> 00:19:43,200
this process of hypothesize,
collect data, then build models

342
00:19:43,200 --> 00:19:45,600
and so on, which is pretty much
what happens in everything that 

343
00:19:45,600 --> 00:19:46,600
we do at work. 
Right? 

344
00:19:46,800 --> 00:19:49,600
So, but are there any sort of
challenges because you

345
00:19:49,600 --> 00:19:51,200
come from an academic background?

346
00:19:51,400 --> 00:19:52,400
Let's say, for example, I don't 
know. 

347
00:19:52,400 --> 00:19:58,100
Like your output is measured in 
terms of research papers and

348
00:19:58,100 --> 00:20:00,400
things like that. 
So are there any challenges that

349
00:20:00,400 --> 00:20:03,200
you faced in the industry 
because of your academic 

350
00:20:03,200 --> 00:20:07,400
background? 
Yeah, there were definitely a few

351
00:20:07,400 --> 00:20:09,900
challenges
I faced initially, right?

352
00:20:11,000 --> 00:20:14,600
One of the challenges,
I feel, is that in the industry

353
00:20:14,600 --> 00:20:16,300
itself,
the expectation to reach a

354
00:20:16,300 --> 00:20:21,400
solution, the time frame, is
very small compared to that

355
00:20:21,500 --> 00:20:25,000
during research. In research,
you kind of practically unlearn

356
00:20:25,000 --> 00:20:29,500
that skill where speed is a 
skill which you unlearn during 

357
00:20:29,500 --> 00:20:32,600
your research, right? 
So, but the minute we came to the

358
00:20:32,600 --> 00:20:35,300
industry, for
instance, we were working with a

359
00:20:35,300 --> 00:20:39,900
telecom product during the early
days of Messy Fractals, and

360
00:20:39,900 --> 00:20:43,200
this guy had collected a lot of 
data about his users. 

361
00:20:43,300 --> 00:20:46,500
And he wanted us to cluster the
users into many groups.

362
00:20:46,900 --> 00:20:50,000
And he said, you have one month 
and you have to give me a 

363
00:20:50,008 --> 00:20:53,900
solution in one month on how to
cluster the data. And that's

364
00:20:53,900 --> 00:20:56,500
practically the time
it took me to understand the

365
00:20:56,500 --> 00:21:00,600
context of the problem:
how people interact with

366
00:21:00,600 --> 00:21:03,500
prepaid vouchers and things like
that, right? 

367
00:21:04,100 --> 00:21:08,500
So, speed is one thing I think 
which the industry demands, but 

368
00:21:08,900 --> 00:21:11,000
coming from an academic 
background,

369
00:21:11,100 --> 00:21:16,800
we take it a bit too lightly.
And another thing which I have 

370
00:21:16,800 --> 00:21:22,200
faced often is that the data is 
already ready when you come into

371
00:21:22,200 --> 00:21:25,400
the industry, right? 
So here is the data now, give me

372
00:21:25,400 --> 00:21:27,900
the solution. 
That's how it is, whereas in

373
00:21:27,900 --> 00:21:29,800
academia,
this is a problem

374
00:21:29,800 --> 00:21:32,300
I want to solve, so I need to 
collect this data. 

375
00:21:32,300 --> 00:21:33,900
So I'll design an experiment 
to gather

376
00:21:34,000 --> 00:21:37,400
this data.
So the way that works is a 

377
00:21:37,400 --> 00:21:42,300
little different and often data 
is a limiting step. 

378
00:21:42,400 --> 00:21:45,800
I mean there are certain fields 
which are never available, you 

379
00:21:45,800 --> 00:21:49,100
know, there's certain data 
points which won't be available.

380
00:21:49,100 --> 00:21:51,900
But still, you need to go ahead 
and take a decision. 

381
00:21:52,700 --> 00:21:57,200
So, those are two places where I
feel, there is a huge difference

382
00:21:57,200 --> 00:22:01,300
and PhDs struggle a bit, at
least in the initial years in 

383
00:22:01,300 --> 00:22:04,600
the industry. 
Did. 

384
00:22:04,600 --> 00:22:09,400
There's another challenge which 
is that when we do a PhD, or

385
00:22:09,400 --> 00:22:12,700
when we are in academia,
you're not bothered about 

386
00:22:12,700 --> 00:22:17,600
building a product out of it.
You're often interested in

387
00:22:17,600 --> 00:22:21,100
solving a problem, coming up with
a solution, understanding key 

388
00:22:21,100 --> 00:22:24,000
parameters which drive the
solution, those sort of things,

389
00:22:24,600 --> 00:22:27,300
whereas, in the industry, in the
early days. 

390
00:22:27,300 --> 00:22:30,900
That's one of the key things I 
faced: when we build code,

391
00:22:30,900 --> 00:22:33,700
when we write a piece of code
which works, that is never

392
00:22:33,700 --> 00:22:35,700
enough. 
It has to connect with the 

393
00:22:35,700 --> 00:22:37,300
database. 
It has to sync with the

394
00:22:37,300 --> 00:22:39,500
product. 
It has to keep getting updated 

395
00:22:39,700 --> 00:22:44,900
and it has to just work
well. And so that is

396
00:22:44,900 --> 00:22:49,600
something I think I struggled with
in the early days of my stint in

397
00:22:49,600 --> 00:22:51,900
the industry. 
But over time. 

398
00:22:52,000 --> 00:22:54,300
Yeah, I figured that out over 
time. 

399
00:22:54,300 --> 00:22:57,300
You learned that. 
So, you mentioned that

400
00:22:57,300 --> 00:22:59,100
PhDs are also generally good at
coding, right? 

401
00:22:59,100 --> 00:23:01,500
Because 10, 12 years
back,

402
00:23:01,500 --> 00:23:05,000
I used to work for Goldman,
and they used to hire,

403
00:23:05,500 --> 00:23:08,900
I think, physics PhDs because
they could write C++ code. 

404
00:23:08,900 --> 00:23:12,700
That's what somebody told me,
which sounds

405
00:23:12,700 --> 00:23:14,500
sort of counterintuitive, because
you would think that, if you

406
00:23:14,500 --> 00:23:17,600
want somebody to write C++ code,
you hire a software engineer.

407
00:23:18,000 --> 00:23:22,400
But so, in some sense, I
think these are two different

408
00:23:22,400 --> 00:23:24,900
aspects of coding.
I guess in the PhD you become

409
00:23:24,900 --> 00:23:27,400
good at coding a model.
But in the industry, I think you

410
00:23:27,408 --> 00:23:30,300
have a lot of sort of for the 
lack of a better word like 

411
00:23:30,300 --> 00:23:33,700
pipeline coding and stuff
that you need to do to kind of

412
00:23:33,700 --> 00:23:36,000
connect to the database, connect
the product on the other side,

413
00:23:36,000 --> 00:23:38,300
and things like that.
Am I correct?

414
00:23:39,300 --> 00:23:41,500
That's exactly what I was also 
thinking.

415
00:23:41,900 --> 00:23:45,700
Yeah, the first reason I
learnt coding was because I had

416
00:23:45,700 --> 00:23:50,200
a need for it. 
Even at IIT, we were trying to solve

417
00:23:50,200 --> 00:23:54,000
some problems for my thesis. 
It was in combustion at that 

418
00:23:54,000 --> 00:23:58,900
time, right, of jet engines.
And I had to learn Python to

419
00:23:58,900 --> 00:24:02,300
write that piece of code. 
And so that's how we learn

420
00:24:02,900 --> 00:24:06,600
when we get into research.
We don't learn coding because we

421
00:24:06,600 --> 00:24:09,300
had to learn coding. 
I'm really envious of you that 

422
00:24:09,300 --> 00:24:12,600
in IIT Madras
you learnt Python, because we in

423
00:24:12,600 --> 00:24:15,300
the computer science department
were asked to do everything in 

424
00:24:15,300 --> 00:24:17,800
Java. 
And by the time I graduated, I 

425
00:24:17,900 --> 00:24:19,800
hated everything about computer 
science. 

426
00:24:20,200 --> 00:24:23,400
It took another six years for
me to kind of start coding 

427
00:24:23,400 --> 00:24:25,700
again. 
So, so in that sense, I'm 

428
00:24:26,100 --> 00:24:31,100
envious of you for learning
Python, but yeah, I guess

429
00:24:31,300 --> 00:24:33,100
sorry. 
You were talking about learning

430
00:24:33,100 --> 00:24:35,900
coding because it was required. 
When you were doing. 

431
00:24:35,900 --> 00:24:36,400
Yes,
it is

432
00:24:36,400 --> 00:24:39,500
a need to learn coding when
you're doing research; you

433
00:24:39,500 --> 00:24:43,100
don't have a choice because most
of us in research, especially 

434
00:24:43,100 --> 00:24:46,600
during a PhD, we own one project
ourselves. 

435
00:24:47,300 --> 00:24:51,500
So it's my responsibility to 
take it from data collection, to

436
00:24:51,600 --> 00:24:53,900
model, to writing the paper, to
publishing it.

437
00:24:54,800 --> 00:24:58,800
So, you end up learning to code.
You don't have a choice really. 

438
00:24:58,800 --> 00:25:02,600
So, and that, that is a 
different way of learning to 

439
00:25:02,600 --> 00:25:06,500
code rather than I want to 
become a data scientist. 

440
00:25:06,800 --> 00:25:12,100
So let me learn how to code in 
Python or R. It's just a

441
00:25:12,108 --> 00:25:14,100
different motivation to learn 
coding? 

442
00:25:14,200 --> 00:25:16,900
I think that's the major 
difference. 

443
00:25:17,200 --> 00:25:20,100
Yea though. 
I mean, I don't have a PhD, but 

444
00:25:20,100 --> 00:25:23,300
I personally have a bias for I'm
going to learn to code because I

445
00:25:23,300 --> 00:25:26,100
need to code for something 
rather than I need to learn to 

446
00:25:26,100 --> 00:25:29,400
code because I want to become 
good at something real cents on 

447
00:25:29,400 --> 00:25:31,500
every day. 
I somehow have a bias towards 

448
00:25:31,500 --> 00:25:34,500
the first one, but coming back 
to another thing you were 

449
00:25:34,500 --> 00:25:37,000
saying, I mean, you were talking
about how like the nature of 

450
00:25:37,000 --> 00:25:39,100
data is different between 
industry and Academia, right? 

451
00:25:39,100 --> 00:25:41,800
Because in Academia, I guess you
get to design your own 

452
00:25:41,800 --> 00:25:44,700
experiments, which means you 
have control over the data that 

453
00:25:44,700 --> 00:25:47,700
you collect. 
Which I guess the downside of 

454
00:25:47,700 --> 00:25:50,800
that is that like sometimes the 
amount of data that you can get,

455
00:25:50,900 --> 00:25:55,100
can get a little limited because
you have to run the experiments

456
00:25:55,100 --> 00:25:56,900
and so on.
But the good thing is that you 

457
00:25:56,900 --> 00:26:00,400
get all the fields that you 
want, if the way that you get 

458
00:26:00,400 --> 00:26:03,400
and so on, but in the industry, 
obviously, like, I mean, pretty 

459
00:26:03,400 --> 00:26:06,700
much have have almost. 
I do run some experiments from

460
00:26:06,700 --> 00:26:08,800
time to time. 
But like I almost never get to 

461
00:26:08,800 --> 00:26:11,900
collect my own data, it's some 
data collected for some other 

462
00:26:11,900 --> 00:26:15,200
purpose, that has to be used for
this particular problem and so 

463
00:26:15,200 --> 00:26:16,700
on it. 
So, how do you, how do you deal 

464
00:26:16,700 --> 00:26:17,800
with this? 
I mean, what are the things that

465
00:26:17,800 --> 00:26:21,100
you do to sort of like, kind of?
Because obviously it's an

466
00:26:21,200 --> 00:26:23,900
uncomfortable / unfamiliar 
situation for you. 

467
00:26:23,900 --> 00:26:27,600
So how do you deal with the
situations where you're faced with

468
00:26:28,400 --> 00:26:31,100
sort of data that's been 
collected for another purpose 

469
00:26:31,500 --> 00:26:33,800
and what are the challenges 
there? 

470
00:26:33,800 --> 00:26:34,800
How do you solve them? 
It's one. 

471
00:26:36,400 --> 00:26:38,600
See, it's a very difficult 
problem. 

472
00:26:38,900 --> 00:26:42,900
I would say because in most 
cases the way we have solved

473
00:26:42,900 --> 00:26:47,500
it is, in addition to
whatever data we have,

474
00:26:48,000 --> 00:26:51,300
we do another exercise of data
collection, of the data

475
00:26:51,300 --> 00:26:55,300
we want. That is how we solved
it so far. 

476
00:26:55,400 --> 00:26:59,700
And, let me give you an
example. 

477
00:26:59,700 --> 00:27:05,000
For instance, when I was working
with the microfinance company.

478
00:27:05,200 --> 00:27:06,700
Right. 
Madura Microfinance.

479
00:27:07,100 --> 00:27:10,900
We wanted to kind of identify 
how far the villages are from a 

480
00:27:10,900 --> 00:27:14,600
national highway or a state 
highway because we had a hunch 

481
00:27:14,600 --> 00:27:18,700
or a hypothesis that distance
to the highway was a key parameter

482
00:27:18,700 --> 00:27:21,900
in defining success of giving 
loans in that particular 

483
00:27:21,900 --> 00:27:24,800
Village. 
And so we had to painstakingly 

484
00:27:24,800 --> 00:27:29,400
collect data from GPS,
satellite maps and all that, and

485
00:27:29,408 --> 00:27:34,800
there was no alternative to it
at all, but there are also times

486
00:27:35,000 --> 00:27:39,700
when you can create proxy or
derived variables, right which 

487
00:27:39,700 --> 00:27:43,100
give you an approximation of the
missing data. 

488
00:27:43,900 --> 00:27:48,200
And from there,
you can probably derive things

489
00:27:48,500 --> 00:27:52,300
which cannot fully give you the
information

490
00:27:52,300 --> 00:27:55,500
but at least partially paint a
picture for you. 

491
00:27:55,900 --> 00:28:01,300
So, for instance, one metric
we created was the ratio of

492
00:28:01,300 --> 00:28:05,500
industrial workers, and that gave
us a proxy for how much

493
00:28:05,500 --> 00:28:08,300
migration has happened to a 
particular village, or how much

494
00:28:08,300 --> 00:28:10,900
migration has happened from a
particular village.

495
00:28:11,900 --> 00:28:16,300
And so, proxies for some variables
we can often use, but when 

496
00:28:16,300 --> 00:28:19,800
nothing helps, I think you'll 
have to go back and collect 

497
00:28:19,800 --> 00:28:22,500
data. 
And this might be caused by a 

498
00:28:22,500 --> 00:28:28,200
bias of my prior background in 
Academia itself. 

499
00:28:28,200 --> 00:28:31,500
Right? 
So, I think just because I've

500
00:28:31,500 --> 00:28:34,700
never been an academic,
I just end up using proxies all

501
00:28:34,700 --> 00:28:36,300
the time. 
So can you give some more

502
00:28:36,300 --> 00:28:38,600
examples, interesting
examples of proxies?

503
00:28:38,600 --> 00:28:42,000
I mean this industrial workers 
was an interesting one.

504
00:28:42,000 --> 00:28:43,400
I mean could be in any of your 
work. 

505
00:28:43,400 --> 00:28:46,900
Like it's always fun to look at 
what kind of proxies people use 

506
00:28:46,900 --> 00:28:50,800
for what. In cricket, right?
For instance. 

507
00:28:51,900 --> 00:28:56,800
We wanted to get... bowlers often
get most of their wickets in the

508
00:28:56,800 --> 00:29:00,300
last... say, thinking about T20
matches, right? Bowlers

509
00:29:00,300 --> 00:29:04,800
often get their wickets in the
death overs, and this could often

510
00:29:04,800 --> 00:29:09,800
be because of mistakes made by
the batsman or by the bowler's

511
00:29:09,800 --> 00:29:14,800
brilliance itself.
So we created a metric, which is

512
00:29:14,800 --> 00:29:20,200
basically how many catches
were caught in the field, as a

513
00:29:20,200 --> 00:29:24,200
proxy for how good the bowler 
is, right. 

514
00:29:24,400 --> 00:29:28,600
So if more catches are taken in 
the outfield, we attribute that

515
00:29:28,600 --> 00:29:32,800
wicket as a wicket which just
happened because the batsman was

516
00:29:32,800 --> 00:29:38,800
taking a risk, whereas if it is
due to an LBW or a clean bowled,

517
00:29:39,000 --> 00:29:43,000
the attribution goes to the bowler.
So, fielder-dependent

518
00:29:43,000 --> 00:29:46,500
wickets and fielder-independent
wickets, and that was kind of a

519
00:29:46,508 --> 00:29:49,600
proxy of how good a bowler is.
For instance,

520
00:29:49,600 --> 00:29:53,100
a bowler like Shardul Thakur,
he gets a lot of wickets in the

521
00:29:53,108 --> 00:29:56,700
death overs, right, and
probably he is equivalent to

522
00:29:56,700 --> 00:29:58,200
Bumrah in that sense,
right?

523
00:29:58,500 --> 00:30:02,700
But most of Bumrah's wickets
are fielder-independent wickets,

524
00:30:02,700 --> 00:30:06,000
whereas Shardul's wickets are
fielder-dependent wickets.

525
00:30:06,700 --> 00:30:10,900
So you get a sense, and that's a
proxy for how good a bowler is in

526
00:30:10,900 --> 00:30:12,000
that
sense, right?

527
00:30:12,300 --> 00:30:13,200
Okay. 
I'm glad I asked you this 

528
00:30:13,200 --> 00:30:16,700
question because, I mean, proxies
are always fun in this

529
00:30:16,800 --> 00:30:19,100
sense, I mean, because
it always makes you

530
00:30:19,100 --> 00:30:22,000
think about things that you
wouldn't normally think of, like

531
00:30:22,000 --> 00:30:27,000
aspects that you would sort of, 
because if you sometimes kind 

532
00:30:27,000 --> 00:30:29,900
of, if you have the freedom to 
collect the data, you would have

533
00:30:29,900 --> 00:30:32,500
probably, I mean, in some cases,
you can't really run an 

534
00:30:32,500 --> 00:30:36,200
experiment for this, like in a
live, obviously, in a live

535
00:30:36,200 --> 00:30:38,300
cricket match
you can't run an experiment.

536
00:30:38,300 --> 00:30:43,600
But like I feel like you can... But
yeah, it allows you to think in

537
00:30:43,600 --> 00:30:46,000
this manner and so on.
So coming back. 

538
00:30:46,000 --> 00:30:49,000
I mean like so I think you spoke
about how in data science, you 

539
00:30:49,400 --> 00:30:52,400
in industry, you end up having to
build a product and things like 

540
00:30:52,400 --> 00:30:54,900
that. 
So another aspect, I guess, is

541
00:30:54,900 --> 00:30:56,400
just in terms of communication,
right? 

542
00:30:56,400 --> 00:31:01,500
Because not every data science 
problem leads to a product

543
00:31:01,500 --> 00:31:03,500
decision in some sense, right? 
In some sense. 

544
00:31:03,500 --> 00:31:06,400
It's more like it could be a 
business decision. 

545
00:31:06,400 --> 00:31:08,100
For example, in your microfinance
company,

546
00:31:08,100 --> 00:31:10,700
I'm guessing one of the problems
they might face is like, where 

547
00:31:10,700 --> 00:31:13,100
do we have
our offices or branches

548
00:31:13,100 --> 00:31:15,300
and things like that, which is a
business decision because you 

549
00:31:15,300 --> 00:31:19,500
can't, you can't give them a 
GPS, XYZ to say that you need to

550
00:31:19,500 --> 00:31:21,300
put your office exactly here and
so on.

551
00:31:21,800 --> 00:31:24,800
So in terms of communicating 
your results, I mean, what

552
00:31:24,800 --> 00:31:27,900
are the challenges in terms of 
communicating the results of the

553
00:31:28,300 --> 00:31:30,800
model that you have built, how 
what are the challenges in terms

554
00:31:30,800 --> 00:31:37,800
of how is communication of the 
results of an experiment /, 

555
00:31:37,900 --> 00:31:41,100
results of a problem different 
in academia

556
00:31:41,300 --> 00:31:44,400
compared to the data science
industry?

557
00:31:45,500 --> 00:31:47,700
In fact, I'm glad you brought 
this up, right? 

558
00:31:47,700 --> 00:31:51,100
This particular question, which 
you gave an example of which is,

559
00:31:51,500 --> 00:31:54,300
how do you create more branches?
And that was one of the 

560
00:31:54,308 --> 00:31:57,800
questions we addressed where 
communication became an issue. 

561
00:31:57,800 --> 00:32:02,800
Also, right. 
So what happened was, Madura had a

562
00:32:02,800 --> 00:32:08,600
lot of branches in Tamil Nadu. 
And so we looked at the data of 

563
00:32:08,600 --> 00:32:11,100
Tamil Nadu and identified
things

564
00:32:11,200 --> 00:32:14,700
like this: the industrial worker
ratio is important; in villages

565
00:32:14,700 --> 00:32:16,900
where the industrial worker
ratio is high,

566
00:32:17,500 --> 00:32:22,400
the repayment is better. Or,
distance from the national highway

567
00:32:22,400 --> 00:32:25,000
is an important metric. 
So we took all these metrics and

568
00:32:25,000 --> 00:32:28,200
they wanted to open branches in 
Karnataka and Maharashtra 

569
00:32:28,600 --> 00:32:32,200
looking at this particular data.
So I told them, okay, give me

570
00:32:32,300 --> 00:32:35,800
the data for villages from
Maharashtra and Karnataka. 

571
00:32:36,200 --> 00:32:39,900
Let me run the model on Tamil 
Nadu and give you Villages where

572
00:32:39,900 --> 00:32:46,200
you can open branches in
Maharashtra. And what happened was we

573
00:32:46,200 --> 00:32:50,300
were not getting any villages
which had scores, you know, 

574
00:32:50,700 --> 00:32:53,100
which fit the community 
explanation. 

575
00:32:53,100 --> 00:32:55,700
And we could not understand why 
that was happening. 

576
00:32:56,200 --> 00:33:00,200
And then we had to talk to the 
field team and they also 

577
00:33:00,200 --> 00:33:03,200
obviously at that particular 
point didn't have any answers.

578
00:33:03,800 --> 00:33:07,600
So we went back to the data, and what
we realized was Tamil Nadu is a

579
00:33:08,100 --> 00:33:11,100
heavily industrialized state,
whereas

580
00:33:11,200 --> 00:33:16,800
Karnataka and Maharashtra were
highly agrarian states.

581
00:33:17,300 --> 00:33:21,700
So the solutions which were
modeled for Tamil Nadu

582
00:33:21,700 --> 00:33:25,400
just don't make sense in
Karnataka and Maharashtra.

583
00:33:26,100 --> 00:33:29,600
And so, you had to throw out 
that parameter and rebuild the 

584
00:33:29,600 --> 00:33:31,500
model to make sense of things 
over there. 

585
00:33:32,000 --> 00:33:36,200
And also, another
thing which we faced, which

586
00:33:36,200 --> 00:33:39,400
comes to the
communication point, right? 

587
00:33:39,900 --> 00:33:43,000
We gave a list
of 100 branches which can be

588
00:33:43,000 --> 00:33:47,300
opened in Maharashtra and
Karnataka, but the field team

589
00:33:47,300 --> 00:33:50,900
often came back saying things
like, but there the competition is

590
00:33:50,900 --> 00:33:54,000
high, there people are not
accepting us when we go there,

591
00:33:54,500 --> 00:33:57,800
and things like that, right?
And so you needed to work

592
00:33:57,800 --> 00:34:02,000
through it in a sort of painstaking
manner, put filters, remove

593
00:34:02,000 --> 00:34:05,200
things which don't fit,
so that the field team is

594
00:34:05,200 --> 00:34:09,199
equally happy deploying the 
solution which we kind of 

595
00:34:09,199 --> 00:34:12,400
created on our computers
in our offices,

596
00:34:12,400 --> 00:34:17,300
right?
And yeah, that's where,

597
00:34:17,300 --> 00:34:20,800
I guess the communication comes 
in: communication and context,

598
00:34:20,800 --> 00:34:23,100
right. 
Sometimes while solving data 

599
00:34:23,100 --> 00:34:25,300
science problems,
we just don't understand what

600
00:34:25,300 --> 00:34:28,800
are the challenges faced on the 
field by the team, which is 

601
00:34:28,808 --> 00:34:33,100
actually executing the problem 
and they don't understand why we

602
00:34:33,100 --> 00:34:37,800
are saying these are good 
answers. And a lot of

603
00:34:37,800 --> 00:34:40,900
back-and-forth communication 
probably helps it a bit. 

604
00:34:41,400 --> 00:34:45,199
But what helps most is you 
traveling over there with them 

605
00:34:45,199 --> 00:34:48,199
and you know, spending a few 
days and understanding the 

606
00:34:48,199 --> 00:34:51,000
context yourself. 
Oh yeah, that's

607
00:34:51,000 --> 00:34:53,300
super important. 
I mean, in my work,

608
00:34:53,300 --> 00:34:54,600
I mean, I work for Delhivery,
right? 

609
00:34:54,600 --> 00:34:57,200
So I mean, I had built a model 
for something a while ago.

610
00:34:57,200 --> 00:34:59,800
I forget what it was and I 
discussed it with my team and 

611
00:34:59,900 --> 00:35:02,700
they were sort of fairly
happy with the model and then 

612
00:35:02,700 --> 00:35:04,300
one day, we decided like let's 
go

613
00:35:04,300 --> 00:35:06,800
look at the operations of our
have been back door. 

614
00:35:07,300 --> 00:35:09,500
So three of us went there. 
We were looking at operations, and

615
00:35:09,500 --> 00:35:12,100
then one of my teammates is like,
just look at how they're

616
00:35:12,100 --> 00:35:14,200
collecting this data.
Did you take that into account

617
00:35:14,200 --> 00:35:18,100
while building your model? And I
was like, no.

618
00:35:18,100 --> 00:35:21,900
And that's when I realized my 
model was, like, completely off

619
00:35:21,900 --> 00:35:25,200
the mark.
So, it's events like this which

620
00:35:25,200 --> 00:35:28,700
make me remember that
I'm not solving a math problem. 

621
00:35:29,200 --> 00:35:31,800
I'm solving a real life industry
problem, right?

622
00:35:31,800 --> 00:35:33,600
Now,
it's exactly the same for

623
00:35:33,600 --> 00:35:36,700
you, sort of going to your 
microfinance locations to 

624
00:35:37,600 --> 00:35:39,800
spending time with the, with a 
team, I guess. 

625
00:35:39,800 --> 00:35:42,900
Because that's pretty much the
only time you kind of it's 

626
00:35:42,900 --> 00:35:46,700
almost like you are G where your
entire model is sort of getting 

627
00:35:46,700 --> 00:35:49,100
calibrated to the market in some
sense, right? 

628
00:35:49,100 --> 00:35:53,200
When you when you sort of like 
when you do that to do. 

629
00:35:53,900 --> 00:35:56,500
So the question again, like I 
think we discussed briefly a 

630
00:35:56,500 --> 00:35:59,200
while back is, how does it tie 
in with speed? 

631
00:35:59,200 --> 00:36:01,900
Because I think in the industry,
especially when you are having a

632
00:36:02,500 --> 00:36:04,700
it necessarily needs some back 
and forth, right? 

633
00:36:04,700 --> 00:36:06,900
Because you give a model, they 
will tell you. 

634
00:36:06,900 --> 00:36:09,300
These are the issues, so you
rework it and so on.

635
00:36:09,700 --> 00:36:14,400
So I guess in that sense you
have to at some level prioritize

636
00:36:14,700 --> 00:36:19,100
speed over accuracy or speed 
over correctness of the 

637
00:36:19,100 --> 00:36:21,200
solution, and so on.
So, how did you sort of have? 

638
00:36:21,200 --> 00:36:25,900
Which I guess, is very different
from academia, where... Yes, it's

639
00:36:25,900 --> 00:36:28,300
very different from academia. In
academia,

640
00:36:28,300 --> 00:36:31,800
you learn one thing: that you
need to be patient and you need 

641
00:36:31,800 --> 00:36:34,600
to do things,
iterate, hundreds of times

642
00:36:34,600 --> 00:36:36,200
before you actually get it
right.

643
00:36:36,800 --> 00:36:39,500
So in that particular sense, 
it's okay

644
00:36:39,600 --> 00:36:43,100
in the sense where you quickly
give one solution.

645
00:36:43,400 --> 00:36:46,900
You don't have... you have
sixty, seventy percent accuracy,

646
00:36:46,900 --> 00:36:51,800
you work on it and you edit the 
solution. But I guess this

647
00:36:51,800 --> 00:36:56,800
iterative model works probably in
any field, and it is

648
00:36:56,800 --> 00:37:01,300
something which should work well
in data science also and like 

649
00:37:01,300 --> 00:37:04,700
you said once we get to the 
field once you understand the 

650
00:37:04,700 --> 00:37:09,000
context better and that takes 
time, that takes its own time, 

651
00:37:09,200 --> 00:37:11,000
we can iteratively improve the
model

652
00:37:11,100 --> 00:37:15,300
as well. And I think that iteration
is the only solution forward here, in

653
00:37:15,300 --> 00:37:17,900
that sense. 
How is the communication 

654
00:37:17,900 --> 00:37:19,700
different? 
Because I think again in 

655
00:37:19,700 --> 00:37:21,900
Academia, you're used to 
communicating in papers, I 

656
00:37:21,908 --> 00:37:24,600
guess. 
So what is the transition there 

657
00:37:24,600 --> 00:37:29,300
in terms of like because I think
your papers I'm guessing don't 

658
00:37:29,300 --> 00:37:31,800
work unless you're in a job I 
guess. 

659
00:37:31,800 --> 00:37:35,300
Yeah. 
So yeah, and I'm glad you 

660
00:37:35,300 --> 00:37:39,700
brought this up because yeah, I 
continue to publish some papers 

661
00:37:39,700 --> 00:37:41,000
and they are research papers. 
They are

662
00:37:41,300 --> 00:37:45,900
tough to read, often.
I don't know how many people read

663
00:37:45,900 --> 00:37:49,300
it. 
But initially when we started 

664
00:37:49,300 --> 00:37:53,500
off Messy Fractals, I still had
this enthusiasm of publishing 

665
00:37:53,500 --> 00:37:56,100
research-driven papers.
So we used to put out a lot of 

666
00:37:56,100 --> 00:38:01,200
white papers about things, which
we have created, which we have 

667
00:38:01,200 --> 00:38:03,200
done, which we have 
hypothesized. 

668
00:38:03,500 --> 00:38:08,500
But over time, I think I
have moved more to communicating

669
00:38:08,500 --> 00:38:13,400
through blogs and articles,
simplifying things to giving one

670
00:38:13,400 --> 00:38:16,600
message at a time. In a
research paper, you don't try to

671
00:38:16,600 --> 00:38:19,200
do that. 
You try to bottle up a lot of 

672
00:38:19,200 --> 00:38:23,400
things in a very compact way. 
Within the given word limit with

673
00:38:23,400 --> 00:38:27,000
few figures and lots of 
supplementary figures and you 

674
00:38:27,000 --> 00:38:30,400
try and put it out. 
Whereas now, I just change the 

675
00:38:30,400 --> 00:38:33,300
way I'm communicating. 
And I'm realizing this only

676
00:38:33,300 --> 00:38:36,500
because you brought this 
question up because it's been a 

677
00:38:36,500 --> 00:38:38,900
few years since I even wrote 
white papers. Initially,

678
00:38:38,900 --> 00:38:40,900
we were quite interested in
writing out

679
00:38:40,900 --> 00:38:45,100
white papers and, you know,
publishing at least a detailed level

680
00:38:45,100 --> 00:38:47,800
of what we are doing. Now,
we put it out in blogs.

681
00:38:47,800 --> 00:38:50,300
We do the same thing, except we 
put it out in blogs.

682
00:38:50,300 --> 00:38:53,400
We try to give one or two simple
messages at a time. 

683
00:38:53,900 --> 00:38:58,800
And there are benefits to both 
because now I'm optimizing for 

684
00:38:58,800 --> 00:39:01,900
more people reading
what I've put. Earlier, I was

685
00:39:01,900 --> 00:39:06,900
optimizing for putting out my 
set of results and, you know, 

686
00:39:07,200 --> 00:39:10,000
documenting it in a perfect way.
That's what I did during 

687
00:39:10,000 --> 00:39:12,000
research. 
Now, I am more about,

688
00:39:12,000 --> 00:39:14,700
okay, let more people read it;
I think it will be more valuable

689
00:39:14,700 --> 00:39:16,500
that way.
There are benefits to both.

690
00:39:16,500 --> 00:39:18,100
Okay, good.
The thing I was also talking about is

691
00:39:18,100 --> 00:39:20,900
communication of results within 
the company and so on. 

692
00:39:20,900 --> 00:39:24,000
Like, for example, things
like I don't know, like I build 

693
00:39:24,000 --> 00:39:27,200
a model and I'm like, I'm like, 
okay, here's a model to tell you

694
00:39:27,200 --> 00:39:30,500
where to place your next branch 
and stuff. 

695
00:39:31,000 --> 00:39:34,000
Now, they'll ask you for an 
explanation of the model and 

696
00:39:34,000 --> 00:39:37,500
that explanation will be,
I guess, very

697
00:39:37,500 --> 00:39:38,800
different from what you would
have written

698
00:39:38,800 --> 00:39:40,000
if you had written a blog about
it,

699
00:39:40,000 --> 00:39:43,700
I'm guessing. I think I have
learned to communicate better 

700
00:39:43,700 --> 00:39:49,000
over time, where I try to speak as
much data science in English as 

701
00:39:49,000 --> 00:39:53,600
possible. 
So for instance, I tell people 

702
00:39:54,000 --> 00:39:55,900
in the microfinance example 
itself. 

703
00:39:56,300 --> 00:40:01,100
If you live closer to the highway,
then that village is much

704
00:40:01,100 --> 00:40:02,600
better. 
Right? 

705
00:40:02,600 --> 00:40:05,200
And then they will give us 
anecdotal information. 

706
00:40:05,200 --> 00:40:09,600
It always works better
if the answer also comes from

707
00:40:09,600 --> 00:40:12,300
the field team, right?
Because they have a sense of

708
00:40:12,300 --> 00:40:16,600
this solution in their own
way, and if it comes from them,

709
00:40:16,800 --> 00:40:21,100
obviously, the understanding is 
a lot greater than me saying, I 

710
00:40:21,100 --> 00:40:24,900
built these eight models and the
model has thrown up these 16

711
00:40:24,900 --> 00:40:26,900
solutions,
and this solution is the most

712
00:40:27,000 --> 00:40:30,800
optimal in your case.
I think that probably doesn't 

713
00:40:30,800 --> 00:40:33,600
work. 
So I guess your journey has been

714
00:40:33,600 --> 00:40:35,300
such that,
like, you've figured out these

715
00:40:35,300 --> 00:40:36,500
things.
You figured out how to

716
00:40:36,500 --> 00:40:40,100
communicate, figured out how to
use the data that's available.

717
00:40:40,100 --> 00:40:42,100
So, one other
question for you, if we can

718
00:40:42,100 --> 00:40:44,400
generalize a little bit. 
I mean like I don't know like I 

719
00:40:44,400 --> 00:40:47,500
unfortunately, I don't know too 
many other people with a PhD in 

720
00:40:47,500 --> 00:40:49,300
the world working in data 
science and so on. 

721
00:40:49,300 --> 00:40:52,800
So like I don't know about your 
network, but like if you know 

722
00:40:52,800 --> 00:40:55,700
more people is this how or if 
you have worked with more such 

723
00:40:55,700 --> 00:40:59,700
people, is this how everybody 
approaches things, or

724
00:40:59,700 --> 00:41:04,000
are there challenges in terms of
like how you approach the

725
00:41:04,000 --> 00:41:06,900
problem, like whether you
can make this transition to this

726
00:41:06,900 --> 00:41:11,500
kind of communication. 
And I think, I guess, Everyone's

727
00:41:11,500 --> 00:41:14,700
situation will be slightly 
different depending on the 

728
00:41:14,700 --> 00:41:17,800
context they're working in, 
right? Like a

729
00:41:17,800 --> 00:41:21,400
friend of mine, 
she was doing behavioral

730
00:41:21,400 --> 00:41:27,400
neuroscience then at the lab, and now
she's doing NLP with a company 

731
00:41:27,400 --> 00:41:30,600
and trying to optimize the 
language in which things are 

732
00:41:30,600 --> 00:41:33,800
written and trying to get 
information from things which 

733
00:41:33,800 --> 00:41:39,500
are written and I do not know 
when it comes to communication. 

734
00:41:39,500 --> 00:41:45,400
I don't know how she has handled
the situation and depending on 

735
00:41:46,000 --> 00:41:49,800
our particular context, like in 
my particular case, since we 

736
00:41:49,800 --> 00:41:53,900
were running the startup and 
there were six or seven data scientists 

737
00:41:53,900 --> 00:41:57,500
who were working with me, it 
became mandatory for me to 

738
00:41:57,600 --> 00:42:01,700
improve my communication with 
the data science team and with 

739
00:42:01,700 --> 00:42:07,100
the field team and that was a 
necessity, which crept in and 

740
00:42:07,100 --> 00:42:11,400
because of which, I am the way I
am today, right? 

741
00:42:11,900 --> 00:42:20,300
And so, well, I guess 
communication is going to be a 

742
00:42:20,300 --> 00:42:23,400
challenge. 
I guess it's often a challenge 

743
00:42:23,400 --> 00:42:28,000
even in my case. 
But communicating in simple 

744
00:42:28,000 --> 00:42:30,500
language is one of the key 
skills. 

745
00:42:30,800 --> 00:42:33,600
I believe a data scientist 
should pick up. 

746
00:42:34,900 --> 00:42:38,300
And otherwise the value of a 
data science team will be 

747
00:42:38,300 --> 00:42:42,500
limited, because you're only as good 
as the field team can execute it.

748
00:42:42,500 --> 00:42:44,300
Right? 
Otherwise, your model is not 

749
00:42:44,300 --> 00:42:48,700
good enough. 
So that is a mandatory skill, 

750
00:42:48,700 --> 00:42:53,600
I would think. Now, within data 
science, right, do you 

751
00:42:53,600 --> 00:42:56,900
think there are particular kinds
of roles which would suit academics 

752
00:42:56,900 --> 00:42:58,000
better than other kinds of 
roles. 

753
00:42:58,000 --> 00:43:02,000
Like, for example, you have some
people who kind of work more on 

754
00:43:02,008 --> 00:43:05,900
the product side. Like, I work 
more on the side where my 

755
00:43:06,900 --> 00:43:09,700
inputs go into kind of 
making business decisions rather

756
00:43:09,700 --> 00:43:13,200
than going into the products, or
some people are closer to the 

757
00:43:13,200 --> 00:43:16,700
tech kind of code and write more
deep models. 

758
00:43:16,700 --> 00:43:19,500
Some people are more at the 
level of what you mentioned, 

759
00:43:19,500 --> 00:43:22,700
which is like converting, or 
taking a business problem and 

760
00:43:22,800 --> 00:43:26,500
formulating it as a math problem
or figuring out the context and 

761
00:43:26,500 --> 00:43:28,100
things like that. 
So do you think? 

762
00:43:28,100 --> 00:43:31,100
I mean, I might be 
generalizing a bit, but I think 

763
00:43:31,100 --> 00:43:34,100
it's okay. 
I hope you are too but like like

764
00:43:34,100 --> 00:43:36,400
are there 
any certain kinds of roles where

765
00:43:36,400 --> 00:43:40,100
you think, like academics might 
do better than in other kinds of

766
00:43:40,100 --> 00:43:43,900
roles? 
Academicians by design are 

767
00:43:43,900 --> 00:43:47,100
trained to solve problems that 
have not been solved before, 

768
00:43:47,100 --> 00:43:48,900
right. 
So, when there are tough 

769
00:43:48,900 --> 00:43:52,600
problems, it's better to 
get an academician on it because

770
00:43:52,600 --> 00:43:55,100
they are often not going to get 
overwhelmed by this. 

771
00:43:55,700 --> 00:43:59,700
So when it comes to problems 
like this, where we need to, you

772
00:43:59,700 --> 00:44:03,800
know, build a hypothesis and, 
you know, take guess work going 

773
00:44:03,800 --> 00:44:06,600
forward. 
And then design a solution. 

774
00:44:06,600 --> 00:44:12,300
I think academicians will be 
better acclimatized to something 

775
00:44:12,300 --> 00:44:17,000
like that. 
But converting a solution into a

776
00:44:17,000 --> 00:44:19,900
product. 
I would say, is not necessarily 

777
00:44:19,900 --> 00:44:22,100
a skill which academicians 
would be good at, 

778
00:44:22,100 --> 00:44:27,800
because there are very few 
labs, at least in my field, which 

779
00:44:27,900 --> 00:44:31,600
are designed to do that. 
In fact, at Sapien Labs, right? 

780
00:44:31,600 --> 00:44:34,500
where I work with Tara, that is 
one of the things we are 

781
00:44:34,600 --> 00:44:38,600
trying to achieve, which is to 
take insights from research 

782
00:44:38,600 --> 00:44:40,400
and try and build a product out 
of it. 

783
00:44:40,900 --> 00:44:43,900
And I'm seeing challenges 
because it's taking a long time 

784
00:44:43,900 --> 00:44:49,100
to actually do that transition 
from solution to product design.

785
00:44:50,300 --> 00:44:55,400
So probably academicians are not
highly optimized to do that 

786
00:44:55,400 --> 00:44:59,500
particular thing, but they are 
better suited to solution 

787
00:44:59,500 --> 00:45:01,300
design. 
Trying out a lot of different 

788
00:45:01,300 --> 00:45:04,300
solutions. 
Iterating many times. 

789
00:45:04,700 --> 00:45:09,900
Those sort of things. 
I mean, in the course of that 

790
00:45:09,900 --> 00:45:12,400
conversation so far, 
I think you've sort of 

791
00:45:12,400 --> 00:45:15,400
talked about, I 
think one or two problems that 

792
00:45:15,400 --> 00:45:19,200
you faced as part of your work 
with Madura Microfinance. 

793
00:45:19,800 --> 00:45:25,500
So, can you give me 
more examples of work that you 

794
00:45:25,500 --> 00:45:28,700
have done, like industry kind of
data science work that you have 

795
00:45:28,700 --> 00:45:34,100
done and some inputs on how your
background in Academia or 

796
00:45:34,100 --> 00:45:36,400
Neuroscience has actually 
helped. 

797
00:45:36,700 --> 00:45:39,300
In terms of how you actually 
went about the problem and so 

798
00:45:39,300 --> 00:45:43,600
on. 
Yeah, so microfinance was one 

799
00:45:43,600 --> 00:45:47,100
thing I did. 
But when it comes to Sapien 

800
00:45:47,100 --> 00:45:52,000
Labs, right, so we created a lot
of metrics, which is basically, 

801
00:45:52,000 --> 00:45:55,500
we looked at a lot of EEG data and
converted them into metrics, 

802
00:45:55,500 --> 00:45:58,500
such as complexity. 
So complexity can be thought of 

803
00:45:58,500 --> 00:46:01,500
as a proxy for entropy in the 
brain. 

804
00:46:01,500 --> 00:46:07,100
So how different spatial parts 
of your brain are communicating 

805
00:46:07,100 --> 00:46:09,700
different signals. 
So if all parts of your brain 

806
00:46:09,700 --> 00:46:12,900
are saying different things, 
your complexity score is higher 

807
00:46:12,900 --> 00:46:17,900
or the entropy is higher and we 
designed these sorts of metrics 

808
00:46:18,000 --> 00:46:21,400
and we try to relate these 
metrics to more real-world 

809
00:46:21,400 --> 00:46:22,900
things. 
Right? 

810
00:46:23,000 --> 00:46:26,100
In this case. 
It was travel, mobile usage, and

811
00:46:26,100 --> 00:46:30,400
things like that. 
So designing these metrics, I 

812
00:46:30,400 --> 00:46:35,700
mean it involved a little bit of
processing, a little bit of data, 

813
00:46:36,000 --> 00:46:40,400
a little bit of neuroscience. 
So I guess that's where my role 

814
00:46:40,400 --> 00:46:43,400
came in here itself. 
So that is one of the things 

815
00:46:43,400 --> 00:46:51,900
which I did and when it comes to
my experience in Neuroscience 

816
00:46:51,900 --> 00:46:55,400
itself, right? 
I believe, I bring two 

817
00:46:55,400 --> 00:46:58,200
Dimensions when I look at any 
problem, right? 

818
00:46:58,600 --> 00:47:03,200
I'm always looking at it in a 
data or a mathematical, 

819
00:47:03,300 --> 00:47:06,500
scientific way. That is 
one. Second, 

820
00:47:06,500 --> 00:47:11,200
it also gives me a behavioral 
science view of things. 

821
00:47:11,200 --> 00:47:14,600
Right? 
Like how can I nudge the user to

822
00:47:14,600 --> 00:47:20,400
do these things when it comes to
a data product and these these 

823
00:47:20,400 --> 00:47:22,900
are the things which I kind of 
bring into play when it comes 

824
00:47:22,900 --> 00:47:26,300
to Kabaddi Adda, right? 
It can come to website design. 

825
00:47:26,300 --> 00:47:29,400
We look at a lot of
data of users who come to our 

826
00:47:29,400 --> 00:47:33,200
website and try to design a 
website based on 

827
00:47:33,700 --> 00:47:36,100
their preferences and things like 
that. 

828
00:47:36,300 --> 00:47:39,700
And here, I try to bring in 
two dimensions whenever I look 

829
00:47:39,700 --> 00:47:43,100
at this problem and these are 
things, I think I picked up 

830
00:47:43,100 --> 00:47:47,200
during the course of my PhD. 
And we used to not essentially 

831
00:47:47,200 --> 00:47:51,100
only look at data but we used to
talk about a lot of other things

832
00:47:51,100 --> 00:47:56,800
which made the brain a unique 
sort of organ in your body, 

833
00:47:56,800 --> 00:47:59,500
right? 
So, yeah, actually, I want

834
00:47:59,500 --> 00:48:02,000
to dig deeper into this. 
I think a couple of minutes back 

835
00:48:02,000 --> 00:48:05,100
you mentioned the EEG and
entropy in the brain. 

836
00:48:05,100 --> 00:48:07,100
That's one 
thing I'm sort of 

837
00:48:07,300 --> 00:48:09,600
especially interested in, and 
the reason is 

838
00:48:09,900 --> 00:48:13,300
because for me like because I 
keep talking about having a 

839
00:48:13,308 --> 00:48:15,700
traffic jam in my head sometimes
because there are too many 

840
00:48:15,700 --> 00:48:18,400
thoughts and like it's like 
they're all clashing at I have. 

841
00:48:18,500 --> 00:48:21,000
I just lose track of what's 
happening. 

842
00:48:21,000 --> 00:48:27,000
So so just I mean, I know it's 
possibly very tangential but can

843
00:48:27,000 --> 00:48:31,200
you talk about this entropy?
And I mean, I'm also very 

844
00:48:31,200 --> 00:48:34,200
interested in entropy, 
not because of 

845
00:48:34,200 --> 00:48:36,600
thermodynamics but because of 
information theory, so can you 

846
00:48:36,607 --> 00:48:38,700
just talk about the entropy in 
the brain? 

847
00:48:38,700 --> 00:48:43,200
and so on. 
I'll talk about one of the 

848
00:48:43,300 --> 00:48:46,900
results which we got, which is 
related to this entropy measure,

849
00:48:46,900 --> 00:48:48,800
right? 
We call this entropy measure as 

850
00:48:48,800 --> 00:48:52,900
complexity. 
So we did a study of around 

851
00:48:53,500 --> 00:48:58,600
400 people across different 
socio-economic strata in Tamil 

852
00:48:58,600 --> 00:49:03,300
Nadu and we gave all of them 
a test 

853
00:49:03,900 --> 00:49:07,500
similar to an IQ test, right? 
It's a pattern completion sort 

854
00:49:07,500 --> 00:49:09,400
of test. 
And we realized this 

855
00:49:09,400 --> 00:49:14,400
complexity metric was very 
highly correlated to how much 

856
00:49:14,400 --> 00:49:19,200
they scored in those tests. 
And then we did 

857
00:49:19,200 --> 00:49:22,400
another study and looked at how 
much these guys traveled like 

858
00:49:22,400 --> 00:49:26,100
the same guys, and we realized 
this complexity is equally 

859
00:49:26,100 --> 00:49:29,400
related to the amount a person 
travels. 

860
00:49:29,400 --> 00:49:33,200
So if a person is able to get 
out of their comfort zone, travel 

861
00:49:33,400 --> 00:49:36,600
to many places, complexity
is often higher, and this 

862
00:49:36,600 --> 00:49:38,800
higher complexity 
is also 

863
00:49:38,800 --> 00:49:42,700
correlated with higher score in 
your pattern test. 

864
00:49:43,400 --> 00:49:47,200
Right? 
And so I'm not necessarily going

865
00:49:47,200 --> 00:49:52,400
to say that higher entropy in 
the brain, or like a traffic jam,

866
00:49:52,400 --> 00:49:57,000
is correlated with high IQ. 
I mean, that would not be fair 

867
00:49:57,000 --> 00:49:59,400
for me to say that. 
I also want to ask you about the

868
00:49:59,800 --> 00:50:02,100
negatives of having a traffic
jam in the head. Have you 

869
00:50:02,100 --> 00:50:04,900
noticed 
anywhere that this high

870
00:50:04,900 --> 00:50:11,100
entropy or complexity is sort of
correlated with some negative 

871
00:50:11,100 --> 00:50:13,600
performance on 
whatever metric? I mean, 

872
00:50:14,000 --> 00:50:18,800
I don't know if you have. No. 
No, I've not seen any that 

873
00:50:18,800 --> 00:50:21,200
still. 
The thing with science is that 

874
00:50:21,200 --> 00:50:23,900
one of the time it's a negative 
result. 

875
00:50:23,900 --> 00:50:26,400
Right? 
What you're actually saying, we 

876
00:50:26,400 --> 00:50:28,800
usually use sort of negative 
result is the positive effect. 

877
00:50:28,900 --> 00:50:31,400
A negative correlation is 
not a negative result. 

878
00:50:31,400 --> 00:50:33,300
Right? Zero correlation is a 
negative result. 

879
00:50:33,400 --> 00:50:36,200
Yeah, zero correlation is a 
negative result. 

880
00:50:36,500 --> 00:50:41,600
But what I'm saying is I don't 
think I have seen any such thing

881
00:50:41,700 --> 00:50:43,900
because you go into the 
hypothesis, right? 

882
00:50:43,900 --> 00:50:46,400
And you try to prove it pretty 
great. 

883
00:50:46,400 --> 00:50:48,300
Huh? 
So in our previous podcast, in 

884
00:50:48,300 --> 00:50:51,600
our previous recording, I think 
we spoke about, you were talking 

885
00:50:51,600 --> 00:50:56,200
about one particular project 
that you did about a

886
00:50:56,400 --> 00:50:58,900
particular kabaddi player, and 
how his balance is. 

887
00:50:58,900 --> 00:51:01,400
And I think you were telling 
about the center of gravity at 

888
00:51:01,400 --> 00:51:04,600
the exhibit. 
Maybe I think this is a way to end 

889
00:51:04,600 --> 00:51:09,800
today. Maybe that could 
be something you could talk to

890
00:51:09,800 --> 00:51:13,400
us about maybe with the context 
of your research background, 

891
00:51:13,400 --> 00:51:15,700
everything about how we went 
about, how you went about 

892
00:51:15,700 --> 00:51:16,800
solving the problem. 
Right? 

893
00:51:16,800 --> 00:51:19,500
Like I mean, I think we have 
theoretically spoken right now

894
00:51:19,500 --> 00:51:22,000
in terms of like framing the 
problem, getting the context, 

895
00:51:22,000 --> 00:51:23,700
then building the model, that kind of
thing. 

896
00:51:23,700 --> 00:51:26,400
So if you could... I hope I'm not 
throwing you off guard. 

897
00:51:26,500 --> 00:51:30,800
But like if you could take us 
through that. Yeah, I can 

898
00:51:30,800 --> 00:51:32,700
definitely take you through 
that. 

899
00:51:33,100 --> 00:51:38,500
So, there is this kabaddi player
called Pardeep Narwal, and he 

900
00:51:38,500 --> 00:51:42,900
was creating havoc in all the 
teams that we were working with 

901
00:51:43,000 --> 00:51:47,800
at that particular time point. 
And he's a raider and he used to

902
00:51:47,808 --> 00:51:51,400
score points at will and win 
matches at will and basically 

903
00:51:51,400 --> 00:51:53,500
end up winning tournaments at 
will, right. 

904
00:51:53,500 --> 00:51:56,200
So with Pardeep Narwal in the 
team, 

905
00:51:56,800 --> 00:51:59,600
that particular team, Patna 
Pirates, won the tournament 

906
00:51:59,900 --> 00:52:04,900
three years in a row. 
And so everything ended up in 

907
00:52:04,900 --> 00:52:08,500
this one question: how do 
you stop Pardeep Narwal, 

908
00:52:08,900 --> 00:52:11,700
right? 
And that's what the data 

909
00:52:11,700 --> 00:52:14,300
question kind of summarized to, 
right? 

910
00:52:14,700 --> 00:52:19,900
And we looked at a lot of data. 
There was no systematic Insight 

911
00:52:19,900 --> 00:52:24,500
when we just looked at which 
defender was getting him out, or 

912
00:52:24,600 --> 00:52:28,100
what technique gets him out. 
And there was no, there was no 

913
00:52:28,100 --> 00:52:31,200
light at the end of that tunnel.
Right, so we had to take it to 

914
00:52:31,200 --> 00:52:35,000
another step deeper. 
So we went into the biomechanics

915
00:52:35,000 --> 00:52:40,200
and at that time, I also had an 
intern who was working with me,

916
00:52:40,200 --> 00:52:43,600
and she was a physiotherapist. 
So she was interested in the 

917
00:52:43,600 --> 00:52:47,000
biomechanics of the body, right?
So, it was a good opportunity 

918
00:52:47,000 --> 00:52:50,100
for us to address that question 
at that particular point. 

919
00:52:50,100 --> 00:52:55,000
And obviously, like in most 
things in kabaddi, there was no 

920
00:52:55,000 --> 00:52:57,300
data available to solve this 
problem, right? 

921
00:52:57,300 --> 00:53:04,700
And so, we took a few videos of 
Pardeep Narwal's raids, and 

922
00:53:04,800 --> 00:53:08,800
we marked out different points 
on his body, including his knee,

923
00:53:08,800 --> 00:53:12,700
the angle at which he's bent. 
And because the skill, which 

924
00:53:12,700 --> 00:53:15,000
he's famous for is called the 
dubki. 

925
00:53:15,600 --> 00:53:20,200
So dubki is essentially Pardeep 
Narwal going parallel to the 

926
00:53:20,200 --> 00:53:24,200
ground, and he'll just be, 
say, 10 to 15 centimeters off the 

927
00:53:24,207 --> 00:53:26,700
ground. 
So he can bend that low, and he

928
00:53:26,700 --> 00:53:29,200
goes between two Defenders and 
escapes. 

929
00:53:29,800 --> 00:53:32,500
So and people think he's going 
to get out. 

930
00:53:32,500 --> 00:53:35,400
So they pile on top of him and 
he ends up getting six to eight 

931
00:53:35,400 --> 00:53:36,100
points there. 

932
00:53:36,100 --> 00:53:37,800
He has 
got a record-breaking eight 

933
00:53:37,800 --> 00:53:41,800
points in a single raid, right? 
Because of this particular 

934
00:53:41,800 --> 00:53:46,500
skill, right? 
So the idea was what is it that 

935
00:53:46,500 --> 00:53:49,200
he's bringing to the table. 
And when is it that he gets out.

936
00:53:49,200 --> 00:53:53,000
So we separated all the raids 
into two kinds of raids: 

937
00:53:53,500 --> 00:53:58,300
successful raids by Pardeep, where
the defender dashes 

938
00:53:58,300 --> 00:54:02,100
at him or holds him, 
whatever he does, and 

939
00:54:02,500 --> 00:54:06,200
unsuccessful raids by Pardeep. 
And we looked at these two data 

940
00:54:06,200 --> 00:54:09,300
sets and we marked out different
parts of his body, where his 

941
00:54:09,300 --> 00:54:11,400
center of gravity was, you know,
what 

942
00:54:11,400 --> 00:54:14,900
he was doing during the raid 
across time, right? 

943
00:54:15,400 --> 00:54:21,600
And that's when this result kind
of came up: every time 

944
00:54:21,600 --> 00:54:26,900
Pardeep's balance is kind of off, 
that is, his center of gravity is

945
00:54:27,100 --> 00:54:32,100
outside his body, and at that 
particular point when the dash 

946
00:54:32,100 --> 00:54:35,600
happens from the defender, he 
gets thrown off the court and 

947
00:54:35,600 --> 00:54:40,300
the defender wins the point. 
And now this is incredibly tough

948
00:54:40,400 --> 00:54:44,100
for us to communicate with the 
kabaddi players or 

949
00:54:44,100 --> 00:54:48,100
defenders of the team 
we were working with, right? 

950
00:54:48,400 --> 00:54:54,900
And so we tried to simplify it and
told them that every time he's 

951
00:54:54,900 --> 00:54:58,200
on one foot. 
That is when the dash has to 

952
00:54:58,200 --> 00:55:01,900
land, but I don't know whether 
we were able to communicate it. 

953
00:55:02,200 --> 00:55:05,800
Anyway, like I told you, we 
wrote a white paper about it and

954
00:55:05,800 --> 00:55:10,100
we documented this result. 
And we simplified this 

955
00:55:10,100 --> 00:55:13,000
insight to the team saying that 
make sure that Pardeep is the 

956
00:55:13,000 --> 00:55:16,800
third player out so that he 
never comes back before you get 

957
00:55:16,800 --> 00:55:18,200
the team out. 
Okay? 

958
00:55:18,200 --> 00:55:19,000
Okay. 
Okay. 

959
00:55:19,100 --> 00:55:21,800
Okay. 
It is about the communication. 

960
00:55:21,800 --> 00:55:23,400
There would have been like a real 
challenge. 

961
00:55:23,400 --> 00:55:25,600
I'm guessing, because these 
are like kabaddi players. 

962
00:55:26,000 --> 00:55:27,700
I don't think. 
Anybody would have spoken 

963
00:55:27,700 --> 00:55:31,200
English, you would have to kind 
of like we have in sport itself.

964
00:55:31,200 --> 00:55:35,600
That is the case, right? 
Karthik, in sport itself. 

965
00:55:35,600 --> 00:55:39,400
I felt communication of data 
concepts was incredibly tough. 

966
00:55:39,900 --> 00:55:44,600
So we tried to break it down, 
saying that sportspeople 

967
00:55:44,600 --> 00:55:47,700
understand videos better. 
So we try to play a lot of 

968
00:55:47,700 --> 00:55:52,000
videos and they try to, you 
know, they have to gather the 

969
00:55:52,000 --> 00:55:54,700
insight themselves. 
So we try to show a systematic 

970
00:55:54,700 --> 00:55:56,400
set of videos from which they 
can pick. 

971
00:55:56,600 --> 00:56:00,800
that sort of insight, and that 
worked in the case of badminton 

972
00:56:00,900 --> 00:56:05,100
as well as kabaddi. In 
badminton, 

973
00:56:06,000 --> 00:56:07,300
there were some other things, 
right? 

974
00:56:07,300 --> 00:56:12,500
The coach really helped us. 
So we wanted like a player like 

975
00:56:12,500 --> 00:56:16,800
Saina Nehwal to wait before she 
puts her serve, right? So the coach

976
00:56:16,800 --> 00:56:20,300
just wrote on a piece of paper 
saying tie your shoelace every 

977
00:56:20,300 --> 00:56:22,200
five points. 
So that's the way he 

978
00:56:22,200 --> 00:56:25,200
communicated that message to her.
Interesting, interesting. 

979
00:56:25,400 --> 00:56:28,400
Yeah, right. 
And so yeah, communication is 

980
00:56:28,400 --> 00:56:31,600
tough. 
So we've created tools where the

981
00:56:31,600 --> 00:56:35,400
players can, you know, see one 
set of videos and we can bias 

982
00:56:35,400 --> 00:56:38,500
them to see some particular 
kinds of videos so that they can

983
00:56:38,500 --> 00:56:42,700
ingrain that insight. That was very 
interesting because I will, I 

984
00:56:42,700 --> 00:56:45,400
mean, I'm sort of how many 
General begun data, 

985
00:56:45,400 --> 00:56:48,200
visualization. 
So I keep talking about how you 

986
00:56:48,200 --> 00:56:49,900
need to make 
the visualizations 

987
00:56:49,900 --> 00:56:52,000
such that 
you control the narrative for the

988
00:56:52,000 --> 00:56:53,700
guy who sees it. 
So in some sense. 

989
00:56:53,700 --> 00:56:57,200
I mean, so in sport, let's say 
like, you can't show a 

990
00:56:57,200 --> 00:57:00,400
bar graph; they'd just tear 
the piece of paper. Instead, 

991
00:57:00,400 --> 00:57:01,900
you just have to show a bunch
of videos. 

992
00:57:02,100 --> 00:57:05,300
So in that sense, I think your 
skill is in choosing the right 

993
00:57:05,300 --> 00:57:09,200
set of videos such that he can 
learn for himself from those 

994
00:57:09,200 --> 00:57:13,100
videos, what your data kind of 
showed you. 

995
00:57:13,200 --> 00:57:14,800
Exactly. 
Exactly. 

996
00:57:14,900 --> 00:57:17,800
You hit the nail on the head, 
but it is not always easy. 

997
00:57:18,200 --> 00:57:21,200
But yeah, it will take time. 
There are players even in 

998
00:57:21,200 --> 00:57:23,600
kabaddi. 
There were players who actually 

999
00:57:23,600 --> 00:57:26,200
used to start following the 
videos and try to 

1000
00:57:26,500 --> 00:57:30,000
improve themselves, but it has to 
come from within in the end. 

1001
00:57:30,000 --> 00:57:32,800
Right? 
Finally, then, just to 

1002
00:57:32,800 --> 00:57:37,000
close the conversation if I were
doing a PhD now and I've figured

1003
00:57:37,000 --> 00:57:39,100
out that my academic career is 
going over. 

1004
00:57:39,100 --> 00:57:41,600
I don't want to do a postdoc. 
I don't want to play the 

1005
00:57:41,900 --> 00:57:44,800
academic game and so on, and I 
want to get into the industry 

1006
00:57:44,800 --> 00:57:48,400
and get into data science. 
What would your advice be to me?

1007
00:57:48,400 --> 00:57:53,100
Like, how would you, how would 
you kind of advise me to go 

1008
00:57:53,100 --> 00:57:55,500
about my career? 
What kind of jobs to look out 

1009
00:57:55,500 --> 00:57:58,000
for what? 
In terms of work and so on. 

1010
00:57:58,300 --> 00:58:03,700
So I would say one of the things
which is essential if you need 

1011
00:58:03,700 --> 00:58:06,800
to come into a data science 
role is to understand how 

1012
00:58:06,800 --> 00:58:12,200
products are built and if a 
little more of that can go into 

1013
00:58:12,200 --> 00:58:15,400
research while you're doing your
research, I feel even your 

1014
00:58:15,400 --> 00:58:19,200
research will become more 
compact, more usable down the 

1015
00:58:19,200 --> 00:58:23,000
line, and you are going to learn
a sort of skill, which is going 

1016
00:58:23,000 --> 00:58:24,400
to keep you afloat. 
Right? 

1017
00:58:24,400 --> 00:58:26,400
You're going to be able to build
out a product 

1018
00:58:26,500 --> 00:58:30,300
and not just a small data 
solution 

1019
00:58:30,300 --> 00:58:33,000
for the whole thing. 
That would be one skill 

1020
00:58:33,500 --> 00:58:38,200
I wish I had picked up when I 
was doing my research. 

1021
00:59:00,000 --> 00:59:02,000
Thank you for listening to Data 
Chatter. 

1022
00:59:02,600 --> 00:59:06,100
If you like this show, please 
leave a comment, share and 

1023
00:59:06,100 --> 00:59:09,400
subscribe to the podcast. 
You can find this podcast on 

1024
00:59:09,400 --> 00:59:13,300
Apple Podcasts, Spotify, or 
wherever else you go to get your

1025
00:59:13,300 --> 00:59:16,200
podcasts. 
Once again, this is Karthik 

1026
00:59:16,200 --> 00:59:17,800
Shashidhar. 
Thank you.

