1
00:00:00,000 --> 00:00:03,000
So it's no longer the case that 
you just need one technology 

2
00:00:03,000 --> 00:00:05,100
that will do all of it.
In fact, in the cloud,

3
00:00:05,400 --> 00:00:08,100
you will have completely
dedicated technologies, which

4
00:00:08,100 --> 00:00:09,200
are meant to do
exactly

5
00:00:09,200 --> 00:00:11,000
that one thing.
I would say

6
00:00:11,500 --> 00:00:15,400
it's actually par for the course
to re-engineer. Okay, that

7
00:00:15,400 --> 00:00:17,600
happens all the time. 
This often does need to change, 

8
00:00:17,600 --> 00:00:20,200
right? 
I think that's, in my view,

9
00:00:20,200 --> 00:00:24,600
inevitable. But I think the
design principles, they stand the

10
00:00:24,600 --> 00:00:26,700
test of time. 
For instance, if you're basing

11
00:00:26,700 --> 00:00:28,300
business decisions,
right? 

12
00:00:28,300 --> 00:00:31,200
on top of it, you have to
absolutely give the guarantee

13
00:00:31,200 --> 00:00:33,600
that the data is correct. If the
data is wrong,

14
00:00:34,000 --> 00:00:36,300
then you're making wrong decisions
and potentially affecting people's

15
00:00:36,300 --> 00:00:56,400
lives in some form. 
Hello, and welcome to Data

16
00:00:56,400 --> 00:00:59,600
Chatter, the podcast on all
things data. 

17
00:01:00,300 --> 00:01:03,700
This podcast is a series of 
conversations with experts and 

18
00:01:03,700 --> 00:01:06,400
industry leaders in data,
and each week

19
00:01:06,400 --> 00:01:09,300
we aim to unpack a different
compartment of the data 

20
00:01:09,300 --> 00:01:12,000
suitcase. 
I am your host, Karthik Shashidhar.

21
00:01:12,000 --> 00:01:16,200
I'm a blogger, newspaper
columnist, book author and a

22
00:01:16,208 --> 00:01:19,600
former data advisory
consultant, and currently head

23
00:01:19,600 --> 00:01:22,000
analytics and business
intelligence for Delhivery,

24
00:01:22,300 --> 00:01:23,900
one of India's largest
logistics

25
00:01:23,900 --> 00:01:26,700
companies.
You can follow me on Twitter at 

26
00:01:26,700 --> 00:01:32,900
@karthiks, that is K-A-R-
T-H-I-K-S, and read my blog at

27
00:01:32,900 --> 00:01:41,200
noenthuda.com, that is N-O-E-N-
T-H-U-D-A dot com. All opinions

28
00:01:41,200 --> 00:01:43,700
expressed in this podcast
belong to me and my guests

29
00:01:43,800 --> 00:01:46,200
and do not reflect the views
of any organizations

30
00:01:46,200 --> 00:01:49,000
we might be associated with.
Nothing discussed in this podcast

31
00:01:49,000 --> 00:01:50,900
is to be taken as financial
advice.

32
00:01:58,300 --> 00:02:01,100
I work for Delhivery, one of
India's largest logistics

33
00:02:01,100 --> 00:02:05,200
companies. At the time of
recording, we deliver around 1 

34
00:02:05,200 --> 00:02:09,500
billion packages each day and 
each package gets scanned about 

35
00:02:09,500 --> 00:02:12,900
20 times, at least, as it makes 
its way from origin to 

36
00:02:12,900 --> 00:02:15,200
destination. 
So you can imagine the amount of

37
00:02:15,200 --> 00:02:18,300
data that we are dealing with. 
It's firmly in sort of Big Data 

38
00:02:18,300 --> 00:02:22,100
territory and as head of 
analytics, I have the job of 

39
00:02:22,200 --> 00:02:26,200
making sense of all this data 
and the first step in making 

40
00:02:26,200 --> 00:02:28,800
sense of data is to organize it 
effectively. 

41
00:02:30,100 --> 00:02:33,500
Today's guest is Rangarajan
Vasudevan, founder and CEO of

42
00:02:33,500 --> 00:02:36,000
The Data Team.
Ranga was my classmate at

43
00:02:36,000 --> 00:02:38,600
IIT Madras and then went on to 
do a master's in computer 

44
00:02:38,600 --> 00:02:40,500
science at the University of 
Michigan, Ann Arbor. 

45
00:02:41,300 --> 00:02:44,000
He was a founding engineer at 
Aster Data Systems, which was 

46
00:02:44,000 --> 00:02:48,500
acquired by Teradata in 2011.
In 2015, Ranga started The Data

47
00:02:48,500 --> 00:02:51,600
Team, which is building AI
solutions for customer

48
00:02:51,600 --> 00:02:55,300
intelligence. He is also a guest
professor at IIT Madras, where he

49
00:02:55,300 --> 00:02:58,900
teaches a course on big data.
You can follow him on Twitter at

50
00:02:58,900 --> 00:03:05,100
@rangavasudevan, that is
R-A-N-G-A V-A-S-U-D-E-V-A-N.

51
00:03:08,300 --> 00:03:10,900
What is Big Data? 
Well, I think we. 

52
00:03:12,100 --> 00:03:16,400
The original connotation used to
be about very large scale data 

53
00:03:16,700 --> 00:03:20,000
occurring at very high
velocities. 

54
00:03:21,600 --> 00:03:25,300
And there was also this added 
connotation that this was data 

55
00:03:25,300 --> 00:03:30,000
that was traditionally beyond 
the way classical databases

56
00:03:30,000 --> 00:03:33,800
used to store data, in
Excel sheets, in a spreadsheet,

57
00:03:33,900 --> 00:03:38,000
row-column kind of format.
But that was when the

58
00:03:38,000 --> 00:03:41,300
concept originated, way back, more
than a decade ago, a decade and a

59
00:03:41,300 --> 00:03:44,000
half ago. 
Yep, nowadays, if you really 

60
00:03:44,000 --> 00:03:48,600
think about it, most of
the conversation is around, 

61
00:03:49,000 --> 00:03:51,600
definitely the volume, and 
velocity, that's definitely 

62
00:03:51,600 --> 00:03:54,100
there. 
And then, there is this, this 

63
00:03:54,100 --> 00:03:58,400
added implication that there are
so many types of data that's 

64
00:03:58,400 --> 00:04:00,700
being generated in the
digital world. 

65
00:04:01,200 --> 00:04:07,400
Yeah, some of which is not very 
conducive for analysis by 

66
00:04:08,200 --> 00:04:12,700
regular methods. 
Right, of what you'd use in a

67
00:04:12,700 --> 00:04:14,900
favorite visualization or BI
tool.

68
00:04:15,200 --> 00:04:17,700
So it's a broad term that 
encompasses all of this. 

69
00:04:18,100 --> 00:04:19,899
Yeah. 
For the most part. 

70
00:04:19,899 --> 00:04:24,200
I think the volume and velocity
have stuck on as key properties,

71
00:04:24,200 --> 00:04:26,300
I would say thank you. 
Okay. 

72
00:04:26,300 --> 00:04:28,400
So let's just get to the problem
statement. 

73
00:04:28,400 --> 00:04:29,300
Right? 
I mean if you think about the 

74
00:04:29,300 --> 00:04:32,500
problem statement, it is that
nowadays you have a

75
00:04:32,500 --> 00:04:35,100
lot of data flowing around. 
It's easy to collect the data. 

76
00:04:35,400 --> 00:04:38,700
It's easy to store the data 
because of the so called Big 

77
00:04:38,700 --> 00:04:40,300
Data and things like that, 
right? 

78
00:04:40,500 --> 00:04:44,100
and cloud and all that.
So how are you supposed to sort 

79
00:04:44,100 --> 00:04:45,300
of organize it?
Now,

80
00:04:45,300 --> 00:04:48,400
the reason we collect data and
store it is with the hope that

81
00:04:48,400 --> 00:04:50,500
someday we'll be able to make 
good use of it. 

82
00:04:50,700 --> 00:04:53,500
That someday could be
tomorrow, but, like, that we

83
00:04:53,500 --> 00:04:56,200
could do some analysis, make
some use of it. There are different

84
00:04:56,200 --> 00:04:59,000
kinds of analysis that you can 
do on the data and so on so 

85
00:04:59,000 --> 00:05:01,700
forth. Even with the cloud and
big data and so on. 

86
00:05:01,700 --> 00:05:04,400
How do you organize the data? 
What are the different

87
00:05:04,400 --> 00:05:07,300
sort of mechanisms in which data
is organized? 

88
00:05:07,300 --> 00:05:10,300
And how has that evolved
over the last 10,

89
00:05:10,300 --> 00:05:11,700
15 years?
Yeah, I guess. 

90
00:05:12,000 --> 00:05:13,700
There are a dozen
things there.

91
00:05:14,900 --> 00:05:18,800
Firstly, I think one of the key
motivations, right, for why

92
00:05:18,800 --> 00:05:22,100
big data makes sense the way
it does right now is that 

93
00:05:23,700 --> 00:05:26,900
storage has become cheaper
and cheaper, right, as the years

94
00:05:26,900 --> 00:05:31,400
progress.
So the topic of whether to store

95
00:05:31,400 --> 00:05:33,800
the data has become almost like 
a no-brainer for most companies 

96
00:05:33,800 --> 00:05:36,400
and even individuals, right? 
You get a 1 terabyte hard drive 

97
00:05:36,400 --> 00:05:39,500
for what, three hundred bucks?
That's really really cheap. 

98
00:05:39,800 --> 00:05:42,100
Yeah. 
So then the question is: is it

99
00:05:42,100 --> 00:05:44,000
enough that you're able to store
the data? 

100
00:05:44,000 --> 00:05:47,100
Obviously not right? 
But yeah, at least a decade

101
00:05:47,100 --> 00:05:50,800
ago, it used to be the case that
people were throwing away data. 

102
00:05:51,100 --> 00:05:54,500
And that was because the 
traditional methods of storing 

103
00:05:54,500 --> 00:05:57,900
data, which were databases and
things like that, or even storage

104
00:05:57,900 --> 00:06:01,400
area networks and network
attached storage and things

105
00:06:01,400 --> 00:06:05,200
like that, all of them were fairly
expensive when it comes to the 

106
00:06:05,200 --> 00:06:07,900
form factor versus the amount of
money that you're actually 

107
00:06:07,900 --> 00:06:09,100
paying for it. 
Yeah. 

108
00:06:09,200 --> 00:06:12,400
So as technology has evolved, as
it became more commoditized, the

109
00:06:12,400 --> 00:06:17,100
storage equation changed.
So it's no longer considered bad

110
00:06:17,100 --> 00:06:18,900
practice to just store however
much you want.

111
00:06:19,900 --> 00:06:25,400
But then, I think the crux of
why Big Data makes sense in 

112
00:06:25,400 --> 00:06:26,700
conjunction with something like 
the cloud. 

113
00:06:26,700 --> 00:06:30,400
Is that along with the fact that
you need to be able to store? 

114
00:06:30,500 --> 00:06:34,100
you also need to be able to plug
and play extremely different

115
00:06:34,100 --> 00:06:39,100
types of access technology.
Yeah. Not all forms of access

116
00:06:39,100 --> 00:06:43,500
are going to be applicable for
every consumer or

117
00:06:43,500 --> 00:06:46,300
every user of the data. Like,
some are going to be really 

118
00:06:46,300 --> 00:06:50,200
proficient with, like, hands-on
programming while some like the 

119
00:06:50,200 --> 00:06:53,200
business users will just need 
like snapshots of what they need

120
00:06:54,200 --> 00:06:56,800
and there'll be people in between,
right who know some amount of 

121
00:06:56,800 --> 00:06:58,400
programming. 
let's say declarative stuff,

122
00:06:58,400 --> 00:07:00,800
like SQL querying.
Yeah, and there'll always be

123
00:07:01,000 --> 00:07:02,500
some
folks who are data

124
00:07:02,500 --> 00:07:05,100
scientists, machine-learning
engineers or statisticians, who

125
00:07:05,100 --> 00:07:07,400
don't really understand the 
weeds of like, distributed 

126
00:07:07,400 --> 00:07:09,100
programming or anything like 
that, but they still need to be 

127
00:07:09,100 --> 00:07:13,500
able to access the data. 
So what therefore emerged was

128
00:07:13,900 --> 00:07:18,100
this need to unlock agility. 
Yep. And the unlocking of

129
00:07:18,100 --> 00:07:21,000
agility first started with 
Hadoop and technologies like

130
00:07:21,000 --> 00:07:23,300
that. 
which were all predominantly

131
00:07:23,300 --> 00:07:26,300
on-prem, right, in the data center
kind of technologies. Okay.

132
00:07:26,300 --> 00:07:28,900
So, a little bit:
Hadoop emerged as an on-prem

133
00:07:28,900 --> 00:07:29,400
thing,
is it?

134
00:07:29,400 --> 00:07:30,700
It didn't. 
It's not cloud native.

135
00:07:30,700 --> 00:07:36,100
Hadoop, when it first came in, it
was done by Yahoo to basically 

136
00:07:36,400 --> 00:07:40,200
do a better job of indexing, 
and, and retrieving their 

137
00:07:40,200 --> 00:07:43,300
internal search data. 
Yeah, right. 

138
00:07:43,300 --> 00:07:47,800
And this was all done on-prem.
So in fact, at that time, Hadoop

139
00:07:47,800 --> 00:07:50,700
became popular because the old 
school way of doing things 

140
00:07:51,100 --> 00:07:53,300
classical data warehouse, 
database ways of doing things 

141
00:07:53,300 --> 00:07:55,500
were just extremely restrictive,
right? 

142
00:07:55,500 --> 00:07:57,200
The technology was getting in 
the way, right? 

143
00:07:57,200 --> 00:07:59,500
Yeah. 
So Hadoop was one of the first

144
00:08:00,900 --> 00:08:03,600
big data technologies, so to speak,
which didn't get in the

145
00:08:03,600 --> 00:08:06,400
way, you know, in the sense that
it allowed anybody to just

146
00:08:06,700 --> 00:08:10,300
put all kinds of things on top 
of it, to get answers and process

147
00:08:10,300 --> 00:08:13,300
and do things with it, right? At
the fundamental level, that was

148
00:08:13,300 --> 00:08:17,600
a very, very attractive price. 
So as a result it caught on like

149
00:08:17,600 --> 00:08:21,900
wildfire. Obviously the commodity
pricing model, all of that helped.

150
00:08:22,600 --> 00:08:25,800
So it caught on like wildfire,
became extremely popular, but

151
00:08:25,800 --> 00:08:27,900
then it imposed other kinds of 
costs. 

152
00:08:28,100 --> 00:08:32,200
Okay, and those kinds of costs 
were like, maybe we can discuss 

153
00:08:32,200 --> 00:08:33,799
that separately, right? 
But just as examples,

154
00:08:33,799 --> 00:08:36,700
there are the costs related to data
processing and what does it mean

155
00:08:36,700 --> 00:08:39,900
to reason about data correctness
and things like that. All of that

156
00:08:39,900 --> 00:08:43,900
became
problem statements. Now,

157
00:08:43,900 --> 00:08:49,200
with cloud, again, the
journey of cloud started off

158
00:08:49,200 --> 00:08:52,400
as, you know, let me just run
your virtual machine. But then,

159
00:08:53,000 --> 00:08:57,900
as things progressed, as big data
evolved on its parallel track,

160
00:08:57,900 --> 00:09:01,400
cloud started to pick up and
say, look, I have all these years

161
00:09:01,400 --> 00:09:06,200
of experience of having done
big data well, and

162
00:09:06,200 --> 00:09:08,100
understanding all these other 
ways in which big data is 

163
00:09:08,100 --> 00:09:10,500
failing, right? 
And I know, of course, how the

164
00:09:10,500 --> 00:09:13,700
classical technologies are
things that are really black

165
00:09:13,700 --> 00:09:15,600
boxes to people.
So let me actually

166
00:09:15,600 --> 00:09:19,800
learn from the best of both 
worlds and bring in those

167
00:09:19,900 --> 00:09:21,800
features and benefits. 
Right? 

168
00:09:21,800 --> 00:09:24,600
And so that's the, I guess the 
single biggest value proposition

169
00:09:24,600 --> 00:09:27,500
that cloud has brought to the
data table, which is that you 

170
00:09:27,500 --> 00:09:30,100
get all the agility of a variety
of different types of access

171
00:09:30,100 --> 00:09:32,900
patterns. 
And at the same time, if you 

172
00:09:32,900 --> 00:09:35,800
need that, if you have a need for
speed, right,

173
00:09:35,900 --> 00:09:38,300
by organizing it in a
certain way, which is extremely 

174
00:09:38,300 --> 00:09:40,300
optimized for it. 
You can do that as well. 

175
00:09:40,300 --> 00:09:45,600
Right, so cloud has now gotten
the best of both of these 

176
00:09:45,600 --> 00:09:48,200
old-school worlds and the Hadoop
agility. 

177
00:09:48,500 --> 00:09:51,500
And at the same time, it's gotten
its own complexity, right?

178
00:09:51,800 --> 00:09:55,500
As always with cloud, things are not
as easy as they seem on the

179
00:09:55,500 --> 00:09:57,400
surface. 
So that complexity is also 

180
00:09:57,400 --> 00:10:01,700
something you have to plan for. 
So, I would say,

181
00:10:01,700 --> 00:10:04,800
old-school problems have become easier,
but there are newer challenges 

182
00:10:04,800 --> 00:10:05,900
that have come in because of 
cloud. 

183
00:10:06,600 --> 00:10:07,600
Got it. 
Okay? 

184
00:10:07,600 --> 00:10:10,800
Now basically, again going
back to fundamentals. 

185
00:10:10,800 --> 00:10:13,100
I think we'll keep going back to
fundamentals a bit here.

186
00:10:13,100 --> 00:10:16,400
But like, so we are collecting 
all the data that we can get. 

187
00:10:16,400 --> 00:10:19,500
And then we want to store it in 
a way that is easy to access and

188
00:10:19,500 --> 00:10:23,300
analyze and so on later on. 
So in your view from what you 

189
00:10:23,300 --> 00:10:25,500
have seen, what are the 
different use cases for which 

190
00:10:25,500 --> 00:10:28,000
people want to access data? 
What are the different kinds of 

191
00:10:28,008 --> 00:10:31,000
things that very broadly 
speaking across domains and 

192
00:10:31,000 --> 00:10:32,600
things like that? 
What are the different kinds of 

193
00:10:32,608 --> 00:10:35,600
things that people want to do 
with data that has been sort of 

194
00:10:35,700 --> 00:10:38,400
just collected and dumped in one
place and so on. 

195
00:10:39,200 --> 00:10:42,500
Yeah, I think the first thing
that everybody

196
00:10:42,500 --> 00:10:45,100
wants across the enterprise is
to reason about what is

197
00:10:45,100 --> 00:10:48,300
existing there,
what exists, right?

198
00:10:48,300 --> 00:10:50,600
Yep. 
That's like the first use case,

199
00:10:51,000 --> 00:10:56,100
and it's almost like a
prerequisite step, right,

200
00:10:56,100 --> 00:10:57,900
even before you
start to plan other

201
00:10:57,900 --> 00:11:03,000
kinds of things. 
But in the classical big data world,

202
00:11:03,900 --> 00:11:06,400
even reasoning about what exists
is actually a very hard problem.

203
00:11:07,400 --> 00:11:11,200
The reason being, firstly, that
the volume of the

204
00:11:11,200 --> 00:11:14,300
data is just so large, and
secondly, the fact that it's 

205
00:11:14,300 --> 00:11:16,400
spread out, and at the end of 
the day, it's just a file 

206
00:11:16,400 --> 00:11:18,900
system. 
So, whatever you put in, is what

207
00:11:18,900 --> 00:11:20,700
you get. 
So garbage in, garbage out, and

208
00:11:20,700 --> 00:11:21,900
those kinds of issues are also 
there. 

209
00:11:22,500 --> 00:11:24,700
So, okay, that's the first use case.
The second use case is the fact

210
00:11:24,700 --> 00:11:29,300
that you now have to start to 
know what has happened recently.

211
00:11:29,600 --> 00:11:32,100
Yeah, and how is that comparing 
with what has happened in the 

212
00:11:32,100 --> 00:11:34,000
past? 
Yeah, and that's a very common 

213
00:11:34,000 --> 00:11:37,000
question that most decision 
makers at the business level. 

214
00:11:37,000 --> 00:11:39,300
ask.
That's what we would call BI,

215
00:11:39,300 --> 00:11:40,700
reporting and things like that,
right. 

216
00:11:41,900 --> 00:11:46,200
Then the third set of use cases
is exploratory, and here you

217
00:11:46,200 --> 00:11:50,000
start to have the analytics 
types who want to look at things

218
00:11:50,000 --> 00:11:51,900
like, okay? 
Yeah, I have this hypothesis. 

219
00:11:51,900 --> 00:11:55,200
Let me go test it out. 
So they want to, you know, compare

220
00:11:55,200 --> 00:11:56,800
what happened,
let's say, last Diwali versus

221
00:11:56,800 --> 00:11:59,100
this Diwali here, and
let's see, what are the 

222
00:11:59,100 --> 00:12:00,200
differences in fraud. 
Right? 

223
00:12:00,200 --> 00:12:03,800
As an example.
That's very exploratory,

224
00:12:03,800 --> 00:12:07,400
simulation-led, right?
Then the fourth set of use 

225
00:12:07,400 --> 00:12:10,500
cases, are people who want to 
learn from data and sort of 

226
00:12:10,500 --> 00:12:13,500
predict things in the future. 
Yep, that's the data science,

227
00:12:13,500 --> 00:12:15,200
model-building kind
of type.

228
00:12:16,500 --> 00:12:16,800
Right? 
Then. 

229
00:12:16,800 --> 00:12:20,700
You also have the special 
functions. 

230
00:12:20,700 --> 00:12:24,100
I would say these special 
functions are monitoring things 

231
00:12:24,100 --> 00:12:26,800
like overall risk to the 
company. 

232
00:12:26,900 --> 00:12:29,000
Yeah, right. 
Overall financial risk and things

233
00:12:29,000 --> 00:12:33,100
like that, or even answering
questions posed by regulatory

234
00:12:33,100 --> 00:12:35,100
bodies, right? 
And that's a very common task.

235
00:12:35,400 --> 00:12:38,200
So then you start to get
into those kinds of special 

236
00:12:38,200 --> 00:12:41,600
requests. 
So those also are critical. 

237
00:12:41,700 --> 00:12:45,500
use cases that are going to be
put as a subset of analytics. 

238
00:12:45,500 --> 00:12:48,900
In some sense. 
This is not as so it. 

239
00:12:48,900 --> 00:12:53,300
Actually, it's more,
it's more a data dump,

240
00:12:53,300 --> 00:12:57,100
right, as opposed to
opposed to anything else, right?

241
00:12:57,100 --> 00:12:59,000
That's why it doesn't fall 
neatly into one of those other 

242
00:12:59,000 --> 00:13:01,500
categories.
The example here: as a telecom

243
00:13:01,500 --> 00:13:06,100
provider, you could get an
FIR to, you know, share data

244
00:13:06,100 --> 00:13:08,900
of 15 listed telecom numbers,
right? 

245
00:13:08,900 --> 00:13:11,600
Mobile numbers. 
Yeah, so that's regulatory.

246
00:13:11,900 --> 00:13:14,800
You have to give it in time to
the police authorities.

247
00:13:14,900 --> 00:13:17,600
So there's no analytics. 
You just have to

248
00:13:17,600 --> 00:13:19,900
dump it, but that will be from 
any time in the last seven years

249
00:13:19,900 --> 00:13:21,200
as an example, right?
Right. 

250
00:13:21,200 --> 00:13:25,700
Yeah. 
Yeah, and the last use, one of

251
00:13:25,700 --> 00:13:27,900
the very important uses, is
also this:

252
00:13:28,000 --> 00:13:32,000
Can I keep improving my internal
systems by analyzing the exhaust

253
00:13:32,000 --> 00:13:34,600
that's coming from it. 
Yeah, and that is also being 

254
00:13:34,600 --> 00:13:37,900
stored in the same place.
So now, I mean, obviously, I 

255
00:13:37,908 --> 00:13:41,000
mean the way I see it and so on,
like the way you access data for

256
00:13:41,000 --> 00:13:43,100
each of these different
things is very different. 

257
00:13:43,100 --> 00:13:46,200
So for example, if you were to 
look at a BI use case, you

258
00:13:46,200 --> 00:13:49,800
probably only need the last
few days of data for some

259
00:13:49,800 --> 00:13:52,400
stuff. For some other things,
you'll need a longer time

260
00:13:52,400 --> 00:13:54,300
period. 
So it's almost like you have to

261
00:13:54,400 --> 00:13:56,100
take a little from here, a little
from there and so on.

262
00:13:56,100 --> 00:13:59,500
So it's a different sort of
access and things like that. For

263
00:13:59,500 --> 00:14:01,000
your special functions thing 
that you mentioned,

264
00:14:01,000 --> 00:14:03,000
there
it is like going to some

265
00:14:03,000 --> 00:14:08,100
particular place in the past and
then getting stuff out

266
00:14:08,100 --> 00:14:12,000
of there and it's like going to 
some unused basement and finding

267
00:14:12,000 --> 00:14:14,000
some file like they show in the 
movies and things like that. 

268
00:14:14,000 --> 00:14:18,900
I'm guessing. So I
assume that the way you need to

269
00:14:19,200 --> 00:14:21,700
store and structure
your data is very, very

270
00:14:21,700 --> 00:14:24,800
different based on which of 
these needs are dominant. 

271
00:14:24,800 --> 00:14:26,800
And I assume there's a sort
of a trade-off. 

272
00:14:26,800 --> 00:14:29,900
So what are the sort of the real
trade-offs here? 

273
00:14:29,900 --> 00:14:33,200
What are the technologies?
Or, I wouldn't call it technologies,

274
00:14:33,200 --> 00:14:35,400
but what are the principles 
according to which you organize

275
00:14:35,400 --> 00:14:40,700
the data to optimize them for 
some of these different 

276
00:14:41,300 --> 00:14:42,300
combinations
and use cases?

277
00:14:43,500 --> 00:14:46,000
I really like the word
you used, fundamentals,

278
00:14:46,000 --> 00:14:47,900
Right? 
because I think this exactly

279
00:14:47,900 --> 00:14:50,300
is a fundamental design
principle for data

280
00:14:50,300 --> 00:14:51,400
architecture. 
Yep. 

281
00:14:51,500 --> 00:14:55,400
You have to think about all the
different use cases that

282
00:14:55,400 --> 00:14:57,500
are typically existing just from
a consumption point of view, 

283
00:14:57,500 --> 00:15:01,700
right? 
and cater to it with a

284
00:15:01,700 --> 00:15:03,700
very important design
principle, which is: what is

285
00:15:03,700 --> 00:15:05,500
the least common denominator? 
Right? 

286
00:15:05,500 --> 00:15:08,000
Which actually serves all of 
these needs. 

287
00:15:08,500 --> 00:15:11,100
And if you take that LCD, kind 
of an approach, you will see 

288
00:15:11,100 --> 00:15:14,800
that there's a natural insight,
which is that:

289
00:15:14,900 --> 00:15:16,500
Let's just collect all of it 
together. 

290
00:15:17,700 --> 00:15:22,200
Albeit with some curation.
You can't just dump it all and

291
00:15:22,200 --> 00:15:25,000
figure it out later on. 
You have to put some curation, 

292
00:15:25,000 --> 00:15:29,700
put some boundary around it. 
And those are all not just a 

293
00:15:29,708 --> 00:15:32,500
property of the use case, but 
also just good hygiene, right? 

294
00:15:32,500 --> 00:15:34,300
You need to know. 
You need to know how you're 

295
00:15:34,300 --> 00:15:36,800
getting something. 
Otherwise you do not appreciate the

296
00:15:36,800 --> 00:15:40,100
value of getting it right? 
That's almost like a life

297
00:15:40,100 --> 00:15:43,100
principle. 
Yeah. But if you operate with the LCD

298
00:15:43,700 --> 00:15:46,100
thought process, like
Occam's razor, or something like

299
00:15:46,100 --> 00:15:47,800
that. 
The first thing that happens is,

300
00:15:47,800 --> 00:15:49,100
you know, you build a big data
layer.

301
00:15:49,400 --> 00:15:51,800
Yeah. 
Now, once that layer is done, then

302
00:15:51,900 --> 00:15:53,200
each of these different use
cases

303
00:15:53,200 --> 00:15:57,300
actually kind of branch out into
their own storage options and 

304
00:15:57,300 --> 00:15:58,800
optimizations and things like that,
right.

305
00:15:58,900 --> 00:16:02,900
I do too, right. 
And here, the nature of 

306
00:16:03,800 --> 00:16:09,300
technologies has completely,
you know, I would say, evolved

307
00:16:09,300 --> 00:16:13,500
over the last 10 years. Okay.
It's no longer the case that you

308
00:16:13,500 --> 00:16:15,400
just need one technology that 
will do all of it. 

309
00:16:15,500 --> 00:16:18,200
In fact, in the cloud,
you will have completely

310
00:16:18,200 --> 00:16:20,300
dedicated technologies, which
are meant to do

311
00:16:20,300 --> 00:16:23,900
exactly that one thing, and also
sort of have a very good

312
00:16:23,900 --> 00:16:26,500
price point, right, to make it
justifiable for you to kind of 

313
00:16:26,500 --> 00:16:28,400
invest in that kind of
technology in the first place.

314
00:16:28,800 --> 00:16:33,000
So yeah, if you just want to
go use case by

315
00:16:33,000 --> 00:16:38,700
use case: for the BI side, you
would absolutely need a way to 

316
00:16:39,600 --> 00:16:42,200
curate data even more than what 
you've done in the big data

317
00:16:42,200 --> 00:16:44,700
layer, right? 
And here, the curation that

318
00:16:44,700 --> 00:16:48,700
you're looking to do is not just
reason about where the data came

319
00:16:48,700 --> 00:16:52,000
from. 
But also reason about how 

320
00:16:52,000 --> 00:16:54,700
different representations of the
same data look like, right? 

321
00:16:54,900 --> 00:16:58,800
And therefore, what do I 
triangulate and promote as

322
00:16:58,808 --> 00:17:01,600
the representation that I
want, right? 

323
00:17:01,600 --> 00:17:05,500
for my business.
So that's a crucial task, right?

324
00:17:05,500 --> 00:17:08,500
And in the conventional
sense, it's called data

325
00:17:08,500 --> 00:17:11,300
integration. 
But yeah, that's a

326
00:17:11,300 --> 00:17:13,800
crucial topic that you have to 
address and there are 

327
00:17:13,808 --> 00:17:16,500
technologies which are very good
at, you know, optimized to be

328
00:17:16,599 --> 00:17:20,099
very good at doing data 
integration and here again, 

329
00:17:20,099 --> 00:17:23,200
there are two angles to it, 
which is the fact that many 

330
00:17:23,200 --> 00:17:26,000
businesses are okay
looking at, you know, data as

331
00:17:26,000 --> 00:17:28,200
of yesterday, looking back in 
time. 

332
00:17:28,300 --> 00:17:31,800
Yeah. Right, but then there are always
businesses like, for instance,

333
00:17:31,800 --> 00:17:34,900
I guess your own employer, right,
who always want real-time data.

334
00:17:35,100 --> 00:17:38,000
Yep, I guess hosting and data of
what is happening. 

335
00:17:38,800 --> 00:17:40,200
Not necessarily
analytics and machine

336
00:17:40,200 --> 00:17:43,000
learning, but just give me
today's data.

337
00:17:43,000 --> 00:17:44,200
As a logistics company,
yeah,

338
00:17:44,200 --> 00:17:47,200
we need to know. Like, you ask
me where your package is; for that,

339
00:17:47,200 --> 00:17:49,800
we need to know where each
package is in real time and so on.

340
00:17:49,800 --> 00:17:51,700
So obviously we need the
data. 

341
00:17:52,400 --> 00:17:53,600
Exactly. 
And you know, you want to know 

342
00:17:53,600 --> 00:17:55,900
how many packages are being 
missed because of the storm 

343
00:17:55,900 --> 00:17:57,500
happening in Gujarat, right? 
Exactly. 

344
00:17:57,500 --> 00:18:00,000
So those are extremely important
real-time data. 

345
00:18:00,100 --> 00:18:03,400
And I would say that real-time 
traffic also used to be a 

346
00:18:03,400 --> 00:18:06,900
problem with classical 
technologies, but nowadays there

347
00:18:06,900 --> 00:18:09,200
are options available
even for that, in conjunction

348
00:18:09,400 --> 00:18:12,200
with the decision-making that
needs to happen on the BI side.

349
00:18:13,200 --> 00:18:16,300
So on the BI side, I mean, this
is what one set of

350
00:18:16,308 --> 00:18:19,800
technologies would be
doing. On machine learning,

351
00:18:20,000 --> 00:18:22,900
I mean, obviously there are many
many technologies, each having

352
00:18:22,900 --> 00:18:26,400
its own pluses and minuses, but 
at the heart of it, they all 

353
00:18:26,400 --> 00:18:28,700
work off the LCD, which is the
big data storage layer.

354
00:18:28,900 --> 00:18:30,900
Right here, very few people 
require. 

355
00:18:31,000 --> 00:18:33,400
I mean, very few data science or
machine learning technologies

356
00:18:33,400 --> 00:18:36,500
really require you to curate
in an additional storage layer. 

357
00:18:37,100 --> 00:18:40,100
Maybe there are optimizations
which are useful, like, for

358
00:18:40,100 --> 00:18:43,700
instance, being able to stand up
data in memory so that you can 

359
00:18:43,700 --> 00:18:46,800
iterate much faster on it.
Yeah, that's a pretty

360
00:18:46,800 --> 00:18:49,200
useful technology. 
Some, some technologies have it.

361
00:18:50,200 --> 00:18:53,500
The other angle here is if you 
want to do really large-scale, 

362
00:18:53,500 --> 00:18:56,500
complex data science,
for instance apply a deep

363
00:18:56,500 --> 00:18:59,200
learning network,
then you need other mechanisms

364
00:18:59,200 --> 00:19:02,400
of exchanging data, so that the 
volume does not, you know, flood

365
00:19:02,400 --> 00:19:04,800
you down and the task actually 
has a reasonable chance of 

366
00:19:04,808 --> 00:19:07,400
completing. There are
technologies available for

367
00:19:07,400 --> 00:19:09,200
that as well. 
But those are again. 

368
00:19:09,400 --> 00:19:11,800
specialized technologies, not
really run-of-the-mill ones that you

369
00:19:11,800 --> 00:19:15,400
would use on a daily basis. 
And in some of those 

370
00:19:15,400 --> 00:19:17,200
Technologies, obviously the 
computer, uh, plays a role as 

371
00:19:17,200 --> 00:19:17,800
well. 
Right? 

372
00:19:17,800 --> 00:19:20,000
If you would only use, he use 
whenever you can. 

373
00:19:20,200 --> 00:19:23,200
So that things get faster. The
other thing is, on the

374
00:19:23,200 --> 00:19:25,700
regulatory side, the special 
function side. 

375
00:19:26,400 --> 00:19:29,300
What you're looking for is 
effectively a way to go, go back

376
00:19:29,300 --> 00:19:32,200
to archives. 
Like I said, and go look at that

377
00:19:32,200 --> 00:19:36,000
specific needle in the haystack,
or, you know, something like a 

378
00:19:36,008 --> 00:19:38,700
very specific type of query 
which is, which is not, which is

379
00:19:38,708 --> 00:19:42,100
not something that the
storage is optimized for. So in

380
00:19:42,100 --> 00:19:44,900
those kinds of setups. 
It's more important that you 

381
00:19:44,908 --> 00:19:47,100
give an answer less important 
that you give an answer 

382
00:19:47,100 --> 00:19:48,100
immediately. 
Of course. 

383
00:19:48,100 --> 00:19:49,400
Yep. 
Yep. 

384
00:19:49,400 --> 00:19:52,700
So therefore the trade-offs are 
different there, right? 

385
00:19:53,000 --> 00:19:54,200
Yeah. 
Okay. 

386
00:19:54,200 --> 00:19:58,200
Yeah. 
So again, I think I'm a little 

387
00:19:58,200 --> 00:20:00,100
more into this. 
It's basically the obviously 

388
00:20:00,100 --> 00:20:03,200
like different companies will 
have like different use cases. 

389
00:20:03,200 --> 00:20:06,500
Obviously one company might we 
may not be doing any machine 

390
00:20:06,500 --> 00:20:10,100
learning for example, and so we 
might have organized our data in one

391
00:20:10,100 --> 00:20:13,500
way, so what happens? 
Let's say because companies also

392
00:20:13,500 --> 00:20:16,000
evolve over time, right?
So for example, we might just 

393
00:20:16,000 --> 00:20:18,400
start by doing some tactical 
dashboards and stuff. 

394
00:20:18,400 --> 00:20:21,600
Then later on, we decide to add 
machine learning and then find 

395
00:20:21,600 --> 00:20:26,500
that data scientists at the cost
of their queries is either is 

396
00:20:26,500 --> 00:20:29,400
very high either in terms of 
time or in terms of dollars or 

397
00:20:29,400 --> 00:20:31,700
whatever. 
And then like, let's see another

398
00:20:31,700 --> 00:20:33,700
day. 
We start an analytic function 

399
00:20:33,700 --> 00:20:37,000
like it's a sort of a typical 
Evolution that a lot of 

400
00:20:37,000 --> 00:20:39,100
companies go through. 
I mean, maybe

401
00:20:39,300 --> 00:20:42,000
not mature companies, but at
least a lot of smaller companies

402
00:20:42,000 --> 00:20:44,000
go through this Evolution. 
So so what do you do? 

403
00:20:44,000 --> 00:20:45,400
I mean, like, how do you 
engineer this? 

404
00:20:45,400 --> 00:20:47,800
You can't keep
re-engineering every few years and

405
00:20:47,800 --> 00:20:49,000
it's difficult to predict also, 
right? 

406
00:20:49,000 --> 00:20:51,400
So what do you want? 
How do companies deal with this?

407
00:20:53,000 --> 00:20:58,000
So, actually, I wouldn't, I 
would say it's actually par for 

408
00:20:58,000 --> 00:21:01,400
the course to re-engineer on. 
Okay, that happens all the time.

409
00:21:01,600 --> 00:21:05,400
I mean, like I said, most 
companies evolve organically and

410
00:21:05,400 --> 00:21:08,800
while that happens, the needs do
change. And when the needs change,

411
00:21:08,800 --> 00:21:10,500
the software does need to
change, right? 

412
00:21:10,500 --> 00:21:14,800
I think that's some, in my view 
inevitable, but I think the

413
00:21:15,000 --> 00:21:17,900
design principles. 
They hold the test of time, 

414
00:21:18,000 --> 00:21:19,300
right? 
They hold true. 

415
00:21:19,600 --> 00:21:24,600
So you have to think about, and 
it takes a really prescient leader

416
00:21:24,600 --> 00:21:27,200
to think about it from day
one: let me actually collect

417
00:21:27,200 --> 00:21:29,000
everything and I'll figure it 
out later. 

418
00:21:29,000 --> 00:21:30,300
Right? 
So, you take that kind of an 

419
00:21:30,308 --> 00:21:34,800
approach, then the LCD, which is
the big data storage layer that 

420
00:21:34,800 --> 00:21:37,600
stands the test of time, right? 
Because that's going to be there,

421
00:21:38,000 --> 00:21:41,000
come what may.
Then at various points,

422
00:21:41,000 --> 00:21:44,200
you branch off, obviously.
And one of the key like 

423
00:21:44,200 --> 00:21:46,400
conference is one of the key 
evolutionary decisions that most

424
00:21:46,400 --> 00:21:48,500
companies go through is. 
I've always collected data on 

425
00:21:48,500 --> 00:21:51,300
that, but suddenly now the needs
have evolved.

426
00:21:51,300 --> 00:21:53,600
So I need to get my answers done in
real time. 

427
00:21:53,700 --> 00:21:54,800
Yep. 
So what do I need to go back and

428
00:21:54,800 --> 00:21:57,400
change? And the answer there is you're
going to change everything from

429
00:21:57,400 --> 00:21:59,300
the source application onwards,
right?

430
00:21:59,300 --> 00:22:03,000
Because real time is not
just a property of your pipeline

431
00:22:03,000 --> 00:22:05,600
running faster. 
It's also whether the source can

432
00:22:05,600 --> 00:22:08,100
actually give you the data in
the first place in a way that's 

433
00:22:08,100 --> 00:22:09,800
much faster than whatever
it was

434
00:22:09,800 --> 00:22:11,800
meant for.
So, those are all critical 

435
00:22:11,800 --> 00:22:13,800
decisions you'll have to go
through, and that

436
00:22:13,800 --> 00:22:15,100
re-engineering is inevitable
there. 

437
00:22:15,400 --> 00:22:20,200
But the part around, you know, 
the agility, right? 

438
00:22:20,200 --> 00:22:22,900
That's a very key thing, because
what you don't want to do

439
00:22:22,900 --> 00:22:27,900
is get stuck with a technology,
especially if it's proprietary, or

440
00:22:27,900 --> 00:22:33,200
it has black boxes, right?
That prevent you from doing 

441
00:22:33,200 --> 00:22:37,100
additional flexible things here,
and more importantly that has a 

442
00:22:37,100 --> 00:22:40,200
cost of compliance, right? 
That there is a cost of 

443
00:22:40,700 --> 00:22:44,500
technical debt that you incur
because you've not thought

444
00:22:44,500 --> 00:22:47,600
through the consumption pattern 
and you live with that cost 

445
00:22:47,600 --> 00:22:50,300
until the point that you're 
ready to throw away or until the

446
00:22:50,300 --> 00:22:51,600
point. 
You're ready to bring in. 

447
00:22:51,700 --> 00:22:53,600
a brand-new technology or a
brand-new use case.

448
00:22:53,900 --> 00:23:00,100
So it kind of becomes a case of 
risk aversion and and cost 

449
00:23:00,100 --> 00:23:05,000
optimization than a case of, you
know, can I choose the right 

450
00:23:05,000 --> 00:23:06,100
technology right now? 
Right. 

451
00:23:06,100 --> 00:23:08,800
So you have to think about that 
angle as well, right?

452
00:23:08,800 --> 00:23:10,300
Then you can you decide 
something right now. 

453
00:23:11,500 --> 00:23:16,200
Yeah, and so you're saying that 
re-engineering is fairly common.

454
00:23:16,200 --> 00:23:20,100
Okay. 
Okay, because I mean, what I 

455
00:23:20,100 --> 00:23:24,300
find is that, like, sometimes you
suddenly see that some new guy

456
00:23:24,300 --> 00:23:27,100
will come in and he wants to 
look at the data in a very

457
00:23:27,100 --> 00:23:30,300
different way, for example, and 
that will be very different from

458
00:23:30,300 --> 00:23:34,000
what has ever been done in the 
company and the current 

459
00:23:34,100 --> 00:23:38,000
structures and so on are just
completely unsuited for that or 

460
00:23:38,500 --> 00:23:41,300
impose high cost. 
So, even in case this new

461
00:23:41,300 --> 00:23:44,400
way is going to be sustainable
and so on, do you recommend a sort

462
00:23:44,400 --> 00:23:47,600
of a complete re-engineering 
retooling, kind of a thing 

463
00:23:47,600 --> 00:23:48,800
happened. 
Kathy. 

464
00:23:48,800 --> 00:23:51,500
I think that's a very important 
question to consider. 

465
00:23:51,700 --> 00:23:53,900
From multiple angles, right? 
I think, the first angle is 

466
00:23:53,900 --> 00:23:54,800
technology. 
Naturally. 

467
00:23:55,000 --> 00:23:59,000
There's no doubt about that. 
But the second angle is also the

468
00:23:59,000 --> 00:24:01,400
sustainability of that kind of 
an approach, right? 

469
00:24:01,800 --> 00:24:04,500
So what is the question leaves? 
And there's a very common 

470
00:24:04,500 --> 00:24:07,500
question that lines us being 
most most Enterprises go 

471
00:24:07,500 --> 00:24:09,700
through, right? 
So then are you stuck with 

472
00:24:09,700 --> 00:24:11,000
technology? 
Which only one person 

473
00:24:11,000 --> 00:24:12,500
understands? 
Yeah, right. 

474
00:24:13,200 --> 00:24:16,800
And that's a, that's a pretty 
common problem in the world of 

475
00:24:17,100 --> 00:24:18,500
data processing and software, 
right? 

476
00:24:20,000 --> 00:24:23,700
I think it used to be the case 
that R was extremely popular,

477
00:24:23,700 --> 00:24:25,500
right? 
And and, and still dislike for 

478
00:24:25,500 --> 00:24:28,900
another analysis and statistics 
in use it, but then now I know 

479
00:24:28,900 --> 00:24:33,100
exactly like many people do. 
And then you now have these 

480
00:24:33,700 --> 00:24:36,700
graduates who are just weaned
on Python, right?

481
00:24:36,700 --> 00:24:39,800
For everything. 
And then they tend to think that

482
00:24:40,200 --> 00:24:43,300
like statistics for instance are
all possible just within python 

483
00:24:43,300 --> 00:24:44,800
in the world of python itself. 
Yeah. 

484
00:24:44,800 --> 00:24:48,300
Now, if you're a company
which has built up an entire data

485
00:24:48,300 --> 00:24:51,100
science or analytics team that's
just based on R, and for

486
00:24:51,100 --> 00:24:53,500
whatever reason they've
moved on to their next greener

487
00:24:53,500 --> 00:24:54,600
pastures, right? 
Yep. 

488
00:24:54,600 --> 00:24:56,900
And then you hire all these
Python graduates, and what do you

489
00:24:56,900 --> 00:24:57,900
do
with all that R investment

490
00:24:57,900 --> 00:24:59,600
that you have already made. 
Yep. 

491
00:25:00,500 --> 00:25:03,600
You have to think about those 
pipelines all over again, 

492
00:25:03,700 --> 00:25:07,400
especially with the cloud coming
in. The pace at which the newer

493
00:25:07,400 --> 00:25:10,500
releases of software, like,
software as a general term.

494
00:25:10,600 --> 00:25:12,500
Yep are just data. 
They're not just data science to

495
00:25:12,500 --> 00:25:14,900
get something done. 
These releases are just 

496
00:25:14,900 --> 00:25:16,600
happening. 
Almost on a monthly basis, 

497
00:25:16,600 --> 00:25:17,300
right? 
Yep. 

498
00:25:17,300 --> 00:25:20,800
So and each one has a like a, 
like a very good feature and 

499
00:25:20,800 --> 00:25:23,200
these are being tested
at massive-scale

500
00:25:23,500 --> 00:25:25,100
Enterprises much bigger than 
yours. 

501
00:25:25,200 --> 00:25:27,600
So it becomes almost like a 
no-brainer to adopt the fact 

502
00:25:27,600 --> 00:25:28,900
that, you know, like Netflix is 
doing it. 

503
00:25:28,900 --> 00:25:29,900
Let me do it right. 
Yep. 

504
00:25:30,600 --> 00:25:32,600
And then the moment you do that,
then you start to realize that 

505
00:25:32,700 --> 00:25:34,400
some of these pipelines that 
you have built are obsolete.

506
00:25:34,600 --> 00:25:37,200
You have to re-engineer them. 
So, re-engineering is actually

507
00:25:37,200 --> 00:25:38,800
the mantra in the cloud
world. 

508
00:25:38,800 --> 00:25:43,000
I would say. I just want to touch
on some of the, some of the

509
00:25:43,000 --> 00:25:45,300
technical terms which are
around this, which I've never 

510
00:25:45,300 --> 00:25:49,300
really understood people talk 
about, I mean, the common word, 

511
00:25:49,300 --> 00:25:52,600
is, like, data warehouse.
Then there's data lake, and

512
00:25:52,600 --> 00:25:55,400
people say, I remember some guy
telling me a few years back,

513
00:25:55,600 --> 00:25:57,800
data warehouses are now
obsolete, everything

514
00:25:57,800 --> 00:26:00,700
is the data lake. But the way I
understand it, like both of 

515
00:26:00,700 --> 00:26:03,200
them, sort of like, coexisting 
things are so. 

516
00:26:03,300 --> 00:26:04,600
So what is the difference 
between them? 

517
00:26:04,600 --> 00:26:09,500
And how do you, what is the? 
I mean in a Layman's language? 

518
00:26:09,500 --> 00:26:12,900
How's the, what is the 
difference between in terms of 

519
00:26:12,900 --> 00:26:15,900
how the data sets depending on 
how you choose the architecture?

520
00:26:15,900 --> 00:26:20,500
I would say that. 
The data lake is probably the 

521
00:26:22,100 --> 00:26:25,900
least common denominator, right,
in the architecture picture.

522
00:26:25,900 --> 00:26:29,200
Yeah. 
I said, as the name suggests. 

523
00:26:29,700 --> 00:26:34,000
It's the, it's the point is the 
place where data naturally 

524
00:26:34,000 --> 00:26:37,500
gravitates towards, right? 
So here is the analogy is of a 

525
00:26:37,500 --> 00:26:41,100
water body and you have all 
these Rivers like coming in, 

526
00:26:41,300 --> 00:26:42,900
rather all these streams and
rivulets.

527
00:26:42,900 --> 00:26:48,400
And even they would gravitate 
towards the point where there is

528
00:26:48,400 --> 00:26:51,700
this equilibrium where the water
can accumulate here and that 

529
00:26:51,700 --> 00:26:53,900
becomes the central place from 
where the water further

530
00:26:54,500 --> 00:26:57,100
disseminates into multiple 
multiple streams for the 

531
00:26:57,100 --> 00:26:58,700
downstream and things that. 
Right. 

532
00:26:58,800 --> 00:27:01,900
So that's the that's the analogy
which is quite valid for the 

533
00:27:01,900 --> 00:27:06,900
data Lake as well. 
Now, obviously the data Lake by 

534
00:27:06,900 --> 00:27:10,700
its very nature because data is 
gravitating towards it. 

535
00:27:10,700 --> 00:27:13,200
You don't really control 
typically. 

536
00:27:13,400 --> 00:27:15,400
what is the quality of the data
that's gravitating to it.

537
00:27:16,000 --> 00:27:17,700
So it's not something that you 
say. 

538
00:27:18,000 --> 00:27:20,100
I'm going to only allow pure
data, right?

539
00:27:20,600 --> 00:27:22,300
Doesn't make sense because we 
have the lake. 

540
00:27:22,700 --> 00:27:27,000
The concept of Purity is a, is a
is a Latter-Day construct, 

541
00:27:27,000 --> 00:27:28,500
right? 
You don't put it up

542
00:27:28,500 --> 00:27:31,000
right up at the beginning
when you're trying to collect.

543
00:27:31,000 --> 00:27:33,300
And the concept of purity can change
over time. 

544
00:27:33,300 --> 00:27:35,100
So what is pure now is not pure
tomorrow.

545
00:27:35,100 --> 00:27:39,700
So yeah, and in fact there is 
Merit in storing your data as 

546
00:27:39,700 --> 00:27:41,500
well. 
Of course, you have the other 

547
00:27:41,500 --> 00:27:44,200
angle of it right here. 
You need to know what improving 

548
00:27:44,200 --> 00:27:47,900
things. 
So now that's the data lake. The

549
00:27:48,400 --> 00:27:52,000
data lake by itself, because it
cannot guarantee things like 

550
00:27:52,000 --> 00:27:54,900
Purity and quality. 
You need to have mechanisms of 

551
00:27:54,900 --> 00:27:57,600
doing that. 
The moment you have, you put 

552
00:27:57,600 --> 00:28:00,600
those mechanisms of guaranteeing
Purity and quality you start to 

553
00:28:00,600 --> 00:28:03,700
construct more curated data 
sets. 

554
00:28:03,800 --> 00:28:06,300
Yep. 
Yep, and those curated datasets

555
00:28:06,300 --> 00:28:08,500
serve,
let's say, more specialized

556
00:28:08,500 --> 00:28:10,700
needs like one of those use 
cases that we talked about. 

557
00:28:10,700 --> 00:28:13,900
Yeah. At that point,
You start to think about how 

558
00:28:13,900 --> 00:28:17,900
specialized should this be? 
And what is the property, right?

559
00:28:18,800 --> 00:28:22,500
Property of curation, that that 
is being critically relied upon 

560
00:28:22,600 --> 00:28:24,900
by that the business consumer 
and so on. 

561
00:28:24,900 --> 00:28:27,700
So for instance, if you're 
basing business

562
00:28:27,700 --> 00:28:29,700
decisions, right? 
On top of it, you have to 

563
00:28:29,700 --> 00:28:32,100
absolutely give the guarantee 
that the data is correct. 

564
00:28:32,600 --> 00:28:35,300
If the data is wrong, then you're
making wrong decisions and

565
00:28:35,300 --> 00:28:37,400
potentially affecting people's
lives in some form, right? 

566
00:28:37,500 --> 00:28:42,200
Yeah. So that process of
curation, and the creation of

567
00:28:42,200 --> 00:28:46,300
that curated data set is
what typically, you know, in the

568
00:28:46,300 --> 00:28:48,000
classical world would happen in
a data

569
00:28:48,200 --> 00:28:51,800
warehouse. Okay.
Okay. The data warehouse becomes this

570
00:28:52,300 --> 00:28:56,900
enterprise-wide single curated
zone of truth.

571
00:28:57,300 --> 00:28:58,300
Okay. 
Yep, right. 

572
00:28:58,800 --> 00:29:00,700
That's the classical definition 
of a data warehouse. 

573
00:29:01,600 --> 00:29:06,500
The reason it's enterprise-wide 
is because it is meant to serve 

574
00:29:06,700 --> 00:29:11,300
all business functions. Yeah,
HR, supply chain, like,

575
00:29:11,600 --> 00:29:14,800
payroll sentence, Regulatory 
Compliance. 

576
00:29:14,900 --> 00:29:18,000
And of course, the top-line
functions, sales, business, and

577
00:29:18,400 --> 00:29:22,100
things like that. So because it
was enterprise-wide, it, had 

578
00:29:22,100 --> 00:29:28,800
this very Grand appeal for the 
classical IT heads, CIOs, and so

579
00:29:28,800 --> 00:29:30,800
on. 
Who would think of it as a job, 

580
00:29:30,900 --> 00:29:34,300
like a multi-month, multi-year
kind of a project to make

581
00:29:34,300 --> 00:29:36,500
sure that let me just build it 
out once. 

582
00:29:36,800 --> 00:29:39,200
And if I figured out once and 
from that point on, I can, you 

583
00:29:39,200 --> 00:29:41,900
know, wash my hands off and then
incrementally, just keep 

584
00:29:41,900 --> 00:29:44,300
building things. 
And it will always like, you 

585
00:29:44,308 --> 00:29:47,300
know, survive the test of time 
and keep giving me the right

586
00:29:47,300 --> 00:29:50,900
information. 
Now, while that worked wonders for

587
00:29:50,900 --> 00:29:52,700
some of the large mature 
companies, which have a very 

588
00:29:52,700 --> 00:29:56,800
good process of curation. 
What ended up happening was the 

589
00:29:57,000 --> 00:29:59,500
smaller companies became 
smaller. 

590
00:29:59,500 --> 00:30:02,600
I would say, like nimbler 
companies companies born in the 

591
00:30:02,600 --> 00:30:05,600
digital era.
They didn't need the Enterprise 

592
00:30:05,600 --> 00:30:08,300
wide view, right? 
Because, yeah, for them, HR

593
00:30:08,300 --> 00:30:10,100
was a completely different
function from the Top Line 

594
00:30:10,100 --> 00:30:12,300
function. 
So it became more important for 

595
00:30:12,300 --> 00:30:14,700
them to build this for the Top 
Line function, like the business

596
00:30:14,700 --> 00:30:17,100
in the marketing function. 
And that was a lot more privacy 

597
00:30:17,100 --> 00:30:18,600
and fight. 
Because if we have We happen, 

598
00:30:18,600 --> 00:30:21,000
not move. 
Whereas HR often could wait,

599
00:30:21,000 --> 00:30:22,300
right? 
They could still operate out of 

600
00:30:22,300 --> 00:30:22,800
Excel
sheets.

601
00:30:22,800 --> 00:30:25,300
Not a problem. 
Yeah, so then you start to think

602
00:30:25,300 --> 00:30:28,300
about: can I create curated
zones like this,

603
00:30:28,600 --> 00:30:32,400
specific to a business specific,
to a function and that's where 

604
00:30:32,400 --> 00:30:34,400
this concept of a data
mart came into play.

605
00:30:34,500 --> 00:30:36,600
Right? 
Okay, um, in the older days 

606
00:30:36,600 --> 00:30:39,700
data marts were always
existing, but in the newer age, 

607
00:30:39,900 --> 00:30:44,100
it became much easier to create 
data Marts because the needs 

608
00:30:44,100 --> 00:30:48,300
were all completely disparate. 
Like it wasn't as though there is

609
00:30:48,300 --> 00:30:50,400
one person who's responsible for
all those needs to be served 

610
00:30:50,400 --> 00:30:52,700
together. 
Each function, started to do its

611
00:30:52,700 --> 00:30:55,100
own thing because they had to
build their own agility and

612
00:30:55,100 --> 00:30:57,600
build their own speed of decision
making and things like that. So

613
00:30:57,700 --> 00:31:01,000
data marts became again a very
popular activity right now. 

614
00:31:01,500 --> 00:31:05,500
So data lake is where all the 
data goes, like, and then

615
00:31:05,500 --> 00:31:08,600
from there you put in some 
quality checks. 

616
00:31:08,600 --> 00:31:11,900
You kind of, make sure that 
like, the data is of

617
00:31:11,900 --> 00:31:14,000
the reasonable standard and 
stuff, then it goes into Data 

618
00:31:14,000 --> 00:31:15,700
Warehouse, and data
warehouse,

619
00:31:15,700 --> 00:31:20,300
from my memory, predates
big data and so on, correct

620
00:31:20,300 --> 00:31:22,300
me if I'm wrong. 
It's in itself a very old

621
00:31:22,300 --> 00:31:23,500
concept, I guess. 
So. 

622
00:31:24,300 --> 00:31:26,800
And data warehouse used to be,
like, traditionally,

623
00:31:26,800 --> 00:31:30,400
it's enterprise-wide.
But so now data marts are like

624
00:31:30,400 --> 00:31:32,900
they serve particular 
businesses, where does datum, 

625
00:31:32,900 --> 00:31:34,600
so, how is this? 
How does this connection happen?

626
00:31:34,600 --> 00:31:37,300
Is it data lake straight to data
mart, or does it flow

627
00:31:37,300 --> 00:31:40,600
through the data warehouse and 
also like when the data values? 

628
00:31:40,600 --> 00:31:44,000
Obviously like, I mean, I think 
the assumption that you build 

629
00:31:44,000 --> 00:31:47,400
it, once it serves you forever. 
It pretty much doesn't work in

630
00:31:47,408 --> 00:31:50,800
most places.
In a few places it might, but in

631
00:31:50,800 --> 00:31:52,100
most places. 
It doesn't work. 

632
00:31:52,300 --> 00:31:54,800
So, does that mean you keep 
updating the data warehouse, or 

633
00:31:55,000 --> 00:31:56,700
after a point,
You just give up on it and just 

634
00:31:56,800 --> 00:31:59,700
let it be where it is. 
And, like, construct new data

635
00:31:59,700 --> 00:32:01,600
marts and things like that.
What do you what is, what is the

636
00:32:02,200 --> 00:32:05,000
way to go about it? 
I mean, I guess it gets a bit 

637
00:32:05,000 --> 00:32:09,000
controversial right to see. 
My view is I don't think I don't

638
00:32:09,000 --> 00:32:12,100
think data warehouses make
sense anymore.

639
00:32:12,300 --> 00:32:14,400
Okay. 
Okay, except for very 

640
00:32:14,400 --> 00:32:15,800
specialized circumstances, 
right? 

641
00:32:15,800 --> 00:32:18,000
Where it's a very heavily 
heavily regulated industry. 

642
00:32:18,200 --> 00:32:22,000
The industry is extremely mature,
and therefore

643
00:32:22,100 --> 00:32:26,800
there is more merit in
centralizing the view of the

644
00:32:26,800 --> 00:32:30,800
data and then decentralizing. 
Yeah, but if you look at now, 

645
00:32:31,000 --> 00:32:36,600
the modern modern data-driven 
Enterprise, the Mantra is that 

646
00:32:36,600 --> 00:32:39,200
each function is on its
own, right? 

647
00:32:39,600 --> 00:32:42,900
They make their own decisions. 
They make their own schedules 

648
00:32:42,900 --> 00:32:46,900
and Agility work. 
Yep, and for that to happen, the

649
00:32:46,900 --> 00:32:50,200
dissemination of the data is a lot
more important than, you know,

650
00:32:50,400 --> 00:32:55,700
introducing latencies and
blockers in order to centralize

651
00:32:55,700 --> 00:32:58,200
data, right? 
So the centralization has

652
00:32:58,200 --> 00:32:59,100
already happened in the data 
lake. 

653
00:32:59,100 --> 00:33:01,200
So why create
yet another centralized zone?

654
00:33:01,500 --> 00:33:05,800
Better you just disseminate as
soon as possible and let the 

655
00:33:05,800 --> 00:33:09,800
functions themselves build their
own views of what they want to 

656
00:33:09,800 --> 00:33:12,800
do, right? 
Yeah, but they live by it and 

657
00:33:12,800 --> 00:33:15,200
the debates that means you all 
of them to happen. 

658
00:33:15,700 --> 00:33:17,800
You all of that to happen. 
So that becomes lot more. 

659
00:33:18,100 --> 00:33:21,800
Hopefully, as a mammogram so 
directly answering your 

660
00:33:21,800 --> 00:33:23,700
question. 
I don't think data warehouses make

661
00:33:23,700 --> 00:33:26,900
sense anymore, the classical way
of looking at it. 

662
00:33:28,300 --> 00:33:30,400
It makes a lot more sense to 
just take the data from the 

663
00:33:30,408 --> 00:33:32,000
data lake
and directly create these

664
00:33:32,500 --> 00:33:36,400
purpose-built data Marts as soon
as possible, but isn't the 

665
00:33:36,400 --> 00:33:38,900
downside of that they could 
create silos. 

666
00:33:38,900 --> 00:33:42,300
So for example, like for a 
company, I consulted for like 

667
00:33:42,300 --> 00:33:45,300
some 10 years back. 
There were two

668
00:33:45,300 --> 00:33:47,500
datasets which were being maintained;
one was maintained by the

669
00:33:47,508 --> 00:33:50,200
finance team,
one by the HR team. An external

670
00:33:50,200 --> 00:33:53,300
person got access to both and
merged them, and that threw up some

671
00:33:53,300 --> 00:33:56,700
insights, which they couldn't 
validate because the finance 

672
00:33:56,700 --> 00:33:59,200
team couldn't see the HR teams 
data and vice versa. 

673
00:33:59,600 --> 00:34:03,200
So does a data warehouse, I 
guess is a is one place where 

674
00:34:03,200 --> 00:34:06,100
everything is there. 
But with data marts, like, how do

675
00:34:06,100 --> 00:34:09,199
you solve this problem of 
silo-based thinking within? 

676
00:34:09,500 --> 00:34:13,199
Yeah, and I think there is two 
or three simple things you can 

677
00:34:13,199 --> 00:34:15,500
do, right? 
One, is that they could be. 

678
00:34:15,500 --> 00:34:19,100
Actually, for instance, are 
like, like a Parent determined 

679
00:34:19,100 --> 00:34:22,699
so to speak, right? 
And wherever you see that there 

680
00:34:22,699 --> 00:34:25,600
are multiple functions requiring
access to the same view of the 

681
00:34:25,600 --> 00:34:27,600
business. 
It makes sense to just create 

682
00:34:27,600 --> 00:34:30,199
one business data Mart, which 
serves that view. 

683
00:34:30,199 --> 00:34:33,199
And then each of these different
functions then takes its own

684
00:34:33,199 --> 00:34:35,400
cut.
And that makes sense.

685
00:34:35,400 --> 00:34:40,199
Obviously, right then the other 
angle here is the data warehouse

686
00:34:40,800 --> 00:34:44,000
could be still very relevant 
technology in that kind of a setup

687
00:34:44,000 --> 00:34:47,500
when there are many functions 
which want the same view, but 

688
00:34:47,699 --> 00:34:50,699
with the
crucial difference from the

689
00:34:50,707 --> 00:34:52,100
classical way of looking at it
being:

690
00:34:52,400 --> 00:34:54,699
You don't need to boil the ocean
to build that enterprise-wide 

691
00:34:54,699 --> 00:34:57,100
centralization, right? 
Yeah, you could just start with 

692
00:34:57,100 --> 00:35:00,100
something which is very simple, 
which is just given by these 

693
00:35:00,100 --> 00:35:02,600
three functional requirements, that
these three functions want

694
00:35:02,600 --> 00:35:07,000
to look at the same data and as 
the functions, get on-boarded, 

695
00:35:07,000 --> 00:35:09,800
right as more and more people 
want the data, then you start to

696
00:35:09,800 --> 00:35:12,300
see that there are patterns and 
then you start to, like, weave it

697
00:35:12,300 --> 00:35:17,100
back into this form, a layer, 
like so logically, you could 

698
00:35:17,100 --> 00:35:19,000
eventually still end up. 
Building on like an 

699
00:35:19,008 --> 00:35:22,200
enterprise-wide data warehouse, 
but I guess the, the guts of 

700
00:35:22,200 --> 00:35:23,600
what I'm trying to say is you 
don't need to start by building 

701
00:35:23,600 --> 00:35:24,900
it, right? 
Yeah. 

702
00:35:24,900 --> 00:35:26,400
Okay. 
Yeah, so, that, that completely 

703
00:35:26,400 --> 00:35:29,800
makes sense. I mean, it makes sense
to start with a couple of basic

704
00:35:29,800 --> 00:35:31,700
use cases and then 
incrementally, build it out 

705
00:35:31,700 --> 00:35:33,800
rather than spend a year
just building a data

706
00:35:33,800 --> 00:35:37,100
warehouse, which today, I don't think
any company has the time for 

707
00:35:37,100 --> 00:35:40,400
that in things like that. 
When you have to keep creating 

708
00:35:40,400 --> 00:35:42,500
data marts, and you keep
maintaining data marts.

709
00:35:42,700 --> 00:35:45,200
So how do you organizationally? 
How do you see this? 

710
00:35:45,200 --> 00:35:48,000
Like, would you see there's
going to be a sort of a central 

711
00:35:48,200 --> 00:35:52,300
Data engineering team, which 
should do this or how, how 

712
00:35:52,300 --> 00:35:55,800
should this be sort of like
structured?

713
00:35:55,800 --> 00:36:02,800
Yeah, I think the key is
again, what is being guaranteed 

714
00:36:03,700 --> 00:36:06,800
to whom, right?
That's the, that's the, that's 

715
00:36:06,800 --> 00:36:09,700
the Crux of the requirements, 
right? 

716
00:36:09,800 --> 00:36:12,600
If you want to just boil
all that down to one

717
00:36:12,600 --> 00:36:14,700
question. 
Yeah, which is that, if you're 

718
00:36:14,700 --> 00:36:20,900
guaranteeing that I,
as a creator of a data lake, I'm

719
00:36:20,900 --> 00:36:24,500
giving you the access to any and
all data produced in the 

720
00:36:24,500 --> 00:36:26,700
company. 
Like that's a very strong 

721
00:36:26,700 --> 00:36:28,300
guarantee. 
And if you're able to give that 

722
00:36:28,300 --> 00:36:33,000
guarantee, then that completely 
decouples, anybody who wants any

723
00:36:33,000 --> 00:36:36,000
data from having to go talk to 
the source system directly.

724
00:36:36,000 --> 00:36:38,500
Because the last thing you want 
is every function, talking to 

725
00:36:38,508 --> 00:36:40,100
every source and saying, give me
a copy,

726
00:36:40,100 --> 00:36:42,900
also give me a copy, right?
Because it becomes like an

727
00:36:42,900 --> 00:36:45,600
N cross M, sort of order
N squared problem.

728
00:36:45,600 --> 00:36:48,000
So and it's extremely bad, 
right? 

729
00:36:48,200 --> 00:36:51,200
Again, as a principle, in
whatever we do in technology, if

730
00:36:51,200 --> 00:36:53,700
you say it's order n-squared...
Imagine if now people are

731
00:36:53,700 --> 00:36:54,800
talking to each other like that,
right? 

732
00:36:54,800 --> 00:36:57,000
I guess it's a nightmare.
Yeah. 

733
00:36:57,000 --> 00:37:01,200
So from that standpoint, the 
creator of the data lake or the 

734
00:37:01,200 --> 00:37:03,400
owner of the data lake is 
actually playing an extremely 

735
00:37:03,400 --> 00:37:05,400
crucial role, because he gives a
guarantee:

736
00:37:05,900 --> 00:37:07,900
now you don't need to talk to the
source anymore, right?

737
00:37:07,900 --> 00:37:09,700
Yeah, just talk to me.
I'll give you whatever data you

738
00:37:09,700 --> 00:37:10,900
want,
at whatever frequency, whatever

739
00:37:10,900 --> 00:37:13,000
velocity, et cetera, et cetera, right?
Yeah. 

740
00:37:13,000 --> 00:37:14,800
So that's one
guarantee. Then,

741
00:37:14,800 --> 00:37:17,400
the next guarantee is that,
the moment you're

742
00:37:17,408 --> 00:37:19,500
putting in curation,
what you're doing is

743
00:37:19,500 --> 00:37:22,600
actually also interpreting the 
data in a certain way and 

744
00:37:23,000 --> 00:37:25,700
therefore it's not going to be 
applicable for all functions 

745
00:37:25,700 --> 00:37:27,500
because some of the functions 
might want to interpret in their

746
00:37:27,500 --> 00:37:28,600
own way, right? 
Yep. 

747
00:37:28,600 --> 00:37:34,000
So some of the best
companies I've seen they tend to

748
00:37:34,100 --> 00:37:37,500
take a decentralized approach 
when it comes to consumption. 

749
00:37:38,500 --> 00:37:41,000
Okay, they don't they don't like
the engineering team. 

750
00:37:41,000 --> 00:37:43,300
So they just say, look, my job is
to guarantee that data is

751
00:37:43,300 --> 00:37:45,800
available, and I've done that.
From now on,

752
00:37:45,800 --> 00:37:47,900
if you want to build things on
top of it, just go, okay?

753
00:37:48,600 --> 00:37:50,700
I've made these interfaces 
democratic, right?

754
00:37:50,700 --> 00:37:53,400
Anybody can consume, and
whenever you consume, I'm going 

755
00:37:53,400 --> 00:37:55,900
to charge you back, right? 
It's like a chargeback model. 

756
00:37:56,100 --> 00:37:58,100
Yeah, so I'm going to forget and
I'm going to charge it back 

757
00:37:58,100 --> 00:38:00,700
saying, okay, you consumed this
much data, but then

758
00:38:00,800 --> 00:38:01,800
whatever you want to do on top 
of it. 

759
00:38:01,800 --> 00:38:03,500
That's up to you. 
You spin up your own engineering

760
00:38:03,500 --> 00:38:06,800
team and I think that works 
better because it decouples the 

761
00:38:06,800 --> 00:38:08,600
agility of one function from
another.

762
00:38:09,100 --> 00:38:10,500
Yep. 
Yep. 

763
00:38:10,700 --> 00:38:13,800
So again, in terms of again, 
there is a thinking in terms of 

764
00:38:13,800 --> 00:38:16,200
organization. 
Like, some organizations

765
00:38:16,200 --> 00:38:18,400
have. 
I mean, I'm coming to my specific

766
00:38:18,400 --> 00:38:21,300
need, which is, like, analytics.
Like, where some

767
00:38:21,600 --> 00:38:24,800
companies have one centralized,
centralized analytics team,

768
00:38:24,800 --> 00:38:27,200
which takes care of all the 
analytics needs of the company. 

769
00:38:27,500 --> 00:38:29,600
Other companies, sort of like
mine,

770
00:38:29,600 --> 00:38:31,500
have multiple analytics
teams. There might be some

771
00:38:31,500 --> 00:38:34,800
centralized teams, but also, like,
small analytics teams. There will be a

772
00:38:34,800 --> 00:38:37,000
marketing analytics team.
There will be an HR analytics team

773
00:38:37,000 --> 00:38:41,000
and so on.
So in that sense, in analytics,

774
00:38:41,000 --> 00:38:43,800
again, it's a good debate in
analytics, but also in terms of

775
00:38:43,800 --> 00:38:47,300
engineering, like, do you think
data engineering should, again, get

776
00:38:47,300 --> 00:38:50,500
split across teams like this, in
terms of especially creating

777
00:38:50,500 --> 00:38:52,600
these data marts?
So let's assume that data lake 

778
00:38:52,600 --> 00:38:55,400
is owned centrally and guarantees
you. 

779
00:38:55,400 --> 00:38:57,300
So the order n-squared problem is
solved.

780
00:38:57,500 --> 00:39:00,500
So the data lake is a good 
source of Truth for everyone, 

781
00:39:00,900 --> 00:39:04,700
but for people to build their 
own keep this not everyone is 

782
00:39:04,700 --> 00:39:08,700
adept at writing SQL queries,
especially AI University. 

783
00:39:08,700 --> 00:39:12,700
So, how do you how do you sort 
of the war? 

784
00:39:12,700 --> 00:39:14,000
What are the trade-offs? 
Their eggs? 

785
00:39:14,000 --> 00:39:16,400
Warm? 
Yeah, yeah. 

786
00:39:16,500 --> 00:39:20,100
Yeah, I think the moment you
start to think more than one 

787
00:39:20,100 --> 00:39:22,000
function, right? 
And more than one function, 

788
00:39:22,000 --> 00:39:24,600
wanting the same curated view of
the data. 

789
00:39:26,000 --> 00:39:28,300
Yeah, then I think that's the 
right time to ask this question.

790
00:39:28,500 --> 00:39:30,000
Where should that curation 
happen? 

791
00:39:30,300 --> 00:39:31,800
Okay, where should the team
sit that's doing

792
00:39:31,800 --> 00:39:34,100
the curation?
Should the team report to that

793
00:39:34,100 --> 00:39:37,300
one function or the other
function, or should it be

794
00:39:37,300 --> 00:39:39,900
a separate team? 
Which has, which has its own 

795
00:39:39,900 --> 00:39:42,000
reporting lines. 
And that's all, that's an 

796
00:39:42,000 --> 00:39:44,100
organizational question. 
I don't think there is any right

797
00:39:44,100 --> 00:39:49,100
answer to it. 
But in my view, the data exists 

798
00:39:49,100 --> 00:39:51,700
to serve business, right?
Yeah, that's the fundamental 

799
00:39:51,700 --> 00:39:55,100
premise.
So if that's the case, then the 

800
00:39:55,200 --> 00:39:57,500
team that works for the data, 
should also be very closely 

801
00:39:57,500 --> 00:40:01,800
aligned with business, right?
Of course. So, and

802
00:40:01,800 --> 00:40:07,100
so by that extension analytics 
reporting data science and data 

803
00:40:07,100 --> 00:40:10,500
engineering should all
ideally be aligned to the

804
00:40:10,500 --> 00:40:13,000
business. 
Yeah, not be one centralized

805
00:40:13,000 --> 00:40:14,800
thing that's reporting into IT.
That's my view. 

806
00:40:15,500 --> 00:40:18,300
To be more specific, the 
engineering that's required on 

807
00:40:18,300 --> 00:40:21,000
data
ultimately comes in the form of

808
00:40:21,000 --> 00:40:24,000
requirements from the, from the 
analytics team, which in turn 

809
00:40:24,000 --> 00:40:25,500
gets the requirements from the 
business team.

810
00:40:25,700 --> 00:40:28,900
Like that's the logical flow of 
requirements. 

811
00:40:28,900 --> 00:40:31,500
So in which case the data
engineering team is actually not

812
00:40:31,500 --> 00:40:33,300
operating in silos, right? 
It's actually operating very 

813
00:40:33,300 --> 00:40:36,000
closely aligned with what the 
business eventually wants to 

814
00:40:36,000 --> 00:40:39,100
see. 
Now, is it the

815
00:40:39,100 --> 00:40:41,400
enterprise-wide business that we're
talking about, or is it just a

816
00:40:41,400 --> 00:40:42,700
function, that functional
business?

817
00:40:43,100 --> 00:40:45,400
That's the call that you have to
make, and accordingly

818
00:40:46,300 --> 00:40:48,900
you have to take a call on
whether that team, that, in

819
00:40:48,900 --> 00:40:52,200
fact, set of skill sets needs to be
centralized, or does it need to

820
00:40:52,200 --> 00:40:54,300
be aligned to the function
or not.

821
00:40:54,400 --> 00:40:58,000
And I guess the other variable 
here is cost as in dollar cost 

822
00:40:58,000 --> 00:41:00,500
right? Because if you have a lot
of data marts, then you might

823
00:41:00,500 --> 00:41:03,300
end up, like, having the same
data duplicated in multiple

824
00:41:03,300 --> 00:41:07,000
places, which in effect can push up
your costs rather than getting 

825
00:41:07,000 --> 00:41:09,400
everybody to subscribe to the 
same data warehouse. 

826
00:41:09,900 --> 00:41:10,700
Absolutely. 
Absolutely. 

827
00:41:10,700 --> 00:41:14,800
And I think there is a, it's a 
very easy way to deal with it. 

828
00:41:15,900 --> 00:41:18,500
That's a classical concept that has
existed from way back in the days

829
00:41:18,500 --> 00:41:20,200
of warehousing, which is, of
course, data governance.

830
00:41:20,700 --> 00:41:23,300
So you need to have a
governance team and the 

831
00:41:23,308 --> 00:41:27,800
governance team plays a critical
role in ensuring that, you know,

832
00:41:28,300 --> 00:41:32,300
there isn't too much abuse going
on of the data right in the form

833
00:41:32,300 --> 00:41:34,500
of even things, like, you know, 
people using it for nefarious 

834
00:41:34,500 --> 00:41:38,000
purposes or purposes beyond what
it's intended like especially 

835
00:41:38,000 --> 00:41:40,400
with things like GDPR and
privacy laws coming into play. 

836
00:41:40,600 --> 00:41:43,000
But it also plays this very
simple role, which is the fact

837
00:41:43,000 --> 00:41:46,600
of what data sets exist across
the company, right? Okay, and

838
00:41:46,600 --> 00:41:49,500
for every way in which people 
process data, is there another

839
00:41:49,500 --> 00:41:53,100
copy of it which is
resulting from that practice,

840
00:41:53,100 --> 00:41:55,500
that interpretation?
So those are also actually part 

841
00:41:55,500 --> 00:41:57,900
of the remit of the data 
governance team and that data 

842
00:41:57,900 --> 00:42:03,200
governance team should report 
into the you know, what, in the 

843
00:42:03,200 --> 00:42:05,900
classical world would look like 
a risk function, right? 

844
00:42:05,900 --> 00:42:09,100
Yeah, because ultimately you're 
basing business decisions on 

845
00:42:09,100 --> 00:42:10,400
data, and if the data is
wrong,

846
00:42:10,900 --> 00:42:13,500
it amounts to a risk, right?
So keep that data governance

847
00:42:13,500 --> 00:42:15,500
team as part of the risk
function, be it

848
00:42:15,700 --> 00:42:18,600
operational or enterprise risk,
and then they are responsible 

849
00:42:18,600 --> 00:42:21,600
for ensuring that you know, 
nothing untoward happens. 

850
00:42:21,800 --> 00:42:26,700
Or you know, people are not 
wasting time or resources or or 

851
00:42:26,700 --> 00:42:30,700
misinterpreting data. 
Okay, so, okay, got it. 

852
00:42:31,300 --> 00:42:34,800
Now, let's sort of like, I think
we will take another step back. 

853
00:42:34,800 --> 00:42:38,600
I think a while back you had
offhand mentioned that we will

854
00:42:38,600 --> 00:42:41,700
discuss this later that Hadoop 
imposed other kinds of costs. 

855
00:42:41,700 --> 00:42:44,400
So I'm taking a very big leap 
from what we were discussing 

856
00:42:44,400 --> 00:42:46,300
about how to organize, the governance
and so on. 

857
00:42:46,300 --> 00:42:50,100
So what is this other set of 
cost that Hadoop imposed and 

858
00:42:50,800 --> 00:42:53,100
Liz's? 
It's perfect. 

859
00:42:54,300 --> 00:42:56,000
So I think, like with all
technologies,

860
00:42:56,000 --> 00:42:59,500
there are pros and cons, and Hadoop
had its fair share of cons as 

861
00:42:59,500 --> 00:43:00,000
well. 
Right? 

862
00:43:00,200 --> 00:43:03,300
I think the biggest one was the 
fact that technologically it was

863
00:43:03,300 --> 00:43:05,700
such a complex beast,
okay,

864
00:43:06,300 --> 00:43:10,800
to manage, that it was not... it
didn't have the slick form

865
00:43:10,800 --> 00:43:13,800
factor of, you know, let's just 
deploy something from from the 

866
00:43:13,800 --> 00:43:15,200
internet and then we're done, 
right? 

867
00:43:15,600 --> 00:43:19,000
You had to get right
a variety of different things in

868
00:43:19,000 --> 00:43:22,100
sequence, right, oftentimes.
And if you don't do it, right, 

869
00:43:22,100 --> 00:43:23,600
then you have to go back and 
start again. 

870
00:43:24,000 --> 00:43:25,800
Now, the good part is because 
it's open source, there is 

871
00:43:25,800 --> 00:43:28,700
enough and more support that was
available, right in the form of 

872
00:43:28,700 --> 00:43:29,800
community
brethren.

873
00:43:29,800 --> 00:43:31,700
People were trying to do the
same thing.

874
00:43:31,700 --> 00:43:34,900
So that didn't really cause
the pain, right? 

875
00:43:34,900 --> 00:43:37,400
The pain was the fact that it 
was just a technologically

876
00:43:37,400 --> 00:43:40,300
complex thing to manage. 
That's one thing. 

877
00:43:40,300 --> 00:43:46,500
The second thing is that the
problem with the, you know, open

878
00:43:46,500 --> 00:43:49,400
to anything kind of access pattern
is that you're not really 

879
00:43:49,400 --> 00:43:51,500
optimized for anything either. 
Yep. 

880
00:43:52,200 --> 00:43:55,900
So what this meant is, while it's
great for discovery

881
00:43:55,900 --> 00:43:57,600
kind of use cases, data science
use cases,

882
00:43:57,600 --> 00:44:00,500
things that just need to run
without requiring to run 

883
00:44:00,500 --> 00:44:03,600
immediately, right? 
The moment you start to impose 

884
00:44:04,100 --> 00:44:06,100
business criticality,
like, you have to have the

885
00:44:06,100 --> 00:44:07,600
answer now,
the answer has to be

886
00:44:07,600 --> 00:44:09,400
correct all the time that kind 
of thing, right? 

887
00:44:09,700 --> 00:44:13,600
Then you start to run into 
problems of, you know, the

888
00:44:13,600 --> 00:44:15,400
technology not necessarily
supporting it.

889
00:44:15,700 --> 00:44:18,500
Yeah, and then you have to layer
your own Technologies on top of 

890
00:44:18,500 --> 00:44:18,900
it. 
Right? 

891
00:44:18,900 --> 00:44:23,000
So what ended up happening was 
an already complicated beast

892
00:44:23,700 --> 00:44:26,900
now started to have this whole
plethora of umbrella

893
00:44:26,900 --> 00:44:30,900
technologies that were required 
in order to serve very specific 

894
00:44:30,900 --> 00:44:33,500
access patterns. 
And guess what? 

895
00:44:33,600 --> 00:44:37,200
Because of the, you know, the
landscape being so wide,

896
00:44:37,500 --> 00:44:40,500
at any point in time, you could
have a failure

897
00:44:40,500 --> 00:44:43,500
where one thing was incompatible
with another. Yes, of course.

898
00:44:43,900 --> 00:44:48,700
Yep. Right, it became a problem of,
how do you keep this entire mass

899
00:44:48,700 --> 00:44:52,200
of software certified against 
each other as everything evolves

900
00:44:52,400 --> 00:44:53,800
in parallel, right? 
Yep. 

901
00:44:53,800 --> 00:44:56,900
So as a result,
There was a lot of fragmentation

902
00:44:57,100 --> 00:45:00,100
and that fragmentation resulted 
in enormous, confusion in the 

903
00:45:00,100 --> 00:45:02,800
minds of
enterprises, right? What

904
00:45:02,800 --> 00:45:05,200
to think about now? Like, there are
five things to do the same

905
00:45:05,200 --> 00:45:08,300
thing. 
What do I do like so 

906
00:45:08,400 --> 00:45:10,700
architecture became a problem. 
Like, how do you reason about 

907
00:45:10,700 --> 00:45:14,000
something? Because what was very easy
to reason about in the past,

908
00:45:14,000 --> 00:45:15,600
now you couldn't do it
anymore. 

909
00:45:15,600 --> 00:45:18,100
You need to have a separate 
skill set to reason about 

910
00:45:18,100 --> 00:45:20,400
things. 
And I think one of the last

911
00:45:20,400 --> 00:45:23,100
things I wanted to
point out there in the Hadoop 

912
00:45:23,100 --> 00:45:27,900
world that came in much later, 
was that as newer technologies

913
00:45:27,900 --> 00:45:30,900
became a lot more popular,
especially on the cloud, because

914
00:45:30,900 --> 00:45:34,100
of the very fact that it's
community supported, the moment

915
00:45:34,100 --> 00:45:37,400
the community went away,
yep, you know, you're stuck with

916
00:45:37,400 --> 00:45:40,200
something that is not just 
obsolete from a technology 

917
00:45:40,200 --> 00:45:41,900
standpoint. 
It truly didn't

918
00:45:42,100 --> 00:45:43,400
have any kind of support
anymore.

919
00:45:43,600 --> 00:45:47,700
Yep. Like, so what if it's open
source? It doesn't solve the

920
00:45:47,700 --> 00:45:48,200
problem,
right?

921
00:45:48,200 --> 00:45:50,900
Just peeking
inside the code doesn't mean

922
00:45:50,900 --> 00:45:52,800
your life is any easier.
Yes. 

923
00:45:52,800 --> 00:45:55,100
Oh, yeah. 
There was a fallacious amount of

924
00:45:55,100 --> 00:45:56,700
thinking
there, right? Open source means

925
00:45:56,900 --> 00:45:58,700
white-box access, and I don't need
to think,

926
00:45:58,700 --> 00:46:01,100
I don't need to worry about
being locked in. But actually,

927
00:46:01,100 --> 00:46:02,700
open source
is also a form of lock-in, right?

928
00:46:02,700 --> 00:46:05,800
So then that became a huge 
challenge for many companies and

929
00:46:05,800 --> 00:46:10,500
then, in fact, a couple of the
very popular libraries which

930
00:46:10,500 --> 00:46:13,700
actually started off Hadoop
in a big way, they were

931
00:46:13,700 --> 00:46:18,600
all just recently announced by 
the Apache Foundation as being 

932
00:46:18,600 --> 00:46:20,000
end of life now. 
Okay? 

933
00:46:20,000 --> 00:46:22,400
Okay, so what happened? 
So Hadoop had all these

934
00:46:22,400 --> 00:46:25,400
problems that you mentioned.
So I'm sure we have figured out

935
00:46:25,400 --> 00:46:28,100
a solution for that.
So what has replaced it?

936
00:46:28,100 --> 00:46:31,300
How did that transition
take place, and so on?

937
00:46:32,700 --> 00:46:35,100
Yeah, and I was kind of
mentioning this before.

938
00:46:35,100 --> 00:46:39,700
The cloud, the cloud players, right,
were really clever about it.

939
00:46:39,700 --> 00:46:42,800
They just took the best of
both worlds, right?

940
00:46:43,000 --> 00:46:46,600
So even from Hadoop, they took
all the pros of flexibility,

941
00:46:46,600 --> 00:46:48,600
agility, open access patterns and
things like that.

942
00:46:48,900 --> 00:46:53,300
And then they papered over the
con side of it.

943
00:46:53,600 --> 00:46:55,000
See one of the things about 
Cloud, right? 

944
00:46:57,900 --> 00:47:02,100
It's really driving home
the value proposition that

945
00:47:02,500 --> 00:47:04,500
administration of software,
administration

946
00:47:04,500 --> 00:47:07,600
of technology is no longer
something that any enterprise

947
00:47:07,600 --> 00:47:10,500
needs to spend money on, right?
That's actually a very key value

948
00:47:10,500 --> 00:47:12,600
proposition of cloud. 
So you do not need to

949
00:47:12,900 --> 00:47:16,800
spend IT admin time trying to
manage your machines,

950
00:47:16,800 --> 00:47:18,800
right?
Because it's just managed

951
00:47:18,800 --> 00:47:21,600
automatically on its own, using
software, using automation. Now,

952
00:47:22,300 --> 00:47:26,900
by extending that logic,
they really addressed this first

953
00:47:26,900 --> 00:47:29,100
pain point of Hadoop, which is
this technological beast.

954
00:47:29,400 --> 00:47:31,900
Now, you could argue that the
cloud,

955
00:47:32,500 --> 00:47:35,100
for some providers, the cloud is
also extremely technologically

956
00:47:35,100 --> 00:47:38,100
complex: so many technologies,
so many ways of putting things

957
00:47:38,100 --> 00:47:40,900
together. But the good thing is
cloud also provides the

958
00:47:40,900 --> 00:47:44,700
automation on top of it, right?
So you can just fire up readily

959
00:47:44,700 --> 00:47:47,300
available automation scripts
and then just spin up this

960
00:47:47,500 --> 00:47:50,400
massive technology and spin it
down as well, keeping control

961
00:47:50,400 --> 00:47:52,100
on cost, keeping control on the
complexity.

962
00:47:52,200 --> 00:47:56,300
So they extended that paradigm
for Hadoop, and that solved that

963
00:47:56,300 --> 00:47:59,100
particular problem of, you know,
how do I get this entire thing 

964
00:47:59,100 --> 00:48:01,900
up and running? 
So now, in cloud, right,

965
00:48:01,900 --> 00:48:03,900
it's as simple as just
provisioning using a button

966
00:48:03,900 --> 00:48:06,500
click, of course.
Yeah, you get all that. 

967
00:48:06,500 --> 00:48:09,400
Pick it up - strike. 
So that's a big thing. 

968
00:48:10,100 --> 00:48:14,000
The second thing is the
fact that Hadoop had this problem

969
00:48:14,000 --> 00:48:16,600
of not being optimized for any
specific access pattern.

970
00:48:16,700 --> 00:48:21,000
I mean, the way cloud solved it
is by its fundamental design.

971
00:48:21,000 --> 00:48:22,800
The cloud's
fundamental design

972
00:48:22,800 --> 00:48:25,600
is that
everything is decoupled, and I'm

973
00:48:25,600 --> 00:48:28,800
going to do something that's 
very specialized for me by 

974
00:48:28,800 --> 00:48:31,200
myself. 
And for everything else, I defer

975
00:48:31,200 --> 00:48:32,200
to somebody else who's better
than me,

976
00:48:32,300 --> 00:48:34,500
right?
So it's also called...

977
00:48:35,100 --> 00:48:37,200
It's also called microservices,
or service-oriented

978
00:48:37,200 --> 00:48:38,400
architecture.
Yep.

979
00:48:38,400 --> 00:48:39,800
Cloud is fundamentally
built on that.

980
00:48:39,800 --> 00:48:42,000
Right? 
So everything is a service and 

981
00:48:42,000 --> 00:48:44,600
if one service needs to do
something which is not its core

982
00:48:44,600 --> 00:48:47,300
capability, it will outsource it
to some other service, or

983
00:48:47,700 --> 00:48:49,300
delegate it to some other service
to get it done.

984
00:48:49,500 --> 00:48:51,900
And then once that service gets 
it done, it brings it back in. 

985
00:48:52,100 --> 00:48:55,500
So, as an example, if I were to
now construct a big data

986
00:48:55,500 --> 00:48:59,000
pipeline in the cloud, storage is
a dedicated service.

987
00:48:59,200 --> 00:49:03,600
Yes. Like, being able to run a
data science model is a

988
00:49:03,600 --> 00:49:05,600
dedicated service, like
SageMaker.

989
00:49:05,600 --> 00:49:09,000
That's an example, right? Now,
once the model is run,

990
00:49:09,000 --> 00:49:11,900
I need the output to be 
visualized as a, as a, as a 

991
00:49:11,900 --> 00:49:13,800
pretty chart. 
That's a separate service 

992
00:49:13,800 --> 00:49:17,100
altogether, right? 
And the way that all these 

993
00:49:17,100 --> 00:49:19,000
things need to talk to each 
other, the orchestration that's 

994
00:49:19,000 --> 00:49:22,300
a separate service and for all 
the things to work correctly, 

995
00:49:22,300 --> 00:49:24,600
like in terms of like failures 
and things like that. 

996
00:49:24,600 --> 00:49:26,800
and alerting, logging, that's a
separate service, right? 

997
00:49:26,800 --> 00:49:29,400
Yep. 
So in putting it all together, 

998
00:49:29,400 --> 00:49:31,100
there is automation. 
But automation will say, okay, 

999
00:49:31,100 --> 00:49:32,200
all these things need to talk to
each other,

1000
00:49:32,300 --> 00:49:35,500
and if I need to do
something very specialized,

1001
00:49:35,500 --> 00:49:38,300
I just need to spin up a service
and add it to my automation.

1002
00:49:38,300 --> 00:49:42,400
I'm done. 
Like, so the access to optimized

1003
00:49:42,400 --> 00:49:45,200
ways of looking at data in the
cloud is, as simple as just 

1004
00:49:45,200 --> 00:49:47,900
configuration. 
Yep, and I don't need to deal 

1005
00:49:47,900 --> 00:49:50,100
with this plethora of open
source software, which I don't

1006
00:49:50,100 --> 00:49:52,200
know who's going to support, who's
not going to support, compatibility,

1007
00:49:52,200 --> 00:49:53,000
all of
that is

1008
00:49:53,000 --> 00:49:56,400
now taken care of, right?
So cloud solved for that in a way

1009
00:49:56,400 --> 00:49:59,500
that is fundamental to the way 
the cloud is designed. 

1010
00:50:00,100 --> 00:50:02,200
And by cloud, I guess you mean
companies like Amazon

1011
00:50:02,300 --> 00:50:05,700
or Microsoft, which
provide these big hosted

1012
00:50:05,707 --> 00:50:08,100
clouds.
So let's say AWS manages this

1013
00:50:08,100 --> 00:50:10,500
thing, they provide
SageMaker, they provide EC2,

1014
00:50:10,500 --> 00:50:13,000
S3, all those things, and they
provide you the connections

1015
00:50:13,000 --> 00:50:15,200
between them. 
You can spin up whatever you 

1016
00:50:15,200 --> 00:50:18,100
want at any point in time and 
like they manage the whole 

1017
00:50:18,100 --> 00:50:20,400
thing. 
So so that's so that's the value

1018
00:50:20,400 --> 00:50:22,500
that each. 
Okay people. 

1019
00:50:22,500 --> 00:50:24,800
I think we're now tying back all
our discussion, right? 

1020
00:50:24,800 --> 00:50:27,000
Like, in terms of how
you organize the data,

1021
00:50:27,000 --> 00:50:30,800
what big data is,
like, what are the pros and

1022
00:50:30,800 --> 00:50:33,500
cons of Hadoop,
how the technology has evolved,

1023
00:50:33,500 --> 00:50:34,500
and so on. Now,
suppose

1024
00:50:34,500 --> 00:50:37,300
We are like, let's say there's 
this new startup. 

1025
00:50:37,800 --> 00:50:40,700
Okay, which currently doesn't 
have too much data, but like, 

1026
00:50:40,700 --> 00:50:42,900
you know, that you're going to 
be collecting tons of data and 

1027
00:50:42,900 --> 00:50:44,500
things. 
As a, how do you kind of go 

1028
00:50:44,500 --> 00:50:50,100
about architecting your entire 
the data team for the lack of a 

1029
00:50:50,107 --> 00:50:53,400
better phrase in terms of like 
so that it's geared up for 

1030
00:50:53,400 --> 00:50:56,200
growth, but also, like, which
includes things like how you

1031
00:50:56,200 --> 00:50:58,500
store your data, how you kind of
organize your databases and all

1032
00:50:58,508 --> 00:51:00,700
those things.
Like, how should a

1033
00:51:00,700 --> 00:51:03,800
company
that's starting off now look

1034
00:51:03,800 --> 00:51:09,300
at it. 
Yeah, I think the first

1035
00:51:09,300 --> 00:51:14,000
choice would be, I'd say,
just to decide cloud or

1036
00:51:14,008 --> 00:51:15,000
not, right? 
Yep. 

1037
00:51:15,000 --> 00:51:18,100
I think that's the first 
question to answer in my view. 

1038
00:51:18,200 --> 00:51:22,000
The cloud is a no-brainer. 
I don't think any startup would be

1039
00:51:22,300 --> 00:51:24,900
wise to not consider the cloud,
given the innovations that are

1040
00:51:24,900 --> 00:51:26,700
happening there. Of course.
Well, let's assume the answer to

1041
00:51:26,707 --> 00:51:27,900
that is yes, right?
Yes, cloud.

1042
00:51:27,900 --> 00:51:28,800
Yes,
cloud is a must.

1043
00:51:29,300 --> 00:51:32,800
Now, if cloud is a must, the good
part is the architectural

1044
00:51:32,800 --> 00:51:37,100
patterns that that are 
applicable for constructing a 

1045
00:51:37,100 --> 00:51:39,900
data estate on the cloud, right?
whether you call it a data lake, a

1046
00:51:39,900 --> 00:51:42,200
data platform, whatever it is.
Yeah, but the architectural

1047
00:51:42,200 --> 00:51:45,300
patterns are, are in such a
way that it doesn't matter

1048
00:51:45,300 --> 00:51:48,500
whether you're small or large.
Okay, you can start with

1049
00:51:48,500 --> 00:51:51,200
the same architectural pattern. 
And as you grow as a company, 

1050
00:51:51,800 --> 00:51:55,200
the execution of the
architecture can just

1051
00:51:55,200 --> 00:51:58,000
seamlessly grow along with you 
like without having to change 

1052
00:51:58,000 --> 00:52:00,900
anything significantly. 
Yeah, and that, that's the 

1053
00:52:00,900 --> 00:52:03,500
biggest benefit of doing
the cloud, because each of these

1054
00:52:03,500 --> 00:52:07,300
components are individually, 
elastically scalable, right? 

1055
00:52:08,400 --> 00:52:12,300
So as an example, as a new
company, you're collecting 

1056
00:52:12,400 --> 00:52:15,700
starting to collect data. 
In fact, I would say data, is 

1057
00:52:15,700 --> 00:52:18,100
one of the critical moats, right,
for most startups.

1058
00:52:18,300 --> 00:52:21,600
Yep, because you need to build
that moat over a period of time.

1059
00:52:21,900 --> 00:52:23,900
And the best way to build that
moat is actually around the data

1060
00:52:23,900 --> 00:52:24,900
lake, right?
Yes.

1061
00:52:24,900 --> 00:52:27,700
Okay, have a data lake,
just collect everything. 

1062
00:52:27,700 --> 00:52:30,000
It's dirt cheap.
It doesn't matter what you put

1063
00:52:30,000 --> 00:52:33,100
in. Obviously, the more curation
you can do, the better it is for

1064
00:52:33,100 --> 00:52:35,600
you in the long run, and the
simplest form of curation

1065
00:52:35,600 --> 00:52:38,400
that you can do is just track
where the data is coming from,

1066
00:52:38,400 --> 00:52:40,600
right?
You get what is called, you

1067
00:52:40,600 --> 00:52:43,800
know, traceability. You want to be
able to trace back like five 

1068
00:52:43,800 --> 00:52:46,800
years from now that this data 
was actually collected by that 

1069
00:52:46,800 --> 00:52:49,300
version of the software that I 
deployed on that particular 

1070
00:52:49,300 --> 00:52:52,000
machine. 
Yep, if you're just able to reason

1071
00:52:52,000 --> 00:52:53,500
about that, that's more than 
enough, right? 

1072
00:52:54,200 --> 00:52:57,000
So the data lake is a very good
foundational pattern. You

1073
00:52:57,008 --> 00:53:00,900
need that, and then don't spin up
technologies until

1074
00:53:00,900 --> 00:53:04,400
and unless you need them, right?
Yeah, I think that's the mantra,

1075
00:53:04,700 --> 00:53:06,800
and that mantra holds true
for technology, for data as well.

1076
00:53:06,800 --> 00:53:08,400
Right?
You don't need to,

1077
00:53:08,400 --> 00:53:11,900
sorry, over-engineer with a
warehouse or data mart and

1078
00:53:11,900 --> 00:53:14,400
things like that, maybe to start
off with you just have a very 

1079
00:53:14,400 --> 00:53:18,400
simple open source database like
Postgres or MySQL, and that will

1080
00:53:18,400 --> 00:53:20,600
get you out the door and keep
you running for two years and

1081
00:53:20,600 --> 00:53:22,400
three years without incurring
much cost at all.

1082
00:53:23,200 --> 00:53:25,500
Yep. 
And then the more you the more 

1083
00:53:25,500 --> 00:53:27,900
you play with it. 
Then you realize that the same 

1084
00:53:28,400 --> 00:53:32,100
database serving the need for 
both your source application as 

1085
00:53:32,200 --> 00:53:34,300
well as the need for analytics
is kind of creating a

1086
00:53:34,308 --> 00:53:36,600
bottleneck.
So at that point you start to

1087
00:53:36,600 --> 00:53:40,400
say, okay, maybe I'll spin up 
something else using my data 

1088
00:53:40,400 --> 00:53:43,900
Lake as the source, not my post 
as a service anymore as a perm, 

1089
00:53:44,100 --> 00:53:47,100
like a brand-new, like, on 
Amazon could be like redshift as

1090
00:53:47,100 --> 00:53:47,700
an example. 
Right? Now, 

1091
00:53:47,800 --> 00:53:51,000
let's plug in my BI 
tool on top of it. 

1092
00:53:51,000 --> 00:53:52,500
I start to do analytics. 
The moment 

1093
00:53:52,500 --> 00:53:56,000
I do this, I have decoupled 
the source application from my 

1094
00:53:56,000 --> 00:53:58,000
analytics. 
And then, as you start to build 

1095
00:53:58,000 --> 00:54:01,500
up on top of it. Now, see, the other 
thing to keep in mind is that 

1096
00:54:03,200 --> 00:54:06,800
most data-driven businesses 
ultimately want to take the 

1097
00:54:06,800 --> 00:54:09,400
output of analytics and tie it 
back to the source application. 

1098
00:54:09,500 --> 00:54:13,900
Like, as an example, if I 
want to influence a customer 

1099
00:54:13,900 --> 00:54:16,700
who's actually using my 
application right now, 

1100
00:54:17,300 --> 00:54:20,000
obviously, I'm collecting all 
the data but then the analytics 

1101
00:54:20,000 --> 00:54:22,100
has to feed back as a 
recommendation, as an example, 

1102
00:54:22,100 --> 00:54:24,600
right? 
So that ability to close the 

1103
00:54:24,600 --> 00:54:27,500
loop back to the source 
application is, again, a 

1104
00:54:27,508 --> 00:54:30,100
critical capability and that's 
something that you want to do 

1105
00:54:30,100 --> 00:54:33,500
from day one, because at first 
you might not actually have any 

1106
00:54:33,500 --> 00:54:35,700
intelligence; it could just be 
rule-based, right? 

1107
00:54:36,200 --> 00:54:38,700
Yes. 
Again, as you mature, as you 

1108
00:54:38,707 --> 00:54:40,800
change these data pipelines and 
that becomes a lot more 

1109
00:54:40,800 --> 00:54:44,300
sophisticated, still, the looping 
mechanism allows you the 

1110
00:54:44,300 --> 00:54:46,600
feedback from day one. 
So that, you know, that 

1111
00:54:46,600 --> 00:54:48,500
feedback is also something we 
can keep learning from. 

1112
00:55:09,800 --> 00:55:11,800
Thank you for listening to data 
shatter. 

1113
00:55:12,400 --> 00:55:15,900
If you like this show, please 
leave a comment, share and 

1114
00:55:15,900 --> 00:55:19,200
subscribe to the podcast. 
You can find this podcast on 

1115
00:55:19,200 --> 00:55:23,100
Apple Podcasts, Spotify, or 
wherever else you go to get your

1116
00:55:23,100 --> 00:55:26,400
podcasts. 
Once again, the staff exciting 

1117
00:55:26,400 --> 00:55:27,600
one. 
Thank you.

