1
00:00:00,320 --> 00:00:04,400
The Better Business Analysis 
Institute presents the Better 

2
00:00:04,400 --> 00:00:14,360
Business Analysis podcast with 
Hi everybody. 

3
00:00:14,400 --> 00:00:17,280
It's Ben Walsh from the Better 
Business Analysis Institute. 

4
00:00:17,320 --> 00:00:21,640
And today we are going to be 
diving deep into data or data 

5
00:00:21,640 --> 00:00:26,690
analysis. 
The, I guess, the area or the

6
00:00:26,690 --> 00:00:29,450
capability or process we're 
talking about is data 

7
00:00:29,450 --> 00:00:31,570
management. 
So if you're a BA who likes to 

8
00:00:31,570 --> 00:00:36,850
think about things in terms of 
groups of processes, this is the

9
00:00:37,210 --> 00:00:39,570
whole area that talks about data
management. 

10
00:00:41,290 --> 00:00:46,650
Now as we've already noted in 
our Generative AI podcast and 

11
00:00:46,690 --> 00:00:49,130
earlier on when we talked about 
different disciplines that have 

12
00:00:49,130 --> 00:00:54,250
spun off from business analysis,
data is really important.

13
00:00:54,290 --> 00:01:01,130
So data science, data analysis 
in the true sense, I guess 

14
00:01:01,210 --> 00:01:04,650
anything to do with machine 
learning and big data are 

15
00:01:04,650 --> 00:01:07,610
probably words that you have 
heard. 

16
00:01:07,970 --> 00:01:10,170
And if you haven't heard of 
those words, you may have heard 

17
00:01:10,170 --> 00:01:13,880
of the tools that use
that new kind of data

18
00:01:13,880 --> 00:01:19,240
architecture, such as ChatGPT,
for example, or actually

19
00:01:19,320 --> 00:01:24,840
anything these days that
surfaces from the likes of

20
00:01:24,840 --> 00:01:29,160
Google or Microsoft or Amazon
is using these technologies.

21
00:01:29,160 --> 00:01:33,120
So they are not new, they've 
been around for a while, but 

22
00:01:33,120 --> 00:01:37,840
we're now at a point where even
smaller businesses, government 

23
00:01:37,840 --> 00:01:40,320
departments are moving towards 
this. 

24
00:01:40,980 --> 00:01:43,620
Cloud is usually the word that's
in the front. 

25
00:01:43,620 --> 00:01:46,420
It doesn't necessarily have to 
be cloud, but cloud data 

26
00:01:46,420 --> 00:01:50,860
architecture, which means that 
the way we did things on premise

27
00:01:50,860 --> 00:01:54,900
or you know with database 
servers back in the day, it's 

28
00:01:54,900 --> 00:01:56,580
starting to change for all of 
us. 

29
00:01:56,580 --> 00:02:00,940
So it's important for us as BAs
to really understand what that 

30
00:02:00,940 --> 00:02:03,540
looks like. 
So we're going to take a bit of 

31
00:02:03,540 --> 00:02:06,900
a holistic view; we're not going
to go

32
00:02:07,180 --> 00:02:08,820
too deep.
And I'm going to actually call 

33
00:02:08,820 --> 00:02:13,980
out where I think the line is 
here for BAs versus those in the

34
00:02:13,980 --> 00:02:17,820
data specialized roles. 
So, you know, you know how

35
00:02:17,820 --> 00:02:22,500
much information you do need to 
know and then if it sparks your 

36
00:02:22,500 --> 00:02:25,980
interest to know more, then you
can of course go deeper into the

37
00:02:25,980 --> 00:02:29,100
technology side and you know, 
understand what each component 

38
00:02:29,100 --> 00:02:32,340
does. 
So yes, this episode is around 

39
00:02:32,340 --> 00:02:36,900
data management and let's get 
started. 

40
00:02:38,940 --> 00:02:41,500
Okay. 
So I guess there are three ways 

41
00:02:41,500 --> 00:02:43,580
in which we could look at this 
topic. 

42
00:02:43,580 --> 00:02:45,380
We could look at it from the 
bottom up. 

43
00:02:46,580 --> 00:02:50,220
One way of talking about
that would be the architecture 

44
00:02:50,660 --> 00:02:53,420
or the data model. 
That's one way we could look at 

45
00:02:53,420 --> 00:02:58,980
it or we could look at it from 
the kind of how it

46
00:03:00,330 --> 00:03:04,850
is exposed to our users in terms
of our processes and functions, 

47
00:03:04,850 --> 00:03:07,170
in terms of what we call 
capabilities. 

48
00:03:07,890 --> 00:03:11,610
Or we could talk about it 
holistically at the top from the

49
00:03:11,610 --> 00:03:13,690
data governance going down to 
the framework. 

50
00:03:14,370 --> 00:03:18,370
Now as BAs, it's always best to
start from the top down. 

51
00:03:19,050 --> 00:03:23,210
Yes, bottom up or playing chess 
and working out what's behind 

52
00:03:23,210 --> 00:03:26,610
the curtain is useful
sometimes, so we can test our 

53
00:03:26,610 --> 00:03:28,750
assumptions. 
But at the end of the day, 

54
00:03:28,910 --> 00:03:32,230
especially in this topic, if you
haven't started from the top 

55
00:03:32,230 --> 00:03:34,750
down, you won't get the results 
that you need. 

56
00:03:35,510 --> 00:03:37,750
This is very, very similar to 
projects. 

57
00:03:37,950 --> 00:03:41,310
When objectives or problems 
aren't defined, you know it 

58
00:03:41,310 --> 00:03:45,190
becomes a mess, or you know the 
projects become less and less 

59
00:03:45,190 --> 00:03:49,750
likely to be delivered if you 
haven't got that upfront kind

60
00:03:49,750 --> 00:03:53,760
of management,
governance and framework in

61
00:03:53,760 --> 00:03:57,560
place before you start. 
So that is true of BA work in 

62
00:03:57,560 --> 00:04:02,720
terms of the BA plan, but it's 
also true of the data side. 

63
00:04:03,920 --> 00:04:08,760
So without going into boring 
detail around meetings and data 

64
00:04:08,760 --> 00:04:11,840
governance boards and all the 
rest of it, which is part of 

65
00:04:11,840 --> 00:04:18,160
this, let's talk about the 
fundamental elements of data 

66
00:04:18,160 --> 00:04:23,090
management. 
Data management does incorporate

67
00:04:23,090 --> 00:04:26,170
what I've just said, the data 
governance layer. Let's

68
00:04:26,170 --> 00:04:29,850
not go into that box, it's not 
very, I don't know, interesting 

69
00:04:29,850 --> 00:04:31,890
to talk about. 
But you do need to have

70
00:04:31,890 --> 00:04:38,610
a governance structure and
processes. Then, on the kind of

71
00:04:40,090 --> 00:04:43,930
functional side or if you like 
the operational side, there are 

72
00:04:43,930 --> 00:04:48,290
two main parts of data 
management. 

73
00:04:51,780 --> 00:04:55,820
And you could say that they are 
defined in a data management 

74
00:04:55,940 --> 00:05:01,540
framework. 
The two parts are the data 

75
00:05:01,540 --> 00:05:05,740
management lifecycle. 
So the lifecycle that data

76
00:05:05,740 --> 00:05:09,780
projects go through or a piece 
of data goes through, they're 

77
00:05:09,780 --> 00:05:14,940
one and the same and they are a 
structured way, a lifecycle if 

78
00:05:14,980 --> 00:05:19,810
you like, of
defining the steps that your

79
00:05:19,810 --> 00:05:23,970
data project should go through 
and you know, stage gate and be 

80
00:05:23,970 --> 00:05:27,490
thought about and also a piece 
of data goes through this 

81
00:05:27,930 --> 00:05:34,530
process and we're going to refer
to some really useful predefined

82
00:05:34,570 --> 00:05:39,810
lifecycle, data management
lifecycle terminology and

83
00:05:39,810 --> 00:05:43,250
journey and steps, referencing
data.govt.nz,

84
00:05:43,820 --> 00:05:48,940
which is the New
Zealand government's data

85
00:05:49,140 --> 00:05:52,660
experts.
It is one way of defining this 

86
00:05:52,660 --> 00:05:55,100
journey. 
You may have come across others.

87
00:05:55,100 --> 00:05:59,420
I think there are a lot
of data bodies of knowledge 

88
00:05:59,420 --> 00:06:02,140
which define what this life 
cycle is. 

89
00:06:02,550 --> 00:06:05,670
I'm going to refer to this one 
because I quite like it and it's

90
00:06:05,670 --> 00:06:08,630
relevant here in New Zealand. 
So if you don't have one and 

91
00:06:08,630 --> 00:06:11,750
especially if you live in New 
Zealand, I would suggest just 

92
00:06:11,790 --> 00:06:14,990
adopting this life cycle. 
So that's a big one

93
00:06:14,990 --> 00:06:17,950
straight away and I'll define 
the steps and what they mean in

94
00:06:17,950 --> 00:06:20,470
a minute. 
So we've got our life cycle. 

95
00:06:20,470 --> 00:06:24,310
So these are the steps
that data goes through and your

96
00:06:24,510 --> 00:06:28,790
projects that have data in them
should go through.

97
00:06:29,610 --> 00:06:31,170
So if you like, that's the 
process. 

98
00:06:31,610 --> 00:06:35,250
Then in order to support you 
doing that and in order to 

99
00:06:35,250 --> 00:06:39,330
support data management, there 
are capabilities. 

100
00:06:39,570 --> 00:06:43,610
There are different skills, 
human capabilities, but also

101
00:06:43,610 --> 00:06:49,610
technical capabilities that need
to be in place in order for you 

102
00:06:49,610 --> 00:06:53,290
to have true data management. 
Now those capabilities would 

103
00:06:53,290 --> 00:06:57,210
have been quite different 20 
years ago or 40 years ago. 

104
00:06:57,610 --> 00:07:01,570
So when we talk about these 
technical capabilities, these 

105
00:07:01,570 --> 00:07:06,570
data management capabilities, 
they are the, I guess, the 

106
00:07:07,650 --> 00:07:09,490
architecture, the data 
architecture

107
00:07:10,210 --> 00:07:15,610
that exists in 2023. So if you
have been involved in, you know,

108
00:07:15,610 --> 00:07:18,690
you've been a DBA or you've 
worked with databases, this 

109
00:07:18,690 --> 00:07:22,730
might be an evolution of what 
you've already known. 

110
00:07:23,130 --> 00:07:27,370
And some of these words will not
be dissimilar to the standard 

111
00:07:27,370 --> 00:07:31,250
pattern for managing data. But
below the capability,

112
00:07:31,490 --> 00:07:36,330
when we actually look, you know,
into the engine, the technical 

113
00:07:36,330 --> 00:07:39,770
architecture, those technologies
have definitely changed. 

114
00:07:41,550 --> 00:07:45,550
So we've got our lifecycle and 
we've got our data capabilities 

115
00:07:45,550 --> 00:07:48,630
and that's what we're gonna be 
focusing on today in this 

116
00:07:48,630 --> 00:07:52,110
podcast. 
As I mentioned earlier, above 

117
00:07:52,110 --> 00:07:55,590
all that is data governance. 
So that's the more management 

118
00:07:55,590 --> 00:08:02,590
side of data in terms of 
committees and organization. 

119
00:08:03,080 --> 00:08:06,080
And in terms of just making sure
that you've got that governance 

120
00:08:06,080 --> 00:08:10,560
across the top and then below 
all of this, below the life 

121
00:08:10,560 --> 00:08:14,800
cycle and the capabilities, you 
should just have some fundamental

122
00:08:14,800 --> 00:08:18,120
data management, and that
is just best practice, things 

123
00:08:18,120 --> 00:08:21,880
like data security 
classifications, just very basic

124
00:08:22,520 --> 00:08:25,500
ways in which people
manage data.

125
00:08:25,500 --> 00:08:29,620
So they're not specific to the 
capabilities necessarily or the 

126
00:08:29,620 --> 00:08:32,460
steps you go through. 
They go across all of those and 

127
00:08:32,460 --> 00:08:37,100
they're just basic ways in which
we deal with pieces of data.

128
00:08:37,580 --> 00:08:41,100
Okay. 
So you've got the foundational, 

129
00:08:41,340 --> 00:08:43,380
if you like, fundamental 
capabilities down at the bottom.

130
00:08:43,740 --> 00:08:46,060
You've got data governance 
across the top. 

131
00:08:46,420 --> 00:08:48,700
But what we're talking about is 
the middle, where the

132
00:08:48,700 --> 00:08:51,860
rubber hits the road, which is 
the life cycle, which data goes 

133
00:08:51,860 --> 00:08:54,420
through and which your projects 
should go through when they deal

134
00:08:54,420 --> 00:08:58,260
with data and capabilities. 
Now capabilities can be talked 

135
00:08:58,260 --> 00:09:01,260
about from a BA
perspective and that's what 

136
00:09:01,260 --> 00:09:03,620
we're going to touch on. 
And then below those 

137
00:09:03,620 --> 00:09:05,700
capabilities, you have the 
architecture. 

138
00:09:06,420 --> 00:09:10,220
So we use capabilities as a way 
to talk about both the human and

139
00:09:10,220 --> 00:09:12,020
technology capabilities that are
needed. 

140
00:09:13,120 --> 00:09:16,240
And below that, that's the 
connection back to the 

141
00:09:16,240 --> 00:09:18,600
architecture. 
Back to the solution, the how.

142
00:09:19,760 --> 00:09:26,280
OK cool, so let's get started on
this data lifecycle now, the 

143
00:09:26,280 --> 00:09:30,320
data management lifecycle. 
As I said, I am going to refer 

144
00:09:30,320 --> 00:09:38,120
to data.govt.nz for those who
want to reference and see what 

145
00:09:38,120 --> 00:09:41,570
I'm talking about here. 
And I'll put these on a

146
00:09:41,570 --> 00:09:44,690
blog on our website, the
Better Business Analysis

147
00:09:44,690 --> 00:09:47,570
Institute website, as well, so
you can reference it. 

148
00:09:50,010 --> 00:09:53,650
Now when we go through this 
lifecycle, it's made up of seven

149
00:09:53,650 --> 00:09:56,530
steps, so not too many for you 
to understand. 

150
00:09:56,690 --> 00:09:58,890
Usually somewhere between
five and eight

151
00:09:58,890 --> 00:10:03,490
things is generally a good
number of steps, which is why I

152
00:10:03,490 --> 00:10:08,250
like this particular journey 
map, if you like. 

153
00:10:08,610 --> 00:10:14,730
The first step is plan. 
So this is less technical and 

154
00:10:14,730 --> 00:10:17,890
more around, you know the 
processes and resources

155
00:10:18,250 --> 00:10:20,770
that you need to manage the
data. 

156
00:10:21,570 --> 00:10:25,810
It's around the project goals 
for this data and just really 

157
00:10:25,810 --> 00:10:29,170
making it really clear, you 
know, we can use something 

158
00:10:29,170 --> 00:10:31,450
called the data management plan 
if we really like. 

159
00:10:31,610 --> 00:10:33,690
It's where you create, you know 
how you're going to handle this 

160
00:10:33,690 --> 00:10:36,890
data, what's the plan for this 
piece of data or pieces of data.

161
00:10:36,890 --> 00:10:38,530
So we're not just talking about 
the name

162
00:10:38,930 --> 00:10:41,530
on a customer; we're talking
about customer data or we're 

163
00:10:41,530 --> 00:10:46,770
talking about moving a whole 
server that has a whole lot of 

164
00:10:46,770 --> 00:10:49,930
data sets to a new technology in
the cloud. 

165
00:10:49,930 --> 00:10:51,810
So how are we going to handle 
that? 

166
00:10:51,810 --> 00:10:54,570
What is the plan? 
So that should be, you know, you

167
00:10:54,570 --> 00:10:56,610
should work with your project 
manager on that and that should 

168
00:10:56,610 --> 00:10:58,810
be very clear. 
But it's specific around the 

169
00:10:58,810 --> 00:11:01,970
data side of things and it might
be how you're going to migrate 

170
00:11:01,970 --> 00:11:03,890
data. 
You know, what is the plan 

171
00:11:04,010 --> 00:11:05,890
before you even start doing 
anything. 

172
00:11:06,250 --> 00:11:08,530
So it doesn't necessarily 
involve any

173
00:11:09,850 --> 00:11:11,970
fundamental data management
capabilities. 

174
00:11:11,970 --> 00:11:15,810
At this stage it could just be 
pen and paper or a Google doc or

175
00:11:15,810 --> 00:11:18,690
you know, a Word document or a 
PowerPoint presentation, but 

176
00:11:18,690 --> 00:11:20,610
you're really planning out what 
you need to do. 

177
00:11:20,970 --> 00:11:23,770
So you know, you would start 
with writing plan and then you 

178
00:11:23,770 --> 00:11:25,890
would list out what are you 
going to do with that. 

179
00:11:26,210 --> 00:11:28,810
Now it's really important that 
your stakeholders, your product 

180
00:11:28,810 --> 00:11:31,410
owners, your sponsors understand
what the plan is. 

181
00:11:31,410 --> 00:11:32,890
So it should be written in their
language. 

182
00:11:33,090 --> 00:11:37,170
A bit like a spec, but just the 
same as an agile BA plan that we

183
00:11:37,170 --> 00:11:38,530
talk about at The Better 
Business

184
00:11:38,650 --> 00:11:41,410
Analysis Institute.
This is a plan for your data, 

185
00:11:42,650 --> 00:11:43,290
right? 
Cool. 

186
00:11:43,890 --> 00:11:46,650
So you've got your plan and now 
you're going to execute on your 

187
00:11:46,650 --> 00:11:49,130
plan. 
So the second step after plan is

188
00:11:49,130 --> 00:11:53,730
collect. 
So this is where data is 

189
00:11:53,730 --> 00:11:57,450
gathered or generated. 
And this is the information 

190
00:11:57,450 --> 00:12:01,450
that's in scope of your plan. 
So you need to be able to 

191
00:12:01,450 --> 00:12:03,370
collect this information. 
And it's not. 

192
00:12:03,370 --> 00:12:05,210
We're not. 
You may start. 

193
00:12:05,580 --> 00:12:07,980
Your brain will be triggered and
you will start to think about 

194
00:12:07,980 --> 00:12:11,300
how that's going to happen. 
But at the moment just identify 

195
00:12:12,180 --> 00:12:15,180
the different datasets which 
you've got in your plan. 

196
00:12:15,500 --> 00:12:18,060
And we need to, you know, we 
need to collect information. 

197
00:12:18,060 --> 00:12:22,780
So this could be information in
other data stores within your 

198
00:12:22,780 --> 00:12:25,140
organization. 
It could be accessing or 

199
00:12:25,140 --> 00:12:28,100
consuming data from other 
locations outside of your 

200
00:12:28,100 --> 00:12:30,140
organization. 
It could be multiple data 

201
00:12:30,140 --> 00:12:33,090
sources. 
It could even be capturing new 

202
00:12:33,090 --> 00:12:37,650
data through say survey forms. 
So it's all of that and you just

203
00:12:37,650 --> 00:12:40,290
need to define, you know, what 
are your collection points, how 

204
00:12:40,290 --> 00:12:43,970
are you going to collect this 
data, what is that? 

205
00:12:44,010 --> 00:12:46,570
And literally you need to 
collect first. 

206
00:12:46,570 --> 00:12:49,570
That is the first thing.
If you don't have the data, you 

207
00:12:49,570 --> 00:12:52,210
need to go and collect the data.
So it's about gathering that. 

208
00:12:53,710 --> 00:12:56,750
And you know when we get down to
the capabilities, you know there

209
00:12:56,750 --> 00:12:59,590
are technical terms like
ingesting that data. 

210
00:12:59,790 --> 00:13:03,870
So how do you get that data into
your particular data management 

211
00:13:03,870 --> 00:13:06,470
platform. 
So at this stage you're 

212
00:13:06,470 --> 00:13:09,990
collecting it, where it goes is 
important, but we're not talking

213
00:13:09,990 --> 00:13:14,510
about the how right now. 
So we've got a plan, we

214
00:13:14,510 --> 00:13:18,950
then collect. Now, as the data
comes in from multiple sources,

215
00:13:19,670 --> 00:13:22,910
we need to describe the data.
We need to have some agreement 

216
00:13:22,910 --> 00:13:26,710
around the metadata standards in
our organization and we need to 

217
00:13:26,710 --> 00:13:28,790
describe the pieces of data that
we've got in. 

218
00:13:29,110 --> 00:13:32,990
So, you know, you may, in your
plan come up with a data model. 

219
00:13:32,990 --> 00:13:34,950
So we're saying okay, we're 
going to collect information 

220
00:13:34,950 --> 00:13:37,230
around our customers. 
We're a store. 

221
00:13:37,230 --> 00:13:39,350
We're going to collect the 
information around our store and

222
00:13:39,350 --> 00:13:41,630
we're going to collect the 
information around our inventory

223
00:13:41,910 --> 00:13:44,390
and we might, I don't know, sell
T-shirts, for example.

224
00:13:45,400 --> 00:13:48,760
So therefore you could have 
grouped that data into those 

225
00:13:48,760 --> 00:13:51,800
three areas in terms of 
inventory, in terms of store and

226
00:13:51,800 --> 00:13:54,720
in terms of customer. 
And you could have mapped out 

227
00:13:54,720 --> 00:13:59,800
what data you need to collect, 
only collect data you're gonna 

228
00:13:59,800 --> 00:14:02,480
use and you would have defined 
that. 

229
00:14:02,480 --> 00:14:05,160
So at this point, you may have 
lots of data sources. You may

230
00:14:05,160 --> 00:14:08,060
have downloaded or,
I don't know, bought e-mail

231
00:14:08,060 --> 00:14:10,860
marketing lists with a whole lot
of customer names who might be 

232
00:14:10,860 --> 00:14:16,300
interested in your T-shirts. 
You may know what the inventory 

233
00:14:16,300 --> 00:14:20,740
system fields are, so you know 
what those are and you may also 

234
00:14:20,740 --> 00:14:23,420
know all the information you 
need to run the shop. 

235
00:14:23,780 --> 00:14:26,540
So you've defined that, you've
written that down, and that's what we

236
00:14:26,540 --> 00:14:30,940
talk about. And it's important
to go down to the data

237
00:14:30,940 --> 00:14:34,580
type level here, not database 
level, but just know that a name

238
00:14:34,580 --> 00:14:37,820
might be
just a string or a piece of

239
00:14:37,820 --> 00:14:40,460
text, for example. 
Whereas if we were collecting a 

240
00:14:40,460 --> 00:14:42,220
phone number, that would be a 
number. 

241
00:14:42,980 --> 00:14:45,780
So it is worth, you know, 
describing that data at that 

242
00:14:45,780 --> 00:14:48,900
level at least as a BA, and 
passing that on to

243
00:14:49,720 --> 00:14:54,240
those in the data team to maybe
go a little bit further or for 

244
00:14:54,240 --> 00:14:56,920
you to do that later on if that 
happens to be your role. 

245
00:14:57,360 --> 00:15:00,320
So yeah, describe the pieces of 
data. What are they?

246
00:15:00,320 --> 00:15:04,040
What are they used for?
Where did they

247
00:15:04,040 --> 00:15:06,680
come from, what are they going 
to be used for? 

248
00:15:06,680 --> 00:15:11,240
So input and then the output, 
not necessarily the process. 

249
00:15:12,130 --> 00:15:14,330
Of course. 
So now that we've collected our 

250
00:15:14,330 --> 00:15:18,050
data, if you like, or at least 
virtually collected our data and

251
00:15:18,050 --> 00:15:22,410
we've named and appropriately 
defined the metadata around that

252
00:15:22,410 --> 00:15:26,010
data describing it, we then need
to store it. 

253
00:15:26,450 --> 00:15:28,650
So the data needs to be stored 
somewhere. 

254
00:15:28,970 --> 00:15:32,970
And data is digital these days, 
mostly digital. 

255
00:15:32,970 --> 00:15:36,290
And if it isn't, we need to get 
it into digital form and we need

256
00:15:36,290 --> 00:15:38,810
to be able to store it in a 
digital repository. 

257
00:15:38,810 --> 00:15:41,410
And that's the highest term, 
that's a very BA term for

258
00:15:41,610 --> 00:15:45,130
saying, you know, what you might
think of as a database for 

259
00:15:45,130 --> 00:15:48,890
example, or a data lake. There's a
whole lot of terms for ways you

260
00:15:48,890 --> 00:15:50,530
can store information these 
days. 

261
00:15:50,930 --> 00:15:54,290
But we are talking about at the 
highest level here, digital 

262
00:15:54,290 --> 00:15:57,130
repository. 
And that of course needs to be 

263
00:15:57,130 --> 00:16:02,040
secure, reusable. 
You know, we need to be able to 

264
00:16:02,040 --> 00:16:04,440
protect the data we're 
collecting, especially if it's 

265
00:16:04,440 --> 00:16:07,200
customer data. 
But equally if it's around our 

266
00:16:07,200 --> 00:16:10,560
operations, around our store and
our inventory, you know,

267
00:16:10,560 --> 00:16:12,680
if we don't store

268
00:16:12,680 --> 00:16:16,720
that appropriately, if we,
I don't know, save

269
00:16:16,800 --> 00:16:20,160
all of our information into a 
Google Sheet that's got a public

270
00:16:20,160 --> 00:16:23,600
link and that information is 
accessed, then you know that 

271
00:16:23,600 --> 00:16:27,240
could have negative consequences
both to our reputation but also 

272
00:16:27,240 --> 00:16:29,600
to our business and to our 
store, so. 

273
00:16:30,360 --> 00:16:34,400
We really need to, you know, 
think about how we're going to 

274
00:16:34,400 --> 00:16:37,640
store information, and that
quickly follows the collection. 

275
00:16:37,640 --> 00:16:39,760
So you're collecting it. 
You've obviously got to

276
00:16:39,760 --> 00:16:42,120
put it somewhere. 
And you might say,

277
00:16:42,120 --> 00:16:44,760
well, Ben, why are we just 
thinking about this now? 

278
00:16:44,760 --> 00:16:47,240
Well, you're not, you've done 
this in your plan. 

279
00:16:47,240 --> 00:16:49,840
You've already got an idea about
where you might have stored it, 

280
00:16:49,840 --> 00:16:54,200
but right now it's the action of
storing the data somewhere. 

281
00:16:54,440 --> 00:16:55,640
OK. 
And that could be multiple 

282
00:16:55,640 --> 00:16:57,200
places. 
We'll come back to that when we 

283
00:16:57,200 --> 00:16:59,560
talk about technical 
capabilities. 

284
00:17:00,960 --> 00:17:03,480
Once the data's stored, so 
you've collected, you've 

285
00:17:03,480 --> 00:17:06,319
described it, and you've stored 
it somewhere or in multiple 

286
00:17:06,319 --> 00:17:11,640
places, you then analyze the 
data. Okay, and so that is, you

287
00:17:11,640 --> 00:17:14,760
know, you explore it, you 
interpret it, you look at the

288
00:17:14,760 --> 00:17:17,720
data in different ways. 
It might be the number of sales 

289
00:17:17,920 --> 00:17:19,920
for T-shirts in your store, for 
example. 

290
00:17:19,920 --> 00:17:23,440
It might be the number of 
customers who visited the store.

291
00:17:23,980 --> 00:17:27,220
It might be the number of 
transactions, it might be your 

292
00:17:27,220 --> 00:17:29,140
stock levels. 
So you're analyzing the 

293
00:17:29,140 --> 00:17:33,340
information; you're really
looking at grouping that

294
00:17:33,340 --> 00:17:36,340
information into something useful.
So when you're analyzing, the 

295
00:17:36,340 --> 00:17:41,700
goal of it is to interpret it 
into useful data sets or data 

296
00:17:41,700 --> 00:17:45,740
views if you like. 
So you might combine data from 

297
00:17:45,740 --> 00:17:49,540
your customer repository
and your inventory repository

298
00:17:49,540 --> 00:17:52,220
and your store repository to
have, you know, one view.

299
00:17:53,620 --> 00:17:57,860
So you are able to create a view
where you see all customers by 

300
00:17:57,860 --> 00:18:01,340
store, by inventory for example,
where they bought their stuff. 

301
00:18:01,340 --> 00:18:04,940
So the next step: once you've
analyzed the data, you should

302
00:18:04,940 --> 00:18:07,420
have an idea about how you want 
to use your data at that point 

303
00:18:08,020 --> 00:18:11,460
and then you want to use it. 
So that's the next step. 

304
00:18:11,460 --> 00:18:14,020
So we've planned, we've 
collected, we've described, 

305
00:18:14,340 --> 00:18:17,420
we've stored it, we're analyzing
and now we're using it. 

306
00:18:17,700 --> 00:18:21,020
The data is used for the purpose
it was collected or generated for.

307
00:18:21,300 --> 00:18:23,580
So you should have defined why 
you were collecting the 

308
00:18:23,580 --> 00:18:25,300
information and now you want to 
use it. 

309
00:18:25,300 --> 00:18:29,500
It might be a report; that's
a general use case, where you

310
00:18:29,500 --> 00:18:32,620
are collecting information so 
you have better insights on your

311
00:18:32,620 --> 00:18:36,140
customers and so this could be a
report or you know, a business 

312
00:18:36,140 --> 00:18:37,860
intelligence dashboard if you 
like. 

313
00:18:38,340 --> 00:18:44,300
A dashboard is a very good use of
data and you know the idea is 

314
00:18:44,300 --> 00:18:48,620
that you could reuse that data 
also for other purposes.

315
00:18:48,620 --> 00:18:51,540
You might feed it, now you've 
collected it and analyzed it. 

316
00:18:51,740 --> 00:18:54,260
You might feed it into your 
e-mail marketing system. 

317
00:18:54,260 --> 00:19:00,020
So using, or being consumed, is
another way of expressing that. 

318
00:19:00,380 --> 00:19:04,820
It could be to present the data 
inside a report, but it could also

319
00:19:04,820 --> 00:19:08,260
be to make it available to
be consumed

320
00:19:08,340 --> 00:19:10,820
by other systems.
So it could be, yeah, like I 

321
00:19:10,820 --> 00:19:12,340
said, your e-mail marketing 
tool. 

322
00:19:12,500 --> 00:19:15,820
So that'd be very useful to be 
used by MailChimp or you know, 

323
00:19:15,900 --> 00:19:18,420
the Meta platform or whatever
marketing tool you're using. 

324
00:19:19,140 --> 00:19:21,820
And the final step is to then 
save

325
00:19:22,410 --> 00:19:25,610
or destroy the data.
So if you're going to keep that 

326
00:19:25,610 --> 00:19:28,170
data for a long time, and you 
know this is where records 

327
00:19:28,170 --> 00:19:30,970
management policies come in, 
this is where you really need to

328
00:19:30,970 --> 00:19:34,650
understand archival rules and 
disposal rules and how long 

329
00:19:34,650 --> 00:19:36,450
you're allowed to keep that 
customer data for. 

330
00:19:36,770 --> 00:19:39,930
You need to make a decision here
in terms of saving it, which

331
00:19:40,170 --> 00:19:43,490
could mean archiving it because it
might not necessarily need to be

332
00:19:43,810 --> 00:19:46,890
accessed all the time, or you 
need to destroy it. 

333
00:19:47,210 --> 00:19:52,050
You need to take actions to 
safeguard that data for its

334
00:19:52,050 --> 00:19:53,970
long-term
viability and availability.

335
00:19:54,330 --> 00:19:58,890
So in practical terms, that data
might not be, you know, saved on

336
00:19:58,890 --> 00:20:01,210
your desktop; it might be saved
in the database. 

337
00:20:02,210 --> 00:20:04,970
So this is where we really need to
think about how we save

338
00:20:04,970 --> 00:20:10,490
our information for the long term.
Now just to recap, when you've 

339
00:20:10,490 --> 00:20:14,690
done your plan at your first 
planning stage, you should have 

340
00:20:14,690 --> 00:20:17,930
these headers written down. 
Collect, describe, store,

341
00:20:18,360 --> 00:20:19,320
analyze,
use,

342
00:20:19,520 --> 00:20:23,720
save or destroy. And you've
already mapped out how you're 

343
00:20:23,720 --> 00:20:27,040
gonna do those things, at least 
at a high level. 

344
00:20:27,160 --> 00:20:30,280
You know high level business 
requirements level, not 

345
00:20:30,280 --> 00:20:32,560
necessarily the detail.

346
00:20:33,240 --> 00:20:37,280
And as you go into each level, 
each step you may define the 

347
00:20:37,280 --> 00:20:40,340
detail-level requirements
and the solution you're going to

348
00:20:40,340 --> 00:20:45,540
use. So that is the data
management lifecycle as defined 

349
00:20:45,540 --> 00:20:48,500
by data.govt.nz.
It is pretty good. 

350
00:20:48,940 --> 00:20:53,500
And the other useful side of 
this particular plan is they do 

351
00:20:53,500 --> 00:20:57,220
talk about digital capabilities.
Now when I talk about 

352
00:20:57,220 --> 00:21:01,980
capabilities, capabilities is a 
word that is used a lot.

353
00:21:01,980 --> 00:21:05,400
It can mean skills, and it can

354
00:21:05,400 --> 00:21:08,160
also mean, you know, human
capabilities or technical 

355
00:21:08,160 --> 00:21:14,200
capabilities. data.govt.nz
outlines the human capabilities,

356
00:21:14,240 --> 00:21:20,240
the processes or abilities that 
your particular team or people 

357
00:21:20,240 --> 00:21:24,920
in your organization will need 
in order to kind of reach a high

358
00:21:24,920 --> 00:21:29,940
level of maturity. 
And so if you 

359
00:21:29,940 --> 00:21:32,620
have that requirement, 
if your job is to audit where 

360
00:21:32,620 --> 00:21:34,500
you're at, please go to that 
website. 

361
00:21:34,500 --> 00:21:38,900
It's fantastic, and it 
aligns those capabilities with 

362
00:21:39,220 --> 00:21:42,620
the data management lifecycle 
I've just talked about. 

363
00:21:42,620 --> 00:21:48,220
So your job is nearly 
done, but because we live in an 

364
00:21:48,300 --> 00:21:52,420
IT world and our job is to 
jump between the world of 

365
00:21:52,420 --> 00:21:56,300
business and IT, we should 
really be talking about what I 

366
00:21:56,300 --> 00:21:59,980
would call the technical 
capabilities that we now require

367
00:21:59,980 --> 00:22:01,620
in order to support that life 
cycle. 

368
00:22:02,770 --> 00:22:07,090
And like I said at the start of 
this podcast, these capabilities

369
00:22:07,250 --> 00:22:12,450
have evolved over time. 
To be honest, these highest 

370
00:22:12,450 --> 00:22:16,810
level capability names may well 
have existed 20 years ago, but 

371
00:22:16,810 --> 00:22:19,570
definitely the techniques or 
some of the ways in which we do 

372
00:22:19,570 --> 00:22:22,210
things inside these boxes have 
changed. 

373
00:22:22,530 --> 00:22:25,450
So inside these boxes there are 
sub capabilities if you like, 

374
00:22:25,450 --> 00:22:31,050
functions or features that these
various systems allow us to do 

375
00:22:31,370 --> 00:22:35,290
or do well. 
But at the highest level, we as 

376
00:22:35,290 --> 00:22:37,050
BAs need to think about these 
things. 

377
00:22:37,410 --> 00:22:39,490
They're not complicated and I'm 
going to read them out now. 

378
00:22:39,490 --> 00:22:48,410
These particular names are the 
ones that Amazon AWS uses to 

379
00:22:48,410 --> 00:22:50,770
define these areas, but I think 
they're pretty common across 

380
00:22:50,770 --> 00:22:54,890
whatever platform you're using, 
be that Google Cloud or Azure or

381
00:22:54,890 --> 00:23:00,090
even an on premise setup. 
OK, so these capabilities are 

382
00:23:00,090 --> 00:23:02,810
technical capabilities. 
Just to recap, they need to 

383
00:23:02,810 --> 00:23:07,410
exist in order to support that 
data management lifecycle I just

384
00:23:07,410 --> 00:23:11,090
talked about. 
The first one is data sources. 

385
00:23:11,330 --> 00:23:12,890
So these are things. 
These are nouns. 

386
00:23:13,010 --> 00:23:15,610
They're not actions like collect 
data or store data. 

387
00:23:15,650 --> 00:23:18,650
Those are actions. 
This is data sources. 

388
00:23:18,650 --> 00:23:23,050
We need to have data sources. 
And that could be a database, it

389
00:23:23,050 --> 00:23:27,170
could be SQL Server, it could be
other systems like CRM 

390
00:23:27,170 --> 00:23:29,250
systems. 
It could be devices these days, 

391
00:23:29,250 --> 00:23:32,490
it could be small devices that 
feed in, it could be your 

392
00:23:32,490 --> 00:23:34,970
contact center, it could be 
logs, it could be anything, 

393
00:23:34,970 --> 00:23:36,570
right? 
It could be your social media 

394
00:23:37,210 --> 00:23:40,930
marketing platform. 
But data sources need to exist. 

395
00:23:40,970 --> 00:23:45,970
You need to have some way of 
storing, sorry of sourcing 

396
00:23:45,970 --> 00:23:47,650
information. 
You need to know what those are. 

397
00:23:48,010 --> 00:23:49,610
OK, so data sources is pretty 
easy. 

398
00:23:50,640 --> 00:23:55,360
Then in order to use that 
information, we need to get that

399
00:23:55,360 --> 00:23:58,880
information into a platform or 
to a series of components 

400
00:23:59,080 --> 00:24:04,040
so it's useful for us, and we 
call that function ingestion. 

401
00:24:04,440 --> 00:24:06,880
OK. 
So we ingest data. 

402
00:24:07,560 --> 00:24:11,000
If you like, you could say 
capture is probably the very 

403
00:24:11,000 --> 00:24:15,640
high level, you know, holistic 
word, but it's not quite right 

404
00:24:16,290 --> 00:24:19,850
in this case. The reason why 
ingestion is really 

405
00:24:19,850 --> 00:24:23,130
important is that we're actually
taking the information and 

406
00:24:23,330 --> 00:24:26,370
putting it somewhere. Now that's 
important. 

407
00:24:26,370 --> 00:24:29,890
There's an important reason for 
using that word because it's 

408
00:24:29,890 --> 00:24:34,010
suggesting you're taking a copy 
of that information and there 

409
00:24:34,010 --> 00:24:35,730
are other ways of accessing 
information. 

410
00:24:35,730 --> 00:24:38,690
You don't have to move it. 
You could report directly from a

411
00:24:38,690 --> 00:24:42,410
source system or you could use 
something called data fabric and

412
00:24:42,410 --> 00:24:45,850
data virtualization to kind of 
pull that information when you 

413
00:24:45,850 --> 00:24:47,890
need it, 
but not ingest the 

414
00:24:47,890 --> 00:24:49,650
information. 
OK. 

415
00:24:49,890 --> 00:24:52,610
So if it's 
really critical for us and we 

416
00:24:52,610 --> 00:24:55,250
want to tag it 
and we want to use it 

417
00:24:55,570 --> 00:24:59,050
effectively and we want to 
process that information and we 

418
00:24:59,050 --> 00:25:02,530
want to join it with others, 
then it is best for us to ingest

419
00:25:02,530 --> 00:25:04,890
that information. 
So we've got data sources and 

420
00:25:04,890 --> 00:25:08,650
we're going to ingest at least 
some of that information into 

421
00:25:08,650 --> 00:25:11,210
our system. 
We then need to store that 

422
00:25:11,210 --> 00:25:12,890
information. 
So we've got storage. 

423
00:25:13,570 --> 00:25:16,650
Now generally there are three 
categories within there. 

424
00:25:16,650 --> 00:25:21,010
First, you're 
saving what you've ingested in 

425
00:25:21,010 --> 00:25:23,930
its raw form, 
so whatever format 

426
00:25:23,930 --> 00:25:27,930
you've got it in. We then 
do a process of cleaning the 

427
00:25:27,930 --> 00:25:30,010
information and we curate the 
data. 

428
00:25:30,010 --> 00:25:34,210
So storage involves 
kind of moving information 

429
00:25:34,210 --> 00:25:37,770
around. 
You know, processing that 

430
00:25:37,770 --> 00:25:39,330
information. 
So there might be different 

431
00:25:39,330 --> 00:25:42,730
levels of kind of copies of that
information in various states. 

432
00:25:42,970 --> 00:25:46,930
So we need to have storage. Now, 
as I just mentioned, we need to 

433
00:25:46,930 --> 00:25:52,850
have processing and this is 
where advances in data 

434
00:25:52,850 --> 00:25:54,610
management have really taken 
off. 

435
00:25:54,970 --> 00:25:58,610
So, you know, in terms of 
AI, machine learning and some of

436
00:25:58,610 --> 00:26:02,090
the tools that we've got 
available, for the way in which we 

437
00:26:02,090 --> 00:26:05,160
clean data 
(this is part of processing) and 

438
00:26:05,160 --> 00:26:08,200
transform data, 
the tools out there are, you

439
00:26:08,200 --> 00:26:10,720
know, really advanced. 
And this is where you know the 

440
00:26:10,720 --> 00:26:14,920
old ETL processes have really 
been replaced by 

441
00:26:14,920 --> 00:26:16,680
systems that can do that 
automatically. 

442
00:26:18,160 --> 00:26:22,240
So if we take raw data in our 
storage, we may process that 

443
00:26:22,240 --> 00:26:24,960
data to clean it, transform it, 
and then we've got a clean data 

444
00:26:24,960 --> 00:26:28,520
set. 
So regardless of what format 

445
00:26:28,780 --> 00:26:31,700
the data came in, after we've 
cleaned it and transformed it, 

446
00:26:31,700 --> 00:26:38,260
we get it into a clean, useful 
form, and then we go back and we 

447
00:26:38,260 --> 00:26:41,500
process it a bit more, 
aggregating information, 

448
00:26:41,500 --> 00:26:45,220
segmenting it. 
We might have multiple 

449
00:26:46,800 --> 00:26:50,120
different copies of the same 
piece of data. 

450
00:26:50,120 --> 00:26:52,960
So we could mash that up and we 
can enrich the data, and then we 

451
00:26:52,960 --> 00:26:56,400
kind of save this back as 
curated data. 

452
00:26:56,680 --> 00:27:01,640
And again, this is 
almost a modern-day processing 

453
00:27:01,640 --> 00:27:04,880
technique. 
So if you think

454
00:27:04,880 --> 00:27:07,400
about it, all our different 
sources, say we had 20 different 

455
00:27:07,400 --> 00:27:11,200
sources, we ingest those,
they're in raw format, we clean 

456
00:27:11,200 --> 00:27:13,560
them and transform them so they 
get into our data model. 

457
00:27:13,760 --> 00:27:16,520
We then enrich them and we 
segment them. 

458
00:27:16,520 --> 00:27:18,520
So now it's in more than just 
our data model. 

459
00:27:18,520 --> 00:27:22,520
It's in a useful format. 
Then we can start to use that 

460
00:27:22,520 --> 00:27:26,680
information and we use that 
information in a couple of ways.

461
00:27:27,480 --> 00:27:31,920
One, we use analytics. 
So data analytics, this is where

462
00:27:31,920 --> 00:27:35,920
the term data analysis came from.
It's when you're starting to use 

463
00:27:35,920 --> 00:27:38,320
the information, and you can 
think about that through 

464
00:27:38,320 --> 00:27:41,960
statistical data analysis, data 
science and dashboarding. 

465
00:27:42,960 --> 00:27:45,960
The other way we could use it is
we could share it with others or

466
00:27:45,960 --> 00:27:50,080
collaborate with other systems. 
So if we were just using 

467
00:27:50,080 --> 00:27:52,000
this platform for a part of our 
business, 

468
00:27:52,320 --> 00:27:55,400
then we could provide an API so 
it could be used by another part

469
00:27:55,400 --> 00:27:58,760
of the business or another tool.
And we could also collaborate 

470
00:27:58,760 --> 00:28:01,920
with other partners or 
stakeholders or agencies that 

471
00:28:01,920 --> 00:28:05,530
want to use our information. 
So there are two consumption use 

472
00:28:05,530 --> 00:28:07,650
cases. 
One is analytics, presenting the

473
00:28:07,650 --> 00:28:10,610
information and one is 
collaboration where we can share

474
00:28:10,770 --> 00:28:12,810
the information so other people 
can consume it. 

475
00:28:15,090 --> 00:28:19,210
Obviously what we also need to 
think about is there are some 

476
00:28:19,730 --> 00:28:23,130
more technical kind of access 
and security capabilities we 

477
00:28:23,130 --> 00:28:26,290
need, which is roughly called 
technical data governance, but 

478
00:28:26,290 --> 00:28:29,170
we need to have a capability for
access rights 

479
00:28:29,750 --> 00:28:33,270
and for controls and 
auditing and security. And then 

480
00:28:33,270 --> 00:28:36,510
there is another bit which is 
really important which is called

481
00:28:36,870 --> 00:28:40,750
cataloging, where we 
have metadata 

482
00:28:40,750 --> 00:28:44,230
schemas and potentially data 
crawlers that will start to tag 

483
00:28:44,230 --> 00:28:48,390
our data automatically so that 
it can be used more 

484
00:28:48,390 --> 00:28:51,870
effectively and kind of self 
managed within the system. 

485
00:28:52,390 --> 00:28:56,150
So just to recap, we've got data
sources, we ingest those, we've 

486
00:28:56,630 --> 00:28:59,350
got storage for those, we've got
processing. 

487
00:28:59,750 --> 00:29:03,670
Then we consume either through 
analytics or collaboration. We 

488
00:29:03,670 --> 00:29:09,030
then have some kind of security 
access audit capability, which 

489
00:29:09,030 --> 00:29:11,830
you could call technical data 
governance and then we've got 

490
00:29:11,830 --> 00:29:14,980
cataloging. 
And if you happen to be, I 

491
00:29:15,020 --> 00:29:18,820
guess, a social marketing company
or an advertising company, 

492
00:29:19,060 --> 00:29:23,820
you may, as we term it, 
activate that data. 

493
00:29:23,820 --> 00:29:26,620
So you could use that 
specifically for advertising or 

494
00:29:26,620 --> 00:29:29,980
real time marketing, but most 
companies don't have that piece.

495
00:29:30,500 --> 00:29:33,940
So those are the technical 
capabilities that need to exist.

496
00:29:34,780 --> 00:29:37,500
And if you have those 
capabilities and you've planned 

497
00:29:37,740 --> 00:29:40,900
and you have a way of saving or 
destroying your data, then those

498
00:29:40,900 --> 00:29:43,500
middle process steps that we 
talked about before, 

499
00:29:43,500 --> 00:29:46,340
the steps of 
collecting, 

500
00:29:46,620 --> 00:29:50,620
describing, storing, analyzing 
and using data will be fulfilled

501
00:29:50,620 --> 00:29:53,420
by those technical capabilities 
we just talked about, which were

502
00:29:53,900 --> 00:29:58,260
data sources, ingestion, storage,
processing, analytics, 

503
00:29:58,260 --> 00:30:01,620
collaboration, cataloging and 
some kind of 

504
00:30:02,180 --> 00:30:05,380
technical data governance in 
terms of security and audit and 

505
00:30:05,380 --> 00:30:09,300
access. OK, so those are the two
main parts. 

506
00:30:09,300 --> 00:30:13,220
If you have all of those pieces,
the process and the 

507
00:30:13,220 --> 00:30:18,140
capabilities, then you have a 
modern day, well, I guess you 

508
00:30:18,140 --> 00:30:21,220
have at least a framework for 
data management. 

509
00:30:22,420 --> 00:30:27,620
What then makes the difference 
here is 

510
00:30:27,620 --> 00:30:31,660
what technology is performing 
those capabilities. 

511
00:30:32,440 --> 00:30:36,200
So I mentioned before the likes 
of AWS, there's Google Cloud, 

512
00:30:36,280 --> 00:30:39,120
there's Azure, there are many 
others and they all have their 

513
00:30:39,120 --> 00:30:43,440
own flavor of tools that carry 
out those capabilities we talked

514
00:30:43,440 --> 00:30:46,320
about. 
So for example. 

515
00:30:48,150 --> 00:30:52,070
AWS in terms of their customer 
data platform, they will suck in

516
00:30:52,750 --> 00:30:57,190
many or any data sources and 
ingest those and it uses things 

517
00:30:57,190 --> 00:31:05,030
like Amazon Kinesis, and it uses
AppFlow or Amazon API Gateway, 

518
00:31:05,230 --> 00:31:09,630
and then stores it into buckets, 
which are called S3 buckets, for 

519
00:31:09,630 --> 00:31:12,710
example. 
It then uses Step Functions to

520
00:31:12,910 --> 00:31:15,110
process and orchestrate the 
information. 

521
00:31:15,110 --> 00:31:21,070
It uses AWS Lambda and 
AWS Glue for workflows, and then 

522
00:31:21,870 --> 00:31:25,390
pushes that information out to 
analytics through Amazon 

523
00:31:25,390 --> 00:31:27,390
Redshift. 
It even has its own reporting 

524
00:31:27,390 --> 00:31:30,230
tool. 
And for data collaboration it 

525
00:31:30,230 --> 00:31:33,030
uses gateways again. And for 
cataloging, 

526
00:31:33,030 --> 00:31:35,710
it uses AWS Glue for data 
cataloging. 

527
00:31:35,950 --> 00:31:40,950
So Amazon has subcomponents, 
technical pieces of the puzzle 

528
00:31:40,950 --> 00:31:46,510
or services, 
which allow you to 

529
00:31:46,510 --> 00:31:50,550
fulfill those capabilities. 
So as BAs, we don't necessarily 

530
00:31:50,550 --> 00:31:56,990
care what those technical 
functions, 

531
00:31:56,990 --> 00:32:02,150
features or tools are. 

532
00:32:02,150 --> 00:32:05,710
We just need to know that we 
have a way of accessing our 

533
00:32:05,710 --> 00:32:08,790
sources, ingesting, storing, you
know, processing that 

534
00:32:08,790 --> 00:32:11,230
information, consuming it 

535
00:32:11,740 --> 00:32:14,620
and cataloging it. 
So if we have those in place, 

536
00:32:14,780 --> 00:32:16,580
then we don't necessarily need to 

537
00:32:16,580 --> 00:32:19,900
understand these technologies. 
And most of the time, there's 

538
00:32:19,900 --> 00:32:23,380
usually a hybrid approach where 
some of this information is 

539
00:32:23,380 --> 00:32:25,900
still on premise and not in the 
cloud. 

540
00:32:26,100 --> 00:32:29,140
Or you're using AWS to do part 
of this. 

541
00:32:29,380 --> 00:32:32,700
And then when you do your 
reporting, you're using Power BI

542
00:32:32,980 --> 00:32:35,300
and it's connecting to this. 
So it doesn't necessarily mean 

543
00:32:35,300 --> 00:32:38,820
you need to adopt one whole 
vendor to do all these things, 

544
00:32:38,820 --> 00:32:43,890
which might become expensive. 
However, all those capabilities 

545
00:32:43,890 --> 00:32:47,330
need to exist. 
So what does that mean for you 

546
00:32:47,690 --> 00:32:50,730
as a BA? 
Now you understand that there 

547
00:32:50,730 --> 00:32:54,770
needs to be this data management
framework which includes the 

548
00:32:55,290 --> 00:32:59,930
life cycle and these 
capabilities. It means that you 

549
00:32:59,930 --> 00:33:04,010
can actually start to 
plan, put

550
00:33:04,010 --> 00:33:06,650
together a data plan and start 
to write requirements that use 

551
00:33:06,650 --> 00:33:12,960
these features. 
So for example, you might say: 

552
00:33:13,080 --> 00:33:19,120
as a data analyst, 
I would like the 

553
00:33:21,720 --> 00:33:28,840
New Zealand Census household 
income data to be ingested into 

554
00:33:29,040 --> 00:33:33,360
the data warehouse. And therefore
you can use these terms, which 

555
00:33:33,360 --> 00:33:36,080
means that if you've grouped
those under these kind of 

556
00:33:36,080 --> 00:33:40,640
features and functional areas, 
then the architect knows 

557
00:33:40,640 --> 00:33:46,000
that you've got a requirement to
use those areas or components, 

558
00:33:46,000 --> 00:33:49,000
and then they can start thinking 
about technology that might be 

559
00:33:49,000 --> 00:33:53,560
best fit for that purpose. 
And of course, depending on how 

560
00:33:53,560 --> 00:33:56,640
many data sets you have, how 
complicated they are, that could

561
00:33:56,640 --> 00:33:59,520
determine what kind of tools 
they use in those areas. 

562
00:34:00,000 --> 00:34:02,840
You can ingest data, for example
into Google Sheets or into 

563
00:34:02,840 --> 00:34:06,560
Excel, and so you know that 
might be OK as long as you have 

564
00:34:07,120 --> 00:34:09,719
the appropriate data governance 
and security and access rights, 

565
00:34:09,719 --> 00:34:12,320
which is why we don't really use
Excel for that purpose. 

566
00:34:12,840 --> 00:34:17,000
But that might be OK and that 
might be fine for us depending 

567
00:34:17,000 --> 00:34:20,800
on the use case we've got. 
So I hope that that's given you 

568
00:34:20,800 --> 00:34:24,400
some insights in terms of how 
data management works and what 

569
00:34:24,400 --> 00:34:27,679
you need to think about as a BA.
I'm sure there's lots of other 

570
00:34:27,679 --> 00:34:30,960
subtopics we could go into, but 
I hope you have enjoyed this 

571
00:34:30,960 --> 00:34:33,199
podcast and I'll see you next 
time.

