1
00:00:00,600 --> 00:00:03,480
Hi everybody, and welcome back 
to the Better Business Analysis 

2
00:00:03,480 --> 00:00:06,080
podcast with your host, Benjamin
Walsh. 

3
00:00:06,440 --> 00:00:11,400
And today we're diving into a 
topic that every modern BA needs

4
00:00:11,400 --> 00:00:14,280
to understand. 
That's right, this BA Bytes 

5
00:00:14,280 --> 00:00:19,280
episode will be focused on 
modern data management and 

6
00:00:19,280 --> 00:00:22,760
analytics. 
The Better Business Analysis 

7
00:00:22,840 --> 00:00:26,400
Institute presents the Better 
Business Analysis Podcast with 

8
00:00:27,000 --> 00:00:35,760
Benjamin Walsh. Data is the 
backbone of decision making. 

9
00:00:36,280 --> 00:00:41,320
It's the backbone of AI and machine
learning, and as BAs we need to 

10
00:00:41,320 --> 00:00:44,280
know how to work with it, 
analyse it and ensure its 

11
00:00:44,280 --> 00:00:47,800
quality. 
But with new technology like 

12
00:00:47,800 --> 00:00:53,080
data fabrics, modern analytical 
methods, and automated 

13
00:00:53,080 --> 00:00:58,360
pipelines, how do we keep up? 
Well, don't worry, I've got you 

14
00:00:58,360 --> 00:01:00,400
covered. 
In today's episode, we will 

15
00:01:00,400 --> 00:01:05,080
break down 10 things you need to
know to get started with modern 

16
00:01:05,239 --> 00:01:07,760
data management as a business 
analyst. 

17
00:01:08,560 --> 00:01:12,520
So the first thing we need to 
really talk about is that number

18
00:01:12,520 --> 00:01:14,640
one, data isn't structured 
anymore. 

19
00:01:16,680 --> 00:01:20,000
OK? 
So if you're in the world of I 

20
00:01:20,000 --> 00:01:24,040
guess relational databases, 
which was new when I started in 

21
00:01:24,040 --> 00:01:28,440
IT, or you are used to cleaning 
data and putting it in spreadsheets

22
00:01:28,440 --> 00:01:31,440
and so forth, the world is 10 
steps ahead of you. 

23
00:01:32,160 --> 00:01:36,160
Once upon a time, data was mostly
structured and there were neat 

24
00:01:36,160 --> 00:01:39,760
rows and columns in a database. 
Think Excel if you haven't 

25
00:01:39,760 --> 00:01:44,320
worked in databases before. 
But now we have structured, semi-

26
00:01:44,320 --> 00:01:47,680
structured and unstructured 
data. 

27
00:01:47,680 --> 00:01:50,880
And before we were trying to get
that to be structured so we 

28
00:01:50,880 --> 00:01:54,760
could use it. 
But now techniques have kind of 

29
00:01:55,200 --> 00:01:59,000
caught up, and also we are 
closer to the source. 

30
00:01:59,000 --> 00:02:04,160
And so what I mean by that is 
text, images, Internet of Things, 

31
00:02:04,160 --> 00:02:09,240
so IoT, sensor data, social 
media interactions, we need to 

32
00:02:09,240 --> 00:02:10,680
be able to deal with all of 
those things. 

33
00:02:10,680 --> 00:02:13,280
And to be honest, you can't 
structure all that information 

34
00:02:14,080 --> 00:02:18,720
and keep up with the structuring
of that information in time to 

35
00:02:18,720 --> 00:02:20,560
use it. 
So we need to better deal with 

36
00:02:20,560 --> 00:02:23,880
that across those three formats we 
just talked about: structured, 

37
00:02:24,160 --> 00:02:26,880
semi-structured and unstructured
data. 

38
00:02:27,400 --> 00:02:30,920
And as BAs, we need to 
understand where data comes from

39
00:02:31,680 --> 00:02:37,080
and how to work with these, I 
guess, diverse formats in terms 

40
00:02:37,080 --> 00:02:41,160
of collecting it, capturing it, 
processing, storing it, 

41
00:02:41,560 --> 00:02:44,680
transforming it. 
So we can get it into a form in 

42
00:02:44,680 --> 00:02:48,240
which it is consumable for 
whatever use case that we want 

43
00:02:48,280 --> 00:02:50,560
to use. 
So that might be back into data 

44
00:02:50,560 --> 00:02:53,840
pipelines or systems. 
It might be a Power BI report, 

45
00:02:53,840 --> 00:02:56,680
so business intelligence 
reporting. 

46
00:02:57,560 --> 00:03:01,520
It could be integration with 
other systems. 

47
00:03:02,880 --> 00:03:07,040
It could be back to consumers. 
So there are lots of different 

48
00:03:07,040 --> 00:03:10,640
consumption use cases and some 
of those are structured, 

49
00:03:10,640 --> 00:03:14,920
unstructured and semi-structured
and their formats differ as 

50
00:03:14,920 --> 00:03:18,400
well. 
That leads me into number two, the data 

51
00:03:18,400 --> 00:03:20,880
management life cycle. 
It is critical. 

52
00:03:21,360 --> 00:03:23,760
Data doesn't just appear and 
disappear. 

53
00:03:23,760 --> 00:03:25,960
It follows a life cycle. 
And some of these are quite 

54
00:03:25,960 --> 00:03:27,840
different. 
Gartner has one. 

55
00:03:27,840 --> 00:03:33,080
There are some data 
governance forums that have a 

56
00:03:33,080 --> 00:03:35,920
good life cycle. 
You would have heard a version 

57
00:03:35,920 --> 00:03:37,440
of this. 
Here's one. 

58
00:03:37,440 --> 00:03:41,000
It's ingestion, storage, 
processing, transformation, 

59
00:03:41,440 --> 00:03:45,560
analytics and disposal. 
But there are different ways of 

60
00:03:45,560 --> 00:03:47,440
talking about that. 
So I said consumption, which I 

61
00:03:47,440 --> 00:03:50,800
prefer to analytics, which is a 
consumption type. 

62
00:03:51,120 --> 00:03:54,960
So there is a really standard 
data management life cycle and 

63
00:03:54,960 --> 00:03:58,160
you can argue about whether or 
not data goes in a straight line

64
00:03:58,160 --> 00:04:00,840
(which it doesn't), or whether or 
not it can go back through those

65
00:04:00,840 --> 00:04:03,080
processes. 
But there are fundamental 

66
00:04:03,080 --> 00:04:04,640
building blocks. 
There are about five of them. 

67
00:04:04,640 --> 00:04:07,000
And even with different 
terminologies and different 

68
00:04:07,000 --> 00:04:09,320
organizations, they're 
really consistent. 

69
00:04:09,360 --> 00:04:10,960
OK. 
And I would take a Gartner 

70
00:04:11,240 --> 00:04:14,640
approach here and just not get 
involved in organizational 

71
00:04:14,640 --> 00:04:17,120
distractions. 
Usually your government or best 

72
00:04:17,120 --> 00:04:19,320
practice in the private industry
has already defined these 

73
00:04:19,320 --> 00:04:24,440
things. 
Now knowing where data is in the

74
00:04:24,440 --> 00:04:28,800
life cycle in those five or so 
steps helps BAs define 

75
00:04:28,800 --> 00:04:32,840
requirements and align 
stakeholders on expectations of 

76
00:04:32,840 --> 00:04:37,080
data quality, what they might 
need to do to collect the data 

77
00:04:37,080 --> 00:04:39,720
or what state it might be in. 
And I'm experiencing this right 

78
00:04:39,720 --> 00:04:45,760
now with a very key client and 
expectations, I tell you, are 

79
00:04:45,760 --> 00:04:47,800
all over the place. 
OK. 

80
00:04:47,800 --> 00:04:51,160
And people don't realize we need
to invest either money, time, 

81
00:04:52,280 --> 00:04:57,840
process change, or engagement in 
order to enrich your data. Number three is 

82
00:04:57,840 --> 00:05:00,440
important. 
And this is changing the game. 

83
00:05:00,440 --> 00:05:05,320
So even if you are maybe a data 
architect, a data analyst 

84
00:05:06,320 --> 00:05:09,760
working in a more traditional 
environment, data warehousing 

85
00:05:09,760 --> 00:05:13,360
environment, you need to know number three,
which is data fabrics, OK? 

86
00:05:13,360 --> 00:05:16,840
And they're changing the game. 
So it's weaving data through 

87
00:05:16,840 --> 00:05:19,480
pipelines. 
Traditionally data was managed 

88
00:05:19,480 --> 00:05:21,840
in silos. 
OK, so think of different 

89
00:05:22,360 --> 00:05:27,760
blocks, maybe one block per data
management life cycle step or 

90
00:05:27,760 --> 00:05:30,920
per application or per use case.
So maybe you've got data in 

91
00:05:30,920 --> 00:05:35,080
Salesforce, maybe you've got 
some data in a data warehouse, 

92
00:05:35,320 --> 00:05:38,920
maybe you've got data in CRM 
Dynamics 365, maybe you've got 

93
00:05:38,920 --> 00:05:41,800
it in SQL databases, maybe 
you've got it in spreadsheets, 

94
00:05:42,080 --> 00:05:43,720
maybe you have it in survey 
forms. 

95
00:05:43,840 --> 00:05:46,560
This is a typical organization, 
right? 

96
00:05:46,560 --> 00:05:49,400
And so there is a way that
we structure what we call that 

97
00:05:49,400 --> 00:05:55,080
subject area data or entity data
to have a bit of an idea about 

98
00:05:55,080 --> 00:05:58,120
where we should store things and
why we use applications and 

99
00:05:58,120 --> 00:06:00,120
whatnot. 
And that's leading the way in 

100
00:06:00,120 --> 00:06:03,480
terms of application design. 
But we also need to be aware 

101
00:06:03,480 --> 00:06:08,960
that there are simply very few 
places that are able to work in 

102
00:06:08,960 --> 00:06:13,720
one monolithic system like an ERP,
SAP for example, and have that be 

103
00:06:13,720 --> 00:06:16,040
their only system. 
A lot of people move to that, 

104
00:06:16,040 --> 00:06:17,840
but that has its own 
constraints. 

105
00:06:18,160 --> 00:06:22,240
And so there's this acceptance 
that we're not always going to 

106
00:06:22,440 --> 00:06:26,560
be in control 
of, I guess, the ecosystem of our 

107
00:06:26,560 --> 00:06:29,560
data. 
So what our customers use, what 

108
00:06:29,560 --> 00:06:35,640
our data consumers want to use, 
the technical landscape that we 

109
00:06:36,200 --> 00:06:38,320
are exposed to. 
So we need to be able to connect

110
00:06:38,600 --> 00:06:42,160
to this environment. 
And so therefore you need to use

111
00:06:42,160 --> 00:06:45,320
a data fabric to do that. 
And that's a new approach that 

112
00:06:45,320 --> 00:06:48,920
integrates all data across 
systems into a unified 

113
00:06:48,920 --> 00:06:50,400
architecture. 
The architecture is still 

114
00:06:50,400 --> 00:06:53,880
unified at the high level, both 
business and technical and it 

115
00:06:53,880 --> 00:06:57,440
allows probably real time access
and better analytics. 

116
00:06:57,440 --> 00:07:01,880
Now I've said probably real time
access; real time actually costs 

117
00:07:01,880 --> 00:07:03,720
money. 
And when we say real time, it 

118
00:07:03,720 --> 00:07:05,840
could be within a day. 
We're not talking about 

119
00:07:05,840 --> 00:07:08,440
microseconds here. 
And things do take a little 

120
00:07:08,440 --> 00:07:10,520
while to process through if you 
want them to be in the right 

121
00:07:10,520 --> 00:07:12,320
state. 
So when we say real time, just 

122
00:07:12,320 --> 00:07:15,920
be careful with that term, as it 
doesn't always mean right now. 

123
00:07:15,920 --> 00:07:19,920
This is something that BAs need
to advocate for when discussing 

124
00:07:19,920 --> 00:07:22,640
modern data strategies. 
And I am currently writing a 

125
00:07:22,720 --> 00:07:28,440
paper about this: we need to accept 
multi-cloud, maybe on-premise and cloud

126
00:07:28,440 --> 00:07:32,800
solutions, transitional states. 
And we need to really think 

127
00:07:32,800 --> 00:07:35,920
about data fabrics as a solution
there as opposed to 

128
00:07:36,200 --> 00:07:42,720
consolidation. Number four, and we've 
touched on this, data fabrics. 

129
00:07:42,720 --> 00:07:44,480
OK. 
So the thing about data fabrics,

130
00:07:44,720 --> 00:07:47,800
there is a product called 
Fabric, Microsoft Fabric, which 

131
00:07:47,800 --> 00:07:51,080
we'll get to in a minute. 
But it isn't the only product 

132
00:07:51,080 --> 00:07:55,200
out there. But number four is data 
pipelines. 

133
00:07:55,200 --> 00:07:56,520
OK. 
So think about these as 

134
00:07:56,520 --> 00:08:00,000
pipelines in your house where 
you need water to go. 

135
00:08:00,000 --> 00:08:03,280
That's a good analogy. 
And it connects to the main 

136
00:08:03,280 --> 00:08:07,440
pipe, which is also connected to
other infrastructure that 

137
00:08:07,640 --> 00:08:11,600
provides water to your house. 
Now these pipelines are like 

138
00:08:11,720 --> 00:08:14,600
factory assembly lines, right?
So you could think about them as

139
00:08:14,600 --> 00:08:17,720
a line along a factory, or 
water going through, being 

140
00:08:17,760 --> 00:08:22,040
routed through a pipe. 
And maybe it changes from fresh 

141
00:08:22,040 --> 00:08:26,120
water to dirty water to hot 
water. 

142
00:08:26,880 --> 00:08:28,880
Data moves through various 
stages, OK? 

143
00:08:29,320 --> 00:08:35,559
And we need to extract, 
effectively extract data. 

144
00:08:35,720 --> 00:08:40,320
And there's a broader kind of 
high level abstraction of the 

145
00:08:40,320 --> 00:08:44,880
data management life cycle 
that covers a few steps, which is 

146
00:08:44,880 --> 00:08:47,240
like collection and capturing 
the data and maybe getting it 

147
00:08:47,240 --> 00:08:50,200
into the state you want. 
Then we've got transformation 

148
00:08:50,640 --> 00:08:53,240
and then we've got load. 
And this is an old term and we 

149
00:08:53,240 --> 00:08:58,760
call it ETL or ELT depending on 
which way around you do the 

150
00:08:58,760 --> 00:09:02,120
loading and the transformation. 
Now understanding how these 

151
00:09:02,120 --> 00:09:04,560
pipelines work. 
So these are the technical 

152
00:09:04,560 --> 00:09:08,360
capabilities needed to meet the 
life cycle we talked about, OK. 

153
00:09:08,400 --> 00:09:12,760
So I'm going to say that again, 
the data flows through the data 

154
00:09:12,760 --> 00:09:15,320
management life cycle, OK. 
Now, more conceptually, they're 

155
00:09:15,320 --> 00:09:19,160
both business and technical 
capabilities, but under the 

156
00:09:19,160 --> 00:09:22,640
hood, if you like, in the part 
that connects our business layer

157
00:09:22,920 --> 00:09:27,600
down to our technology solution,
we have these steps which are 

158
00:09:27,600 --> 00:09:31,160
broadly now referred to as the 
data management steps. 

159
00:09:31,480 --> 00:09:35,000
And historically, they were talked 
about in terms of ETL and 

160
00:09:35,000 --> 00:09:38,120
understanding how those 
pipelines work, right, in both 

161
00:09:38,120 --> 00:09:41,760
the new world data management 
life cycle or ETL world, which 

162
00:09:41,760 --> 00:09:44,000
are kind of one and the same, 
just different terminology and 

163
00:09:44,000 --> 00:09:47,560
groupings, will help you as 
a BA ensure that data is 

164
00:09:47,560 --> 00:09:52,480
processed correctly and useful 
for decision making, right? 

165
00:09:53,040 --> 00:09:57,040
So I'll give you an example. 
I have collected data in a 

166
00:09:57,040 --> 00:09:59,880
survey. 
I've surveyed all my customers 

167
00:09:59,920 --> 00:10:02,760
about a new product that I have 
launched. 

168
00:10:03,320 --> 00:10:06,920
Now that product might be a web 
product and might be on my 

169
00:10:06,920 --> 00:10:10,640
website and that might be 
integrated with my CRM solution,

170
00:10:10,640 --> 00:10:15,520
which is say HubSpot. 
Now I may have a greater 

171
00:10:15,520 --> 00:10:18,440
architecture than just those 
components, but let's just keep 

172
00:10:18,440 --> 00:10:22,480
it simple here. 
I may have sent out a mail, I 

173
00:10:22,480 --> 00:10:28,320
guess SurveyMonkey, sorry, 
survey, and I've integrated that

174
00:10:28,320 --> 00:10:35,520
with HubSpot and I surface it maybe
in the product I've got on the 

175
00:10:35,520 --> 00:10:37,440
website. 
So when they use the product, 

176
00:10:37,640 --> 00:10:41,440
a survey pops up, which happens to
be something different, which is

177
00:10:41,440 --> 00:10:43,480
SurveyMonkey. 
And when they capture the 

178
00:10:43,480 --> 00:10:45,560
feedback, it goes back into 
HubSpot, right? 

179
00:10:45,560 --> 00:10:48,680
So my data moves around. 
Now in that case, we are 

180
00:10:48,680 --> 00:10:50,480
collecting data through 
SurveyMonkey. 

181
00:10:52,160 --> 00:10:55,960
We're actually collecting it 
there and we're capturing it in 

182
00:10:55,960 --> 00:10:59,520
HubSpot. 
We might be transforming it into

183
00:10:59,520 --> 00:11:02,640
HubSpot. 
We may be connecting it with 

184
00:11:02,640 --> 00:11:06,760
other information from, for 
example, the website and the 

185
00:11:06,760 --> 00:11:10,400
product that we're using. 
And then we might be say loading

186
00:11:10,400 --> 00:11:14,680
that into say reporting tables 
in Power BI, for example. 

187
00:11:15,000 --> 00:11:17,560
And so we need to think 
about what is the state of the 

188
00:11:17,560 --> 00:11:19,920
data in all those different 
steps. 

189
00:11:20,440 --> 00:11:24,600
Another way of looking at those 
steps is to look at it in the 

190
00:11:24,600 --> 00:11:29,280
data management life cycle terms,
which I prefer, and to think 

191
00:11:29,280 --> 00:11:34,520
about something called the 
medallion model where we kind of

192
00:11:34,520 --> 00:11:39,880
classify our data in terms of 
bronze, silver and gold in terms

193
00:11:39,880 --> 00:11:42,720
of its usefulness. 
And so as it moves through the data 

194
00:11:42,720 --> 00:11:45,920
management life cycle and gets 
closer to consumption, it gets 

195
00:11:45,920 --> 00:11:48,040
better. 
And so it's a gold form, OK. 

196
00:11:48,040 --> 00:11:51,040
And that's also another way that
you can look at data in a modern

197
00:11:51,160 --> 00:11:52,720
way. 
So you may hear those terms. 

198
00:11:53,360 --> 00:11:56,320
And that's much better than this
kind of ETL process because it 

199
00:11:56,320 --> 00:12:00,480
doesn't really allow you to know
the quality, or it doesn't give a

200
00:12:00,480 --> 00:12:04,120
tag of quality along the way, 
which is the most important thing for 

201
00:12:04,120 --> 00:12:09,360
most organizations. Number five is that 
data quality has levels, as we 

202
00:12:09,360 --> 00:12:11,360
just talked about. 
And you can actually look at 

203
00:12:11,360 --> 00:12:15,280
these, not just in this gold 
layer model, medallion model, 

204
00:12:15,600 --> 00:12:19,160
but you can look at these 
through 6 dimensions. 

205
00:12:20,160 --> 00:12:22,160
And poor data leads to poor 
insight. 

206
00:12:22,160 --> 00:12:23,920
So we need to be really 
careful about that. 

207
00:12:24,080 --> 00:12:27,040
And so one is accuracy. 
How accurate is the data? 

208
00:12:27,400 --> 00:12:32,520
And the trick to making sure 
that it is accurate is to focus 

209
00:12:32,560 --> 00:12:36,280
on its capture. 
So making sure you capture it in

210
00:12:36,280 --> 00:12:38,680
an accurate way with validation,
OK. 

211
00:12:38,680 --> 00:12:40,680
And you don't want to build in a
whole lot of validation checks 

212
00:12:40,680 --> 00:12:42,240
because that might take a long 
time. 

213
00:12:42,840 --> 00:12:47,040
There's completeness. 
So what data do we need from 

214
00:12:47,040 --> 00:12:49,760
different sources to add to the 
picture to know that our 

215
00:12:49,760 --> 00:12:52,760
product, the feedback we've got 
through SurveyMonkey and the 

216
00:12:52,760 --> 00:12:56,200
product itself on the website 
come together to give us a 

217
00:12:57,160 --> 00:13:00,000
complete picture? 
We need it to be consistent. 

218
00:13:00,000 --> 00:13:05,160
So we need to collect it again 
and again and again through 

219
00:13:05,160 --> 00:13:10,040
multiple different time periods,
maybe different customer 

220
00:13:10,040 --> 00:13:14,000
segments in order to compare it.
We also need to factor in 

221
00:13:14,000 --> 00:13:16,360
timeliness. 
So if you've collected data from

222
00:13:16,360 --> 00:13:19,840
last year, and you're making a decision 
this year, it's just not good 

223
00:13:19,840 --> 00:13:22,160
enough. 
So a lot of the solutions that 

224
00:13:22,160 --> 00:13:26,400
we use traditionally take a long
time to process and a lot of 

225
00:13:26,400 --> 00:13:29,040
effort, time and effort. 
And so we need to use these new 

226
00:13:29,040 --> 00:13:32,760
modern techniques to be able to,
like we said, real time it. 

227
00:13:32,960 --> 00:13:36,280
But what we mean by that is just
get it in a more timely fashion.

228
00:13:36,280 --> 00:13:38,920
So within the period in which 
you need to make the decision. 

229
00:13:38,920 --> 00:13:41,880
So if that's within a day, then 
you need to get it within a day.

230
00:13:41,880 --> 00:13:45,360
If you need it within a quarter, you 
need it within the quarter, OK. 

231
00:13:45,360 --> 00:13:47,520
And that's the whole 
process of getting it, 

232
00:13:47,520 --> 00:13:52,160
collecting it and capturing it, 
you know, accessing it, 

233
00:13:52,160 --> 00:13:55,040
transforming it, storing it and 
getting it ready for 

234
00:13:55,040 --> 00:13:56,840
consumption. 
So there's quite a lot going on 

235
00:13:56,840 --> 00:13:59,560
there. 
The data needs to be valid; 

236
00:14:00,120 --> 00:14:02,320
truth, if you like. 
OK, so we need truth to the 

237
00:14:02,320 --> 00:14:03,920
data. 
It needs to actually be true. 

238
00:14:04,920 --> 00:14:07,200
If you, for example, if you're 
involved in statistics, you'll 

239
00:14:07,200 --> 00:14:13,040
know all about data quality and,
you know, surveys and the margin of error

240
00:14:13,040 --> 00:14:16,360
there and all the rest of it. 
We can't make decisions based on

241
00:14:16,360 --> 00:14:18,320
a small data set generally, 
right? 

242
00:14:18,360 --> 00:14:20,680
We can make assumptions. 
And so that's why we say big 

243
00:14:20,680 --> 00:14:24,600
data because we need it to be 
valid and to be accurate. 

244
00:14:25,360 --> 00:14:28,560
And the last bit is we need some
uniqueness. 

245
00:14:28,560 --> 00:14:31,320
So what we mean by that is 
we don't want duplicate 

246
00:14:31,600 --> 00:14:35,400
information coming from 
different sources having 

247
00:14:35,400 --> 00:14:38,440
different versions of the truth.
That's why we talk about single 

248
00:14:38,440 --> 00:14:42,120
source of truth, which is the 
most used word in IT ever. 

249
00:14:42,240 --> 00:14:45,600
What we mean is a single view 
of the truth, not source, OK, 

250
00:14:45,600 --> 00:14:47,080
because you will have multiple 
sources. 

251
00:14:47,080 --> 00:14:50,040
So that term, you can kill that 
term whenever you hear it and 

252
00:14:50,040 --> 00:14:52,880
say that term is old school. 
There are multiple sources. 

253
00:14:53,080 --> 00:14:55,720
What we need to make sure of, and 
sources are good, 

254
00:14:55,720 --> 00:14:59,120
by the way, is that we have one view of 
the truth, right? 

255
00:14:59,120 --> 00:15:02,840
No doubt our conversation wouldn't 
be where it needs to be 

256
00:15:02,840 --> 00:15:05,520
without adding a bit of 
boringness to the conversation. 

257
00:15:05,520 --> 00:15:10,080
And that boringness comes into 
two very important areas, which 

258
00:15:10,080 --> 00:15:13,160
some people love and I find boring.
But do you know what? 

259
00:15:13,160 --> 00:15:16,000
I'll tell you how I get around
being bored out of my 

260
00:15:16,000 --> 00:15:19,200
brain when I dip into this. 
And that is governance and 

261
00:15:19,200 --> 00:15:21,920
compliance. 
So we have regulations, we have 

262
00:15:21,920 --> 00:15:24,640
government policies, we have 
internal policies. 

263
00:15:25,000 --> 00:15:27,400
And data governance isn't 
optional. 

264
00:15:27,400 --> 00:15:30,760
OK, you have to do it. 
BAs need to work with compliance

265
00:15:30,760 --> 00:15:33,680
teams. 
We need to understand privacy 

266
00:15:33,680 --> 00:15:35,600
acts. 
We need to ensure that data is 

267
00:15:35,600 --> 00:15:39,680
handled responsibly, securely 
and ethically. 

268
00:15:39,920 --> 00:15:45,400
Now, if this doesn't blow your 
trumpet like for me, then there 

269
00:15:45,400 --> 00:15:46,840
are so many good models out 
there. 

270
00:15:47,200 --> 00:15:51,440
The trick is: don't come up with 
your own. Look, see what best 

271
00:15:51,440 --> 00:15:54,360
practice is and adopt it. 
OK, and then if you 

272
00:15:54,360 --> 00:15:58,480
need to massage it, you can, but I 
would assume that every, I don't

273
00:15:58,480 --> 00:16:01,000
know, education government 
department in the world has very

274
00:16:01,000 --> 00:16:03,720
similar governance across it. 
You need to have internal 

275
00:16:03,720 --> 00:16:06,320
governance toned down, not overbaked,
'cause that's where bureaucracy 

276
00:16:06,320 --> 00:16:12,080
can kill good outcomes. 
So you need to apply data 

277
00:16:12,080 --> 00:16:13,240
governance. 
You can't ignore it. 

278
00:16:13,600 --> 00:16:17,640
And so how I deal with it when I 
have something that I don't enjoy

279
00:16:17,640 --> 00:16:20,560
as much, like some vegetables I 
don't like, is you eat them 

280
00:16:20,560 --> 00:16:22,640
first, right? 
Get them done first, and then get 

281
00:16:22,640 --> 00:16:27,400
on to the stuff you do enjoy, 
which might be improving 

282
00:16:27,760 --> 00:16:30,560
outcomes through the data you've
got and insights. 

283
00:16:30,560 --> 00:16:36,120
OK, so that's number six. 
And if we move on to number seven, we've 

284
00:16:36,120 --> 00:16:39,280
touched on this quite a bit 
lately, and that is AI and 

285
00:16:39,280 --> 00:16:40,480
machine learning. 
OK? 

286
00:16:40,480 --> 00:16:45,680
They're driving insights. 
But it's so important to 

287
00:16:45,880 --> 00:16:51,240
realise that if you do not have 
great data, your AI and machine 

288
00:16:51,240 --> 00:16:55,480
learning are a waste of time. 
So this is a prerequisite for 

289
00:16:55,480 --> 00:16:59,760
using your own data to
make informed decisions. 

290
00:17:00,640 --> 00:17:03,640
Data is not just about 
spreadsheets and dashboards. 

291
00:17:03,640 --> 00:17:06,720
And yeah, sometimes they're 
really good, but AI powered 

292
00:17:06,720 --> 00:17:09,720
analytics can uncover patterns 
that you can't. 

293
00:17:09,960 --> 00:17:14,079
It can predict trends, it can 
drive automated decision making. 

294
00:17:14,319 --> 00:17:18,920
And BAs need to understand how 
to interpret those insights, 

295
00:17:19,200 --> 00:17:22,880
communicate them effectively and
explain why maybe based on the 

296
00:17:22,880 --> 00:17:26,640
data that's been inputted into 
this, been consumed by these 

297
00:17:26,640 --> 00:17:31,560
tools, why those insights might 
be different to what was 

298
00:17:31,560 --> 00:17:34,000
expected. 
And that if you've pushed for 

299
00:17:34,000 --> 00:17:38,480
these tools early, when your 
maturity model is low, even 

300
00:17:38,480 --> 00:17:40,640
though you want to use them,
you want to get these outcomes, 

301
00:17:40,640 --> 00:17:44,560
every CIO and CTO in the world is 
pushing for these tools. 

302
00:17:45,120 --> 00:17:48,640
If your data is crappy, your 
insights are going to be crappy. 

303
00:17:50,760 --> 00:17:53,200
Number eight. 
This is so important. 

304
00:17:53,520 --> 00:17:57,680
And this is where we need to 
make sure that data is not owned

305
00:17:57,680 --> 00:18:03,240
by digital or IT, OK, per se. 
This is not an ivory tower exercise.

306
00:18:04,120 --> 00:18:08,680
Number eight is self-service analytics, OK?
And it empowers teams. 

307
00:18:09,280 --> 00:18:12,400
There is a lot of kit out there 
in the data space. 

308
00:18:12,400 --> 00:18:15,640
There are a lot of tools you 
could use and they really need 

309
00:18:15,640 --> 00:18:18,800
to be selected based on your 
environment. 

310
00:18:18,800 --> 00:18:21,600
You need to choose the right 
tool for the right job and the 

311
00:18:21,600 --> 00:18:25,280
right environment. 
So business unit users don't 

312
00:18:25,280 --> 00:18:27,440
want to wait for IT to generate 
a report anymore. 

313
00:18:28,400 --> 00:18:31,520
But not only that, your 
business users might be data 

314
00:18:31,520 --> 00:18:35,960
analysts, they may require data 
analyst capabilities and maybe 

315
00:18:36,040 --> 00:18:39,240
BAs outside of digital, and they 
need access to the data, they 

316
00:18:39,240 --> 00:18:42,680
need access to pipelines, they 
need access to continually 

317
00:18:42,680 --> 00:18:46,480
improve these; they need access to 
run their own jobs. 

318
00:18:46,920 --> 00:18:50,720
So what tool are you going to 
use as an interface layer? 

319
00:18:50,880 --> 00:18:53,360
So they don't have to be data 
engineers, but you've set it up 

320
00:18:53,360 --> 00:18:56,840
so they can build on top of that
infrastructure. 

321
00:18:57,080 --> 00:19:01,400
Again, the data fabrics, you 
know, pipelines, visibility, 

322
00:19:03,360 --> 00:19:07,000
modern analytical tools, right? 
They offer self-service 

323
00:19:07,000 --> 00:19:10,400
capabilities, meaning anyone can
access and visualize data. 

324
00:19:10,400 --> 00:19:13,160
So that's starting at the 
analytics end. 

325
00:19:13,320 --> 00:19:17,080
The analytics tools are now 
exposing the data pipeline so 

326
00:19:17,080 --> 00:19:20,160
you can see where the data came 
from and maybe know why you're 

327
00:19:20,160 --> 00:19:21,920
getting the insights you're 
getting. 

328
00:19:22,800 --> 00:19:26,840
BAs should help design intuitive 
interfaces and ensure that 

329
00:19:26,840 --> 00:19:31,000
stakeholders get the right 
insights and know why that data 

330
00:19:31,000 --> 00:19:37,880
point is the way it is. Number nine: data 
storytelling is a must-have 

331
00:19:37,880 --> 00:19:43,400
skill. Facts don't drive 
decisions, stories actually do. 

332
00:19:43,560 --> 00:19:50,120
And BAS need to go above and 
beyond charts and numbers and craft

333
00:19:50,120 --> 00:19:53,120
compelling narratives around 
data: 

334
00:19:53,600 --> 00:19:58,200
the collection, the ecosystem, 
the application framework, the 

335
00:19:58,200 --> 00:20:04,760
customer journey, to make 
insights clear and actionable 

336
00:20:04,840 --> 00:20:11,480
for stakeholders. And number ten: always 
align data strategy with 

337
00:20:11,480 --> 00:20:14,120
business goals. 
There is no point in having 

338
00:20:14,120 --> 00:20:19,720
great AI, pipelines, massaged data
if it's not going to be used. 

339
00:20:19,720 --> 00:20:23,000
If we go back to our HubSpot 
example, if we're surveying 

340
00:20:23,000 --> 00:20:25,840
customers on the features they 
enjoyed about our product 

341
00:20:25,840 --> 00:20:29,680
through SurveyMonkey, but we're 
never going to use that to 

342
00:20:29,720 --> 00:20:32,120
actually make a change to our 
product because our product 

343
00:20:32,120 --> 00:20:37,560
strategy doesn't incorporate 
enough, I guess, ad hoc customer

344
00:20:37,560 --> 00:20:40,160
feedback from the website, 
then don't do it. 

345
00:20:40,160 --> 00:20:43,720
What a waste of time. 
At the end of the day, data 

346
00:20:43,720 --> 00:20:47,440
management isn't just about 
technology, it's about business 

347
00:20:47,440 --> 00:20:50,840
value. 
Every data initiative should go 

348
00:20:50,840 --> 00:20:54,720
back to strategic goals or 
investment objectives and a bit 

349
00:20:54,720 --> 00:20:59,720
of business case model and 
explain whether that's reducing 

350
00:20:59,720 --> 00:21:05,000
cost or increasing efficiency or
improving customer experience. 

351
00:21:05,000 --> 00:21:08,000
Why do we need to invest in this
data project? 

352
00:21:08,520 --> 00:21:13,240
I've heard horror stories of IT 
or data teams building data 

353
00:21:13,240 --> 00:21:16,640
products, spending millions 
building data products that no 

354
00:21:16,640 --> 00:21:20,880
one wants to use. 
So what you might find is 2 

355
00:21:20,880 --> 00:21:23,320
things. 
In that case, 1, you haven't 

356
00:21:24,240 --> 00:21:26,720
captured requirements and you're 
not meeting objectives. 

357
00:21:26,720 --> 00:21:30,080
So therefore your user base, 
internal user base or your 

358
00:21:30,080 --> 00:21:34,000
customers are not getting what 
they asked for or what they 

359
00:21:34,000 --> 00:21:36,720
want. 
And there might be another 

360
00:21:36,720 --> 00:21:39,280
thing, another insight that I've
experienced. 

361
00:21:39,680 --> 00:21:42,440
Sometimes people want to fish 
for themselves. 

362
00:21:42,680 --> 00:21:48,320
So in a modern data analytical 
world, we need our users to be 

363
00:21:48,320 --> 00:21:51,600
able to fish on top of these 
tools in a secure, you know, 

364
00:21:51,600 --> 00:21:54,720
pond with fish with the rod that
we give them. 

365
00:21:55,240 --> 00:21:58,160
But it is no longer IT's job to 
own data. 

366
00:21:59,040 --> 00:22:00,600
I will see you next week.
