Episode Transcript
Honor Hey, Josh, how’s it going?
Josh It’s going well. Excited to be joining the MAD Data podcast. We have some really exciting guests, with Kate and Scott on the line, and I’m excited to dig into the conversation. Maybe just to start: I think folks know me. I’m Josh, CEO and co-founder of Databand, helping to deliver proactive data observability to data teams out there. Scott and Kate, I’d love a quick intro from you. If you don’t mind, maybe Kate, you start us out.
Kate Yeah, sure. Thanks for having me on your podcast, guys. Happy to be here, especially with Scott Taylor, the data whisperer. Such an honor. I’m Kate Strachnyi, the founder of DATAcated, which is essentially focused on all things data related: community building, course development, getting the message out there on the importance of data, and really bringing together all data professionals. My background didn’t start out in data analytics or data science, but I’ve been in the space for about the last eight years now.
Scott And Scott Taylor here, the data whisperer. Yes, I help calm data down, that’s what I do in data whispering. Thrilled as well to be on the MAD Data podcast and partnering up, as we often do, with Kate. Always thrilled to be chatting with her; we bandy about topics on a constant basis. But my background is more on the data management side. I worked for some iconic data brands like Nielsen and Dun & Bradstreet, WPP, and I’m out there now doing purely content: events, podcasts like this, videos, helping people understand the strategic value of proper data management. I work with brands as well as the enterprise side, to help them put together a data story for data management: why managing their data is so important to the organization.
Honor Very cool. That’s so awesome. Well, welcome, thanks for being here with us. I just wanted to give folks a bit of background on how we started on this conversation. We were talking previously about the idea of first mile data quality and how that is the foundation for the data quality that follows, if we want to take it all the way to the last mile. And I know, Kate, you are a runner, so we were discussing parallels between these things. So maybe we can start with you, Scott. Tell us a little bit about your philosophy around the importance of this first mile reliability.
Scott You can boil my data philosophy down to three words: truth before meaning. I believe you’ve got to determine the truth in data, and that comes from data management, data governance, data stewardship, master data, reference data, metadata, MDM, RDM, pm, all those foundational activities that enterprises engage in to create, curate and distribute this core foundational data content to the rest of the organization. You’ve got to get that stuff in line. You’ve got to get that data management act together before you spend too much time deriving meaning out of it through analytics, data science, visualization and all these other wonderful things that are probably more tangible and more visible to the business side. So I like to remind people: truth before meaning. It’s not chicken or egg here. It is egg and omelet. If you don’t have the truth in that data in that first mile, I don’t know how you’re going to make it to that last mile.
Honor I love that, and Scott, I always love hearing your soundbites. Egg and omelet, I will definitely steal that. So Kate, what are your thoughts around this, the dependency of the last mile of data quality on the first mile? I feel like oftentimes when we talk to folks in the community, we talk about first mile and last mile almost completely separately.
Josh I would just love to hear how you define first mile and last mile also. What distinguishes between those?
Kate Yeah, yeah. I was going to bring up that when Honor and I spoke for the first time about being on the podcast, we talked about how running a marathon is a good analogy for the conversation here today, where the first mile is really, you know, the first couple of miles that you’re running to start the marathon. The way we see it, that’s the data governance, the data wrangling, the truth-before-the-meaning type of work. And that’s extremely important. And then we get to the last bit of it, which is data visualization, the storytelling, pulling out the insights and getting the meaning from the data, which we can call that last mile of analytics, where we get the data into the right hands of the business in time so they can make decisions that are data driven. And what’s interesting is that I sit sort of on that last mile, where most of my work and efforts are around taking properly structured data that’s been collected the right way, with great data quality, and massaging that into a data visualization or insights or a story that we can bring over to the business. Now, when we talk about this, we do tend to have these conversations separately, where we have the data governance team and the data management team doing their thing. And then my expectation as a data analyst sitting all the way at the other side of the spectrum is that I trust that the data is structured and perfect and in the shape that I need it to be to visualize it. So going back to the running analogy, if I trip and break my ankle in the first mile, chances are I’m not going to make it to that last mile. It’s not going to be a good marathon, right?
But similarly, if we don’t get that data over that last mile and get it into the right hands of the business, then all the work that we’re doing in, let’s say, the first half of the marathon is sort of all for nothing, right? We’re struggling, we’re working hard, we’re putting in the effort, we’re spending the time. But if we don’t get it across the finish line, then why are we doing it, right?
Scott I kind of wince at the idea of Kate tripping and breaking her ankle
Kate so it will never happen
Scott So, well, that’s the point: data starts somewhere and it ends up somewhere. And to build on some of the stuff that Kate’s talking about, a lot of the attention and tangibility of data is where it ends up. I have a lot of fun talking about where it starts and reminding people it’s got to start in the right way, otherwise it’s not going to end up where you need it to be.
Honor Yeah, for sure. And I actually want to point to Josh, because Josh, you kind of represent almost the extreme first mile, right? Because we’re talking about proactive data observability. Not only are we talking about data governance, we’re also talking about the ability to see everything as early as the source, external data sources going into a data process. Can you maybe speak a little bit about that, literally the beginning of the first mile journey?
Josh Well, yeah, that’s why I was curious about Kate and Scott’s definition of what first and last mile mean. Because for a lot of teams there are so many miles, and there are a lot of different ways to cut and divide what exactly we’re talking about when we break up the different parts of the value chain. I think one way of looking at it is discrete ends of the process: you have your ingestion part, you have what’s sitting around in a data lake somewhere, you have the movement into the warehouse or the analytical layer or the data science layer, and those might be different miles that you’re going through. Or we could be talking about miles as divided by the roles in the team: the things that the data engineers take care of, which tends to be this upstream stuff, and then the stuff that the data analysts or scientists are building more downstream. So I think just hearing about how folks segment the value chain is interesting. Our philosophy is definitely, yeah, we take Scott’s suggestion to the extreme: we really want to see data coming in from source locations in the right form, structurally correct, consistent, and see that it makes it into the process from the get-go looking healthy and reliable. Now there are definitely cases where even if you have perfect data from the source, some stuff might get messed up downstream, and you want to have tools in place that help you catch those issues as well. But having something in place at the get-go, when you’re just beginning that pull from the external API, or you just have that data drop in from the outside data provider, or you’re pulling from that database somewhere else in the business, being able to know right up front, OK, at least my inputs are OK and I can trust that, and if something breaks, now I know where to focus my attention. We feel that’s a really good approach, and a proactive one, because the earlier in the process you can catch something, the less time is lost waiting for it to hit the dashboard, waiting for it to hit that table in the warehouse where you would otherwise maybe catch it. So we feel pretty strongly about that approach: starting from the source and working our way back from there.
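A minimal sketch of the kind of up-front input check Josh describes, assuming a batch of records arrives as a list of dictionaries. The schema, column names and function name are hypothetical illustrations, not Databand’s product or API.

```python
from typing import Any

# Hypothetical expected schema for an incoming feed: column name -> Python type.
EXPECTED_SCHEMA = {"customer_id": str, "region": str, "amount": float}


def check_batch(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable structural problems found in one incoming batch."""
    problems = []
    if not records:
        problems.append("batch is empty")
    for i, rec in enumerate(records):
        # Columns the source was expected to send but did not.
        missing = EXPECTED_SCHEMA.keys() - rec.keys()
        if missing:
            problems.append(f"record {i}: missing columns {sorted(missing)}")
        # Columns that are present and non-null but have the wrong type.
        for col, typ in EXPECTED_SCHEMA.items():
            value = rec.get(col)
            if value is not None and not isinstance(value, typ):
                problems.append(
                    f"record {i}: {col} is {type(value).__name__}, expected {typ.__name__}"
                )
    return problems
```

Running a check like this right after the pull, before anything is written downstream, is one way to know the inputs are OK and to narrow down where to look if something breaks later.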
Kate Yeah. You know, I’ll add one quick thing. I know we’re talking through a running analogy here, but one analogy that always comes to mind when we’re discussing the source, like you mentioned, and then going all the way to ingestion, is water: rainwater being processed over time, coming in from the filters underground, wherever you get your water, right, and then going through all the pipelines in your house. Think of data pipelines here. If you get dirty water and you clean it at the beginning, it’s a lot easier to keep it clean throughout that whole process, and then the person drinking the water may, you know, not throw up after. Whereas if the water has been poisoned or is dirty, then chances are it’s going to be a lot more difficult for you to maintain that. So I agree: the sooner you can get to it, the better and the easier everybody’s life is going to be.
Josh I actually love the water pipe analogy, not just because it’s also called a pipeline. And I will pause on the Databand commercial in a sec. But one other aspect of our approach that relates to this analogy is that we don’t just plug in at the integration point and see the water coming into the pipeline from the front. We’re also watching the pipeline: we want to see that the structure of the pipeline is properly working and the water will be passing through, and then, in the pipe, we also want to see the flow of water that’s actually occurring. The reason that’s really powerful is that if we’re looking at the final tank that the water is being delivered to and we see the water level going down, you want to be able to go upstream to the pipe and say, OK, here is the hole, and I can associate this hole that is leaking with a certain time when that pipeline ran or the data was coming in, and with how the structure of the pipeline actually works. So being able to tease through those different levels of the stack, I think, is also fundamental to how we view the world.
Scott All this water talk is making me thirsty as well.
Scott I’d define the first mile as even earlier, around an enterprise or an external data provider. You know, what are they covering? Why are they tracking what they’re tracking? What are, as I said, the master data, reference data, metadata requirements of this enterprise? Did they define customer the right way? Do they have a common hierarchy? Do they have taxonomies that at least leverage what the business is trying to do? Do they have the right geographies? Do they have lots of duplicates? Did they search before they created a customer? With external data providers, they all do that work to create that content. But does that content fit into the structure that the enterprise uses for integration and interoperability and so on? So maybe it’s training before the first mile, but I like literally starting on day two: cataloging, business glossaries, all that kind of really, really basic stuff.
Josh It’s interesting because in our last episode, we had Johannes from Komodo Health and we were talking with him about the SLAs or service contracts that they have in place with their upstream external data providers. What happens if they catch an issue coming from one of those sources? What kind of bargaining power do they have to make them improve it? And what does that process look like? I’m curious on that point. As you travel further and further upstream, do you see an emerging standard in how different data providers are guaranteeing the quality of data that they’re sending into an organization? Are you seeing any patterns there yet?
Scott No, the pattern is there’s no pattern that I’ve seen. My whole career in business, when I wasn’t working for myself, has been with external data providers. And again, the large ones that are kind of iconic, that’s Nielsen and Kantar and so on, their mission is to create syndicatable data content that entire markets can use, so they’re really focused on making sure they get it as right as possible. This blossoming, this kind of Cambrian age we’re going through, an explosion of external data providers, whether it’s alternate data, third party data, whatever they’re called, data marketplaces, I think creates a whole environment where the inability to measure and match and understand one source versus another is going to be a continuing challenge. And I have some issues with even the word quality, because it’s so emotional. It’s so subjective. Everybody’s coming in with an opinion. You know, all data has quality, all right? It’s either good, great, or it sucks. Obviously the external providers are all going to talk about their quality, quality, quality. But the relevance of that quality, whether that quality really meets the business objective, is only one of a couple of decisions people make when looking at external data. I think as important, and sometimes even more important, is the coverage of that data. Does it cover, as I mentioned, an entire market? Can the enterprise put that data into play in some sort of operational way instead of just getting a bunch of little attributes for a test? Does that data have a structure that’s agreeable, and those sorts of things. But invariably people would ask us, at Nielsen and at D&B, you know, what’s your quality level? And it’s always like, you know, best available, 99.99 percent, and then somebody finds those two records in Bulgaria that they don’t like, or the one they drove by that’s closed. And so you get into this unending rabbit hole of discussion about what quality means. So I try to take the quality conversation and put it aside for a minute. Not that it’s not important to have great quality. But if that’s the lead and that’s all you’re looking at, ironically, you’re not going to get what you really want.
Honor Do you mean it’s because it’s too vague or too broad, that we won’t be able to make it actionable unless we place more specific definitions around it?
Scott I kind of feel the opposite. I mean, I agree with the first part. Yeah, because it’s vague and it’s broad and it’s subjective and it’s emotional and everybody’s got an opinion and it’s always relative. Instead, you know, let’s talk about the value of this data, and some of it’s just in the words you use. The value of this data is really important to your organization because it aligns with the business objectives. It covers your business universe. It’s fully integratable, and it’s updated at a cadence that’s important to what you’re trying to do. We have the right kind of SLAs in place in terms of fixing, correcting or filling in gaps that you might find, because it’s always a moving target when it comes from a syndicated data provider. But there’s a confidence from the provider that needs to be established so that the enterprise can trust that source. As an example, when I was at Nielsen, we had a thirty-thousand-record supermarket database. That doesn’t sound like a lot when D&B is talking about 400 million records. But those thirty thousand records represented a hundred percent of the supermarket universe in the United States. That’s the number we talked about. So if you talk about coverage, it’s not about number of records, it’s about percentage coverage of your business. That number can go up and down. That could change across different trade channels, but we at least provided a universal definition that people could benchmark their activity off of. And that’s an extremely powerful position to be in, and it offers a tremendous amount of value to those enterprises who are trying to figure some of that stuff out. So, you know, a long-winded way of saying quality is a thing, but it’s not the only thing. And there are a lot of other vectors you’ve got to look at on external data providers, and even on your own data that you’re managing, before you just go all in on quality.
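Scott’s distinction between record count and percentage coverage can be sketched as a small calculation. This is an illustration only: it assumes the business universe and each provider file can be reduced to sets of entity identifiers, and the function name and sample IDs are hypothetical.

```python
def coverage_pct(provider_ids: set[str], universe_ids: set[str]) -> float:
    """Percentage of the business universe that a provider file covers."""
    if not universe_ids:
        raise ValueError("universe must not be empty")
    # Only records that match an entity in the universe count toward coverage.
    covered = provider_ids & universe_ids
    return 100.0 * len(covered) / len(universe_ids)


# Hypothetical universe of four stores the business cares about.
universe = {"store-1", "store-2", "store-3", "store-4"}

# A small file that matches every store in the universe.
small_file = {"store-1", "store-2", "store-3", "store-4"}

# A much larger file whose records mostly fall outside the universe.
large_file = {f"other-{n}" for n in range(1000)} | {"store-1"}

print(coverage_pct(small_file, universe))
print(coverage_pct(large_file, universe))
```

Under these assumptions the four-record file scores higher than the thousand-record file, which is the point of benchmarking on coverage of the universe instead of raw volume.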
Kate Yeah. Scott, I just wanted to add, I really love the idea of potentially changing it from being called the quality of data to the value of data, because that would also encourage the business and everybody else in an enterprise to actually care. Because once you talk about data quality, everyone’s like, you know, OK, good night, I don’t want to be here. This sounds like we’re going to be doing data entry all night and cleaning stuff up. But once you talk about improving the value of our data, I think that sets off different alerts and alarms in people’s brains, at least for me at work. I’m like, oh yeah, the value of data, it completely makes sense. Hmm. I haven’t heard you say this before, so this is surprising new stuff.
Scott Imagine we’ve been on like a hundred interviews together. I’m still coming up with new stuff. I never
Josh had data. We get it out of focus. But I was curious, because in my head, when I hear that, I have a distinction between what I would consider a valuable data set and what I would consider a high quality data set, and I’m trying to reconcile that and maybe collapse that difference as we talk through it. Do you have a sense of separation there? Take a feed of data that the business depends on. It’s really valuable to the business: you know, the company’s product is built on the ratings coming in from Nielsen, and they can’t run their business without that. It’s really valuable, right? At the same time, it can be very unreliable. It can be low quality in the measurable sense. It can have a lot of inconsistencies from Europe, missing values, be late all the time. How do you distinguish between those, I guess, more conventional ways of looking at what’s valuable and what’s quality?
Scott You know, those things that you’re talking about, latency, dependability, these are all things that people have to look at when they’re evaluating third party data providers, and actually when evaluating their own internal processes too. These large branded data suppliers, they know that. Nielsen is a great place to have grown up in the data space because they take data seriously. It’s what they make. You know, a lot of people like to bandy about in the market that every company is a data company now. They’re not. Unless you make data, you’re not a data company. Sorry to pop some folks’ bubbles. But they take a lot of pride and really understand their place in the ecosystem. And sometimes some of them get a little ahead of themselves in terms of licensing and how they try to get lock-in. That’s all the business model. But just talking about the content: if that content isn’t right and isn’t trustworthy, those companies aren’t going to depend on it. And a really noble goal is to try to become a standard in some sort of vertical or marketplace for something that is related to the data content you have, whether it’s Nielsen ratings or credit ratings or whatever those kinds of things are, or even just simply, here is the comprehensive universe file of this type of entity, be it locations or brands or consumers, whatever it happens to be.
Josh Interesting. So as you’re talking, I’m reforming definitions in my head, and I’m thinking: if I bought a car and the car didn’t turn on and it didn’t drive down the street, I wouldn’t say, wow, what a valuable car, that’s terrible quality. I would say, what a terrible investment this was, and how low value it is. So maybe the distinction that I’m really thinking about is the potential of data, potential energy versus the actual kinetic energy, and that’s the real value that I’m seeing as we’re working out the way we think about it. Interesting.
Honor I like that idea. I think that’s that’s really interesting. So if we were to just from that exchange, how would we want to maybe rephrase it? Would it make more sense to maybe talk about impact of this data like if like, rather than necessarily like dividing value from quality? Does that make more sense like business impact of data?
Josh I’m so curious how you measure it. Yeah, I mean, maybe it’s the data nerd in me, but whether it’s called value, quality or whatever word we want to put on it, what am I actually demanding of my data providers that goes into the contract that says you get paid every month? What do I expect of them?
Scott You know, I mean, having been one of those vendors, you come in with your value proposition and say, this is what we’re going to do. Now some folks want to make sure you’re going to do every one of those things, under certain timing, with certain SLAs in place. But even the ones that didn’t, they had a license to this set of data, and they were depending, based on what we presented, on it doing the work that we suggested it was going to do. There’s always back and forth, and it depends on the kind of data, but all the data services that I’ve worked with were always stuff that was going right into somebody’s operational systems, like, you know, hard-wiring it into their ERP systems, their MDM systems, really basic foundational stuff, versus what most data providers actually provide, which is some sort of analytical enrichment, metric or score. You know, I always talk about it as rows and columns. People are really good at adding columns. It’s really hard to align rows. I’ve always been in the row business. The columns are about the rows. There are lots of people who sell columns. There are only a few folks who will sell the rows. So, very philosophically here, almost symbolically: if you don’t know what rows you have, then the columns are worthless. If you do know what rows you have and you only have a certain number of columns, you can go to somebody and say, here’s a bunch of rows, add more of these columns, whatever that indicator or attribute happens to be. Feature engineering on the instances. Every time I come to it, it always comes back to me as rows and columns, which is also a really easy way to explain it to senior business leadership, because they look at tables. They look at reports. They’re not looking at graph databases and those chrysanthemum things that you can kind of move around that we all get dazzled by. They’re just like, you know, red means stop, green means go. Make it simple. Make sure it’s right.
Kate Scott, I just wanted to add: I don’t have the same background as you in terms of working for a vendor or a data provider. But my assumption would be that, as a receiver of data, I would still expect to see some sort of metrics around data accuracy. I’d have some level of expectations around completeness and making sure that the data, shall I say, quality is good. I know we’re going back and forth on value versus quality here. I think it’s still a little up in the air in terms of which term fits the best. Maybe both, right? But we still want to make sure that the data we are receiving, that water that’s flowing from the new pipeline that we’re adding to our main water pipelines, is not going to mess everything up for us, and make sure that we can actually fit all that in there.
Scott Yeah, yeah, those are all valid requests. Those are all the kinds of things that, you know, I was certainly part of providing: number of records, number of attributes, what’s the fill rate, what’s the update frequency. All those kinds of things go into the bigger idea of quality. The reason I really pick on quality is that enterprises, the folks I always talk to, are trying to sell in data management programs. And what I counsel them is, if you go to your CEO and your board and talk about how you want to improve data quality, you’re not going to get funded. A lot of the pundits in the data management space have been talking about how quality is so important for decades, and we’re still not getting the respect and the funding and the engagement we need. So I take a step back and go, quality is a pitch. It ain’t working. It’s not landing right. People aren’t getting it. Change the pitch.
Honor What pitch do you recommend that would get the funding?
Scott You know, start with the reasons why managing data is of strategic importance to your enterprise. Show why it will enable your strategic intentions, why managing data and actually analyzing data, both truth and meaning, are going to help the company get to where it needs to go. And where it needs to go isn’t better data quality. Where it needs to go isn’t better feature engineering. It isn’t, should we use SQL or NoSQL or Python or all these other things. Frankly, with all due respect, it isn’t “we need better data observability.” It’s one of three buckets of value: we’ve got to grow the business, we’ve got to improve the business, we’ve got to protect the business. Grow might be increased sales. Improve might be operational efficiency. Protect might be mitigating risk. The beauty of the space we are all in is that data can do all three of those things, sometimes with the same record, if it’s stewarded and governed properly. And there isn’t another department in an enterprise that can claim it can help grow, improve and protect the business at the same time with the same effort. So I find that a really exciting part of the opportunity. As folks often say, align it to the business objectives and you’ve got a winner. Talk about the process and the how and the technology, and you’re going to have a really small audience.
Kate I just wanted to share a quick story relevant to what Scott just said about educating and telling the entire enterprise the importance of proper data quality. We were working with a CRM tool, a very well-known CRM tool, and we were using it to log our pipeline, you know, keeping track of what clients we were working with. This is a very large organization, and the people entering data into the CRM tool didn’t really understand the point: why do I have to fill out these 89 specific fields every time we get a new client or have a new conversation? So they went about it in the quickest way possible, you know, just filling in the required fields or typing in “asdfg,” whatever it takes, just so they can click complete and go on with their day. And that resulted in extremely poor data quality, which in the end was actually feeding a very important piece of analytics. It was going into a dashboard that was informing the CEO of the company what he should do going forward, right? His revenue planning, his cost management. So it was very, very important. And the company undertook an initiative to educate the individuals on the importance of data quality, why their job matters and why they really need to take the time and input the right data. But unfortunately, that didn’t really change much. So I know it’s important to educate. But in the end, Scott will tell you it’s all about the people, right? We actually have to train the people and really, really get it into their heads why this is important, and maybe spend some more time training them on the how: how do they actually get some of the data points that maybe they didn’t have easy access to, so that they put in whatever or selected any random thing from the dropdown menu in the CRM tool? So, yeah, it goes full circle.
The data is important, but if the people are not on board with the importance of data quality, it’s going to be hard to get anywhere.
Honor That’s a really interesting story. It actually makes me think of some of Databand’s clients, where some of their sources are traditionally just low tech, like CSV sources. It was really hard to catch these errors or do anything about them, to your point, other than using a tool that would allow teams to see them coming. So I mean, I don’t know that that necessarily addresses the problem, but it buys us a little bit of time. Josh can probably speak a little bit more to that. He worked a lot more closely with those folks.
Josh Well, I think in general, being able to see those problems coming in and attribute them to a point in time is a big factor, right? So if you have that data pull happening all the time from the CRM, it might be a routine behavior where folks aren’t adding data properly, or it might be that you changed up the sales org and people are learning the system. Say it’s from Salesforce: we’re pulling that data set from that API, and usually I expect a certain level of completeness in the data set. Seeing that all of a sudden go to 20 percent complete when it was usually at 80, and being able to attribute that to a particular point in time, to an actual data ingestion from that source, means you can respond quickly: hey, we need to get this data in. That’s before it’s two weeks later and people are looking at the dashboards and noticing, oh, it looks like we have lower sales in New York than we normally do. That’s unusual, and it’s because nobody’s inputting the geography. So my main comment on it is that being able to catch those issues early, and being able to attribute the start of the problem on the right timescale, is how we’ve seen companies get out of that kind of hole. And I think what’s interesting about the example also is that it’s another case where the data is telling you about a problem. It’s not solving a problem for you. The problem is the organization, the organizational change that you need to make in order to incentivize those folks to get the data in the right way. But at least knowing where you stand, that’s a good place to start. Yeah.
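The completeness drop Josh describes, from around 80 percent to 20 percent, tied to a specific ingestion run, could be detected along these lines. This is a rough sketch under stated assumptions: each run is a list of dictionaries, an empty string or missing key counts as unfilled, and the function names, the three-run baseline and the 0.5 threshold are all hypothetical choices, not how any particular tool does it.

```python
from statistics import mean


def completeness(records: list[dict], field: str) -> float:
    """Fraction of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def first_bad_run(runs, field, min_ratio=0.5, baseline_runs=3):
    """Return the ID of the first run whose completeness drops sharply.

    `runs` is a chronological list of (run_id, records) pairs. A run is
    flagged when its completeness falls below `min_ratio` times the mean
    completeness of the previous `baseline_runs` runs.
    """
    history = []
    for run_id, records in runs:
        score = completeness(records, field)
        if len(history) >= baseline_runs:
            baseline = mean(history[-baseline_runs:])
            if score < min_ratio * baseline:
                return run_id
        history.append(score)
    return None
```

Returning the run ID is what makes the issue attributable to a point in time: the team learns which pull first went wrong instead of discovering it two weeks later on a dashboard.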
Scott One of my favorite examples of that is when you look at a pie chart of the categories of somebody’s business and there’s a big slice called other. Hmm. Lamia, all those things are just rampant now. And then the marketing department will say, we have a channel-driven marketing strategy, and they’re working off of data that says other. So those things need to get fixed. It’s a challenge. It definitely goes across the organization. But that’s where you start to see some of the symptoms of the lack of data governance, the lack of data management.
Kate Yeah, it’s interesting when I’m filling out a form form that doesn’t matter too much, if I see other, I’m clicking it. It could be anything if I if I see a drop down to 10 options and one of them is other, I’m like, OK, other
Honor let’s just say,
Kate Hey, it’s like, let’s keep going. So it’s the curse of the other. I get it.
Honor So just to recap really quickly, I think we touched on a few areas. We started off talking about the dependency of the last mile on the first mile. Then Scott pointed out that really it’s not so much about quality, it’s more about which pieces matter, right? We were talking about truth before meaning, and then we’re really asking for truth that actually matters to the business. I’m paraphrasing there. And then Kate, you pointed out that it’s all about people management, as well as training and ensuring that everyone is on the same page and recognizes the importance of data quality, rather than just selling data quality as a concept, which is probably going to be meaningless to certain teams. And then, Josh, you mentioned the value of catching things early in the data process. So if we were to really sum up, what would be a good recommendation for teams to turn data quality from an abstract theory into practice? What’s the best first step to take? I’ll go around the room and start with Scott.
Scott The first step, I think, is always to just sort of assess where you are: what are the critical domains you need to cover, and how well are those being managed? I mean domains in the master data sense, not the data mesh sense. First there is always the relationship data: customers, vendors, partners, prospects, citizens, patients, clients, whatever you call those things. And then the brand data: products, services, offerings, banners, however you bring value to the relationships you’ve got through those brands. Make sure that stuff is as right as possible, because that is truly the foundation. Those are truly the nuts and bolts of every other kind of analytics, propensity to do whatever, machine learning, A.I., all those other things, so that stuff’s got to be right first. And those are, again, common domains that every organization has. If you don’t have relationships, you don’t have a business. If you don’t have brands, you don’t have anything to sell to those relationships. So that’s really the basis of being a company: those two things, relationships and brands.
Honor Kate, what about you? What’s a good first step that you would recommend?
Kate Yeah. You know, if we keep talking from the perspective of getting people to care and getting people trained up on proper data quality and input, I think a good first step would be to show everybody where this ends up, right? Sort of going back to the water analogy, show them the human drinking the water and ask them, do you really want to poison this grandmother, or some person people might care about? Show them maybe the dashboard that is seen by the CEO, who’s using it to make decisions, and show them where their data entry is impacting the end product. And then also, to Josh’s point, if we can go back and find the point in time when this data entry mistake or lack of care happened, we can likely also identify the individual that made the data entry error and hold them accountable in both ways, positive and negative, right? If you see somebody consistently skipping fields or rushing through the process of any sort of data ingestion into the company, point it out and say, Hey, you made this mistake 10 times in the past week, maybe you need some more training or something else. But also recognize the individuals that are bringing in the good quality data, and maybe give them some sort of recognition as well.
Honor Josh, what’s your recommended first step?
Josh My recommendation is to inventory the problems that you’re facing so that you can prioritize your attention on what to fix first and build out a solid plan for how you want to drive change. I would say we see two kinds of initiatives. There are ones that are driven from impulse: observability seems important and we want a tool for it. And the other type of approach that we see is: we’re struggling with this issue. We noticed that one of our sources is not sending us data at the right time. We may also have this other issue of schema changes coming from this location that’s taking us down. My analyst has raised several times that the dashboards they’re seeing are not accurate, and that relates to so-and-so. Get some inventory of the big issues that are really causing pain in the organization, so that you can identify the right areas of investment to actually solve them. That would be my major recommendation on how to get started with data quality. It needs to meet organizational change at some point and get buy-in from the folks at the top, who might not be in the weeds quite as much. But just like we’ve been saying the entire podcast, collect data about what’s hurting the company and build a strong case for the kinds of tools or approaches or changes that can help you fix it.
Honor All right. And before we wrap, Scott, you have a book about data governance and data management. Can you show us and maybe give us a quick commercial?
Scott I just happen to have a copy of the book right here. Funny you should mention that. Telling Your Data Story: Data Storytelling for Data Management. Ninety-nine percent buzzword free. I didn’t want to overpromise, so I kept it at that level. But it’s my take on how to sell in data management in an organization. Looking at the space, I realized data storytelling as a discipline, as important as it is, is focused primarily on that last mile. It’s focused on taking data, putting it in a business context, and driving business action from it. How do we visualize it? How do we help the folks who are going to use it put it in play? Super important, but where’s the data story about making the data, about that first mile? And so that was the endeavor I undertook in putting this book together. That’s why the subhead is data storytelling, but for data management. And every organization needs both. It’s not like Sophie’s Choice here, where you’ve got to pick. They all need both, every enterprise. But I’m trying to help the data management community get their voice out there, to get a seat at the table, to make sure they get the proper funding and support and so on.
Honor Very cool. And Kate, tell folks a little bit more about DATAcated and who the community is for.
Kate Yeah, happy to. So the DATAcated community, we’re calling it the DATAcated Circle, is really for all data professionals. So that includes students, data analysts, data scientists, folks who are focused on data governance, data management, data engineering. It’s a very broad spectrum of data professionals that I would like to bring together. And yeah, we’re trying to cover it all. We have discussions. We have a book club, we have monthly live sessions. And interestingly, Scott Taylor was our very first speaker at the DATAcated Spotlight for December. So we have sessions booked out for the rest of the year that folks can join and network, and it’s really like a LinkedIn for data professionals. Love it.
Scott That’s awesome. And I’m member number one, if you don’t mind me mentioning it. I was member
Honor number one
Scott literally. Was I your number
Josh one?
Kate He was the very first member. We have a screenshot of the text.
Josh And if I join now, what number member will I be?
Kate You will be probably thirteen hundred or fourteen hundred something.
Honor Good. Yeah, you better snag your spot, Josh.
Scott Kate Moss, she makes it happen.
Honor Well, thank you so much for coming on. This was such a fun conversation, and I’m sure we’re going to continue this discussion. Thank you again. Take care. Bye bye.
Kate Thank you.
Josh Same, next time.
Scott You can boil my data philosophy down to three words: truth before meaning. So I believe you’ve got to determine the truth in data. That comes from data management, data governance, data stewardship, master data, reference data, metadata, MDM, RDM, PIM, all those foundational activities that enterprises engage in to create, curate and distribute this core foundational data content to the rest of the organization. You’ve got to get that stuff in line. You’ve got to get that data management act together before you spend too much time deriving meaning out of it through analytics, data science, visualization and all these other wonderful things that are probably more tangible and more visible to the business side. So I like to remind people: truth before meaning. It’s not chicken or egg here. It is egg and omelet. If you don’t have the truth in that data in that first mile, I don’t know how you’re going to make it to that last mile.
Honor I love that, and Scott, I always love hearing your soundbites. Egg and omelet, I will definitely steal that. So Kate, what are your thoughts around this, the dependency of the last mile of data quality on the first mile? I feel like oftentimes when we talk to folks in the community, we talk about first mile and last mile almost completely separately.
Josh I would just love to hear how you define first mile and last mile also. What distinguishes those?
Kate Yeah, yeah, I was going to bring that up. When Honor and I spoke for the first time about being on the podcast, we talked about how running a marathon is a good analogy for the conversation here today, where the first mile is really, you know, the first couple of miles that you’re running to start the marathon. And the way we see it, that’s the data governance, the data wrangling, the truth before the meaning type of work. And that’s extremely important. And then we get to the last bit of it, which is data visualization, the storytelling, pulling out the insights and getting the meaning from the data, which we can call that last mile of analytics, where we get the data into the right hands of the business in time so they can make decisions that are data driven. And what’s interesting is that I sit on that last mile, where most of my work and efforts are around taking the properly structured data that’s been collected the right way with great data quality and massaging that into a data visualization or insights or a story that we can bring over to the business. Now, when we talk about this, we do tend to have these conversations separately, where we have the data governance team and the data management team doing their thing. And then my expectation as a data analyst sitting all the way at the other side of the spectrum is that I trust that the data is structured and perfect and in the shape that I need it to be to visualize it. So going back to the running analogy, if I trip and break my ankle in the first mile, chances are I’m not going to make it to that last mile. It’s not going to be a good marathon, right?
But similarly, if we don’t get that data over that last mile and get it into the right hands of the business, then all the work that we’re doing in, let’s say, the first half of the marathon is sort of all for nothing, right? We were struggling. We’re working hard, we’re putting in effort, we’re spending the time. But if we don’t get it across the finish line, then why are we doing it, right?
Scott I kind of wince at the idea of Kate tripping and breaking her ankle
Kate so it will never happen
Scott Well, but that’s a point: data starts somewhere and it ends up somewhere. And to build on some of the stuff that Kate’s talking about, a lot of the attention and tangibility of data is where it ends up. I have a lot of fun talking about where it starts and reminding people it’s got to start in the right way, otherwise it’s not going to end up where you need it to be.
Honor Yeah, for sure. And I actually want to point to Josh, because Josh, you kind of represent almost the extreme first mile, right? Because we’re talking about proactive data observability. Not only are we talking about data governance, we’re also talking about the ability to see everything as early as the source, external data sources going into a data process. Can you maybe speak a little bit about that, literally the beginning of the first mile journey?
Josh Well, yeah, that’s why I was curious about Kate and Scott’s definitions of what first and last mile mean. Because for a lot of teams there are so many miles, and there are a lot of different ways to cut and divide what exactly we’re talking about when we break up the different parts of the value chain. I think one way of looking at it is discrete ends of the process: you have your ingestion part, you have what’s sitting around in a data lake somewhere, you have the movement into the warehouse or the analytical layer or the data science. Those might be different miles that you’re going through. Or we could be talking about miles as divided by the roles in the team: the things that the data engineers take care of, which tends to be this upstream stuff, and then the stuff that the data analysts or scientists are building more downstream. So I think just hearing about how folks segment the value chain is interesting. Our philosophy is definitely that we take Scott’s suggestion to the extreme. We really want to see data coming in from source locations in the right form, structurally correct and consistent, so that it makes it into the process from the get-go looking healthy and reliable. Now there are definitely cases where even if you have perfect data from the source, some stuff might get messed up downstream, and you want to have tools in place that help you catch those issues as well. But it helps to have something in place at the get-go, when you’re just beginning that pull from the external API, or you just have that data dropped in from the outside data provider, or you’re pulling from that database somewhere else in the business. Being able to know right up front, OK, at least my inputs are OK and I can trust that. And if something breaks, now I know where to focus my attention.
We feel that’s a really good approach, and a proactive one, because the earlier in the process you can catch something, the less time is accrued waiting for it to hit the dashboard, or waiting for it to hit that table in the warehouse where you would otherwise maybe catch it. So we feel pretty strongly about that approach: starting from the source and working our way back from there.
Kate Yeah. You know, I’ll add one quick thing. I know we’re talking through a running analogy here, but one analogy that always comes to mind when we’re discussing the source, like you mentioned, and then going all the way to ingestion, is water. Rain water being processed over time, coming in from the filters underground, wherever you get your water, right, and then going through all the pipelines in your house. Think of data pipelines here. If you get dirty water and you clean it basically at the beginning, it’s a lot easier to keep it clean throughout that whole process, and then the person drinking the water may, you know, not throw up after. Whereas if the water has been poisoned or is dirty, then chances are it’s going to be a lot more difficult for you to maintain that. So I agree: the sooner you can get to it, the better and the easier everybody’s life is going to be.
Josh I actually love the water pipe analogy, and not just because it’s also called a pipeline. I will pause on the Databand commercials in a sec. But one other aspect of our approach that relates to this analogy is that we don’t just plug in at the integration point and see the water coming into the pipeline from the front. We’re also watching the pipeline itself. We want to see that the structure of the pipeline is working properly and the water is passing through the way it should. And then in the pipe, we also want to see the flow of water that’s actually occurring. The reason that’s really powerful is that if we’re looking at the final tank that the water is being delivered to and we see the water level going down, we want to be able to go upstream to the pipe and say, OK, here is the hole, and I can associate this leaking hole with a certain time when that pipeline ran or the data came in, and with how the structure of the pipeline actually works. So being able to tease through those different levels of the stack, I think, is also fundamental to how we view the world.
Scott All this water talk is making me thirsty as well.
Scott I’d define the first mile as even earlier, whether at an enterprise or an external data provider. You know, what are they covering? Why are they tracking what they’re tracking? What are, as I said, the master data, reference data, metadata requirements of this enterprise? Did they define customer the right way? Do they have a common hierarchy? Do they have taxonomies that at least leverage what the business is trying to do? Do they have the right geographies? Do they have lots of duplicates? Did they search before they created a customer? With external data providers, they all do that work to create that content. But does that content fit into the structure that the enterprise needs, the one they use for integration and interoperability and so on? So maybe it’s training before the first mile, but I like literally starting on day two: cataloging, business glossaries, all that kind of really, really basic stuff.
Josh It’s interesting because in our last episode, we had Johannes from Komodo Health and we were talking with him about the SLAs or service contracts that they have in place with their upstream external data providers. What happens if they catch an issue coming from one of those sources? What kind of bargaining power do they have to make them improve it? And what does that process look like? I’m curious on that point. As you travel further and further upstream, do you see an emerging standard in how different data providers are guaranteeing the quality of data that they’re sending into an organization? Are you seeing any patterns there yet?
Scott No, the pattern is there’s no pattern that I’ve seen. My whole career in business, when I wasn’t working for myself, has been with external data providers. And again, the large ones that are kind of iconic, that’s Nielsen and Kantar and so on. Their mission is to create syndicatable data content that entire markets can use, so they’re really focused on making sure they get it as right as possible. This blossoming, this kind of Cambrian age we’re going through here, with an explosion of external data providers, whether it’s alternative data, third party data, data marketplaces, whatever they’re called, I think creates a whole environment where the inability to measure and match and understand one source versus another is going to be a continuing challenge. And I have, just poetically, some issues with even the word quality, because it’s so emotional. It’s so subjective. Everybody’s coming in with an opinion. You know, all data has quality, all right? It’s either good, great, or it sucks. But obviously the external providers are all going to talk about their quality, quality, quality. The relevance of that quality, whether that quality really meets the business objective, those are only a couple of the decisions people make when looking at external data. I think as important, and sometimes even more important, is the coverage of that data. Does it cover, as I mentioned, an entire market? Can that enterprise put that data into play in some sort of operational way, instead of just getting a bunch of little attributes for a test? Does that data have a structure that’s integratable? Those sorts of things. But invariably people would ask us at Nielsen and at D&B, you know, what’s your quality level? And it’s always like, you know, best available, ninety-nine point nine nine percent, and then somebody finds those two records in Bulgaria that they don’t like, or the location they drove by that was closed.
And so you get into this unending rabbit hole of discussion about what quality means. So I try to take the quality conversation and put it aside for a minute. It’s not that it’s not important to have great quality. But if that’s the lead and that’s all you’re looking at, ironically, you’re not going to get what you really want.
Honor Do you mean it’s because it’s too vague or too broad, so we won’t be able to make it actionable unless we place more specific definitions around it?
Scott I kind of feel the opposite. I mean, I agree with the first part. Yeah, because it’s vague and it’s broad and it’s subjective and it’s emotional and everybody’s got an opinion and it’s always relative. Instead, let’s talk about the value of this data, and some of it’s just in the words you use. You know, the value of this data is really important to your organization because it aligns with the business objectives. It covers your business universe. It’s fully integratable, and it’s updated at a cadence that’s important to what you’re trying to do. We have the right kind of SLAs in place in terms of correcting or filling in gaps that you might find, because it’s always a moving target when the data comes from a syndicated data provider. But there’s a confidence from the provider that needs to be established so that the enterprise can trust that source. So as an example, when I was at Nielsen, we had a thirty-thousand-record supermarket database. That doesn’t sound like a lot when D&B is talking about 400 million records. But those thirty thousand records represented a hundred percent of the supermarket universe in the United States. That’s the number we talked about. So if you talk about coverage, it’s not about the number of records, it’s about percentage coverage of your business. That number can go up and down. That could change across different trade channels, but we at least provided a universal definition that people could benchmark their activity off of. And that’s an extremely powerful position to be in, and it offers a tremendous amount of value to those enterprises who are trying to figure some of that stuff out. So, you know, that’s a long-winded way of saying quality’s a thing, but it’s not the only thing. And there are a lot of other vectors you’ve got to look at on external data providers, and even on your own data that you’re managing, before you just go all in on quality.
Kate Yeah. Scott, I just wanted to add, I really love the idea of potentially changing it from being called the quality of data to the value of data, because that would also encourage the business and everybody else in an enterprise to actually care. Because once you start to talk about data quality, everyone’s like, you know, OK, good night, I don’t want to be here. This sounds like we’re going to be doing data entry all night and cleaning stuff up. But once you talk about improving the value of our data, I think that sets off different alerts and alarms in people’s brains, at least for me. So I’m like, oh yeah, the value of data, it completely makes sense. Hmm. I haven’t heard you say this before, so this is surprising new stuff.
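[Editor’s note: Scott’s distinction between record count and coverage can be illustrated with a short sketch. The store identifiers and file contents below are invented for the example, and `coverage` is a hypothetical helper, not a metric defined by any provider mentioned here.]

```python
from typing import Set


def coverage(file_ids: Set[str], universe_ids: Set[str]) -> float:
    """Share of the business universe that appears in a provider's file."""
    if not universe_ids:
        return 0.0
    return len(file_ids & universe_ids) / len(universe_ids)


# Hypothetical universe: every entity the business actually cares about.
universe = {"store-1", "store-2", "store-3", "store-4"}

# A small file that matches the universe exactly.
small_file = {"store-1", "store-2", "store-3", "store-4"}

# A much larger file that is mostly outside the universe.
big_file = {"store-1", "store-2"} | {f"other-{i}" for i in range(100)}

print(len(small_file), coverage(small_file, universe))  # 4 1.0
print(len(big_file), coverage(big_file, universe))      # 102 0.5
```

[The larger file has more than twenty-five times as many records but covers only half of the universe, which is the point of benchmarking on percentage coverage rather than on size.]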
Scott Imagine we’ve been on like a hundred interviews together. I’m still coming up with new stuff. I never
Josh had data. We get it out of focus. But I was curious how you because in my head, when I hear that I have a distinction between what I would consider a valuable data to consider high quality data set, and I’m trying to reconcile that and maybe collapse that difference as we talk through it. But do you have a do you have a sense of separation between a feed of data that the business depends on, like it’s really valuable to the business that you know, the company’s product is built on the ratings going into Nielsen, that they can’t run their business. Without that, it’s really valuable, right? At the same time, it can be very unreliable. It can be low quality in the in the measurable sense. It can have a lot of inconsistencies from Europe. Missing values all the time relate. How do you distinguish between those, I guess more conventional ways of looking at what’s valuable and then what’s quality?
Scott You know, those things that you’re talking about, you know, latency, dependability, these are all things that people have to look at when they’re evaluating third party data providers and actually evaluating their own internal process to sell these large branded data suppliers. They know that they know Nielsen is a great place to have grown up in the data space because they take data. So it’s what they make. You know, a lot of people like to bandy about in the in the market. You know, every company is a data company. Now they’re not, you know, unless you make data, you’re not a data company. Sorry to pop. You know, some folks bubbles, but they they take a lot of pride and understand really their place in the ecosystem. And sometimes some of them get a little ahead of themselves in terms of licensing and how they try and get lock in. And that’s all the business model. But just talk about the content that content is it right and isn’t trustworthy? Those companies aren’t going to depend on it, and a really noble goal is to try and become a standard in some sort of vertical or market place for something that is related to the data content you have, whether it’s Nielsen ratings or credit ratings or whatever those kinds of things are, or even just simply, here is the comprehensive universe file of this type of entity, be it locations or brands or consumers, whatever it happens to be.
Josh Interesting. So I’m amazed as you’re talking, I’m reforming definitions in my head, and I’m thinking about if I if I bought a car and the car didn’t turn on and it didn’t drive down the street, I wouldn’t say, Wow, what a valuable car. That’s terrible quality. I would say what a terrible investment this one’s and low value it is. So maybe the distinction that I’m really thinking about is the potential of data about potential energy versus the actual kinetic energy, and that the real value that I’m seeing as we’re working out of the way we think about. Interesting.
Honor I like that idea. I think that’s that’s really interesting. So if we were to just from that exchange, how would we want to maybe rephrase it? Would it make more sense to maybe talk about impact of this data like if like, rather than necessarily like dividing value from quality? Does that make more sense like business impact of data?
Josh I’m so curious how you measure it. Yeah, like, I mean, maybe as a data nerd in me, but how whether it’s called value quality or whatever word we want to put on it. What am I actually demanding of my data providers that goes into the contract that says you get paid every month? What do I expect of them?
Scott You know, I mean, having been one of those vendors, you come in with your value proposition and say, this is what we’re going to do. Now, some folks want to make sure you’re going to do every one of those things under certain timing with certain laws in place. But even the ones that didn’t, they had a license to this set of data, and they were depending, based on what we presented, on it doing the work that we suggested it was going to do. There’s always back and forth. It depends on the kind of data, but all the data services that I’ve worked with were always stuff that was going right into somebody’s operational system, like, you know, hard-wiring it into their ERP systems, their MDM systems, really basic foundational stuff, versus what most data providers actually provide, which is some sort of analytical enrichment, metric, or score. You know, I like to talk about it as rows and columns. People are really good at adding columns. It’s really hard to align rows. I’ve always been in the row business. You know, the columns are about the rows. There’s lots of people who sell columns. There’s only a few folks who will sell the rows. So, very philosophically here, almost symbolically: if you don’t know what rows you have, then the columns are worthless. If you do know what rows you have and you only have a certain number of columns, you can go to somebody and say, here’s a bunch of rows, add more of these columns, whatever indicator or attribute that happens to be. Feature engineering on the instances. Every time I come to it, it always comes back to me as rows and columns, which is also a really easy way to explain it to senior business leadership, because they look at tables. They look at reports. They’re not looking at graph databases and those chrysanthemum things that you can kind of move around that we all get dazzled by. They’re just like, you know, red means stop, green means go. Make it simple. Make sure it’s right.
Kate Scott, I just wanted to add: I don’t have the same background as you in terms of working for a vendor or a data provider. But my assumption would be that, as a receiver of data, I would still expect to see some sort of metrics around data accuracy. I’d have some level of expectations around completeness and making sure that the data, shall I say, quality is good. I know we’re going back and forth on value versus quality here. I think it’s still a little up in the air in terms of which term fits best. Maybe both, right? But we still want to make sure that the data we are receiving, that water that’s flowing from the new pipeline that we’re adding to our main water pipelines, is not going to mess everything up for us, and make sure that we can actually fit all that in there.
Scott Yeah, yeah, those are all valid requests. Those are all kinds of things that, you know, I was certainly part of providing: number of records, number of attributes, what’s the fill rate, what’s the frequency, the update rate. All those kinds of things go into the bigger idea of quality. The reason I really pick on quality is because enterprises, the folks I always talk to, are trying to sell in data management programs. And what I counsel them is: if you go to your CEO and your board and talk about how you want to improve data quality, you’re not going to get funded. A lot of the pundits in the data management space have been talking about how quality is so important for decades, and we’re still not getting the respect and the funding and the engagement we need. So I take a step back and go: quality is a pitch. It ain’t working. It’s not landing right. People aren’t getting it. Change the pitch.
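The delivery metrics Scott lists (number of records, number of attributes, fill rate) are simple to compute on a received batch. The following is a minimal sketch, not anything described on the podcast: the function name `fill_rates`, the sample records, and the rule that `None` and the empty string count as unfilled are all assumptions made for illustration.

```python
from typing import Any


def fill_rates(records: list[dict[str, Any]], attributes: list[str]) -> dict[str, float]:
    """Return, for each attribute, the fraction of records with a non-empty value.

    Assumption: a value counts as "filled" unless it is None or an empty string.
    """
    total = len(records)
    rates: dict[str, float] = {}
    for attr in attributes:
        filled = sum(1 for record in records if record.get(attr) not in (None, ""))
        rates[attr] = filled / total if total else 0.0
    return rates


# Hypothetical batch from a data provider (rows = records, columns = attributes).
sample = [
    {"name": "Acme", "city": "New York", "industry": ""},
    {"name": "Globex", "city": None, "industry": "Energy"},
    {"name": "Initech", "city": "Austin", "industry": "Software"},
    {"name": "Umbrella", "city": "", "industry": None},
]

rates = fill_rates(sample, ["name", "city", "industry"])
print(f"records={len(sample)} attributes={len(rates)}")
for attribute, rate in rates.items():
    print(f"{attribute}: {rate:.0%} filled")
```

A receiver could compare numbers like these against whatever thresholds the contract with the provider specifies.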
Honor What pitch do you recommend that would get the funding?
Scott You know, start with the reasons why managing data is of strategic importance to your enterprise. Show why it will enable your strategic intentions, why managing data and actually analyzing data, both truth and meaning, are going to help the company get to where it needs to go. And where it needs to go isn’t better data quality. Where it needs to go isn’t better feature engineering. It isn’t “Should we use SQL or NoSQL or Python?” or all these other things. Frankly, with all due respect, it isn’t “We need better data observability.” It’s one of three buckets of value: we’ve got to grow the business, we’ve got to improve the business, we’ve got to protect the business. So grow might be increased sales, improve might be operational efficiency, protect might be mitigating risk. The beauty of the space we are all in is that data can do all three of those things, sometimes with the same record, if it’s stewarded and governed properly. And there isn’t another department in an enterprise that can claim they can help grow, improve, and protect the business at the same time with the same effort. So I find that a really exciting part of the opportunity. But as I tell folks often: align it to the business objectives and you’ve got a winner. Talk about the process and the how and the technology, and you’re going to have a really small audience.
Kate I just wanted to share a quick story relevant to what Scott just said about educating and sort of telling the entire enterprise the importance of proper data quality. We were working with a CRM tool, a very well-known CRM tool, and we were using it to log our pipeline, you know, keeping track of what clients we were working with. This was a very large organization, and the people entering data into the CRM tool didn’t really understand the point: why do I have to fill out these 89 specific fields every time we get a new client or have a new conversation? So they went about it in the quickest way possible, you know, just filling in the required fields or typing in “asdfg”, whatever it takes, just so they could click complete and go on with their day. And that resulted in extremely poor data quality, which in the end was actually feeding a very important piece of analytics. It was going into a dashboard that was informing the CEO of the company what he should do going forward, right? His revenue planning, his cost management. So it was very, very important. The company undertook an initiative to educate the individuals on the importance of data quality and why their job matters and why they really need to take the time and input the right data. But unfortunately, that didn’t really change much. So I know it’s important to educate. But in the end, Scott will tell you, it’s all about the people, right? We actually have to train the people and really get it into their heads as to why this is important, and maybe spend some more time training them on the how: how do they actually get some of the data points that maybe they didn’t have easy access to, so that they just put in whatever or selected any random thing from the dropdown menu in the CRM tool? So, yeah, it goes full circle. The data is important, but if the people are not on board with the importance of data quality, it’s going to be hard to get anywhere.
Honor That’s a really interesting story. It actually makes me think of some of Databand’s clients where some of their sources are traditionally just low tech, like CSV sources. So it was really hard to catch these errors or do anything about them, to your point, other than using a tool that would allow teams to see them coming. I mean, I don’t know that that necessarily addresses the problem, but it buys us a little bit of time. Josh can probably speak a little bit more to that. He worked a lot more closely with those folks.
Josh Well, I think in general, being able to see those problems coming in and attribute them to a point in time is a big factor, right? So if you have that data pull happening all the time from the CRM, it might be a routine behavior where folks aren’t adding data in properly, or it might be that you just changed up the sales org and people are learning the system. It’s being able to look at, you know, Salesforce, that data set that we’re pulling from that API, and say: usually I expect this level of completeness in the data set. Then seeing that all of a sudden go to 20 percent complete when it was usually at 80, and being able to attribute that to a particular point in time of an actual data ingestion. So you can go in and respond quickly: hey, we need to get this data in. Otherwise it’s two weeks later, people are looking at the dashboards and noticing, oh, it looks like we have lower sales in New York than we normally do. That’s unusual, and it’s because nobody’s inputting the geography. So my main comment is that being able to catch those issues early, and being able to attribute the start of the problem on the right timescale, is how we’ve seen companies get out of that kind of hole. And I think what’s interesting about the example also is that it’s another case where the data is telling you about a problem. It’s not solving the problem for you. The problem is the organizational change that you need to make in order to incentivize those folks to get the data in the right way. But at least knowing where you stand is a good place to start.
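The check Josh describes, tracking completeness per ingestion and pinning a drop to the run where it started, can be sketched roughly as follows. This is an illustration under assumptions, not Databand's implementation: the `IngestionRun` record, the `first_drop` function, the three-run baseline, the 25-point threshold, and the dates and row counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngestionRun:
    run_date: str         # date of the pull from the source (hypothetical values below)
    total_rows: int       # rows received in this pull
    rows_with_field: int  # rows where the tracked field (e.g. geography) is populated


def completeness(run: IngestionRun) -> float:
    """Fraction of rows in a run that have the tracked field populated."""
    return run.rows_with_field / run.total_rows if run.total_rows else 0.0


def first_drop(
    runs: list[IngestionRun], baseline_window: int = 3, max_drop: float = 0.25
) -> Optional[IngestionRun]:
    """Return the first run whose completeness falls more than `max_drop`
    below the average of the `baseline_window` runs before it, else None."""
    for i in range(baseline_window, len(runs)):
        window = runs[i - baseline_window : i]
        baseline = sum(completeness(r) for r in window) / baseline_window
        if baseline - completeness(runs[i]) > max_drop:
            return runs[i]
    return None


# Hypothetical history: completeness sits near 80 percent, then falls to about 20.
runs = [
    IngestionRun("2022-01-03", 1000, 810),
    IngestionRun("2022-01-04", 1200, 960),
    IngestionRun("2022-01-05", 900, 711),
    IngestionRun("2022-01-06", 1000, 200),
    IngestionRun("2022-01-07", 1100, 231),
]

flagged = first_drop(runs)
if flagged is not None:
    print(f"Completeness dropped to {completeness(flagged):.0%} in the run on {flagged.run_date}")
```

Returning the first offending run, rather than every low run, reflects the point about attributing the start of the problem to a specific ingestion so the team can respond before it shows up in a dashboard.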
Scott One of my favorite examples of that is when you look at somebody’s pie chart of the categories of their business and there’s a big slice called “other.” I mean, all those things are just rampant now. And then the marketing department will say, we have a channel-driven marketing strategy, and they’re working off of data that is “other.” So those things need to get fixed. It’s a challenge. It definitely goes across the organization. But that’s where you start to see some of the symptoms of the lack of data governance, the lack of data management.
Kate Yeah, it’s interesting. When I’m filling out a form that doesn’t matter too much, if I see “other,” I’m clicking it. It could be anything. If I see a dropdown with 10 options and one of them is “other,” I’m like, OK, other.
Honor Let’s just say,
Kate Hey, it’s like, let’s keep going. So it’s the curse of the other. I get it.
Honor So just to recap really quickly, I think we touched on a few areas. We started off talking about the dependency of the last mile on the first mile, and then Scott pointed out that really it’s not so much about quality. It’s more about which pieces matter, right? We were talking about truth before meaning, and then we’re really asking about truth that actually matters to the business. I’m paraphrasing there. And then Kate, you pointed out that it’s all about people management, as well as training and ensuring that everyone is on the same page and recognizes the importance of data quality, rather than just selling data quality as a concept, which is probably going to be meaningless to certain teams. And then, Josh, you mentioned the value of catching things early in the data process. So if we were to really sum up, what would be a good recommendation for teams to turn data quality from an abstract theory into practice? What’s the best first step to take? I’ll go around the room, and I’ll start with Scott.
Scott The first step, I think, is always to just assess where you are: what are the critical domains you need to cover, and how well are those being managed? Those critical domains, and I mean them in the master data sense, not the data mesh sense, are, you know, types of data. There’s always the relationship data you have: customers, vendors, partners, prospects, citizens, patients, clients, whatever you call those things. And then the brand data you have: products, services, offerings, banners, however you bring value to the relationships you’ve got through those brands. Make sure that stuff is as right as possible, because that is truly the foundation. Those are truly the nuts and bolts of every other kind of analytics, propensity to do whatever, machine learning, AI, all those other things. So that stuff’s got to be right first. And those are, again, common domains that every organization has. You don’t have relationships, you don’t have a business. You don’t have brands, you don’t have anything to sell to those relationships. So that’s really the basis of being a company. It’s all around those things: relationships and brands.
Honor Kate, what about you? What’s a good first step that you would recommend?
Kate Yeah. You know, if we keep talking from the perspective of getting people to care and getting people trained up on proper data quality and input, I think a good first step would be to show everybody where this ends up, right? Sort of going back to the water analogy: show them the human drinking the water and ask them, do you really want to poison this, you know, grandmother, or some person people might care about? And show them maybe the dashboard that is seen by the CEO, who’s using it to make their decisions, and show them where their data entry is impacting the end product. And then also, to Josh’s point, if we can go back and find the point in time when this data entry mistake or lack of care was made, we can likely also identify the individual that made the mistake or data entry error and hold them accountable in both ways, positive and negative, right? If you see somebody consistently skipping fields or rushing through the process of any sort of data ingestion into the company, point it out and say, hey, you made this mistake 10 times in the past week, maybe you need some more training or something else. But also recognize the individuals that are bringing in good quality data and maybe give them some sort of recognition as well.
Honor Josh, what’s your recommended first step?
Josh My recommendation is to inventory the problems that you’re facing so that you can prioritize your attention on what to fix first and build out a solid plan for how you want to drive change. I would say we see two kinds of initiatives. One is driven by impulse: observability seems important and we want a tool for it. The other type of approach that we see is: we’re struggling with this issue. We noticed that one of our sources is not sending us data at the right time. We may also have this other issue of schema changes coming from this location that’s taking us down. My analyst has raised several times that the dashboards they’re seeing are not accurate, and that relates to so-and-so. Getting some inventory of the big issues that are really causing pain in the organization, so that you can identify the right areas of investment to actually solve them, would be my major recommendation on how to get started with data quality. That needs to meet organizational change at some point and get buy-in from the folks at the top, who might not be in the weeds quite as much. But just like we’ve been talking about the entire podcast, collect data about what’s hurting the company and build a strong case for the kinds of tools or approaches or changes that can help you fix it.
Honor All right. And before we wrap, Scott, you have a book about data governance and data management. Can you show us and maybe give us a quick commercial?
Scott I just happen to have a copy of the book right here. Funny you should mention that. Telling Your Data Story: Data Storytelling for Data Management. Ninety-nine percent buzzword free. I didn’t want to overpromise, so I kept it at that level. But it’s my take on how to sell in data management in an organization. Looking at the space, I realized data storytelling as a discipline, as important as it is, is focused primarily on that last mile. It’s focused on taking data, putting it in a business context, driving business action from it. How do we visualize it? How do we help the folks who are going to use it put it in play? Super important, but where’s the data story about making the data, about that first mile? So that was the endeavor I undertook in putting this book together. That’s why the subhead is data storytelling, but for data management. And every organization needs both. It’s not like Sophie’s Choice here, where you’ve got to pick. They all need both, every enterprise. But I’m trying to help the data management community get their voice out there, to get a seat at the table, to make sure they get the proper funding and support and so on.
Honor Very cool. And Kate, tell folks a little bit more about DATAcated and who the community is for.
Kate Yeah, happy to. So the data community, we’re calling it the DATAcated Circle, is really for all data professionals. That includes students, data analysts, data scientists, and folks who are focused on data governance, data management, data engineering. It’s a very broad spectrum of data professionals that I would like to bring together. And yeah, I mean, we’re trying to cover it all. We have discussions. We have a book club, we have monthly live sessions. And interestingly, Scott Taylor was our very first speaker at the DATAcated Spotlight for December. So we have sessions booked out for the rest of the year that folks can join and network, and it’s really like a LinkedIn for data professionals. Love it.
Scott That’s awesome. And I’m member number one, if you don’t mind me mentioning it. I was member
Honor number one.
Scott I literally was member number
Josh one.
Kate He was the very first member. We have a screenshot of the text.
Josh And if I join now, what member will I be?
Kate You will be probably thirteen hundred or fourteen hundred something.
Honor Good. Yeah, you better snag your spot, Josh.
Scott Kate’s a boss, she makes it happen.
Honor Well, thank you so much for coming on. This was such a fun conversation, and I’m sure we’re going to continue this discussion. Thank you again. Take care. Bye bye.
Kate Thank you.
Josh Same, next time.