> Episode Details

The Case For Data Catalogs: Are Your Teams On The Same Page?

As companies scale out their data operations and data demands continue to mount, communications and knowledge alignment form every team’s operational backbone. Castor CTO, Amaury Dumoulin, explains the foundational importance of data catalogs as teams expand in size and knowledge.

About Our Guests

Amaury Dumoulin

Co-Founder & CTO, Castor

Amaury co-founded Castor, where he also serves as CTO. A software and data specialist, Amaury aims to bring value to the end customer through both disciplines. Over the last few years, he built a data team from scratch at a major European fintech company.

Episode Transcript

Honor Hey, Harper, how’s it going? 

Harper Hey, Honor. Living the dream like always, you know. Excited to talk to our guest here about data catalogs and data quality, and how that’s played a role in his career and the product that he’s building.

Honor Yeah, welcome to the show, Amaury. Why don’t you tell us a little bit about yourself?

Amaury Well, hi Harper and Honor. I’m glad to meet both of you. OK, so a quick intro on myself. I’m a software person, and I discovered data a few years ago, so since then it’s been about building products that involve both. I was at a European fintech for four years, and it was super exciting: working on fraud detection and compliance, and building a data team from scratch. One of the issues we had was: how do you ramp up a data analyst? How do you share the knowledge? That’s where the idea for Castor came from. I discussed it with one or two of my co-founders, and at some point they told me, “You should join us in building this Castor solution.” We’ve been working on it for over a year, and for them over two years, so I’m super excited to talk to both of you here. And thanks for having me.

Honor Yeah, definitely. We’re really excited to talk about this because data catalogs, much like a lot of other subcategories within the data industry, are going through their own renaissance. What did you see, in your own experience, to be a major struggle for data teams that makes data catalogs a really great spot for providing a solution?

Amaury So as an ex-head of data and also a software person, I kind of saw both sides of the table: when there’s heavy demand to answer business questions while the tools are lagging behind, or just when you’re scaling up the team and new people struggle with understanding the data model and how everything is done. So in terms of pain points, it ranges from onboarding a new analyst, which is the bread and butter of discovering assets and is what we offer, to collecting, sharing, and updating knowledge. Even if you collect it and store it at some point, keeping it up to date is already a challenge by itself. And also, which is something you shouldn’t be shy about: avoiding the loss of a key contributor’s knowledge. Especially for startups, whenever someone who was one of the first employees is leaving, all of their knowledge just fades away, and it’s important to keep it. So just in terms of a catalog of assets, those are already pain points. But if we go further down the road of the data journey to the visualizations, a lot of pain points come up in terms of understanding how it’s all connected and how to leverage the full lineage from the sources up to the dashboards.

Harper That’s awesome. I’ve used data catalogs in various roles in the past, and, you know, that could range from an Excel spreadsheet that has everything put together to an output from my Postgres DB that tells me what my relational model looks like. But now let’s just hear from you really quickly: how does Castor define a data catalog? What belongs in that data catalog, and what doesn’t belong in it?

Amaury Well, I think it should stay in sync with your data. As I said, the first expectation you would have of a catalog is that it correctly maps the data assets: the tables, the columns, the schemas. That’s the first thing. But we also don’t want people to need several tools. I’m not saying it’s a one-stop shop for every data need; that’s certainly not the case. But we’re trying to dig deeper into the metadata space. That means bringing in the metadata from the visualization tools, so we’re supporting different tools and bridging them, as I said earlier about the lineage, but also capturing how things are constructed. For instance, in Looker you have the notion of an Explore, and this is an interesting topic because you’re defining a model that sits on top of the warehouse model and helps Looker do the mapping and the abstraction. And you want to have this in the tool, because the people who are designing tiles or dashboards, whatever name you give to the pieces of a full dashboard, need to have an understanding of what the building blocks are. So we are in this space of mapping everything in terms of metadata. And things are shifting a lot: there’s a metadata open API that’s come out recently. It’s an exciting time for this area of expertise.

Honor Really cool. It really sounds like the long and short of it is that it serves the purpose of knowledge transfer: setting up a knowledge base for the entire organization. So when you look at teams that don’t have access to this kind of tooling, what kinds of mistakes do you think are common?

Amaury Well, in terms of the usual mistakes, ones I’ve made and also seen, one of them is doing too much run and not enough build. I’m going a bit higher level here, but if you’re only answering questions because corporate, marketing, and finance are asking for this by yesterday, then it’s super difficult to reuse existing work, or even to know it exists. And if you don’t have the proper organization, with no decentralization of analysts into the different teams, then it’s also super hard to balance the demands of those teams, and you fall into the trap where whoever speaks the loudest wins. And also, there are no rituals and ways for engineering and the analysts to talk about: how do we build some interesting pieces of software? How do we use off-the-shelf solutions such as ours to reduce friction? So the first thing may seem kind of a stupid thing to say, but if you’re not devoting enough time to building, even if it’s just taking off-the-shelf tools and setting them up to help you diminish the friction, then it’s never going to improve by itself. So that’s one mistake. If you think answering questions one hundred percent of the time is the only thing to do, and you don’t devote 10, 20, 25, 30 percent, whatever ratio you have in mind, to building, then you’re doomed, because you’re never going to build tools, you’re going to suffer, and at some point you just explode because of that. Another pitfall is not centralizing KPI validation. It’s a tricky one because it’s mostly about people and processes, not so much a technical thing. You need a core team of people who are entitled to validate this. Usually that’s data engineering, but not only them, because they’re usually not really business-versed, whereas data analysts are much more focused on the business side, but maybe what they offer in terms of data modeling is not the best.
So you need both kinds of skill set, people from both sides, in this validation team. You also need to start by bootstrapping the top-level dashboards. You say, OK, those are the five, 10, 20 top dashboards that management really relies upon, and those need to have rock-solid KPI definitions underneath. So you have to start with those KPIs and then flow them into those dashboards. If you don’t centralize this, every team is going to have different dashboards with different definitions of what revenue is, what an active user is. And this is where you end up in meetings with people holding different figures, sometimes even different orders of magnitude, if you go into a niche where everything is magnified. I’ve been there. It’s a bit worrying for the business, because you don’t want to make decisions based on this. And finally, on this KPI topic, you want to give guidelines to what we call data stewards. Stewards here, for me at least, and I’d be curious what you think about this definition, are people who are versed in SQL but are not analysts: they’re able to shift a query a bit, to modify it somehow, but not to design proper heavy queries. Those people need guidelines about what’s safe, what’s not safe, and where the information lies. This is where data catalogs also have a meaty part to play. One last type of mistake I’ve seen is unleashing self-service data without any guidelines, without any rules. Self-service data is incredibly powerful. Take a tool such as Metabase: it provides a lot of ease for the upstream teams, even sales teams, because it gives them alerts and lets them create a lot of dashboards on their own.
But it goes wrong if you don’t put conventions in place to separate local dashboards from organization-wide ones, or if you don’t build data mart tables that serve as building blocks, or whatever name you give them, like truth tables: at least tables with the KPIs embedded in them for when people want a result. For instance, say I’m a salesperson and I want to know the revenue that this account brings me. I shouldn’t compute it myself. That’s dangerous. I should use this building block. And the third one, which everyone struggles with, is how to phase out outdated or unused dashboards, to make sure you’re only using a limited set of up-to-date and widely used ones. You don’t want to see a surgeon who only operates once a week, and in the same way you don’t want to use a dashboard that’s only viewed once a week or, even worse, once a month. It means there’s no practice. Nobody sees the issues with it. There’s no observability on top of it. So it’s dangerous because it’s fragile, because not enough eyes are cast on it.
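
The phase-out rule Amaury describes (retire dashboards that almost nobody looks at) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `view_log` structure, the 30-day window, and the `min_views` threshold are hypothetical, and nothing here reflects how Castor or any BI tool actually tracks usage.

```python
from datetime import date, timedelta


def phase_out_candidates(view_log, today, window_days=30, min_views=4):
    """Return (dashboard, recent_view_count) pairs for rarely viewed dashboards.

    `view_log` is a hypothetical mapping: dashboard name -> list of dates
    on which the dashboard was viewed.
    """
    cutoff = today - timedelta(days=window_days)
    candidates = []
    for name, views in view_log.items():
        # Count only views that fall inside the recent window.
        recent = [d for d in views if d > cutoff]
        if len(recent) < min_views:
            candidates.append((name, len(recent)))
    # Least-viewed dashboards first.
    return sorted(candidates, key=lambda item: item[1])
```

A dashboard viewed less than roughly once a week over the window ends up on the list, which a data team could then review by hand before archiving anything.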

Honor I think that’s a mindset change in how we need to treat data. Something that Harper talks about a lot is data in motion: it’s something that we all need to be able to optimize for in order to ensure that all your processes, upstream and downstream, are lined up. And going back to the point about creating a framework to minimize friction between teams: have you seen that data catalogs are a complement to this level of awareness and communication? Or is it more that teams that don’t have one don’t even know that they have this problem?

Amaury Harper, you want to jump in on it? 

Harper Yeah, definitely. First, I think that’s a great question, and it’s definitely something that needs to be considered by organizations that are looking to grow not only their data teams but also their data culture. At the end of the day, for someone to make successful use of the information they’re bringing in, that data culture, and the understanding of how data should be used and how we should treat it, is going to be key for growth and adoption throughout the organization. And I think data catalogs are key to bringing forward a data culture that is easy to understand. And also, like you mentioned, having a data catalog creates a way for you to communicate across teams. You were talking about the mistakes earlier, and you started at the one place that a lot of data engineers would definitely feel: when you have data catalogs or any sort of data governance initiative come through, it tends to fall on the data engineering team to validate those assumptions and validate the information that’s there. But I actually loved the fact that you brought in the idea that other roles can play a part in that as well, and that other domains and smaller teams within the data organization can actually own the artifacts that are going to be documented in these catalogs. You mentioned, I think you called them stewards, the people who have enough skill to be dangerous but not enough to really be on the engineering side. I love incorporating those people into these types of processes. As we’ve talked about before, I think it’s important that organizations learn how they can democratize their data, give access to everybody inside the organization, and get their feedback on how that works.
So if you’re that salesperson and you’re looking at that dashboard, which is hopefully not updated once a month, hopefully it’s updated every hour if you can, but you see something that isn’t quite aligned or doesn’t quite make sense, or you see how two metrics within that dashboard could potentially give deeper insight if they’re compared against each other, there needs to be a way for that information from that salesperson to feed back to the analytics team that’s building the dashboard, and then also for the analytics team to be able to talk to the engineering team about how to provide that. So I think catalogs are incredibly important and a useful tool for helping empower everybody within the organization to get greater value out of their data.

Amaury Yeah, same, likewise. It’s a great question. We’ve reached this need for documenting because the knowledge is otherwise shared in Slack, or shared in person, when that was a thing. It’s always been relying on a lot of communication, so there’s no reason that should stop. The only challenge is that it needs to be more efficient, because you don’t want to repeat yourself, just as if you were talking to your grandpa about your last vacation: you don’t want to repeat three times the name of the superb dish you had in Cuba. What I mean is that, you know, we’ve just integrated a Slack capability. What we’re trying to do is have a way to store the information. Once you say, “OK, look it up, that join condition is there,” you don’t want it to be lost, and you want it to be attributed to the right asset. Just as you said, Harper, it’s this kind of conversation going around. You then just want to say, “OK, this dashboard’s a bit problematic; every once in a while it just messes up and we don’t really know why.” So maybe you want to add that information there, rather than just having a data-ops or data-quality channel, whatever. And sometimes, you know what happens: the company gets bigger and you segment the data Slack channels into sub-channels, but still, in the end, it’s all pooled together in one channel. I don’t know if you’ve done this, but when I left this company of three hundred people after four years there, I tried to skim through and search in Slack. It’s a great tool, but it’s not the best tool for storing tribal knowledge. It’s collaborative, yes, but you’d like a shared memory sitting on top of it. It’s not rocket science. Just try to think through how it goes. It goes: “Oh, do you know how to join this crazy table between those microservices that have tricky column names?”
“Oh yeah, he did this, but you have to add this tiny little condition on top of it because otherwise it just crashes.” “Oh, thanks.” Someone says this one year, the table doesn’t change, and then someone asks again one year later because he or she has to do a new implementation or variation. So it’s also about that. It’s about giving tooling, and giving ways, if the tooling is not good enough or not sufficient, to put these tidbits of insight and communication around the asset. It could also be, as you said, “This doesn’t work,” or “This is problematic. We should split this model into two tables. We should add another layer. We should add another breakdown dimension.” The conversation can start in this tool, and once it’s done, you see the history, which is also super interesting for understanding how it was built, when you’re a newcomer or when the organization changes. Sometimes you have to split: there were two teams, or you had a central team and you’re decentralizing, and you want to keep the information afloat even if the company is evolving around it.

Honor Does data quality come into view at all for data cataloging? 

Amaury Well, they’re definitely two sides of the same coin, or even like brothers and sisters: they have a lot of overlap and a lot of collaboration at play. First thing: the catalog must stay in sync, and we should always detect suspicious activity or behavior, because the analyst consumes it. For instance, if a widely used column is dropped, that should be detected. Sometimes it’s wanted because the schema evolves, but most of the time it should not go unseen and undetected. So we’re trying to see it from an analyst’s side of things, not so much engineering. There are a lot of touch points with observability, for instance, but again more from an analyst’s side of things. You also want to, for instance, notify the downstream consumers, as I said earlier, those who are stewards, or even the CMO or the head of acquisition in the marketing department. Because you want to tell them, “OK, don’t trust this dashboard today, because something is happening. You don’t have to worry about it, but there’s a red flag here.” Quality is essential for them to trust the data. Otherwise, and I’ve been there, you have someone saying, “Oh, it’s not trustworthy,” and then in their head it means it’s never trustworthy. So every time it’s like, “Was it good enough? Oh, it wasn’t good yesterday, so is it good today?” And you end up having a one-to-one with the CMO asking you about it. I loved having this conversation, but it’s kind of stressful for both parties, because one doesn’t trust, and the other one is like, “It works ninety-nine percent of the time; you caught me off guard that one time.” The real answer here is providing them a tool to understand why and when it works or not, so that they’re able to know it for themselves and trust you on this. But then if your dashboard is faking it and underneath it doesn’t work, then it’s on you again.
But I’m assuming you try to do things right and act in good faith, even if you can make errors. So the downstream consumers are also super important in the catalog. For instance, if someone is marked as an owner of a dashboard, and a table in the lineage of that dashboard is seen as not fresh, maybe you want to notify them that something is wrong because the data points are too old. So that’s one. You also want to make sure that, at the ETL or dbt points where levels of models are constructed, an intermediate model doesn’t disappear when that shouldn’t be the case, or end up with no more rows in it. We’re not going in that direction ourselves, but it complements us, so we could feed from the observability tools and quality tools to make sure that, in the end, the consumers and the analysts can see it and monitor it, which is essential for them to trust the data and to build dashboards and analyses that the business can rely upon.
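
Two of the checks Amaury mentions here, spotting a widely used column that has been dropped and notifying the owners of dashboards whose upstream tables are stale, can be sketched as follows. This is a minimal illustration, not Castor’s implementation: the snapshot dictionaries, the `usage_counts` mapping, the `min_queries` threshold, and the 24-hour freshness limit are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def dropped_columns(previous, current, usage_counts, min_queries=10):
    """Return (table, column) pairs that disappeared and were widely used.

    `previous` and `current` map table name -> set of column names
    (two hypothetical schema snapshots). `usage_counts` maps
    (table, column) -> number of recent queries touching that column.
    """
    alerts = []
    for table, old_cols in previous.items():
        new_cols = current.get(table, set())
        for col in sorted(old_cols - new_cols):
            if usage_counts.get((table, col), 0) >= min_queries:
                alerts.append((table, col))
    return alerts


def owners_to_notify(lineage, owners, last_loaded, now,
                     max_age=timedelta(hours=24)):
    """Return owner -> list of (dashboard, stale upstream tables).

    `lineage` maps dashboard -> set of upstream tables, `owners` maps
    dashboard -> owner contact, `last_loaded` maps table -> datetime.
    """
    stale = {t for t, ts in last_loaded.items() if now - ts > max_age}
    notify = {}
    for dashboard, tables in lineage.items():
        hit = sorted(tables & stale)
        if hit and dashboard in owners:
            notify.setdefault(owners[dashboard], []).append((dashboard, hit))
    return notify
```

The usage threshold is what keeps an intentional schema change on a rarely queried column from raising an alert, while a drop on a popular column does not go unseen.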

Harper Yeah, yeah. I think the thing that’s difficult with a data catalog is that, at its core, it’s a piece of documentation, right? And in my experience as an engineer working with lots of different engineers, no one enjoys writing documentation, because it’s kind of like when you have to repeat the name of that Cuban dish to your grandfather, right? Like, I already wrote all of the code. Now you want me to go back and write it again, but in plain English, so that it can be understood? And yes, we all recognize that it’s good practice and that it needs to be done. Some would argue it’s a sign of being a good engineer, right? They’re probably right, is what I would say. But because of that extra effort, documentation tends to get left for last. I think that’s the biggest reason you see data catalogs being an afterthought whenever data teams create new initiatives. And the thing that’s exciting about Castor and the data catalogs that are in the space at this point is that they create an automated way of bringing this information to a centralized location, for people to understand how their data objects are changing, not only at the table level but also at the view level or the management level as well, like the metadata about how people can access these items and where they’re located and things of that nature. Automating that collection and automating the creation of documentation is really going to help empower everybody in the organization to feel more confident in suggesting a change, or in trusting that the dashboard is actually giving them the information they need, or perhaps in understanding how they can be self-service in a good way. And I know you don’t like the idea of self-service, Amaury, and I think for the most part you’re right.
You know, if you let this be the Wild West, where anybody can do whatever they want, self-service can go downhill very quickly. It’s a very slippery slope. But if you have tools like Castor in place, and you have a good, healthy data catalog that helps people understand how they can access the data, and that when you write the SQL query you have these extra conditions that need to be in place, and make sure you hop on your right foot before hitting execute, then everyone can do it in the same manner and get predictable results. And that’s why I really love the conversation around a metadata-first approach when it comes to data management: finding ways to document and create understanding across not only teams but domains within your organization is really, really powerful. One thing I’m excited about with the data cataloging, data documentation, and governance space in the modern data stack is that we all recognize that data is a stateful object, right? The context in which you’re viewing the data matters, so that you understand what’s going on around it. However, without having data observability in place, without data governance, without a data catalog, without understanding the dbt models and the pipelines that are acting on the data, the only way we ever saw the state of data in the past was when it was in the database, in the warehouse, in the table, or when you were looking at the view. Having a way to persist those changes of data, so that we can understand the in-between spaces and the full stateful nature of your data from one table to another and one server to another, is what’s really going to let us take that DevOps approach: not only do we have detection from data quality, but now we have our catalog and our data observability tools giving us awareness into what’s going on.
And then we can use those items to build up a better data platform and iterate on how to create change and correct those issues as they come through. I’m curious: are there any interesting use cases you’ve seen where Castor users take the data catalog, the Castor tool, and have it influence their overall data quality practices or data management practices? Can you share some stories about how they’ve been successful in doing that?

Amaury Yeah. Let me just circle back on a few things you said, which I think are really interesting. The first is that, yeah, you’re totally right: metadata has to be as close as possible, and documentation has to be as close as possible, to the source. This is why we’re connected directly to the sources. And it has to be in sync. So that’s totally true. Then there’s a second part of what you said that I don’t remember right now, but on the last question you just brought up, about measures of success or changes we’ve seen brought about by the documentation and the tool: it’s onboarding. Onboarding new analysts, for some of our customers, was really impressively faster and smoother because of what they had already put into the tool and the fact that everything was already in place. So setting it up and filling in a bit of it was super interesting and super powerful. Now I remember the second part. The second part you said was about documenting. Documenting is a hassle, no doubt. You nailed it. It’s true. Nobody wants to spend their life documenting. It’s the same thing for software. But you know what they say: documenting is like sex. It’s better to have bad sex than no sex at all. So, yeah, that’s clearly a stupid one. But what I like about it is that it stresses the fact that documenting can be boring and can certainly be a hassle. You don’t want to be assigned to doing documentation for too long a period. But at the same time, if you assign people who never did the real work to documenting, they’re not going to know what to write. I’m not trying to be tough on technical writers, but to be a technical writer you have to be technical in the first place; otherwise you’re just rephrasing stuff that somebody else did. So what we try to do is leverage the time that people spend documenting. We have automatic propagation of documentation if you have the same columns.
We also have built-in automatic documentation for the ETL. If you have Fivetran or Stitch running, then all your Salesforce tables, all your Zendesk tables, are already described, because this is the same for everyone and you shouldn’t spend time documenting things that we could do for you. So we’re also trying to fully integrate the fact that documenting time is precious, and we don’t want you to spend it on something of low value. In that direction, we also try to focus you on the tables that are highly popular and highly used, so that your effort is spent on the ones that have the highest impact. So even if we acknowledge that there’s not going to be a lot of documenting happening, we still want to tell you what matters the most, and not waste everyone’s time on things that are of less use and less importance.
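
The two ideas in this answer, propagating an existing description to identical columns and pointing people at the most-used undocumented tables first, can be illustrated with a short Python sketch. How Castor actually matches columns is not described in the conversation, so matching on column name alone is an assumption, as are the record layouts and the `query_count` field.

```python
def propagate_descriptions(columns):
    """Fill in missing descriptions from same-named, documented columns.

    `columns` is a list of dicts with keys "table", "column" and
    "description" (hypothetical layout). A description is copied only
    when exactly one distinct description exists for that column name,
    so conflicting definitions are never guessed between.
    """
    known = {}
    for c in columns:
        if c["description"]:
            known.setdefault(c["column"], set()).add(c["description"])
    result = []
    for c in columns:
        desc = c["description"]
        candidates = known.get(c["column"], set())
        if not desc and len(candidates) == 1:
            desc = next(iter(candidates))
        result.append({**c, "description": desc})
    return result


def documentation_backlog(tables):
    """Return undocumented tables, most queried first.

    `tables` is a list of dicts with keys "name", "description" and
    "query_count" (hypothetical popularity signal).
    """
    todo = [t for t in tables if not t["description"]]
    return sorted(todo, key=lambda t: t["query_count"], reverse=True)
```

Refusing to propagate when two different descriptions exist for the same column name is a deliberately conservative choice: a wrong description copied everywhere would be worse than a blank one.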

Honor What do you think is a sign that a team is leveraging their data catalog well? When I think about data quality or data observability, a sign is that the CEO doesn’t call to scream at you. What is the equivalent in data catalogs?

Amaury Yeah, I’ve seen that one. I remember the emails: “Oh, is this true? How is it happening?” OK, yeah, I’ve seen that conversation, and it’s frustrating for both parties. So how do you know that it works, and that people are satisfied and using it well? The search. The search is the first thing. We’ve been working a lot to improve it, and we will improve it yet again. But the fact that we’re a catalog means you’re searching for answers and you want to find them. So if they’re trying to search for assets or, even better, if they want to know whether a problem was already solved, for instance around activity or email validation, whatever the kinds of topics they’re trying to pin down, and if the search usage goes up and they also find what they want, it means we’re getting it. Another thing is increased documentation: once they’ve found what they wanted, if they circle back to add the additional piece of information they discovered and share it with others, then that also means we’re getting it. It means there’s a full lifecycle: I search, I find, I enhance. And it doesn’t mean you have to bootstrap, or sorry, kick-start, the documentation from zero and say, “OK, before shipping it, we’re going to document for a month.” That’s just going to bore everyone out. But if you go along the lifecycle of it, and as people use it they feed the tool, then that also means it’s working in a really sane and lively way. We’ve seen it happening, and we really try to push through so that it happens in a lot of other companies. But there’s certainly a tricky part, much like any engine. I don’t know if you had a little motorcycle or something when you were little, but kick-starting it is difficult. It’s kind of the same: you want to have this thing flowing.
But the start of it is difficult, because this kind of circle effect needs to be initiated first.

Harper I think the difficulty that engineers face with documentation, or the way that it feels, maybe, is that it’s a high-effort task with perceived low value to yourself, because you’re documenting something that you already know how it works, right? So it’s hard to get motivated and excited about something that’s going to take a lot of your effort when, ultimately, you’re not going to extract value out of it. The value is going to be there if you come back to that piece of code a year later, you haven’t touched it, and you don’t remember why you made these design decisions. But, you know, you’re the perfect engineer and you’re never going to need to come back; you’re going to know exactly how it works. But whenever you have tools that create this documentation in a centralized place, and I love the idea of the search that you talked about there, Amaury, you can come across data objects that you’re not familiar with and immediately go in and dig through: OK, how do these relate? How do I use them? And then you see that this tool, like Castor, for example, has the documentation from the other teams that own these other data objects. And now you see the value that you’re getting out of it, right? And they’ve also lowered the barrier to entry in terms of the effort it takes to create that documentation. So by lowering the amount of effort it takes to create the documentation and, at the same time, increasing the perceived value, you come to a place where it’s like, OK, we can prioritize this as part of our workflow. And I think that’s going to be the key to data teams being successful long term with data catalog tools like Castor in the space.

Honor And I see a parallel between what we do at Databand in data observability and what you’re mentioning with data cataloging: there is this tendency for us to not prioritize things that we can’t immediately extract value from, right, Harper, to your point. There’s the idea of taking a proactive approach with data observability, for example, which is what we do. Similarly, with data cataloging, if I set up this framework for organizing my knowledge base, other people will benefit from it; the entire team will benefit. What do you think we really need to see for data culture to start being more proactively collaborative?

Harper I think that’s the billion-dollar question that a lot of tools are trying to answer, and I think that’s why you see the explosion in observability tools. That’s why you see the explosion in cataloging tools. And I say this for the sake of the episode and also just because it’s true: you have to start with a catalog. You have to start with a place, a centralized location that people can access to understand what the data ecosystem looks like. Because without understanding what your data environment contains and how your data is working within that environment, you’re never going to understand what the parameters of your problem space are. Because I say, like, I don’t have enough data, my data is not quality, my data isn’t coming in at a good enough time. OK, great. But where is it supposed to come from? Where is it supposed to go? How are you defining that it’s quality? How are you defining the time that it’s supposed to come in on? How are you creating those SLAs, right? But before you even create an SLA, the tooling in the space for the data industry has to find a way to have a common language to communicate these issues and talk about how people are viewing them. Much the same way that version control really changed the way that software engineering works many years ago: that allowed a way for engineers across organizations, across companies, across domains to finally say, OK, I see what has changed here, I understand what’s going on, and also talk to each other through a platform. So I don’t think we’re going to get the exact same tool, right? There isn’t going to be a Git for data, for example. But having a way and having a standard around communication about data and data problems, regardless of the industry that you work in, that’s what’s really going to lead data to becoming more, I’ll use my phrase, democratized, right? That’s very... 

Amaury Yeah, it’s funny you talk about “democratize.” I think the analogy with politics has been around because it’s also a matter of data for the whole society. But yeah, getting back to your question, I think it’s super tricky to bring a new tool, let alone a new philosophy, into a stack and a team. And I agree a lot about the conventions we’ve been waiting for. dbt kind of led the way for modeling, but even for modeling, they kind of paved the way for a lingua franca of the tool used, and you can use it in a lot of different ways. You could have five big tables where everything is, or you could have hundreds of tables with only five columns. They didn’t really define a standard of how you do modeling, but they defined the standard of how you interface, how you design this, where you design this, this kind of thing. It’s like they provide software to design your house, like AutoCAD, but they don’t tell you how to do the blueprints. So much in the same way, we’re pretty much liberal, I think, in that teams don’t have to change their processes or behavior at this step, but they have a nice tool and they are kind of invited to use the tool, because it’s always going to be smoother with the tool and they don’t have to ask themselves a lot of questions. And the mental burden is lower. So in the end, they can do whatever they want. Much the same way, you can answer in threads in Slack or never; you can use emojis or not; you can divert the discussions to another channel. Slack doesn’t tell you how to do things, but you’re kind of shepherded, that’s a nice word for this, into using certain ways because it’s better in the tool. But if you want to do it differently, we certainly don’t enforce it at this step; we kind of show a nice direction. 
Yeah, instead, we will integrate with the tools, the open APIs, the stuff that is going around, so that we try not to force a standard at this step. We try to evolve around it because, as you said, it’s moving super fast. And if we take a direction and force it on people who are not happy with it, it’s also going to be tricky. But at the same time, we’re thinking about, you know, beckoning in a direction, saying: OK, those tables there don’t seem really used. Those models seem a bit tricky because they’re only, you know, thin wrappers, one on top of the other. Maybe you could reduce this, I don’t know. We could think about pieces of advice around the modeling, around the documentation and so on. But it’s already kind of tricky to have a tool with the right UI that, as you said, leads to adoption, to people liking it. So we’re not about forcing the way to work, but more about building the tool they’re going to love working with. I don’t know if it answers your point, but it’s how I see things now. 

Harper I love the perspective, that’s really great. And oftentimes I’ve learned that it’s less important that you answer the question and more important that you get the thought out there so the conversation gets started, because at the end of the day, conversations like these are what’s going to lead to the next great ideas and the next thoughts about how we can actually meet the needs of the space in general. But I just wanted to ask you: you’ve talked about your experience and the inspiration for creating Castor, and you focused in on the analytics side and the different visualization tools and warehouses that you’re interacting with. You also talked about different open APIs that you want to bring information from. So I’m just curious, what’s next for Castor? What’s your vision? What do you all hope to accomplish over the next year? 

Amaury Well, as I said, we want to reach nice usage within each company. That means inside of the data team, data analysts mostly, but not only; easy usage by data scientists, for instance, but also by those people outside of the data realm who are still kind of focused on and using data tools, specifically the visualization. So it’s about having adoption, people coming back to it, using it, and this is what we are laser focused upon. So in terms of vision, we might bring end-to-end monitoring on this metadata, from the source to the consumption, a way to see whether all the lights are green or not. We want to, in the end, also help the transition to being data driven. You know, all those companies must share something you said earlier, and it’s true, democratization, but also the fact that they’re using it to make smart decisions, even to build a product. So, you know, analysts are going to be even more knowledgeable, because they don’t depend on the privilege of “I’ve been there long enough, so I have all the tribal knowledge, which you don’t have,” because it’s all in the tool. The KPIs are really crystal clear and, most importantly, linked to all the assets. So you know that this table is used to define a KPI, and this one isn’t. And also all these suggestions and automation to give you the best practices. But yet again, we’re not there to criticize or force any processes. It’s just a suggestion at this step. And we’re also interested in connecting to and feeding off all the other topics at play. So quality, governance, it’s super interesting. One topic that is also on the table is smart editors. I’m just throwing it out here because I think it’s interesting, and it can also be really powered by those useful insights. I think an editor linked to the catalog, but at the same time an editor that is close to the source, also has, you know, enhanced power. 
So the editor is a tricky part, because everyone wants to bring an editor, but no one can bridge all the gaps and have the best of all worlds. I don’t know if you’ve seen Snowflake’s editor, the new version. It’s super, super useful, but nobody will bring documentation, nobody will bring a visualization tool, to this. So in the end, they also have limits to what they can do, because the frontier of their knowledge stops at Snowflake and doesn’t extend to any other parts of it. 

Honor Awesome. Well, Amaury, this is really interesting. Thank you so much for sharing your insights on data cataloging. Thank you for coming on the show. We’re really excited to see where Castor goes next, and we’re definitely going to stay in touch. So thank you again. 

Amaury Thanks to both of you. It was a lovely time spent with you, and I hope you have a nice day in Austin and a nice afternoon in New Haven. 

Honor Yes, yes, and thanks. Enjoy Paris. Thank you. Take care. Bye bye. 

Amaury Bye bye. Take care. Bye.
