Modern Data Love: Why Data Observability & Data Integration Belong Together

Our modern data reality is highly complex. Data teams are recognizing that processes made for a bygone era are tough to scale. Airbyte Co-Founder & CEO Michel Tricot and Databand Co-Founder & CEO Josh Benamram offer insights and advice on what to do when the sheer number and variety of external data sources that a business counts on multiply at a dizzying rate, and show no sign of stopping. Our guests speak to the necessity of paying special attention to the left of the warehouse to ensure that all your data gets in and all your data is correct. To achieve data quality at scale and future-proof your data operations, data observability and data integration must work hand in hand.


About Our Guests

Michel Tricot

Co-founder & CEO Airbyte

Michel has been working in data engineering for 15 years. Originally from France, he came to the US in 2011 to join a small startup named LiveRamp. As the company grew, he became Head of Integrations and Director of Engineering, where his team built and scaled over 1,000 data ingestion and distribution connectors to replicate hundreds of terabytes worth of data every day.


After LiveRamp’s acquisition and later IPO (NYSE:RAMP), he wanted to go back to an early stage startup. So he joined rideOS as Director of Engineering, again deep in data engineering. While there, he realized that companies were always trying to solve the same problem over and over again. This problem should be solved once and for all.


This is when he decided to start a new company, and Airbyte was born.

Josh Benamram

Co-founder & CEO Databand

Josh is Co-Founder and CEO at Databand. He started his career in the finance world, working as an analyst at a quant investment firm called SIG. He then worked as an analyst at Bessemer Venture Partners, where he focused on data and ML company investments. Just prior to founding Databand, he was a product manager at Sisense, a big data analytics company. He started Databand with his two co-founders to help engineers deliver reliable, trusted data products.

Episode Transcript

Honor Hey, Harper, how’s it going?

Harper Going well, going well, just working through that day-to-day grind, if you know what I mean. I'm really excited for this episode today. We've got two really fascinating people, really great minds in the data space, and I'm excited to hear their opinions. We've got Michel Tricot, co-founder and CEO of Airbyte, with us, and also our CEO at Databand, Josh Benamram.

Harper Why don’t you take a minute here to introduce yourself and just let people know what you’re working on and how you got into the data space?

Michel Yeah, thank you. Thank you Harper. Thank you Honor. So I'm the co-founder and CEO of Airbyte, and Airbyte is a data integration platform. We're open source and community-driven. I've been in the data space for the past 15 years, looking at different scales: financial scale, a few hundred gigabytes, and internet scale, where we were moving hundreds and hundreds of terabytes every single day. I can say I got quite a few scars from this experience and built pretty solid expertise on what it means to move data reliably, and I'm taking all that learning and applying it to Airbyte to build the best-of-breed data integration platform with the help of our community.

Josh Thanks. Yeah. So I'm Josh. I'm CEO and one of the co-founders of Databand, and I work really closely with Harper and Honor and the team. My background: I've done a lot of different kinds of things, but there's always been a really common thread of data in the stuff that I've worked on. I was an analyst at a quant trading firm, and then again an analyst at a larger investment firm. I was a product manager at a data company before Databand, and I started Databand to help address some of the challenges that I faced myself, or that I saw people I worked closely with face, in all those previous lives. And we're off to a good start.

Honor Super cool. Thanks, Josh. So what I would love to start hearing about is actually origin stories. Michel, can you tell us a little bit about how Airbyte started and how it evolved into what it is today?

Michel Yeah. So we actually started Airbyte in January 2020. At the time, what we had in mind was the problem space, and both my co-founder and I had experienced the problem space of what it means to integrate multiple, very different systems together. But we were still in a phase where we were looking for what product we actually needed to build. We went through YC in January 2020, which was actually the first COVID batch, so it got interrupted in the middle and we had to go home. From there, what we were doing was really a lot of customer exploration, always talking with data teams. We started to build an initial product, and with COVID, we realized that that product was not a good one. And with all the relationships that were created with these data teams, we started to figure out what Airbyte should be and what kind of product we had to build, and open source became something that is very obvious as the way you actually solve data integration. We really moved forward with Airbyte in July 2020. We released the first version in November 2020. And since then, yeah, we've been used by over 6,000 different companies, we have an open source community of over three thousand people, and we went from six very, I would say, MVP connectors to over 120. This number keeps growing.

Harper I love hearing about the focus on community there. I think that's one of the underrated parts of being in the data space: whether it's on Twitter, or in Slack, or in a Discord that you're in, there are a lot of different outlets where people want to talk about their data problems and how they can really work through managing them. And I can only imagine deciding to start a company has to be difficult enough, right? But then having to do that in the midst of one of the greatest events that's affected humankind, especially in our lifetime, I'm sure that's created a lot of challenges for you. And Josh, from your perspective, how did growing a startup through a pandemic, especially one in the data space, where the data needs were constantly evolving, change what you set out to do originally versus where you've ended up now?

Josh Yeah, it’s a great question. I think there are some trends that just lean in our favor, that made building a startup during a pandemic a lot easier than it would have been maybe five years ago. We're not aiming to send a big team of salespeople and sales engineers on-prem to a client site to hook servers together and get our software running. Our entire solution can be deployed over the web, obviously, and most of our clients access it through the web, so our cloud and the accessibility of our solution through these remote channels make it a lot easier to build a company this way. Within the team, all the different tools and tricks that we have keep us in touch with folks, and I think Slack has been a huge part of how we've grown this company.

Harper Yeah, I know for myself, I can totally relate to the need to connect over the last year, right? That's something that I think a lot of people listening definitely relate to as well. But it's kind of interesting how that comes up in this conversation: not just the time that we're working through and growing through, but also the particular lens that both Airbyte and Databand have taken when it comes to the data space, you know? I mean, Michel, you mentioned that you all set out to create an integration tool of some sort, and you've found a way to make that work as an open source tool, and you learned things from talking to different clients. I'm curious to hear why you felt integration was the place you wanted to start in terms of addressing a need in the data space.

Michel Yeah, I think the thing is, the data warehouse has become the new way of consuming data in organizations. Every single company buys their brand new data warehouse, they start buying BI tools, they start hiring people to actually manipulate the data around the warehouse. But they always face the same problem, which is: how do you get the data into your warehouse? They might ask one of their data engineers, saying, oh, let's bring Salesforce data into the data warehouse, or let's bring this new database's data into the data warehouse. And very quickly they realize that there is an infinite number of sources that they actually need to bring into the warehouse, and they're not going to build a team just for maintaining and building these connectors. At that point, they're going to be looking around. There's this theory that it's just a script that you need to write. When you have one, it's OK. When you have two, it's OK. When you have three, that's when things start to collapse, and we want to make sure that we are here when people face the fourth script, the fifth script that they need to build, and we want to offer them an out-of-the-box solution at that point. And at that time, also, it's just one of my areas of expertise. I think when you're starting a company, you need to make sure that you leverage your skills and your expertise as much as possible, so that you can anticipate more of the hurdles you might be facing. And yeah, I mean, that's what I've been doing for the past 15 years. Data integration is just what fuels the growth of data warehouses. If you don't have the integration, you don't have good use of your warehouse. So basically, how can we fuel that growth?

Harper Yeah, absolutely. I think it's a really smart way of looking at the space. I myself come from a data warehousing background and came into the data space as the data engineering field grew. One thing that I really saw was the adoption of the software engineering, object-oriented approach, so that you aren't stuck with the low-code, no-code situations of before, like Informatica or SSIS, which kind of gets left out of the modern data stack conversation, but that's an episode for another day; you can check back in on that later. But bringing the conversation back to how integration relates to that connection with teams within the pandemic: in order for us to communicate that data well, we really have to take a step back and observe what is and isn't working. You mentioned that you wanted to be this tool that people could come to for a standard way of getting these integrations going. But when those are running, you also want to be there to help them whenever they observe something's wrong. And I think that's one of the things I find exciting about working at Databand; it's what brought me here. That observability aspect, in the data space as a whole, is still new, and it's something that we're working as a data community to really understand and solve the best way. Does that resonate with what led Databand into working on observability, not only the space as a whole, but specifically shifting left, focusing more on the integration side and the source side from the observability perspective?

Josh Yeah, I think it is a good framework for us. So we started the company a little bit before the pandemic. We got started in 2019, and at that point in time, we were working with a small number, really a select group, of design partners. And there was something very consistent that we would hear. Working with those design partners, we were helping them solve a number of different kinds of challenges around their data pipelines, and the thing that would keep coming up is: what's going on with the data? What's going on in the data? This is what really shifted our focus to this observability problem, when we started even using the term observability to describe the value that we provide. As we started working in this problem set, we saw an explosion of tools and approaches addressing a similar kind of need: how do you understand whether data is reliable? On one level, this was really encouraging for us, because we realized, wow, this is a problem that every company is thinking about, and a lot of other startups are thinking about it, so it must be real. The other thing it made us think is, OK, what's unique and different about what we do within Databand that will help us stand out and set us apart from all these other approaches that we see? What we used to inform that focus is, again, those clients that we work with, and what we saw as a clear pattern within our user base and the market behind it. We already had sort of this baked-in bias in our company towards more engineering-driven organizations: really strong data engineering chops, big data organizations, companies that are working on analytics, machine learning, a lot of different use cases for their data products.
What we consistently saw was that the origin of issues, the key challenges that were hardest for them to solve and where they wasted most of their time as engineers trying to fix, were problems that came from the ingestion part, just getting that data in initially. To Michel's point, we'd probably prefer the term lakehouse for the kind of architecture we see most within our user base, but if I use that warehouse mentality: for companies that ultimately deliver data into that warehouse downstream, there's often a lot of work that happens before it gets there. And if there isn't a lot of work, there are just a lot of question marks on what's going on in these sources before the data even lands in the warehouse, things that might throw off models, dashboards, pipelines, machine learning. What we've been really effective at is helping those organizations drill into those specific challenges coming from certain sources. So I love the volume of data sources Michel was talking about; the number of sources that Airbyte supports is really cool, because the number of sources our clients work with is growing every quarter, compounding. Michel's ability, Airbyte's ability, to make that process scalable and sustainable for them, and our ability to help them understand when those systems are misbehaving and to catch problems as soon as they come in, that shift-left mentality, is something that really relates to what we want to do, and it's a bit of how we stand out in the market from our observability perspective as well.

Honor Michel, do you have any use cases that come to mind, like the volume of external sources that folks grapple with, and the changes that occur? Being able to check your data quality early on in the process: is this something that resonates with everybody within the Airbyte user community?

Michel Yeah, very much. Actually, we have different types of users. We have the data engineer who is actually maintaining the pipelines and exposing Airbyte as a platform to the organization, and then we also have the data analysts leveraging that data to extract insights, turn them into action, and activate that data. And one thing the users of the data want is to make sure that the data they're working with is stamped with quality. They don't want to discover, after spending a week on an analysis, that actually there is a problem with the data. So, to Josh's point, what you want is to discover this kind of issue as quickly as possible, because the data comes from the source. If you can detect, directly after it's been extracted from the source, that there is something wrong, that there is something drifting in your data, you want to interrupt your flow and start on the resolution side, to make sure that before the data lands in the place where it is going to be consumed, it is fixed. So it resonates completely with me. And after that, it's: how do you create a solution, and how do you get these two or three solutions to work together, one to detect and one to resolve? And then you also want to potentially stop and restart the replication.
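Michel's idea of detecting a problem right after extraction and interrupting the flow before anything is loaded can be sketched as a simple quality gate between the extract and load steps. This is purely illustrative, not Airbyte or Databand code; the function names, checks, and thresholds are all hypothetical.

```python
# Illustrative sketch of a post-extraction quality gate: check the batch
# right after it leaves the source, and halt before it lands downstream.
# Not Airbyte/Databand code; names and thresholds are hypothetical.

class DataQualityError(Exception):
    """Raised to interrupt the flow before bad data is loaded."""

def check_batch(rows, expected_columns, min_rows=1):
    """Run cheap checks on freshly extracted rows; raise to halt the load."""
    if len(rows) < min_rows:
        raise DataQualityError(
            f"only {len(rows)} rows extracted, expected >= {min_rows}")
    for row in rows:
        missing = expected_columns - row.keys()
        if missing:
            raise DataQualityError(f"missing fields {sorted(missing)} in {row}")

def replicate(extract, load, expected_columns):
    """Extract -> check -> load; the check sits left of the warehouse."""
    rows = extract()
    check_batch(rows, expected_columns)  # interrupt here, not in a dashboard
    load(rows)

# Toy usage with in-memory stand-ins for a source and a warehouse:
warehouse = []
replicate(
    extract=lambda: [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}],
    load=warehouse.extend,
    expected_columns={"id", "amount"},
)
print(len(warehouse))  # 2
```

The point of the design is that a failed check raises before `load` ever runs, so nothing incorrect reaches the consumption layer and the replication can be stopped and restarted once the source issue is resolved.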

Josh I’m curious what kinds of use cases you think Airbyte really shines in. I mean, what are the best use cases for you? And when do you see customers, or engineers, really gravitate towards your solution versus other ways of integrating data?

Michel Yeah, do you have an example of, like, what do you mean by a way of integrating data at that point?

Josh Building up their own custom Python scripts that they run from a scheduler or from cron, or using a no-code integration tool like Matillion or Stitch Data or something like that?

Michel Yeah. Yeah, actually, the thing when we're thinking about Airbyte is that our goal is to really unify these use cases. We want to make sure that there is a central place where data is flowing through, and this is very important because it helps integrate with observability tools a lot better: instead of having to integrate with five different data integration platforms, you just have one. And we try to address most of the use cases that you're describing. The first is the no-code one: how do we provide out-of-the-box connectors? This is what we've been focusing a lot of effort on. Then it's: how do you provide a platform on which a data engineer can build a very custom connector? Sometimes you have internal sources, and we want to make sure that they can build the connector without having to build a scheduler around it or monitor a cron job around it; they just put it as part of the Airbyte platform. And the other one: sometimes the out-of-the-box, low-code or no-code connector doesn't fit exactly what they need. Maybe it's missing some data model, maybe it's missing some fields in the data, and we want to make sure that people can take this out-of-the-box connector, tune it for their own needs, and then put it back into their pipelines. So the goal here is really about unifying data integration and making sure that you have visibility and control over all your data pipes in one single platform. And we have a lot of people today who are using Matillion, Stitch, or Fivetran, but you always realize that these other solutions are limited, in the sense that either they don't have the flexibility people need, so, as you said, people just build internally, and we want to make sure we help them build internally; or the tool is not customizable enough, so at that point, same as building internally, we want them to move that to Airbyte. Another case is wanting to keep data under control. I think this is a pretty big requirement that more and more organizations have: I have a product database with very sensitive data, I don't want my data to flow through anyone else's cloud, I want to make sure the data movement remains on my infrastructure. And at that point, because they can't use a low-code, no-code service where other people have access to the data, what they do is build a connector internally. And that's where open source helps a lot; that's how we discovered these use cases. They have a platform they can use internally, and they can access and move the data without anyone else having access to it.

Josh So just a quick question on that. Does that mean it comes from a security or data privacy rationale? Okay, that's interesting. So some users prefer to build an Airbyte connector internally just so that no data gets piped through an additional hub of some third-party integration service.

Michel Exactly. If you're looking at Stitch and Fivetran, for example, they're all hosted solutions. So if you want to replicate data from a sensitive finance database or some sensitive internal resources, especially when you're in a regulated industry like fintech or health, this data cannot leak; it has to remain inside. So they will not choose those solutions; they will always build internally. And that's where we want to make sure that they have a platform on which they can build that integration. Whether you're replicating data from a database that is exposed to the outside world or something that is internal, it's the same: you can just take this out-of-the-box database connector and let it run within your infrastructure directly.

Josh Yeah. Do you often find those kinds of use cases, the more security-conscious ones, are more prevalent for companies that are building up connectors to really unique data sources? You mentioned the financial data source, so maybe there's a hedge fund that has some unique data set that they're connecting to, and they use Airbyte to standardize that. Do you usually see that kind of rationale coming more from unique data sets, or is it also common for what I imagine are the standard ones, like AdWords, Facebook Ads, LinkedIn Ads, that standard crowd?

Michel Yeah, much less so. I think as long as you've already made the decision to use Facebook and Google Ads, you're fine using a SaaS service and you're fine having the data flow through that. It's when you start connecting your product database, or internal systems like SAP and things like that, all internal tools, that you have this requirement. Also, there is a price consideration, which is that getting your data to flow across multiple clouds is inefficient and more expensive, and at that point, if you can just create a direct pipe from your infrastructure into your warehouse, you can save a lot. So there is the privacy and security aspect, but there is also the cost aspect.

Harper So I'm always entertained by how I tend to hear common themes regardless of the conversation. You know, when you're talking in the data space, everyone has the same types of problems; they just look a little bit different within their domain. We talked a little bit about saturation being something that leads people to adopt a tool: everybody hits a certain number of sources where it's just not easy to manage them all. We talked about security being a big reason that people come in. And from your perspective, Josh, is there a certain aspect of managing integrations that makes people reach out and say, how can I get a better view of this? How do I make observability a part of my integration plan, as opposed to a secondary thought, when I start integrating with those sources?

Josh Yeah, that's a great question. So first off, I think the data quality, data reliability, observability space is still so new that, on one level, it is rare for companies to come in and know exactly where they need to be looking in their stack to start integrating data quality. So there's some leadership that we need to provide in the market: when we see use cases that line up really well with the problems that we solve, we help teams understand our view of best practices on how they should get started. What is interesting is that there's an obvious correlation between the number of data sources that you work with, the level of, as you put it, security requirements around those data sources, how closely you need to be watching them, and the demands around data quality that need to be applied there. If you have one data source that's moving into some simple tables in a warehouse like BigQuery and then feeding a dashboard, the drivers behind a big data quality project might not be significant enough that you need to bring in some new solution to help you. But if you have a lot of data sources, a lot of saturation, as you put it, a lot of variety in the data that you're working with, that issue will compound, and it will become really important to bring in an observability tool, just as an example. And I think Michel will relate to this. We'll commonly see, let's say, you have 20 or 30 different data sources that you work with. You're using Airbyte to standardize your connection to those; that's being delivered into a lake or warehouse, GCS or BigQuery, and you notice in a dashboard somewhere that there's less data than you typically get. And now the question is, OK, where is that coming from?
If you imagine the flow an engineer needs to go through to find that kind of issue today: they need to see the table underneath that Looker dashboard and understand how the data looks there compared to history; hopefully, they're tracking some level of that. If that table is the aggregate of a number of data sources, with plenty of data sources feeding that table, now we're going back a step and trying to look at each of those independent data sources, each of those, you know, Airbyte connectors, and trying to understand, OK, which one here is the root cause? Or is it some combination of all of these in some weird statistical anomaly? So the more complexity, the more variety, the more saturation you have at the data source level, the more important it's going to be to watch how those connectors behave, right at the point where the integration starts, because it will shift the mentality from "let's see the issue on the dashboard and then trace back to the cause" to "find the issue as soon as it arrives and nip it in the bud before anyone downstream even notices it." So I definitely see that correlation with what Michel was mentioning.
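The triage Josh describes, comparing each source's delivered volume against its own history instead of starting from the dashboard, can be sketched roughly as follows. This is an illustrative example, not Databand's actual method; the source names, history values, and z-score threshold are all made up.

```python
# Sketch of per-source volume anomaly detection: flag the connector whose
# delivered row count deviates from its own history, before the dashboard
# ever looks wrong. Names, history, and the threshold are hypothetical.
from statistics import mean, stdev

def anomalous_sources(history, latest, z_threshold=3.0):
    """Return sources whose latest count is > z_threshold std devs from mean."""
    flagged = []
    for source, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            sigma = 1e-9  # guard against a perfectly flat history
        z = abs(latest[source] - mu) / sigma
        if z > z_threshold:
            flagged.append(source)
    return flagged

# Hypothetical row counts delivered by three connectors over recent runs:
history = {
    "salesforce":   [1000, 1020, 980, 1010, 995],
    "postgres":     [5000, 5100, 4950, 5050, 5000],
    "facebook_ads": [200, 210, 190, 205, 195],
}
latest = {"salesforce": 1005, "postgres": 5020, "facebook_ads": 20}

print(anomalous_sources(history, latest))  # ['facebook_ads']
```

Running a check like this per connector, right after each sync, localizes the root cause to a single source instead of forcing the engineer to work backwards from an aggregate table under a dashboard.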

Honor Yeah, I'm seeing this from what you're both talking about. So, Michel, on your side of the business with data integration, the need for custom connectors is often related to security and privacy needs. And then, Josh, with data observability, the volume of external data sources that we have to manage often means that data observability occurring left of the warehouse, earlier in the process, is a good idea. So I'm curious to get your thoughts on what this means, what this portends for the future as this space shakes out. A lot of the conversation around the modern data stack tends to occur a little bit later, at the warehouse and to the right of it. With this shift towards the left and everything that you've mentioned, what does that mean for the overall space? What kinds of changes do you think are needed for us to place more emphasis there? I'll start with you, Michel, if you have thoughts around that.

Michel Yeah. I think what people are realizing right now is that data is a factory. We've started to be very good on the right side of the warehouse: you have all these beautiful BI tools that work very well and give you the insights you want. But as people become more empowered with data and get greedier for more and more data, suddenly all the processes that we had before don't scale anymore, because now people want more sources, they want more places where they have more data sets, they use more tools, they want to centralize more. And this is typically the problem of a factory: at first you have a very white-glove way of processing one or two data sources. Now we're starting to build the factory to manage the whole data value chain, and it starts with getting the data. Observability is basically the quality check on top of it, making sure that all your manufacturing lines are actually behaving the right way, and that if there is a problem somewhere, you detect it right away. I think organizations are starting to realize that they need to build the right process for themselves by assembling all these tools together. Ultimately, that's what we're going for: let's make it a factory with manufacturing lines where data flows; you have sensors, you have quality checks, and you try to minimize the amount of human intervention, to make sure that the data flows and you minimize errors before the data gets leveraged. So we're building this manufacturing line right now; that's what's going to happen in the future, and we'll see more and more of these tools to streamline the process.

Josh Yeah, I think that's a great analogy. I think the standardization and the tooling that Michel and everybody are bringing into our space are critical for companies like ours. Maybe I don't want to admit it, but there's really nothing for us to do if there aren't tools like Airbyte that actually move the data from point A to point B. We're watching that data, so if there's no data moving, there's really no discussion here. And tools like Airbyte, which make that process a lot easier for organizations to scale up and maintain, and which feed that data hunger, that data demand, make the need for a solution like ours all the more important. We talked about security a lot, and there's an interesting analogy here with web traffic security. When you're a user signing into a website, there's a contract between the actual service giving you access to the web page and the technology that makes sure it's a secure communication back and forth from the user to the website. Just like, as a user, you don't want to be inputting credit card information into an insecure web channel, as a data organization, you don't want to be, no offense, working with tools like Airbyte if you can't guarantee that the data coming in is actually what it should be. Because if the data from that external source is wrong, through no fault of Airbyte, you're going to make some bad decisions based on the dashboards you're analyzing. So there similarly needs to be this contract, ultimately, between the integration tooling and the data certification, qualification, observability tooling, something like Databand. That's one pattern that I see and am excited about, and I hope a partnership comes out of these kinds of conversations.
The other aspect of the space that I'm interested in and excited about, as I mentioned before, is this idea of the lakehouse, and how we see more convergence of processing into more centralized tools: tools like Snowflake trying to move up the stack into more of the engineering layer, with more of the Spark-ish processing that people would otherwise do in Databricks, and then Databricks moving down into the warehouse with Delta Lake and delta tables. The convergence there is really interesting for us, because we're working with teams that have so many different types of data but still want to centralize everything, and having some tools where you can put unstructured, semi-structured, or fully structured data and just have everything in a single place is exciting. Definitely. I think we're seeing some interesting patterns around how ELT and ETL are changing and where observability and integration step into that paradigm. In the lakehouse environment, we see a little more duplication: there's ELT into the lake or staging layer, and then there's ELT again into the warehouse. There's this kind of ELT-LT, or ELT squared, happening there, which is a really interesting pattern for us to watch, because we ultimately want to make sure that that flow of data, end to end, is what it needs to be to guarantee good reliability. But those are the things that come to mind on the question.

Michel There is a parallel to make there: when you’re a developer, your code is not the same thing if it’s tested versus untested. And I think it goes the same way for data. If you have data but you cannot track it back to where it comes from, and you don’t have quality checks on top of it, you will always second-guess the insights that you get from it. So yeah, having a tool like Databand to ensure that the data is what it should be, and that you can track where it comes from, all the way back to the initial source, is mandatory for any serious data team. You must prove that your data is correct before you start acting on the insights you get from it.
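Michel’s parallel between untested code and untracked data can be sketched as a minimal quality gate run before anyone acts on a batch. This is a toy illustration, not Databand’s API: the record fields (`source`, `amount`) and the checks themselves are hypothetical.

```python
# Minimal "test your data like you test your code" gate.
# Field names and rules are hypothetical, for illustration only.

def check_orders(rows):
    """Return a list of failed checks for a batch of order records."""
    failures = []
    if not rows:
        failures.append("batch is empty")
        return failures
    # Lineage: every record must name its source, so a bad value can be
    # traced back to the system it came from.
    if any(r.get("source") is None for r in rows):
        failures.append("records missing lineage 'source' field")
    # Validity: amounts should be non-negative numbers.
    if any(not isinstance(r.get("amount"), (int, float)) or r["amount"] < 0
           for r in rows):
        failures.append("invalid 'amount' values")
    return failures

batch = [
    {"source": "stripe", "amount": 42.0},
    {"source": None, "amount": 10.0},
]
print(check_orders(batch))  # → ["records missing lineage 'source' field"]
```

An empty failure list is the data equivalent of a green test suite: only then should the insights downstream be trusted.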

Harper I think that’s really good advice and a really good way of thinking about the evolution of the space, and of the paradigm that Josh mentioned too. If anyone’s listened to any other episode, they know that I’m a big fan of the DataOps movement and how we can start applying these DevOps ideas to the data space. I think that mentality, which has started to mature over the last five years or so, is really what’s leading to a lot of the tools that we’re seeing, particularly Airbyte and Databand. And I think everyone in the data space realizes that the complexity that exists there is what makes it fascinating; it’s probably what draws a lot of us to the problem in the first place. But before we close out, I’m going to put you on the spot and ask you to pull out your crystal ball. We started this conversation talking about how both Airbyte and Databand got started, and how you had one perspective on the market, but as you’ve gone along you’ve realized that the needs of your user base are a little bit different, and you’ve adapted to that. So if you had to pick something that you hold true today that you think could actually evolve over the next 12 to 18 months, what would that be? What do you think you can learn about your user base so that you can continue to deliver the product that you want to? And the other question is: do you have any advice for the people listening on how they can take advantage of this paradigm shift in DataOps when it comes to looking at the integration and observability space?

Honor So to recap, you’re asking for a prediction and advice, right?

Josh Yeah, a prediction about our users, which is a question I like, and advice. Let me think about that for a sec. What’s interesting about how we see the teams we’re working with evolve is, first of all, how rapidly they’re changing their architectures, and the convergence onto a best-of-breed stack that we’re seeing really across the board. So the first thing I’d like to learn more about from my user base is which tools they’re really excited about using in the market, and how quickly they expect to get there, because that informs our strategy at Databand: how we want to connect to these different services and make sure that we’re well integrated across the stack. At a tech level, that’s really interesting for me to watch. Who’s going to be going into Delta Lake, Delta tables, and the Databricks ecosystem, and who’s going to stay within Snowflake and build more into that stack? That’s one area of interest. The other area of interest for me, since we’re so early on in the market and in the data quality domain, is this: there are a lot of great companies showing leadership in this space and really helping teams understand how they can better measure things. But what I’d love to know six months from now, relative to what I know today, is how different data quality is from organization to organization, and across all these different techniques, inspection of data sets, alerting, anomaly detection, what the best abstraction layer is to offer to the market as a general, multi-purpose platform across different teams. What’s the right layer where you address, you know, 90 percent of the needs across 90 percent of companies? I’ll be happy if, six months from now, looking back, I have much more clarity around how all different kinds of organizations really think about their data quality and what level is really making the biggest difference for those data orgs. I’ll let Michel give the advice.

Michel Yeah, I really liked what you said about best-of-breed. I think there has been a huge shift in how data teams are thinking about data. They used to buy an end-to-end solution that tried to do everything but only did 70 percent of it, and after one or two years a parallel stack would start being grown within the company. Data teams are moving more and more toward decomposing their data stack: taking the tool that is the best for them to solve the problem they have today, and making sure that if they hit the limits of that tool, they can actually change to another tool that might bring them further. Having this decomposition not only prepares you better for the future, it also allows you to quickly add new use cases. And when you’re thinking about data, there is still a lot of pressure put on data engineers and technical profiles to actually do something with the data. What we’re seeing is that these teams are becoming the data platform layer for the organization, and suddenly you can invite users that are a lot less technical but extremely data-savvy to leverage that data. I think best-of-breed is going to evolve more and more toward simplicity of usage, enabling those less technical users, people who know what they want to get from the data, instead of always leveraging your data team to do the analysis.
When you do an analysis, you answer one question and ten more come up that you need to answer, and the person who was actually asking the question is probably the best one to go and dive into those follow-up questions, rather than someone answering on their behalf who is stretched both in time and in understanding of the problem. And yeah, I think 2020 to 2023 is going to be about the tools that enable data teams to become a platform, and the simplicity of those tools to enable more growth.

Harper You had no idea you did this, but you just touched on my favorite discussion point inside the data community: bringing in those people who may be a little less technical but really have the domain knowledge, and the ability not only to ask those questions but to answer them. Finding a way to get them involved in that data platform is incredibly valuable, bringing insights all the way from the bottom up to the top so that the best decisions can be made. And I think you’re right that it comes down to the data space evolving to have these best-of-breed tools.

Harper So anyway, I really enjoyed this. Michel, thank you so much for coming on. I really enjoyed your perspective, and I’m really excited about the Airbyte project. It’s a really fun tool; I’ve dabbled around with the CDK, just messing around, and enjoyed the fact that it takes me, what, 30 to 60 minutes to build out a relatively small API connector that would normally have taken me maybe an entire week to figure out. So I’m really excited to see where that’s going. Josh, always a pleasure. I don’t think there’s anybody who can speak to observability on the integration side as well as you can, so it’s always fun. Thank you all again, and see you all next week.

Honor Thanks. Take care. Bye.

Michel Thank you.

Additional related links:

The Data Supply Chain: First-Mile Reliability

End-To-End Observability Goes Beyond Your Warehouse