Episode Details

Built For Complexity: Data Observability Meets Data Integration

Taylor Murphy, Head of Product and Data at Meltano, introduces the history and vision behind the open source brainchild that spun out of GitLab. In a world where businesses rely on a wide variety of external data sources, many of which may be obscure and unique to an industry, the ability to develop custom connectors is indispensable. The growth of the Meltano community signals a larger trend where data teams are choosing tech stacks that adapt to their business contexts rather than the other way around. While data observability has been embraced as a must-have for data quality, the industry recognizes that growing complexity calls for a distinctly proactive approach to it.

About Our Guests

Taylor Murphy

Head of Product and Data, Meltano

Taylor Murphy is the Head of Product and Data of Meltano, an open source data platform that enables collaboration, efficiency, and visibility. Taylor has been deeply involved in leading and building data-informed teams his entire career. At Concert Genetics he scaled the Data Operations team to enable the management of hundreds of thousands of genetic tests and millions of claims records. At GitLab, he was the first data hire, focusing on building and scaling the data organization as the company headed towards its IPO. He has been involved with Meltano since its inception, acting as the primary customer with whom the team engaged to understand the needs of modern data professionals.
Taylor is passionate about maximizing the potential of data and building the future of the data profession. Outside of work, he loves spending time with his wife, two boys, and two dogs.
He holds a PhD in Chemical and Biomolecular Engineering from Vanderbilt and a BS in Chemical Engineering from the University of Tennessee at Chattanooga.
Twitter: @tayloramurphy

Josh Benamram

Co-founder & CEO, Databand.ai

Josh is Co-Founder and CEO at Databand.ai. He started his career in the finance world, working as an analyst at a quant investment firm called SIG. He then worked as an analyst at Bessemer Venture Partners, where he focused on data and ML company investments. Just prior to founding Databand, he was a product manager at Sisense, a big data analytics company. He started Databand with his two co-founders to help engineers deliver reliable, trusted data products.

Episode Transcript

Honor Hey, Harper, how’s it going?

Harper Hey, I’m just rocking the day-to-day. Excited to talk to you. I think Taylor from Meltano is with us today, right?

Honor We have Taylor, and we’ve also got Josh, our CEO, on as well. So maybe, Taylor, we’ll start with you. Tell us a little bit about yourself.

Taylor Yeah, so thanks for having me on. You know, longtime listener, first-time caller, so excited to be here. My background is that I’m a chemical engineer by training, and I kind of fell into the data world coming out of grad school. I worked for a small startup in Nashville and gained a lot of my data chops there, to the point where I was the data guru for the team around me, and I led the data organization there for about four years. Then I moved over to GitLab, where I was a data engineer and also led the data team for a little bit. Now I’m with Meltano, and I’m happy to talk more about Meltano as well, but that’s the short story for me.

Honor Awesome. Well, welcome. Really excited to have you. And then Josh, I feel like you’re practically co-hosting now. You’re on every other episode or more. So maybe just a quick one-liner about yourself.

Josh I hope people recognize me by now. I’m Josh, CEO at Databand, helping data engineering teams deliver more reliable, more trustworthy data to their consumers, and working really closely with Honor and Harper to get that done.

Harper So I have to ask you to tell us a little bit more about how you all came to the name Meltano, how you got started, and how you came out of GitLab, because I think it’s a really interesting tidbit.

Taylor Yeah. So Meltano has an interesting story and background. It was originally a project within the data team, but also a special interest of Sid Sijbrandij, who’s the CEO of GitLab. It was originally called BizOps, and the idea was, well, everything that we’re doing with GitLab in terms of DevOps and DevOps tooling for software developers seems to make sense for data teams and data professionals. So the original idea was, we’re just going to call this the BizOps project and start figuring out how to build essentially a GitLab for data teams. At some point we decided that BizOps wasn’t the right name, since GitLab was actually spinning up a business operations unit internally. So we needed to come up with something, and I remember distinctly we were having a large brainstorming session about what we could call this. There were some weird names originally proposed. I think one was Buffalo Kick, just smashing two words together to come up with something that couldn’t possibly already be trademarked. At some point, we started going through the different stages of a lot of data lifecycles: there’s extraction, loading, transformation, analysis, modeling. There are Jupyter notebooks, there’s orchestration. So we had all these letters and we started jumbling them up, and originally the name was an initialism of those different things: model, extract, load, transform, analyze, notebook, orchestrate. We’ve now moved away from the strong initialism representation; now Meltano is just what Meltano is. But yeah, that’s the origin story, and we always have ELT in the name, I guess.

Harper Is there like a weekly award for whoever can say “model, extract, load, transform, analyze, notebook”... I missed the orchestrate. Basically, say it seven times fast? Yeah, I think we should do that. That would be a good community award. I like this. We could do a Chubby Bunny competition to see if we can understand what you’re saying as you get the marshmallows in. But, so that’s where you all started. Tell us a little bit more about the vision and how Meltano has grown, because you all started out focusing on a specific part of the data lifecycle. Is that right?

Taylor Absolutely, yeah. So the original mission and ambition, and we’ve circled back to it, was to build a tool for the entire data lifecycle, end to end. We had a few developers and a project lead on it, and I was essentially the prototypical customer for BizOps at the time, the project that became Meltano. So they were building different parts of the stack. We leaned into the Singer ecosystem, which Stitch had started and was building at the time. We were using dbt and working on some kind of open source modeling layer, since LookML was prominent at the time and still is. But we were using Looker towards the end of 2018, and we were kind of all over the place. We knew what we wanted to do: build a tool for the end-to-end lifecycle. But at some point, it wasn’t able to keep up with the needs of GitLab as we were scaling, because I joined when GitLab was about 200 people, and by the end of 2018 I think we were somewhere around four or five hundred people. At some point I basically had to say, as the manager, well, this isn’t ready for production, but it’s valuable and we want to keep using it, so we’re going to go pay for a couple of tools over here while we keep working on this. That created a cleaner separation between the data team and the Meltano team. From there, they continued to iterate with a lot of feedback from us, and at some point, I think in early 2019, they hired Danielle Morrill to be the general manager of the Meltano team. One of her focuses was the end of the data lifecycle: BI and visualization, being an open source BI tool for startup founders in particular.
There wasn’t a ton of traction with that, and eventually GitLab decided to scale the team down to just one person, Douwe Maan, who’s now our CEO. When it was scaled down, Douwe took a step back and looked at the ecosystem, at what had been built within Meltano, and at the immediate needs of the world he saw around him in terms of data and data challenges. He concluded that tooling around extraction, loading, and transformation, coupled with software best practices (DevOps for data, or DataOps), was what people were asking for, even though they weren’t articulating it very well. So in May of 2020, he pivoted the project to focus on open source extraction, loading, and transformation with a very strong DataOps foundation, and that’s when we started to see traction. The interesting thing about that pivot was that a lot of it was pure marketing. In a sense, nothing fundamentally changed in the project when we made that decision; it was just positioning. Meltano had always been very much plugin-based, built with DataOps principles in mind: there are YAML files, they’re version-controlled, and we were thinking about these things from the start. So the pivot was to focus on that but still keep the broader ambition. And now that we’ve spun out, we’ve recommitted to that larger mission and vision of an end-to-end DataOps platform, but focusing on data integration first, where Meltano is now a great solution. Now we’re building out the rest of the stack, as it were.

Josh That’s really net new information for me, actually, which is fun to hear on our humble podcast. But I’m curious, because I thought of Meltano as really zeroing in on integration requirements and the open source ELT angle. When you say a broader end-to-end product, who is it that you’re thinking about competing with? Are you trying to take over the Airflow space with orchestration, that last “O”? How do you see your product evolving and taking over more of the center of the stack, and what does that mean for you?

Taylor Yeah. So where we really want to fit in comes down to the framing of the data operating system. Recently, Benn Stancil from Mode has talked a lot about this: one of the missing pieces in the current data ecosystem is an overarching operating system. People are buying great solutions for some of their pain points, whether it’s reverse ETL or operational analytics. They’ve paid for a good BI tool, they’re paying for a good data warehouse, and then maybe they’re adding more. Now there’s a new metrics layer that you need, you’ve got some extraction and loading tool, maybe you’re using dbt for transformations. There’s a jumble of tools that you need to make a modern data stack, and now people are having a challenge integrating them and maintaining all those bilateral integrations. They’re also starting to ask: I’ve got this great data stack, but when I want to change something upstream, how do I know whether it’s going to break something downstream? There are hacky ways to figure that out, or you write custom scripts. Our long-term vision and belief is that for basically any tool out there, open source and open core tools can compete with the big boys; we’ve seen that in software development generally. So where we see ourselves coming in is providing that data operations foundational layer and enabling you to bring your ideal data stack into Meltano. If you want to use Airflow, that’s great; we think you should use it. We’re going to provide the layer that makes the bilateral integrations between your different tools much easier, and provide the foundation so you can have isolated environments and end-to-end testing across each of those.
And then eventually have a single control-plane view of your entire data stack. That’s kind of the vision right now. Is that resonating with you?

Josh Yeah, I mean, it’s interesting. Absolutely, the fragmentation that exists within the stack now is a pain point for everybody. I think most organizations want a single tool to do everything: why work with seven different panes of glass when you could work with one? The trade-off there is making compromises on best-of-breed solutions, tools that really focus on being the best at that particular layer. And this is a trend we continually see in the software engineering and infrastructure world: things flow from fragmentation and best of breed to consolidation. So it’s interesting seeing that in your own product roadmap. One thing you pointed out was the open source versus closed source aspect of this, and that’s something we also think about a lot. We have some elements of our product that are open source, but really, we want people to use them, today at least, with our commercial application. It feels a little more like Datadog, in the sense that they have an open source agent, but it’s not something people typically run off with and use independently. So there are different levels of open source. I’m curious: what areas of the stack do you think will remain dominated by open source, and which will lean more toward closed source? Or do you see open source in everything? I think Snowflake and Databricks are an interesting angle on that, as two very dominant players in our domain.

Taylor Yeah. So there are a couple of things in there; I’ll go in reverse. Long term, we have this belief that for any product out there, there will be some open source competitor that is as good or better. To your point about what’s going to take the longest, my sense is that the data warehouse layer will probably be the slowest to get that strong competitor, just because products like BigQuery and Snowflake are fantastic and you get a ton of value out of them immediately. But we’re seeing a ton of open source players in that space too, whether it’s an open core product or a fully open source one. Our mission right now is to enable everyone to realize the full potential of their data, and we specifically chose the word “everyone” because we want to be inclusive of a global audience. Globally, there is a need for better data tooling, no matter where you are, and it’s too easy to focus on basically the U.S. market and enterprises that have the money to pay for the tools they need. So we have a deep desire to build a fantastic, free, open source product that gives you a lot of the benefits of a DataOps platform that you can use for your team, for your company, for as long as you need. Only once you need enterprise-level features would you most likely pay us. We want to work with open source and open core vendors in a very collaborative manner. We do see a world where, as Meltano is able to connect with more parts of the stack, that adds value for all these vendors, if Meltano can be a great way for different companies and users to try out different products within their data stack with good integrations. And of course, if companies are building good API connections, we can integrate with those as well.
So there may be a world where you’re using a paid, hosted version of some BI tool, but there’s still value that Meltano can add, whether that’s through the API, collecting some metadata, or adding isolated environments and end-to-end testing with the rest of your stack. So we’re really going for a plug-and-play, rising-tide-lifts-all-boats approach to the market.

Honor Taylor, since data integration is the area Meltano is focusing on first, what do you think is the most important thing to get right?

Taylor Data integration is part of the reason Douwe did the initial pivot: for basically anything you want to do, you have to do data integration, whether it’s moving data from point A to point B so it’s in the right place, or “I have 10 different data sources and I need to combine them.” It’s about: do you have data in the right place, is it in the right format, and is it of the right quality to answer the questions that you have? That’s the start of any journey, whether it’s “I have a spreadsheet in Excel and I want to run some SQL queries on it” or “I’ve got a piece of paper with some survey data and I need to input that into a database somewhere.” For anything you want to do, that’s where you start: you extract the data, you put it somewhere, and then you transform it. So we felt it was important to get that right and be really good at it, to be a compelling solution for data integration alone, and to give us a strong foundation to show people that we can also do the rest of the data stack.

Honor Mm-hmm. That really makes sense. And I actually do want to bring in Josh on this point. Josh, what have you been seeing with some of the customers we’re working with at the integration layer that you might have been surprised by?

Josh Good question. Starting our journey, one thing that surprised me was going into teams and realizing how data-intensive they are and how many data sources they actually work with, which I don’t think you intuitively appreciate. There are some kinds of businesses where you can assume from the outside, yeah, this looks like a big enterprise, they’re probably working with hundreds of sources, and you can imagine a lot of complexity scaling there. But what’s been surprising for me is how much more prevalent it is across the market, even in nimbler scale-ups and startups: organizations working with dozens and sometimes hundreds of data sources, in domains where you wouldn’t necessarily expect it, with a team of, I don’t know, 10 or 15 data engineers. We have client cases in gaming, and we’ve seen a lot of cases in the ad tech world and even in consumer products where folks are working with many, many different data sources, and the volume there is surprising. In our initial journeys into the market, we actually spent a lot of time with machine learning teams, and one of our realizations there was that there’s a group of really advanced organizations doing really heavy stuff with ML, but they’re leading the pack in terms of data teams, and most organizations out there are still pretty new in that domain. Where the market has matured a lot more quickly is companies that just work with a huge variety of data. So that’s been surprising to me. What’s also been surprising, and it’s good for a company like ours, is just how challenging it is to know, when you’re working with a source of data, that it’s coming in as you expect, that it’s reliable, and that you can depend on it, even before all the craziness begins to happen when you’re analyzing that data.
So that leads well into the types of problems we focus on, but it was also very surprising.

Honor Taylor, do you see a parallel with how the Meltano community is building out taps and targets? Is what Josh just talked about something that you’re also seeing in how your community is developing all your different connectors?

Taylor Yeah. One of the great things about how we’ve approached integration is that we’ve really leaned into the Singer specification for data transfer. Singer was started by Stitch, which was then acquired by Talend, but the Singer specification is open source, and it basically just describes how two software packages will communicate their data in a way that makes sense. It’s newline-delimited JSON: there’s a schema message, and there are records that come through. You don’t need to know the specifics of the spec, but the cool thing is, if you build a target that can accept data in that specification, it works with any number of taps, which is what they call their connectors, that output data in that specification. What that does is unlock the long tail of connectors for companies that have data sources I’ve never even heard of. Our thesis is that any data integration company is typically going to top out somewhere around 150 to 200 connectors that are high quality and well maintained, because that’s just the power-law distribution of companies that use these specific products and need data out of them. But there’s this long, unserved tail of data in different products that you can’t go to these vendors and find connectors for, and that’s where the open source aspect really comes into play. On the Meltano side, we’ve built an SDK that makes it extremely easy to build a new tap to pull data from an API, from a database, or even from a file and output that data in the Singer format. Then you can use it with any of the targets, whether it’s Snowflake or Redshift or Athena or CSV or whatever you need. So it really enables small teams to build the connectors they need. One of the things we’ve launched is called Meltano Hub.
It’s a place to discover and find all of the connectors the community is building, and some of the most interesting ones are the fun ones. Someone built a Rick and Morty tap to get data from this random Rick and Morty API. I’ve seen people build a tap for rock climbing gym management software, and I don’t know that you would ever get some of these bigger companies to actually write a connector for that. But because they were able to write to the spec, they were able to do it, and, you know, off to the races. So it really just addresses long-tail needs in a sustainable way.
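To make the Singer message flow concrete, here is a minimal sketch assuming only the three core Singer message types (SCHEMA, RECORD, STATE) written as newline-delimited JSON. It uses plain `json` rather than the singer-python library or the Meltano SDK, and the `climbing_sessions` stream, its fields, and the helper names (`tap_messages`, `serialize`, `toy_target`) are hypothetical illustrations, not part of any real connector.

```python
import json
from typing import Iterable, Iterator

STREAM = "climbing_sessions"  # hypothetical stream name for illustration


def tap_messages(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield Singer-style SCHEMA, RECORD and STATE messages for one stream."""
    # SCHEMA: describes the stream's shape before any records arrive.
    yield {
        "type": "SCHEMA",
        "stream": STREAM,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "member": {"type": "string"},
                "minutes": {"type": "integer"},
            },
        },
        "key_properties": ["id"],
    }
    last_id = None
    for row in rows:
        # RECORD: one row of data for the named stream.
        yield {"type": "RECORD", "stream": STREAM, "record": row}
        last_id = row["id"]
    # STATE: a bookmark a later run could resume from.
    yield {"type": "STATE", "value": {STREAM: {"last_id": last_id}}}


def serialize(messages: Iterable[dict]) -> str:
    """Newline-delimited JSON: one message per line, as a tap writes to stdout."""
    return "".join(json.dumps(m) + "\n" for m in messages)


def toy_target(text: str) -> dict:
    """Minimal target: remember schemas, collect records per stream, keep last state."""
    schemas: dict = {}
    tables: dict = {}
    state = None
    for line in text.splitlines():
        msg = json.loads(line)
        if msg["type"] == "SCHEMA":
            schemas[msg["stream"]] = msg["schema"]
            tables.setdefault(msg["stream"], [])
        elif msg["type"] == "RECORD":
            # In this sketch, a record for a stream with no prior SCHEMA is an error.
            if msg["stream"] not in schemas:
                raise ValueError(f"record for unknown stream {msg['stream']!r}")
            tables[msg["stream"]].append(msg["record"])
        elif msg["type"] == "STATE":
            state = msg["value"]
    return {"tables": tables, "state": state}


if __name__ == "__main__":
    sample = [{"id": 1, "member": "ana", "minutes": 90},
              {"id": 2, "member": "bo", "minutes": 45}]
    print(toy_target(serialize(tap_messages(sample))))
```

Because the target only depends on the message format, any tap that emits these lines could feed it; in a real Singer setup the tap’s stdout is piped into the target’s stdin rather than passed around as a string.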

Josh Not to plug us too hard here, but I think that’s one of the reasons it’s cool talking to Meltano and hearing how you think about the market. We focus a lot on helping companies make sure their data looks good as soon as it comes in, really from the source. We want to make sure you never even get into a situation of garbage in, garbage out; we want to block that garbage up front. And we see a good correlation between data quality issues and companies that work with novel data sources, new kinds of data providers that maybe aren’t the standard bunch and are a little less practiced at being data providers. So, not to fire shots at Rick and Morty and rock climbing gyms, but I imagine there are a lot of other things they do in their core business besides sharing data. Those would seem like good places to adopt a data observability tool that’s really focused on making sure that, if you begin to depend on that data, it’s reliable and trusted, and your team can really start building it into good products.

Honor Taylor, for your team as they’re building out Meltano, does the need for data observability feel relevant?

Taylor Yes. I was thinking about this in preparation for the podcast, and I’m curious to hear y’all’s perspective on observability. My general conclusion is that there are multiple layers to observability. In a world where we had infinite capacity and ability to track everything, that would be the most detailed level of observability: you could validate that this data, in this column, in this row, maps all the way through to the end. Observability is about bringing the right level of abstraction and summarization to people at the right time so that they can dive in and solve these different problems. So to your question specifically, is observability something we think about and care about? Absolutely, because one of our target personas right now is the data engineer, and what they are building are systems for managing data flows and data workflows. When you set up a system or a pipeline, and you have a supply chain, you want to know what’s going on with it. But you also want to know whether you need to dive in and inspect more closely. So there are these different levels of abstraction. You can have an exit code of zero, everything is good, or an exit code of one, something’s not right. Then you dive in: is it something with the number of rows being different? OK, no, that’s not it. Let me peel back a little lower. Is it the size of the data coming through? I got the right number of rows, but each column is, you know, one byte or something. So there are these multiple layers, and figuring out what’s a useful abstraction for people is a challenge, which is why I’m glad there are folks like y’all thinking about this deeply. We’re also thinking about how to improve observability within the Singer ecosystem, and within Meltano more broadly.
Within the Singer spec itself, there are default metrics that you can get, and we want to add more. So if you’re running a tap, you can get information like how long the API requests actually took, with summary statistics from that, or how many rows you’re outputting over time. There’s just more information we can put in there and feed into a tool like Databand, letting users pick and choose and having the tool make recommendations about what level of abstraction they need to pay attention to. And then it can intelligently say, hey, something’s going wrong, go dive in here. I’m curious if that resonates with you all, the layers of observability and abstraction.
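The layered drill-down described above (exit code, then row count, then data size), fed by tap metrics, can be sketched as follows. This assumes metric log lines of the form `METRIC: {json}` carrying `record_count` and `http_request_duration` metrics, which is our understanding of how the singer-python library logs metrics; treat that format, the byte-per-row threshold, and all function names as assumptions, not a documented Meltano or Databand interface.

```python
import json
import statistics
from typing import Iterable

PREFIX = "METRIC: "  # assumed log-line marker, as singer-python is understood to emit


def parse_metrics(log_lines: Iterable[str]) -> list:
    """Extract the JSON payload from every log line containing 'METRIC: {...}'."""
    points = []
    for line in log_lines:
        idx = line.find(PREFIX)
        if idx != -1:
            points.append(json.loads(line[idx + len(PREFIX):]))
    return points


def summarize(points: list) -> dict:
    """Summary statistics: total rows emitted and HTTP request timings (seconds)."""
    rows = sum(p["value"] for p in points if p["metric"] == "record_count")
    durations = [p["value"] for p in points if p["metric"] == "http_request_duration"]
    return {
        "rows": rows,
        "requests": len(durations),
        "mean_request_s": statistics.mean(durations) if durations else None,
        "max_request_s": max(durations) if durations else None,
    }


def diagnose(exit_code: int, summary: dict, expected_rows: int,
             bytes_out: int, min_bytes_per_row: float) -> str:
    """Check one layer at a time and stop at the first layer that looks wrong."""
    # Layer 1: did the run succeed at all?
    if exit_code != 0:
        return f"failed: exit code {exit_code}"
    # Layer 2: did the expected number of rows come through?
    if summary["rows"] != expected_rows:
        return f"row count {summary['rows']} != expected {expected_rows}"
    # Layer 3: right row count, but are the rows suspiciously small?
    if summary["rows"] and bytes_out / summary["rows"] < min_bytes_per_row:
        return "rows look nearly empty"
    return "ok"


if __name__ == "__main__":
    logs = [
        'INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.4, "tags": {}}',
        'INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.2, "tags": {}}',
        'INFO METRIC: {"type": "counter", "metric": "record_count", "value": 500, "tags": {}}',
    ]
    summary = summarize(parse_metrics(logs))
    print(summary)
    print(diagnose(0, summary, expected_rows=500, bytes_out=500, min_bytes_per_row=5))
```

The ordering is the point of the design: each check only runs once the coarser layer above it has passed, so the report names the highest-level symptom first.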

Josh Harper, I think, has done some really interesting thought leadership around levels of observability, if you want to take that one.

Harper Yeah, I was going to hop in here. One thing I wanted to comment on that you mentioned earlier, Taylor, is the fact that Meltano is focused on providing that foundation by focusing on ingestion and building from there. I think that’s a really key point, not only for the modern data stack but for the observability space itself, because we’re still trying to figure out the right way to present it. You talk about these different layers that are inherently part of the process for us to understand, OK, what is the right signal we’re looking for, and how do we make that actionable? And when you reference layers, I can’t help but think of Shrek: there’s this onion, and it’s a very complex situation. Data observability is like an onion. But you’re right, and the thing that’s fun about the space is that we’re trying to improve the developer experience for data engineers. It doesn’t matter which organization you’re working with, whether you use Databand or Meltano, or a managed Spark cluster or a managed Airflow service. The whole point is to make it easier for people to access their data and act on any issues that occur. Taking those ideas from DevOps, the cycle that goes from detection to awareness to iteration, to complete that lifecycle and create a healthy ecosystem: that’s the goal of everybody here.
You’ve seen the explosion of the observability space, and you have a lot of companies focusing on data at rest: when it’s in the warehouse, what is the health of the data there? And then you have us, sitting in the observability and operational layer and looking at the data as it moves through your pipeline and how that might affect the data that’s already in your warehouse. For myself, when I think about how to describe a holistic observability hierarchy, I talk about the base, foundational level being the operational health of the systems that are moving your data itself: are my pipelines running? Because if my pipelines aren’t running, does it really matter what my data looks like once it’s in the warehouse? Then once I know my pipelines are running: OK, what’s happening with the data that’s moving through that system? That’s where, as you said, you bring those metrics out of the Singer API and make it obvious what’s going on with the data there. That’s really where Databand is focusing: giving people the ability to come to our application, look at the various sources coming in, and understand, OK, this schema has changed twice over the last week; how can we create an error check to ensure that this won’t break pipelines running downstream? Once you have that foundation of operational and holistic dataset monitoring, that’s when you can get more granular and start looking at column-level profiling: do I have nulls in here? Is this a distinct column, and do we expect every value to be different?
And the reason I like to point out column level first is that it really leads to understanding what is important to validate at the row level, because if this column is not distinct, it doesn’t matter if I check that it’s a unique ID whenever I check a row, right? So by looking at the pipeline, then the dataset, then the columns themselves, you inform the rules for your row validation, which gives you a good place to start and a sense of where you want to get to. I like that distinction as well because at that center level of column profiling, you have a nice bridge between the data engineer and what has become the analytics engineer. Your data engineers and your infrastructure team focus on your pipeline and your data as it comes in, and make sure it looks good and meets the shape you expect from a schema and column-level perspective. You can pass that information downstream to your analytics engineers, who are taking the data that’s in your lakehouse and moving it into your warehouse. They can say, OK, my columns were as expected when they came in; let me make sure they’re still the way I expect, do my transformations, and then validate the rows at that level. It takes a load off the right side of the pipeline, so the analytics engineers can focus on what’s really important to them from a data quality perspective instead of having to look at the entire picture of their data. You get efficiencies by building up in that way. So I’m going to step off my soapbox and see if that resonates with you.
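The idea that column-level profiling should decide which row-level rules are worth running can be sketched like this. The `id`/`email` columns, the `expected_ids` list, and every function name are hypothetical; this illustrates the ordering Harper describes, not Databand’s implementation.

```python
def profile_columns(rows: list) -> dict:
    """Column-level profile: null count and whether the non-null values are all distinct."""
    columns = sorted({key for row in rows for key in row})
    profile: dict = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = {
            "nulls": len(values) - len(non_null),
            # Assumes hashable values (strings, numbers), as in typical ID columns.
            "distinct": len(set(non_null)) == len(non_null),
        }
    return profile


def plan_checks(profile: dict, expected_ids: list) -> tuple:
    """Use the column profile to decide which per-row rules are still worth running."""
    column_issues, row_rule_columns = [], []
    for col in expected_ids:
        stats = profile.get(col)
        if stats is None:
            column_issues.append(f"{col}: column missing")
        elif not stats["distinct"]:
            # Already fails as a column; per-row "is this a unique ID" checks add nothing.
            column_issues.append(f"{col}: duplicate values")
        else:
            # Distinct so far, so a per-row not-null check is the remaining useful rule.
            row_rule_columns.append(col)
    return column_issues, row_rule_columns


def validate_rows(rows: list, id_columns: list) -> list:
    """Row-level: report (row index, column) wherever an ID column is null or absent."""
    return [(i, col) for i, row in enumerate(rows)
            for col in id_columns if row.get(col) is None]


if __name__ == "__main__":
    rows = [{"id": 1, "email": "a@x.io"},
            {"id": 2, "email": "a@x.io"},
            {"id": None, "email": "c@x.io"}]
    issues, rule_cols = plan_checks(profile_columns(rows), ["id", "email"])
    print(issues, validate_rows(rows, rule_cols))
```

Here the duplicated `email` column is flagged once at the column level and skipped at the row level, while `id` passes profiling and moves on to the cheaper per-row check.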

Taylor Yeah, I think where I want to go from there is that both of us have been talking a little bit about pipelines in production and pipelines as they’re running. The other aspect we’re trying to bring with the data ops platform is: when you need to make changes, are you able to do that in an isolated environment? Are you able to test before you deploy things to production? That’s what we want to bring to the whole stack. One of our ambitious visions is that you can have your entire data platform defined within a single Meltano project. We integrate with dbt, and we integrate with Airflow. So if you have an analytics engineer who is working on dbt, are they able to add a new column to an extraction and load in a test environment, run the dbt models in a dev environment, and then potentially look at a visualization, all in this single project and in a single, safe environment, to have the confidence that when that is merged and goes to production, everything will be good? You still need the observability in production, but there’s the happy path and then there are adjacent paths that data ops best practices bring to the stack that are missing right now. dbt has done a good job of introducing people to these concepts, but I think we need them for the entire stack. We recently released an environments feature for Meltano so that you can define as many environments as you want, with inherited configuration and environment-specific configuration for your dev versus prod environment. So it’s easy to spin up dev, run your extraction and load, and then when it’s all done and you’re confident everything looks good, just run everything in production.
So that starts to really show how we’re trying to bring these practices to the data stack. It’s just exciting. I love it.
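
The environments idea Taylor describes, shared configuration that every environment inherits plus environment-specific overrides for dev versus prod, can be pictured as a recursive merge. This is a minimal Python sketch assuming configuration is a plain nested dictionary; the keys, `deep_merge`, and `resolve_config` are hypothetical and do not reflect Meltano’s actual `meltano.yml` schema or how Meltano implements environments.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where keys in `override` win, merging nested dicts recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical project config: shared settings every environment inherits.
BASE_CONFIG: dict[str, Any] = {
    "extractor": {"start_date": "2022-01-01", "select": ["orders.*"]},
    "loader": {"schema": "analytics", "host": "warehouse.internal"},
}

# Hypothetical per-environment overrides; prod inherits everything unchanged.
ENVIRONMENTS: dict[str, dict[str, Any]] = {
    "dev": {"loader": {"schema": "analytics_dev"}},
    "prod": {},
}


def resolve_config(env: str) -> dict[str, Any]:
    """Effective configuration for one environment."""
    if env not in ENVIRONMENTS:
        raise KeyError(f"unknown environment: {env!r}")
    return deep_merge(BASE_CONFIG, ENVIRONMENTS[env])
```

With this shape, `resolve_config("dev")` writes to `analytics_dev` but keeps the shared host and extractor settings, so the dev run exercises the same pipeline as prod with only the target isolated. Copying rather than mutating keeps the shared base safe from one environment’s overrides.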

Harper I’m equally excited when you talk about this, because the happy path has been hard to define for a data team for a long time. Until you have companies like Meltano and ourselves focusing on providing a developer experience similar to what we’ve seen mature in software, data teams are going to continue to get the question, “Is the data correct?” “I think it’s correct.” But now we have a way to actually prove it. That’s what I think is really exciting at this time. Josh, any thoughts on having those environments and being able to validate and be confident that you’re delivering the right data to your teams? How does that resonate with what you’ve seen in the market?

Josh Yeah. Well, we often think about the different ways of looking at data. You addressed the different levels of observability that you can get into: you’re starting with the system, are pipelines running, is data moving through those pipelines, then columns, then rows, that kind of pyramid that goes into deeper and deeper inspection of the data. Another vector we use to look at it is how data moves through the system, data in motion, and then how it sits and aggregates within tables or datasets over time. When you look at the development lifecycle, being able to balance between these approaches and understand both of them is where this becomes really important. As an example, say you’re iterating on a pipeline within a safe zone, a staging environment, somewhere outside of production. It’s going to be really hard to know what’s going on with your data if your observability tool only runs on tables every 12 hours, scanning those tables over time and telling you if things look a little wonky. If instead you have some system or some data checks built into the process, you’re checking your pipeline as you go through those iterations, you’re able to confirm within the development lifecycle that the data is healthy, and then as you run that process and push it to prod, you can check things in between. That balanced look at data in motion and data accumulating is why we’re trying to think about things from a balanced perspective. Our solution today is definitely more weighted towards looking at the data as it flows through.
I think that’s one of the things that makes us really strong in the complex integration world, but having a perspective on both of these things is definitely where we want to go as a broader observability suite.
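
Josh’s contrast between a table scan every 12 hours and checks built into the process itself can be sketched as a wrapper that inspects each batch as it moves through a pipeline stage, so a bad batch surfaces during the run (including a dev run) instead of at the next scheduled scan. This is a minimal Python sketch; `checked`, `DataCheckError`, and the example checks are hypothetical and are not Databand’s API.

```python
from typing import Any, Callable, Iterable, Iterator

Batch = list[dict[str, Any]]


class DataCheckError(Exception):
    """Raised when a batch fails an inline data check."""


def checked(
    batches: Iterable[Batch],
    checks: dict[str, Callable[[Batch], bool]],
    metrics: list[dict[str, int]],
) -> Iterator[Batch]:
    """Pass batches through unchanged, recording metrics and running checks in flight."""
    for i, batch in enumerate(batches):
        metrics.append({"batch": i, "rows": len(batch)})
        for name, check in checks.items():
            if not check(batch):
                # Stop before the bad batch reaches the downstream load step.
                raise DataCheckError(f"batch {i} failed check {name!r}")
        yield batch


# Hypothetical checks applied to every batch as it flows by.
CHECKS: dict[str, Callable[[Batch], bool]] = {
    "non_empty": lambda b: len(b) > 0,
    "no_null_ids": lambda b: all(r.get("id") is not None for r in b),
}
```

Because the wrapper is a generator, batches that pass are handed to the next stage immediately, and the first failing batch halts the run with a named check, while the per-batch metrics collected so far remain available for inspection.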

Taylor Yeah, because when you launch a new environment, it doesn’t exist in isolation from the production happy path. You’re like, OK, time has gone by and I’ve had my observability data. I’m now branching to test something on staging, and that previous history is still relevant to what’s happening in staging. You’re just changing something. And then when you bring that back in, time continues on with the data in motion.

Taylor I just think it’s not an easy problem to solve, but it feels right to talk about things in that way, frankly the Git-based way, where you have a main branch, you branch off of that, and everything gets merged back in at some point.

Josh Yeah, I was going to say, I think part of it is being able to see what the before and after looks like within your data lake or your data warehouse after changes come in. What we see a lot of teams wanting, and what we want too, is the ability to see what’s happening in the data as you’re actually doing the iterations. So I make a change to the pipeline code: I change a plus sign into a minus sign somewhere, I run my pipeline, and I see what effect that has on the data. If I expect that data to decrease because I am now subtracting something, or dividing something, or filtering, whatever, I want to see that decrease when I run my pipeline in dev, and then feel confident before I even push it into prod and maybe mess up something on some tables somewhere. So part of this is tapping into the pipeline and seeing that data move through, which we see as an important part of that overall dev lifecycle that you’re building.
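
Josh’s plus-sign-to-minus-sign example amounts to stating an expectation about how a metric should move, then checking it against a dev run before promoting the change. Below is a minimal Python sketch under that assumption: `transform_v1`, `transform_v2`, and the `amount`, `adjustment`, and `net` fields are hypothetical stand-ins for real pipeline code, and comparing column totals is just one simple choice of metric.

```python
import math
from typing import Iterable

Record = dict[str, float]


def transform_v1(rows: Iterable[Record]) -> list[Record]:
    # Hypothetical original pipeline step: adds the adjustment.
    return [{"net": r["amount"] + r["adjustment"]} for r in rows]


def transform_v2(rows: Iterable[Record]) -> list[Record]:
    # Hypothetical change under test: the plus sign became a minus sign.
    return [{"net": r["amount"] - r["adjustment"]} for r in rows]


def column_total(records: Iterable[Record], field: str) -> float:
    return sum(r[field] for r in records)


def moved_as_expected(
    before: list[Record], after: list[Record], field: str, expect: str
) -> bool:
    """Check that the total of `field` moved in the expected direction."""
    old, new = column_total(before, field), column_total(after, field)
    if expect == "decrease":
        return new < old
    if expect == "increase":
        return new > old
    if expect == "unchanged":
        return math.isclose(new, old)
    raise ValueError(f"unknown expectation: {expect!r}")


# Run both versions against the same dev sample before promoting to prod.
dev_sample = [{"amount": 100.0, "adjustment": 5.0}, {"amount": 40.0, "adjustment": 2.0}]
before = transform_v1(dev_sample)
after = transform_v2(dev_sample)
assert moved_as_expected(before, after, "net", "decrease")
```

Running both versions on the same dev sample isolates the code change from any change in the data, which is the decoupling the conversation keeps coming back to.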

Harper Yeah, I think one thing that makes this space really interesting for me to work in is that, as you talk about these environments, it’s important to understand that everything that happened before is still relevant. But you want to be able to isolate what you want to test before bringing it back in, and ensure that you’re going to get the results you expect. It’s about finding that right level of abstraction you mentioned earlier, where you can maintain the rich context that exists there but isolate the moving parts that are important. And that’s something that has really been missing for data teams up to this point. So it’s fun to see how people are approaching that problem, given the complexity that exists in data management. You try to find ways to decouple your code changes and your data changes so that you can isolate exactly what the issue is, and seeing how different companies make that accessible to other teams is just fun. Different brains work different ways.

Taylor Yeah. One extra point I want to make: one thing I talk about with Meltano that I get excited about, and that guides how I think about the product long term, is that as a former practicing data engineer, I want tools that make it easy to get things done. But when something goes wrong, I want to be able to punch through to the code underneath and understand what’s going on. So the experience of Meltano should be super easy to set up, but if something is weird with your data, you should be able to understand what’s happening behind the scenes. Our ethos of open source, community-based development goes along with that, because as a data engineer, I wouldn’t want to go into an organization where it’s like, “Hey, this is the tool we use; if something goes wrong, you always need to talk to support.” As an engineer, I want to resolve things and figure them out myself. So it’s the abstraction, but enabling you to peel back the curtain, peel the onion, if you will, when you need to go deeper.

Honor So Taylor, what do you think is most exciting for Meltano in the next year?

Taylor So in the next year, we’re really excited about validating and finding true product-market fit with the data ops platform vision. We’re confident now that we could focus solely on data integration and have a product that we could sell and make revenue from that alone. But we have bigger ambitions, and we want to validate that this is something people need. We believe it strongly, and we’ve talked to a lot of users, but we are on the path to getting true product-market fit on the data ops platform side. We have fantastic investors who believe in that mission and vision and are encouraging us to go in this direction. We have a dedicated community that’s excited about what we’re doing, and we’re hoping to grow and earn trust, usage, and contributions that enable us to push on this mission and vision long term. So in the next year, we’re hiring, we’re growing, and we’re building out this vision of data ops. It’s very exciting; it’s a fun time to be in data. Generally, it’s a great time to be a practicing data professional, and it’s cool to be part of a company that’s building tooling for these folks.

Honor And Josh, what about you? What is exciting to Databand with the growth of companies like Meltano?

Josh I think, until Meltano builds up their own observability tool, which it sounds like might happen, it’s just having friendly folks that we can build good integrations with and get metadata from in the right way. Well, I’m joking, because Taylor mentioned earlier the importance of interoperability. I think we would agree on the same direction: as we scale our own company, making sure we’re able to tap into the resources that companies like Meltano are providing, so that for our users it’s easier and easier to get the information that helps them do their work and be most productive. So I’m very excited about the API that it sounds like Meltano is building, or at least the new metrics that they’re bringing in, and new holistic features for different parts of the value chain that you may be wrapping into your process. We could also look to those as more and richer sources of metadata to help us detect more problems before they hit the folks who are going to be disturbed by bad data. That part of it is definitely exciting to me.

Honor And I also want to make a quick open call to any Meltano users who might be listening: if you would like to see some observability capabilities, give us a shout. That would give us the cue to develop tighter integrations. Since we’re coming up on time, just really quickly: any tips for the community that is working closely on integration? What do you think is going to be important for them to focus on as the Meltano ecosystem becomes more and more lively?

Taylor Yeah. What we’re looking for from the community is to really crowdsource a lot of the best ideas to put into tools like our SDK for building new taps and targets. The specification for Singer in particular is open, as is how we’re developing Meltano. If there’s something that you want to see, we love getting contributions, and we’re very transparent and open about it. So basically, we just want to hear from you if you have strong opinions. We really see Meltano as a data tool built by us and the community, built by data professionals, and we want to have everybody involved and start to really think about better ways of doing this together.

Harper For any non-Meltano users, I do recommend checking out Meltano Hub. Like Taylor mentioned earlier, it’s a really great source to get a feel for what they’re doing and how their ecosystem works. Also check out Databand; we’ve got some good stuff over there as well. And if you’re a Twitter person, I also recommend following Taylor on Twitter, because the guy is funny. It’s not just the looks that he presents on camera. No, I’m kidding. It’s a really good follow. Thanks so much for the time today, Josh and Taylor. I really enjoyed being here; it was a lot of fun.

Taylor This is great. Thanks for having me.

Honor See you, bye.
