
The Case for Dataset Centric Visualization

We love Maxime so much that we had him back for a second podcast! In this episode, Max makes the case for dataset centric visualization. He walks through how different BI tools offer different approaches to building dashboards. Then he discusses how dataset centric modeling is a powerful approach that combines the best of the query-centric and semantic-centric visualization approaches.

 

Check out Maxime’s blog post here – The Case for Dataset-Centric Visualization

About Our Guests

Maxime Beauchemin

CEO & Founder Preset

Fun fact – Maxime Beauchemin is the original creator of Apache Airflow and Apache Superset. He is now the CEO and founder of Preset, an open analytics data platform built on Apache Superset, which helps make any team productive with data. Maxime also has over a decade of experience in data engineering at companies like Lyft, Airbnb, Facebook, and Ubisoft.

Episode Transcript

Ryan: So the topic we really want to talk about today is this case for dataset-centric visualization. I know, Max, you wrote a blog post about this recently, maybe a month or a few months ago. Let’s talk about it: what was your overall objective with that post, and what was the point you were trying to make?

Maxime: I was thinking about the core question of what good interfaces are for humans to interact with data at a very basic level. The art of data modeling for analytics is, well, not quite a lost art, but I think there’s been little progress since dimensional modeling and the Corporate Information Factory. If you look at some of the books from the nineties and early 2000s, there’s Ralph Kimball and Bill Inmon talking about data modeling in general, and they build a case for data modeling for analytics using different techniques. In my blog post, I’m building a case for what I would call dataset-centric data modeling, which is very simple and easy to reason about. What is a dataset, right? We could get analytical on that and talk about what I mean by dataset, but by dataset I just mean a table or a tabular structure, a table or a view. The beauty of these tabular structures and datasets is that they’re super simple and easy to reason about. Anyone who has interacted with a spreadsheet or a matrix can understand this notion of rows and columns, and it makes it much easier for people to explore, reason about visualizations, and answer their own questions than something like a dimensional model. One way to think about the best way to expose data for people to consume is to say, oh, you can use a dimensional model. Then it gets complex pretty quickly: you get facts and dimensions, many facts, many dimensions. You have this bus matrix of which facts relate to which dimensions, and then they fall into what historically was called the semantic layer. I don’t know how deep you want to get into that, but what is the semantic layer? It’s usually a proprietary abstraction that lives inside a BI tool, something like Looker’s LookML.
Something like a Business Objects universe or a MicroStrategy project, which tries to glue these together. So you have these complex data models for your data, and then the semantic layer is an abstraction on top that maybe makes it easy for people to consume. The reality is these semantic layers are very complex, hard to reason about, hard to model, hard to maintain. So in this blog post, which I invite people to read if they’re interested in the topic and want a much more structured way into it than me just babbling about it, I’m building a case for something simple: a dataset. I’m saying data engineers today should expose very simple tabular datasets with relevant metrics and dimensions for people to explore and build data visualizations. It’s easy to reason about for data scientists, it’s simple, and it’s good. Virtuous.

Josh: Let’s take it back to the basics. I definitely agree. I think it’s hard to argue with the table, the spreadsheet. In essence, it’s the most intuitive way of presenting data in raw form, right? This is why I think spreadsheets and Excel still run the world of data analytics, basically. Why did dimensional modeling come about? What purpose did that serve?

Maxime: Yeah, so I think dimensional modeling, when it came about, was pushing one idea, which is normalizing less, maybe. Finding a compromise between highly... well, let’s step back and talk about normalization a little bit. I’m assuming the audience is data practitioners and they probably understand the term normalization and the tradeoffs, so I’ll go through this very quickly. If it’s not clear, I’d suggest pausing and researching it, because normalization is a really key concept to understand in data engineering. The general idea is that if you highly normalize, data does not repeat itself, and the schema tends to get complex. Data models for OLTP transactional systems are typically second normal form or third normal form; they are highly normalized. You have a lot of entities: an invoice, a customer, a customer type, a customer status. There’s no duplication. Therefore you have highly complex schemas that are hard to query, because you have to do a lot of joins to answer any question. You probably have dozens, if not hundreds, of tables. These schemas are not intuitive to query. On the other end, I’m arguing on the dataset front: denormalize as much as you can. If it’s highly denormalized, that means you have flat datasets with a lot of repetition. Your customer type will be repeated for each event, for instance, which is highly duplicative and has tradeoffs. I think dimensional modeling was trying to find a balance between simplifying the schema and keeping some amount of normalization to reduce data duplication and get a little more reuse of things. So I’m not sure if that answers the question.
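
A minimal sketch of the tradeoff Maxime describes here, using Python’s built-in sqlite3 module. The table names, column names, and rows are assumptions made up for illustration; none of them come from the episode or the blog post. Three normalized tables store each customer type once, and a join produces the flat, denormalized dataset in which the customer type repeats on every invoice row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (OLTP-style) schema: each fact is stored exactly once.
CREATE TABLE customer_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT,
    type_id INTEGER REFERENCES customer_type(id)
);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    amount REAL
);
INSERT INTO customer_type VALUES (1, 'enterprise'), (2, 'self-serve');
INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Bolt', 2);
INSERT INTO invoice VALUES (100, 10, 250.0), (101, 10, 75.0), (102, 11, 20.0);
""")

# Denormalized view of the same data: two joins are needed just to label
# each invoice, and 'enterprise' now repeats on every Acme row.
flat_rows = conn.execute("""
    SELECT i.id, c.name, t.name, i.amount
    FROM invoice AS i
    JOIN customer AS c ON c.id = i.customer_id
    JOIN customer_type AS t ON t.id = c.type_id
    ORDER BY i.id
""").fetchall()

for row in flat_rows:
    print(row)
```

The flat result is what a dataset-centric tool would consume directly; the cost is the repetition of customer attributes on every row.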

Josh: It actually does. The reason I ask is that I think, like most things, there’s some value behind each approach. With dimensional models, I think at the end of the day it’s an efficiency thing. It helps with the efficiency of storing and, for the machine at least, querying data. That’s how I think about it. And the dataset-centric approach helps make things more human readable, I believe. But at some point it’s going to become more and more inefficient for the machine to store and process that data as it’s kept in a denormalized structure. So I’m curious where you see the line.

Maxime: Yeah. What I was going to say is that the dataset does not have to be materialized. The dataset as a tabular abstraction could be a view that, behind the scenes, joins together a dimensional model. So maybe you store the data as a dimensional model, maybe you store it at a different level of normalization, maybe it’s more normalized, but you still expose datasets so that people don’t necessarily have to understand the underlying complexity of the structure. One really good case for normalization is something called a slowly changing dimension, type two. Let’s say you have a user table that is slowly changing: maybe people change gender, or people correct their birthdates, and that happens over time. Sometimes what you want is the attribute of the dimension at the time of the event. If you denormalize the dimensional attribute into the fact, then that’s great, because you have what the person’s gender and date of birth were at the time of the transaction, and you don’t need to go back and update your facts. So the facts are not changing. But if what you want is the most recent or most accurate attribute of the dimension, I might have to go and reprocess ten years of data in which I put these attributes as part of the fact. As you deploy data and stamp it and do incremental processing, you want to avoid these mutations. So in that case you can still expose the dataset as a view if that’s what you want. It’s still possible, I think, to expose a tabular abstraction as a dataset with the normalization tradeoff handled behind it.
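
A hedged sketch of the idea that a dataset can be a view over a dimensional model, again with Python’s sqlite3. Everything named here (dim_user, fact_order, orders_dataset, the country attribute, the validity columns, and the '9999-12-31' open-ended marker) is an assumption for illustration and not something described in the episode. The view exposes both the attribute as of the order date and the current attribute, so the fact table never has to be rewritten when the dimension changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Type 2 dimension: one row per version of a user, with a validity range.
CREATE TABLE dim_user (user_id INTEGER, country TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO dim_user VALUES
    (1, 'CA', '2020-01-01', '2021-06-01'),
    (1, 'US', '2021-06-01', '9999-12-31');

-- Facts are only ever appended; they carry no user attributes.
CREATE TABLE fact_order (order_id INTEGER, user_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO fact_order VALUES
    (500, 1, '2020-03-15', 40.0),
    (501, 1, '2022-02-01', 60.0);

-- The dataset consumers see: a flat, unmaterialized view over the model.
CREATE VIEW orders_dataset AS
SELECT f.order_id, f.order_date, f.amount,
       hist.country AS country_at_order,
       cur.country  AS country_current
FROM fact_order AS f
JOIN dim_user AS hist
  ON hist.user_id = f.user_id
 AND f.order_date >= hist.valid_from
 AND f.order_date <  hist.valid_to
JOIN dim_user AS cur
  ON cur.user_id = f.user_id
 AND cur.valid_to = '9999-12-31';
""")

rows = conn.execute(
    "SELECT order_id, country_at_order, country_current "
    "FROM orders_dataset ORDER BY order_id"
).fetchall()
print(rows)
```

Because ISO-formatted date strings sort chronologically, plain string comparison is enough for the as-of join in this toy example.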

Josh: Okay. So as I understand the argument, at the lower levels of the infrastructure stack we want to have things more normalized, represented more dimensionally, more or less. And then as you get closer and closer to the humans who are actually developing the analytics and building a pie chart out of something, that normalized structure, the dimensional models, can become more and more difficult to consume. So you want some layer of dataset creation, basically spreadsheet creation, that allows people to consume the information in a way they’re comfortable with and that humans are naturally built to interpret, which is basically line by line with all attributes collected together in a single flat view.

Maxime: Yeah, I think that’s logically where we’re going. You need to present an abstraction. You maybe don’t want to expose all the sheer complexity of your data structure to the consumers, so we need an abstraction. The semantic layer historically is an abstraction, and we could get into that too. It’s something that knows how to query your data. The idea of a semantic layer is that you have maybe a big menu of facts and dimensions, you drag and drop things, and behind the scenes it goes and runs complex SQL, because it knows how. I’m arguing against some of the patterns of the semantic layer for a variety of reasons in the blog post. But when you think about what it is, it’s an abstraction too, one that makes it easy for people to query the data. So maybe the case I’m building is dataset-centric as an alternative to a semantic-layer-centric approach.

Josh: Interesting. So at some level of the stack, someone is doing work that takes the data in its raw form, which I would argue should be more dimensionally built within the databases, within the data warehouse, and on the other end is presenting that in some graph or pie chart or something in the visualization layer. There’s some work that’s going to do that translation along the way. You’re saying, let’s push more of that work into the layer between the dimensional model and these dataset abstractions, and reduce the amount of work between the visualization and the dataset, as opposed to saying we’re going to have a lot of the work for the people building analytics at the semantic layer, which is building that kind of bridge between the normalized model, the dimensional model, and the visualization.

Maxime: Yeah, I think you’re identifying the tradeoff very well. The reality is that the complexity of this abstraction needs to exist somewhere. The question is where it should live and what interface we want to present to people. With the semantic layer, I think the intent is very virtuous, right? There are some really good intentions behind it, which is to say, hey, I’m just going to show you a bunch of dimensions and metrics, and I’m going to manage this complexity. There’s a bunch of issues with this approach that I talk about in the blog post, one of which is that typically there’s no universal semantic layer, there’s no open source tool. That means it currently has to live in a vendor’s proprietary tool. So if you’re using Looker, that’s in LookML. If you’re using Tableau, it’s quite interesting, because they use extracts, and extracts are very much like datasets in a lot of ways. So I think Tableau’s success maybe comes from this dataset-centric or extract-centric approach. But most companies have multiple BI tools, and having to manage this complexity in multiple semantic layers, with their redundancy, is sub-ideal. So the fact is semantic layers are proprietary. Do you want to put a lot of complexity in them? Change management in the semantic layer is also really hard. It’s hard enough to do change management and DataOps in your pipelines, and now you have a semantic layer you need to version and evolve alongside your pipelines. It’s difficult. So I’m arguing that this complexity best lives in the transform layer in a lot of ways, call it the dbt or the Airflow layer, because you already have that complexity there, you already have this knowledge, you already need to have your pipelines.
Why not use the same technology and abstraction to go one level deeper and create datasets? That comes very naturally there. You push the complexity to a place where you already have complexity, and effectively you don’t need an extra set of tools and knowledge and things to evolve in isolation and duplicate across multiple tools. So that’s part of the rationale for the dataset-centric approach versus the semantic layer.

Josh: When I was reading your blog and thinking about the topic, I also went to the Tableau extract as the closest handy thing mentally to compare this to. And I think you’re right, there’s a simplicity to how the extract provides information to people building Tableau dashboards that has led to a lot of the success of Tableau and its prominence in the space. My own background is at another analytics company called Sisense. We would compete heavily against Tableau on the basis of these very constrained, non-scalable extracts that someone had to build across the business. They would get stale. They would have different definitions for things across different teams. And they would be physically limiting, because you can only fit so much in an extract, load that in memory, and query against it from a Tableau dashboard. So I imagine in principle some of the same kinds of constraints apply to the dataset-centric approach, but it also feels like there are new technologies and new products available that make the dataset approach more scalable or address a lot of the limitations of the Tableau extract itself. So I’m curious what some of those are.

Maxime: Yeah, I would say that from an abstraction standpoint, the Tableau extract is a dataset-like structure, right? So the case I’m making is that datasets, or tabular things, are a better interface for humans, for mortals, who don’t necessarily want to learn your star schema or your snowflake schema or a collection of dimensional models and how they relate. The bus matrix is a Kimball concept. It’s all your facts, all your dimensions, and how they relate. Maybe orders are defined by customer, but inventory is not. So there are some subtleties there when you try to drag, say, order quantity and inventory quantity, and then you drag and drop customer: customer name does not work with inventory. And that’s something you need to understand as someone interacting with a dimensional model or a semantic layer. But back to Tableau, I think part of the success of Tableau is built on this extract, this tabular abstraction. Actually, I think Tableau is a horrible database company. Tableau extracts lived in memory, or on Windows servers, and typically they would not scale to a billion rows. I don’t know how they do now, and people are moving towards live mode, which is Tableau without the extract, but I think still with a tabular, dataset-centric type of approach.

Josh: So do you feel like views basically solve all the issues that limited how you work with Tableau extracts? If you think about all the cons of working with Tableau extracts, do you feel like there are enough products or approaches today in modern databases that you can basically offset all those cons, or do some of them persist? Do we need new tools to support this approach?

Maxime: Yeah, let’s talk about that. I think very much so. The issues with extracts were scalability and the fact that Tableau is not a database company; the extracts generally were not distributed and were not very good beyond a certain scale. Nowadays we have really good cloud data warehouses, distributed data warehouses, and if you want a really fast in-memory dataset, you’ve got so much choice. You can go with Druid or ClickHouse, right? So if you want a really fast tabular dataset nowadays, you can have that. And here’s something really interesting about the star schema. Think about how the data is represented physically in modern distributed databases: they all have some sort of segment. Druid has a notion of a segment, and I think Parquet files are kind of a segment in some ways. If you look at what’s inside a segment, it’s a block of data with a certain number of rows. But if you look at how the data is stored inside a Druid segment or a Parquet data file, it looks like a mini star schema inside a file format. So we have what I would almost call an inverted star schema that lives inside that database segment. To explain a little bit for the audience what I mean by a segment that contains an inverted dimensional model: if you look inside a Parquet file and how it’s structured, first we have a footer in that Parquet file that says, in this Parquet file, here’s what you should expect to find in all these columns. And it’s going to have some information about which column is where. It’s a columnar format, so each column is stored in a different subsection of that file.
Now, if you’re accessing, say, three out of a hundred columns, it will know which offsets in the file to read to best answer the query. And what’s interesting is that there are different encoding schemes inside a file segment. There might be reverse bitmaps, there might be dictionary encoding, there might be run-length encoding or other mechanisms. But something like dictionary encoding is effectively a star schema inside that file. So we’re going back to it: inside, we still have a star schema, but it lives inside files instead of existing in the database at the macro level. It’s quite interesting.
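
To make the "star schema inside the file" remark concrete, here is a simplified sketch of dictionary encoding followed by run-length encoding on a single column. It is an illustration of the two general techniques only, under the assumption of a tiny made-up column; it is not the actual Parquet or Druid on-disk format, and the function names are hypothetical. The dictionary plays the role of a small dimension table, and the integer codes play the role of foreign keys.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a lookup list."""
    dictionary = []   # code -> value, like a tiny dimension table
    index = {}        # value -> code
    codes = []        # one code per row, like a foreign key column
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


column = ["enterprise", "enterprise", "self-serve",
          "enterprise", "enterprise", "enterprise"]

dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)

# Decoding reverses both steps and recovers the original flat column.
decoded = [dictionary[code] for code, length in runs for _ in range(length)]

print(dictionary)  # distinct values, stored once
print(codes)       # per-row integer codes
print(runs)        # compressed runs of codes
```

This is one reason a heavily denormalized dataset can be cheaper to store than its repetition suggests: repeated values are stored once in the dictionary and referenced by compact codes.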

Josh: Well, I think in the blog you laid out the ideas pretty nicely, and I really like the visual in the blog. It helps me think about the tradeoffs between going with a dataset approach and a semantic layer approach. And I feel like that’s a very practical question teams can ask themselves: do we want to build out a semantic layer, or do we want to orient our teams towards this dataset-centric approach? I love talking about these core concepts in how data is organized because it really highlights the tradeoffs you’ll need to think about as a team, like where the work is focused and how that enables or potentially slows folks down.

Maxime: There are some things we forgot to talk about until now, which I think are interesting. Here I’m talking about dataset-centric data modeling. One thing that’s emerging right now, and people are talking about it a lot, is the metrics layer. If you think about it, what is a metrics layer? It’s metric-centric data modeling, where the metric is king, right? For me, I’m like, let’s make the dataset king. I want people to think about the information architecture of their data with datasets first: we’re talking about the sales dataset, or the orders dataset, or the inventory dataset. Other people in this space are like, oh, the metric is king, we should do metric-centric data modeling. For me, metrics really can’t be decoupled from their dimensions. I think there are virtues to all of these. Some people say we should think about metrics first, others say we should start with datasets, and other people, thinking about dimensional modeling, say think about dimensions first, which is really entity-centric data modeling: let’s think about dimensions and facts. For me, I just like datasets. They’re easy to reason about, tools understand what to do with them, and users can move forward with them. But there are other competing approaches.

Josh: Yeah. I can imagine that with these different approaches, like most things, the preferred approach will be tied to the maturity of an organization, where in the organization you sit, your savviness with data, how deep you are in the data stack. There are a lot of different factors that may influence where a team decides to build this out. At Preset today, in your organization, are you dataset-centric?

Maxime: In a lot of ways we are, and I can talk about what that means. Superset is the open source tool, and Preset’s commercial offering is a managed service for Apache Superset. Superset in a lot of ways assumes a dataset-centric approach. That means in Superset the information architecture is all founded on the fact that all charts need to be built from a dataset. It starts with the dataset, so it kind of forces us to have this approach. So internally we do that, but we do have all sorts of layers in between, and things that look and feel like a dimensional model. And I’m bullish on something I would call entity-centric data modeling too. The dataset is more the interface for the tools and the humans interacting with the data. Behind the scenes, I personally like the idea of having strong entities: a user table, a teams table for us, a workspace table. And then I like to snapshot these dimensions. That means we tend to duplicate data for a daily snapshot of these entity-centric tables. For our user table, we have the full list of users for each day since the beginning of Preset, for instance. So it’s kind of a user history table. That’s highly duplicative, because users’ dimensions don’t change all that much from one day to the next, but I like to model it so that we have a full snapshot every day. And we like to bring metrics, so behavioral information and facts about those entities, and denormalize them into the dimension. That means in our user table we have the total number of visits they’ve had since the beginning of time, or how many visits they’ve had in the past 7 or 28 days. So we denormalize metrics inside entity tables, and then we snapshot these entity tables every day so you can do time series on them.
Tell me how many users had visited more than five times in the past seven days. You can ask really intricate questions easily there. Maybe that’s another blog post that I owe myself and others who are interested in the topic: entity-centric data modeling, which is kind of pushing the idea of dimensional modeling, but denormalizing facts, snapshotting dimensions, and some ideas behind that.
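
A small sketch of the daily-snapshot entity table described above, using Python’s sqlite3. The table name user_snapshot, the columns plan and visits_7d, and all the rows are assumptions for illustration; Preset’s actual schema is not described in the episode. Each day holds a full copy of every user with a rolling metric denormalized onto the row, so the "more than five visits in the past seven days" question becomes a simple filter and group-by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per user per day: a full snapshot of the entity,
-- with a rolling behavioral metric stored alongside its attributes.
CREATE TABLE user_snapshot (
    snapshot_date TEXT,
    user_id INTEGER,
    plan TEXT,
    visits_7d INTEGER
);
INSERT INTO user_snapshot VALUES
    ('2022-07-01', 1, 'free', 6),
    ('2022-07-01', 2, 'free', 2),
    ('2022-07-02', 1, 'free', 7),
    ('2022-07-02', 2, 'paid', 6);
""")

# Time series of users with more than five visits in the trailing 7 days.
# No joins and no window functions: the snapshot already carries the metric.
active = conn.execute("""
    SELECT snapshot_date, COUNT(*) AS active_users
    FROM user_snapshot
    WHERE visits_7d > 5
    GROUP BY snapshot_date
    ORDER BY snapshot_date
""").fetchall()
print(active)
```

The duplication is deliberate: storage is traded for queries that stay trivial even when the question involves history.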

Josh: It’s interesting how you describe it, and some of this comes from my ignorance in not using Superset myself, so excuse me. You’re saying Superset is built around a dataset-centric approach. So is your thinking on this topic now, the blogs you’re putting out now, verbalizing a lot of the opinions you had when initially building out Superset, to explain the approach to folks?

Maxime: Yeah. So are we justifying Superset with this blog post, or is it the other way around, right? Maybe that was the philosophy to start with, and then Superset followed that philosophy.

Josh: The question is meant to be totally innocent. Sometimes things happen in stages: you build first, and then maybe you explain to the world why you built things a certain way. I’m just curious if that’s sort of what happened.

Maxime: Yeah. The reason for the blog post is simple. The post is less about how you should model your data, though I think it does inform that. It’s also that there are different visualization tools out there, and they’re built on different premises. What I’m arguing in the blog post is, first, that I think the best way is the dataset-centric approach. And it so happens that Superset was built that way. It’s no mystery, right? Superset was built that way because I thought it was a better approach, and I still think it’s a better approach. It kind of serves our purposes, too. There are also people who come to Superset or Preset from different backgrounds. If you come from Tableau, the dataset-centric approach is like, oh, it’s like an extract, that makes sense to me. If you come from Looker, you might be looking for ways to do something like LookML, and we needed to tell people, hey, this is not how Superset works, and here are the virtues of this different approach and the tradeoffs. The other thing we didn’t talk about that I cover in the blog post is another approach for tools, the query-centric approach, where each visualization is built from a query. There are a lot of tools like that. You said you came from Sisense. I think Sisense acquired Periscope. Periscope is more one of these. Still, Taddeo is more like that. It’s more like you write a query for each chart, and there are some really good, positive things with that.

Josh: Like Tableau live, right? Tableau live being more query-centric, Tableau extracts being more dataset-centric. Would you agree with that?

Maxime: I would say with query live, you can still do that with Tableau live. But I think the better approach with Tableau live would be to first create a live dataset, with metrics and dimensions, and use it like an extract. But sure, you can. And it’s the same with Superset: you can have a query-centric approach with Superset if you want. We’ve got SQL Lab inside Superset, and you can go and write very complex queries on a very complex model. You could have a third normal form, really complex database and go write a really complex query for each chart. But then there’s not much of an abstraction, is there? Each chart has to replicate this abstraction. In the blog post I talk about that tradeoff and the negative side of it. If you do one query per chart and you have a dashboard with, say, 20 charts, then a lot of that complexity needs to be replicated for each chart: how you join all the tables to get to the charts you need. And there are some other properties of that which I talk about in the blog post. Say you want something like a dashboard-level filter, like a country selector on the dashboard, so that when you pick a country, it affects the 20 charts. With the query-centric approach, you might have to go and refactor your 20 charts to know what to do with that dashboard-level filter, and that can be really difficult and cumbersome. With the dataset-centric approach, it’s all a bit easier, because you can say all these charts are built on this dataset, I have a dashboard-level filter, and when I switch it I know how to apply a country filter to the data for the different charts. So there are tradeoffs there.
To get into the virtues of the query-centric approach: if you have a really complex schema, you don’t have the abstraction, and you’re really good at SQL, you can do anything. It’s just not super maintainable or manageable. It’s lacking that abstraction layer.
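
The dashboard-level filter point can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how Superset or any other tool is implemented: the dataset rows, the chart definitions, and the render function are all hypothetical. Because every chart is declared against the same dataset as a metric plus a dimension, one filter on a shared column applies to all of them without touching any chart definition.

```python
from collections import defaultdict

# One shared, flat dataset (hypothetical rows).
dataset = [
    {"country": "CA", "product": "A", "revenue": 10.0, "orders": 1},
    {"country": "CA", "product": "B", "revenue": 5.0, "orders": 1},
    {"country": "US", "product": "A", "revenue": 20.0, "orders": 2},
]

# Charts are declarative: a dimension to group by and a metric to sum.
charts = [
    {"name": "revenue_by_product", "group_by": "product", "metric": "revenue"},
    {"name": "orders_by_country", "group_by": "country", "metric": "orders"},
]


def render(chart, rows, filters=None):
    """Aggregate one chart, applying any dashboard-level column filters."""
    filters = filters or {}
    totals = defaultdict(float)
    for row in rows:
        if all(row[column] == value for column, value in filters.items()):
            totals[row[chart["group_by"]]] += row[chart["metric"]]
    return dict(totals)


# Switching the country selector changes one dict, not the chart definitions.
dashboard_filter = {"country": "CA"}
results = {chart["name"]: render(chart, dataset, dashboard_filter) for chart in charts}
print(results)
```

In a query-centric tool, the equivalent change would mean editing the hand-written SQL behind each chart so it knows where the country predicate belongs.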

Josh: Well, I know we’re coming up on time. You’re a busy dude. One thing, as I stare for a few more minutes at your visualization in the article comparing the query-centric, dataset-centric, and semantic-centric approaches. I have a bias about the BI layer, the visualization layer: if you’re really, really good at visualization, basically you’re not going to be much good at anything else. That’s my bias. There’s so much to do in visualization. I really feel like that layer should be fairly thin in the data stack, and whatever tooling you have there should be fairly focused on just making really, really good charts and ways of presenting data. That is a hard problem to solve. It feels to me like the query, dataset, semantic, entity, whatever layer is where we’re pushing things down. I’d love to see tooling that helps teams navigate these different approaches, choose which way they want to go, and keep inventories and catalogs of different abstractions according to how they want to organize that within their team. So at some point I’d love to learn more about the kinds of products we’re seeing emerge on that layer of the stack, because it feels like that’s going to be another important layer of the modern approach.

Maxime: Yeah, you know, we’re unbundling BI, or delaminating the data stack, right? That’s a phenomenon that’s happening in data. We used to have this monolithic structure that would try to solve all the data problems. Now we’re realizing, like with the DevOps movement, that we can cut things into smaller tools that play well together. I think it’s happening with BI, and I think you put a finger on it: visualization is complex enough, if you have the right datasets, that it’s a really good area for Preset and Superset to specialize in. That’s what we specialize in. And then for me, the complexity of what can be joined and the semantics around interacting with data, I’m pushing it up to the transform layer, to some other layer, and I think it lives very nicely in the transform layer. If you’re good at Airflow, you’re good at dbt, you’re good at DataOps, you use the same stuff. You bring that complexity to a world that you already need to have a handle on, where you need CI/CD and hygiene and all this stuff. Push it there and get good at that. Double down on Airflow, double down on dbt, and put that complexity there. The other thing we’ve seen is things like metriql, Cube.js, metrics layers that are emerging in that space too. So that becomes a little bit more of a pivotal abstraction that can be reused by the tools. For us, we’re interested. dbt is coming up with a semantic layer type abstraction. There was a metric. So we’re looking to integrate on that.
As we delaminate this complexity out of BI, we focus on visualization and we’ll integrate with the winners, whatever people are using, whether it’s Metriql, it’s Transform, I don’t know, whatever solutions are there. We want to play nicely with that and integrate, as opposed to trying to be monolithic and solve everything.

Josh: Interesting. Yeah, with tools like orchestrators in particular, it’ll be interesting to see if that is where these definitions naturally live. To me, a lot of the orchestrators, you know, dbt, and I’ll put Airflow in the same category as dbt for this conversation, they still do a lot of other things, where it’s a lot just about kind of moving data around as opposed to gathering these kinds of abstractions in a way that fits the different needs of the business. That feels to me like something new and different and other. But it’ll be interesting maybe to see if that’s where they begin building into.

Maxime: I think it very naturally moves to, I call it the transform layer. But when you think about dbt itself or Airflow, it’s really a semantic layer, because you can do transformation. We could define views there, too. Right. In dbt, you’ll say, I want to define a view, and I want it to be materialized or not. I think it’s similar with Airflow. So for me, yeah, that’s where that belongs. There’s a question: do we need something more specialized there? Something like Transform or something like dbt’s, you know, emerging metrics layer. Probably. I think there’s a stratum there. As we delaminate the stack, there’s a layer in the stack that is important. If you look at people that like Looker’s LookML, I guess it’s a pretty great semantic layer. I think we need like an open source, call it specialized, layer there. I think that allows people to solve some of these challenges there. So if you do want a semantic-layer-type abstraction, we need a nonproprietary, open source, universal place for people to put that into. So it’ll be interesting to see that emerge and evolve alongside database proxies. I think the thing that we’re going to see is something that gets in between your apps and your databases. It’s a little bit like the equivalent of the API gateways, but for data. The API gateways are a huge space that developed quite a bit, and there’s a lot of value to be deployed and delivered on the API gateway. I think we’re going to see database gateways emerge as a place where some of those abstractions exist.
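
Editor's note: the idea Maxime describes here, defining a dataset in the transform layer either as a view or as a materialized table, can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite stands in for the warehouse, and the table, column, and function names (`orders`, `customers`, `orders_dataset_view`, `orders_dataset_table`, `build_datasets`, `revenue_by_region`) are hypothetical. It is not dbt's or Airflow's API.

```python
import sqlite3

# Hypothetical dataset names; a BI tool would be pointed at either one.
DATASETS = {"orders_dataset_view", "orders_dataset_table"}


def build_datasets(conn: sqlite3.Connection) -> None:
    """Create raw tables, then expose one flat dataset two ways."""
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'AMER');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);

        -- Not materialized: the join runs every time the view is queried.
        CREATE VIEW orders_dataset_view AS
        SELECT o.order_id, o.amount, c.region
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id;

        -- Materialized: the join result is stored as a plain table.
        CREATE TABLE orders_dataset_table AS
        SELECT * FROM orders_dataset_view;
        """
    )
    conn.commit()


def revenue_by_region(conn: sqlite3.Connection, dataset: str) -> list:
    """Query the flat dataset the way a visualization tool might."""
    if dataset not in DATASETS:
        # The name is interpolated into SQL, so only allow known datasets.
        raise ValueError(f"unknown dataset: {dataset}")
    query = (
        f"SELECT region, SUM(amount) FROM {dataset} "
        "GROUP BY region ORDER BY region"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_datasets(connection)
    for name in sorted(DATASETS):
        print(name, revenue_by_region(connection, name))
```

Either way, the visualization layer only sees rows and columns; which tables can be joined, and how, is decided once in the transform step.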

Josh: Well, this is my favorite part of the podcast is just getting into the new things to build and the new innovations that may or may not ever come about. So I appreciate you taking us there and also the great tools that you’re building for the community.

Ryan: Yeah, Max, how can people get in touch with you after this podcast? Where can people...

Maxime: Like, I’m busy, I have like three kids, you know, but I’m mistercrunch on Twitter. That’s really a good way, you can DM me there, I think that’s open. And in my communities, right, so the Superset Slack. And through Preset: check out preset.io if you’re interested in a managed BI service that’s going to be straightforward. You have a dataset, you want to visualize it, do it. We have a freemium offering, five seats for free, and then it’s pretty straightforward. It’s like 20 bucks per user per month at low volumes, with a discount at higher volume, but that’s pay as you go. No BS, no sales rep if you don’t want to talk to one. Grow into it: try it, and if you like it, double down on it. If you don’t like it, go buy something else. That’s part of the vision for Preset. There is not a lot of that in data. I think that’s emerging now with the modern data stack, but in BI there was no, like, hey, I just want a world-class visualization tool that is managed, that I can get set up today without talking to anyone, with no one trying to call me to sell me, you know, 50K worth of software. I could just try it for free. I think that’s part of the value proposition at Preset. So check us out at Preset. Sorry for the little commercial bit, but at the same time that’s the reason why we do podcasts, so people are aware of the stuff we’re building and the relevance of the tools that we’re, you know, pushing forward.

Josh: So we do it out of pure altruism for the community. So we’re not quite sure.

Maxime: Yeah. And for us, if you want to use Superset too, you can interface at the Superset level, on the pure open source, if you want to run it. If you don’t want the managed version, go and run it. And Preset is the way that we fund a lot of the work that goes into Superset. So as you sign up for Preset, a lot of that goodwill kind of goes back to the community too.

Josh: Yeah. By the way, I know I mentioned that I’ve never used Superset. My company does. As far as I know, we’re pretty happy with it. So it’s another upvote for you.

Maxime: Awesome. Yeah. And we have tons of webinars and things. So if you want to learn about Superset, Josh personally or anyone out there, I think there’s a lot of learnings that are exposed. Preset is a managed Superset, but it’s also an education layer to accelerate, you know, data literacy and just getting more people to be proficient with data, easily. So that’s it. Thank you, guys. That was a pleasure. We went much deeper than I did in the blog post. I’m happy I was able to expose this idea of an inverted star schema inside the Modern Database segment. I don’t think I’ve done that before. So it’s a new idea.

Ryan: Thanks again for being part of the podcast, man. And we’ll talk soon.

Maxime: Awesome. Thank you, guys!

Josh: Thanks Max.
