
How Pinterest Is Migrating To The Modern Data Cloud With Confidence

Jessica Larson is a Data Engineer at Pinterest and the author of Snowflake Access Control: Mastering the Features for Data Privacy and Regulatory Compliance. Jessica stopped by to talk about how Pinterest is migrating to the modern data cloud with Snowflake and the lessons learned in her new book.

About Our Guests

Jessica Larson

Data Engineer, Pinterest

Jessica Larson is a Data Engineer at Pinterest and author of Snowflake Access Control (snowflakeaccesscontrolbook.com). She’s passionate about cognitive science and about building an enterprise data warehouse as the first employee on the team at Pinterest. Before becoming a passionate Data Engineer, she worked as a Data Analyst and BI Systems Engineer.

Episode Transcript

Ryan: Hey everyone, welcome to another MAD Data podcast. I’m your host, Ryan Yackel, and we have a very special guest on today’s podcast. Her name is Jessica Larson. She’s a data engineer over at Pinterest. And get ready to hold up that book, Jessica. She’s also the author of Snowflake Access Control, which is available for purchase on Amazon.com right now. And you can get that in digital and in hardcopy, right?

Jessica: That’s correct. And it’s also available from other non-Amazon retailers, Target and Barnes & Noble, etc.

Ryan: Awesome. We’re going to be talking about that a little bit later today, but the main topic today is how Pinterest is migrating to the modern data cloud with confidence. And so we’ll be talking through some of the policy things going on there, some of the technical stuff that they’re working through, and the standard people, processes, and tools that are in Pinterest’s wheelhouse for migrating into the data cloud with confidence. But first, I also want to make sure that we introduce our CEO of Databand, Josh. How are you doing?

Josh: Doing well. Thanks, Ryan. Excited to talk to Jessica today.

Ryan: Yeah, and everyone knows about you, Josh, so we’re not going to ask you any questions. But for you, Jessica, give us a little bit about your background. Before we got on the podcast, we were talking about how your interest and specialty is in cognitive science, and we talked a little about Elon Musk and how that’s kind of a person in your sphere of influence, but maybe not. And I want to kind of get your background for everyone here on the podcast.

Jessica: Yeah. So in college, I studied cognitive science, did a minor in computer science. I was really, really interested in the intersection of the human brain and how we think and computers. So I briefly worked in a lab where we were doing brain machine interfacing, which is partly why I like to throw some shade at Elon. I think that, you know, he has a little bit of a habit of making promises that aren’t necessarily realistic. And we’ve got a long way to go in neuroscience before we can even come close to some of the things that he’s assuming we have already figured out.

Ryan: You’ve heard it here first. Yeah, I’ve preordered Neuralink. Is that bad?

Jessica: Only if you get the brain surgery required to have that.

Ryan: I’ll postpone that.

Jessica: Yeah, probably do that. Unless, you know, you’re getting some sweet deal and don’t really want to live a terribly long life. So after college, I kind of figured that data was a really natural path forward from cog sci and computer science. And so I started off on the data analytics side. I did some environmental risk analysis for oil and gas. I worked in freight forwarding, doing some data analysis, and that was kind of when I realized that data analysis was not really it for me. And I would much prefer to be on the data engineering side building out those tools to support people doing that analysis. And so since then I’ve been at ease doing some really fun stuff there. And then I’ve been at Pinterest for the past little over two years.

Josh: How did you transition from data analytics to data engineering? What was that like?

Jessica: Yeah, so I think it really helped that I had a pretty strong foundation from college. I was something like three classes away from a double major, so I did a lot of coding in college. And so what I started doing was just taking on some projects from the data engineering side, on top of the analysis stuff I was doing, and then I just kind of did a lot of political stuff to convince everybody that what made the most sense was for me to move over.

Josh: What were the main differences in projects between analytics and engineering? Like, where was that line in your company?

Jessica: Yeah. So a lot of the things I was doing as a data analyst was, you know, somebody on the business side would want a dashboard that had X, Y, Z metrics or had charts that looked like X. And I would go in and I would build those out in SQL and then kind of configure the charts within Periscope Data, which was what we were using. Now, when I transitioned over to the data engineering side, I started working on more traditional coding things. So I was building pipelines from external data sources. I built out this tooling that would basically make it so that we could sync our database with our Google Sheets, which allowed us to share data with external partners that we weren’t able to do through our tool, all sorts of projects like that.

Josh: This is like the common precursor to Snowflake data sharing. I don’t know if you’re using that. We do hear about Sheets being used a lot for sharing data without, you know, emailing files over.

Jessica: Yeah. So this actually ended up being a little bit of an interesting use case because we needed to allow truckers to communicate with the warehouse, to communicate with dispatch, and to communicate with all of the data coming from our platform. So I don’t know that data sharing actually would have solved it, because we still needed a really low entry point for truckers to be able to look at this and say, hey, I picked this up. And so I think now the Flexport platform is able to handle all these things, but at the time it was like they could use their cell phone on Google Sheets and just type something in, which was really helpful.

Josh: That’s super interesting.

Ryan: I think you’re the second person we’ve talked to about trucking and analytics and shipments. We just had a podcast with Chad Sanderson over at Convoy, and he was walking us through a bunch of stuff he’s doing over there. So it’s interesting. It seems like data engineering is a hot commodity in that particular space.

Jessica: It is. It is. And I think that there’s a lot of really interesting parts of the logistics space that make it really interesting. And also the timing kind of works out in a strange way because it takes so long to get the licenses you need to deal with the data. So in a lot of ways, it’s just quite far behind everything else that’s going on.

Ryan: Awesome. Well, that’s a really cool background, Jessica. I love the cognitive science stuff. I love the shade on Elon Musk, even though he gets a lot of shade these days. But I want to kind of pivot to the main topic of today, which is how Pinterest is migrating to a modern data cloud with confidence. And, you know, before we get into all the ins and outs of where you’re focusing and helping lead that team there, maybe a little bit about what’s a day in the life of Jessica over at Pinterest? And how does that tie back into the main project you’re working on today with the team?

Jessica: Yeah, absolutely. So, you know, a typical day is meeting with one or more of my business stakeholders, kind of getting an understanding of the problems that they’re trying to solve and how they think data is going to solve those problems for them. You know, maybe somebody wants to move some new data in, and so I’m going to partner with them and our data privacy team, who’s going to basically tell us, hey, this is exactly how you need to handle this data. We’ll partner with our sister team that does all of the pipelining, so I’ll kind of make sure that all of that happens. And then I work with another member of my team to provision access, to create the roles, to figure out who all the different people are that are going to need access to this data and what type of access they should have, and actually go and make all of that happen.

Josh: Sorry, you said a sister team that’s responsible for building the pipelines. Okay. And you’re in data engineering.

Jessica: That’s correct. Yeah.

Josh: It sounds like it. Yeah. What area of responsibility do you have that’s not pipelining? And then how do you work with that pipelining side?

Jessica: Yeah. So I’m on the platform side, so I am responsible for everything Snowflake, I guess. So, you know, making sure that all of the data is secure in the right way, and that everybody has the tools that they need to interact with the database, whether they’re those data engineers that we were talking about or they’re data analysts or they’re BI engineers or any different person that is going to be interacting with the database, making sure that they’re taken care of and they have the stuff that they need.

Josh: Do you ever work on building pipelines, or are you ever building any sort of workflows? It sounds like it’s the ops within data infrastructure, essentially. How much do you kind of flow between those areas of responsibility?

Jessica: I’m typically building tooling for the people who are doing that. So if it’s a pipeline from some external source or something that requires, you know, a typical Python pipeline, I built out the library for that. For some of that downstream transformation stuff, I have a few different Airflow operators that I built out. So I wouldn’t say it’s so much ops. It’s more like all of the engineering things, I guess, that are required for just the database, so that the team that’s building those pipelines doesn’t necessarily need to know anything specific about Snowflake. They’re going to pretty much dump something to a TSV and that’s fine.
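
As a rough illustration of the kind of tooling described here, the sketch below shows a helper that turns a staged TSV file into a Snowflake COPY INTO statement, so that a pipeline author only supplies a table and a file. The function name, stage name, and table name are hypothetical and are not taken from Pinterest’s library; only the COPY INTO syntax itself is standard Snowflake SQL.

```python
def build_tsv_copy_statement(table: str, stage: str, file_name: str,
                             skip_header: int = 1) -> str:
    """Return a Snowflake COPY INTO statement for a tab-separated file.

    Hypothetical helper: every name here is illustrative only.
    """
    # Keep identifiers simple so the helper cannot be used to inject SQL.
    for identifier in (table, stage):
        if not identifier.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unexpected identifier: {identifier!r}")
    if "'" in file_name or " " in file_name:
        raise ValueError(f"unexpected file name: {file_name!r}")

    # TYPE = 'CSV' with a tab delimiter is how Snowflake reads TSV files.
    return (
        f"COPY INTO {table} "
        f"FROM @{stage}/{file_name} "
        "FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '\\t' "
        f"SKIP_HEADER = {skip_header})"
    )


if __name__ == "__main__":
    print(build_tsv_copy_statement("hr.headcount", "etl_stage", "headcount.tsv"))
```

The design point is that everything Snowflake-specific lives in one place, so the pipeline side only needs to know where its TSV landed.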

Josh: That’s a good clarification. So sometimes I’ll use the word scaffolding to describe that. It’s like you kind of build the template, the scaffolding, so that the analysts or engineers can focus on the logic side of it and not how this thing runs. So are you ever creating DAGs that go on a schedule, or is that really when things will move over to the other stakeholders, your sort of customers?

Jessica: So kind of both. I do have some DAGs that are more doing random administrative stuff, so I have some monitoring and alerting that goes through Airflow like that. But yeah, for the most part it’s like I’m building the things so that other people can use them in their DAGs.

Josh: Interesting. I have an interest in understanding how roles and responsibilities are starting to be defined between the platform side of engineering and the engineering, analytics, and science side of the house. But I will spare us that rabbit hole right now to get more into the, you know, topic of discussion today.

Ryan: That seems like a topic we could get into on every podcast. I don’t know if you feel the same way, Jessica, but it seems like it’s a super evolving landscape. It’s constantly changing.

Jessica: The conversation that we’d have about it today would be an entirely different one than if we had it in six months. Right. And I think what we expect a data engineer to be good at and know how to do, it’s crazy. You just need to know everything. It’s totally fine.

Josh: Maybe you want to give me one more question on this, Ryan, before we go back to...

Ryan: Sure. I’ll allow it.

Josh: Thank you. So something that came up in a recent podcast that we had: we talk a lot about this interface between the data platform team, the data engineering team, the analytics team, like what’s going on inside the data house. And we were talking a lot in this previous podcast with Chad Sanderson from Convoy about another interface on the other side of things, which is the software engineering house into the data org. And data platform, really, in a lot of places, is just at that boundary. So I’m curious how, if at all, you’re working with the software engineering side to get work done?

Jessica: Yeah, I work with the software engineering team a bit. One example might be the software engineering team that we have that supports what we call our big data platform. So this would be the Hive, Hadoop, that kind of system that we are moving data out of and into our Snowflake enterprise data warehouse. But they also support what we call Spinner at Pinterest, which is just our version of Airflow. And since I’ve written quite a few operators, I work with them quite a bit. There is some work that I need to do in the next month or two because we’re a few versions behind with the base Snowflake operator. And so I need to work with them to upgrade that, but then also make sure it doesn’t break any of our changes, because we have really significantly forked that in order to have very specially tailored operators for specific use cases.

Josh: Interesting.

Jessica: Yeah.

Josh: Okay. For another podcast, we’ll dive deeper there. So going back to the modern data cloud and how this project is moving at Pinterest. Can you tell us, first of all, what is a modern data cloud? What do you mean by that?

Jessica: That’s a great question. So I think of the modern data cloud as being kind of those newer, hip, cool software solutions, like database-as-a-service type solutions, where you’re not just getting your standard, you know, this is relational data, you can use SQL to query it. You’re getting a lot of features on top of that where it’s really tailored towards a data engineering team, right? So for example, in Snowflake, there’s a whole bunch of security features that exist. There’s secondary roles, there’s these row access policies, there’s all these things that would have never existed in, like, a Postgres, because it just doesn’t make any sense there. You really need to have it as a full software-as-a-service type of offering in order for things like that to make sense. Right.

Josh: Is the modern data cloud synonymous with Snowflake, or are there other tools in there, solutions in there, that you’d also put into that bucket?

Jessica: Oh, I mean, there’s tons. You know, we’ve got Databricks, there’s all the other tooling that goes hand in hand too, like a Fivetran, and let’s see, trying to think what else might be good to mention in there. One of my friend’s startups, Mozart, is a nice orchestration platform. There’s all sorts of cool new things that you can do in the modern data cloud.

Josh: Cool. So there’s a big project going on at Pinterest. There’s a big initiative to move into the modern data cloud. What is driving that initiative?

Jessica: Yeah. So to be clear, it is a subset of the data. It’s not all of our data. We’re still planning on having most of the product data, that kind of thing, live in that big data platform. But we’re moving any of our internal data, so our HR stuff, everything related to hiring and internal reporting and external reporting to our investors, all of this data that’s just super, super sensitive. And in a lot of situations, at tons of companies including Pinterest, it’s kind of in all of these small little spots all over, because the number one thing you want to do when you’re holding super sensitive data like this is make sure as few people as humanly possible have access to this data. But then you end up in the situation where you have no bus plan, right? If somebody gets hit by a bus, you’re just screwed. All your data is gone. Right? And so to be able to move to something like Snowflake, like Databricks, one of these types of platforms that really has security at the forefront, kind of allows you to solve all of those problems at the same time.

Josh: Sometimes we hear it’s because we suddenly have 100 times more data than before and we just can’t support this internally on our infrastructure, or it is taking way too much of our time to go in and optimize all these Hadoop processes and on-prem Spark processes. It sounds like for your team, this was more about improving the accessibility of the data and the layers of security controls that you can bring in within these platforms. Is that correct?

Jessica: Exactly.

Ryan: It sounds like it’s not like a book that you would write or something. Right? You can write a book?

Jessica: Yeah. Gosh, I could probably write a book on that.

Ryan: You’d write a book on something like that, right?

Jessica: Yeah, I think so. Yes. I think that was a big one. And then I would say data quality also plays into that. Right, because when you have all these disparate systems and you have some kind of hacky stuff rigged up because you don’t want to share it with a whole bunch of people, you know, you end up in a situation where maybe something’s being manually pulled on a weekly basis instead of having a proper pipeline. Right.

Josh: Interesting. Well, I think it might be counterintuitive to some folks, because the cloud for a long time has had a reputation for being less secure. You’re suddenly giving your data out to some outside vendor, and it’s going to be stored on their environment, or at least processed in some way through someone else’s infrastructure or accessed through someone’s infrastructure, as opposed to, you know, locking this data down in our basement that we can watch 24/7 under our control. And what’s interesting hearing you talk about it is it sounds like it’s not just that the cloud has suddenly got to the point where it matches the level of security that we can meet internally on our own Hadoop environment or on our own Postgres. Now the cloud has even better security controls. Is that the way that you view it?

Jessica: Yeah. And I would say, I mean, most of the data lives in the cloud anyways. You know, whether it’s some Hadoop system that you are dealing with yourself, a lot of times you have it running on EC2 and everything’s in S3, right? So it’s already in AWS anyways. I think it’s an extremely small number of companies that actually physically support on-prem stuff. So I think it makes a lot more sense to go from a system where it’s already living in the cloud, just harder to deal with, to somewhere more secure and also in the cloud.

Ryan: Well, so I’m always interested in, you know, the tooling part is obviously really awesome and really cool, and there are nice shiny objects and things like that. But obviously Pinterest is an amazing company, and just like every company, there are internal things you have to work through. There’s, you know, teams that get along one day and don’t get along the next, and you come together and figure out a plan. But what are some of the things that are more on the process side or the people side that you’re having to also navigate as you’re thinking through how to build this modern data cloud with confidence?

Jessica: Yes. So one of our biggest concerns is, again, security. And so as a result, we have a bunch of security mandates about how we can interact with Snowflake. And so I will answer your question. It’s kind of coming around to this. So one of the things that we’ve done is we’re using Snowflake as the source of truth for access control. So any tool that we use to connect to it needs to connect in such a way that we are actually running the queries in Snowflake using that user’s credentials, whether that is using OAuth or some sort of a variable, so that the database knows which user is querying the data. And so that ended up being a very interesting and difficult thing to solve for when it comes to those downstream transformations. Right. The tool that we were using initially, on the big data platform side of things, is a whole homegrown tool that allows people to write some SQL and then run it on some schedule to do those denorm transformations. It does not allow us to do that. It would just be the craziest amount of work to make that work how we need it to with Snowflake. And so I have actually taught all of my analysts, and I just love them so dearly, they are such troupers. I have taught all of them how to write DAGs. I have taught all of them how to use version control software. I have a quick and dirty guide to the terminal so that they can learn some basics around navigating on the command line.

Josh: So you’ve cultivated some analytics engineers. We have a term for this now.

Jessica: Exactly. Basically, I have analytics engineers now.

Ryan: That’s another title that they like. What’s the next title going to be?

Jessica: That was my first job, analytics engineer. Okay. Yeah, I think it generally is like, you need to know some analytical stuff, but you also need to do some engineering stuff, so, analytics engineer. But yeah, so they’re doing all this and yeah, it’s a huge process. It was lots of documentation, it was lots of talking to people to say, hey, would you be open to trying this? I know that this isn’t ideal, but right now I’m the only engineer and we don’t have the bandwidth to build a tool. And buying a tool is going to take quite a while because we have to do a security review of the vendor, you know, this whole process. And so can you please learn some Python? Can I teach you some Python? Can I teach you how to write a DAG? Would you bear with me while I show you how to use the terminal? Because I promise you, it’s not as tricky as it looks.

Ryan: Yeah, show that. Yeah. Show the greener pastures and just say, trust me, trust me. That’s where we’re going. This is the better way to go.

Jessica: And I don’t know how many times I’ve said the phrase, every company uses Airflow, so it will be good for your career to learn it. I’m like, just trust me. The next company you work at will be using Airflow, I promise you. And people will love that you understand how it works.

Josh: And you were saying, I mean, there’s a reason that you focus on this access control layer. It sounds like that turned out to be one of the major friction points in democratizing more of this DAG development and creation down to the analysts. Why is that such a hard problem to solve for you?

Jessica: It’s such a hard problem because you need to be able to map a service account running some code to a user authorizing that to happen on this data set, where this user has a role that can access it. Right. So it’s just a lot of layers of indirection. And again, you need to make sure that the person who is doing this actually can access that data.
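
The layers of indirection described here (service account, authorizing user, role, data set) can be sketched as a small check. This is a toy model under stated assumptions, not Pinterest’s implementation: the class names, the account name svc_airflow, the role names, and the data set names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class JobRequest:
    service_account: str  # account the scheduler actually runs as
    on_behalf_of: str     # human user who authorized the job
    dataset: str          # data set the job wants to read


@dataclass
class AccessModel:
    # service account -> users allowed to run work through it
    delegations: Dict[str, Set[str]] = field(default_factory=dict)
    # user -> roles granted to that user
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    # role -> data sets that role can read
    role_datasets: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, request: JobRequest) -> bool:
        # Layer 1: the user must be allowed to delegate to this service account.
        delegators = self.delegations.get(request.service_account, set())
        if request.on_behalf_of not in delegators:
            return False
        # Layers 2 and 3: one of the user's own roles must reach the data set.
        roles = self.user_roles.get(request.on_behalf_of, set())
        return any(
            request.dataset in self.role_datasets.get(role, set())
            for role in roles
        )


if __name__ == "__main__":
    model = AccessModel(
        delegations={"svc_airflow": {"alice"}},
        user_roles={"alice": {"SALESFORCE_READ"}},
        role_datasets={"SALESFORCE_READ": {"salesforce.accounts"}},
    )
    print(model.is_allowed(JobRequest("svc_airflow", "alice", "salesforce.accounts")))
```

The check is always made against the human user’s roles, never the service account’s, which mirrors the requirement that the person behind the job can actually access the data.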

Josh: Okay. Interesting. Yeah. I can see how that would relate to democratizing more access to these services. Like you start by saying, okay, we’re really resource constrained in engineering, so we need to involve analysts more in the development of DAGs. When we involve analysts more in the development of DAGs, we’re now opening more surface area for them to build into and connect with. And as we do that, how do we manage the right level of control and access that they can get into all these different services as they’re building more there? And so it sounds like that maybe is how that challenge starts to open up more and more. And then you combine that with all the different data sets that they need to be working with, or shouldn’t be working with, as they’re building more in the system. Am I following that line correctly?

Jessica: Yeah. And the interesting thing is that that actually ended up being a smaller, easier problem to solve than trying to find a tool that fit our specifications exactly, or building one from scratch.

Josh: What were your specifications?

Jessica: The big one was the OAuth requirement, needing to make sure that the user who is using the tool is running the query in Snowflake using their own credentials.

Josh: And it sounds like you’ve built a lot within Pinterest to address this issue. You have your own access control library that’s living there. Is there a term for that you use, by the way? Some, you know, nice-sounding name.

Jessica: Ah, if there is, I wouldn’t know it.

Josh: Can we invent one now?

Ryan: It could be your cat’s name. What about your cat’s name?

Jessica: So this one’s Lisi.

Ryan: That sounds kind of cool.

Jessica: I don’t know if I want to name something after her because she’s she’s pretty chaotic.

Ryan: Okay.

Jessica: I think she’s chaotic.

Josh: We’re going to go with chaotic good, because I need something to call this now, outside of Snowflake’s internal access control, that Jessica created. So I’m going to call it Lisi for now, and then you’ll let us know afterwards what we should call this. So you created Lisi. First of all, is Lisi being used everywhere at Pinterest now? Like, what’s the adoption like of this system?

Jessica: It’s kind of interesting how it’s been created. So we are working on creating a tool that people can use instead of it being us using this Python library to do this. So the adoption, well, the adoption is pretty forced. Since it’s a security mandate, it’s either you use it, or you do not use it and you do not get access to your data.

Josh: That’s a good forcing mechanism. Yeah.

Jessica: It works quite nicely. So we have Jorge on our team, he’s great, who deals with a whole lot of that and kind of uses the tools that I’ve built so far. But hopefully soon we’ll have a nice, actual tool where people can go and request access to things, and it probably integrates with some sort of cataloging and all of that.

Josh: You mean like taking a library and building like a kind of web service around this and a UI and all that?

Jessica: Exactly.

Josh: Interesting. So let’s say I’m another company, I’m not Pinterest, and I’m dealing with the same issue. I’ve just cultivated a bunch of analytics engineers who are now going into building DAGs, and oh, we’re running into access control complexities. We want to build our own Lisi. What kind of design principles, like how did you go about thinking about the build of this? What were the core systems at play there, the core requirements that you’re solving, that you would tell these folks about?

Ryan: And I just want to plug real quick. This is basically your book, correct? Or am I misinterpreting that?

Jessica: Yes. So kind of. Yeah, we.

Ryan: We want to plug your book because your book is very thick. And I feel like you spent a lot of time on that book.

Jessica: Yeah, I did spend a lot of time on that book. Yeah. It was my entire last six months of 2021. So, you know, in my book, the first thing I start off with is just kind of the theory behind it. So why are you putting this access control in place? And just always going back to the question, what problem are you trying to solve? I say this all the time, but I think engineers just love engineering things and we love exciting, fun things to do. And sometimes we lose track of what the actual problem is because we’re just really busy and excited about things. So what’s the problem you’re trying to solve? What kind of data are you trying to protect? Why are you trying to protect it? Is it for regulatory reasons? Is it some other internal thing? Okay, it’s not necessarily regulatory, but, you know, maybe we should make it so people can’t see this type of thing. What are you trying to solve, and how is your organization structured? Before you start creating roles and assigning roles to people and creating schemas and doing all of this, it’s important to think about what your organization looks like. Do you have a centralized data team, for example, or are you working more on that pod model where you have a data engineer on every team, embedded with a data scientist and a data analyst or something like that? What’s the sharing of data between teams? So is it that each team has a very siloed thing that they’re working on, where they can have entirely their own data sets and they basically don’t need to interact with anybody else? Or is it just super collaborative? That’s really going to change how you design the entire system.
And then from that, before getting to solutioning, think about, for the given data warehouse or cloud or whatever you’re working with, what levers do you have to pull? Snowflake has a few things that just greatly simplify everything. One of those features is secondary roles, which basically allows you to assume all of your read access for all of your different roles at the same time. That solves, oh my gosh, I can’t even explain to you how many problems that solves, because you don’t need to think about making sure that everybody has the exact right role that is a combination of all of these roles. You just need to make sure they have each of the ones that they need. So that just takes away a whole lot of that operational work there. And then the other thing is just this role hierarchy. So you can kind of grant roles to other roles, just like you would in an object hierarchy in Python or Java or something like that.
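
The two Snowflake features mentioned here can be modeled in a few lines. In Snowflake itself, role inheritance comes from statements like GRANT ROLE child TO ROLE parent, and secondary roles are enabled with USE SECONDARY ROLES ALL. The sketch below is a simplified in-memory model of how effective read access is resolved, not Snowflake’s actual implementation, and the role and data set names are made up for illustration.

```python
from typing import Dict, Iterable, Set


def inherited_roles(role: str, granted_to: Dict[str, Set[str]]) -> Set[str]:
    """Return the role plus every role granted to it, directly or transitively.

    granted_to maps a parent role to the child roles it inherits from,
    mirroring GRANT ROLE child TO ROLE parent.
    """
    seen: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(granted_to.get(current, ()))
    return seen


def readable_datasets(active_roles: Iterable[str],
                      granted_to: Dict[str, Set[str]],
                      role_reads: Dict[str, Set[str]]) -> Set[str]:
    """Union of readable data sets across all active roles and their children."""
    datasets: Set[str] = set()
    for role in active_roles:
        for reachable in inherited_roles(role, granted_to):
            datasets |= role_reads.get(reachable, set())
    return datasets


if __name__ == "__main__":
    granted_to = {"TEAM_FINANCE": {"SALESFORCE_READ"}}
    role_reads = {"SALESFORCE_READ": {"salesforce"}, "HR_READ": {"hr"}}
    # Primary role only: just what TEAM_FINANCE inherits.
    print(readable_datasets(["TEAM_FINANCE"], granted_to, role_reads))
    # With secondary roles, every role the user holds is active at once.
    print(readable_datasets(["TEAM_FINANCE", "HR_READ"], granted_to, role_reads))
```

With only a primary role active, the user would need one combined role per combination of data sets; with all roles active together, each role can stay small and single-purpose.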

Josh: How many different role types do you have and what are the major levers in there? Like I’m imagining a bunch of different role types and then like some people have access to these certain tables, some people have access to this level of configuration within Snowflake or within our pipeline system. Like, where do you start pulling levers?

Jessica: There’s really like two, maybe three major types of roles when we’re talking about just data access. You have a team role, so everybody on your team has access to this role. You might have a functional role that’s like software engineer, and every single software engineer is going to have that same role. And then you might have what’s called a data set role. And so that’ll be like, this is a Salesforce role. Sometimes I like to group functional and data set together because I think that they’re used in a lot of the same ways. When you actually get down to how you work with these, it’s kind of the same. So I’m probably going to get a lot of hate for that. And then the other dimension I like to think about is read, write, and administrative, because each of those roles will typically have one of each of those. So you’ll have software engineer read, write, and admin. And then you kind of break off, and I think you’re hinting at this, into these specialty roles. So you might have specific data platform engineer roles that have access to all sorts of things that affect the entirety of the database and everything else. One example in Snowflake might be these file formats that you create, and file formats can have all sorts of things baked into them, like how timestamps are formatted and what kind of encoding you’re using, this type of thing. This really should be some data platform engineer that’s standardizing that for the entire company. Right. You don’t want anybody else to go tinker with that because it goes outside of their lane and outside of their intended sphere of influence.
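
To make the two dimensions concrete (kind of role on one axis, read/write/admin on the other), here is a minimal sketch that generates the Snowflake statements for one base role. The naming pattern NAME_LEVEL and the choice to chain read into write into admin are assumptions made for illustration, not something stated in the conversation; CREATE ROLE and GRANT ROLE ... TO ROLE are standard Snowflake SQL.

```python
from typing import List

# Access levels, ordered from least to most privileged.
ACCESS_LEVELS = ("READ", "WRITE", "ADMIN")


def role_statements(base_name: str) -> List[str]:
    """Build CREATE ROLE and GRANT ROLE statements for one team, functional,
    or data set role at each access level.

    Assumption: each higher level inherits the level below it.
    """
    roles = [f"{base_name.upper()}_{level}" for level in ACCESS_LEVELS]
    statements = [f"CREATE ROLE IF NOT EXISTS {role};" for role in roles]
    # Granting the lower role to the higher one lets WRITE see what READ
    # sees, and ADMIN see what WRITE sees.
    for lower, higher in zip(roles, roles[1:]):
        statements.append(f"GRANT ROLE {lower} TO ROLE {higher};")
    return statements


if __name__ == "__main__":
    for statement in role_statements("salesforce"):
        print(statement)
```

Generating roles from a single function keeps naming consistent, which matters once managers are approving requests by role name.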

Josh: A couple of other questions come to mind. So first off, you have a new member joining the team. How do you go about assigning these roles? What process do you have in place that says, here’s where Josh starts, and if something changes, here’s where I change things to give him extra access?

Jessica: It kind of depends on the role, too, because we do have certain roles that are hooked up to an automatic provisioning service. That is hooked up to the groups that they’re in with LDAP, and so it automatically handles the lifecycle of being added to a team and being removed from that team. However, that’s only going to work for team roles, or functional roles that we deem fit like team roles. As far as data set access, that type of thing, this is where that tool will come into play, where basically for any data set that a manager has access to and is a data owner of, they can just approve or deny requests for access to those datasets. Right now we’re doing it with Jira, because we need to comply with our SOX internal controls. So we’ve kind of been doing it like that.
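The lifecycle handling Jessica mentions amounts to reconciling directory group membership against the role grants that already exist. Here is a rough sketch of that idea, assuming each directory group maps one-to-one onto a team role; the `reconcile` and `to_sql` helpers and the sample names are hypothetical, and the transcript does not say how Pinterest’s provisioning service is actually implemented. The emitted statements use Snowflake’s `GRANT ROLE ... TO USER` and `REVOKE ROLE ... FROM USER` syntax.

```python
def reconcile(group_members, current_grants):
    """Diff directory group membership against existing role grants.

    Both arguments map a role name to a set of user names. Returns two
    lists of (user, role) pairs: grants to add and grants to remove.
    """
    to_grant, to_revoke = [], []
    for role in sorted(group_members.keys() | current_grants.keys()):
        wanted = group_members.get(role, set())
        actual = current_grants.get(role, set())
        to_grant += [(user, role) for user in sorted(wanted - actual)]
        to_revoke += [(user, role) for user in sorted(actual - wanted)]
    return to_grant, to_revoke


def to_sql(to_grant, to_revoke):
    """Render the diff as role grant and revoke statements."""
    statements = [f"GRANT ROLE {role} TO USER {user};" for user, role in to_grant]
    statements += [f"REVOKE ROLE {role} FROM USER {user};" for user, role in to_revoke]
    return statements


# Hypothetical data: josh joined the team, ryan left it.
groups = {"DATA_ENG_TEAM": {"josh", "jessica"}}
grants = {"DATA_ENG_TEAM": {"jessica", "ryan"}}
for statement in to_sql(*reconcile(groups, grants)):
    print(statement)
```

Running the diff on a schedule means joiners and leavers are handled without anyone filing a request, which is why it only suits roles that follow team membership.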

Josh: So I’m imagining some flow where I’m a manager and I have a few analysts that I’m managing. Someone joins the team, and I get some email that says Josh is trying to get access: what do you want Josh to be able to access? And as the manager, I click, and Josh can access X, Y, Z tables, and that gets propagated down into the access control system.

Jessica: Exactly.
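The approval flow Josh outlines, where only the data owner can approve or deny a request and each decision is tied to a ticket for audit purposes, might be modeled roughly as follows. This is a sketch under assumptions: `AccessRequest`, `decide`, the owner mapping and the ticket key are all hypothetical names, and no real Jira or Snowflake API is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    requester: str
    role: str
    ticket: str  # hypothetical ticket key kept for the audit trail
    status: Status = Status.PENDING
    decided_by: Optional[str] = None


def decide(request, approver, approve, owners):
    """Record a decision, but only if the approver owns the requested role."""
    if owners.get(request.role) != approver:
        raise PermissionError(f"{approver} does not own {request.role}")
    if request.status is not Status.PENDING:
        raise ValueError(f"{request.ticket} was already decided")
    request.status = Status.APPROVED if approve else Status.DENIED
    request.decided_by = approver
    return request


# Hypothetical owner mapping: role name -> data owner.
owners = {"SALESFORCE_READ": "analytics_manager"}
request = AccessRequest("josh", "SALESFORCE_READ", ticket="ACCESS-1")
decide(request, "analytics_manager", approve=True, owners=owners)
print(request.status.value, request.decided_by)
```

Keeping the ownership check inside `decide` means the platform team can action a request without ever being the one who makes the access decision.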

Josh: Okay, interesting. What populates that catalog of tables for you? Are you pulling that directly from Snowflake, or are you hooked into cataloging systems that are already doing that work?

Jessica: Yeah, so that’s TBD, because it’s something that we’re actively working on right now, and I’m not sure exactly how we will do that. For right now, the V1 is sort of surfacing the roles. So the managers will know the roles that they have, and they can easily look that up and say, yes, this is Salesforce Read or something like that, and yes, they should have access to that. But we’re still exploring what we want to do as far as that cataloging component goes, because it does seem that it just requires a lot more thought than you would think.

Josh: Yeah. I mean, it sounds like there are whole companies that are just focused on what is within Snowflake and what’s in these databases. So that being an important part of this map, yeah, it sounds like a big one. And is it right to think that an analytics manager would be opening access control to their subordinates and their team? Or are those definitions always going through the platform team? Like, are you controlling who has access to what?

Jessica: So we don’t necessarily make any decisions about whether somebody should have access to something or not. We know who the owners of a data set are, and they’re the ones who make that decision. Until the RBAC tools finish being built, we’re actioning it. But we’re not saying, no, I don’t think you should have access to this. Nope, that’s not my decision. I don’t know what goes on on the business side, so I’m not well enough informed to make those decisions.

Josh: Last question from me. For the folks that don’t want to go on the adventure of building something like this internally, did you see any services out there that were close? Anything that folks could look out for with similar levels of functionality, or anything where you said, you know, if we didn’t have this long procurement process, this vendor approval process, they seemed to be on the right track?

Jessica: So we looked a little bit. We didn’t look a whole lot, just because Pinterest has a pretty strong culture of building things internally. And then also our security team is just very, very particular about all sorts of things, and the level of access that you need to grant to one of those tools is pretty high. Coming out of other security reviews, we kind of came to the conclusion that, I don’t know that we’ll actually be able to get any of these through that security review. But I know that there are a bunch of companies doing similar things. You know, ALTR’s doing some pretty cool stuff. I know that some of the cataloging tools even provide some ability to do this. There are some really cool things out there. I would love to be able to use some of those. I just don’t know that we’re going to be able to do that.

Josh: Interesting.

Ryan: Well, Jessica, this has been an amazing podcast. First of all, I just want to say I’ve tried to write a book multiple times and stopped. So the feat of writing a book is an amazing accomplishment.

Josh: I can barely read a book.

Ryan: Yeah, Josh can barely read books. So congrats on that. I do want to ask you two quick things. One is, what’s the main takeaway you want people to take from what you’ve talked about today? What’s the one thing you want them to remember?

Jessica: Ooh, that’s a tough one. I would say, you know, no one size fits all. I think that we love to talk about best practices, best practices, best practices. But I think it’s always really important to realize that when somebody says that something is a best practice, there are a lot of assumptions that go into that. And if your requirements don’t match the requirements that the best practice assumes, then it doesn’t really matter, right? And so I guess the biggest thing with access control is, what does your organization look like, what does your organization actually need, and tailoring it to match that.

Josh: I like that when I asked you what your library does, you started with, well, here are the problems that we’re looking to solve, and just anchoring there before we talk about the graph database and everything that’s powering the system behind the scenes and all those really cool services. Just helping us understand, what is it that your business is solving for? Nothing else makes sense without that context. So I really appreciate that way of thinking about things. And it seems like it came across in your book as well.

Jessica: Yeah, it doesn’t. You know, you could be using the sexiest technology in the world, but if you’re not solving the problem that you were intending to solve, then what does it matter?

Ryan: And Jessica, for people listening, what’s the best way they can connect with you, either online or if you’re going to be at any conferences or speaking elsewhere? Let people know how they can get in touch with you and how to get your book.

Jessica: Yeah, so I’m on LinkedIn. I am actively planning on updating my personal website so that it has stuff. I will be at two conferences coming up. I’ll be at the Lesbians Who Tech Conference in San Francisco, which should be a really awesome one. I’m really excited. And then I’ll also be at the Snowflake Conference in Las Vegas. So if you are going to be there, it’d be great to meet up and say hello.

Ryan: Awesome. Well, thank you so much again for talking to us about this. This is a really, really awesome topic and we’ll make sure to promote this like crazy when it comes out. But thanks again for joining the podcast and hope to meet you in person sometime soon.

Jessica: Yeah, thank you so much.

Josh: Thanks, Jessica.
