How Warby Parker Builds Repeatable Data Engineering Playbooks

Jimmy Shah is a Senior Data Engineer at Warby Parker, an online retailer of high-quality eyeglasses, sunglasses, contacts, and eye exams at an affordable price. Jimmy talked to us about how he builds repeatable data engineering playbooks to cut engineering time by 50-60%.

About Our Guests

Jimmy Shah

Senior Data Engineer, Warby Parker

Jimmy Shah is a Data Engineer at Warby Parker with a demonstrated history of working cross-functionally within various industries. He also volunteers with Codebar and has been invited to speak about diversity, inclusion, and data literacy within the tech industry. A graduate of Swarthmore College, he studied both CS and Physics and conducted Statistical Physics research on Pedestrian Dynamics.

Contact Jimmy on LinkedIn or via email at [email protected]

Episode Transcript

Ryan: Welcome back to another episode of the MAD Data Podcast. We have a very special guest here today. His name is Jimmy Shah. He’s a senior data engineer over at Warby Parker. He’s also sporting some really cool Warby Parker glasses that I’m seeing right now. Fun fact for this episode: this is the first episode where we’re interviewing somebody from a company that I’ve actually personally bought something from. So back in the day when Warby Parker came around, I remember they were one of the first ones to really do the whole skip-the-eye-doctor thing, where you get your glasses from them for 100 bucks, and they give glasses to people in need as well. And I would...

Josh: Direct to consumer, right there.

Ryan: Direct to consumer, yeah. And so I’m really excited to have Jimmy on the show. He told me he’s going to send me ten free pairs of glasses. Isn’t that right, Jimmy?

Jimmy: Yeah. I think the number might be a bit different, but we can talk about it later, I’m sure.

Josh: Divide by zero, as well as everyone who listens to the podcast, right?

Jimmy: Oh, wow.

Ryan: Yes. Yeah. If you look in the pocket. Yeah. Warby Parker, man, he’s promised a lot, you know. Data engineers have a lot of power over there. Well, today we’re going to be talking about how to save time and build repeatable data engineering playbooks. I talked to Jimmy about this as we were prepping for this podcast, and he’s got some really cool things going on over there at Warby Parker. But before we get to that, we always like to start off by asking: how did you get into data engineering, Jimmy? What’s a day in the life of Jimmy over at Warby Parker? You know, what are some things that give us a good background on the person we’re talking to today?

Jimmy: Yeah, for sure. So first and foremost, thanks so much for having me. Definitely excited and kind of honored to be here. So, what got me into data engineering to begin with: I graduated college having studied physics and computer science. I remember a lot of my computer science friends wanted to do software engineering. I wasn’t super drawn to software engineering. I would go to hackathons and just didn’t really find the sense of enjoyment that I think a lot of my peers found. So I remember after graduating, I was looking for ways to combine both my physics background and my computer science background. And so I ended up hopping on as the first data hire at a small startup called Cargo. Cargo basically put snack boxes inside of Ubers. It was a way for drivers to earn more money, for riders to try free products on the go, and for brands to connect with this millennial audience that wasn’t going into malls or physical stores anymore. So I started off as kind of the first data scientist and data analyst there, and I really enjoyed it. I was there for about a year and a half or so before the pandemic really took a huge effect on Uber’s, and therefore Cargo’s, business model. But I think I learned a lot of skills about what it means to present data effectively and how to look for the right signals in the noise. After that role, I realized I was more interested in the architecture, the steps you need to do before you can actually do data analysis and data science. So more of the behind-the-scenes stuff. And it turns out there was a word for that: data engineering.
And I say this kind of jokingly, but I feel like there are always new data roles and terms going around. Analytics engineering is still kind of a newer term. Machine learning engineer is a new one we’re seeing, of course.

Ryan: We talk about that in almost every single podcast, I’m pretty sure: the different titles and roles that are changing constantly in the space.

Jimmy: Yeah, exactly. And so I think I realized, maybe I want to become more of a data engineer. This seemed like a relevant skill set to what I definitely had to do on the data team of one at Cargo. I was definitely doing a lot of that pre-processing to do the analysis. But now I wanted to focus in on that.

Josh: Question about that role at Cargo. You said you were doing data analytics but also data science. As a data team of one, it sounds like you were also doing data engineering there. What was the actual spread of work that you were doing between those different domains?

Jimmy: Good question. So one of the co-founders, Eric, had started off and built the initial infrastructure setup of all the data there. And so, with a pretty lean stack, we primarily used this tool called Periscope, which I think has since been acquired by Sisense and might be under a different name now. Everything was done primarily in SQL and was tied to the Periscope platform. And so I think the split was probably 40% architecture, maybe 30 to 40% analysis, meeting with stakeholders and figuring out how to move various KPIs or metrics. And then the last 10 to 15% was hardcore, or maybe even basic, data science. Yeah.

Josh: Okay. And it sounds like you went directly from school or university into that role at Cargo. I’m also curious, as you shifted more to data engineering after that experience, what were the major learning curves for you? Where were the areas where you felt like you needed to learn the most to go into data engineering full time?

Jimmy: Yeah, that’s a great question. So at Cargo, a lot of the data stack, like I said, was just using Periscope. I don’t know how the tool works since the acquisition, but back then you had a bunch of SQL statements that you would save as materialized views. So it was almost like a precursor to dbt, if you can imagine. We had to manage a Redshift data warehouse. And so it was very much, here’s one place to do all of your work. The tech stack was literally just Periscope. So shifting into a more specific data engineering role, I think it was getting familiar with working more like a software engineer: a little less Jupyter Notebooks or making these one-off ad hoc analyses, and really building repeatable data pipelines. Things like CI/CD, environment variables, using tools like dbt and then Airflow. Kind of expanding the tech stack and really getting comfortable with doing more software engineering-like things. That’s where I found a lot of growth.

Josh: Interesting. Yeah. We see a lot of data engineers who come from the software engineering domain and then get trained up in data. It sounds like you would be an example of someone going from the other direction, working first in analytics and science, and then getting trained up more in the software engineering aspect of data engineering. And I would imagine your undergrad helped. I’m assuming undergrad was in computer science, so you probably had some of those primitives to help you get educated quickly on the software engineering side.

Jimmy: Yeah, 100% agree with that. Yeah.

Josh: Cool. So from Cargo, how did you get to Warby Parker?

Jimmy: So after Cargo, I knew that I wanted to really focus more on data engineering. I felt like I was at a crossroads between data science and data engineering, and I knew that I was definitely interested in both. I also knew there was a good chance I would have to go to grad school to really become a true data scientist and work on these hard, complex problems; an undergrad degree might not be enough. Whereas with data engineering, I don’t think that was necessarily the case. I also thought about not only what am I interested in, but what am I good at. I knew I was a pretty decent data scientist, but I had a hunch that I would probably be a much better data engineer. And so I took an educated guess, I guess, and said, “Let me try data engineering.” If this doesn’t work out, okay, cool, I’ll go to grad school and become a data scientist, but let’s just see: is my hunch correct here? From there I started applying to different roles that were more in the analytics engineering and data engineering space, and I was curious how much of my very limited tech stack background would translate into more of a full-time engineering position. So I interviewed for a bunch of roles, and Warby Parker was one of them. I was really impressed by how much the do-good mentality has stayed with the company as they’ve grown and, you know, now gone public. And so back in the day I accepted that offer.

Ryan: One of the craziest things, not to make this about glasses, but I remember when Warby Parker was all online, and then they had stores, and it was like pop-up stores that kind of came out. This sounds like the dumbest, smallest thing, but I just remember walking into the store with my glasses, and the guy took the glasses. I was like, “Hey, they’re kind of off here,” or whatever. He’s like, “Got you,” fixed them, and gave them back to me. And I was like, awesome. That took 10 minutes. Usually, you know, before that, you’d have to go to an eye doctor, schedule an appointment, and give them the glasses so they could ship them off someplace to fix them and send them back. It was that small thing, that type of service. I was like, yeah, I’m gonna buy Warby Parker glasses forever now. That was my one day.

Josh: I appraise.

Jimmy: Yeah, definitely. I think the company definitely tries to make the customer experience as great as possible.

Ryan: Well, tell us what you’re doing over there now. What’s the day in the life of Jimmy at Warby Parker? What are you getting into over there?

Jimmy: Yeah, for sure. So currently at Warby Parker, I’m a senior engineer on the data engineering team. I just wrapped up a big project, actually. What I’ve really focused on for the last couple of months or so has been our surveys. So when a customer walks into a store, whether they’re buying a pair, trying to get their glasses fixed, getting an eye exam, or what have you, we like to survey our customers. We try to have a really good NPS score. Like you just said, no one really looks forward to going to the eye doctor or getting glasses, and I think Warby Parker is trying to change that. And so we have a really high NPS score, I think around 80 or so, relative to the industry’s, which is maybe around 30 or 40. So basically we’re launching a new survey that helps measure how people interact with the store. Do they enjoy their interactions with our employees, and what can we do better? And so taking that data, loading it into a BI tool like Looker, and then also mixing it with all these other internal data sources that we have, that was the biggest thing I’ve had to work on over the last few months.

Ryan: So the main core topic that we want to get into, which backs into exactly what you’re doing at Warby Parker, is that we had a discussion around the idea of playbooks. As soon as we said playbooks I got excited, because I’m in marketing and I’m all about repeatable processes and playbooks to do things. We were talking about how this podcast has a playbook. I get on the phone with you, then we talk about the outline, then we do the recording, then we take it off to editing, then we do promotion. There’s a playbook that goes into making this a successful thing. So data engineering also has playbooks, and you’ve built some of these playbooks over at Warby Parker. So the topic today is really, how can you save time and build some of these repeatable engineering playbooks? Let’s start out by talking about what a data engineering playbook is. How do you define that? And let’s get into some of the things you’re doing over there.

Jimmy: To define what a data engineering playbook typically is, I think the best way to think about it is that we narrowed down the audience. This is designed for another data engineer at Warby Parker, so you can make certain assumptions. You also might have explained a lot of things as well. So in a nutshell, a data engineering playbook would be a repeatable procedure, a process that’s documented, with a target audience of someone on the data engineering team at Warby Parker. I think that’s a fair way to say it.

Josh: What kind of processes get covered by a playbook?

Jimmy: I think these are things that are expected to happen over and over again. These are definitely not ad hoc or one-off. So things like installing the internal tools we use at Warby for our data engineering team: having a playbook on exactly how to install them and how to make edits to them, and what standards we follow to make sure that there’s code consistency. That’s a great example playbook. Another example can be when you’re starting a new project with a certain team. Different teams at Warby Parker have their own setups and processes and procedures. You can make a playbook that says, here’s how you should work with this team as a stakeholder. And maybe that’s just defining your deliverables in the beginning and then working backwards to engineer a solution, as opposed to engineering first and then having a back and forth of, this is what they want, this is what we built, and constantly chasing to close that gap.

Josh: Interesting. And where do you keep these playbooks? Are these in some repository? Is this checked in like code? Is it kept in some drive folder somewhere? How do you access this library of playbooks in the data engineering team?

Jimmy: Yeah. So we use Confluence as our central source of truth for documentation, and we have a pretty organized place in Confluence where people can find these playbooks. Things can get lost in any sort of wiki or docs organization; you can imagine having nested folders and such. So what’s been really helpful, besides having Confluence, is having breadcrumbs that all lead to that place. So in a Jira ticket, maybe, saying, “Here’s how to do X, Y, and Z. Here’s a link to that Confluence page.” Or in a GitHub repo, in a README, saying, “Hey, you want to install this internal tool that we use? Here’s a link to the Confluence page that has the exact steps.” That way, no matter where you’re stumbling to find this information, you always go to the same place.

Josh: Do you measure which playbooks are being used the most and use that as some gauge on how helpful they are?

Jimmy: That’s a good question. I don’t think we do that at the moment. We’re a smaller data engineering team. We have about five engineers right now, and two new ones just joined in January. But we’ve had a lot of verbal feedback up to managers, and I’ve had a lot of feedback sent to me, that playbooks have been helpful. So qualitatively, yes; quantitatively, not so much at the moment.

Josh: What’s your favorite playbook that you have?

Ryan: I like the one that you told me about. You said that you had a playbook that basically cut the engineering time by like 50 or 60%.

Jimmy: Yeah.

Ryan: I can’t remember which one. Talk about that one, because I remember you talking about that and I was like, that seems like a lot of money saved for the company.

Jimmy: Yeah, that’s a good one. Right. So the summary of it is, we opened a lab last year in Las Vegas, Nevada. It’s an optical lab to assemble glasses, and we needed to do some reporting on audits: making sure that the frames were audited and inspected, and that they’re correct and pass quality and other checks. We got a request from, I think, one of the business systems teams. They had built some report that builds data tables in our database, and they wanted to combine the data and expose it more. I think we had just gotten our PM at this point, so I was still kind of PM’ing this project. And it was unclear to me from the ticket that I had what they actually wanted to see in Looker. So I got together with the two stakeholders involved and said, okay, here’s an outline of the deliverable I’m going to build. I was basically writing in Google Docs: here is the Looker view that’s going to get built. I was basically writing pseudo-LookML, saying, okay, here’s a column for the dimensions, here are the measures, here’s how they’re calculated, here’s how they’re formatted, very specific, and asking, how does this look? And then once I have an okay from both of you on this, I can go and build out the actual features.
It took about a week of back and forth in terms of getting details finalized, like defining some of the columns they had built in the table. I think that PM checked with her engineers to verify the definitions were all correct. But after that was all finalized and hashed out, I actually began working on it. And by working backwards and saying, okay, I know exactly what I have to build now, it saved me a lot of time in terms of not building everything upfront and only having what they wanted in Looker, instead of going back and forth. It was like, okay, well, I don’t need this column, so I’m not even going to bring it into dbt and then expose it in Looker, something that’s not needed. And so what I thought was going to be a one-week project in terms of engineering ended up being about, I think, two days of engineering time. And that was all because I knew exactly what to build and how to build it. So that to me was a huge win, I think. Yeah.

Josh: Are there any playbooks that you’re developing right now?

Jimmy: Yeah. Yeah. So as I kind of alluded to, we recently got a PM, or I think her official title is business analyst, and she’s great. She started at the end of February. One thing that I’m trying to lead is an effort to help her connect the dots between the business requirements and the technical side, the code changes that are made. I think she’s pretty much up to speed on Looker now. She can actually go and make LookML changes and open a GitHub PR, and she understands it fully. It’s not just her typing things in from some playbook. It’s actually understanding why this dimension or measure is changing. And now we’re at the point where we’re going into dbt and taking it a step back, saying, oh, this dbt model needs to change, and how does that impact Looker? So I’m currently working on a playbook of how you bring non-engineers up to speed on a data engineering tech stack, or how you connect the dots between a Google Doc that says we need some X, Y, Z feature and the code changes that are actually needed.

Josh: Interesting. Who decides what playbooks need to be developed?

Jimmy: That’s a good question.

Ryan: Jimmy does. Aren’t you like the head playbook maker over at Warby Parker now?

Jimmy: Yeah. Head Playbook Maker, yeah. H.P.M. Yeah, that’s my new title. No, I think engineers have a lot of autonomy. On the data engineering team, we often encourage each other to leave code in a better state than we found it, and I think part of that includes documentation and playbooking. Let’s say you made a Looker Explore and you were the only one that worked on it for four months. If another engineer has to go and make a change, hopefully you documented your changes. So I think it’s probably a case of judgment and being a good engineer to your fellow engineers. Yeah. I don’t know if I have a good answer for that.

Josh: Do you treat the development of a playbook like the development of a feature, as tracked work? Because it’s going to take engineer time. So is it tracked in a similar way as pushing a new feature might be tracked? Do you have a ticketing system or a work tracking system that you’re using to stack that work along with sprints, like the other features that you’re pushing out? Or is it its own highway of work that you handle separately?

Jimmy: Yeah, that’s a good question. I try to, though I could always make my work more visible, and I think the team probably follows a similar philosophy. We have our ticketing system internally, and I always have a Jira ticket that might be, “Documentation for feature X, Y, or Z.” Or if it’s part of, for example, the auditing work I just did, I might note in the description, “Here’s the associated playbook that I made for it.”

Ryan: As you start to build these, sometimes, believe it or not, some people get annoyed with documentation and they just say, you know, screw this, I’m going to do what I want. I’m just going to go off and figure all this data stuff out myself. Did you have any pushback when you were creating these at all, or were people like, “Oh, this is awesome. This is the Holy Grail I’ve been looking for, because it helps me out, saves me time, and all that”?

Jimmy: Yeah, no, totally. I’ve definitely had people in other roles, or other past lives, not be as enthusiastic about documentation. I think I’ve been very fortunate that the engineering leadership team at work has been very supportive and pro-documentation. I think a lot of that comes from the fact that the history of data at Warby Parker extends past the current data engineers on the team. And so I think we’ve all gotten to a point where we’re trying to do something and we come across some code that none of us have written or seen before, and it’s really poorly documented. And we were all like, oh, this is just a nightmare to figure out. No one wants to cause that feeling, especially if you’ve experienced it. So yeah, I think the team’s pretty in favor of documentation, I would say. Which is good. A good battle, I guess, that I don’t have to fight.

Ryan: Documentation is a good thing, in my opinion. I had to document and tell Josh that he can’t mess up the corporate PowerPoint deck a lot. I have to put in a huge slide at the very beginning, in all red text, that says, “Do not edit this directly. Make a copy of it.”

Josh: I like to iterate.

Ryan: Yeah, you like to mess up my stuff is what you like to do.

Ryan: I’m very waterfall. Well, as we get closer to the end here, Jimmy, I want to ask you a couple of other questions. We talked a lot today about data engineering playbooks and the path that got you there. For the audience today, what is the one thing you want them to remember? Because I feel like, with our audience’s attention spans, maybe they’re only listening for 10 minutes. What do you want them to remember?

Jimmy: I would tell them to remember to find patterns. Maybe you already have your internal playbooks at company X, Y, or Z, and that’s great. But when I was pondering this question, I thought back to high school, where I did marching band. I don’t know if you’ve ever done marching band or anything like that. I was in the pit. The pit doesn’t march per se, but we had to learn how to march a little bit, just in case we switched to a different part of the marching band.

Ryan: Well, what were you playing in the pit, by the way? What were you playing?

Jimmy: I played the marimba. It’s like a big xylophone.

Ryan: My wife played in the drum line, and she was the smallest bass. She weighed, like, 90 lbs. And she had the small little one, like, yeah, and was going like this and hitting it. And anyway.

Josh: She’s also a T-Rex, apparently.

Ryan: Yeah. No, that’s how she looked. That was really funny, but that was in high school. But sorry, I just wanted to know what you were playing.

Jimmy: Yeah, I played the marimba, the xylophone, but I definitely couldn’t walk with that. But the one thing I remember learning from that was, when we were learning how to march, you can’t try to figure out the distance to march every single time you go to march. At some point you have to just remember, this is the exact distance. I forget what it was. So I take a step, we march. And I think a playbook is kind of the same idea, right? Maybe you don’t call it a playbook at your company or what have you, but try to find patterns and reduce the work that you do. You don’t want to keep solving the same problems every single time you have a stakeholder ask, or you have some internal tool break, or something like that. So if you can slow down a little bit, figure out what those patterns are, and build for those patterns, I think it’ll be a very rewarding process, hopefully.

Josh: You heard it here first. Data engineering is like being in a high school marching band. So I appreciate the comparison there. That’s a new one for us.

Ryan: That ties back to the name of our company, too. Wow. Full-circle self-promotion, look at that. We gave Jimmy $100 to give us our time.

Josh: Tying it back to Databand.

Ryan: Okay, so last couple of things: where can people go to connect with you? Are you on Twitter, LinkedIn, Substack, Medium, anything like that? And then also tell us about any job openings at Warby so people can know about them.

Jimmy: Yeah, for sure. So I’m hoping to start a blog at some point, but for the time being LinkedIn is probably the best place to go. Just send me a message when you send an invite, saying, “Hey, heard you on the podcast,” so I know who you might be. Yeah, LinkedIn is the best place. And I’ll announce once I have my blog up and running at some point; I keep putting it off. My personal website is JimmyShah.io. That might be an easier place to find me, and my LinkedIn is on there too. So that might be an easier place to connect. And yeah, send emails and LinkedIn messages.

Ryan: Any data roles at Warby Parker we should tell anybody about, or are they all full right now?

Jimmy: No, no, no. We do have some openings. Thanks for asking. We’re currently looking for an engineering manager on the data engineering team. That’s actually the team I’m on, so if you want to potentially be my manager, you could, I guess, apply. I think we’re also looking for a staff or principal-level data engineer. So both of those roles are currently open. And the data infrastructure team, which is adjacent to us and helps with all of our services and the infrastructure, how the tools actually run, I think they’re hiring for a few roles as well. So yeah, if you have any questions, feel free to reach out to me, or just go apply directly. I recommend it.

Ryan: Awesome. Well, Jimmy, thank you so much for being on the MAD Data podcast and I hope we can hang out soon again. Are you gonna be at any conferences coming up soon?

Jimmy: I don’t think so at the moment, not anytime soon, but I will let you know otherwise.

Ryan: Cool. Yeah, well, let people know on your LinkedIn if you’re going to be at conferences as well, to do some networking over there. We’ll be at some conferences later this summer and in the fall. And again, man, thanks so much for coming on the podcast.

Jimmy: Of course. Thank you so much for having me. It was a pleasure chatting.

Josh: Thanks a lot, Jimmy. Great having you on.
