The ideal data operations (DataOps) org structure
An organization’s external communications tend to reflect its internal ones. That’s what Melvin Conway taught us, and it applies to data engineering. If you don’t have a clearly defined data operations or “DataOps” team, your company’s data outputs will be just as messy as its inputs.
For this reason, you probably need a data operations team, and you need one organized correctly.
So first let’s back up—what is data operations?
Data operations is the process of assembling the infrastructure to generate and process data, as well as maintain it. It’s also the name of the team that does (or should do) this work—data operations, or DataOps. What does DataOps do? Well, if your company maintains data pipelines, launching one team under this moniker to manage those pipelines can bring an element of organization and control that’s otherwise lacking.
DataOps isn’t just for companies that sell their data, either. Recent history has proven you need a data operations team no matter the provenance or use of that data. Internal customer or external customer, it’s all the same. You need one team to build (or let’s be real, inherit and then rebuild) the pipelines. They should be the same people (or, for many organizations, person) who implement observability and tracking tools and monitor the data quality across its four attributes.
And of course, the people who built the pipeline should be the same people who get the dreaded PagerDuty alert when a dashboard is down—not because it’s punitive, but because it’s educational. When they have skin in the game, people build differently. It’s good incentive and allows for better problem solving and speedier resolution.
Last but not least, that data operations team needs a mission—one that transcends simply “moving the data” from point A to point B. And that is why the “operations” part of their title is so important.
Data operations vs data management—what’s the difference?
Data operations is building resilient processes to move data for its intended purpose. All data should move for a reason. Often, that reason is revenue. If your data operations team can’t trace a clear line from that end objective, like the sales teams having better forecasts and making more money, to their pipeline management activities, you have a problem.
Without operations, problems will emerge as you scale:
- Data duplication
- Troubled collaboration
- Waiting for data
- Band-aids that will scar
- Discovery issues
- Disconnected tools
- Logging inconsistencies
- Lack of process
- Lack of ownership & SLAs
If there’s a disconnect, you’re simply practicing plain old data management. Data management is the rote maintenance aspect of data operations. Which, while crucial, is not strategic. When you’re in maintenance mode you’re hunting down the reason for a missing column or pipeline failure and patching it up, but you don’t have time to plan and improve.
Your work becomes true “operations” when you transform trouble tickets into repeatable fixes. Like, for example, you find a transformation error coming from a partner, and you email them to get it fixed before it hits your pipeline. Or you implement an “alerts” banner on your executives’ dashboard that tells them when something is wrong so they know to wait for the refresh. Data operations, just like developer operations, aims to put repeatable, testable, explainable, intuitive systems in place that ultimately reduce effort for all.
That’s data operations vs data management. And so the question then becomes, how should that data operations team be structured?
Organizing principles for a high-performing data operations team structure
So let’s return to where we began—talking about how your system outputs reflect your organizational structure. If your data operations team is an “operations” team in name only, and mostly only maintains, you’ll probably receive a forever ballooning backlog of requests. You’ll rarely have time to come up for air to make long-term maintenance changes, like switching out a system or adjusting a process. You’re stuck in Jira or ServiceNow response hell.
If, on the other hand, you’ve founded (or relaunched) your data operations team with strong principles and structure, you produce data that reflects your high-quality internal structure. Good data operations team structures produce good data.
Principle 1: Organize in full-stack functional work groups
Gather a data engineer, a data scientist, and an analyst into a group or “pod” and have them address things together they might have addressed separately. Invariably, these three perspectives lead to better decisions, less fence-tossing, and more foresight. For instance, rather than the data scientist writing a notebook that doesn’t make sense and passing it to the engineer only to create a back-and-forth loop, they and the analyst can talk through what they need and the engineer can explain how it should be done.
Lots of data operations teams already work this way. “Teams should aim to be staffed as ‘full-stack,’ so the necessary data engineering talent is available to take a long view of the data’s whole life cycle,” say Krishna Puttaswamy and Suresh Srinivas at Uber. And at the travel site Agoda, the engineering team uses pods for the same reason.
Principle 2: Publish an org chart for your data operations team structure
Do this even if you’re just one person. Each role is a “hat” that somebody must wear. To have a high-functioning data operation team, it helps to know which hat is where, and who’s the data owner for what. You also need to reduce each individual’s span of control to a manageable level. Maybe drawing it out like this helps you make the case for hiring.
What is data operations team management? A layer of coordination on top of your pod structures who plays the role of servant leader. They project manage, coach, and unblock. Ideally, they are the most knowledgeable people on the team.
We’ve come up with our own ideal structure, pictured, though it’s a work in progress. What’s important to note is there’s one single person leading with a vision for the data (the VP). Below them are multiple leaders guiding various data disciplines towards that vision (the Directors), and below them, interdisciplinary teams who ensure data org and data features work together. (Credit to our Data Solution Architect, Michael Harper, for these ideas.)
Principle 3: Publish a guiding document with a DataOps North Star metric
Picking a North Star metric helps everyone involved understand what they’re supposed to optimize for. Without such an agreement, you get disputes. Maybe your internal data “customers” complain that the data is slow. But the reason it’s slow is because you know their unstated desire is to optimize for quality first.
Once you have a North Star, you can also decide on sub-metrics or sub-principles that point to that North Star, which is almost always a lagging indicator.
Principle 4: Build in some cross-functional toe-stepping
Organize the team so different groups within it must frequently interact and ask other groups for things. These interactions can prove priceless. “Where the data scientists and engineers learn about how each other work, these teams are moving faster and producing more,” says Amir Arad, Senior Engineering Manager at Agoda.
Amir says he finds one of the hidden values to a little cross-functional redundancy is you get people asking questions nobody on that team had thought to ask.
“The engineering knowledge gap is actually kinda cool. It can lead to them asking us to simplify,” says Amir. “They might say, ‘But why can’t we do that?’ And sometimes, we go back and realize we don’t need that code or don’t need that server. Sometimes non-experts bring new things to the table.”
Principle 5: Build for self-service
Just as with DevOps, the best data operations teams are invisible, and constantly working to make themselves redundant. Rather than play the hero who likes to swoop in to save everybody, but ultimately makes the system fragile, play the servant leader. Aim to, as Lao Tzu put it, lead people to the solution in a way that gets them thinking, “We did it ourselves.”
Treat your data operations team like a product team. Study your customer. Keep a backlog of fixes. Aim to make the tool useful enough that the data is actually used.
Principle 6: Build in full data observability from day one
There is no such thing as “too early” for data monitoring and observability. The analogy that’s often used to excuse putting off monitoring is, “We’re building the plane while in flight.” Think about that visual. Doesn’t that tell you everything you need to know about your long-term survival? A much better analogy is plain old architecture. The longer you wait to assemble a foundation, the more costly it is to put in, and the more problems the lack of one creates.
Principle 7: Secure executive buy-in for long-term thinking
The decisions you make now with your data infrastructure will, as General Maximus put it, “Echo in eternity.” Today’s growth hack is tomorrow’s gargantuan, data-transforming internal system chaos nightmare. You need to secure executive support to make inconvenient but correct decisions, like telling everyone they need to pause the requests because you need a quarter to fix things.
Principle 8: Use the “CASE” method (with attribution)
CASE stands for “copy and steal everything,” a tongue-in-cheek way of saying, don’t build everything from scratch. There are so many useful microservices and open-source offerings today. Stand on the shoulders of giants and focus on building the 40% of your pipeline that actually needs to be custom, and doing it well.
If you do nothing else today, do this
Go have a look at the tickets in your backlog. How often are you reacting to rather than preempting problems? How many of the problems you’ve addressed had a clearly identifiable root cause? How many were you able to fix permanently? The more you preempt, the more you resemble a true data operations team. And, the more helpful you’ll find a data observability tool. Full visibility can help you make the transition from simply maintaining to actively improving.
Teams that actively improve their structure actively improve their data. Internal harmony leads to external harmony, in a connection that’d make Melvin Conway proud.