Why we’re moving away from centralized data for good
Fellow data wranglers, meet Oliver Bauer – Senior Director of Data Governance, Architecture & Engineering at Vistaprint. Eighteen months into ensuring excellence in the Data and Analytics team (DnA), Oliver shares how the journey is going and why it’s goodbye data warehouse.
Have you ever wondered how to decommission a giant data warehouse? The characteristics that make it hard to work with are the same ones that make it hard to take apart.
- First of all, the sheer size of the thing
For years, like most organizations, Vistaprint had a team diligently collecting datasets from different operational databases and merging them into a single centralized view. The idea was that we would extract endless gems of valuable insight. But using even a fraction of that data required a mountain of maintenance: the gain just wasn’t worth the growing complexity.
- Second, the warehouse gatekeepers
When our data analysts wanted to build a new dashboard or algorithm, requests for new data had to be funneled through the warehouse team – a silo of specialized but domain-agnostic engineers who were a few steps removed from the content and business goals. This led to bottlenecks and backlogs we’re still dealing with today.
- And third, how to effectively govern the data?
The responsibility for data quality was distributed between the data warehouse team and the teams where the data was sourced – from across site, marketing, manufacturing, customer care and so on. Domain-agnostic data engineers, with a restricted view of end-user applications, were left to make sure the data was clean, correct, meaningful and fit for purpose.
All in all, it’s a difficult model if you want any degree of flexibility, scale, or speed. And because we’ve got big plans at Vistaprint – to become an iconic, kick-ass data and analytics pioneer – we need flexibility, scale, and speed in spades. Moving away from centralized data will empower our internal stakeholders with the insights they need to better serve our small business customers. So, this is why we’re starting from scratch with an exciting new decentralized, domain-oriented approach: the data mesh (see here and here to read Zhamak Dehghani’s awesome write-up on Martin Fowler’s blog).
Embracing the data mesh: three reasons
You might know of similar ideas from the Data Hub Model, or Piethein Strengholt’s book Data Management at Scale. These concepts break down the monolithic, domain-agnostic data warehouse into smaller components, treated as products, with owners embedded in each business domain. Product owners become tasked with creating, curating, and maintaining their data sets, dashboards, or APIs so that other users, both within their own and other domains, can easily discover and consume stable data.
Here’s why I love the data mesh approach so much, personally and professionally.
- The power of a multidisciplinary team
The data mesh model brings data brains and domain brains together into the same team. No longer do we have deep domain knowledge sitting over here, data engineering and analytical expertise over there, and visualization experts somewhere else. In our DnA data product teams, business analysts, data engineers, data scientists, product managers and domain specialists all work together within one nimble data product team. Imagine the firepower that comes out of a silo-free structure like this.
- Customer-obsessed innovation
The Vistaprint mission is to become the marketing and design partner of choice for small businesses. That means we’re in the business of creativity. The data mesh approach gives us the freedom to play, ideate, test and learn in a way that the congested monolithic warehouse never did. For example, it allows our promotion specialists to experiment with deft, next-generation pricing algorithms and ultimately deliver our everyday fair price promise to customers.
- It fits my personal mission
I believe that good, unbiased, fast decisions are at the heart of growth – for our small business customers, for people, for society as a whole. My role in the DnA leadership team is to develop excellence from an organizational and a technical perspective to help decision-makers – colleagues and customers alike – make well-informed decisions. And the data mesh model has all the right ingredients for smart decision making: deep business expertise, sophisticated analysis, and digital insights.
But starting from scratch with your data infrastructure will always be a long and bumpy ride for a large organization. Here are some of the road-bumps we’ve navigated so far, and are still navigating:
Moving to the mesh: five lessons for data-driven businesses
- Have strong foundations before you start
Setting up organizational and technical foundations is paramount. One key element is the data domain map: you need to break down your monolith using a ‘city map’ of your data with all its districts, recognizing which streets act as a domain (and team) boundary. The data domain map also helps to ensure a single source of truth per data element. Another key element is a strong data stack: the better the data platform, the easier the transition. (Note – not all the tools you’ll need to move over exist yet, a bit like APIs in software engineering ten years ago.) Teams need to have a set of relevant tools at their fingertips to ingest, process, transform, and catalog data sets and expose these to other teams. For discoverability, a data catalog will ensure central oversight over a decentralized system!
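To make the catalog idea concrete, here’s a minimal sketch in Python of what a data-catalog entry for a data product might capture – domain, owner, and schema – so decentralized products stay centrally discoverable. The names (`DataProductEntry`, `daily_orders`, the team name) are hypothetical illustrations, not our actual platform tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductEntry:
    """One catalog record: enough metadata to make a decentralized product discoverable."""
    name: str          # e.g. "daily_orders"
    domain: str        # a district on the data domain map, e.g. "manufacturing"
    owner: str         # the product team accountable for quality
    schema: dict       # column name -> type: the product's public interface
    description: str = ""


class DataCatalog:
    """Central index over decentralized data products (single source of truth per element)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DataProductEntry) -> None:
        key = (entry.domain, entry.name)
        if key in self._entries:
            # Enforce one owner per data element across the mesh
            raise ValueError(f"{entry.domain}.{entry.name} is already registered")
        self._entries[key] = entry

    def find_by_domain(self, domain: str) -> list:
        return [e for e in self._entries.values() if e.domain == domain]


catalog = DataCatalog()
catalog.register(DataProductEntry(
    name="daily_orders",
    domain="manufacturing",
    owner="dna-manufacturing-team",
    schema={"order_id": "string", "produced_at": "timestamp"},
))
```

The point of the sketch: registration is decentralized (any team can call it), but discovery stays central – exactly the oversight a catalog buys you.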
- Foster product thinking for data and expect it to be hard!
Data is often seen as a by-product – by engineering teams and those within the data space. So, to transform all teams across your organization into data evangelists and adopt data as a product, change management is indispensable. To succeed, all data producers need to embrace full (product) ownership of their data. It’s why cross-functional data product teams work so well. A complete skillset in one place means you can solve business problems and address all items in your backlog. With the product owner, data engineer, data scientist and analyst working side by side, data product teams can now cover the end-to-end value creation of their data products.
- Nurture fast-paced innovation and grow communication
The decentralized model can be a fertile ground for innovation: with each newly developed data product, you can try and test new things. While local innovation is great, you don’t want each data product team reinventing the wheel. Best practices must become reusable blueprints, elevating the tool set of the data platform stack. As always, ‘optionality’ wins. In parallel, data producers and data consumers have a heightened need to communicate as they negotiate mutual requirements (and they might not even know each other). If contact falls short, the data mesh will eventually fail. Thus, it’s important to think about how communication can be structurally increased and how to build strong bonds and trust between teams.
- Ensure interoperability between data products
Interoperability is critical for the data mesh because combining data across domains is unavoidable: to get a conversion rate, for example, you need both site and transaction data. This starts with the right architecture (see There’s More Than One Kind of Data Mesh for more on the topic), and platform fragmentation needs to be actively managed. Secondly, speed is the reason there’s no central data model anymore! So you must hit minimum standards for interoperability – for example, the naming of keys, or naming conventions like CamelCase or SCREAMING_SNAKE_CASE. But consider that too many standards kill innovation: the right balance needs to be found. And sometimes breaking (and consequently iterating on) your own standards is the wise thing to do to keep moving fast.
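To illustrate why agreed keys matter, here’s a small sketch (with hypothetical field names and toy data) that joins site-visit records from one domain with transaction records from another to compute a conversion rate. The join only works because both teams standardized on the same `session_id` key.

```python
# Hypothetical records from two domains; the shared "session_id" key is the
# agreed interoperability standard that makes combining them possible.
site_visits = [
    {"session_id": "s1"},
    {"session_id": "s2"},
    {"session_id": "s3"},
    {"session_id": "s4"},
]
transactions = [
    {"session_id": "s2", "amount": 19.99},
    {"session_id": "s4", "amount": 5.00},
]


def conversion_rate(visits: list, orders: list) -> float:
    """Share of visit sessions that resulted in at least one transaction."""
    visit_sessions = {v["session_id"] for v in visits}
    converted = visit_sessions & {o["session_id"] for o in orders}
    return len(converted) / len(visit_sessions) if visit_sessions else 0.0


rate = conversion_rate(site_visits, transactions)  # 2 of 4 sessions converted -> 0.5
```

If the site team had named its key `SessionId` and the transaction team `session_id`, the set intersection would silently come back empty – which is exactly the kind of failure minimum interoperability standards prevent.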
- Safeguard data quality in the data mesh
Teams now want to formalize their mutual expectations in data contracts. Stay tuned for another blog…
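As a preview of what such a contract might look like – a sketch with hypothetical fields, not our actual implementation – a data contract can start as simply as a producer-published schema that consumers validate incoming records against:

```python
# A minimal data contract: the producer publishes required fields and their
# types; the consumer checks every record against it before trusting the data.
ORDER_CONTRACT = {
    "order_id": str,
    "customer_id": str,
    "total": float,
}


def violations(record: dict, contract: dict) -> list:
    """Return a list of human-readable contract violations (empty list = valid)."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"{field_name} should be {expected_type.__name__}")
    return problems


good = {"order_id": "o-1", "customer_id": "c-9", "total": 24.5}
bad = {"order_id": "o-2", "total": "24.5"}  # missing customer_id, total is a string
```

The value isn’t in the code – it’s that producer and consumer have written their mutual expectations down, so a breach is detected at the boundary instead of deep inside a dashboard.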
We are 100% committed to the decentralized data mesh approach (and there is no way back for us!)
Why? The core benefit of a decentralized approach is finding the best solution for the problem at hand. After all, data product teams like ours are on the hook for running and maintaining what they create: it’s a perfect way to drive excellence.
And as an autonomous, inquisitive bunch, we’re finding data mesh architecture suits DnA beautifully. It’s a pillar that unlocks analytical data at scale, propels personalization, and helps Vistaprint become a leader – serving every small business in a uniquely familiar way.
Does ‘mesh’ mesh with your organization’s culture and strategy? Feel free to connect with me on LinkedIn – I’m always happy to chat. Or, if you’d like to join me and the DnA team in figuring out some cutting-edge answers to more exciting data conundrums that no doubt lie ahead, view our careers page.