DataMesh - The modern architecture pattern that encourages product thinking!
What is this architecture all about?
It is an architectural pattern for large data-driven enterprises that helps scale analytical solutions beyond a monolithic setup consisting of a single platform and a single implementation.
In recent years, many large enterprises have adopted modern analytical solutions that combine data warehouse (DW) and Big Data technologies, resulting in a centralized source of truth. This delays value creation and becomes a big bottleneck, resulting in huge backlogs. These solutions are ideal for small companies: they are easy to start and implement but difficult to scale. Data mesh's goal is to let distributed teams work with and share information in a decentralized and agile fashion.
Building blocks of Datamesh
There are 4 building blocks of data mesh, namely:
Data Domain: where you define boundaries across your enterprise data
Data Product: your data assets, code, metadata, and related policies
Self-service infrastructure: a common infrastructure framework that makes data more discoverable
Federated Governance: accessing data in a secured and governed manner
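To make the four building blocks a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a reference implementation: the class names (DataProduct, Catalog), the fields, and the allow-list style policy are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data product: data assets plus the metadata and policies that travel with them."""
    name: str
    domain: str   # the data domain (business boundary) that owns the product
    owner: str    # the team accountable for it
    assets: list[str] = field(default_factory=list)         # e.g. table or file locations
    metadata: dict[str, str] = field(default_factory=dict)  # descriptions, tags, etc.
    allowed_readers: set[str] = field(default_factory=set)  # hypothetical access policy


class Catalog:
    """Stand-in for self-service infrastructure: one shared place to publish and discover."""

    def __init__(self) -> None:
        self._products: dict[tuple[str, str], DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Each domain publishes its own products under its own name.
        self._products[(product.domain, product.name)] = product

    def discover(self, domain: str) -> list[str]:
        # Return the product names published by one domain, sorted for stable output.
        return sorted(name for (d, name) in self._products if d == domain)

    def can_read(self, domain: str, name: str, consumer: str) -> bool:
        # Federated governance: the rule is shared, the allow-list is owned by the domain.
        product = self._products.get((domain, name))
        return product is not None and consumer in product.allowed_readers
```

A sales team could publish an "orders" product with DataProduct(name="orders", domain="sales", owner="sales-data-team", allowed_readers={"finance"}), after which catalog.discover("sales") lists it and catalog.can_read("sales", "orders", "finance") tells a consumer whether access is allowed.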
How to implement it?
1. AWS perspective: You can implement data mesh via the Lakehouse approach. Start by building a scalable data lake that holds all of your raw and transformed data. Although this is centralized in a conceptual sense, it is distributed in reality, because you can define your own namespaces in S3. Then automate data movement with Glue and send the data out to consumers (such as SageMaker, ELK, or QuickSight). All of this is managed through Lake Formation, which acts as the governance head and takes care of permissions, governance, and discovery. In short, use the Lakehouse approach to build one data domain, then replicate it across the other data domains or lines of business.
2. PowerBI perspective: Here, PBI datasets act as data products, whereas your PBI service acts as a data domain. Your PBI workspaces provide the necessary governance with respect to permissions for every line of business, thereby making each line of business the owner of its workspace content. As an end consumer, you can connect to any of these certified/promoted datasets and get insights.
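For the AWS perspective, the idea of per-domain namespaces in S3 plus permissions managed through Lake Formation can be sketched as below. This is a sketch under assumptions: the bucket, domain, database, table, and role names are made up, and the helper functions are hypothetical. The dictionary returned by grant_request is shaped like the keyword arguments of the boto3 Lake Formation client's grant_permissions call; nothing here talks to AWS, so it runs without credentials.

```python
from __future__ import annotations


def domain_prefix(bucket: str, domain: str, layer: str) -> str:
    """Build the S3 namespace (prefix) for one domain and one data layer."""
    if layer not in ("raw", "transformed"):
        raise ValueError(f"unknown layer: {layer!r}")
    return f"s3://{bucket}/{domain}/{layer}/"


def grant_request(principal_arn: str, database: str, table: str,
                  permissions: tuple[str, ...] = ("SELECT",)) -> dict:
    """Build a table-level grant, shaped like the arguments to grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the output.
    print(domain_prefix("example-bucket", "sales", "raw"))
    print(grant_request("arn:aws:iam::111122223333:role/finance-analyst",
                        "sales_db", "orders"))
```

With real credentials, the request could be passed along as boto3.client("lakeformation").grant_permissions(**request). Keeping the payload as plain data makes it easy to review which consumer gets which permission on which domain before anything is applied.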
* Image taken from the awsonlinetechtalks channel on YouTube
Some pros include a common technology stack, data-product ownership, decentralization, a common infrastructure framework, and stronger governance practices against domain-agnostic data.
Cons include a potentially diverging technology stack, a poor fit for small organizations, and the difficulty of instilling product thinking in everyone.
More info on data mesh can be found here -> Data mesh: it's not just about tech, it's about ownership and communication | Thoughtworks
How to create a modern CPG data architecture with Datamesh -> https://aws.amazon.com/blogs/industries/how-to-create-a-modern-cpg-data-architecture-with-data-mesh/
Love to hear your comments...Happy blogging!!