- To find and obtain consistent reference data (e.g. products, customers) that is produced and managed in different operating units with different systems and
- To resolve conflicts or ambiguities in attribute values such as product descriptions, categories, and short descriptions.
The traditional approach is to select a master data management product, consolidate all data and metadata, and centralize data management and governance. There are several challenges with this approach:
- Taxonomy and metadata management: defining a unified taxonomy and metadata, and the associated management and governance processes, across many different business units is often intractable.
- Data models: defining and managing a consolidated data model that fulfills all business application needs and constantly changing business or operational requirements is also often intractable.
- The data representations in applications are often tightly coupled with backend data models.
The result is usually a multi-year, multi-million-dollar project that ends with cost overruns and a lot of disappointment. Companies often repeat the same cycle every few years with different vendors. The cycle brings multi-billion-dollar business to master data management software vendors, but probably not as much benefit to their customers.
Is building a consolidated master data management system the only path? What kills this approach is the complexity of unifying the data models and processes of different operating units. Most businesses are not static: by the time models and processes are defined, things have most likely changed.
Another approach is to focus on information integration instead of consolidation, using a Linked Data information architecture, and to reduce project complexity by:
- Enabling divisions to manage and govern their own taxonomy, metadata, and data, and to publish their data dictionaries via APIs, using standards such as RDF, for internal or external consumption.
- Decoupling data representation from backend data models.
- Retrieving data through provisioned, well-defined, and published information service APIs.
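As a minimal sketch of the decoupling idea, assuming two hypothetical divisions (EMEA and APAC) whose backends use different column names: each division registers its own mapper from backend rows to triples in a shared vocabulary, and consumers call a single information service function, `get_products`, without ever seeing backend columns. The `http://example.org/` namespace, the division names, the records, and every function here are illustrative assumptions, not part of any product in the reference architecture; a real deployment would serve RDF through a managed API rather than a Python function.

```python
from typing import Callable, Dict, List, Tuple

# A triple is (subject, predicate, object); all three are plain strings here.
Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace
LABEL = EX + "vocab#label"  # hypothetical shared predicate published by all divisions

# Hypothetical backend records: each division keeps its own schema.
EMEA_DB = [{"sku": "A-100", "desc": "Cordless drill 18V"}]
APAC_DB = [{"item_no": "7731", "name": "Cordless Drill, 18 V"}]


def emea_mapper(row: dict) -> List[Triple]:
    """EMEA maps its own column names to the published vocabulary."""
    return [(f"{EX}emea/product/{row['sku']}", LABEL, row["desc"])]


def apac_mapper(row: dict) -> List[Triple]:
    """APAC uses different columns but publishes the same predicate."""
    return [(f"{EX}apac/product/{row['item_no']}", LABEL, row["name"])]


# Each division registers its backend and mapper with the information service.
SERVICES: Dict[str, Tuple[List[dict], Callable[[dict], List[Triple]]]] = {
    "emea": (EMEA_DB, emea_mapper),
    "apac": (APAC_DB, apac_mapper),
}


def get_products(division: str) -> List[Triple]:
    """Information service: consumers receive triples, never backend columns."""
    rows, mapper = SERVICES[division]
    return [triple for row in rows for triple in mapper(row)]


if __name__ == "__main__":
    for division in SERVICES:
        for triple in get_products(division):
            print(triple)
```

Because each mapper lives with its division, a division can change its backend schema by updating only its own mapper; the published vocabulary, and therefore every consumer, stays the same.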
Example:
A product is represented as a graph in which each attribute is either a leaf node holding a value or a link to a node representing another entity.
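A sketch of such a product graph, assuming a hypothetical `http://example.org/` vocabulary: literal attributes (label, short description) are leaf nodes, while category and manufacturer are IRIs that link to other entities, which can carry attributes of their own. The small `IRI` marker class, the `links_from` helper, and the hand-written N-Triples serializer are illustrative only; in practice an RDF library or the graph database itself would handle serialization.

```python
from typing import List, Tuple

EX = "http://example.org/"  # hypothetical namespace


class IRI(str):
    """Marks a node that links to another entity; plain str values are literals."""


Triple = Tuple[IRI, IRI, str]  # object is either an IRI (link) or a plain str (leaf)

PRODUCT = IRI(EX + "product/A-100")
ACME = IRI(EX + "org/acme")

GRAPH: List[Triple] = [
    # Leaf nodes: literal attribute values.
    (PRODUCT, IRI(EX + "vocab#label"), "Cordless drill 18V"),
    (PRODUCT, IRI(EX + "vocab#shortDescription"), "18V drill with two batteries"),
    # Linked nodes: references to other entities.
    (PRODUCT, IRI(EX + "vocab#category"), IRI(EX + "category/power-tools")),
    (PRODUCT, IRI(EX + "vocab#manufacturer"), ACME),
    # The linked entity carries its own attributes.
    (ACME, IRI(EX + "vocab#label"), "ACME Corp."),
]


def term(node: str) -> str:
    """Render an IRI as <...> and a literal as an escaped, quoted string."""
    if isinstance(node, IRI):
        return f"<{node}>"
    escaped = (node.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def links_from(subject: IRI, triples: List[Triple]) -> List[IRI]:
    """Follow the non-leaf edges of a node to the entities it links to."""
    return [o for s, _, o in triples if s == subject and isinstance(o, IRI)]


if __name__ == "__main__":
    print(to_ntriples(GRAPH))
    print(links_from(PRODUCT, GRAPH))
```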
A reference architecture for using Linked Data for MDM:
- Data Services: WSO2 Data Services
- API Management: WSO2 API Manager, WSO2 BPS
- Search: Elasticsearch
- Graph database: Virtuoso, AllegroGraph, Blazegraph, MarkLogic
- Ontology management: Neologism, TopBraid, Smartlogic
- Link discovery: Silk, LIMES
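Link discovery finds records in different divisions that describe the same entity. The toy sketch below compares normalized product labels with `difflib.SequenceMatcher` and emits `owl:sameAs` links above a similarity threshold. It is not the Silk or LIMES API: those tools take declarative link specifications and can combine several similarity measures. The 0.9 threshold, the sample IRIs and labels, and the helper names here are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical product labels published by two divisions, keyed by IRI.
EMEA = {
    "http://example.org/emea/product/A-100": "Cordless drill 18V",
    "http://example.org/emea/product/B-200": "Hammer drill 20V",
}
APAC = {
    "http://example.org/apac/product/7731": "Cordless Drill, 18 V",
    "http://example.org/apac/product/8810": "Angle grinder 125mm",
}


def normalize(label: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.9) -> List[Tuple[str, str, str]]:
    """Emit an owl:sameAs triple for every pair whose labels are similar enough."""
    links = []
    for s_iri, s_label in source.items():
        for t_iri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_iri, OWL_SAME_AS, t_iri))
    return links


if __name__ == "__main__":
    for link in discover_links(EMEA, APAC):
        print(link)
```

The nested loop compares every pair of records, which is fine for a demo but grows quadratically with the data; dedicated link discovery tools are designed to avoid comparing every pair.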
References:
- Data linking using genetic algorithm
- Linked data and semantic services at Thomson Reuters: http://new.opencalais.com/