Avoid data quality and version control problems from the start with sound data governance processes.
I’ve spent most of the past two decades in the data management world, implementing supplier on-boarding processes in EDI and eProcurement programs, and managing the implementation of an enterprise-class MDM solution. One recurring theme exists in all three areas: data needs to be maintained with a “single authoritative source” mentality.
As companies transition from packaged and/or “home grown” legacy applications to our enterprise MDM solution, they often take the opportunity to address an all too common problem: bad legacy data.
As those of you who have taken the “data cleansing journey” know, you can’t just fix the data and stop. You need to take a much more holistic view, seeing it as part of an ongoing data governance process. Otherwise, your newly cleansed data can quickly become “unclean” again.
Experience has shown me that one of the quickest ways to corrupt data is to allow multiple “authoritative sources” for the data. I recently witnessed an excellent example of this. A customer was implementing our Enable Product Information Management solution, including a server-based “shared document” storage area and our Enable Catalog Publisher’s bi-directional update capabilities.
The concept was sound; approved and validated data was pulled from the PIM repository to create a new or updated publication. The publication was pulled down either in its entirety or in sections that could include one or more pages.
These pages were then stored in the shared area to allow for multiple resources to work on them in a staged fashion, i.e., Designer One would make their changes, place the page(s) back in the shared area and notify Designer Two that it was available for their updates. Once the process was complete the pages could be sent to the printer.
The best laid plans…
In reviewing the process from a data governance perspective I saw no red flags. As noted above, they planned to make use of our Enable Catalog Publisher’s bi-directional update capabilities to move content changes made on the desktops back up to the publication files.
Building a process to post them to a “staging area” where they could then be picked up by our workflow engine and processed through an extended item maintenance process would ensure that the data was still being syndicated from a single authoritative source (the PIM).
The implementation that the customer chose, however, ensured a data governance nightmare:
It was decided that the changes made by the designers would be posted to the PIM with no reviews or approvals from the source data owners. This meant that the page being worked on could contain data that was inconsistent with the PIM repository.
There was no integration between the upload process and the item maintenance process. Any changes that failed the validation rules weren’t promoted to the “production” PIM repository, and no notification was sent to anyone who could take action to correct the issue. This left the authoritative source (and any other application it fed) out of sync with the publication in the shared folder.
The data was being passed to the PIM anonymously; the designer’s desktop was never logged in or authenticated in the solution, so there was no way to send notification of the failure to the designer. In addition, since the data “owner” wasn’t involved in the process, their source (be it a spreadsheet, a manual form, or another system) would be out of sync.
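To make the three gaps above concrete, here is a minimal sketch, in plain Python with entirely hypothetical class and function names (this is not the Enable product’s API), of the safeguards the implementation skipped: authenticated submissions, a validation gate in front of the production repository, and failure notifications to both the submitter and the data owner.

```python
class PimRepository:
    """Toy stand-in for a PIM repository with a staging area."""

    def __init__(self):
        self.production = {}  # attribute -> approved value
        self.staged = []      # validated changes awaiting item maintenance
        self.owners = {}      # attribute -> data owner to notify

    def validate(self, change):
        """Return a list of rule violations; an empty list means it passed."""
        errors = []
        if not change.get("value"):
            errors.append("value must not be empty")
        return errors


def submit_change(change, user, repo, notify):
    """Route a change through validation instead of writing it anonymously."""
    if user is None:
        # Anonymous desktop uploads were one of the failures above: with
        # no identity attached, rejections cannot be reported back.
        raise PermissionError("submissions must be authenticated")

    errors = repo.validate(change)
    if errors:
        # The failed change never reaches production, but the submitter
        # and the data owner both hear about it, so the shared-folder
        # copy cannot silently drift out of sync.
        notify(user, f"rejected {change['attribute']}: {errors}")
        notify(repo.owners.get(change["attribute"], "governance-team"),
               f"rejected {change['attribute']}: {errors}")
        return False

    repo.staged.append(change)  # picked up by the item maintenance workflow
    return True
```

A rejected change here produces two notifications and touches nothing in production; an accepted one only reaches the staging area, where the normal item maintenance process still governs promotion.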
Establishing the primacy of the authoritative source.
This was compounded by the PIM being fed product data from another system. Any changes made to data in that originating system would overwrite the designer’s impromptu changes in the PIM; this would cause further problems, as the designer wouldn’t think to pull down fresh replacement pages (from their perspective, the pages on the shared drive contained their “authoritative source” of data).
The document they were working on would end up out of sync with every other consumer of the data. Catalogs and flyers would be printed with data that was inconsistent with the web site and other channels, and the next catalog build would pull the “old” data, so the designer’s work would be lost.
As each of these processes was independent, there was no systematic way to keep all the key stakeholders informed of what was taking place. From a data governance perspective, the process was flawed from the start.
Simply put, sound data governance dictates that there is a single authoritative source for every attribute, and data can only be changed in that authoritative source.
In the case described above, the appropriate process would have been to have the changes that the designer made put in a staging area, and have workflow notify the data owner of the needed changes. The changes would then be made in the application designated as the authoritative source, resulting in them being syndicated to all consumers of the data, including the designer. This would ensure consistency of the data across the enterprise.
Recognizing that there will be cases where expediency requires changes outside the authoritative source (like the changes made by the designer on their desktop), the prudent approach would be to introduce workflow processes to manage communication across the stakeholders and integration with the established item maintenance processes to ensure that the changes are propagated throughout the enterprise.
Bottom line, data governance is not just about the content. It’s about sound business practices that ensure that your data is being managed appropriately. Maintaining the single authoritative source rule in your enterprise is fundamental to achieving your data governance goals.