There has been a lot of discussion lately about Master Data Management (MDM): what it is and what it isn’t. Why should this interest the taxonomy practitioner? To a large extent, a controlled vocabulary tries to accomplish at the conceptual level what MDM accomplishes at the data level. MDM removes ambiguity at the data level through a variety of means, and the how and why of MDM can provide valuable insight into the data workings of an organization. That insight can lead to a much more powerful enterprise taxonomy, one that can truly span the vast store of enterprise content. An MDM initiative is therefore an ideal place to insert a taxonomic strategy and implementation. Making the business case for a taxonomy initiative, and getting it funded, approved, and implemented, will be easier if you can align yourself with an MDM initiative. To succeed, an MDM deployment must analyze and potentially process all of an organization’s data stores. The time spent resolving data and structural ambiguities is also a great time to resolve conceptual ambiguities.
Taxonomies overcome conceptual ambiguities. MDM, however, is less about aligning structures than about removing ambiguities at the data level. A taxonomy term applied to a digital object gives a sense of the “aboutness” of that object, and the taxonomy itself helps to resolve meaning between concepts, conceptual synonyms, and so on. MDM mostly asks two questions: what is the correct value of this entity, and is this entity, as found in twenty-two other systems inside our company, the same entity? One presenter recently stated that a data analysis showed more than 80 variations of Circle K within one company. MDM is all about finding the variations, determining equivalents, deciding on the preferred name, and then either changing all of the instances or mapping them to the master data record. It is not so much about structure as about data element integrity. The structural aspects come in when you’re dealing with relational databases, with their table and column names. Here you want to determine whether a given column name means the same thing across databases. The bigger challenge comes once you determine that several columns are the same; then you have to begin the process of matching data elements.
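To make the find-variations, determine-equivalents, pick-a-preferred-name, map-to-master sequence concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample values, the suffix list, the crude matching key, and the function names are hypothetical, and real MDM tools use far more sophisticated matching. Note that the key alone does not catch "Circle K Stores"; a person still has to decide that it is the same entity.

```python
import re
from collections import defaultdict

# Hypothetical raw values as they might appear across different systems.
RAW_NAMES = ["Circle K", "CIRCLE K", "Circle-K", "circle k inc.",
             "Circle K Stores", "Shell"]

# Hypothetical legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "llc", "ltd"}

# Hypothetical decisions made by a data steward: match key -> preferred name.
PREFERRED = {"circle k": "Circle K", "circle k stores": "Circle K"}


def match_key(name):
    """Reduce a name to a crude comparison key:
    lowercase, punctuation removed, legal suffixes dropped."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def group_variants(names):
    """Step 1 and 2: find variations and group likely equivalents by key."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


def build_master_map(names, preferred):
    """Step 3 and 4: map each raw value to the preferred name for its key.
    Values with no decision yet are left mapped to themselves."""
    return {name: preferred.get(match_key(name), name) for name in names}


if __name__ == "__main__":
    for key, variants in group_variants(RAW_NAMES).items():
        print(key, "<-", variants)
    print(build_master_map(RAW_NAMES, PREFERRED))
```

The mapping approach shown here leaves the source values untouched and records the link to the master record, which is one of the two options described above; the other is to rewrite every instance in place.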
In object-oriented or repository systems, unlike relational databases, digital objects are isolated, so MDM becomes somewhat more important. In an XML repository system, any XML digital object can be stored within the one repository. This makes MDM and taxonomic “labeling” imperative: not only can you have variations in a company name, for example, but there might not be any standard in operation for the XML element name that holds a company. It could just as likely be <CO> as <Company>. Equally important, you want to know what the object is about. When you can easily have a multibillion-item repository, resolving meaning is critical.
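The <CO> versus <Company> problem can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is only an illustration under stated assumptions: the alias table, the preferred element name, and the sample document are hypothetical, and a real repository would also have to deal with namespaces, attributes, and schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical alias table: lowercase element-name variant -> preferred name.
ELEMENT_ALIASES = {
    "co": "Company",
    "company": "Company",
    "companyname": "Company",
}


def normalize_elements(xml_text):
    """Rename known element-name variants to the preferred element name.
    Elements not listed in the alias table are left unchanged."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        preferred = ELEMENT_ALIASES.get(elem.tag.lower())
        if preferred:
            elem.tag = preferred
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<Invoice><CO>Circle K</CO></Invoice>"
    print(normalize_elements(sample))
```

Once the element names agree, the values inside them can be matched against master records in the usual way.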
Many of the challenges in taxonomy and MDM are similar. A concept can be expressed by many terms and near terms, such as chair, stool, and seat. Selecting a preferred term for a concept and then mapping its synonyms is a big part of a taxonomist’s work. Spelling and language variations must be mapped, and a choice must be made between a term and its acronym. The same kinds of decisions arise when dealing with structural variations in data, such as the variations in an XML element name for a company. One must also select the preferred variation of a company name and map all other variations to it.

Check out this article on the subject.

Aligning your taxonomy efforts with the MDM team could be a very good move for you and your taxonomy team.

Jay Ven Eman, CEO