The workflow in digital collections is evolving to accommodate the needs of linked data. Like any change process, this can get messy. Many of the tasks resemble those of managing traditional metadata records, but the consequences of poor data quality are far more severe. In the past, quality control mainly served to keep the user experience from suffering. Even then, systems would accept messy or inaccurate data, and metadata fields were not strictly regulated, leaving data quality something to aspire to rather than something enforced.
The world of linked data is very different. For data to link to other data sets, it must be of good enough quality to survive the transformation process. The fault is not in our databases, but in ourselves. We can all clean up our data, even if we are not yet actively publishing linked data.
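What "cleaning up our data" can mean in practice varies by institution, but a minimal sketch might look like the following. The record structure, field names, and date convention here are hypothetical, chosen only for illustration: trim stray whitespace and normalize one common date format to ISO 8601 so the values are consistent before any linked-data transformation.

```python
import re

def clean_record(record: dict) -> dict:
    """Tidy a hypothetical metadata record: collapse whitespace and
    normalize US-style MM/DD/YYYY dates to ISO 8601 (YYYY-MM-DD)."""
    cleaned = {}
    for field, value in record.items():
        # Collapse runs of whitespace and trim the ends.
        value = re.sub(r"\s+", " ", value).strip()
        if field == "date":
            # e.g. "03/15/1999" -> "1999-03-15" (assumes US-style input).
            m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", value)
            if m:
                value = f"{m.group(3)}-{m.group(1)}-{m.group(2)}"
        cleaned[field] = value
    return cleaned

record = {"title": "  The  Old   Mill ", "date": "03/15/1999"}
print(clean_record(record))
# {'title': 'The Old Mill', 'date': '1999-03-15'}
```

Even a modest pass like this, run before publication, removes the kind of inconsistency that breaks links between data sets.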
Data quality is fast becoming an important area of research and is impacting the future of our professional roles in libraries and archives. We will need to develop guidelines locally for data quality management, but it may be helpful for us to familiarize ourselves with the larger picture to help establish our own best practices.
We know how massive the amount of data in the world has become, and we have seen the need to understand and control it. We see the emergent patterns in that data, and we work with it to discover new avenues for viewership, revenue, or education. But that is using just a handful of data sets; however large those may be, they pale in comparison to all the data in the world.
Whatever the data is used for, the most important thing is that content can be found. Content is useless if it sits in the ether, hidden where nobody can read it. And as is likely clear by now, metadata is absolutely crucial at this stage, messy or not.
Melody K. Smith
Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.