Data quality has never been more important. It is the measure of how well suited a data set is to serve its specific purpose.

When data is deemed to be of low quality, it is likely that the inputs are to blame. How data is collected largely determines the quality of what lands in a database and what is ultimately retrieved. Organizations that wish to improve their data quality will inevitably need to address input issues with data quality tools.

Poor data quality controls at data entry are fundamentally where this problem originates. As any good data scientist knows, entry issues are persistent and widespread. Adding to this, practitioners may have little or no control over providers of third-party data, so missing data will always be an issue.
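
As a rough illustration of what a control at data entry might look like, here is a minimal Python sketch. The field names (customer_id, email, country) and both function names are hypothetical, chosen only for this example; a real check would follow your own schema and rules.

```python
# Hypothetical required fields; a real list would come from your own data model.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def is_blank(value) -> bool:
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def find_entry_problems(record: dict) -> list[str]:
    """Return human-readable problems found in one incoming record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if is_blank(record.get(field))]
    email = record.get("email")
    # Deliberately crude format check, for illustration only.
    if not is_blank(email) and "@" not in str(email):
        problems.append("malformed email")
    return problems


def missing_rate(records: list[dict], field: str) -> float:
    """Share of records in which the given field is absent or blank."""
    if not records:
        return 0.0
    return sum(is_blank(r.get(field)) for r in records) / len(records)
```

A record can be rejected or flagged when find_entry_problems returns a non-empty list, while missing_rate gives a simple way to track how often a third-party feed arrives with gaps.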

Rather than treating the number of data sources as a problem, organizations might see it as a benefit: technology has progressed to match their needs. Front-end tools generate metadata and capture provenance, then data cataloging software manages the disparity in sourcing.

We do have to keep pushing for a cultural change around data, encouraging people throughout the organization to take responsibility for data quality, governance and general data literacy.

Although the sheer amount of data is a top concern, it will be hard for any organization to reduce the number of data sources it has. If anything, that number is only likely to increase over time. Some other common data quality issues point to larger, institutional problems. Disorganized data stores and a lack of metadata are fundamentally governance issues and, with only 20% of respondents saying their organizations publish information on data provenance and lineage, very few have adequate governance.

Data provides insights in real time. When you can trust the quality of that data, it will lead to sound business practices. Organizations can achieve quality data by using a master data management (MDM) methodology that defines how data is collected, aggregated, matched, consolidated, checked for quality and distributed. Much like a taxonomy, an MDM approach provides consistency in data and data processing.
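
To make the aggregate, match and consolidate steps a little more concrete, here is a small Python sketch. It is not how any particular MDM product works. It assumes records are plain dictionaries, that a normalized email address is a good enough match key, and that the first non-empty value seen for a field wins. All names in it are hypothetical.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    """Normalize the email so that case and stray spaces do not block a match."""
    return str(record.get("email") or "").strip().lower()


def consolidate(records: list[dict]) -> dict:
    """Merge matched records; the first non-empty value per field wins (an assumed rule)."""
    golden = {}
    for record in records:
        for field, value in record.items():
            if value not in (None, "") and field not in golden:
                golden[field] = value
    return golden


def build_master(sources: list[list[dict]]) -> list[dict]:
    """Aggregate all sources, group records by match key, and consolidate each group."""
    all_records = [record for source in sources for record in source]
    groups = defaultdict(list)
    for i, record in enumerate(all_records):
        # Records without an email cannot be matched, so each stays in its own group.
        key = match_key(record) or f"__unmatched_{i}"
        groups[key].append(record)
    return [consolidate(group) for group in groups.values()]


crm = [{"email": "Ann@Example.com", "name": "Ann", "phone": ""}]
billing = [
    {"email": "ann@example.com ", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob"},
]
master = build_master([crm, billing])
```

Here the two entries for Ann collapse into a single record that carries her name from one source and her phone number from the other. In practice, the matching and survivorship rules are where most of the governance decisions sit.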

Data Harmony is our patented, award-winning artificial intelligence (AI) suite that leverages explainable AI for efficient, innovative and precise semantic discovery of your new and emerging concepts, helping you find the information you need when you need it.

Melody K. Smith

Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.