There has been a lot of discussion of swamps in the news lately, but none of it is about data. DATAVERSITY.net brought the data version of the term to our attention in their article, “Data Lake vs. Data Swamp: Leveraging Enterprise Data.”

Data lakes and data swamps differ in how their data is curated. A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. A data swamp is a data lake that has become unstructured, ungoverned, and out of control because it lacks process, standards, and governance. In a swamp, data is hard to find, hard to use, and consumed out of context.

Data lakes make it easy for data scientists to mine and analyze data, require little or no transformation, facilitate automated pattern identification, and serve as good online archives. All of that holds only when the data in the lake has been identified, labeled, and refined through standards and processes. Without clear information about what the lake contains, you can’t locate what you need.
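To make the idea of identifying and labeling data concrete, here is a minimal sketch of a metadata catalog in Python. It is not taken from the DATAVERSITY article; the `CatalogEntry` fields, the `Catalog` class, and the file paths are all hypothetical, chosen only to show how labels make data findable and how unlabeled files can be spotted.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Descriptive metadata for one raw file in the lake (hypothetical schema)."""
    path: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class Catalog:
    """A toy in-memory catalog: one entry per file path."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Record (or overwrite) the metadata for a file.
        self._entries[entry.path] = entry

    def find_by_tag(self, tag):
        # Return the paths of all cataloged files carrying the given tag.
        return sorted(e.path for e in self._entries.values() if tag in e.tags)

    def uncataloged(self, paths_in_lake):
        # Files that exist in the lake but have no metadata: the "swamp" risk.
        return sorted(p for p in paths_in_lake if p not in self._entries)


catalog = Catalog()
catalog.register(CatalogEntry("raw/sales/2017-01.csv", "finance",
                              "Monthly sales export", {"sales", "csv"}))
catalog.register(CatalogEntry("raw/web/clicks.json", "marketing",
                              "Clickstream events", {"web", "json"}))

# Hypothetical listing of everything stored in the lake.
lake_paths = ["raw/sales/2017-01.csv", "raw/web/clicks.json", "raw/tmp/dump_001.bin"]

print(catalog.find_by_tag("sales"))     # files labeled as sales data
print(catalog.uncataloged(lake_paths))  # files nobody has described yet
```

In this sketch, the file that was never registered is the one a user could not find by searching labels, which is the situation the paragraph above describes.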

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.