Data lakes can be a fantastic asset, but they need several supporting elements to work properly. This is where semantics and graph databases in the data lake architecture become beneficial. This interesting information came to us from ZDNet in their article, “Semantic data lake architecture in healthcare and beyond.”
Data lakes can turn into data swamps. But what is the difference?
A data lake is built on top of cost-efficient infrastructure: lots of cheap storage, plus schema-on-read, where structure is applied when the data is read rather than when it is written. That means you can store all your data, and more, now and worry about how to use it later.
Many organizations are doing just that, and the result is a data swamp. A data swamp is a data lake where data goes to die. Without descriptive metadata and a mechanism to maintain it, you get a pile of data that is effectively unusable.
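To make the difference concrete, here is a minimal sketch in Python of schema-on-read with a small metadata catalog. It is only an illustration, not how any particular data lake product works: the in-memory `lake` and `catalog` dictionaries, the `CatalogEntry` fields, the helper functions, and the sample records are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Descriptive metadata for one raw object (hypothetical fields)."""
    key: str
    description: str
    fmt: str
    tags: list[str] = field(default_factory=list)


# The "lake": raw bytes stored as-is, with no schema enforced on write.
lake: dict[str, bytes] = {}
# The catalog: descriptive metadata that keeps stored data findable.
catalog: dict[str, CatalogEntry] = {}


def store(key: str, raw: bytes, entry: CatalogEntry | None = None) -> None:
    """Write raw data; metadata is optional, which is how swamps start."""
    lake[key] = raw
    if entry is not None:
        catalog[key] = entry


def find(tag: str) -> list[str]:
    """Return the keys of cataloged objects that carry the given tag."""
    return [e.key for e in catalog.values() if tag in e.tags]


def read_json_lines(key: str, fields: list[str]) -> list[dict]:
    """Schema-on-read: choose which fields to project at read time."""
    rows = []
    for line in lake[key].decode("utf-8").splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append({f: record.get(f) for f in fields})
    return rows


# Hypothetical sample data: one described object, one undescribed dump.
store(
    "visits/sample.jsonl",
    b'{"patient": "p1", "code": "A1"}\n'
    b'{"patient": "p2", "code": "B2", "note": "follow-up"}\n',
    CatalogEntry(
        key="visits/sample.jsonl",
        description="Patient visit records, one JSON object per line",
        fmt="json-lines",
        tags=["healthcare", "visits"],
    ),
)
store("dump_0042.bin", b"\x00\x01")  # stored with no metadata at all

found = find("visits")
rows = read_json_lines("visits/sample.jsonl", ["patient", "code"])
uncataloged = sorted(set(lake) - set(catalog))
```

In this sketch, `found` holds only the described object, while `dump_0042.bin` ends up in `uncataloged`: it is still in storage, but nothing says what it is or how to read it, which is the swamp problem in miniature.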
Health care systems are the perfect environment for data lakes and swamps. The data ranges from structured to unstructured, which means a lot of it is just stuck in storage with no plans for retrieval.
Melody K. Smith
Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.