Breaking Down Automatic Metadata Generation/Extraction
There are two approaches to automatic metadata generation/extraction, and many variations within each. Generally speaking, the statistical approaches include Bayesian methods, vector-based methods, neural nets, automatic clustering, and the like. These methods work on the principle that if two words frequently occur together in a large set of data, then those words are conceptually related.

Statistical methods depend on the algorithms that the developer has devised. Some vendors lock down the numerical recipes and provide few or no user controls. These systems usually require “training”. Training takes a list of terms, often a thesaurus, taxonomy, or authority file, together with a corpus of text. The system manager processes these inputs with spot checking, often with manual intervention by a human subject matter expert. Once the system has been trained, test queries are run to verify that it performs as desired on fresh content.
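The co-occurrence principle behind the statistical approaches can be illustrated with a minimal Python sketch. It is not any vendor's actual algorithm: the function name `cooccurrence_counts`, the toy corpus, and the threshold of two documents are all assumptions made up for illustration. It simply counts how many documents each pair of words shares and treats pairs above the threshold as related.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for text in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Toy corpus, invented for illustration only.
corpus = [
    "taxonomy thesaurus indexing",
    "thesaurus taxonomy metadata",
    "neural clustering metadata",
]

pair_counts = cooccurrence_counts(corpus)

# Hypothetical threshold: pairs sharing at least two documents count as related.
related = [pair for pair, n in pair_counts.items() if n >= 2]
print(related)
```

Real systems work on far larger corpora and use weighted statistics rather than raw counts, but the underlying idea is the same: frequent co-occurrence is taken as evidence of a conceptual relationship.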