The web is full of unstructured data. Some segments fall in line with semantic standards, but they are few and far between. Search engines focus on HTML, which leaves much of that data undiscovered.
We found this on Programmable Web in the article, “Web Scraping Evolved: APIs for Turning Webpage Content into Valuable Data.” Many companies offer the solution of the day for gathering both structured and unstructured data. Whichever you choose, be sure to partner with someone who believes in standards. Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.
Melody K. Smith
Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.