The role that meta tags play in indexing content and making it findable has never been more important. These pieces of information are added to a web page’s code to describe the page’s content. They do not appear on the page itself; instead, they act like invisible breadcrumbs that lead searchers and their queries to the content they are looking for. Harvard’s School of Engineering and Applied Sciences brought this to our attention in their article, “A newsworthy solution.”
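To make the idea concrete, here is a minimal sketch of how a program might read the meta tags that a human visitor never sees. It uses only Python's standard `html.parser` module. The sample page, the `MetaTagParser` class, and the `extract_meta` helper are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page: the meta tags sit in <head>, the visible text in <body>.
SAMPLE_PAGE = """<html><head>
<title>Storm closes coastal highway</title>
<meta name="description" content="A coastal highway closed after heavy rain.">
<meta name="keywords" content="weather, storm, highway">
</head><body><p>Visible article text.</p></body></html>"""


def extract_meta(html):
    """Return a dict mapping each meta tag name to its content string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Calling `extract_meta(SAMPLE_PAGE)` returns the description and keywords but none of the visible body text, which is the sense in which the tags describe the page without appearing on it.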

For example, the Associated Press (AP) publishes 2,000 news stories each day, covering everything from international politics to incendiary pop stars, but those articles are only effective if readers can find them. To help content appear in relevant web searches, the AP applies metadata tags to upwards of 100,000 pieces of news media each day. But with about 200,000 different tags to choose from and no clear way to verify that the right ones were applied, the tags are sometimes ineffective.

A team of students in the computational science and engineering master’s program offered by the Institute for Applied Computational Science (IACS) at the Harvard John A. Paulson School of Engineering and Applied Sciences spent the spring semester working with the AP to increase metadata accuracy, improve media discoverability, reduce customer complaints, and build ground truth. The students developed a mostly automated metatagging system that uses named entity extraction to identify and isolate people, places, organizations, companies, and other proper nouns in a piece of text.
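The article does not describe how the students' system works internally, so the following is not their method. It is only a rough sketch of what pulling proper nouns out of text can look like, using a naive capitalization heuristic in place of a trained entity recognition model. The function name, the list of sentence openers, and the sample sentence are assumptions made for illustration.

```python
import re

# One or more consecutive capitalized words, e.g. "Associated Press".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b")

# Words that are often capitalized only because they start a sentence.
# This short list is an assumption; a real system would not rely on it.
COMMON_OPENERS = {"The", "An", "In", "On", "At", "But", "For", "To", "It", "This", "These"}


def candidate_entities(text):
    """Return capitalized word runs, in order of first appearance, without duplicates."""
    found = []
    for match in CAPITALIZED_RUN.finditer(text):
        words = match.group().split()
        # Drop leading openers so "The Associated Press" becomes "Associated Press".
        while words and words[0] in COMMON_OPENERS:
            words.pop(0)
        if not words:
            continue
        candidate = " ".join(words)
        if candidate not in found:
            found.append(candidate)
    return found


# Hypothetical input text, not taken from AP content.
SAMPLE_TEXT = (
    "The Associated Press worked with students at Harvard. "
    "The students tagged stories about Boston."
)
```

With the sample above, `candidate_entities(SAMPLE_TEXT)` yields "Associated Press", "Harvard", and "Boston". The heuristic cannot tell a person from a place or a company, and it misses anything unusual in its capitalization, which is part of why production tagging relies on trained models and curated vocabularies instead.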

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.