The recent MarkLogic User Conference was a watershed event for the publishers in attendance, many of whom are just beginning to strategize about applying semantic technology to their content. After years of hearing “the Semantic Web is coming,” the message this time was that the question is no longer “what” or “why,” but “how” publishers will leverage this technology. It has been 10 years since Tim Berners-Lee, Jim Hendler, and Ora Lassila announced the creation of the Semantic Web, so many of us were eager to hear Jim Hendler’s update on current developments. Some key themes of his presentation were already covered in an August 2010 New Scientist article, “Google, Twitter, and Facebook Build the Semantic Web.” Under his trademark slogan, “A little semantics goes a long way,” Hendler added further context and described how these companies and others have tapped into social and commercial drivers to promote relatively simple approaches to getting content tagged. That tagging increases the ability of computers to understand the meaning of text across vast amounts of Web content.
Facebook’s recent changes to its Open Graph protocol now allow web developers to place multiple “Like” buttons on pages dedicated to specific subjects, such as a page listing movie titles or restaurants. When Facebook users click a “Like” button, they send Facebook’s servers a great deal of information about that page: not only what the page is about, but also what various demographic groups of users think about each entity on it. Google’s acquisition of Freebase, the database of user-created, categorized entities, will eventually enable Google to monetize (via AdWords) an ever-growing long tail of web content, down to individual facts within a web page. Google also supports the GoodRelations vocabulary for e-commerce data, which major sites such as Yahoo! have adopted as well. Major news publishers such as the New York Times have embraced the rNews standard, which links company names and products mentioned in articles to more in-depth information (and potentially more pages to monetize!). The Twitter API now allows “annotations,” which can carry information well beyond the 140-character text limit, including links to web pages, user profiles, and categories that serve as additional descriptive metadata for processing and analysis.
At the same time, initiatives by government agencies in the US and UK, using Semantic Web standards such as the OWL Web Ontology Language, are making massive data sets available for mashups. Similarly, the emerging XBRL standard for business and financial reporting enables analysts to query government economic data, corporate 10-K filings, and other public data without building complex data models. Organizations such as the Sunlight Foundation are using these capabilities to make the interactions between corporate lobbyists and government more transparent. Taken together with the news and social tagging enabled by the consumer services described above, these resources let nearly anyone ask a serious question, obtain detailed data to support a point of view, and publish an analysis to promote their findings.
The Linked Data Cloud is now up to 40 billion links. For professional and scholarly publishers, this means it is now possible to turn documents into research portals by linking the key concepts in deep archives of journal articles and books to structured data stores, whether produced by the organization or available on the open Web. These stores include databases of plants and other organisms, chemical names, place names, and people directories. This creates new opportunities for engaging users and a significant competitive advantage. At Access Innovations, we have been helping publishers prepare for the Semantic Web for over 30 years by enriching content and providing tools that support the knowledge management standards at the core of this technology. Talk to us about how we can help you get on board with this new technology, which is transforming the way people use content.