Continuing with our series on semantic integration, let’s address Data Visualization and Term Analytics.
A completely different kind of use is term analytics. We have talked a lot about text analytics in the past, where people take great big full-text files and run them through Bayesian, neural net, and latent semantic indexing engines to figure out how to compare things. You could do that using a taxonomy instead and still answer the same questions: What are the strengths of the organization? What are the strengths in the publications? What are the emerging topics in your areas? You use people’s own data to address these questions and figure out the answers.
In this particular case, we took ten years of PubMed, ten years of IEEE, and ten years of US patents, and we ran the MeSH subject headings, the IEEE thesaurus, and the DTIC thesaurus against each of the three collections. That gives you nine different sets of data. Then we compared them to see where the field is going. What is the next event? What are the trends? Since we were able to do it in one-year slices, we were able to map that term space to show the distributions and figure out where the overlaps were, where things were pulling apart, which groups needed to be changed or augmented, and which ones could be enlarged or marketed to. We could also see what the new trends in the business are and what new pieces of information we need to deal with.
What came out of it is a lot of different ways to display the same sets of data. This is exactly the same data from that nine-point matrix displayed in a number of different visual applications. This long line at the bottom is just wrapped around a circle here to show different ways that the data can be displayed. In this matrix up here, we have the blue and the red (red being medical, blue being engineering), and we are able to show where bioengineering, for example, overlaps. This is good for business intelligence and data mining that harness the concepts from the taxonomy.
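To make the idea of one-year slices concrete, here is a minimal sketch in Python. It is not the tooling we used on the project. The records, the collection names, and the function names are all hypothetical, made up purely for illustration. It counts how many documents carry each taxonomy term per year and finds the terms that two collections share.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (year, collection, taxonomy terms on one document).
# These records are invented for illustration, not drawn from any real collection.
records = [
    (2001, "medical", ["Biomedical Engineering", "Prostheses"]),
    (2001, "engineering", ["Signal Processing"]),
    (2002, "medical", ["Biomedical Engineering"]),
    (2002, "engineering", ["Biomedical Engineering", "Signal Processing"]),
    (2003, "engineering", ["Biomedical Engineering"]),
]


def term_counts_by_year(records):
    """Count, for each year, how many documents carry each taxonomy term."""
    counts = defaultdict(Counter)
    for year, _collection, terms in records:
        # set() so a term repeated on one document is counted once
        counts[year].update(set(terms))
    return dict(counts)


def shared_terms(records, collection_a, collection_b):
    """Return the terms that appear in both collections (the overlap)."""
    terms = defaultdict(set)
    for _year, collection, doc_terms in records:
        terms[collection].update(doc_terms)
    return terms[collection_a] & terms[collection_b]


yearly = term_counts_by_year(records)
# Document counts for one term across the yearly slices, oldest first
trend = [yearly[year]["Biomedical Engineering"] for year in sorted(yearly)]
overlap = shared_terms(records, "medical", "engineering")
print(trend, overlap)
```

The per-year counts are the kind of numbers you could feed to any charting tool, which is why the same data can be displayed in so many different ways.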
A taxonomy really helps deliver things that are very precise and exactly what the user wants. You don’t have time on a mobile device to just give the user the normal million hits from Google. You want to provide very high precision and very good recall, so that what they are getting is everything on the topic, and only the items that are precisely on the topic, not ones that wander off into different areas. Relevance to me is, as some of you have heard me say, a bit of a canard because it is just a guess. It is a confidence factor. I don’t really want your confidence that you are answering my question correctly. I want you to absolutely answer my question precisely.
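For readers who want the two measures spelled out, here is a small sketch using the standard definitions: precision is the share of what was retrieved that is on topic, and recall is the share of what is on topic that was retrieved. The document IDs and the function name are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using the standard definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so we never divide by zero
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three documents truly on topic
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```

In this made-up case, half of what came back was on topic, and two of the three on-topic documents were found.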
Taxonomies in E-Commerce
E-commerce is yet another way to use taxonomies. You have seen this a lot on Amazon, L.L.Bean, or eBay. Take a look at an Amazon site with primary categorizations on the right. These are the top terms in their taxonomy. When they display the second level, with Arts and Photography as the first-level term, they are displaying not only the second-level terms but also the related categories and the related topics. They are trying to get to a lot more areas in the same screen. If I go to Photography and click again, I go down to an even narrower selection. Then I can click on any one of those and get to their offerings. We don’t want to go any deeper than three levels. Keeping searching to three clicks is the web concept. With more clicks, the user loses interest. On Amazon, where I think most of you have probably ordered, it will say something like “customers who bought (whatever you bought) also bought ….” Then they will show some thumbnails and the bibliographic citations based on the subject area or the taxonomy terms that are applied to these books. They are supplying some additional content so that the user can follow a link and be served more items like these, based on what he saw and what he was searching for. You could drive this from the taxonomy instead, as a recommendation based on similar taxonomy terms.
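Here is one way a recommendation based on similar taxonomy terms could be sketched. This is not how Amazon does it. It is a minimal illustration that assumes each item carries a set of taxonomy terms and ranks the other items by Jaccard similarity (shared terms divided by all terms on either item). The catalog, titles, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: shared terms divided by all terms on either item."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical catalog: title -> taxonomy terms applied to that item
catalog = {
    "Portrait Lighting Basics": {"Arts and Photography", "Photography", "Lighting"},
    "Darkroom Techniques": {"Arts and Photography", "Photography", "Film"},
    "Watercolor for Beginners": {"Arts and Photography", "Painting"},
    "Intro to Circuits": {"Engineering", "Electronics"},
}


def recommend(title, catalog, top_n=2):
    """Return up to top_n other titles that share the most taxonomy terms."""
    target = catalog[title]
    scored = [
        (jaccard(target, terms), other)
        for other, terms in catalog.items()
        if other != title
    ]
    # Drop items with no shared terms, then sort by score (ties by title)
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _score, other in scored[:top_n]]


picks = recommend("Portrait Lighting Basics", catalog)
print(picks)
```

Because the ranking depends only on the terms applied to each item, it works even for new items that nobody has bought yet, provided they have been indexed against the taxonomy.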
Next week, we will finish up the series talking about SharePoint and Taxonomies.
Marjorie M.K. Hlava
President, Access Innovations