November 1, 2010 – I attended the 2010 meeting of the American Society for Information Science and Technology (ASIS&T) in Pittsburgh. There were quite a few papers and posters aligned with my interests in taxonomies and the whole area of linked data, semantic implementations, and the Dublin Core. The DC-2010 conference, sponsored by the Dublin Core Metadata Initiative (DCMI), was held immediately prior to ASIS&T in the same hotel, so the overlap in participants and programming suited me perfectly. The ASIS&T annual meeting is the main venue for disseminating research centered on advances in the information sciences and related applications of information technology. It has veered heavily into usability for the last few years, and this change back to mainline information science was refreshing!
I moderated a panel discussion on Knowledge Organization: Evaluating Foundation and Function in the Information Ecosystem, put together by Jane Greenberg. The panelists (Hollie White, Denise Bedford, and Gail Hodge) addressed the complexity of our digital information ecosystem, as the information transfer process becomes increasingly, and in some domains fully, digital. Indicative of this change are entirely new ways in which individuals and information systems generate, provide access to, and link information. In line with this change is a growing need to better integrate and leverage knowledge organization systems (KOS). The presentations were great, the room was packed, and the subsequent discussion was quite lively. How do we know whether a finished taxonomy is a "good" one? Effective means are needed to measure the application of KOS as both an integral foundation in the information ecosystem and a core function. There are standards for creation, but they do not really address the need for evaluation and benchmarking. Many people have joined the ranks of "taxonomists" in the past five years. Some are excellent. Some are not. How does the organization in need of this basic building block for linked data and semantic web applications know whether the created taxonomy will work for its needs?
We need to create standards. Starting with the basics, a taxonomy is itself a KOS, one of the many kinds of knowledge organization systems. Granted, it can range from simple to highly complex (full semantics), but at its core it is a controlled vocabulary. Evaluating it (measuring its accuracy and determining its relationship to the knowledge domain) starts with getting it built. Identifying the parts of a KOS, and more specifically the parts of the taxonomy, ontology, and thesaurus, is important to an effective evaluation. Then we have to apply it to real data, and it has to be scalable for increasingly large collections of data. NKOS, the Networked Knowledge Organization Systems group, will take up the question. I am proud to be a member of the group. Marcia Zeng runs the discussion list and keeps the records for the organization here.
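To make the structural side of evaluation concrete, here is a minimal sketch (my own illustration, not any formal standard) of a taxonomy represented as a controlled vocabulary in which each term may point to one broader term, along with a basic well-formedness check of the kind an evaluation might start from:

```python
# A small, hypothetical taxonomy: each term maps to its broader
# term, with None marking the top term of the hierarchy.
taxonomy = {
    "Information Science": None,
    "Knowledge Organization": "Information Science",
    "Taxonomies": "Knowledge Organization",
    "Thesauri": "Knowledge Organization",
    "Ontologies": "Knowledge Organization",
}

def check_structure(vocab):
    """Basic structural checks: every broader term must exist in the
    vocabulary, and following broader links must never loop back."""
    problems = []
    for term, broader in vocab.items():
        if broader is not None and broader not in vocab:
            problems.append(f"{term}: broader term {broader!r} missing")
        seen = {term}
        current = broader
        while current is not None:
            if current in seen:
                problems.append(f"{term}: cycle via {current!r}")
                break
            seen.add(current)
            current = vocab.get(current)
    return problems

print(check_structure(taxonomy))  # [] means the hierarchy is well-formed
```

Checks like these only confirm that the hierarchy hangs together; whether the terms actually fit the knowledge domain and the data is the harder evaluation question the panel was raising.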
We have made the first step, identification of the need, and the second step, identification of an ad hoc group to work on this challenge. I was tasked with working on functional requirements for KOS descriptions and potential registries. This is only the beginning of the work needed to develop standards and guidelines for KOS implementations. This set of meetings brought together KOS researchers, implementers, and developers to examine and share KOS approaches and evaluation strategies. The meetings also brought Linked Data, Dublin Core, publishers, and programmers together for joint discussion, often in the halls or over drinks at receptions. All felt the need to develop a deeper understanding of evaluation methods and to gain a picture of an evolving framework for assessing KOS. Hopefully, after sharing our information, we all considered new approaches to using these systems effectively and walked away with further insight into the research needs and priorities for KOS.
Marjorie M.K. Hlava
President and Chairman
Access Innovations / Data Harmony
Regarding the evaluation of a vocabulary, I'd say this is the place where usability and controlled vocabulary intersect. The value of a vocabulary is in its use to find content, and users are the ones looking. So statistics like the number of times a term from the vocabulary is used 1) as an indexing term for content, and 2) as a search term (either entered or selected) can be used to evaluate a vocabulary's effectiveness. Terms that are seldom used aren't "carrying their weight"; maybe they just are not valid for the content. A comparison of the words in search logs with the terms (including synonyms) in a controlled vocabulary should give a good indication of gaps in the vocabulary's coverage.
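That log-versus-vocabulary comparison can be sketched in a few lines. The data here is entirely hypothetical; in practice the queries would come from search logs and the synonym lists from the vocabulary's non-preferred (entry) terms:

```python
from collections import Counter

# Hypothetical search log and vocabulary (preferred term -> synonyms).
search_log = ["cars", "automobiles", "trucks", "cars", "motorcycles"]
vocabulary = {"Automobiles": ["cars", "autos"], "Trucks": ["lorries"]}

# Build a lookup of every term form the vocabulary covers,
# preferred terms and synonyms alike.
covered = {pref.lower() for pref in vocabulary}
for synonyms in vocabulary.values():
    covered.update(s.lower() for s in synonyms)

# Queries the vocabulary doesn't cover suggest gaps; the counts
# show which gaps matter most to real searchers.
gaps = Counter(q for q in search_log if q.lower() not in covered)
print(gaps.most_common())  # [('motorcycles', 1)]
```

The same counts, taken from the other direction, would surface the seldom-used terms that aren't carrying their weight.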
But for users, having access to those controlled vocabulary terms makes all the difference. We users are still looking for search engines that will turn our query into one of the preferred terms in that vocabulary so we get the benefit of the indexing, whether by presenting terms for us to select or by making an inference from our typed entry, which is no small feat.
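The simplest version of that query-to-preferred-term step is a synonym ring lookup. This sketch assumes a hypothetical entry-term table mapping non-preferred forms to preferred terms; the hard part in practice is the inference when no exact match exists:

```python
# Hypothetical synonym ring: non-preferred entry terms mapped to
# the vocabulary's preferred terms.
entry_terms = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobile": "Automobiles",
    "lorries": "Trucks",
}

def to_preferred(query):
    """Map a user's query onto a preferred term when possible, so the
    search benefits from indexing done with that term."""
    q = query.strip().lower()
    return entry_terms.get(q, query)  # fall back to the raw query

print(to_preferred("Cars"))      # Automobiles
print(to_preferred("bicycles"))  # bicycles (no mapping; raw query)
```

A real system would go further, offering the preferred term as a suggestion rather than silently substituting it, and handling queries that match nothing in the ring.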