In traditional taxonomy construction, you approach the body of knowledge, head for your ivory tower with its closed room, and decide on the general structure of the field or discipline. From that single point, or perhaps multiple points, of knowledge, you are going to design a completely new taxonomy … and every now and then that works.
You might build on some previous knowledge: find similar fields and gather current word lists, preferably from your own data. If you have multiple lists, compare them linguistically, and then choose the terms that the users choose. Now it might be, particularly if you are doing a corporate taxonomy, that you have some distinct sets of users. Those would be, for instance, the bench chemists and the marketing people, or the human resources people and the engineers. They might be looking at the same body of data, but they use different terms to talk about it. You need to think about whether you need multiple views for those groups, or some other way of accommodating variant vocabularies.
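To make the idea of "choosing the terms that the users choose" concrete, here is a minimal Python sketch. It assumes you have collected the terms each user group actually uses for one concept, for example from search logs. The group names, the sample terms, and the function names are all invented for illustration. Raw usage counts are only one possible way to pick a preferred term.

```python
from collections import Counter

# Hypothetical terms gathered for ONE concept from two user groups.
# The data is invented for illustration; real lists would come from your own logs.
usage_by_group = {
    "bench chemists": ["acetylsalicylic acid", "acetylsalicylic acid", "aspirin"],
    "marketing": ["aspirin", "aspirin", "aspirin"],
}

def choose_preferred(usage):
    """Pick the most-used term overall; every other term becomes a variant."""
    counts = Counter()
    for terms in usage.values():
        counts.update(term.lower() for term in terms)
    ranked = [term for term, _ in counts.most_common()]
    return ranked[0], ranked[1:]

def group_views(usage):
    """Report the term each group uses most, as a basis for per-group views."""
    return {
        group: Counter(term.lower() for term in terms).most_common(1)[0][0]
        for group, terms in usage.items()
    }

preferred, variants = choose_preferred(usage_by_group)
views = group_views(usage_by_group)
```

Here the overall preferred term is the one used most across all groups, while `views` records what each group would expect to see if you decide to offer separate views of the same taxonomy.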
You might have a multinational corporation. Everyone speaks English, but some of them speak British and some of them speak American. Though the words sound similar, they are not always spelled the same. Different cognates are used to discuss individual items. You need to decide which language will be the ascendant language for the firm, and then make the others synonyms so that you can serve the entire corporation.
This is more often a problem than you would think. People think … it’s going to be English. Which one? A while back, we were working for a big multinational corporation, and for the most part the English and Spanish parts were pretty easy. The British and the American were not as easy in that particular thesaurus.
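As a rough sketch of the "ascendant language plus synonyms" approach, the following Python example maps British variants to American preferred terms. The choice of American English as the preferred form, the sample word pairs, and the names `SYNONYMS`, `build_lookup`, and `normalize` are assumptions made for illustration only. A real thesaurus would hold these relationships in its own term records.

```python
# Hypothetical thesaurus fragment: American forms chosen as preferred terms,
# British forms recorded as synonyms (non-preferred entry terms).
SYNONYMS = {
    "color": ["colour"],
    "aluminum": ["aluminium"],
    "truck": ["lorry"],
}

def build_lookup(synonyms):
    """Map every preferred term and every variant to its preferred term."""
    lookup = {}
    for preferred, variants in synonyms.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup

LOOKUP = build_lookup(SYNONYMS)

def normalize(term):
    """Return the preferred term for a user's word, or the word itself if unknown."""
    key = term.strip().lower()
    return LOOKUP.get(key, key)
```

With this in place, a search on "Colour" or "lorry" resolves to "color" or "truck", so users of either variety of English reach the same records.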
You might adapt a thesaurus. If you are adapting it, consider looking at Knowledge Organization, a journal that comes out quarterly from the International Society for Knowledge Organization (ISKO). It often lists schemes and thesauri.
The University of Toronto Library has a print collection of English language thesauri, the Subject Analysis Systems (SAS) collection, on the fifth floor.
ASLIB in the UK (which used to be the Association of Special Libraries and Information Bureaux and is now the Association for Information Management) has an Information Resource Center and has published a number of useful books. My favorite is the Aitchison, Gilchrist and Bawden book titled Thesaurus Construction and Use: A Practical Manual.
The American Society for Indexing has a list on their site of thesauri and tools for building thesauri. This group used to be composed mainly of back-of-book indexers, and they are, to some extent, reinventing themselves as database indexers and taxonomists.
Other resources include several terminology registries. One of these is TaxoBank, which allows people to contribute their thesaurus information and even entire thesauri, particularly if they are re-usable under a Creative Commons license or the like. Several information management students have contributed interesting thesauri to it. So there are all kinds of thesauri listed, from the National Library of Medicine ones to the thesaurus on belly dancing and the thesaurus on ship building. There’s plenty of variety in there, and it’s a good place to look for something that you might adapt.
If you are in the biology area, you might be interested in the BioPortal, an ontology repository that provides viewer access to some very detailed biology-oriented systems.
So, there are many places to look for inspiration for your own taxonomy or thesaurus. I’ve described just some of the available resources.
Marjorie M.K. Hlava, President, Access Innovations
Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.