There’s been some discussion recently in the Taxonomy Community of Practice LinkedIn group about free or low cost thesaurus management software. I’ve noticed a dearth of postings about using Excel, a very popular tool, particularly if you already have a Microsoft Office license.
Experts disparage Excel as a tool, but it can provide a way to start your thesaurus development. And, if you are mindful of organizing your Excel worksheet so that its data can be imported later into a dedicated tool, you can achieve some important objectives. Excel is indeed the most popular thesaurus management tool. (see Taxonomy & metadata strategies for effective content management workshop slides in which taxonomy expert Joe Busch reiterates this.)
First requirement: a hierarchy
It’s not enough to collect the terms that represent the “aboutness” of your electronic collection; it’s also important to put them into a “human usable” format. A hierarchy, or tree, displays the relationship between a broader, more inclusive expression of a concept and increasingly narrower, more specific examples and instances of it. The breadth of your term list, as represented by the top terms, provides insight into the breadth of your content collection. The number of levels in the hierarchy, when representative of groups of content items, describes the depth of the collection.
A hierarchy is very easy to create and maintain in Excel. The columns provide the levels, and a row for a new term can be easily added with a right click on a row number and the choice of Insert. Synonyms, scope notes, and related terms (the term record) can be added in columns beyond the last hierarchical level (more about these next week) because Excel does not “complain” about empty cells within a row, it just doesn’t like empty rows within a text ‘list’. Dedicating the first three to six columns to terms only allows you to adjust the column widths until you get a pleasing hierarchy display. And you can select just the columns that include terms as the “print area” to print out a hierarchical list.
Dedicated software tools allow collapse and expansion of the hierarchy to assist in changing focus from general to specific. In Excel, you’ll need to manually select groups of rows (the branches of your hierarchy) and identify them as an outline group (Data ribbon > Outline > Group).
To get a list of the top terms, you can copy Column 1 of your Excel worksheet to another worksheet, add a column label, select the column, and use the Sort and Filter section of the Data ribbon with “Unique records only” checked to remove the blank rows. Use Sort A-Z function on the Data ribbon to put them in alphabetical order.
A Different View
Dedicated software provides different views of your terms, for example alphabetical, permuted, a list with full term records. The easiest of those to reproduce in Excel is an alphabetical list of terms. One way to accomplish this is to copy the term hierarchy columns to a new worksheet and use a concatenate formula in the next empty column (=CONCATENATE(A1,B1,C1,D1,E1,…) to get all your terms in a single column. Copy the formula down: Home ribbon > Editing > Fill > Down to the last row of your hierarchy to display all the terms in a single column. To convert from displaying the results of your formula to an actual list of terms, copy the column and use the Paste Special > Values (a right click option, or from the Home ribbon > Clipboard section) to create the list. Then, select the list (click on the first term and hold down the Shift key while you tap the End key and then the Down arrow (↓) key) and use the Sort A-Z function on the Data ribbon to get them in alphabetical order.
Exporting your list
To export your list as a delimited text file, Choose the “Save As” option from the File button/menu. Give the export file a name and in the next “Save as type” field, pick Text (tab delimited)(*.txt) option. This means of export saves the entire sheet (but one sheet only). Using tabs as the delimiter prevents a comma appearing in a scope note from throwing off your export. If you need the terms only, copy the hiearchy columns to a separate worksheet first. When the webmaster asks you for the terms in just the top two levels (and maybe for only selected branches) of your taxonomy, copy the columns or branch sections to a new worksheet. In about four steps, and using a formula combining Excel’s IF and CONCATENATE functions, you can deliver just what’s needed.
As your term list grows, you’ll want to check that there are no gaps in your hierarchy (e.g., a 3rd level term as ‘child’ of a 1st level term) and that you haven’t used a top term somewhere else in the hierarchy (which violates the NISO Z39.19 standard and ultimately will cause you trouble). You’ll probably need a Visual Basic program for your worksheet (a sophisticated macro). If you’re not a programmer, sharpen those negotiating skills to get your talented colleagues to develop one for you. Or, consider dedicated thesaurus management software.
Even though Excel may be a place to start in building your taxonomy, you’ll want to be aware of its limitations and which of them are “deal breakers” for your project and career. Begin evaluating dedicated software from the start.
The Data Harmony thesaurus management software produced by Access Innovations takes care of keeping your thesaurus hierarchy standards-compliant as well as offering many other features that make the people building and using a thesaurus significantly more productive.
Mary Garcia, Lead Technical Support Specialist
Access Innovations, Inc.