by Scott Denning

Traditional thesaurus design has necessarily been a matter of alphabetization of the terms. While different views – permuted, hierarchical, rotated – offer different insights into the relationships of the terms in the thesaurus, the structure within these views is still ultimately alphabetical.

Though alphabetization is at the core of organization in language, the necessity to hold to an alphabetical structure may lead to reluctant and sometimes unwieldy compromises on the part of thesaurus designers. Non-alphabetical expressions and uses of a thesaurus have often necessitated the application of a separate term management program, and thesaurus designers have had to be mindful of the program(s) that might be used to output the thesaurus.

With the creation of the Notation Module for thesaurus construction, Access Innovations’ Thesaurus Master™ presents a new way to approach thesaurus design. The Notation Module allows user-assigned notations, either alphabetical or numerical, to be prepended to thesaurus terms, while still maintaining the ability to view the thesaurus in a traditional alphabetized view. Thus, a thesaurus may be structured in Notation View for the unique requirements of a user, while still being available in a hierarchy view consistent with ISO/NISO and ANSI standards. Developed in response to customer requests, the Notation Module is designed to work as part of Access Innovations’ Data Harmony suite of thesaurus management products.

The ability to annotate thesaurus terms opens up new possibilities in thesaurus design. While notation of thesaurus terms is not a new concept, the Notation Module allows annotated forms to exist in parallel with traditional structures, so that a thesaurus can more accurately reflect some of the structures found in business taxonomies and processes.

For example, a thesaurus structured alphabetically

- Biology
- Chemistry
- Mathematics
- Physics
-- Nuclear Physics

might be annotated numerically to give “Physics” order primacy in the thesaurus:

1: Physics
1.1: Nuclear Physics
1.2: Astrophysics
1.3: Geophysics
2: Mathematics
3: Chemistry
4: Biology

Note that the order of the above hierarchy is based upon the prepended numerical notation, and not alphabetized by the initial letters of the terms. Also note that the second level of terms is similarly freed from alphabetization; “Nuclear Physics” is given primacy over “Astrophysics”.
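The difference between the two orderings can be made concrete with a short Python sketch. It is not the Notation Module's own code; the `notation_key` helper and the list-of-pairs layout are assumptions made for illustration. Each dotted notation is turned into a tuple of integers so that elements compare as numbers rather than as text.

```python
def notation_key(notation: str) -> tuple[int, ...]:
    """Turn a dotted numeric notation such as "1.2" into a sortable tuple (1, 2)."""
    return tuple(int(part) for part in notation.split("."))


# (notation, term) pairs taken from the numerical example above.
terms = [
    ("4", "Biology"),
    ("3", "Chemistry"),
    ("2", "Mathematics"),
    ("1", "Physics"),
    ("1.2", "Astrophysics"),
    ("1.3", "Geophysics"),
    ("1.1", "Nuclear Physics"),
]

# Notation View: order by the prepended notation, element by element.
notation_view = [
    term for notation, term in sorted(terms, key=lambda pair: notation_key(pair[0]))
]

# A flat alphabetical listing of the same terms, for comparison.
alphabetical_view = sorted(term for _, term in terms)

print(notation_view)
print(alphabetical_view)
```

Comparing the elements as integers rather than as strings keeps a notation such as "10" after "9" once a level grows past nine terms.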

The Notation Module also allows the notations to be alphabetical rather than numeric:

A: Physics
Aa: Nuclear Physics
Ab: Astrophysics
Ac: Geophysics
B: Mathematics
C: Chemistry
D: Biology

An advantage of using alphabetical notation is familiarity on the part of the user (an inherent A-Z structure is understood). A disadvantage is that using the alphabet necessarily implies a limit of 26 top terms; for larger thesauri, having only combinations of 26 notation elements available may be limiting, and notation for the secondary and narrower levels may become cumbersome and non-intuitive. For larger thesauri, numerical notation may be preferable. Alphanumeric blends for notations may be enabled on a project-specific basis.
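As a rough illustration of that limit, the following Python sketch generates the alphabetical notations used in the example. It assumes one uppercase letter for a top term and one appended lowercase letter per narrower level; the function names `top_notation` and `child_notation` are hypothetical and do not come from the Notation Module.

```python
import string


def top_notation(index: int) -> str:
    """Return the alphabetical notation for the top term at a zero-based position."""
    if not 0 <= index < len(string.ascii_uppercase):
        raise ValueError("single-letter notation supports at most 26 top terms")
    return string.ascii_uppercase[index]


def child_notation(parent: str, index: int) -> str:
    """Append one lowercase letter to the parent's notation, as in "A" -> "Aa"."""
    if not 0 <= index < len(string.ascii_lowercase):
        raise ValueError("single-letter notation supports at most 26 children per term")
    return parent + string.ascii_lowercase[index]


top_terms = ["Physics", "Mathematics", "Chemistry", "Biology"]
physics_children = ["Nuclear Physics", "Astrophysics", "Geophysics"]

labelled = []
for i, term in enumerate(top_terms):
    notation = top_notation(i)
    labelled.append(f"{notation}: {term}")
    if term == "Physics":
        for j, child in enumerate(physics_children):
            labelled.append(f"{child_notation(notation, j)}: {child}")

print("\n".join(labelled))
```

Under this one-letter-per-level assumption, a twenty-seventh top term raises an error, which is the point at which a numerical or blended scheme becomes the more practical choice.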

Prepended notations allow great flexibility in the structure of a thesaurus, allowing it to reflect:

Process structure

Thesaurus structure may now match the steps of a process or workflow, in the order that they are followed within the business or industry. This order can easily be amended as processes are revised.

Chronological order

Terms may reflect the order in which they were added to the thesaurus, or may reflect a fiscal cycle.

System structure

An annotated thesaurus can prioritize as well as accurately map the structure of a system, from top level to component to sub-component.

Organization structure

Departments of an organizational structure may be placed according to priority, funding, etc.

Filtering levels

Assigning a notation to a term simplifies filtering of levels of information or data against which the thesaurus is applied. Data discovery tools can be set to recognize terms with the same levels of notation (a “3.2” term and a “6.2” term could be mined similarly, based upon the “.2” element) or select which areas of data are examined first.
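A minimal Python sketch of this kind of filtering follows. The notations, the placeholder term names, and the `ends_with_element` helper are all hypothetical; the sketch only shows how terms sharing a final “.2” element could be selected together.

```python
def ends_with_element(notation: str, element: str) -> bool:
    """True when a multi-element dotted notation ends in the given element."""
    parts = notation.split(".")
    return len(parts) > 1 and parts[-1] == element


# Hypothetical notations and placeholder term names, for illustration only.
terms = {
    "2": "Placeholder term A",
    "3.1": "Placeholder term B",
    "3.2": "Placeholder term C",
    "6.2": "Placeholder term D",
    "6.2.1": "Placeholder term E",
}

# Select every term whose notation ends in the ".2" element.
dot_two_terms = [notation for notation in terms if ends_with_element(notation, "2")]

print(dot_two_terms)
```

The top term “2” is deliberately excluded: it has no “.2” element, only a first-level “2”.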

Security levels

Thesaurus structure can reflect levels of security or access within a system or organization.

Report or manual structure

The terms in the thesaurus can be ordered to reflect a report or manual used within a business or organization; the thesaurus itself forms the index structure of a document. A thesaurus can also be structured to mirror formal minutes.

Frequency weighting

Terms can be arranged within the thesaurus to reflect frequency of term appearances within databases or other material.

Document management structure

A thesaurus may be structured to reflect the document workflow of an organization. Where document forms are repeated across various departments, notation can facilitate document sorting, such as all “.3” terms being extracted together.

Multiple sources or editors

Prepended notations may reflect the source of a term, where several thesauri are being combined, providing ready visual cues for those folding the thesauri together. A notation element may also be assigned to each editor involved in creating or maintaining a thesaurus, to easily “brand” work as it is done. (Thus, all terms ending in “.6” may be Editor 1’s, those ending in “.7” Editor 2’s, etc.)

Dewey Decimal Classification

The Notation Module allows a thesaurus to reflect the structure of a system such as the Dewey Decimal system, allowing a term like “595.78: Lepidoptera”.

Existing notations/cataloging codes

Some companies have devoted considerable resources in the past to designing and implementing internal, unique notation systems for subject categories or types of documents – essentially, they have designed their own card catalog for their information. While “raw” keyword text search is increasingly used in many settings, companies still wish to keep the notation systems as established. The Notation Module simplifies transferring an established notation system into a taxonomy/thesaurus.

Other advantages of notation

A very important advantage of notation is the intuitive positional relationship conveyed when a term record is viewed in isolation, which is especially valuable when considering terms from very large thesauri. The notation gives the user insight into the hierarchical position of the term: “2.2.3: Chemical reactions” immediately indicates that this is a third-level term, under the top term annotated with “2”.

Notation helps a thesaurus respond dynamically. As priorities change within a process, system, or organization, the thesaurus may easily be changed in kind. Designed to work with Data Harmony, the Notation Module supports these changes by automatically re-numbering notations: when a branch in the thesaurus is moved, the notations for the term and all its children are changed to reflect the first available open number (or a numbering system may be assigned by the user). This simplifies restructuring a thesaurus to reflect the changing needs of a business or industry.

Notations may be used to reflect national or regional application or sources of thesauri. A company with branches around the world may have a version or section of a thesaurus for North American usage, one for the U.K., etc. Where this is the case, notation can be used to differentiate from which of the thesauri (or section of a thesaurus) a particular term originates. This also suggests utility in multilingual thesauri.
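The re-numbering of a moved branch described above can be sketched in Python. This is only an illustration under stated assumptions, not the Data Harmony implementation: the thesaurus is modelled as a plain dictionary of notation to term, the function names are hypothetical, the narrower term under “Nuclear Physics” is a placeholder, and the move itself is chosen purely to show the mechanics.

```python
def first_open_child(thesaurus: dict[str, str], parent: str) -> str:
    """Return the lowest unused child notation directly under `parent`."""
    number = 1
    while f"{parent}.{number}" in thesaurus:
        number += 1
    return f"{parent}.{number}"


def move_branch(thesaurus: dict[str, str], branch: str, new_parent: str) -> dict[str, str]:
    """Return a copy in which `branch` and all its children sit under `new_parent`."""
    if new_parent not in thesaurus:
        raise ValueError("new parent is not in the thesaurus")
    if new_parent == branch or new_parent.startswith(branch + "."):
        raise ValueError("a branch cannot be moved under itself")
    new_root = first_open_child(thesaurus, new_parent)
    moved = {}
    for notation, term in thesaurus.items():
        if notation == branch or notation.startswith(branch + "."):
            # Swap the old branch prefix for the newly assigned one.
            notation = new_root + notation[len(branch):]
        moved[notation] = term
    return moved


thesaurus = {
    "1": "Physics",
    "1.1": "Nuclear Physics",
    "1.1.1": "Hypothetical narrower term",  # placeholder, not from the article
    "1.2": "Astrophysics",
    "2": "Mathematics",
    "3": "Chemistry",
}

moved = move_branch(thesaurus, "1.1", "3")
for notation, term in moved.items():
    print(f"{notation}: {term}")
```

This sketch leaves the gap at “1.1” open after the move; whether the remaining siblings are closed up or left in place is a design choice that it does not attempt to settle.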

It should be noted that, while notation allows freedom from strict alphabetical structure, a thesaurus can be both annotated and alphabetized, to exploit the strengths of both systems, as below:

1: Apples
1.1: Granny Smith
1.2: Jonathan
1.3: Winesap
2: Bananas
3: Oranges
4: Pears
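One way to arrive at such a listing is to alphabetize first and assign notations second. The Python sketch below assumes a nested dictionary of term to narrower terms and a hypothetical `annotate` function; it is an illustration, not the Notation Module's procedure.

```python
def annotate(tree: dict, prefix: str = "") -> list[str]:
    """Walk a nested {term: children} dict alphabetically, numbering siblings from 1."""
    lines = []
    for number, term in enumerate(sorted(tree), start=1):
        notation = f"{prefix}.{number}" if prefix else str(number)
        lines.append(f"{notation}: {term}")
        lines.extend(annotate(tree[term], notation))
    return lines


# The fruit thesaurus from the example, entered in no particular order.
fruit = {
    "Pears": {},
    "Apples": {"Winesap": {}, "Granny Smith": {}, "Jonathan": {}},
    "Oranges": {},
    "Bananas": {},
}

annotated = annotate(fruit)
print("\n".join(annotated))
```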

Addressing the confusion for non-taxonomists

Non-taxonomists are often confused to find that the structure of a classical thesaurus may bear little resemblance to the final form the average user sees. “Translated” through a variety of programs, thesauri typically exist in the background; most users rarely if ever see the hierarchy tree. However, personnel not specifically trained as taxonomists are sometimes called upon to examine or maintain taxonomies, and the structures that taxonomists utilize daily may not be readily grasped by the untrained. An advantage of the Notation Module is that it allows a view of the thesaurus that can directly correspond with the output view. Thus, the Notation View for a thesaurus may be designed as the “User View,” while still making available a classic hierarchical view for experienced taxonomists.

Prepended notations are particularly valuable for users who have entered a thesaurus or retrieved a thesaurus term by way of a keyword search. Depending upon the search engine used, the retrieved term record may have information regarding broader and narrower terms, relationships, etc. But the term record viewed in isolation may not provide sufficient information for the average user to gain insight into the term’s relative position in a large thesaurus. A four-element notation prepended to a term such as “Forensic science equipment” immediately informs the user that this term is fourth-level. In effect, such notation reflects the “thread” of the term’s branching back to the top term, a structure readily familiar to many computer users. The numerical structure is also familiar to many from reports, catalog listings, etc., and therefore may be more readily intuitive.
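The “thread” can be read directly off a numerical notation, as the Python sketch below suggests. It uses the “2.2.3: Chemical reactions” record quoted earlier in the article; the `level` and `thread` helpers are hypothetical names, and the “notation, colon, term” record layout is assumed from the examples.

```python
def level(notation: str) -> int:
    """Return the hierarchical level implied by a dotted notation."""
    return len(notation.split("."))


def thread(notation: str) -> list[str]:
    """List the notations from the top term down to the term itself."""
    parts = notation.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# A term record viewed in isolation, in the "notation: term" form used above.
record = "2.2.3: Chemical reactions"
notation, term = record.split(": ", 1)

print(term, "is a level", level(notation), "term")
print("Thread back to the top term:", thread(notation))
```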

Non-taxonomists exposed to thesaurus hierarchies are sometimes perplexed by the equivalence given to top terms, and the lack of primacy of the subject of the thesaurus: “If this thesaurus is about mammals, why isn’t it at the beginning of the top term list?” Similarly, a thesaurus on the steps in a process, while detailing equivalent steps and their narrower steps, may in traditional view be quite bewildering.

For example, in a simple thesaurus of the steps involved in operating a lawn mower, the steps are all of equal importance to the process but, when alphabetized, do not reflect the process flow. An annotated version of the same thesaurus constitutes step-by-step instructions, while still maintaining the equivalency of the terms.