We know of one Sanskrit thesaurus, the Amarakosha (Treasury or Dictionary of Amara), written by the Buddhist scholar Amara Simha around AD 375 or 400. Interestingly, it was written in verse. (The only other terminology in verse I can think of offhand is Ogden Nash’s poem about the one-L lama, but I don’t think that bears comparison to the Amarakosha.)

Reportedly, the Amarakosha was almost lost to posterity. As the story goes, Amara heard that a highly respected philosopher was traveling and wanted to debate him. Amara freaked out and burned his manuscripts to avoid the other philosopher’s scrutiny. Fortunately, the visiting philosopher snatched the thesaurus from the flames. As a result, Indian schoolchildren still learn to recite the verses of the Amarakosha from memory, and Sanskrit scholars still study the text.

Jumping way ahead, we inevitably encounter Roget’s Thesaurus, begun in 1805 and first published in 1852 as the Thesaurus of English Words and Phrases. (The full title was actually Thesaurus of English Words and Phrases Classified and Arranged so as to Facilitate the Expression of Ideas and Assist in Literary Composition.) The Roget behind the thesaurus was Peter Mark Roget (1779–1869), a British physician who battled depression by making lists, the thesaurus among them.

As most of you know, Roget’s Thesaurus is still updated and republished frequently, and it is widely used by writers. What you might not realize is that it is organized hierarchically, several levels deep, and so can be regarded as a classification system. In fact, according to the Wikipedia article on Roget’s Thesaurus, “The Wikipedia ‘category schemes’ … are based on the classification system of Roget’s Thesaurus, as evidenced by the outline from the 1911 US edition.” This places Roget’s in the tradition of hierarchical classification systems that are also thesauri. The practice is so prevalent that, among information professionals, “thesaurus” now generally means a taxonomy with synonyms (along with other annotations and relationships).

The computer age brought further developments, many of which we’ve covered elsewhere in this series. One area of development was guidelines and standards to promote successful information retrieval. A major landmark in that area was the set of guidelines created by the Committee on Scientific and Technical Information (COSATI) of the Federal Council for Science and Technology. (COSATI later evolved into CENDI.) COSATI developed the guidelines in conjunction with the creation of the Thesaurus of Engineering and Scientific Terms (TEST), published in 1967. Except for a DuPont thesaurus from the 1950s that was not widely disseminated, TEST may have been the first thesaurus created with the goal of computer-based information retrieval.

Since TEST, numerous thesauri, large and small, have been developed for information retrieval. National and international guidelines and standards have emerged. And new technologies, formats, and methodologies for creating connections among terms have appeared. It’s a brave new world.

Marjorie M.K. Hlava, President, Access Innovations


Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.