Most of you who have studied library science or information science are familiar with faceted classification as developed by the Indian mathematician and librarian S. R. Ranganathan (1892-1972). His main contribution was the Colon Classification system, the first faceted classification. It allowed multiple classifications to be assigned to an object, rather than a single pre-coordinated taxonomic designation.

Dr. Ranganathan is still pretty popular in the search world, especially in e-commerce. The Endeca faceted search module is popular with online marketers, because they want many different ways to filter data.

In the case of Endeca, the biggest use is in retail. So, if a site offers online ordering, its search system is generally going to be an Endeca search system. That’s because you have one object: a shirt that comes in five colors and four sizes, and maybe has some other attributes. You want to be able to search on any one of those classifications and get the same shirt.

So, I want it in a women’s size, and he wants it in a men’s size. He wants it in blue and I want it in green. We can both place those orders against the same item, because the shirt classification has a lot of sub-facets to it. The search runs across all of those different facets, which we know as size and shape and color, and finds the shirt so it can be ordered. It isn’t a single taxonomic list; it is a lot of sub-lists that identify that object.
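The shirt example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Endeca stores or searches its data; the `SHIRT` record, its facet names, and the `matches` helper are all hypothetical.

```python
# Hypothetical record: one shirt described by several independent facets.
# Each facet is a small pick list of the values the shirt is offered in.
SHIRT = {
    "name": "Oxford shirt",
    "facets": {
        "department": {"women", "men"},
        "size": {"S", "M", "L", "XL"},
        "color": {"blue", "green", "white", "red", "black"},
    },
}


def matches(item, **wanted):
    """Return True if the item offers every requested facet value."""
    return all(
        value in item["facets"].get(facet, set())
        for facet, value in wanted.items()
    )


# Two shoppers with different requests land on the same shirt.
print(matches(SHIRT, department="women", color="green"))
print(matches(SHIRT, department="men", color="blue", size="L"))
# A value the shirt is not offered in does not match.
print(matches(SHIRT, color="purple"))
```

Any single facet, or any combination of them, leads to the same object, which is the point of assigning multiple classifications instead of one fixed place in a tree.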

In the taxonomy, you could have built each of those out as a separate branch. More likely, you would build them all as separate little taxonomies, because they are basically pick lists or authority lists, and each one is internally consistent. If I want women’s clothing on L.L. Bean, I click on Women’s Clothing, and the website tells me that it has pants, shirts, and other things, and then I can choose from those. They are facets. In Ranganathan’s terms, I am working down through them hierarchically. This is called faceted search. I can click through and get ever more detail.

In Lucene, which is an open source search system, I can do the same thing but a bit differently, because here the facet counts show me how many individual items match each value. I have a lot of different facets, so I can search by manufacturer, choosing from a drop-down list of manufacturers. Each of these narrows the search. I can go by price, or I can go by resolution or by zoom range, and each of those choices narrows my search further.
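Here is a minimal sketch of facet counts and narrowing in plain Python. It does not use Lucene's API; the `CAMERAS` list, its field names, and the `facet_counts` and `narrow` helpers are made up for illustration.

```python
from collections import Counter

# Hypothetical result set for a camera search.
CAMERAS = [
    {"model": "A100", "manufacturer": "Acme", "resolution_mp": 12, "price": 199},
    {"model": "A200", "manufacturer": "Acme", "resolution_mp": 16, "price": 299},
    {"model": "B10", "manufacturer": "Birch", "resolution_mp": 12, "price": 149},
    {"model": "C5", "manufacturer": "Cedar", "resolution_mp": 20, "price": 499},
]


def facet_counts(items, facet):
    """Count how many items in the current results carry each facet value."""
    return Counter(item[facet] for item in items)


def narrow(items, facet, value):
    """Keep only the items whose facet equals the chosen value."""
    return [item for item in items if item[facet] == value]


results = CAMERAS
print(facet_counts(results, "manufacturer"))   # what the drop-down would show

results = narrow(results, "manufacturer", "Acme")
print(facet_counts(results, "resolution_mp"))  # counts for the narrowed set

results = narrow(results, "resolution_mp", 16)
print([camera["model"] for camera in results])
```

Each selection is applied on top of the previous one, so picking a manufacturer and then a resolution amounts to a Boolean AND of the two conditions.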

Once again, I think we can see the influence of Boolean logic in search.

Marjorie M.K. Hlava, President, Access Innovations


Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.