Facets Help Move from Search to Found

December 8, 2010  
Posted in Access Insights, metadata, Taxonomy

by Alice Redmond-Neal, Chief Taxonomist, Access Innovations, Inc.

In the “everything old is new again” department, facets are all the rage in the taxonomy world! What are facets, where did they come from, and how do they relate to familiar taxonomies and other search support strategies?

What are facets?

Facets are terms that describe some aspect of physical or digital objects. They enhance a taxonomy’s descriptive capacity by going beyond simply naming categories to capturing additional dimensions of the item being described. Facets enable a searcher to set a frame of reference, establishing priorities reflecting the ways a searcher may think of an object. They provide a way to filter a potentially huge group of search results by a set of qualities so searchers can discount those items lacking the qualities they want. Combining information from different facets helps a searcher narrow down a domain to focus on a manageable set of results, getting to the right object or piece of information faster and more efficiently. Facets provide the foundation for faceted browsing or guided navigation.

Some examples:

Clothing may be divided into categories for coats, pants, shirts, and so on, but could also be cross-categorized as men’s, women’s, or children’s, or by color, price range, composition, and so on.

Automobiles can be grouped by model, manufacturer, fuel type, new or used, and more.

Movies might be described by genre, featured actors, director, release date, setting, era, etc.

Research articles could be categorized by their specific topic, and by their discipline, author, country of publication, journal or conference proceedings, and date range.

Corporate content could be sorted by primary topic, but also by corporate department, intended audience, market or sector, stakeholders, security level, workflow phase, and so on.

Facets are metadata, i.e., words that describe data. The labels for the facets and their values are a controlled vocabulary predetermined by a content manager to be valuable for describing multidimensional information in a subject area. Facets used with one subject area need not be exhaustive and are not necessarily useful for another subject area. Facets with broad potential use include the following: part, property, material, process, operation or action, agent, product, by-product, audience, company or producer, activity cycles, geography or location, document type, category, and subject matter, topic, or domain. The Dublin Core metadata elements constitute another set of facets. The list of possible facets is infinite. Identifying the facets that would be useful for a particular purpose and audience is the content manager’s challenge.

Ranganathan and Fundamental Facets

The concept of facets is far from new. It is attributed to S.R. Ranganathan, an innovative Indian librarian who, in the 1930s, proposed classifying information according to features he called Personality, Matter, Energy, Space, and Time (PMEST). These fundamental facets, as he called them, support a detailed description of an item, addressing its primary “aboutness,” its physical composition, processes or activities, location, and place in time. Ranganathan envisioned the generic PMEST facets as applicable to virtually all library content, but the ever-expanding diversity of information demands more specific and customized descriptions. His analytico-synthetic method brings together analysis of an item by its facets and the synthesis or postcoordination of facet information for a rich, multidimensional description of an item. Beyond the sparseness of the original facets, an additional limitation was that, bound by the physical constraints of library organization of the era, the system required these facets to be applied in a certain filing order or sequence. Modern technology has greatly multiplied the potential value of Ranganathan’s fundamental facets, supporting numerous tailor-made facets applied in any order a searcher chooses.

How do facets work?

Each facet relates to a single dimension or attribute of an item being described. Facets are mutually exclusive, clearly distinct from each other. No single value in one facet can logically appear in another facet, since each facet represents a different attribute. An item’s price can appear only under the Price facet, not under any other facet. Facet values can be arranged in a hierarchy to show general and more specific values, such as Location>Country>City, but they cannot be polyhierarchical with a value located in more than one facet list.
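The mutual-exclusivity rule can be checked mechanically. The following is a minimal sketch, assuming facet lists are held as plain Python sets. The facet labels, the values, and the `find_shared_values` helper are all hypothetical and chosen only for illustration. Hierarchy within a facet is written here as a path such as "India > Chennai".

```python
from collections import defaultdict

# Hypothetical facet lists; labels and values are illustrative assumptions.
FACETS = {
    "Location": {"USA", "USA > Albuquerque", "India", "India > Chennai"},
    "Format": {"Paperback", "Hardcover"},
    "Language": {"English", "Spanish"},
}

def find_shared_values(facets):
    """Return {value: [facet labels]} for every value listed under more than one facet."""
    owners = defaultdict(set)
    for label, values in facets.items():
        for value in values:
            owners[value].add(label)
    return {v: sorted(labels) for v, labels in owners.items() if len(labels) > 1}

# A clean set of facets produces no conflicts.
print(find_shared_values(FACETS))

# Listing "English" under a second facet breaks the rule and is reported.
broken = {**FACETS, "Audience": {"English"}}
print(find_shared_values(broken))
```

An empty result means every value belongs to exactly one facet list, which is the condition described above.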

Facet values can also be postcoordinated. A searcher can use as few or as many facets as desired in any combination and any sequence to describe an item and narrow search results. A book may be described by its subject matter, author, publication year and country, and more. Multiple values could apply for language, such as English and Spanish, and for format, such as paperback and hardcover.
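Postcoordination can be illustrated with a short sketch. It assumes each item stores a set of values per facet, so that a book can carry two languages or two formats at once. The records and the `filter_items` function are hypothetical, not taken from any particular system.

```python
# Hypothetical catalog records; each facet maps to a set because one item
# may carry several values for the same facet.
BOOKS = [
    {"title": "Book A",
     "facets": {"Language": {"English", "Spanish"}, "Format": {"Paperback"}, "Year": {"2009"}}},
    {"title": "Book B",
     "facets": {"Language": {"English"}, "Format": {"Hardcover", "Paperback"}, "Year": {"2010"}}},
    {"title": "Book C",
     "facets": {"Language": {"Spanish"}, "Format": {"Hardcover"}, "Year": {"2010"}}},
]

def filter_items(items, selections):
    """Keep the items that carry every selected value; selections is {facet: value}."""
    return [
        item for item in items
        if all(value in item["facets"].get(facet, set())
               for facet, value in selections.items())
    ]

# The same two choices, made in either order, give the same result.
first = filter_items(BOOKS, {"Language": "Spanish", "Format": "Hardcover"})
second = filter_items(BOOKS, {"Format": "Hardcover", "Language": "Spanish"})
print([b["title"] for b in first], [b["title"] for b in second])
```

Because the selections are combined only at search time, a searcher can use one facet, all of them, or none, and the order of the choices does not affect the outcome.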

Facets are most useful for narrow, well-defined domains. They are relatively simple to construct to describe tangible objects. It is somewhat more difficult to identify useful facets for subject areas consisting primarily of fuzzy, abstract concepts. Facets for an e-commerce application will be very different from those for enterprise content management.

Where does facet information go?

The attribute information of facets can be incorporated to enrich content as database fields or as taxonomy branches.

Capturing facet information in database fields results in field-formatted data with entries drawn from authority files. Information from several fields can be combined using Boolean operators AND and OR to create the desired mix. An example of this strategy is the Power Search at www.MediaSleuth.com, where a search can look for items using up to four kinds of attributes combined with a keyword. The search draws on entries in separate fields of the underlying database. Capturing attribute information in separate, dedicated database fields leaves the taxonomy to focus on describing subject matter. This hardcoded strategy offers strong advantages and leads to less potential confusion, but may not be feasible for existing databases or with changing attribute information.
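As a rough sketch of the database-field approach, the example below uses Python's built-in sqlite3 module with an in-memory table. The table name, the columns, and the rows are assumptions made for illustration. They do not represent the schema behind MediaSleuth or any other real site.

```python
import sqlite3

# Hypothetical table: one dedicated column per facet, plus a title for keyword search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (title TEXT, format TEXT, audience TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?, ?)",
    [
        ("Volcano Basics", "DVD", "Elementary", "Earth science"),
        ("Volcano Field Guide", "Book", "Adult", "Earth science"),
        ("Cells Up Close", "DVD", "Secondary", "Biology"),
        ("Rock Cycle", "VHS", "Elementary", "Earth science"),
    ],
)

# AND across fields narrows the results; OR within one field widens them.
by_fields = conn.execute(
    "SELECT title FROM media "
    "WHERE subject = ? AND (format = ? OR format = ?) ORDER BY title",
    ("Earth science", "DVD", "VHS"),
).fetchall()

# The same field conditions combined with a keyword match on the title.
with_keyword = conn.execute(
    "SELECT title FROM media "
    "WHERE subject = ? AND (format = ? OR format = ?) AND title LIKE ? ORDER BY title",
    ("Earth science", "DVD", "VHS", "%volcano%"),
).fetchall()

print(by_fields)
print(with_keyword)
conn.close()
```

In practice the permitted entries for each column would come from an authority file rather than free text, so that the values searched for match the values stored.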

Facets applying to a specific taxonomy term can be represented under that term, in a Broader Term/Narrower Term relationship. This approach works well for facets that vary with subject terms. On an e-commerce site, all items have a price, but the price ranges for hand tools, in groupings up to $400 on www.Sears.com, would be unsuitable for kitchen appliances costing up to several thousand dollars. The facet label “Price range” is appropriate for both, but the values would differ. In a taxonomy presentation, facets directly associated with particular terms are shown as typographically distinct, often in italics and in brackets or parentheses, e.g. [by special properties], to indicate that the facet indicator is not a postable term available for indexing. The facet values that follow, e.g. Fast preparation or Low fat, can be used for indexing.
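The idea of one facet label with term-specific values might be sketched as follows. The category names match the example above, but the dollar boundaries and the `price_range` function are assumptions for illustration, not figures from any actual site.

```python
import bisect

# Hypothetical upper bounds in dollars for each category's "Price range" facet.
PRICE_BOUNDS = {
    "Hand tools": [25, 50, 100, 400],
    "Kitchen appliances": [500, 1000, 2500, 5000],
}

def price_range(category, price):
    """Return the "Price range" value for a price, using that category's own boundaries."""
    bounds = PRICE_BOUNDS[category]
    i = bisect.bisect_left(bounds, price)  # index of the first bound >= price
    if i == len(bounds):
        return f"Over ${bounds[-1]}"
    low = bounds[i - 1] if i else 0
    return f"${low} to ${bounds[i]}"

# The same price falls into very different groupings depending on the subject term.
print(price_range("Hand tools", 450))
print(price_range("Kitchen appliances", 450))
```

The facet label stays constant while the list of values hangs from the subject term, which is the arrangement the Broader Term/Narrower Term approach makes possible.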

An example shows a faceted breakdown of paint:

-Surface coatings
--Paint
---[by composition]
----Acrylic paint
----Oil paint
---[by finish]
----Gloss finish
----Matte finish
----Semi-gloss finish
---[by location]
----Exterior
----Interior
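The paint breakdown can also be modeled as a small tree in which facet indicators are flagged as non-postable. This is a sketch only. The `Node` class, the `indicator` helper, and the `postable_terms` function are hypothetical names, not part of any taxonomy software.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    postable: bool = True  # False for facet indicators such as [by finish]
    children: list = field(default_factory=list)

def indicator(name, *values):
    """Build a non-postable facet indicator with postable values beneath it."""
    return Node(f"[{name}]", postable=False, children=[Node(v) for v in values])

paint = Node("Paint", children=[
    indicator("by composition", "Acrylic paint", "Oil paint"),
    indicator("by finish", "Gloss finish", "Matte finish", "Semi-gloss finish"),
    indicator("by location", "Exterior", "Interior"),
])
root = Node("Surface coatings", children=[paint])

def postable_terms(node):
    """Collect, depth first, the labels an indexer is allowed to assign."""
    terms = [node.label] if node.postable else []
    for child in node.children:
        terms.extend(postable_terms(child))
    return terms

print(postable_terms(root))
```

The bracketed indicators organize the display but never appear in the list of terms available for indexing.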

In an online environment, facets applying to a particular item are often displayed as a pick list that appears for that item only.

Faceted classification can also be approached by capturing the facet information as main branches in a taxonomy. This approach is best suited for domains where the same facets and range of values could be applied to most or all items in the domain. Facets organized in this way can be added and updated at any time for prospective value, requiring modifying only the taxonomy and not retrofitting the database. It is the strategy of choice when content is not collected, organized, and maintained in a dedicated database.

For its database of educational research articles, ERIC® uses document language, geographic origin, target audience, and document type, a basic list that might apply for most article databases. Ei’s Engineering Village 2® uses facets to capture controlled vocabulary, document types, language, year of publication, and publisher, as well as most cited authors and classification codes. The Art & Architecture Thesaurus® (AAT) uses seven facets, each presented as a distinct branch, with a total of forty hierarchies or facet subdivisions. The AAT’s Objects Facet pertains to the item being described, and other facets apply to physical attributes, styles and periods, materials, and so on. The terms of the AAT serve as building blocks that, when postcoordinated or taken as a group, describe the item and its attributes. Every item in the Objects Facet of the thesaurus could be described with some or all of the other facets.

How do you create facets?

Facet analysis starts with a consideration of what the items in a collection of information have in common. This basic question is the same for tangible products as for documents in a digital library or corporate knowledge management system. Examine enough documents that are representative of the domain to get a sense of how they could be described. What features can apply to all the documents or to large subsets of them? Adopt a searcher’s perspective: facets should describe features that matter to searchers. Imagine getting a search results set of 100 documents. What features would be helpful for narrowing down your selection? Price range, size, color, manufacturer? Author, department, audience, format? Any concept that repeats in connection with multiple taxonomy topics is a good candidate for a facet, e.g. product color, size or dimension, audience, grade.
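The rule of thumb about repeating concepts can be roughed out in code. The sketch below assumes someone has already noted which attributes come up under each taxonomy topic. The topics, the attributes, the `candidate_facets` function, and the `min_topics` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical notes from reviewing sample items under each taxonomy topic.
OBSERVED = {
    "Coats": {"color", "size", "audience", "insulation"},
    "Shirts": {"color", "size", "audience", "collar style"},
    "Pants": {"color", "size", "audience", "inseam"},
}

def candidate_facets(observed, min_topics=2):
    """Return attributes that recur under at least min_topics taxonomy topics."""
    topics_by_attribute = defaultdict(set)
    for topic, attributes in observed.items():
        for attribute in attributes:
            topics_by_attribute[attribute].add(topic)
    return sorted(
        attribute for attribute, topics in topics_by_attribute.items()
        if len(topics) >= min_topics
    )

print(candidate_facets(OBSERVED))
```

A tally like this only suggests candidates. Whether a recurring attribute actually matters to searchers remains a judgment for the content manager.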

Just as with taxonomy terms, facets and their values must be clearly distinguished from each other, and their labels must not be confusing or open to interpretation. Facets should be mutually exclusive, not overlapping in coverage. Labels for facets and attributes should be clearly understood by the intended audience. Facets should refer to attributes that are lasting and reliable, not transient.

Aim for no more than five or six facets in general, supporting reasonable specificity but not overwhelming a searcher with too many attributes to juggle. However, in a highly detailed domain where specifying numerous essential attributes would help to narrow down to the appropriate choices, more facets can add valuable strength and flexibility for search and may be necessary.

How do you implement facets?

Faceted information can be made accessible through a browsing interface with menus and pick lists, or by advanced search features. Some systems also offer a keyword search box, enabling the searcher to start with a general initial query and narrow results using facet choices. Searchers can then pick which aspect of an item to focus on, depending on personal priorities, not being stuck in a rigid hierarchy format that may not reflect their way of thinking. Some implementations of facets support progressive filtering of dimensions, narrowing down the results by focusing only on content with the desired features and filtering out unwanted features one at a time. Others enable the searcher to set facet choices once in a single search. Ideally, no search leads to an empty results set. The details of a site’s information architecture establish the parameters of search, navigation, and results display.
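Progressive filtering that avoids empty result sets can be sketched by offering only the facet values still present in the current results. The items and the two helper functions below are hypothetical, and real guided-navigation products may work quite differently.

```python
from collections import Counter

# Hypothetical items, each with one value per facet.
ITEMS = [
    {"Type": "Coat", "Color": "Navy", "Size": "L"},
    {"Type": "Coat", "Color": "Black", "Size": "M"},
    {"Type": "Shirt", "Color": "Navy", "Size": "L"},
    {"Type": "Coat", "Color": "Navy", "Size": "M"},
]

def apply_choice(results, facet, value):
    """Narrow the current results to items carrying the chosen facet value."""
    return [item for item in results if item.get(facet) == value]

def remaining_options(results, chosen):
    """Count the values still present in the results for facets not yet chosen."""
    options = {}
    for item in results:
        for facet, value in item.items():
            if facet not in chosen:
                options.setdefault(facet, Counter())[value] += 1
    return options

# Step 1: choose Type = Coat, then see which values are still on offer.
coats = apply_choice(ITEMS, "Type", "Coat")
coat_options = remaining_options(coats, {"Type"})

# Step 2: choose Color = Navy from the values offered.
navy_coats = apply_choice(coats, "Color", "Navy")
navy_options = remaining_options(navy_coats, {"Type", "Color"})

print(len(coats), dict(coat_options["Color"]), dict(coat_options["Size"]))
print(len(navy_coats), dict(navy_options["Size"]))
```

Since every value offered is drawn from the current results, any choice the searcher makes leaves at least one item, and the counts show how far each choice would narrow the set.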

It may be possible to adapt existing systems to take advantage of faceted information by assigning it to defined fields in a relational database or by using XFML (eXchangeable Faceted Metadata Language) for facets represented in a taxonomy. XFML enables tagging each entry as a facet topic or facet value with its parent topic. Dedicated software applications specifically designed to present content with faceted descriptions include Flamenco (an open source search interface), Facetmap, Endeca, Siderean, Dieselpoint, and Rawsugar.

A well-built taxonomy can provide the foundation for faceted navigation by representing topic categories supplemented with facets and values. Data Harmony’s Thesaurus Master is an example of taxonomy/thesaurus management software that supports creation of faceted taxonomies.

Thesaurus Master offers the user two ways to capture facet information. One is by adding taxonomy branches, as described above. An alternative is to mark subject taxonomy terms with one or more facet topics and values. This incorporates facet information directly with the subject term in its term record and saves it with customizable XML tags. The facet view displays all terms sharing a particular facet entry. This approach is also effective for marking and then displaying subject terms with numeric or alphabetic category codes.

Facet forward

Three quarters of a century after Ranganathan proposed his fundamental facets and analytico-synthetic method, the power of his vision has largely been realized. Facets are found on virtually every e-commerce site and in a rapidly growing number of knowledge management taxonomies. Incorporating facet information enriches a descriptive vocabulary and empowers the searcher’s priorities. Whether for identifying a men’s navy raincoat in size large or finding a marketing department spreadsheet on new product promotions, facets are a valuable addition to effective strategies for organizing and finding content.