When people talk about how accurate a search is, there are many different ways to measure that. Here are some of the ways that we talk about measuring accuracy.
Relevance is one of those ways to think of accuracy. It is the computer system’s guess, based on what it knows about you, of how likely it is that the result set it presents in answer actually meets your query. The system is guessing that these are the right results for you.
Recall is the measure of how many of the records in the database that match your query you actually got back. If you are that smoking-gun detective, that doctoral student, or that patent researcher, you want 100% recall.
Precision is the percentage or ratio of the results returned that actually meet your query. So, if you look at that first screen of ten and eight of them are really something you are interested in, then you have 80% precision.
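To make those two measures concrete, here is a minimal sketch in Python; the sets named `retrieved` and `relevant` are illustrative assumptions, standing in for a real result list and a known set of on-target records.

```python
# A minimal sketch of precision and recall, assuming we know
# which record IDs truly match the query (the "relevant" set).

def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved records that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant records that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# The example above: a first screen of ten results, eight on target,
# with two good records left behind in the database.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
relevant  = {1, 2, 3, 4, 5, 6, 7, 8, 11, 12}

print(precision(retrieved, relevant))  # 0.8 -> 80% precision
print(recall(retrieved, relevant))     # 0.8 -> 80% recall
```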
The information technology societies used to focus on precision and recall. That is what we talked about back then. Now we don’t talk about that much; now we talk about usability studies and other things. If you look at the literature from the mid-1980s through the early 1990s, you’ll find many articles about precision and recall. You don’t find much of that anymore.
Accuracy can also be measured with hits, misses, and noise. Hits are what the searcher wants and the computer shows, that is, satisfactory search results; misses are what the searcher should have been shown, but the computer didn’t show; and noise consists of results that the computer showed but the searcher didn’t want – a false drop.
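In code, that bookkeeping reduces to three set operations. This is a minimal sketch with made-up record names, again assuming we know which records are truly wanted.

```python
# Hits, misses, and noise, expressed as set operations.
retrieved = {"doc1", "doc2", "doc3", "doc4"}  # what the computer showed
wanted    = {"doc1", "doc2", "doc5"}          # what the searcher needed

hits   = retrieved & wanted   # wanted and shown
misses = wanted - retrieved   # wanted but never shown
noise  = retrieved - wanted   # shown but not wanted (false drops)

print(sorted(hits))    # ['doc1', 'doc2']
print(sorted(misses))  # ['doc5']
print(sorted(noise))   # ['doc3', 'doc4']
```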
Ranking is another way of showing you the results – putting the ones that are thought to be the best first. This means that the search system is guessing which results are most likely to be accurate relative to the others. There are lots and lots of ways to do that.
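As one illustration only, here is a sketch that ranks documents by a simple term-overlap score. The scoring function is an assumption made up for demonstration, not a description of how any particular engine works.

```python
# A minimal ranking sketch: score each document by how many
# query terms it contains, then show the best guesses first.

def score(query_terms: set, document: str) -> int:
    """Count how many distinct query terms appear in the document."""
    return len(query_terms & set(document.lower().split()))

documents = [
    "taxonomy construction for search systems",
    "recipes for apple pie",
    "search accuracy and taxonomy terms",
]
query = {"search", "taxonomy"}

for doc in sorted(documents, key=lambda d: score(query, d), reverse=True):
    print(score(query, doc), doc)
```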
Linguistic analysis involves profiling you – the user – and coming up with the right answers.
There are some other ways that people measure accuracy, and one of them is how fast the query came back to you. What is your processing speed? Is it in nanoseconds or is it 30 seconds? How fast is the search response to the user? Users are really spoiled and impatient, which is why you have all of that cache building going on behind the scenes. The users will think they are getting it all at once, but actually they are only getting what they are reading on the screen. The rest is queuing up while they wait.
That’s part of results processing. If I’ve run a query, how long does it take to build that cache to give me a full answer? How fast is it going to display my results? If you have an SQL system, for example, is it building that response on the fly? Can you refine the search? Can you do a search within the search? Can you do a re-purposed search that will give you more information?
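A search within the search can be as simple as filtering the cached result set from the first query instead of going back to the whole database. This is a hypothetical sketch; the record fields and the function name are invented for illustration.

```python
# A hypothetical sketch of refining a search: the first query's
# results sit in a cache, and the refinement filters that cache.

cached_results = [
    {"id": 1, "title": "Taxonomy basics"},
    {"id": 2, "title": "Search engine caching"},
    {"id": 3, "title": "Taxonomy and search accuracy"},
]

def refine(results, term):
    """Keep only the cached results whose title mentions the new term."""
    return [r for r in results if term.lower() in r["title"].lower()]

print([r["id"] for r in refine(cached_results, "taxonomy")])  # [1, 3]
```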
Other kinds of business rules are used to measure the accuracy of search and how well it meets your needs.
Relevance is a measure of how well the documents answer your needs. It is a very subjective measure, and it is different for different user communities. It really depends on the information resources and on the tension between the user’s needs and the content available. So, it might be the case that there is nothing relevant to your search in the entire corpus.
In the old days, when we talked about precision and recall, everyone said that relevance was the confidence rating. Then Google came along and everyone said, “What about the relevance?” Some people would say that relevance is a canard, that it doesn’t mean anything. However, Google kind of changed that perception, so that everyone is now looking for the relevance. Some people would say that relevance is a result of precision and recall – the results with high precision and high recall are the ones that are really relevant to you. Those algorithms have also gone out the window, and relevance is now measured in many different ways, but not with an algorithm for precision and recall.
There are formulas. I’m going to let you study these.
These are the traditional definitions of recall, precision, and relevance. These are the ones held by the ASIS&T community. There is a great deal of information written about these formulas – practically all of it prior to 1990.
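For reference, the recall and precision formulas are usually written along these lines; this is the standard textbook formulation, not a reproduction of the original slides.

```latex
\[
\text{recall} = \frac{\text{relevant records retrieved}}{\text{relevant records in the database}}
\qquad
\text{precision} = \frac{\text{relevant records retrieved}}{\text{total records retrieved}}
\]
```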
In measuring relevance, there are a lot of different algorithms that people use to come up with the percentage. A lot of discussion has been held on relevance but, in the end, it is a measure of confidence in how well a particular answer to the query meets the user’s needs.
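One common family of algorithms expresses that confidence as a similarity percentage between the query and a document – for example, cosine similarity over term counts. This is a generic sketch of the idea, not the method of any particular product.

```python
# A generic sketch of one relevance measure: cosine similarity
# between the term-count vectors of the query and a document.
import math
from collections import Counter

def cosine_relevance(query: str, document: str) -> float:
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

pct = cosine_relevance("search taxonomy",
                       "taxonomy terms improve search accuracy") * 100
print(round(pct))  # ~63 (percent)
```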
In a future installment, we’ll look at different kinds of search.
Marjorie M.K. Hlava, President, Access Innovations
Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.