Search as Big Brother, Molding What You See and Think

May 30, 2011  
Posted in Access Insights, Featured, search, Taxonomy

Eli Pariser, author of the new and very interesting book “The Filter Bubble: What the Internet Is Hiding From You,” recently gave a TED presentation. His talk is a synopsis of how Google’s personalization algorithms affect search results. Google results are influenced by your own search history and other online activity. Systems such as Amazon, Yahoo, Bing, and eBay depend heavily on personalization to serve you results. Traditional databases do not use profiles (yet), but they are often based on Verity, Vivisimo, Autonomy, FAST, and other mathematically based search software. They can, and do, serve up different results whenever the vectors are reset; that is, every time data is added to the system through updates or metadata enrichment.

Pariser is rightly concerned that the personalization practices of search providers, used to steer viewers to specific content that the search algorithms think will be of interest, are not transparent. This is “The Filter Bubble.” He sees it as a threat to democracy. He fears it will “increase polarization and limit engagement across ideological lines.” I think he is right!

It might be useful for users searching Amazon and other purchasing sites to have the system suggest topics and items of interest. But when it comes to searching for actual information on a new topic, it is not useful to have Google or Yahoo supply results based on the Bayesian vectors and co-occurrence data they have collected from your past inquiries, so that you see only things that match your recent search profile.
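To make the effect concrete, here is a minimal sketch of profile-based re-ranking. It is a toy stand-in, not Google’s or Yahoo’s actual algorithm: the result list, the viewpoint labels, the `personalized_rank` function, and the `weight` parameter are all hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical results for one query: (title, viewpoint label, base relevance).
RESULTS = [
    ("Report A", "viewpoint_x", 0.80),
    ("Report B", "viewpoint_y", 0.78),
    ("Report C", "viewpoint_x", 0.75),
    ("Report D", "viewpoint_y", 0.74),
]

def personalized_rank(results, click_history, weight=0.5):
    """Re-rank results by boosting viewpoints the user clicked on before.

    click_history is a list of viewpoint labels from past clicks (assumed data).
    """
    profile = Counter(click_history)
    total = sum(profile.values()) or 1  # avoid dividing by zero for new users

    def score(item):
        _title, viewpoint, base = item
        affinity = profile[viewpoint] / total  # share of past clicks on this viewpoint
        return base + weight * affinity

    return [title for title, _, _ in sorted(results, key=score, reverse=True)]

# Two users type the same query but have different histories.
user_1 = personalized_rank(RESULTS, ["viewpoint_x"] * 9 + ["viewpoint_y"])
user_2 = personalized_rank(RESULTS, ["viewpoint_y"] * 9 + ["viewpoint_x"])
no_profile = personalized_rank(RESULTS, [])

print(user_1)      # viewpoint_x items float to the top
print(user_2)      # viewpoint_y items float to the top
print(no_profile)  # plain relevance order
```

Same query, same collection, three different orderings. Neither user is told that the list was rearranged, which is the transparency problem in miniature.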

Pariser wrote an editorial published in the New York Times on May 23, 2011. There he outlines how limiting what people see, through personalized filtering of news and information, is a direct threat to democracy. The filtering restricts exposure to differing viewpoints while increasing the presentation of content that agrees with the user’s regular profiled behavior, sentiment, and preferences.

Mr. Pariser’s background is summarized nicely in Wikipedia. Talk about people who would want to “steer” the discussion one particular way! Eli Pariser, the president of the board of MoveOn.org, clearly has a political agenda that he would like to share with others. It is interesting to have people in power know how to skew the results of a web search, especially someone who helps direct a political campaign. Since it will soon be election time, perhaps something on how the media and search affect people’s thinking would be a good program. Pretty scary.

Unfortunately, this is not a new area or bit of knowledge. It is well known that you can trick the vectors in statistically based search systems. We have mentioned it frequently when trying to point out the differences between Bayesian and Boolean search software. I have in the past talked about it in terms of precision and recall versus someone else’s estimation of what is relevant to me: the confidence factor (relevance) in returned search results.

It is pretty easy to “trick” a Bayesian engine. Steve Arnold documented this in his discussion of personalization and “shaping” content to push ads in his 2005 book, The Google Legacy. Let’s see, that was six to eight years ago. Microsoft does the same thing.

Our country’s intelligence organizations depend heavily on these statistics-based systems. Latent Semantic Indexing, vectors, neural nets, and co-occurrence engines all depend on mathematical models to find the answers. All you have to do is know the algorithms to point the system away from you and toward other answers. The analysts are depending on the data presented by the systems. It is straightforward to weight the responses with chatter so they miss what is important.

That is why we are such strong supporters of taxonomies, controlled vocabularies and Boolean search at Access Innovations. The answers cannot be tricked by resetting the algorithms of the search software retrieval engine.
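The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any product named above: a plain term-frequency score stands in for the statistical engines, and the documents, the query, and the helper names (`tf_score`, `top_k`, `boolean_and`) are hypothetical.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tf_score(query, doc):
    """Share of the document's words that are query terms (a crude statistical score)."""
    words = tokenize(doc)
    counts = Counter(words)
    return sum(counts[term] for term in tokenize(query)) / len(words)

def top_k(query, docs, k=2):
    """Statistical retrieval: rank by score and show only the top k."""
    ranked = sorted(docs, key=lambda name: tf_score(query, docs[name]), reverse=True)
    return ranked[:k]

def boolean_and(query, docs):
    """Boolean retrieval: every document containing all query terms, unranked."""
    terms = set(tokenize(query))
    return {name for name, text in docs.items() if terms <= set(tokenize(text))}

docs = {
    "memo": "shipment arrives at the harbor on friday with the cargo manifest",
    "news": "harbor traffic was light on friday",
}
query = "shipment harbor"

before_ranked = top_k(query, docs)
before_boolean = boolean_and(query, docs)

# Flood the collection with "chatter" stuffed with the query terms.
flooded = dict(docs)
for i in range(3):
    flooded[f"chatter{i}"] = "shipment harbor shipment harbor"

after_ranked = top_k(query, flooded)
after_boolean = boolean_and(query, flooded)

print(before_ranked, after_ranked)    # the memo drops out of the top results
print(before_boolean, after_boolean)  # the memo is still in the Boolean set
```

In this sketch the chatter pushes the memo out of the ranked top results entirely, while the Boolean set grows but still contains the memo, because membership depends only on the query and the document itself, not on a score relative to everything else in the collection.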

By the way, the work-around so you are not tracked and “personalized” is:

* Delete your history

* Do not log in to Google

* Use a controlled vocabulary to tag your data. It costs less, is more predictable and provides the full results, not just those “tailored” to you the searcher.
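As a rough illustration of the last point, tagging with a controlled vocabulary can be sketched as follows. The mini-thesaurus, the sample records, and the `tag` and `search` helpers are hypothetical, made up for this example; a production thesaurus and indexing rule base would be far richer.

```python
# Hypothetical mini-thesaurus: entry term -> preferred term.
VOCABULARY = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "auto": "Automobiles",
    "automobile": "Automobiles",
    "doctor": "Physicians",
    "doctors": "Physicians",
    "physician": "Physicians",
}

def tag(text):
    """Return the set of preferred terms whose entry terms appear in the text."""
    words = (word.strip(".,;:!?").lower() for word in text.split())
    return {VOCABULARY[word] for word in words if word in VOCABULARY}

def search(preferred_term, tagged_docs):
    """Boolean lookup on tags: no profile, no history, same answer for every searcher."""
    return sorted(name for name, tags in tagged_docs.items() if preferred_term in tags)

records = {
    "rec1": "The doctor bought a new car.",
    "rec2": "Physician shortages in rural areas",
    "rec3": "Auto sales rose last quarter",
}
tagged = {name: tag(text) for name, text in records.items()}

print(tagged["rec1"])                # both preferred terms
print(search("Physicians", tagged))  # rec1 and rec2, whichever synonym was used
print(search("Automobiles", tagged)) # rec1 and rec3
```

The result of `search` depends only on the tags and the term asked for, so it is the same for every searcher and does not shift when unrelated records are added.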

Marjorie M.K. Hlava
President, Access Innovations