My Dad always objected strenuously whenever we said, “It broke.” It did not break; something caused it to break. It should come as no surprise that the Japanese language was a special interest of his, in particular the way it expresses causality differently. Living in Spanish-speaking New Mexico, I am often struck by the same lack of connection to agents of change. So I was particularly affected by the July 24 article in the Wall Street Journal about the different cognitive paths driven by the language we speak.

Language is closely tied to learning styles, and vice versa. In our complex world, we tend to believe that just because someone looks like us, they will think like us. I would submit that language is part, but not all, of that disconnect. The way we organize and categorize information as a culture also drives our thinking. In the US and the UK, we follow the outlines of knowledge set forth by Francis Bacon, and we think and gather thoughts and teachings along those lines. These outlines were further codified by Melvil Dewey and Charles Ammi Cutter into the library classification systems widely used today. Other widely used systems, such as the basic Rubricon written by Lenin, outline the world as he saw it and are still in wide use in Russia and related countries today. Because the basic way people in those countries think follows a subtly different outline of knowledge, we can believe we are communicating when we are not talking about the same thing at all. I believe this is a very basic failure of our intelligence analysis systems today. We assume that what our automated tools extract when parsing English means the same thing to all peoples. It does not.

Language, culture and our very way of organizing the information at hand lead to very different discoveries and conclusions. The same body of information, written the same way but organized differently, will lead the researcher to different conclusions. Users of data mining programs should understand the underlying algorithms well so that they are not misled. Compounding the problem, when new data is added to the corpus, all the vectors change: the information is no longer organized the way it was the first time, and the same query no longer returns the same data.
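To illustrate why the vectors shift, here is a minimal Python sketch. It assumes a plain TF-IDF weighting, which is only one common scheme; the data mining programs discussed here may use something else, and the toy documents are invented for illustration. Document 0 is never edited, yet its vector changes once a fourth document joins the corpus, because each term's weight depends on how many documents in the whole collection contain it.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Inverse document frequency, log(N / df), for every term in a tokenized corpus."""
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(doc, corpus):
    """TF-IDF weights for one document, computed against the whole corpus."""
    idf = idf_weights(corpus)
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf}


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["taxonomy", "vocabulary", "search"],
    ["taxonomy", "classification"],
    ["language", "culture"],
]
before = tfidf_vector(corpus[0], corpus)

# Add one new document; document 0 itself is untouched.
corpus.append(["taxonomy", "language"])
after = tfidf_vector(corpus[0], corpus)

for term in before:
    print(f"{term:12s} before={before[term]:.3f} after={after[term]:.3f}")
```

Every weight in document 0 moves: "taxonomy" falls from about 0.405 to 0.288, while "vocabulary" and "search" rise from about 1.099 to 1.386. Any clustering or ranking built on these vectors can therefore change even though the original document did not.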

This “on the fly” clustering is useful for discovery, since it offers a new perspective, but it is dangerous for continued research, where persistent clusters built by a known and replicable method are needed to track information over time.
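For contrast, a persistent method can be as simple as matching each document against a fixed controlled vocabulary. The sketch below is hypothetical: the vocabulary, its variants and the documents are invented for illustration, and it is not Access Innovations' indexing method. The point is that a document's assigned terms depend only on the document and the vocabulary, not on whatever else is in the corpus, so the result can be replicated over time.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> variants that map to it.
VOCABULARY = {
    "Classification systems": {"dewey", "cutter", "classification"},
    "Language": {"language", "linguistics", "grammar"},
    "Taxonomies": {"taxonomy", "taxonomies", "thesaurus"},
}


def assign_terms(text, vocabulary=VOCABULARY):
    """Return, in sorted order, the preferred terms whose variants appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(term for term, variants in vocabulary.items() if words & variants)


corpus = ["Dewey built a classification of knowledge."]
first_run = [assign_terms(doc) for doc in corpus]

# Growing the corpus does not change how the existing document is indexed.
corpus.append("Grammar shapes how a language describes causality.")
second_run = [assign_terms(doc) for doc in corpus]

print(first_run[0])                   # ['Classification systems']
print(second_run[0] == first_run[0])  # True
```

A production indexer would also handle phrases, word forms and rule conflicts; this sketch only shows why a fixed vocabulary yields stable, trackable categories where corpus-dependent clustering does not.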

Access Innovations is one of a very small number of companies able to help clients generate complete, ANSI/ISO/W3C-compliant taxonomies that use controlled vocabularies to increase the findability of information within or across large collections, whether structured in databases or unstructured in content repositories. These activities are not unique to a single country or language; they are shared and active globally.

Margie Hlava
President, Access Innovations