The siren call of natural language processing has sounded again. In this study, the researchers compared free-text searches with administrative codes to see which would give a better indication of safety based on 20 indicators. The authors rightly suggest that, instead of relying only on the notoriously poor check boxes used with discharge orders, the handwritten notes from the discharge nurse or physician might be much more instructive.

There is a huge cognitive leap in this article, though, when the authors assume that natural language processing (NLP) will always be better than the 20 critical care indicators. NLP systems are not designed to provide precision and recall. They are designed to give an excellent indication of what the data contains, convey the gist of its meaning, and guide the user in discovery. They are often combined with Bayesian systems (or other statistical methods, such as latent semantic analysis, neural networks, or vector-based models) to further enhance the discovery aspects of the search.
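
For readers who do not work with these measures every day, here is a minimal sketch of how precision and recall are usually computed for a single search. The function name and the record numbers are made up for illustration; they do not come from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: ids of the records the search returned
    relevant:  ids of the records that truly belong to the topic
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    # Precision: share of returned records that are actually relevant.
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant records that the search managed to return.
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returns four records, three are truly relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p, r)
```

In this made-up case half of what was returned is relevant, and two of the three relevant records were found.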

However, in the end the searcher will have to prove where and how the data was accessed. This is particularly true when the discoveries lead to court cases. Using ever-changing pointers and word values only goes so far. They do not provide replicable results, and they are not additive: if more data is added to the system, the information presented for the same query will change. Combining the two approaches could work well: excellent NLP discovery first, followed by more accurate Boolean searching to pinpoint the activities. Use discovery for the 5% of the time that you are in discovery mode, trying to figure out the trends, and then use accurate Boolean searching (the AND, OR, and NOT commands) to zoom in on and collect all of the data related to the topic.
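
To make the point about replicable, additive results concrete, here is a minimal sketch of Boolean searching over a small inverted index. It is only an illustration under stated assumptions: the note texts, the record numbers, and the helper names build_index and boolean_search are all invented, and a production system would handle tokenizing, phrases, and nesting far more carefully.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:")].add(doc_id)
    return index


def boolean_search(index, all_of=(), any_of=(), none_of=()):
    """AND every term in all_of, OR the terms in any_of, NOT the terms in none_of."""
    hits = None
    for term in all_of:  # AND
        postings = index.get(term, set())
        hits = set(postings) if hits is None else hits & postings
    if any_of:  # OR
        either = set().union(*(index.get(t, set()) for t in any_of))
        hits = either if hits is None else hits & either
    if hits is None:
        hits = set()
    for term in none_of:  # NOT
        hits -= index.get(term, set())
    return sorted(hits)


# Invented discharge notes, for illustration only.
notes = {
    1: "Patient fall reported on ward; no injury.",
    2: "Medication error, wrong dose given.",
    3: "Fall in bathroom with hip injury.",
}
query = dict(all_of=["fall"], any_of=["injury", "fracture"], none_of=["medication"])

first = boolean_search(build_index(notes), **query)
repeat = boolean_search(build_index(notes), **query)  # same query, same data

# Add one more record and run the identical query again.
notes[4] = "Fall from bed, wrist fracture."
after = boolean_search(build_index(notes), **query)

print(first, repeat, after)  # [1, 3] [1, 3] [1, 3, 4]
```

The same query on the same data returns the same records every time, and adding a record can only extend the earlier result, never reshuffle it. That is the property a searcher needs when the results have to be defended later.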

Marjorie M.K. Hlava
President, Access Innovations

Originally published September 19, 2011