The siren call of natural language processing has sounded again. In this study, the researchers compared free-text searches with administrative codes to see which gave a better indication of safety across 20 indicators. The authors rightly suggest that, instead of relying only on the notoriously poor check boxes used with discharge orders, the handwritten notes from the discharge nurse or physician might be much more instructive.
There is a huge cognitive leap in this article, however, when the authors assume that natural language processing (NLP) will always be better than the 20 critical care indicators. NLP systems are not designed to deliver precision and recall. They are designed to give excellent indications of the data and the gist of its meaning, and to guide the user in discovery. They are often combined with Bayesian systems (or other statistical methods such as latent semantic analysis, neural networks, or vector-based models) to further enhance the discovery aspects of the search.
In the end, however, the searcher will have to prove where and how the data was accessed, especially if the discoveries lead to court cases. Ever-changing pointers and word values only go so far. They do not provide replicable results, and they do not stay consistent as data accumulates: if more data is added to the system, the information presented will change. A combined approach could first use NLP for discovery and then pinpoint the relevant activities using more accurate Boolean queries. Use discovery tools the 5% of the time that you are in discovery mode, trying to figure out the trends, and then use accurate Boolean operators (AND, OR, NOT) to zoom in on and collect all of the data related to the topic.
Marjorie M.K. Hlava
President, Access Innovations
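The Boolean step described above can be sketched as a small inverted index queried with AND, OR, and NOT. This is a minimal illustration under stated assumptions, not a production system: the sample notes, the whitespace tokenizer, and the names `build_index` and `boolean_query` are all hypothetical. What it shows is the replicability point: the same query over the same data always returns the same set, and adding new notes can add matches but never silently drops or reorders existing ones.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token (punctuation stripped) to the IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:")].add(doc_id)
    return index


def boolean_query(index, all_ids, must=(), any_of=(), must_not=()):
    """AND across `must`, OR across `any_of`, NOT across `must_not`; sorted IDs."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        union = set()
        for term in any_of:
            union |= index.get(term, set())
        result &= union
    for term in must_not:
        result -= index.get(term, set())
    return sorted(result)


# Hypothetical discharge notes, keyed by note ID.
notes = {
    1: "Patient fell in bathroom, no injury.",
    2: "Pressure ulcer noted on sacrum.",
    3: "Patient fell from bed; hip fracture suspected.",
    4: "Discharged home, no fall reported.",
}
index = build_index(notes)
first = boolean_query(index, notes, must=["patient"],
                      any_of=["fell", "fall"], must_not=["injury"])

# Add a note and rerun the identical query: earlier matches are preserved.
notes[5] = "Patient fell near nurse station."
index = build_index(notes)
second = boolean_query(index, notes, must=["patient"],
                       any_of=["fell", "fall"], must_not=["injury"])
print(first, second)  # [3] [3, 5]
```

Because each note's membership depends only on its own terms, the result set is auditable: anyone rerunning the query against the same records gets the same list.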
Do you have any specific examples where machine learning technologies have been combined with NLP? The most recent edition of the International Journal of Knowledge Management includes a study in which pattern recognition technology outperformed cosine tf-idf in retrieving patient information. It seems that pattern recognition technologies that detect the inherent semantic value of text would be more useful than a Bayesian or LDS approach. Thoughts?
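For reference, the cosine tf-idf baseline mentioned above ranks documents by the cosine of the angle between tf-idf weighted term vectors. The sketch below is a hedged illustration only: it assumes raw term counts times idf = log(N / df), which is one of several common weighting variants, and the sample notes and helper names are hypothetical. It is not the pattern recognition method from the cited study, which the comment does not describe.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [t.strip(".,;:") for t in text.lower().split()]


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict) per document plus the idf table.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc index, score) pairs sorted by cosine similarity, highest first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical patient notes.
notes = [
    "patient fell in bathroom",
    "pressure ulcer noted on sacrum",
    "patient fell from bed hip fracture",
]
ranking = rank("hip fracture after fall", notes)
print(ranking)
```

Here only the third note shares weighted terms ("hip", "fracture") with the query, so it ranks first and the others score 0.0. Note that "fall" does not match "fell": a purely lexical baseline like this misses such variants, which is the kind of gap that semantic or pattern-based methods aim to close.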