I would like to make some observations about statistics-based categorization and search, and about the advantages that their proponents claim.
First of all, statistics-based co-occurrence approaches do have their place. For wide-ranging bodies of text, such as email archives and social media exchanges, and for assessing the nature of an unknown collection of documents, it is not possible to define in advance a collection of concepts covering a predetermined area of study and practice. Lacking this foundation, and lacking other practical approaches, analysts fall back on less-than-ideal mathematical methods.
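To make concrete what such an approach computes, here is a minimal sketch that counts how often pairs of words appear in the same document. It needs no predefined set of concepts, which is why it can be applied to an unknown collection. The function name `cooccurrence_counts`, the crude regular-expression tokenizer, and the sample documents are hypothetical illustrations. Real systems typically add stopword removal, term weighting, and proximity windows.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of distinct terms shares."""
    pair_counts = Counter()
    for doc in documents:
        # Crude tokenization (an assumption): lowercase alphabetic runs,
        # deduplicated so each document contributes at most once per pair.
        terms = sorted(set(re.findall(r"[a-z]+", doc.lower())))
        # Every unordered pair of terms in the same document co-occurs once.
        for pair in combinations(terms, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sample collection with no concept scheme defined up front.
docs = [
    "invoice overdue payment",
    "payment received invoice",
    "meeting moved friday",
]
counts = cooccurrence_counts(docs)
print(counts.most_common(1))
```

Terms that repeatedly co-occur, such as "invoice" and "payment" here, surface as candidate associations. The method cannot say what those associations mean, however. That is the gap a defined collection of concepts would otherwise fill.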