It is not really possible to say what will happen for everyone everywhere, but we can predict within our own spheres of influence.
Semantic Cross Searching as a Service
Employees are the most valuable assets in any organization. The 1994 Chen study determined that 24% of employee time was spent looking for information. By 2016 that number had climbed to 38%. Employees still spend too much time searching for information. They often need to look in many repositories without a cross search function, so it is one silo at a time. Whether those employees are paid minimum wage or a six-figure salary, there is a cost far beyond the money paid to the employees: government-required employer payments (we pay seven mandated fees like Social Security), enabling the work environment with technology, electricity, telecommunications, office space, etc. – these add up to a significant investment. Bringing the employee “up to speed” via training, learning the organizational culture, etc. also adds to the costs. Once up to speed they can be considered knowledge workers.
In general, the cost of an employee to a small business is two to four times the salary. In technology intensive organizations, or those that are heavily regulated, the factor is much higher. If an employee makes $10,000 per year the cost is $20,000 to $40,000. If they spend 25% of their time searching for information, the cost per employee is at least $5,000 per year, taking the low end of that range. If you have 100 employees, that adds up to $500,000 in lost productivity. If the employees make $100,000 average salary then the cost per year is $5,000,000 for those 100 employees. If they spend 40% of their time searching, then the cost escalates to $8,000,000 per year. What if you could save half that time? The cost savings would be enormous!
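The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function name and parameters are ours, and it assumes the conservative two-times multiplier, which is what the figures in the text correspond to.

```python
def annual_search_cost(salary, headcount, search_percent, cost_multiplier=2):
    """Yearly cost of time spent searching, across the whole workforce.

    cost_multiplier is the assumed ratio of total employee cost to salary
    (2 is the low end of the two-to-four range quoted in the text).
    """
    loaded_cost = salary * cost_multiplier
    return loaded_cost * headcount * search_percent / 100


print(annual_search_cost(10_000, 100, 25))       # 500000.0
print(annual_search_cost(100_000, 100, 25))      # 5000000.0
print(annual_search_cost(100_000, 100, 40))      # 8000000.0
print(annual_search_cost(100_000, 100, 40) / 2)  # 4000000.0 saved if search time is halved
```

Passing cost_multiplier=4 gives the upper end of the range, which doubles every figure.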
The solution is a comprehensive semantic application that will unify information across all organizational silos, making it accessible for research and decision making and enabling knowledge management.
This is why improvements in search, knowledge management, records management, and libraries have such a great upside in the near future. When those knowledge workers described above are finally up to speed and productive, they are worth more to other firms and can walk out the door with immense knowledge in their heads. That investment in manpower is then lost. Knowledge management is in part the systemic effort to capture some of that mobile – as in walk-out-the-door – knowledge. Urgency here will speed innovation and adoption.
Creating Nimble Data Stores
To create searchable collections we must create nimble data stores. The data must be cataloged in some way, but the huge amount of digital material stored in digital archives in most organizations makes this seem insurmountable.
Supposedly unstructured data still has file structure, date stamps, file size, document type, and many other clues to the information held within. Coupling that with a controlled, extensible vocabulary rich with synonymy will surface data not easily accessible now. In 1998 Merrill Lynch suggested that most organizations hold up to 80% of their information within unstructured data. This is a misnomer – it would be better to call it “inaccessible data”. Since this data has the skeletal metadata plus the full text of the items themselves, it is a short step to significant enhancement. Mining the content to add keywords makes that content accessible.
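To illustrate how much skeletal metadata comes for free, here is a minimal sketch using only the Python standard library. The function name and the field names in the returned record are our own choices, not part of any product, and the document type is only guessed from the file extension.

```python
import mimetypes
import os
from datetime import datetime, timezone


def skeletal_metadata(path):
    """Collect the metadata the file system already holds for one file."""
    info = os.stat(path)
    guessed_type, _encoding = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "folder": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "modified": modified.isoformat(),
        # Guessed from the extension only; "unknown" when there is no match.
        "document_type": guessed_type or "unknown",
    }
```

Calling skeletal_metadata on any file in an archive returns a small record that can sit alongside the keywords mined from the full text.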
The process is not complex, although the underlying technology is. First, build or adapt a controlled vocabulary – anything from a thesaurus to taxonomy to ontology to authority files – as appropriate. Then adapt an automatic indexing or tagging system to add conceptually appropriate keywords and tags to the content by item or by subsection. This then enables huge amounts of data to gain visibility and inform better decision-making based on facts and trends surfaced by the data.
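The two steps can be sketched in miniature. The vocabulary below is a made-up three-term thesaurus and the matcher is a simple whole-phrase lookup; a production indexing system would use far richer rules, so treat this as an illustration of the idea rather than a real implementation.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
VOCABULARY = {
    "knowledge management": ["knowledge management", "knowledge sharing"],
    "records management": ["records management", "retention schedule"],
    "taxonomy": ["taxonomy", "taxonomies", "controlled vocabulary"],
}


def tag_text(text, vocabulary=VOCABULARY):
    """Return the preferred terms whose synonyms appear in the text."""
    lowered = text.lower()
    tags = []
    for preferred, synonyms in vocabulary.items():
        for synonym in synonyms:
            # Match whole words or phrases only, not fragments of longer words.
            if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
                tags.append(preferred)
                break
    return sorted(tags)


def tag_by_section(document):
    """Tag each blank-line-separated subsection of a document separately."""
    sections = [s for s in document.split("\n\n") if s.strip()]
    return [tag_text(section) for section in sections]


print(tag_text("Our retention schedule relies on a controlled vocabulary."))
# ['records management', 'taxonomy']
```

Because every synonym maps back to one preferred term, a search on that term finds the item however the author happened to phrase it.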
A comprehensive semantic application will unify information across organizational silos, again making it accessible for research and decision making and enabling knowledge management. A word of caution, though: poorly executed semantic enrichment will only make search worse. Choose carefully.
With the addition of home security you can monitor on your cell phone, Alexa and other systems answering questions throughout the house, Facebook, instant chats, and the like adding to our already constant emails and streams of “news” from the chattering classes, it is difficult to be out of touch. “Unplugged” and “No TV” are increasingly popular types of vacations, but then you have to catch up when you return, which makes re-entry after a week off challenging.
How is it then that people have time to think, consider, or have big thoughts? We are all thinking at the speed of thought as Bill Gates outlined in his book, “Business @ the Speed of Thought”, published in 1999.
In the book, Gates stressed the need for organizations to not consider technology as overhead, but rather as a strategic asset. Almost 20 years later it is a capital asset and has certainly transformed the nature of business. Are we better thinkers now?
We are certainly more connected and generating more data, but we have not organized it particularly well. This is the great new frontier and we have to do it personally – organizing photos and personal email – as well as corporately, organizing the knowledge of the firm and its content assets.
Silence is Golden
Un-connected vacations are trendy, but difficult to pull off; it’s almost impossible not to worry about your inbox while breathing in deeply and slowly exhaling your mantra. To make the deep breathing really beneficial requires that employees and employers know that the information stream is being organized well and that critical, time-sensitive issues are being properly handled. Physical and mental health are directly related to productivity. The brain and the body need rest, but if the brain worries too much about what is, or perhaps more important, what isn’t, happening back at the office, then the brain doesn’t get its rest, and the result is reduced critical and creative thinking on return to the job.
This is not a new issue, but we bravely predict it will get worse before it gets better. A key component of knowledge management is knowledge organization and semantically-driven workflows. Based on sophisticated semantic solutions, the ebb and flow of an organization’s information can be organized automatically, including automatic routing. A semantic analysis of a given content item along with workflow business rules will put it in front of an appropriate AND active knowledge worker, not the one at a yoga retreat. The result is timely actions and a more relaxed, refreshed returning employee. There is a direct, positive impact on cost, revenue, employee morale, and customer satisfaction.
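A toy version of that routing rule might look like the following. The Worker class, the names, and the topic sets are all hypothetical, and the "semantic analysis" is reduced to a set of tags already assigned to the item; the point is only to show the appropriate AND active test.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    topics: set          # subjects this person is qualified to handle
    active: bool = True  # False while on vacation or otherwise away


def route(item_tags, workers):
    """Pick the active worker whose topics best overlap the item's tags."""
    candidates = [w for w in workers if w.active and w.topics & item_tags]
    if not candidates:
        return None  # nobody suitable is available; escalate by other means
    return max(candidates, key=lambda w: len(w.topics & item_tags))


staff = [
    Worker("Ana", {"contracts", "billing"}, active=False),  # at the retreat
    Worker("Ben", {"contracts"}),
]
chosen = route({"contracts", "billing"}, staff)
print(chosen.name)  # Ben
```

Ana is the better topical match, but the business rule excludes her while she is away, so the item goes to Ben.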
Seeking “truth” – whatever that means – also requires ferreting out non-truths. Both “truth” and “non-truth” are topics best reserved for philosophers. Volumes have already been written. Braving that, we’ll say the near-term future looks bleak. Information of all kinds can be distributed instantaneously and globally. Fortunately, tools already exist to facilitate the truth seeker, and these tools are quickly improving. Semantics improve precision and recall in search. Filters also help reduce volume, but filtering is a two-edged sword. It does cut down on the noise, but it can screen out critical content, too. A search service that purports to know what you want can leave you with the feeling that you’ve missed something. Withholding information is as much a non-truth as an accidental mathematical error in a report. Relevancy is a failed strategy. Stick with improving search through taxonomic and semantic enrichment because this approach improves precision and recall. Precision and recall do a much better job of reducing the overload, and do so using unbiased, impartial techniques, thus avoiding the downside of filtering.
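For readers who want the two measures pinned down: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for one search; the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents came back; three documents were actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d9"])
print(p)  # 0.5 (two of the four results were relevant)
print(round(r, 2))  # 0.67 (two of the three relevant documents were found)
```

An aggressive filter can raise precision while quietly lowering recall, which is the two-edged sword described above.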
There is much written in the press suggesting that associations will diminish. Information and library organizations seem to struggle to keep attendance and volunteerism on a growth curve. As the costs of airfare, hotels, and registration fees go up, conference attendance goes down.
Interestingly, people in computer sciences, optics, and cancer research, for example, are expanding their meetings, as are other fast-moving industries. Forecasts of attendance growth range from 2.7% to 3.8% depending on the forecaster, but attendance is definitely on the increase. I believe the composition of associations and their meetings will change with the times, but the instinct to get together and discuss things is quite basic.
Publishers and other holders of content like government organizations, corporations, and others are realizing that there is a great deal of usable material in their archives, if they could only search it. Creating searchable repositories, using federated search to cross content silos, and then enriching the content with hooks like content tags or keywords are all growing rapidly. Whether it is creating triple stores, ontologies, or taxonomies, the key is straightforward, consistent tagging of the corpus. The popular method in which this is done swings like a pendulum from natural language processing (NLP) to rules-based and back again. At the moment, rules-based is ascendant since it is replicable and the searcher can find the same information they found in a previous search – plus the most recent updates. On the other hand, NLP options such as Bayesian, neural net, vector-based, and other approaches have to be reset with each addition of new content.
There is and will continue to be a proliferation of standards, especially for identifiers. Rather than have a class of identifiers with subclasses, the industry seems to be on a trajectory to add additional identifiers for everything. Originally, the digital object identifier (DOI) might have filled this role. DOIs cost real money to add, but in spite of the costs, numerous groups have come together to build new sets of identifiers for a seemingly unlimited number of objects and object properties. This proliferation will continue, and while identifiers are very useful, when there are too many, costs rise and user confusion ensues. Third-party cross-referencing and reconciliation services may be an up-and-coming mini-industry, as may intelligent apps to automate the assignment of identifiers.
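As a rough illustration of why consistent tagging matters for crossing silos, consider the sketch below. The silo names, record identifiers, and tags are invented, and real federated search involves connectors, ranking, and access control that are left out here.

```python
def federated_search(tag, silos):
    """Look for one tag in every silo and report where each hit lives.

    silos maps a silo name to a list of records, each with "id" and "tags".
    """
    hits = []
    for silo_name, records in silos.items():
        for record in records:
            if tag in record["tags"]:
                hits.append((silo_name, record["id"]))
    return sorted(hits)  # stable ordering, so repeat searches look the same


# Hypothetical silos whose content carries tags from one shared vocabulary.
silos = {
    "archive": [{"id": "A-17", "tags": {"optics", "patents"}}],
    "intranet": [
        {"id": "W-03", "tags": {"optics"}},
        {"id": "W-08", "tags": {"billing"}},
    ],
}
print(federated_search("optics", silos))
# [('archive', 'A-17'), ('intranet', 'W-03')]
```

Running the same query again returns the same hits, plus anything newly tagged, which is the replicable behavior attributed to rules-based tagging above.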
Go Boldly Into 2017
There is a lot ahead in 2017 and we can predict that much of what has already taken place will help shape the future. That is why trends can be seen, if not actual events. The volume, velocity, and variety of content have been accelerating in accordance with Moore’s Law and will continue to do so. Work has been going on for nearly a hundred years to combat the challenge of the information avalanche; the print version of “Psychological Abstracts” began in 1927. Services continue to proliferate online and the knowledge gained in producing these valuable reference services has been advancing rapidly in the digital age. That knowledge is applicable to any organization’s content avalanche. We encourage you to start now because – look out – the avalanche has already started.
Photos: https://commons.wikimedia.org/wiki/File:Magnifying_glass_with_focus_on_glass.png; https://www.flickr.com/photos/30478819@N08/30478364625; https://www.flickr.com/photos/norgesbank/9573447091