Wide-Ranging Interview Touches on Taxonomies, Term Mining and The Cloud

During her 33 years in the search industry, Margie Hlava, president of Access Innovations, has seen a lot of trends come and go. Today’s changing information environment, with its ever-growing avalanche of data and critical need for better search and more efficient organization of content, presents some unique challenges.

Hlava recently shared her views with Steve Arnold, a technology and financial analyst who has more than 30 years of experience, as part of his Search Wizards Speak series. Access Innovations, Hlava’s company, is releasing an excerpt of the interview.

Hlava described today’s information problems as “popping up like the ‘Whack a Mole’ game.” She explained, “Our clients continually need to know what they have, where to find it and be able to reuse, repackage that information on demand. With the explosion of ‘big data,’ new solutions are needed for transforming, parsing, and accessing information.”

In terms of the future, she cited three trends she believes are most important: 1) the shift to Cloud or SaaS computing; 2) term mining; and 3) the growing “appetite” for taxonomies and other metadata enrichment.

The way the world is doing computing has changed, Hlava said, because we are now supporting people remotely. “There is the cloud (through a browser) or a SaaS approach (using a client on the user machine and main software on the host).” While both suggest a general direction for the future, they do not mean less computer power is needed. “In fact, more computing capability is needed as well as more bandwidth. Since the advent of the iPhone, there are many more dropped calls now that there are more apps and pictures being moved on the internet. The system is struggling to cope with increased data flows,” she explained.

Cloud and SaaS approaches may free users from debilitating waits for service and project implementation by internal information technology departments. “No one can usurp the marketing department’s servers for a higher priority job. New approaches mean enabling the user, not controlling access to data and systems,” she said.

Hlava noted her company is seeing a very fast change from internal hosting to Web hosting of data and applications. She added, “the cloud shift also means we do not have to deal with as many strange internal configurations, firewalls, and other challenges in getting a customer up and running . . . the device they use on the client side can be flexible and small.”

She predicted that the move to the Cloud and SaaS will add greater potential for data sharing and service enhancement, and that there will be more options for maintaining security and intellectual property rights.

Hlava defined term mining as a process of conceptual extraction that applies thesaurus terms and their synonyms through a rule base, then looks for occurrences to create more detailed data maps. She said Access Innovations has achieved “excellent results” so far using large data sets. “We process several million records and then, using our methods, we boil them down to trend analyses, data forecasts, and similar types of reports or outputs.”
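As a rough illustration of the term mining idea described above, the following Python sketch maps synonyms to preferred thesaurus terms and counts how many records mention each concept. It is a minimal sketch under stated assumptions: the thesaurus entries, the sample records, and the function names are hypothetical, and the simple pattern-matching rules stand in for a rule base the interview does not describe. It is not the Access Innovations or Data Harmony method.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: preferred term -> synonyms (illustrative only).
THESAURUS = {
    "cloud computing": ["cloud", "saas", "software as a service"],
    "taxonomies": ["taxonomy", "thesaurus", "controlled vocabulary"],
}


def build_rules(thesaurus):
    """Compile one case-insensitive, whole-word pattern per preferred term."""
    rules = {}
    for preferred, synonyms in thesaurus.items():
        # Longest variants first so multi-word phrases are tried before short ones.
        variants = sorted({preferred, *synonyms}, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        rules[preferred] = re.compile(pattern, re.IGNORECASE)
    return rules


def mine_terms(records, thesaurus):
    """Count how many records mention each preferred term or one of its synonyms."""
    rules = build_rules(thesaurus)
    counts = Counter()
    for text in records:
        for preferred, rule in rules.items():
            if rule.search(text):
                counts[preferred] += 1
    return counts


# Hypothetical sample records standing in for a large data set.
records = [
    "Our SaaS platform moved to the cloud last year.",
    "A good taxonomy improves search.",
    "The thesaurus and the cloud service were updated.",
]
term_counts = mine_terms(records, THESAURUS)
```

Counting per-concept record frequencies in this way, over dated records, is one simple starting point for the kind of trend report mentioned in the interview; a production rule base would also need to handle ambiguity and context.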

Hlava stated she also is intrigued by current industry trends such as mashups, data fusion and visualization. “Each can lead to beautiful results making the interpretation of data easier; however, if the data underneath are poorly tagged, then they reflect nothing meaningful. Rotten data gives rotten results.”

She cautioned that other newer technologies like linked data and personalization are also interesting but depend even more heavily on well-formed data underneath.

The growing appetite for taxonomies and other metadata enrichment is understandable, Hlava said. She described search as “like having to stand in a long line waiting to order a cold drink on a hot day. There’s always dissatisfaction because ‘search’ stands between you and what you want. You want the drink but hate the line. I think the reason controlled indexing (taxonomy or thesaurus) is so popular compared to the free ranging keywords is that you have control. They make moving through the line efficient. You know how long the wait is and what terms you need to achieve the result.”

In terms of taxonomies and search, “I think we have just scratched the surface. With good data, our clients are in a good position to do an incredible array of new and interesting things. Good taxonomies take everything to the next level, forming the basis of not only mashups, but also author networks, project collaborations, deeper and better information retrieval,” she concluded.

For the complete interview Hlava gave to Arnold, visit Arnold’s website, www.arnoldit.com.


About Access Innovations – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation and semantic integration. The Access Innovations Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments and corporate clients throughout the world.

About Stephen Arnold and ArnoldIt.com – www.arnoldit.com

Stephen Arnold is a technology and financial analyst with more than 30 years of experience. He has extensive operational and entrepreneurial experience and is able to bridge the gap between new ideas and the financial implications of a technology. He is the author of six books and over 50 journal articles.

Arnold has provided technical, financial, and strategic support for many technology projects. In 2000, he helped develop the plan, architecture, and security guideline concepts for First-Gov.gov, the official gateway to U.S. governmental information. He was a member of the planning team for USWest’s electronic yellow pages. He has worked for a number of intelligence and enforcement organizations, including the US Senate Police and the Office of Management & Budget, among others. He was a member of the team that developed the Threat Open Source Intelligence Gateway, an initiative of a US Federal agency.

Mr. Arnold’s Web log, “Beyond Search,” is a widely read collection of critical commentary and opinion about information systems and methods. It is available at http://www.arnoldit.com/wordpress.