The link between business and information technology is the data, information, and process assets that are stored and automated through technical tools. This blog post suggests the first steps toward governing and managing these important assets before tool implementation, helping to avoid the too-common “graveyards” of expensive, underused tools.
Identify business critical information and data
In order to get past the confusion of rapidly evolving types, formats, risks, and tools, first identify the most important information and data assets for your organization and start treating them like assets. These assets may already be known but not documented, or identifying them may require chartering and funding a project. Critical information and data assets vary widely across organizations and departments. They need to be based on the core products, expertise, and risks of an organization, which also may need to be identified. For example, data from production machinery and its interpretation could be an unrecognized competitive asset. In other cases, information and data may not yet be regarded as critical assets, but regulatory scrutiny may be about to change that perception.
The identification and listing of important information and data assets should include brief descriptions, the most recent owner, and a relative value. This high level overview is intended to enable discussions about assets, prioritization of work and investments, and the creation of general policies. It should not be confused with the detailed, time-consuming asset management inventories for which records managers and librarians are trained. It is, however, the first step toward governance and “thoughtful localization and organization,” proven techniques which can later be the basis for advanced management techniques such as developing and using metadata, taxonomies, and controlled vocabularies. The overview can employ simple, existing tools such as a spreadsheet or database that can aid in analysis and produce reports.
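As a minimal sketch of such an overview, the list could begin as a handful of records that are ranked and exported to a spreadsheet-friendly CSV file. The three-level value scale, the field names, and the example assets below are all assumptions made for illustration; they do not come from any particular organization or tool.

```python
import csv
import io
from dataclasses import dataclass, asdict

# Hypothetical relative value scale; the overview only calls for "a relative value".
VALUE_RANK = {"high": 3, "medium": 2, "low": 1}


@dataclass
class Asset:
    name: str
    description: str
    owner: str           # most recent known owner
    relative_value: str  # one of the keys in VALUE_RANK


def prioritize(assets):
    """Return assets sorted from highest to lowest relative value."""
    return sorted(assets, key=lambda a: VALUE_RANK[a.relative_value], reverse=True)


def to_csv(assets):
    """Render the overview as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "description", "owner", "relative_value"]
    )
    writer.writeheader()
    for asset in assets:
        writer.writerow(asdict(asset))
    return buffer.getvalue()


# Illustrative entries only.
register = [
    Asset("Customer contracts", "Signed agreements and amendments", "Legal", "medium"),
    Asset("Machine sensor data", "Readings from production machinery", "Operations", "high"),
    Asset("Cafeteria menus", "Weekly menu postings", "Facilities", "low"),
]

ranked = prioritize(register)
report = to_csv(ranked)
```

A list this small is enough to start the prioritization discussion; detailed inventories can come later if they prove to be justified.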
The first goal is to initiate discussions about how information and data assets support organizational strategies and to determine what governance and management programs are needed. Governance is the exercise of control over multiple operations through accountability frameworks and priorities. It may take some time to build out all the needed policies and measurements regarding decision rights, alignment, and communication, but the discussions will get the work started. Management, which is the exercise of control over day-to-day operations, decisions, work, people, or things, will come later and will comply with governance policies.
Assign an Information and Data Governance Focal Point
Responsibility for information and data governance needs to be assigned if progress is expected, even if the organization is not ready to fund a full-scale program. A part-time person can be responsible for the information and data asset list, act as an authenticating gatekeeper for changes, and make sure that it is discussed at appropriate high-level meetings. With a little additional time, the assignee could set up and publicize a mailbox or shared site for collecting issues, ideas, and needs, compile them, and recommend projects that are worthy of investment.
The steps above are the beginning and will help to determine where effort, investment, and tools can be justified and what should be accomplished. Much additional work is needed to realize more significant competitive advantages, provide complete functional requirements for tools, and meet regulatory requirements.
Keeping in mind the principle that information is best understood and used by its primary users, governance, standardization, normalization, and coordination may be needed across departments to achieve strategic quality and integrity goals. In addition, specific, detailed, ongoing programs and organizations may be needed for information and data management, funded to evolve, grow, and change as uses, formats, and values fluctuate with business and regulatory changes.
Incrementally, over time, or as the result of concentrated, planned projects, a deeper understanding of needs can be achieved, and more advanced management techniques can be justified, funded, and adopted. Examples include more advanced techniques for asset valuation, cooperative metadata and vocabulary adoption and use, development of competitive information and data techniques, and strategic asset based service level agreements with vendors and operating level agreements with internal groups.
In most cases, over time, it will be beneficial to incorporate principles, standards, and best practices from a variety of complementary disciplines which have found successful ways to deal with the issues – records management, information science, library science, ISO, ANSI, related industries, project management, organizational change, and COBIT and ITIL frameworks for IT governance and management.
Watch future blog postings for more details on this subject.
Judith Gerber (guest blogger), JGG Enterprises
Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.
Heather Kotula, a long-time employee of Access Innovations, Inc., has recently been promoted to the position of DHUG (Data Harmony Users Group) Meeting and Marketing Coordinator. Heather is one of many faces of fresh, young talent at Access Innovations, and her promotion can only mean good things for the company.
Ms. Kotula started with Access Innovations in 1995, and has since filled many positions within the company and seen it grow over many years. She has worked in finance, as a project manager, office manager, and Vice President of Operations. Her versatility in her early years at Access Innovations gave her a strong background and knowledge of the company, and these have translated into new marketing ideas and skills that will propel the company forward.
Ms. Kotula has coordinated the past three DHUG meetings, and her new position within the company puts her at the forefront of new ideas for the annual DHUG gatherings. These meetings include two days of case studies and presentations and three days of software training. Access Innovations’ clients from around the world attend to present case studies, get training in the use of the Data Harmony software, network with other users, and become acquainted with the team at Access Innovations. As well as putting these meetings together, she handles and oversees the marketing endeavors at Access Innovations. She is always working to keep the company up to date, as well as providing ways to communicate the company’s products and services to the world to benefit others.
“The changes in the world of information over the past 20 years are astounding,” commented Ms. Kotula recently. “Since the founding of the company in 1978, Access Innovations has been preparing and waiting for the ‘Information Age’ to arrive. We have the best tools and an unparalleled breadth of experience in taking information from archive to actionable asset.”
Heather received her Bachelor’s degree in Distributed Foreign Languages from the University of New Mexico in 1991. Before receiving her degree, Heather also studied German at the Goethe-Institut in Munich and attended the Scuola Dante Alighieri in Florence, Italy, where she received a Certificate of Fluency in Italian. Heather received her Master’s degree in Business Administration from New Mexico State University in 1995.
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. The Access Innovations Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
As an IT process and governance consultant, I see a large number of software tools for managing knowledge, information, and data. The choices between vendors and in-house development options seem endless. Evaluation can be challenging because descriptions of purpose, use, and comparative approach are often not clear, standardized, or based on research.
Recently I heard a well-known software vendor describe one of its popular and respected products as being able to deliver far beyond its designed capabilities of content tracking and retrieval across multiple platforms. It was described as also able to “accelerate, automate, and maintain compliance with core business processes.” There was no mention of the significant management work required to make this happen. Most organizations are not even close to the needed level of defined processes, policies, measurements, and organizational roles.
I also recently heard a talented technical manager describe how he developed a taxonomy for his specialty, but he was unaware of existing taxonomy tools, standards, or related taxonomies with which his proprietary tool needs to interoperate for long-term success.
I am part of the IT community where management and historical perspectives are not always adequately evaluated before a technical solution is considered. Consequently, I have seen far too many graveyards of expensive tools that never met their potential and were discarded.
My heart goes out to those who feel overwhelmed by a need to do “something” with increasing numbers of content types, formats, devices, and security or litigation risks. Growing amounts of content, vaguely written regulations and laws, and nebulous but formidable concepts like “Big Data,” “The Cloud,” and “Dark Data” add complexity. It is understandable why an easy one-tool solution is attractive. Nevertheless, it is not productive or necessary to keep buying more expensive tools that will only be discarded. There are better ways of addressing the problems.
Establish Governance and Management Processes First
Establishing basic governance and management processes for knowledge, information, and data is essential for informed decisions about tool purchase, configuration, coordination, and ongoing content viability and validity. The seemingly simpler choice of following what a tool vendor suggests for processes usually complicates the enterprise business environment. Designers of general and industry tools have no way of knowing specific business details or organization strengths, which are often competitive advantages.
Use What is Already Known
Having “too much information” is not new. Humans have survived by processing large amounts of information in the subconscious brain while concentrating the conscious mind on the most pressing external business. By around 2500 BC, libraries had begun as shared repositories, and with them came the first thinking about how to manage “too much information.”
Marjorie M.K. Hlava, President, Access Innovations, states in a May 27, 2013, TaxoDiary blog post, “We librarians and information specialists get to view anarchy in the universe more often than other people do. And we are the ones who have the job of putting the universe into some sort of order. With a thousand points of knowledge…”
What has been learned by facing “a thousand points of knowledge” head-on applies to the terabytes, exabytes, zettabytes, and yottabytes we now face. They all seemed boundless when first encountered, but can be bounded with thoughtful localization and organization.
During my earlier career as a corporate librarian and records manager, I learned information science models that combined thousands of years of thought with current research and technology for addressing “too much information.” A few examples paired with initial action steps follow:
- Information is best understood and used by its primary users
Define core organizational products, areas of expertise, and risks. Focus and fund knowledge, information, and data work in these important areas.
- Information is a definable, manageable asset
Define the knowledge, information, and data assets needed to produce core products and maintain expertise, dividing the assets into manageable, coordinated groups.
- Managed metadata can describe information so that it can be found and/or linked
- Controlled, agreed vocabularies in areas of specialty can greatly enhance retrieval
Define and assign organizational entities and roles to make and maintain the asset definitions, decision rights, priorities, growth needs, agreed metadata schemas and vocabularies, measurements and reports.
- Information does not follow the rules of thermodynamics – it grows when it is used
Plan for growth, change, coordination and interoperability by using existing standards and making use of what is
Watch future blog postings for more details on this subject.
Judith Gerber (guest blogger)
The bell struck twelve.
The Phantom slowly, gravely, silently, approached. … It was shrouded in a deep black garment, which concealed its head, its face, its form, and left nothing of it visible save one outstretched hand. But for this it would have been difficult to detach its figure from the night, and separate it from the darkness by which it was surrounded.
(From A Christmas Carol in Prose; Being a Ghost Story of Christmas, by Charles Dickens.)
And so it is with emerging concepts, those concepts whose forms we can but vaguely discern at the present point in time, whose true reality lurks in the future.
As taxonomists, we have a responsibility to discern those future concepts, although they may still be invisible to most. We can save the various expressions of those concepts in search logs from being rejected from consideration for a vocabulary simply on account of their as yet infrequent appearance. In a taxonomy or thesaurus, we can provide labels that will consolidate the indexing for a concept whose researchers have not yet settled on a name. In some cases, especially with widely used vocabularies, we can perhaps determine the name by which a concept will be known on a standard basis.
This role in itself is one of the emerging responsibilities for taxonomists, thanks to the rapid advances in science and technology. In “What Next, Taxonomy?” (posted on The Taxonomy Blog on November 4, 2011), taxonomist Marlene Rockmore concludes that taxonomists need to deal with emerging technologies in a variety of ways, including collection of relevant content:
“So what next, taxonomy? What is nice to hear is that more taxonomists are surviving because their organizations understand their core roles. What’s the emerging topics and challenges – how to distribute and decentralize (localize) while having authority and control, how to collect new content on emerging, current topics, visualization, how to be more agile, how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms, but with stakeholders on the way to a compelling, visualized, multidimensional content strategy. Good luck.”
This challenge has been growing in step with the rapid advances in science and technology. One example among the many advances in science is the ability of biologists to recognize new and emerging species, as well as life forms that have existed for a while but were formerly overlooked. The Live Science page Newfound Species observes:
“Science has identified some 2 million species of plants, animals and microbes on Earth, but scientists estimated there are millions more left to discover, and new species are constantly discovered and described. The most commonly discovered new species are typically insects, a type of animal with a high degree of biodiversity. Newly discovered mammal species are rare, but they do occur, typically in remote places that haven’t been well-studied previously. Some animals are found to be new species only when scientists peer at their genetic code, because they look outwardly similar to another species — these are called cryptic species. Some newfound species come from museum collections that haven’t been previously combed through and, of course, from fossils.”
Even the humble hosta has its own emergings, due in part to technological and social advances in communication.
“In past centuries, we used to talk about people “discovering” new species of plants. What this usually meant was that European, English or American plant explorers traveled to remote parts of the world and found plants that were new to them. Now, of course, we know that local people in those other parts of the world were often quite familiar with these plants all along. Many of the so-called new plants, including hostas, have been found in local paintings and documents produced long before the Westerners started poking around. In more recent times, however, with better communications, we more universally share the knowledge of different horticultural communities.”
As far as actually emerging species are concerned, evolutionary biologist Rob DeSalle of the American Museum of Natural History has indicated the continuing nature of species emergence:
“Identifying a new species as it emerges is the holy grail of evolutionary biology. … Species must be emerging someplace on earth. The best places to look would be places with lots of species, like rain forests, and islands, because isolation opens new niches.” (In “Q & A; Emerging Species” by C. Claiborne Ray, published June 17, 2003 in The New York Times)
The ScienceDaily website has a webpage dedicated to news about “new” species of plants and animals. While most of these will escape public awareness, Time Magazine has sifted through the barrage of information to identify the “Top 10 New Species” of 2013.
Speaking of top things of 2013, and moving on to emerging technologies, the Massachusetts Institute of Technology’s online Technology Review has published a list of “10 Breakthrough Technologies 2013”. A quantum internet that Los Alamos National Laboratory has been running, covered in the Technology Review’s “Best of 2013” (December 23, 2013), is one of many significant technologies that didn’t make the list, perhaps because the system has been running for the past two years.
The Wikipedia article “Emerging technologies” emphasizes the role of technology convergence in the emergence of new technologies. The article mentions an acronym of particular interest to those in the information technology world:
“NBIC, an acronym for Nanotechnology, Biotechnology, Information technology and Cognitive science, is currently the most popular term for emerging and converging technologies, and was introduced into public discourse through the publication of Converging Technologies for Improving Human Performance, a report sponsored in part by the U.S. National Science Foundation.”
Wikipedia also has a “List of emerging technologies” containing brief descriptions of “some of the most prominent ongoing developments, advances, and innovations in various fields of modern technology.” More than two hundred emerging technologies are listed.
There are and will continue to be many new and emerging concepts in science, technology, and other fields. Taxonomies can help define the terminology for those concepts. This is perhaps most readily evident for genus-species-subspecies-etc. names, whose designation is the territory of the biological taxonomist, or the biologist temporarily acting as taxonomist. Elsewhere, taxonomists can identify predominant labels and the occasionally used synonyms, and then use that information to add appropriate preferred terms and non-preferred synonyms to a vocabulary. They can also add definitions and scope notes. The skills of the taxonomist can bring clarity to formerly mysterious concepts and nomenclature.
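The step of turning observed labels into preferred terms and non-preferred synonyms can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `propose_entry` and the sample labels are hypothetical, the variants are assumed to have been grouped by a taxonomist as expressions of one concept, and frequency is used as a simple heuristic rather than a rule.

```python
from collections import Counter


def propose_entry(variant_labels):
    """Suggest a vocabulary entry from labels already judged to name one concept.

    The most frequent label becomes the candidate preferred term; every other
    label is kept as a non-preferred synonym instead of being discarded for
    appearing rarely.
    """
    counts = Counter(label.strip().lower() for label in variant_labels)
    # Sort by descending frequency, then alphabetically so ties resolve predictably.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {"preferred": ordered[0], "non_preferred": ordered[1:]}


# Hypothetical expressions pulled from a search log for a single emerging concept.
log_labels = [
    "quantum internet",
    "Quantum Internet",
    "quantum network",
    "quantum internet",
    "qinternet",
]

entry = propose_entry(log_labels)
```

The output is a proposal for the taxonomist to review. The final choice of preferred term, along with any definition or scope note, remains an editorial decision.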
No fog, no mist; clear, bright, jovial, stirring, cold; cold, piping for the blood to dance to; Golden sunlight; Heavenly sky; sweet fresh air; merry bells. Oh, glorious! Glorious!
So don’t be scared of the ghosts of future concepts. Think of them as true spirits of the future, taking flight with the benefit of well-chosen terms and synonyms in a taxonomy or thesaurus.
Every time a new term rings true, an emerging concept gets its wings.
Barbara Gilles, Taxonomist
Much they saw, and far they went, and many homes they visited, but always with a happy end. The Spirit stood beside sick beds, and they were cheerful; on foreign lands, and they were close at home; by struggling men, and they were patient in their greater hope; by poverty, and it was rich.
(From A Christmas Carol in Prose; Being a Ghost Story of Christmas, by Charles Dickens. Illustrations by John Leech.)
Wouldn’t it be splendid if, in the ‘spirit’ of Dickens’ Ghost of Christmas Present, we could use taxonomies to accomplish the same things?
- Cure and eradicate sickness
- Promote international understanding
- Promote justice and social harmony
- Lessen and eradicate poverty
Admittedly, these are lofty goals. As it happens, taxonomies can help us accomplish these things. As taxonomist Alice Redmond-Neal has pointed out, “Verbalizing a concept identifies it, gives it substance, and makes it recognizable.” Taxonomies enable us to all agree on what we’re talking about, which can help us identify, quantify, and deal with problems.
Never underestimate the power of a taxonomy! Let’s take a short tour of taxonomies that reflect the spirit and intent of Dickens’ ghost.
“The Spirit stood beside sick beds, and they were cheerful”
The Public Library of Science (PLOS) has a large thesaurus reflecting the content of their digital library. (We at Access Innovations are very familiar with this thesaurus, as we helped develop it in its current form.) Most of PLOS’s journals focus on biological topics. Several of these journals present research related to disease control methods and eradication efforts:
- PLOS Medicine
- PLOS Pathogens
- PLOS Neglected Tropical Diseases
The last-named journal is especially noteworthy in that it offers a publication platform for researchers in third world countries who may have no opportunity for publication elsewhere. PLOS is probably the main publisher of articles on neglected (or at least previously neglected) tropical diseases.
Since the PLOS thesaurus was constructed to reflect the scope and depth of PLOS articles, the thesaurus covers hundreds of terms relevant to disease control methods and eradication efforts. The thesaurus serves as a basis for indexing the articles. As such, it guides searchers to information that can be used in current research, as well as information for healthcare providers and government officials to apply in disease control and eradication efforts.
“The Spirit stood … on foreign lands, and they were close at home”
Probably the best-known organization concerned with international understanding and cooperation is the United Nations. It’s fitting that they have a thesaurus, and a multilingual one at that, in all the official languages of the United Nations: Arabic, Chinese, English, French, Russian and Spanish.
“The multilingual UNBIS Thesaurus, created by the Dag Hammarskjöld Library, United Nations Department of Public Information, contains the terminology used in subject analysis of documents and other materials relevant to United Nations programmes and activities. It is used as the subject authority of the United Nations Bibliographic Information System (UNBIS) and has been incorporated as the subject lexicon of the United Nations Official Document System. It is multidisciplinary in scope, reflecting the Organization’s wide-ranging concerns. The terms included are meant to reflect accurately, clearly, concisely and with a sufficient degree of specificity, matters of importance and interest to the United Nations.”
“The Spirit stood … by struggling men, and they were patient in their greater hope”
As HURIDOCS describes itself, it is “an international NGO [non-governmental organization] helping human rights organisations use information technologies and documentation methods to maximise the impact of their advocacy work.” Of potential interest to taxonomists, “HURIDOCS is also an informal, open and decentralised network of human rights organisations who wish to put together their experiences and creativity to develop common standards and tools for information management.”
One of those tools is a set of small thesauri, in a collection named “Micro-thesauri : a tool for documenting human rights violations”.
“This collection of 48 lists with terminology was developed by HURIDOCS or adapted from a variety of authoritative resources. The Micro-thesauri are intended for use in conjunction with HURIDOCS Standard Formats manuals, and in particular with the HURIDOCS Events Standard Formats: a tool for documenting human rights violations.
“The Micro-thesauri can be used as a starting point for developing one’s own index terms for libraries and documentation centres, as keywords for organising information on websites, or as controlled vocabularies for databases to record violations.
“They have been translated into the following languages, often by volunteers: English, French, Spanish, Arabic, Russian, Portuguese, and Bahasa Indonesia.”
“The Spirit stood … by poverty, and it was rich.”
The Oxford Poverty and Human Development Initiative (OPHI) is an economic research center of the Oxford Department of International Development, at the University of Oxford. The center’s goal is “to build and advance a more systematic methodological and economic framework for reducing multidimensional poverty, grounded in people’s experiences and values.” OPHI explains multidimensional poverty as follows:
“Most countries of the world define poverty by income. Yet poor people themselves define their poverty much more broadly, to include lack of education, health, housing, empowerment, humiliation, employment, personal security and more. No one indicator, such as income, is uniquely able to capture the multiple aspects that contribute to poverty.”
OPHI has identified various aspects of poverty, grouped into five “missing dimensions” of poverty “that deprived people cite as important in their experiences of poverty”:
- Quality of work
- Empowerment
- Physical safety
- Social connectedness
- Psychological wellbeing
While OPHI does not call their dimensional framework a taxonomy, it can certainly serve as one.
The Moral of This Posting
Use the power of the taxonomy! And as Obi-Wan Kenobi said in the movie Star Wars, “Use your power for good, not evil.”
And make sure your taxonomy gets used. Call attention to it, or to the search platform that it’s integrated with.
“Sometimes you have to…
SLAP them in the face just to get their attention.”
Carol Kane as The Ghost of Christmas Present, with Bill Murray, in the movie Scrooged (1988), written by Mitch Glazer and Michael O’Donoghue
Barbara Gilles, Taxonomist
Bob Kasenchak, formerly a project manager at Access Innovations, Inc., has recently been promoted to Production Coordinator within the company. Bob is a great representation of the fresh talent at Access Innovations. His extraordinary people skills, organizational skills, and leadership qualities make him an excellent fit to oversee editorial endeavors.
After receiving his bachelor’s degree in liberal arts from St. John’s College in Santa Fe in 1995, Bob went on to the New England Conservatory of Music in Boston to receive a master’s degree in music theory in 2001, and spent several years pursuing doctoral studies in music theory at the University of Texas. Bob remains an expert in all things music at Access Innovations.
Bob has been working at Access Innovations for two years, rising through the ranks with his quick wit and experience. He has worked on and managed projects for clients such as JSTOR, the Education division of McGraw-Hill, AAAS (the American Association for the Advancement of Science), and ASCE (the American Society of Civil Engineers), among others. His experience with taxonomy and thesaurus development and his knowledge of editorial workflows and challenges make him a skilled coordinator. He shares his knowledge and expertise both with clients and with the editors of Access Innovations as he ensures that projects are running smoothly and efficiently.
“I’m proud to serve on the Access Innovations team,” remarked Bob recently. “It’s exciting and fulfilling to play a role in the cutting-edge services we provide to our clients. I am especially interested in projects involving author and named entity disambiguation like the work we did for AIP’s Scitation offering.”
Bob has given presentations at several industry conferences over the last couple of years as well as presentations to clients and potential clients. He presented a paper at the 2013 Taxonomy Boot Camp in Washington D.C., and attended the 2013 Frankfurt Book Fair, meeting with many past and present Access Innovations clients.
“Leave me! Take me back. Haunt me no longer!”
In the struggle, if that can be called a struggle in which the Ghost with no visible resistance on its own part was undisturbed by any effort of its adversary, Scrooge observed that its light was burning high and bright; and dimly connecting that with its influence over him, he seized the extinguisher-cap, and by a sudden action pressed it down upon its head.
The Spirit dropped beneath it, so that the extinguisher covered its whole form; but though Scrooge pressed it down with all his force, he could not hide the light: which streamed from under it, in an unbroken flood upon the ground.
There are a good many professional organizations that take their knowledge organization and dissemination responsibilities seriously. One good example, appropriately enough, is the International Society for Knowledge Organization (ISKO), which maintains a free online bibliographic service covering knowledge organization literature.
According to a recent communication from the organization, the ISKO literature database has been enhanced to cover nine more years into the past. This extension to include older research involved complex analysis and conversion of data that was in old formats. Webmaster Claudio Gnoli explained why ISKO made this effort:
“We hope that this service can contribute to strengthen knowledge organization as a full, consistent scientific field. Let us encourage students and researchers to start their work by looking at what has been published in the past.”
That last sentence merits some reflection. It might seem obvious that previously published research would be a logical starting point for subsequent research on the same topic, or on a similar one. So why would students and researchers need encouragement to start there?
One reason could be that they are bombarded with current research. This could be research from recently set-up online platforms, research by their networks of current colleagues, and RSS feeds on hot topics in their fields. While keeping up with current research certainly is commendable, overrating it could lead to a telescoped view of the relevance of prior research, even when that research was published within the researcher’s lifetime. For a young field such as information science, while the amount of research material is expanding rapidly, the foundations may have been established within the last few decades. And they may still be worth mining.
“I am the Ghost of Christmas Past.”
“Long Past?” inquired Scrooge: observant of its dwarfish stature.
“No. Your past.”
Another reason for ignoring past research is intentional rejection of older observations, purely for the sake of adhering to a “modern” approach to research and description. (This, IMHO, is a major failing of many post-modernist practitioners, who consciously and deliberately ignore history and historical accounts.)
“What!” exclaimed the Ghost, “would you so soon put out, with worldly hands, the light I give?”
Another possible reason that researchers ignore old research is that when they have tried to access it, they haven’t been able to. Not all professional organizations have expanded their databases backwards in time. Those that try can face obstacles, some of them huge. As we’ve seen with ISKO, the expansion can involve dealing with old formats, which might be a one-time technical issue (at least until the next major change in formatting). And then there may be overall budget and time limitations, especially with organizations that rely heavily on volunteer staff. Additionally, the shifting terminology of some research fields can cloud the search for past sources of illumination.
These obstacles are surmountable, but they require attention from those who maintain the research databases. Formats can be converted. And thesauri can accommodate older terminology, while offering flexibility to accommodate future terminology. Likewise, future guardians of research databases will need to pay the same kind of attention to these kinds of matters, so that current research is available to those who, like Scrooge, may discover valuable information that otherwise might have remained hidden.
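To make the point about thesauri concrete, here is a minimal Python sketch of how older terminology can be routed to current preferred terms. Everything in it is an assumption made for illustration: the terms, the `THESAURUS` structure, and the function names are invented and are not taken from ISKO or any real thesaurus.

```python
# Hypothetical mini-thesaurus: each preferred term lists older or variant
# terms (non-preferred entries) that should lead searchers to it.
THESAURUS = {
    "information retrieval": ["documentation retrieval", "mechanized searching"],
    "knowledge organization": ["bibliographic classification"],
}


def build_lookup(thesaurus):
    """Map every term, preferred or older, to its preferred term."""
    lookup = {}
    for preferred, variants in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize(term, lookup):
    """Return the preferred term, or the input unchanged if it is unknown."""
    return lookup.get(term.strip().lower(), term)


LOOKUP = build_lookup(THESAURUS)
print(normalize("Mechanized searching", LOOKUP))
```

Accommodating future terminology then only means adding a new variant to the list under an existing preferred term, or promoting a new preferred term and demoting the old one to a variant.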
“I will live in the Past, the Present, and the Future!” Scrooge repeated, as he scrambled out of bed. “The Spirits of all Three shall strive within me.”
Barbara Gilles, Taxonomist
Starry Night over the Rhone (the “other Starry Night”), by Vincent van Gogh
Last week, my co-workers and I were discussing points of knowledge. The phrase “a thousand terms of knowledge” popped up. It was apparently an off-the-cuff mingling of “a thousand points of light” with “points of knowledge” and with one of the topics of the moment, thesaurus terms.
I couldn’t resist following up on the mixture, which has a precedent of sorts in an earlier TaxoDiary blog posting by Marjorie Hlava. That posting ends with this paragraph:
“We librarians and information specialists get to view anarchy in the universe more often than other people do. And we are the ones who have the job of putting the universe into some sort of order. With a thousand points of knowledge.”
Universe. Order. Points. Light. The concepts of “universe” and “order” lead inexorably to how we make sense of the cosmology of the physical universe. And what are the universe’s “points of light”? As far as the physical universe is concerned, those would be stars.
A comparison to stars is what speechwriter Peggy Noonan had in mind when she co-authored the speech first mentioning a thousand points of light. The context is “a brilliant diversity spread like stars, like a thousand points of light in a broad and peaceful sky.” To stretch the metaphor, might we think of a thesaurus as “a brilliant diversity”? With terms serving as the individual sources of illumination?
One often speaks of stars being “spread” or “scattered” throughout the sky. Even without a telescope, in a dark sky you can see thousands of stars. How does one make sense of them? By arranging them into categories, of course. The first categories of stars were constellations: physical, visible groupings with individual names and attributes. Many of the earliest constellations from ancient times are still recognized by the International Astronomical Union and are still used by numerous professional and amateur astronomers for identifying individual stars and regions of interstellar space. And popular culture still makes reference to the grouping of constellations used to map the sun’s travels: the zodiac.
The knowledge organization role that constellations have served for thousands of years is reflected in their metaphoric use in literature. There’s a frequently quoted sentence in The Fault in Our Stars, a best-selling novel by John Green: “My thoughts are stars I can’t fathom into constellations.” Green explains that he based the metaphor on “the idea that constellations are a way of constructing meaning and organization from a disorganized and arbitrary universe.”
As early civilizations struggled to make sense of the universe, constellations inevitably made their way into the early star catalogs, which could be thought of as simple taxonomies. These catalogs date back at least as far as the 12th century BC, when the first known Babylonian star catalog was written on clay tablets. In later centuries, development and use of similar quasi-taxonomies continued; ancient peoples who had star catalogs include the Greeks, Persians, Chinese, Arabs, and Egyptians.
Through the centuries, astronomers have continued to develop star catalogs. Meanwhile, astronomical knowledge has grown, um, astronomically, and astronomers have recognized various types and subtypes of stars and star systems. (This is fortunate, partly because modern telescopes can detect billions and billions of stars. BTW, it was Johnny Carson, not Carl Sagan, who coined the phrase “billions and billions”.) As these types and subtypes are incorporated into the cataloging of stars, the catalogs have become more and more like taxonomies.
And yes, there are several outright taxonomies and thesauri that cover stars, star systems, star types and subtypes (and subtypes of those), and individual stars. One of these is the Unified Astronomy Thesaurus (UAT), which Access Innovations helped develop, along with the American Institute of Physics [www.aip.org] and other scientific organizations. The screenshot below shows just a portion of the taxonomic treatment of the “Stars” branch.
These terms and relationships can be illuminating (for both professional and amateur astronomers) in a figurative sense. And if you consider how stars look from the earth, or how small yet illuminating each star is in relation to the whole universe, the multitude of terms literally does cover a thousand points of light.
Barbara Gilles, Taxonomist
Albuquerque-based Access Innovations, a leader in semantic enrichment, celebrates its 35th anniversary this month.
Officially incorporated on November 8, 1978, Access Innovations began like many other entrepreneurial enterprises — with a great idea, a few energetic people, and a little bit of capital. The company’s first headquarters was the kitchen of president and co-founder Marjorie M.K. Hlava’s home.
“Over the past 35 years, we’ve helped a lot of prestigious companies build their databases, knowledge management systems, and intranets,” says Jay Ven Eman, CEO and co-founder of Access Innovations. “We’ve worked on projects from huge SGML-based corporate intranets to sole proprietors’ customer records. We avail ourselves of every conceivable software, and we’ve converted data on nearly every platform. It’s been very exciting, yet in some ways I feel like this is just the beginning.”
Some 2,000 installations later, and with more than 200 thesauri under its belt, Access Innovations continues to be a leader in the information management arena. In the past year, Access Innovations has been named to eContent Magazine’s “100 Companies that Matter Most in the Digital Content Industry,” as well as to KMWorld’s list of “100 Companies That Matter Most in Knowledge Management.” In addition, Access Innovations’ proprietary software, Data Harmony, has been included on the list of “100 Trend-Setting Products” by Information Today.
“Our longevity and experience in the ever-changing world of knowledge management is definitely something to be celebrated,” says Hlava. “While I am extremely proud of our accomplishments these past 35 years, we have no intentions or desires to rest on our laurels. We will continue to push the boundaries of our industry. It’s exciting to think of what the future holds.”
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
We taxonomists love lists. For us, they are raw materials, the stuff we love to get our hands on to start organizing. I recently came across a couple of lists with crossover appeal and relevance beyond simple conceptual organization. Both address why we do this stuff and how we help turn information into actionable knowledge.
The first is a post by Kent Anderson on the Society for Scholarly Publishing’s Scholarly Kitchen, entitled UPDATED — 73 Things Publishers Do (2013 Edition). 73! This grew out of the 2012 list of a mere 60 things journal publishers do and several more offered in a postscript.
Several of the early items on the list have to do with establishing and nurturing a high quality product and readership, followed by numerous points relating to the business of journal publishing. At #34 we find a key contribution of journal publishers that adds value:
Tagging. To generate good metadata, articles and elements are often tagged using either semantic, custom taxonomies, or both. Sometimes, tagging is manual, sometimes automated, and sometimes a little of both. But it doesn’t happen all by itself. And it isn’t maintained, enhanced, expanded, migrated, or corrected all by itself, either.
[good to be recognized, isn’t it?]
Later points of particular interest to taxonomists and others concerned with information management include:
Search engine optimization – “…authors want their papers to be found”
Integrate and track metrics and, increasingly, altmetrics
Implement and manage interlinking services
Create and maintain e-commerce systems
Create or integrate with educational offerings
In the follow-up discussion for the list of journal publishers’ 73 valued activities we find: “It may be time to start thinking about how this list could be subdivided into discrete and logical categories.” Could there be a taxonomist in the audience?
We strive for meaningful usability, which sometimes means distilling great quantities of material to its essence. In an exploration of how document managers can add value, Seth Maislin encapsulated their contributions in an essential six points.
Findability – finding things accurately and precisely
Speed – finding things quickly
Timeliness – finding things quickly, in manageable chunks and in context, with retrieval tuned to a user’s situation (read “relevance”)
Accessibility – content delivered in a way that’s usable, readable, printable, viewable on a device, etc.
Personalization – content delivered to the correct audiences
Interpretation – content with the right semantic meaning in the user’s context
Clearly, the fundamental ways that document managers boost the value of content directly serve the valuable contributions of publishers. And it largely boils down to the value of enriching content by adding metadata, i.e., tagging, and more specifically, subject tagging or indexing. Subject indexing with a reliable and consistent vocabulary (a taxonomy) feeds findability, speed, relevant retrieval, personalization, and semantic precision. For the publisher, it serves search engine optimization, various types of metrics including altmetrics, e-commerce, and linked data for educational offerings or any other collections.
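As a rough illustration of subject indexing against a controlled vocabulary, the Python sketch below tags a passage with preferred terms whenever one of their trigger phrases appears. It is only a sketch under stated assumptions: the `TAXONOMY` entries and the `index_subjects` function are hypothetical, and simple phrase matching stands in for the rule-based or statistical methods that production indexing tools use.

```python
import re

# Hypothetical taxonomy: preferred subject term -> phrases that trigger it.
TAXONOMY = {
    "Stars": ["star", "stars", "stellar"],
    "Star catalogs": ["star catalog", "star catalogs"],
    "Constellations": ["constellation", "constellations", "zodiac"],
}


def index_subjects(text, taxonomy):
    """Return the sorted preferred terms whose trigger phrases appear in text."""
    lowered = text.lower()
    found = set()
    for preferred, phrases in taxonomy.items():
        for phrase in phrases:
            # Whole-word match, so "star" does not fire on "starting".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                found.add(preferred)
                break
    return sorted(found)


sample = "The first Babylonian star catalog grouped stars into constellations."
print(index_subjects(sample, TAXONOMY))
```

Because every document is tagged with the same preferred terms regardless of the wording its author happened to use, a search on one term retrieves all of them, which is the consistency that findability and relevant retrieval depend on.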
In their own subtle ways, both the extensive list of journal publishers’ activities and the focused list of document managers’ strategies for adding value recognize and depend upon the key contributions of taxonomists and indexers.
[Final note: Considering their presumed importance, it is curious that “taxonomy”, “indexing”, “tagging” and “metadata” are nearly nonexistent in the two lists, mentioned only with reference to the publishers’ contribution of tagging.]
Chief Taxonomist, Senior Editor