Semantic networks provide structure to a set of concepts, as a network or a web. They are often defined as graph representations of the relationships among sets of concepts. In a larger sense, in the world of computer science, they are sets of concepts with relationships that are defined in such a way that computers, or the World Wide Web, can work with those relationships.
Semantic networks need to be built on top of something else. They are made to form a web of knowledge. There are many ways to think about knowledge and many ways to approach it, but at the end of the day it is the concepts themselves that form the nodes of the web, meaning the intersections of the data. To build the web, we need a way to create nodes and then define the relationships among them. Those relationships might be the broader and narrower terms from a thesaurus or taxonomy. They might be synonyms, or they might be hierarchies drawn from another column or another table that help flesh out the information. That kind of expansion can also show cause-and-effect, whole/part, ownership, and similar relationships. This expanded network of terms and relationships is made possible by computer technology; it would take individuals much longer to work out the connections by hand.
The concepts in a semantic network can be thought of as nodes, with various relationships branching out from those nodes. The expanded relationships can be ones that taxonomists and ontologists are familiar with, such as specific whole-part relationships, cause-effect, and parent-child.
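To make the node-and-relationship idea concrete, here is a minimal sketch in Python. It assumes a very simple model in which each concept is a node and each link carries a relationship type. The class name, the relationship labels ("narrower", "has_part", "synonym", "causes"), and the sample terms are all illustrative assumptions, not taken from any particular thesaurus or software product.

```python
from collections import defaultdict


class SemanticNetwork:
    """Hypothetical toy model: concepts are nodes, edges carry a relationship type."""

    def __init__(self):
        # concept -> list of (relationship, target concept) pairs
        self._edges = defaultdict(list)

    def relate(self, source, relationship, target):
        """Add one typed relationship from source to target."""
        self._edges[source].append((relationship, target))

    def related(self, concept, relationship=None):
        """Targets linked from concept, optionally filtered by relationship type."""
        return [
            target
            for rel, target in self._edges.get(concept, [])
            if relationship is None or rel == relationship
        ]

    def narrower_closure(self, concept):
        """Every concept reachable by following 'narrower' links downward."""
        found, stack = [], [concept]
        while stack:
            for child in self.related(stack.pop(), "narrower"):
                if child not in found:
                    found.append(child)
                    stack.append(child)
        return found


# Illustrative terms only.
net = SemanticNetwork()
net.relate("vehicle", "narrower", "car")          # parent-child (hierarchy)
net.relate("car", "narrower", "sports car")
net.relate("car", "has_part", "engine")           # whole-part
net.relate("car", "synonym", "automobile")        # synonymy
net.relate("icy roads", "causes", "car accident") # cause-effect

print(net.related("car", "has_part"))
print(net.narrower_closure("vehicle"))
```

The point of the sketch is that a taxonomy's broader/narrower hierarchy is just one relationship type among several; the same nodes can carry whole-part, synonym, and cause-effect links side by side.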
Princeton University’s WordNet is often used as a source for this work; it appears in several search engines. However, WordNet is really a synonym gradient, rather than a true semantic network in the full sense. To my mind, it does not lend itself to use as a semantic web, but some people will mention WordNet in the same breath as “semantic web,” so I want you to understand the difference between them. If you are looking for the sense of a term, or looking for simple ways to expand terms (and if you can’t afford to buy the American Heritage Dictionary, which is the other one that people frequently invoke) then using a variation on WordNet, which is available fairly inexpensively, is a good option.
To a geneticist, “recombination” may seem perfectly clear, and in no need of clarification. He or she may automatically think of genetic recombination, in which DNA strands break and rejoin in different ways. Even for other geneticists with a different specialty within genetics, though, “recombination” may have something to do with genetic algorithms and chromosomes. If the taxonomy covers various sciences and not just genetics, “recombination” could trigger any of the following, as indicated on the Wikipedia disambiguation page for Recombination (bless Wikipedia for its disambiguation pages, even though they need to be taken with a grain of salt): in genetics (as mentioned above), the process by which genetic material is broken and joined to other genetic material; in semiconductor physics, the elimination of mobile charge carriers (electrons and holes); in plasma physics, the formation of neutral atoms from the capture of free electrons by the cations in a plasma; in cosmology, the time at which protons and electrons formed neutral hydrogen in the timeline of the Big Bang; in chemistry, the opposite of dissociation. This underscores the need to find out what the people you are working with actually mean by a term.
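One common way to handle a term like this in a taxonomy is to attach a domain qualifier to each sense. The Python sketch below assumes that approach; the `SENSES` table simply restates the meanings listed above, and the function names `qualified_labels` and `resolve` are hypothetical helpers made up for illustration.

```python
# Senses of an ambiguous term, keyed by domain. Definitions restate the list above.
SENSES = {
    "recombination": {
        "genetics": "the process by which genetic material is broken and "
                    "joined to other genetic material",
        "semiconductor physics": "the elimination of mobile charge carriers "
                                 "(electrons and holes)",
        "plasma physics": "the formation of neutral atoms from the capture of "
                          "free electrons by the cations in a plasma",
        "cosmology": "the time at which protons and electrons formed neutral "
                     "hydrogen in the timeline of the Big Bang",
        "chemistry": "the opposite of dissociation",
    }
}


def qualified_labels(term):
    """Return labels such as 'recombination (genetics)', one per known sense."""
    key = term.lower()
    return [f"{key} ({domain})" for domain in sorted(SENSES.get(key, {}))]


def resolve(term, domain=None):
    """Return the definition for term in the given domain.

    Raises LookupError when the term has several senses and no domain is given,
    which forces the caller to ask what the user actually means.
    """
    senses = SENSES.get(term.lower(), {})
    if domain is None:
        if len(senses) == 1:
            return next(iter(senses.values()))
        options = ", ".join(qualified_labels(term))
        raise LookupError(f"'{term}' is ambiguous; choose one of: {options}")
    return senses[domain]


print(qualified_labels("recombination"))
print(resolve("recombination", "chemistry"))
```

Refusing to guess when no domain is supplied mirrors the editorial rule: when a term has several senses, ask rather than assume.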
Marjorie M.K. Hlava
President, Access Innovations
Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.
Lisa, I agree with your thinking on doing the smaller project to start with. You have a more controlled situation for developing workflow and governance guidelines. And if the software doesn’t turn out to be a good fit with your organization or with your information infrastructure, there will have been that much less time and effort involved.
Lisa, as I understand your question, the main issue is whether to do a small scale test on what would be part of the ultimate taxo or jump in the deep water to develop the full taxo and then apply it, hoping for the best.
If time and organizational requirements permit, I’d do the small test. There is a greater chance of deeper involvement and engagement with a select group of people, and a more manageable corpus to evaluate results on. If the results are good, retagging the small group within the full set shouldn’t be a problem.
I don’t see anything about how you will do the tagging: manually? Automatically, based on an exact match to a term? That will affect the choice of terms used to express concepts, i.e., you may have to use the exact phrase most commonly seen in documents and hope there isn’t much variation from that. If categorization software is an option to enhance SharePoint, consider Data Harmony’s M.A.I. to capture the variations in words and phrases that writers will use to express the same concept.
I like your blog very much. I am managing a taxonomy project for my organization (the project began last month). We are redesigning our websites and, while we have everything being scrutinized and reformatted, decided that this would be an excellent opportunity to implement a taxonomy. We intend for our taxonomy to assist with website Search and potentially to be the driver that will allow users to customize their landing pages based on their declared areas of interest. I have a question for you and your audience, as I hope to be guided by others’ wisdom and experience:
We are using a toolset new to the organization (SharePoint) for the website and also for the taxonomy: the out-of-the-box Term Store (budget is tight). We plan to have a main site and a suite of microsites for our various programs and initiatives (we’re a small public agency, short on funds, but long on expertise and enthusiasm for the work).
My approach is to conduct a POC (proof of concept) of the toolset and our approach by creating a taxonomy from surveying 2 SMEs and tagging the content of their microsites using the result. We would then test our toolset against this tagged content to ensure that the toolset works.
My goal with this approach is also to introduce the concept of taxonomy to the organization by demonstrating value with a controlled pair of applications. The end result should tell us if our approach to the human aspect of eliciting the taxonomy is workable and if our technical approach is workable.
On the other hand…
Another approach would be to conduct an enterprise-wide (150 people in the organization) taxonomy-creation exercise and, using this much larger superset, tag our content and test the technical tools.
In the end, an enterprise-wide taxonomy (as relates to the websites at least) WILL be created. My preference for the former approach comes from my concern about whether we have picked the right shiny new tools and are using them correctly. If we need to tweak the technical aspects, this will become apparent during the technical phase of the POC. That tweaking can occur independently of the further progress of eliciting the taxonomy from the work-groups. If we pursue the “no tagging until we know the entire taxonomy” approach, we could run into technical problems and have to solve them at the end of the project, when expectations are high. But we would not be re-tagging what we’d initially tagged, so there is that.
What are your (plural ‘you’) thoughts?