Metadata May Be Taking the Stand

April 27, 2011  
Posted in metadata, News, semantic, Uncategorized

April 27, 2011 – As companies that deliver and manage electronic discovery services know, metadata, the data embedded in electronic documents, can tell us precisely how and when a document was created, modified, and transmitted. In effect, the document can tell its own story.

According to the Forbes article, “Metadata, The Freedom of Information Act, and Government Hypocrisy,” metadata matters.

Software's ability to connect chains of events across different media, and to catch the digital irregularities left by parties hoping to hide their activities, depends on both artificial intelligence and metadata. As a result, more and more courts are acknowledging the value and validity of metadata.
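As a concrete illustration of "the document telling its own story," Word .docx files (Office Open XML packages) carry an embedded core-properties part that records who created the file, who last modified it, and when. The sketch below reads those fields with the Python standard library. It assumes the part sits at the conventional path `docProps/core.xml` (strictly, its location is declared in the package relationships), and the function name `core_properties` is purely illustrative. It is not how any particular e-discovery product works.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Open Packaging Conventions core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> element path inside docProps/core.xml.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(docx_bytes: bytes) -> dict:
    """Return authorship and timestamp metadata embedded in a .docx file.

    Assumes the conventional docProps/core.xml location; missing elements
    come back as None rather than raising.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        props[key] = element.text if element is not None else None
    return props
```

Usage would look like `core_properties(open("memo.docx", "rb").read())`, where `memo.docx` is a hypothetical file. These fields are easy to edit, so forensic work typically corroborates them with other evidence rather than trusting them alone.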

Melody K. Smith

Lucid Imagination Joins the SharePoint Crowd

April 27, 2011  
Posted in News, search, Taxonomy, Uncategorized

In a familiar story, we learned that Lucid Imagination released an update to its LucidWorks Enterprise product this week that connects the search tool directly to SharePoint repositories.

Fierce Content Management brought this news to our attention in their article, “Lucid Imagination includes SharePoint connector.” This version takes advantage of an open source connector created by Google.

One reason it might seem familiar is that Access Innovations, a leader in the data and content management industry, recently announced that its Data Harmony suite of content enrichment and thesaurus management tools can now be fully integrated with Microsoft SharePoint 2010. The end result is a collaboration platform with information assets that are more searchable and more accessible.

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.

Name Disambiguation Musings

April 25, 2011  
Posted in Access Insights, Featured, Standards, Uncategorized

On April 19, 2011, The Wall Street Journal discussed the need for customer name authority control in banks. Okay, so maybe that is not what they said. What they did outline was the problem of Arabic names and the many ways to render them. This creates difficulties for banks and other organizations that try to track information or put a hold on funds belonging to organizations or individuals. The example used was Moammar Gadhafi. His first name could be transliterated as Muammar, Mummar, Mohamed, Mahmut, Mehmud, and more than 20 other variants. The same goes for the last name, which could be Gaddafi, Ghathafi, Elkaddafi, El-Kaddafi, Al-Gaddafi, Gadhafi, Qaddafi, Al-Qadhafi, El-Qaddfi, Qadhafi, Abu Miryar Al-Qahafi, Ghadaffi, and others. Any combination of these names is valid. Abu, Al, El, and other designations of honor complicate matters further, making things even more interesting.
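Because any first-name form can pair with any last-name form, the number of full-name strings a bank might encounter grows multiplicatively. The sketch below uses only the forms named in the paragraph above (plus "Moammar" from the headline example); the lists are illustrative, not exhaustive, and the helper name `full_name_variants` is hypothetical.

```python
from itertools import product

# First-name forms named in this post (the post notes 20+ more exist).
FIRST_NAMES = ["Moammar", "Muammar", "Mummar", "Mohamed", "Mahmut", "Mehmud"]

# Last-name forms named in this post (again, not exhaustive).
LAST_NAMES = [
    "Gaddafi", "Ghathafi", "Elkaddafi", "El-Kaddafi", "Al-Gaddafi", "Gadhafi",
    "Qaddafi", "Al-Qadhafi", "El-Qaddfi", "Qadhafi", "Abu Miryar Al-Qahafi", "Ghadaffi",
]


def full_name_variants(first_names, last_names):
    """Pair every first-name form with every last-name form."""
    return [f"{first} {last}" for first, last in product(first_names, last_names)]


variants = full_name_variants(FIRST_NAMES, LAST_NAMES)
print(len(variants))  # 72 (6 first-name forms x 12 last-name forms)
```

Even this partial list yields 72 distinct strings; with the two dozen or more first-name forms the article mentions, the count climbs into the hundreds, which is why exact string matching on a sanctions list falls short.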

The UN sanctions group lists twelve variations of his name. The UN prefers a single form of the name (Muammar Mohammed Abu Miryar Qahafi), as do the Swiss, although theirs is a different one (Muammar Ghedklafi). However, there is nothing unusual or illegal about someone filing under any of several valid variations of his or her name.

Arabic lacks the kind of widely followed transliteration standards that exist for Chinese, Japanese, and Korean, for Tamil, or for most other languages. There have been attempts, and ISO has a transliteration standard for Arabic. But transliteration is supposed to work in both directions, and here it does not, because the initial inputs are so variable. Arabic has sounds that do not exist in Latin-based languages, and the ways of representing them vary widely. A speaker can understand the meaning phonetically, but a machine cannot easily convert the forms back and forth. There is also the issue of Arabic dialects, and speakers are very particular about how one group speaks compared to another. Recently, an Egyptian told me that the Algerians do not speak real Arabic, to which an Algerian responded that it was the Egyptians who did not know how to speak. Ugh.

Figuring out names, how best to format them, and how to build databases of names and their variations is increasingly important in our ever more digital world. Disambiguating the authors of papers in scientific journals has recently become a popular area, with the introduction of author networks such as AIP's UniPHY and Elsevier's SciVal (formerly Collexis). The common examples involve Asian names, because the order of given and family names can be inverted, and someone who does not know the naming conventions can easily confuse them. There are many Asian authors in Western journals. Names written in Arabic, Tamil, and other scripts whose naming conventions do not conform to Western ones resist simple solutions. Names also change as people marry, divorce, adopt nicknames, and use or drop middle names and initials.

We believe that the need for transliteration standards, like the many available from ISO, and for the accompanying tools and authority files of names, will continue to grow as we try to bridge the divides in information capture and sharing. Access Innovations uses Unicode to ensure that all character sets can be represented. The Data Harmony tool set accommodates the huge variations in naming conventions, and the XML option ensures that the data can be encapsulated and shared across many platforms. Using Java means that the tools are platform independent. Entity extraction tools (for people, places, and things) need to pull as much conceptual value from digital objects as is available. Making sure that the desired information can be extracted in any language and then gathered with the aliases of each name forms the name disambiguation framework we support.
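The authority-file idea above, one preferred form with its aliases gathered under it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Data Harmony's implementation: the record contents, the normalization rules (Unicode NFKD decomposition, dropping combining marks, case-folding, treating hyphens as spaces), and the names `normalize`, `build_index`, and `resolve` are all hypothetical.

```python
import unicodedata

# Hypothetical authority record: one preferred heading mapped to known aliases.
# The preferred form is the UN's, as quoted in this post; the aliases are a
# few of the variants the post lists, not a complete set.
AUTHORITY = {
    "Muammar Mohammed Abu Miryar Qahafi": [
        "Moammar Gadhafi",
        "Muammar Gaddafi",
        "Muammar Qaddafi",
        "Muammar Al-Qadhafi",
        "Mummar El-Kaddafi",
    ],
}


def normalize(name: str) -> str:
    """Reduce a name to a comparison key using a few simplified rules."""
    # NFKD splits a letter like "ā" into "a" plus a combining macron.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping combining marks makes "Qaḏḏāfī" compare like "Qaddafi".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, treat hyphens as spaces, and collapse runs of whitespace.
    return " ".join(base.casefold().replace("-", " ").split())


def build_index(authority: dict) -> dict:
    """Map the normalized key of every preferred form and alias to its heading."""
    index = {}
    for preferred, aliases in authority.items():
        for form in [preferred, *aliases]:
            index[normalize(form)] = preferred
    return index


INDEX = build_index(AUTHORITY)


def resolve(name: str):
    """Return the preferred heading for a name, or None if it is unknown."""
    return INDEX.get(normalize(name))
```

With this index, `resolve("MOAMMAR GADHAFI")` and `resolve("Muammar Qaḏḏāfī")` both return the preferred heading, but `resolve("Muammar Ghathafi")` returns `None`. Normalization absorbs differences in case, hyphenation, and diacritics; a genuinely different transliteration only resolves once someone adds it as an alias, which is exactly why maintained authority files remain necessary.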

Marjorie M.K. Hlava
President, Access Innovations

The Demise of Grammar at the Hands of Technology

April 22, 2011  
Posted in News, Standards, Uncategorized

April 22, 2011 – Texting may be faster and more efficient, and it mostly gets the job done. But the slippery slope of bad grammar we have slid down, from email to texting, has drained the quality from writing and content. Using the smallest number of letters to get your message across does not equal writing.

We found this attention-grabbing topic in the post "Web Semantics: Bad Writing Doesn't Matter Any More." It has long annoyed me to see the English language so mangled and misused. Don't get me wrong, I too have been guilty of the texting travesty. It has become part of our culture; even professional workplaces use it.

How has the evolution of natural language processing and semantic technology made this situation worse? Or has it?

Melody K. Smith

Big Data, Big Challenges

April 22, 2011  
Posted in metadata, News, search, semantic, Uncategorized

April 22, 2011 – Everyone seems to be addressing the challenges of data – managing it, handling it, storing it. However, for startups that challenge is even more daunting.

We found this interesting information on Read Write Web in their article, "Access, Aggregation, and Other Big Data Challenges for Startups." Findability, before and after integration, is a challenge for an organization of any size. For a startup, it isn't easy to access, use, or monetize a new database. Gil Elbaz argues that it's important to "grease the wheels" of data, something he has a personal interest in. His own startup, Factual, is an open data source for location data.

According to him, this open data model leads to “information singularity,” as do other efforts like data marketplaces, data search engines, semantic web mark-up, and better standards.

Melody K. Smith

Social Media at Walmart

April 21, 2011  
Posted in indexing, News, Uncategorized

April 21, 2011 – Wal-Mart Stores has acquired Kosmix, a technology firm that searches and analyzes social media connections in real time to deliver customized feedback to users.

Mr. Web brought this to our attention in their article, “Wal-Mart Buys Social Media Analyst Kosmix.” Kosmix was founded in 2005 by online shopping pioneers Venky Harinarayan and Anand Rajaraman.

Its claim to fame was a platform called the 'Social Genome', which adds a layer of semantic understanding to social media data. By analyzing the huge volume of data produced every day on social media, the platform builds rich profiles of users, topics, products, places, and events. The Kosmix platform also powers TweetBeat, a real-time social media filter for live events that had more than five million visits last month.

The founders and their team will now operate as part of the newly formed @WalmartLabs, which will create technologies and businesses around social mobile commerce to support the integration of Walmart’s real-time and online e-commerce strategy.

Melody K. Smith

Earth to the Clouds

April 21, 2011  
Posted in metadata, News, Uncategorized

April 21, 2011 – Google Earth Builder will serve up data from businesses and government agencies to the world.

We found this news on GIS User in their blog post, “Enterprise geospatial data in the Google cloud.” This product will be available in late 2011.

Google's cloud will support all data formats, extract all relevant metadata, and present users with a catalog via the cloud (think Google Maps and Google Earth). Users can then serve and share their geospatial data much the way they use Google Docs now.

They also claim no technical expertise or GIS training will be required. Earth Builder offers anytime, anywhere access efficiently and effortlessly. This will be interesting to see.

Melody K. Smith

Location Tool Drives Business

April 21, 2011 – LocalResponse has launched the long-awaited public beta of its new tool, which lets business owners respond to their customers in order to drive transactions. The tool allows businesses to track their consumers in real time by organizing, indexing, and prioritizing various location-based services.

We found this interesting piece of news in Directions Magazine in their article, “LocalResponse Launches Public Beta, Forever Changing How Businesses Engage With Consumers.” Tagging along with the frenzied “I am here” location applications, LocalResponse is harnessing consumer psychology and engaging the consumer at the point of sale.

Businesses will now have access to a simple and user-friendly platform to sift through millions of online messages and respond accordingly. It provides a forum for them to retain customers, create new ones and manage relationships.

Melody K. Smith

Six Degrees of Indexing

April 20, 2011  
Posted in indexing, News, semantic, Uncategorized

April 20, 2011 – What would it be like to have search sites take your friends’ opinions into account when you look for restaurants? Newspaper sites that use their knowledge of what’s previously captured your attention online to display articles you are interested in? Sounds a lot like Amazon’s technology, doesn’t it?

Technology Review brought this topic to our attention in their article, “Social Indexing.” Let’s face it, the Web is better when it focuses on us. Facebook’s chief technology officer, Bret Taylor, agrees. To bring this idea to fruition, he is creating a pseudo social index of the most frequently visited chunks of the Web.

Many sites, such as Amazon, have tried to personalize what they offer by remembering your past behavior and showing information they presume will be relevant to you. But this social index would be different and more powerful because it also mines your friends’ interests and collects information from multiple sites. As a result, the index can give websites a sense of what is likely to interest you – before you visit. Of course, if you have an eclectic taste in friends, the results could be skewed.

Melody K. Smith

iGlue On Its Way To Being a Publicly Traded Entity

April 20, 2011  
Posted in indexing, News, semantic, Uncategorized

April 20, 2011 – Power of the Dream Ventures has announced a roadmap for taking iGlue public in the United States.

MarketWire brought this news to our attention in their article, “Power of the Dream Ventures Announces Going Public Roadmap for iGlue.”

iGlue represents a significant shift in internet technology. Through the interactive use of iGlue’s machine and hand annotation feature, every single word on a webpage becomes a live, media rich Wikipedia-like junction point, providing users access to immediate, value added information. They are known for the slogan, ‘Let’s Wikify the Web.’

Its expanded semantic database contains over 83 million data points, including over 7 million entities linked by over 38 million semantic connections. Among those entities are over three million geographical locations, more than one million names, and more than two hundred thousand institutional name entries.

Melody K. Smith
