June 11, 2010 – Unlike Wikipedia, which gathers data, NLGbAse is an information extraction system.

NLGbAse is based on structured information built from Wikipedia and its wiki syntax. It contains more than 2.7 million multilingual entities, each carrying statistical and semantic information. Information retrieval algorithms exploit this information and can, in principle, extract an unlimited number of facts associated with each entity. The main advantage of such learning material is its ability to evolve: NLGbAse can automatically learn new entities and relations on a day-to-day basis. Because of its detailed cross-lingual references, NLGbAse provides a wide range of possible spellings and synonyms for a single term. The final objective of the project is to build a robust Natural Language Generation system. The tools available on the project's website are free to use for academic and research purposes.

An interesting live demo can be viewed here.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.