Apache Lucene 4.0 has been released and is garnering quite a bit of chatter, especially with ApacheCon 2011 approaching. So what are the new features and, more importantly, what do they mean for users?
This interesting topic was brought to our attention by the Ostatic blog post “Under the Hood in Apache Lucene 4.0.” One of the most significant changes in Lucene 4.0 is the full switch to UTF-8 bytes in place of text strings for indexing within the search engine library. Removing the need for string conversion speeds up loading and searching, reportedly by up to 30 times. The switch has also facilitated one of the main goals for Lucene 4.0: ‘flexible indexing’.
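To make the idea of byte-based terms concrete, here is a minimal sketch in Python. It is not Lucene's API or its actual implementation; the function names (`encode_term`, `build_term_index`, `has_term`) are hypothetical and chosen only for illustration. It shows the general technique: each term is encoded to UTF-8 once, and all later sorting and lookup work on raw bytes with no further string conversion.

```python
import bisect


def encode_term(term: str) -> bytes:
    """Encode a term to UTF-8 once, at the boundary of the index."""
    return term.encode("utf-8")


def build_term_index(terms):
    """Return a sorted list of unique terms stored as UTF-8 bytes.

    Hypothetical helper: a real search library uses far more compact
    structures, but the ordering principle is the same.
    """
    return sorted({encode_term(t) for t in terms})


def has_term(index, query: str) -> bool:
    """Binary-search the byte-sorted index for an exact term match."""
    key = encode_term(query)  # the only string-to-bytes conversion per query
    pos = bisect.bisect_left(index, key)
    return pos < len(index) and index[pos] == key


if __name__ == "__main__":
    index = build_term_index(["lucene", "index", "café", "index"])
    print(index)
    print(has_term(index, "café"))
    print(has_term(index, "cafe"))
```

Because byte-wise ordering of UTF-8 matches Unicode code point ordering, the sorted byte list stays in a meaningful term order even for non-ASCII terms such as "café".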
Indexing, flexible or otherwise, is a great feature in any document management system, and it helps to reduce redundancy. Even so, it is very important to choose a product that makes your content findable, both easily and thoroughly. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies and associated rule bases for machine-assisted indexing.
Melody K. Smith
Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.