Optimizing Information Retrieval in Lucene Using Lexical Chain Analysis and WordNet Integration
Abstract
Extracting relevant information from unstructured data is a major challenge in the big data era. This paper proposes a method that extends the search capabilities of Lucene by integrating lexical chain analysis with WordNet, with the main goal of improving the precision and relevance of search results. WordNet is a large lexical database of English that encodes semantic relations among words, such as synonymy, hypernymy, and meronymy. These relations are used to build lexical chains, i.e., sequences of semantically related words that offer insight into the conceptual structure of a text. Lexical chain analysis enriches the semantic representation of search queries and of their relationships to documents, so that matching takes context into account. The system supports multiple search modes: attribute-based searches on document name, type, size, date, and author, as well as content-based searches. The indexing mechanism emphasizes keyword frequency and the presence of semantically related keywords identified through lexical chains. This paper develops a methodology and corresponding algorithms for this integration, including the indexing mechanism and the different search algorithms. By combining Lucene's search capabilities with the semantic depth of WordNet-driven lexical chain analysis, this research aims to significantly improve information retrieval from unstructured data, address evolving requirements across a number of domains, and facilitate intuitive access to knowledge repositories.
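The abstract does not spell out the chaining or scoring algorithms, so the following is only a minimal Python sketch of the general idea, not the paper's method and not Lucene's API. It assumes a hypothetical toy relation table `RELATED` standing in for WordNet, uses a simple greedy chaining scheme (each word joins the first chain holding a related word), and a hypothetical `score()` function that adds exact keyword frequency to a weighted count of chained words related to the query (`semantic_weight` is an illustrative parameter).

```python
from collections import Counter

# Hypothetical toy relation table standing in for WordNet. A real system
# would query WordNet relations (synonyms, hypernyms, etc.) instead.
RELATED = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile", "truck"},
    "truck": {"vehicle"},
    "engine": {"motor"},
    "motor": {"engine"},
}


def related(a: str, b: str) -> bool:
    """True if two words are identical or linked in the toy relation table."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def build_chains(tokens: list[str]) -> list[list[str]]:
    """Greedy lexical chaining: append each known word to the first chain
    that holds a related word, otherwise start a new chain."""
    chains: list[list[str]] = []
    for tok in tokens:
        if tok not in RELATED:
            continue  # only words present in the lexical resource are chained
        for chain in chains:
            if any(related(tok, member) for member in chain):
                chain.append(tok)
                break
        else:
            chains.append([tok])
    return chains


def score(query: list[str], tokens: list[str], semantic_weight: float = 0.5) -> float:
    """Exact query-term frequency plus a weighted count of non-query words
    in chains that contain a word related to some query term."""
    counts = Counter(tokens)
    exact = sum(counts[q] for q in query)
    semantic = 0
    for chain in build_chains(tokens):
        if any(related(q, w) for q in query for w in chain):
            # count only non-query words so exact matches are not double-counted
            semantic += sum(1 for w in chain if w not in query)
    return exact + semantic_weight * semantic


if __name__ == "__main__":
    doc = "the automobile has a powerful engine and the vehicle is fast".split()
    print(build_chains(doc))
    print(score(["car"], doc))
```

For example, with the default weight of 0.5, the document "the automobile has a powerful engine and the vehicle is fast" forms the chains `[["automobile", "vehicle"], ["engine"]]` and scores 1.0 for the query `["car"]` even though it never contains "car", because the first chain is related to the query term. A production system would instead draw relations from WordNet and feed such weights into Lucene's indexing and ranking.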
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.