Incorporating terminology evolution for query translation in text retrieval with association rules

Document Type

Conference Proceeding

Publication Date

12-1-2010

Journal / Book Title

International Conference on Information and Knowledge Management Proceedings

Abstract

Time-stamped documents such as newswire articles, blog posts and other web pages are often archived online. When these archives cover long spans of time, the terminology within them can change significantly. Hence, when users pose queries about historical information over such documents, the queries need to be translated to account for these temporal changes so that the responses are accurate. For example, a query on Sri Lanka should automatically retrieve documents that use its former name, Ceylon. We call such concepts SITACs, i.e., Semantically Identical Temporally Altering Concepts. To discover SITACs, we propose an approach based on a novel framework that integrates natural language processing, association rule mining and contextual similarity as a learning technique. The proposed approach has been evaluated on real data and found to yield good results with respect to efficiency and accuracy. © 2010 ACM.
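
The abstract does not give implementation details, so the following is only a minimal sketch of how association rules and contextual similarity might be combined to flag a pair such as Ceylon and Sri Lanka. Everything in it is an assumption made for illustration: the toy corpus, the function names, the confidence threshold, the use of Jaccard overlap as the similarity measure, and the requirement that the two terms occur in non-overlapping time spans. It should not be read as the authors' framework.

```python
from collections import defaultdict

# Hypothetical toy corpus: (year, set of terms extracted from one document).
DOCS = [
    (1965, {"ceylon", "colombo", "tea", "island"}),
    (1968, {"ceylon", "colombo", "tea", "export"}),
    (1995, {"sri lanka", "colombo", "tea", "island"}),
    (1998, {"sri lanka", "colombo", "tea", "export"}),
    (1998, {"india", "delhi", "export"}),
]


def context_terms(docs, term, min_confidence=0.5):
    """Terms y for which the rule `term -> y` meets the confidence threshold."""
    containing = [terms for _, terms in docs if term in terms]
    if not containing:
        return set()
    counts = defaultdict(int)
    for terms in containing:
        for other in terms - {term}:
            counts[other] += 1
    # confidence(term -> y) = docs with both term and y / docs with term
    return {y for y, c in counts.items() if c / len(containing) >= min_confidence}


def contextual_similarity(docs, a, b):
    """Jaccard overlap between the rule-derived contexts of two terms."""
    ctx_a, ctx_b = context_terms(docs, a), context_terms(docs, b)
    union = ctx_a | ctx_b
    return len(ctx_a & ctx_b) / len(union) if union else 0.0


def time_span(docs, term):
    """First and last year in which the term appears."""
    years = [year for year, terms in docs if term in terms]
    return min(years), max(years)


def is_candidate_pair(docs, a, b, min_similarity=0.8):
    """Assumed test: similar contexts, and usage periods that do not overlap."""
    start_a, end_a = time_span(docs, a)
    start_b, end_b = time_span(docs, b)
    disjoint = end_a < start_b or end_b < start_a
    return disjoint and contextual_similarity(docs, a, b) >= min_similarity
```

On this toy data, `is_candidate_pair(DOCS, "ceylon", "sri lanka")` holds because both names co-occur with the same context terms in separate periods, while `"ceylon"` and `"india"` share too little context to qualify.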

DOI

10.1145/1871437.1871730