Text Retrieval based on Least Information Measurement
Conference proceeding

Weimao Ke and Assoc Comp Machinery
ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, pp 125-132
01 Jan 2017

Abstract

Computer Science; Computer Science, Information Systems; Computer Science, Theory & Methods; Science & Technology; Technology
We developed a new information retrieval framework based on the Least Information (LI) metric. We derived multiple term weighting schemes and combined them with a vector space representation for ad hoc retrieval. Given probability distributions in a collection as prior knowledge, LI Binary (LIB) quantifies the least information due to the binary occurrence of a term in a document, whereas LI Frequency (LIF) measures the least information based on the probability of drawing a term from a bag of words. Experiments on four benchmark TREC collections for ad hoc retrieval showed that methods based on least information theory (LIT) outperformed classic TF*IDF and BM25, especially for verbose queries and hard search topics. Least information theory is a method for entropy-based information measurement and offers a novel approach to IR modeling.
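
For orientation only, the following is a minimal Python sketch of the two collection-level probabilities the abstract refers to (the chance that a document contains a term, and the chance of drawing a term from the collection's bag of words), alongside the classic TF*IDF and BM25 baselines. It does not implement the LI, LIB, or LIF formulas, which are not given in this record. All function names, the toy tokenized input, and the BM25 parameters `k1=1.2` and `b=0.75` are assumptions chosen for illustration, not values taken from the paper.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Gather collection statistics from a list of tokenized documents."""
    doc_freq = Counter()   # number of documents containing each term
    coll_freq = Counter()  # total occurrences of each term in the collection
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))
    return doc_freq, coll_freq, len(docs), sum(coll_freq.values())


def binary_prior(term, doc_freq, n_docs):
    """Probability that a document contains the term (binary occurrence)."""
    return doc_freq[term] / n_docs


def frequency_prior(term, coll_freq, total_tokens):
    """Probability of drawing the term from the collection's bag of words."""
    return coll_freq[term] / total_tokens


def tf_idf(term, doc, doc_freq, n_docs):
    """Classic TF*IDF baseline weight (raw tf times log inverse doc frequency)."""
    tf = doc.count(term)
    if tf == 0:
        return 0.0
    return tf * math.log(n_docs / doc_freq[term])


def bm25(term, doc, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 baseline weight; k1 and b are common defaults (an assumption)."""
    tf = doc.count(term)
    if tf == 0:
        return 0.0
    df = doc_freq[term]
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Toy collection (hypothetical data, for illustration only).
docs = [
    ["least", "information", "retrieval"],
    ["information", "theory"],
    ["text", "retrieval", "retrieval"],
]
doc_freq, coll_freq, n_docs, total_tokens = collection_stats(docs)
avg_len = total_tokens / n_docs
```

An LI-style weighting scheme would take priors such as `binary_prior` and `frequency_prior` as its starting point; how the paper turns them into LIB and LIF weights is left to the original publication.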

Metrics

19 Record Views
5 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#4 Quality Education

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Information Systems
Computer Science, Theory & Methods