Book chapter
Using Concept-Based Indexing to Improve Language Modeling Approach to Genomic IR
Advances in Information Retrieval, pp 444-455
2006
Featured in Collection : UN Sustainable Development Goals @ Drexel
Abstract
Genomic IR, characterized by its highly specific information need, severe synonym and polysemy problem, long term name and rapid growing literature size, is challenging IR community. In this paper, we are focused on addressing the synonym and polysemy issue within the language model framework. Unlike the ways translation model and traditional query expansion techniques approach this issue, we incorporate concept-based indexing into a basic language model for genomic IR. In particular, we adopt UMLS concepts as indexing and searching terms. A UMLS concept stands for a unique meaning in the biomedicine domain; a set of synonymous terms will share same concept ID. Therefore, the new approach makes the document ranking effective while maintaining the simplicity of language models. A comparative experiment on the TREC 2004 Genomics Track data shows significant improvements are obtained by incorporating concept-based indexing into a basic language model. The MAP (mean average precision) is significantly raised from 29.17% (the baseline system) to 36.94%. The performance of the new approach is also significantly superior to the mean (21.72%) of official runs participated in TREC 2004 Genomics Track and is comparable to the performance of the best run (40.75%). Most official runs including the best run extensively use various query expansion and pseudo-relevance feedback techniques while our approach does nothing except for the incorporation of concept-based indexing, which evidences the view that semantic smoothing, i.e. the incorporation of synonym and sense information into the language models, is a more standard approach to achieving the effects traditional query expansion and pseudo-relevance feedback techniques target.
Metrics
Details
- Title
- Using Concept-Based Indexing to Improve Language Modeling Approach to Genomic IR
- Creators
- Xiaohua Zhou - Drexel UniversityXiaodan Zhang - Drexel UniversityXiaohua Hu - Drexel University
- Publication Details
- Advances in Information Retrieval, pp 444-455
- Series
- Lecture Notes in Computer Science
- Publisher
- Springer Berlin Heidelberg; Berlin, Heidelberg
- Resource Type
- Book chapter
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000238083200039
- Scopus ID
- 2-s2.0-33745848377
- Other Identifier
- 991019170602504721
UN Sustainable Development Goals (SDGs)
This publication has contributed to the advancement of the following goals:
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Information Systems
- Computer Science, Theory & Methods