Computer Science, Interdisciplinary Applications; Life Sciences & Biomedicine; Mathematical & Computational Biology; Science & Technology; Computer Science; Medical Informatics; Technology
In this paper, we propose an approach to promoting ranking diversity in biomedical information retrieval based on a generative topic model, Latent Dirichlet Allocation (LDA). Unlike other approaches and models, which consider aspects at the word level, our approach assumes that aspects should be identified by the topics of the retrieved documents. We use an LDA model to discover the topic distribution of each retrieved passage and the word distribution of each topic, and then re-rank the retrieval results by the topic-distribution similarity between passages, based on a sliding window of size N. Experiments on the TREC 2007 Genomics collection with two distinct IR baseline runs demonstrate the effectiveness of our method in promoting ranking diversity for biomedical information retrieval. Evaluation results show that our approach can achieve an 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track.
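The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the greedy selection, the cosine similarity measure, and the `penalty` weight below are all assumptions made for illustration, and the topic distributions are taken as already inferred by an LDA model. The names `cosine`, `rerank`, `window`, and `penalty` are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rerank(passages, window=2, penalty=0.5):
    """Greedily re-rank passages to reduce topical redundancy.

    passages: list of (passage_id, relevance_score, topic_distribution)
              tuples in baseline order. The topic distributions are assumed
              to come from a previously trained LDA model.
    window:   N, the number of most recently selected passages to compare with.
    penalty:  hypothetical weight on redundancy; not taken from the paper.
    """
    remaining = list(passages)
    ranked = []
    while remaining:
        # Only the last `window` selected passages count as context.
        recent = ranked[-window:] if window > 0 else []

        def adjusted(item):
            _, score, topics = item
            # Redundancy is the highest similarity to any passage in the window.
            redundancy = max((cosine(topics, r[2]) for r in recent), default=0.0)
            return score - penalty * redundancy

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        ranked.append(best)
    return [pid for pid, _, _ in ranked]


# Toy data: p1 and p2 share nearly the same topic mix, p3 covers another topic.
example = [
    ("p1", 0.90, [0.90, 0.10]),
    ("p2", 0.85, [0.88, 0.12]),
    ("p3", 0.80, [0.10, 0.90]),
]
print(rerank(example))  # ['p1', 'p3', 'p2']
```

In this toy run the passage on a different topic (`p3`) is promoted above the near-duplicate `p2`, which is the kind of effect aspect-level diversity measures are meant to reward. Setting `penalty=0` reproduces the baseline order.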
Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA
Creators
Yan Chen - University of Chinese Academy of Sciences
Xiaoshi Yin - Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Zhoujun Li - National University of Defense Technology
Xiaohua Hu - Drexel University, Information Science
Jimmy Xiangji Huang - York Univ, Sch Informat Technol, Toronto, ON, Canada
Publication Details
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011), pp 456-461
Series
IEEE International Conference on Bioinformatics and Biomedicine-BIBM
Publisher
IEEE
Number of pages
6
Grant note
SKLSDE-2011ZX-03 / Fund of State Key Laboratory of Software Development Environment
61170189 / National Natural Science Foundation of China; National Natural Science Foundation of China (NSFC)
Resource Type
Conference proceeding
Language
English
Academic Unit
Information Science
Web of Science ID
WOS:000411330600086
Scopus ID
2-s2.0-84862969491
Other Identifier
991019167451604721