Logo image
An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain
Book chapter   Peer reviewed

An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain

Min Song, Il-Yeol Song, Xiaohua Hu and Robert B. Allen
Advances in Knowledge Discovery and Data Mining, pp 173-179
2005

Abstract

In the domain of bioinformatics, extracting a relation such as protein-protein interations from a large database of text documents is a challenging task. One major issue with biomedical information extraction is how to efficiently digest the sheer size of unstructured biomedical data corpus. Often, among these huge biomedical data, only a small fraction of the documents contain information that is relevant to the extraction task. We propose a novel query expansion algorithm to automatically discover the characteristics of documents that are useful for extraction of a target relation. Our technique introduces a hybrid query re-weighting algorithm combining the modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. Our technique also adopts a novel query translation technique that incorporates POS categories to query translation. We conduct a series of experiments and report the experimental results. The results show that our technique is able to retrieve more documents that contain protein-protein pairs from MEDLINE as iteration increases. Our technique is also compared with SLIPPER, a supervised rule-based query expansion technique. The results show that our technique outperforms SLIPPER from 17.90% to 29.98 better in four iterations.

Metrics

11 Record Views

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#4 Quality Education

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Information Systems
Logo image