Book chapter
An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain
Advances in Knowledge Discovery and Data Mining, pp 173-179
2005
Abstract
In bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. One major issue in biomedical information extraction is how to efficiently process the sheer volume of unstructured biomedical text. Often, only a small fraction of the documents in such a corpus contain information relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents that are useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results. They show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique. The results show that our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
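As a rough illustration of the term re-weighting idea mentioned in the abstract, the sketch below computes the classic Robertson/Sparck Jones relevance weight and ranks candidate query terms by it. This is the textbook formula with 0.5 smoothing, not the modified version used in the chapter, which the abstract does not specify. The function names `rsj_weight` and `rank_terms`, and the representation of documents as token lists, are assumptions made for this example.

```python
import math
from collections import Counter


def rsj_weight(r, R, n, N):
    """Classic Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    R: total relevant documents
    n: documents in the collection containing the term
    N: total documents in the collection
    """
    return math.log(
        ((r + 0.5) * (N - n - R + r + 0.5))
        / ((R - r + 0.5) * (n - r + 0.5))
    )


def rank_terms(relevant_docs, all_docs):
    """Rank terms seen in relevant documents by their RSJ weight.

    Assumes each document is a list of tokens and that relevant_docs
    is a subset of all_docs (hypothetical input format).
    """
    N = len(all_docs)
    R = len(relevant_docs)
    # Document frequencies: count each term once per document.
    df_all = Counter(t for doc in all_docs for t in set(doc))
    df_rel = Counter(t for doc in relevant_docs for t in set(doc))
    weights = {t: rsj_weight(df_rel[t], R, df_all[t], N) for t in df_rel}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    relevant = [
        ["protein", "binds", "kinase"],
        ["protein", "interacts", "receptor"],
    ]
    others = [
        ["patient", "study", "kinase"],
        ["patient", "trial", "cohort"],
    ]
    for term, weight in rank_terms(relevant, relevant + others):
        print(f"{term}\t{weight:.3f}")
```

In an iterative setting such as the one the abstract describes, the top-ranked terms would presumably be fed back into the next query; how the chapter combines these weights with keyphrase extraction is not covered by this sketch.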
Details
- Title
- An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain
- Creators
- Min Song - Drexel University; Il-Yeol Song - Drexel University; Xiaohua Hu - Drexel University; Robert B. Allen - Drexel University
- Publication Details
- Advances in Knowledge Discovery and Data Mining, pp 173-179
- Series
- Lecture Notes in Computer Science
- Publisher
- Springer Berlin Heidelberg; Berlin, Heidelberg
- Resource Type
- Book chapter
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000229956700021
- Scopus ID
- 2-s2.0-26944468655
- Other Identifier
- 991019170504504721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Information Systems