A Mixture Language Model for Class-Attribute Mining from Biomedical Literature Digital Library
Conference proceeding

Xiaohua Zhou, Xiaohua Hu, Xiaohua Zhang, Daniel D. Wu, Tingting He and Aijing Luo
2008 IEEE International Conference on Bioinformatics and Biomedicine, Proceedings, pp 17-22
01 Jan 2008

Abstract

Life Sciences & Biomedicine; Mathematical & Computational Biology; Science & Technology
We define and study a novel text mining problem for biomedical literature digital libraries, referred to as class-attribute mining. Given a collection of biomedical literature from a digital library addressing a set of objects (e.g., proteins) and their descriptions (e.g., protein functions), the tasks of class-attribute mining are: (1) to identify and summarize latent classes in the space of objects, (2) to discover latent attribute themes in the space of object descriptions, and (3) to summarize the commonalities and differences among the identified classes along each attribute theme. We approach this mining problem with a mixture language model and estimate the model's parameters using the EM algorithm. We demonstrate the effectiveness of the model with an application, protein community identification and annotation from Medline, the largest biomedical literature digital library, with more than 16 million abstracts.
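
The abstract does not spell out the model itself, so the following is only a minimal sketch of the general technique it names: EM estimation for a mixture of unigram language models, where each document is softly assigned to one of k latent components. It is not the authors' class-attribute model. The function name `em_mixture_unigrams`, the additive smoothing constant, and the random initialization are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def em_mixture_unigrams(docs, k, iters=50, seed=0, smooth=1e-2):
    """Fit a k-component mixture of unigram language models with EM.

    Illustrative sketch only (hypothetical helper, not the paper's model).
    docs: list of token lists. Returns (priors, word_probs, resp), where
    resp[d][c] is the posterior probability of component c for document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]

    # Start from random soft assignments; each row sums to 1.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    priors, word_probs = [], []
    for _ in range(iters):
        # M-step: re-estimate smoothed priors and word distributions
        # (smoothing is an assumption that keeps every probability positive).
        denom = len(docs) + k * smooth
        priors = [(sum(r[c] for r in resp) + smooth) / denom for c in range(k)]
        word_probs = []
        for c in range(k):
            weighted = {w: smooth for w in vocab}
            for r, cnt in zip(resp, counts):
                for w, n in cnt.items():
                    weighted[w] += r[c] * n
            total = sum(weighted.values())
            word_probs.append({w: v / total for w, v in weighted.items()})

        # E-step: posterior over components, computed in log space
        # and normalized with the max-subtraction trick for stability.
        resp = []
        for cnt in counts:
            logs = [
                math.log(priors[c])
                + sum(n * math.log(word_probs[c][w]) for w, n in cnt.items())
                for c in range(k)
            ]
            m = max(logs)
            exps = [math.exp(v - m) for v in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
    return priors, word_probs, resp
```

Under these assumptions, calling `em_mixture_unigrams(docs, k=2)` on tokenized abstracts returns component priors, one word distribution per component, and per-document posteriors. The highest-probability words of each component could then serve as a rough summary of a latent class.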

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Mathematical & Computational Biology