Journal article
Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods
Journal of informetrics, v 9(3), pp 455-465
Jul 2015
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
- Five vocabulary- and model-based methods to extract terms from scientific publications are evaluated.
- Three conditional random fields (CRF)-based methods outperform the two vocabulary-based ones.
- CRF with keyword-based dictionary method has the best performance.
- The keyword-based one has a higher recall and the Wikipedia-based one has a higher precision.
The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications: two vocabulary-based methods (one keyword-based and one Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with a keyword-based dictionary, and CRF with a Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, and that among the model-based methods CRF with a keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level.
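To make the vocabulary-based approach and two of the evaluative indicators concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greedy longest-match lookup against a small term list and scores the result against a hand-made gold set; the function names, the `max_len` parameter, the sample sentence, and both term sets are hypothetical and chosen only for illustration.

```python
def extract_terms(tokens, vocabulary, max_len=3):
    """Greedy longest-match lookup of vocabulary phrases in a token list.

    Assumption: the vocabulary holds lowercase phrases of at most max_len words.
    """
    found = set()
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in vocabulary:
                found.add(candidate)
                match_len = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += match_len if match_len else 1
    return found


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted terms against gold terms."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence, vocabulary, and gold annotations.
tokens = "We train conditional random fields for entity extraction on citation data".split()
vocabulary = {"conditional random fields", "entity extraction", "data"}
gold = {"conditional random fields", "entity extraction", "citation data"}

predicted = extract_terms(tokens, vocabulary)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), round(precision, 2), round(recall, 2))
```

In this toy case the vocabulary contains "data" but not "citation data", so the matcher returns one spurious term and misses one gold term, which lowers both precision and recall. A broader or narrower term list shifts that balance, which is the kind of trade-off the abstract reports between the keyword-based and Wikipedia-based vocabularies.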
Details
- Title
- Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods
- Creators
- Erjia Yan; Yongjun Zhu
- Publication Details
- Journal of informetrics, v 9(3), pp 455-465
- Publisher
- Elsevier
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:000360907400004
- Scopus ID
- 2-s2.0-84930626409
- Other Identifier
- 991014976824204721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas
- Computer Science, Interdisciplinary Applications
- Information Science & Library Science