Journal article
Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods
Journal of Informetrics, Vol.9(3), pp.455-465
Jul 2015
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
- Five vocabulary- and model-based methods to extract terms from scientific publications are evaluated.
- Three conditional random fields (CRF)-based methods outperform the two vocabulary-based ones.
- CRF with keyword-based dictionary method has the best performance.
- The keyword-based one has a higher recall and the Wikipedia-based one has a higher precision.
The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications: two vocabulary-based methods (one keyword-based and one Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with a keyword-based dictionary, and CRF with a Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones; among the model-based methods, CRF with a keyword-based dictionary performs best. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level.
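To make the vocabulary-based approach and two of the evaluative indicators concrete, the following is a minimal sketch and not the authors' implementation. It assumes a greedy longest-match lookup against a small term list and scores exact span matches; the function names, the `max_len` parameter, the toy vocabulary, and the gold annotations are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def extract_by_vocabulary(
    tokens: List[str],
    vocabulary: Set[Tuple[str, ...]],
    max_len: int = 4,  # assumed cap on term length, in tokens
) -> Set[Span]:
    """Greedy longest-match lookup of vocabulary terms in a token sequence."""
    lowered = [t.lower() for t in tokens]
    spans: Set[Span] = set()
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in vocabulary:
                spans.add((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): a vocabulary of known terms and gold annotations.
tokens = "We train conditional random fields for entity extraction".split()
vocabulary = {("conditional", "random", "fields"), ("entity",)}
gold = {(2, 5), (6, 8)}  # "conditional random fields", "entity extraction"

predicted = extract_by_vocabulary(tokens, vocabulary)
precision, recall = precision_recall(predicted, gold)
```

In this toy case the lookup finds "conditional random fields" but only the partial span "entity", so one of two predictions and one of two gold entities match exactly. A larger vocabulary would tend to raise recall at some cost to precision, the kind of trade-off the abstract reports between the keyword-based and Wikipedia-based methods.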
Details
- Title
- Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods
- Creators
- Erjia Yan; Yongjun Zhu
- Publication Details
- Journal of Informetrics, Vol.9(3), pp.455-465
- Publisher
- Elsevier
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science (Informatics)
- Identifiers
- 991014976824204721
InCites Highlights
These are selected metrics from the InCites Benchmarking & Analytics tool related to this output.
- Web of Science research areas
- Computer Science, Interdisciplinary Applications
- Information Science & Library Science