Incremental Author Name Disambiguation for Scientific Citation Data
Conference proceeding

Zhengqiao Zhao, Jason Rollins, Linge Bai, and Gail Rosen
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), v 2018-, pp 175-183
01 Jan 2017

Abstract

Computer Science; Computer Science, Information Systems; Computer Science, Theory & Methods; Engineering; Engineering, Electrical & Electronic; Science & Technology; Technology
Name disambiguation is a perennial challenge for any large and growing dataset, but it is particularly significant for scientific publication data, where documents and ideas are linked through citations and depend on highly accurate authorship. Differentiating personal names in scientific publications is a substantial problem because many names are not sufficiently distinct, given the large number of researchers active in most academic disciplines today. As more documents and citations are published every year, any system built on this data must be continually retrained and reclassified to remain relevant and helpful. Recently, some incremental learning solutions have been proposed, but most have been limited to small-scale simulations that do not exhibit the full heterogeneity of the millions of authors and papers in real-world data. In our work, we propose a probabilistic model that uses a rich set of metadata while reducing the number of pairwise comparisons needed for new articles. Our approach classifies incrementally, which alleviates the need to retrain the model and re-cluster all papers, and it uses fewer parameters than other algorithms. Using a published dataset, we obtained the highest K-measure, which is the geometric mean of cluster purity and author-class purity. Moreover, on a difficult author block from the Clarivate Analytics Web of Science, we obtain higher precision than other algorithms.
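
The abstract describes the K-measure only as a geometric mean of cluster and author-class purity. As a minimal sketch, assuming the definitions of average cluster purity (ACP) and average author purity (AAP) commonly used in the disambiguation literature, the metric could be computed as follows. The function name `k_measure` and its argument names are illustrative and do not come from the paper.

```python
from collections import Counter
from math import sqrt


def k_measure(predicted, truth):
    """Return (ACP, AAP, K) for two parallel label sequences.

    predicted[i] is the cluster assigned to paper i and truth[i] is its
    true author. ACP and AAP follow the commonly used purity definitions,
    which is an assumption; the abstract does not give the formulas.
    """
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(predicted)
    cluster_sizes = Counter(predicted)       # papers per predicted cluster
    author_sizes = Counter(truth)            # papers per true author
    overlap = Counter(zip(predicted, truth)) # papers per (cluster, author)

    # ACP: how dominated each predicted cluster is by a single author.
    acp = sum(c * c / cluster_sizes[p] for (p, _), c in overlap.items()) / n
    # AAP: how concentrated each author's papers are in a single cluster.
    aap = sum(c * c / author_sizes[t] for (_, t), c in overlap.items()) / n
    # K: geometric mean of the two purities.
    return acp, aap, sqrt(acp * aap)


# Toy example: cluster c2 mixes one paper by author "a" with one by "b".
acp, aap, k = k_measure(["c1", "c1", "c2", "c2"], ["a", "a", "a", "b"])
print(f"ACP={acp:.3f} AAP={aap:.3f} K={k:.3f}")
```

Under these assumed definitions, a clustering that matches the true authors exactly scores 1.0 on all three values. Splitting one author across clusters lowers AAP, and merging different authors into one cluster lowers ACP.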

Metrics

17 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Industry collaboration
Domestic collaboration
Web of Science research areas
Computer Science, Information Systems
Computer Science, Theory & Methods
Engineering, Electrical & Electronic