Book chapter
Stylometric Authorship Attribution of Collaborative Documents
Cyber Security Cryptography and Machine Learning, pp 115-135
02 Jun 2017
Abstract
Stylometry is the study of writing style based on linguistic features and is typically applied to authorship attribution problems. In this work, we apply stylometry to a novel dataset of multi-authored documents collected from Wikia, using both relaxed classification with a support vector machine (SVM) and multi-label classification techniques. We define five possible scenarios and show that one, the case where labeled and unlabeled collaborative documents by the same authors are available, yields high accuracy on our dataset, while the remaining, more restrictive cases yield lower accuracies. Based on the results of these experiments and knowledge of the multi-label classifiers used, we propose a hypothesis to explain this generally poor performance. Additionally, we perform authorship attribution of pre-segmented text from the Wikia dataset and show that, while this performs better than multi-label learning, it requires large amounts of data to be successful.
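The abstract contrasts relaxed single-label classification with multi-label classification. The following is a minimal, stdlib-only sketch under stated assumptions, not the chapter's method: it assumes "relaxed" means a single predicted author counts as correct when it is any one of a document's true authors, uses character n-gram frequencies as a stand-in stylometric feature, and scores multi-label output with exact-match accuracy. The function names (`char_ngram_profile`, `relaxed_accuracy`, `exact_match_accuracy`) and the toy author labels are hypothetical; the chapter's actual feature set and metrics are not given on this page.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def char_ngram_profile(text: str, n: int = 3) -> Dict[str, float]:
    """Relative frequency of each character n-gram in `text` (illustrative feature only)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def relaxed_accuracy(predicted: Sequence[str], true_authors: Sequence[Set[str]]) -> float:
    """Assumed 'relaxed' scoring: a single predicted author is correct if it is
    any one of the document's true authors."""
    hits = sum(1 for guess, authors in zip(predicted, true_authors) if guess in authors)
    return hits / len(predicted)


def exact_match_accuracy(predicted: Sequence[Set[str]], true_authors: Sequence[Set[str]]) -> float:
    """Strict multi-label scoring: the predicted author set must equal the true set."""
    hits = sum(1 for guess, authors in zip(predicted, true_authors) if set(guess) == set(authors))
    return hits / len(predicted)


if __name__ == "__main__":
    # Hypothetical toy data: each document's set of contributing authors.
    truth = [{"alice", "bob"}, {"carol"}, {"bob", "dave"}]
    print(relaxed_accuracy(["bob", "alice", "dave"], truth))  # 2 of 3 correct
    print(exact_match_accuracy([{"alice", "bob"}, {"carol", "bob"}, {"bob"}], truth))  # 1 of 3 correct
    print(char_ngram_profile("abab", n=2))
```

Relaxed scoring is the more forgiving of the two: a single-label classifier only has to name one contributor, whereas exact match requires recovering the full author set.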
Details
- Title: Stylometric Authorship Attribution of Collaborative Documents
- Creators: Edwin Dauber (Drexel University); Rebekah Overdorf (Drexel University); Rachel Greenstadt (Drexel University)
- Publication Details: Cyber Security Cryptography and Machine Learning, pp 115-135
- Series: Lecture Notes in Computer Science
- Publisher: Springer International Publishing; Cham
- Resource Type: Book chapter
- Language: English
- Academic Unit: Computer Science
- Web of Science ID: WOS:000432576900009
- Scopus ID: 2-s2.0-85021710228
- Other Identifier: 991019169578104721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Web of Science research areas:
  - Computer Science, Artificial Intelligence
  - Computer Science, Theory & Methods