Sifting robotic from organic text: A natural language approach for detecting automation on Twitter

Eric M. Clark; Jake Ryland Williams; Chris A. Jones; Richard A. Galbraith; Christopher M. Danforth; Peter Sheridan Dodds

doi:10.1016/j.jocs.2015.11.002

Journal article

Open access

Peer reviewed

Journal of computational science, v 16, p7

Sep 2016

Files and links (1)

url

https://arxiv.org/pdf/1505.04342

Submitted, arXiv.org - Non-exclusive license to distribute, Open

Abstract

• We collected the 1000 most frequent Twitter accounts from a 4-month span and hand-coded each account as human or automated/promotional.
• Our classifier focuses on organic (human) linguistic attributes of individuals' text to identify automatons by exclusion.
• We performed a 10-fold cross-validation of the algorithm and benchmarked the classifier's performance with receiver operating characteristic curves, which yielded high levels of accuracy (97% at best).

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. As Twitter's popularity has grown, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., accounts spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twittersphere.
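The evaluation procedure named in the abstract (10-fold cross-validation benchmarked with receiver operating characteristic curves) can be illustrated with a minimal sketch. This record does not describe the paper's linguistic features or classifier, so nothing below reproduces the authors' method: the function names `k_fold_indices`, `roc_points`, and `auc` are hypothetical, and the scores and labels are made-up toy values in which a higher score is assumed to mean "more likely automated".

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n-1.

    Items are dealt to folds round-robin, so fold sizes differ by at most one.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumes labels are 1 (automated) or 0 (organic) and both classes occur.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy values only (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
print("AUC on toy data:", round(auc(roc_points(toy_scores, toy_labels)), 4))
print("test fold sizes for 1000 accounts:", [len(test) for _, test in k_fold_indices(1000)])
```

In a full cross-validation run, a classifier would be fit on each `train` index set and scored on the matching `test` set, and the per-fold ROC curves would then be compared or averaged.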

Metrics

9 Record Views

64 citations in Web of Science

77 citations in Scopus

Details

Title: Sifting robotic from organic text: A natural language approach for detecting automation on Twitter
Creators: Eric M. Clark - University of Vermont
Jake Ryland Williams - University of Vermont
Chris A. Jones - University of Vermont
Richard A. Galbraith - University of Vermont
Christopher M. Danforth - University of Vermont
Peter Sheridan Dodds - University of Vermont
Publication Details: Journal of computational science, v 16, p7
Publisher: Elsevier
Resource Type: Journal article
Language: English
Academic Unit: Information Science
Web of Science ID: WOS:000382795200001
Scopus ID: 2-s2.0-84961177021
Other Identifier: 991021806814204721

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas: Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
