Sifting robotic from organic text: A natural language approach for detecting automation on Twitter
Journal article   Open access   Peer reviewed

Eric M. Clark, Jake Ryland Williams, Chris A. Jones, Richard A. Galbraith, Christopher M. Danforth and Peter Sheridan Dodds
Journal of Computational Science, v 16, p7
Sep 2016
url
https://arxiv.org/pdf/1505.04342

Abstract

• We collected the 1,000 most frequently tweeting Twitter accounts from a 4-month span and hand-coded each account as human or automated/promotional.
• Our classifier focuses on organic (human) linguistic attributes of an individual's text to identify automatons by exclusion.
• We performed a 10-fold cross-validation of the algorithm and benchmarked the classifier's performance with receiver operating characteristic curves, which yield high levels of accuracy (97% at best).

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. As Twitter's popularity has grown, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., accounts spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twittersphere.
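
The evaluation the abstract describes (a text-only score, 10-fold cross-validation, and a receiver operating characteristic benchmark) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `lexical_diversity` feature, the midpoint threshold rule, and the synthetic "human" and "bot" accounts are hypothetical stand-ins, not the authors' features, classifier, or data.

```python
import random


def lexical_diversity(tweets):
    """Hypothetical text-only feature: distinct words / total words."""
    words = [w.lower() for tweet in tweets for w in tweet.split()]
    return len(set(words)) / len(words) if words else 0.0


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    human (label 1) outscores a random bot (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_threshold(scores, labels):
    """Assumed decision rule: midpoint between the two class mean scores."""
    human = [s for s, y in zip(scores, labels) if y == 1]
    bot = [s for s, y in zip(scores, labels) if y == 0]
    return (sum(human) / len(human) + sum(bot) / len(bot)) / 2


def cross_validate(scores, labels, k=10, seed=0):
    """Mean held-out accuracy over k folds; the threshold is fit on the
    training folds only and applied to the held-out fold."""
    order = list(range(len(scores)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        t = fit_threshold([scores[i] for i in train],
                          [labels[i] for i in train])
        hits = sum((scores[i] >= t) == (labels[i] == 1) for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k


# Synthetic accounts (not real data): "humans" vary their wording,
# "bots" repeat one template.
rng = random.Random(42)
vocab = ("coffee rain train music late happy game lunch "
         "movie tired book run").split()
humans = [[" ".join(rng.sample(vocab, 5)) for _ in range(8)]
          for _ in range(20)]
bots = [["breaking deal click here now"] * 8 for _ in range(20)]

scores = [lexical_diversity(t) for t in humans + bots]
labels = [1] * 20 + [0] * 20

auc = roc_auc(scores, labels)
mean_acc = cross_validate(scores, labels, k=10)
print(f"AUC = {auc:.2f}, mean 10-fold accuracy = {mean_acc:.2f}")
```

The toy accounts are separable by construction, so the numbers printed here say nothing about performance on real Twitter data. The sketch only shows how a score derived from text alone can be thresholded on training folds and benchmarked on held-out accounts.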

Metrics

9 Record Views
77 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Computer Science, Interdisciplinary Applications
Computer Science, Theory & Methods