• We collected the 1000 most frequently occurring Twitter accounts from a 4-month span and hand-coded each account as human or automated/promotional.
• Our classifier focuses on organic (human) linguistic attributes of individuals' text and identifies automatons by exclusion.
• We performed a 10-fold cross-validation of the algorithm and benchmarked the classifier's performance with receiver operating characteristic (ROC) curves, which indicate high accuracy (up to 97%).
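The evaluation protocol in the highlights (10-fold cross-validation, scored with ROC curves) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `fit` and `score` are hypothetical placeholders for whatever classifier is being tested, label 1 means "automated", and out-of-fold scores are pooled into a single area under the ROC curve (AUC) rather than reported per fold.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = automated (positive class), 0 = human.
    A higher score should mean "more likely automated"; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and return k (train, test) index splits; each index is tested once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def cross_validated_auc(fit: Callable, score: Callable, X: Sequence, y: Sequence[int],
                        k: int = 10, seed: int = 0) -> float:
    """Hypothetical harness: train on k-1 folds, score the held-out fold, pool all scores."""
    oof: List[float] = [0.0] * len(X)
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            oof[i] = score(model, X[i])
    return roc_auc(oof, y)


if __name__ == "__main__":
    # Toy data: a single feature that already separates the classes perfectly.
    X = [i / 20 for i in range(20)]
    y = [1 if x >= 0.5 else 0 for x in X]
    print(cross_validated_auc(lambda Xs, ys: None, lambda m, x: x, X, y))
```

Pooling out-of-fold scores avoids folds that happen to contain only one class; a per-fold ROC curve, as one might plot for each of the 10 folds, would instead call `roc_auc` inside the loop.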
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. As Twitter's popularity has grown, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., bots spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that uses only the natural-language text of organic users to provide a criterion for identifying accounts that post automated messages. Because the classifier operates on text alone, it is flexible and may be applied to textual data beyond the Twittersphere.
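The idea of identifying automatons "by exclusion" can be illustrated with a small sketch: learn what ordinary human text looks like, then flag any account whose text falls outside that range. Everything concrete here is a hypothetical assumption, not the paper's feature set: the two features (fraction of tweets containing a URL, and mean pairwise Jaccard distance between an account's tweets), the percentile cut-off `q`, and the one-sided rules (too many URLs, or too repetitive) are chosen only to make the exclusion logic concrete.

```python
import re
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical features; the paper's actual linguistic attributes may differ.
URL_RE = re.compile(r"https?://\S+")


def url_rate(tweets: Sequence[str]) -> float:
    """Fraction of an account's tweets that contain at least one URL."""
    return sum(1 for t in tweets if URL_RE.search(t)) / len(tweets)


def _tokens(tweet: str) -> set:
    """Lower-cased word set of a tweet, with URLs stripped out."""
    return set(URL_RE.sub(" ", tweet.lower()).split())


def mean_dissimilarity(tweets: Sequence[str]) -> float:
    """Average pairwise Jaccard distance; near 0 means the account repeats itself."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        raise ValueError("need at least two tweets per account")
    total = 0.0
    for a, b in pairs:
        ta, tb = _tokens(a), _tokens(b)
        union = ta | tb
        total += 1.0 - (len(ta & tb) / len(union) if union else 1.0)
    return total / len(pairs)


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_organic_bounds(human_accounts: List[List[str]], q: float = 0.025) -> Dict[str, float]:
    """Model only the organic (human) class: record where typical human values lie."""
    return {
        "max_url_rate": percentile([url_rate(a) for a in human_accounts], 1 - q),
        "min_dissimilarity": percentile([mean_dissimilarity(a) for a in human_accounts], q),
    }


def is_automated(tweets: Sequence[str], bounds: Dict[str, float]) -> bool:
    """Classify by exclusion: anything outside the organic range is flagged."""
    return (url_rate(tweets) > bounds["max_url_rate"]
            or mean_dissimilarity(tweets) < bounds["min_dissimilarity"])
```

Note that no bot examples are needed to fit `fit_organic_bounds`; only human accounts define the accepted region, which is what "identifying automatons by exclusion" suggests. With so few training accounts, the percentile bounds are very tight, so a real application would need far more organic data.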