Crowds Replicate Performance of Scientific Experts Scoring Phylogenetic Matrices of Phenotypes

Maureen A O'Leary; Kenzley Alphonse; Arce H Mariangeles; Dario Cavaliere; Andrea Cirranello; Thomas G Dietterich; Matthew Julius; Seth Kaufman; Edith Law; Maria Passarotti; Abigail Reft; Javier Robalino; Nancy B Simmons; Selena Y Smith; Dennis W Stevenson; Ed Theriot; Paúl M Velazco; Ramona L Walls; Mengjie Yu; Marymegan Daly

doi:10.1093/sysbio/syx052

Journal article | Open access | Peer reviewed

Systematic Biology, v 67(1), pp 49-60

01 Jan 2018

DOI: https://doi.org/10.1093/sysbio/syx052

PMID: 29253296

Keywords: Animals; Classification - methods; Crowdsourcing - standards; Phenotype; Phylogeny; Professional Competence; Reproducibility of Results

Abstract

Scientists building the Tree of Life face an overwhelming challenge to categorize phenotypes (e.g., anatomy, physiology) from millions of living and fossil species. This biodiversity challenge far outstrips the capacities of trained scientific experts. Here we explore whether crowdsourcing can be used to collect matrix data on a large scale with the participation of nonexpert students, or "citizen scientists." Crowdsourcing, or data collection by nonexperts, frequently via the internet, has enabled scientists to tackle some large-scale data collection challenges too massive for individuals or scientific teams alone. The quality of work by nonexpert crowds is, however, often questioned, and few data have been collected on how such crowds perform on complex tasks such as phylogenetic character coding. We studied a crowd of over 600 nonexperts and found that they could use images to identify anatomical similarity (hypotheses of homology) with an average accuracy of 82% compared with scores provided by experts in the field. This performance pattern held across the Tree of Life, from protists to vertebrates. We introduce a procedure that predicts the difficulty of each character and that can be used to assign harder characters to experts and easier characters to a nonexpert crowd for scoring. We test this procedure in a controlled experiment comparing crowd scores to those of experts and show that crowds can produce matrices with over 90% of cells scored correctly while reducing the number of cells to be scored by experts by 50%. The preparation time for a crowdsourcing experiment, including image collection and processing, is significant, and crowdsourcing does not currently save scientific experts time overall. However, if innovations in automation or robotics can reduce such effort, then large-scale implementation of our method could greatly increase the collective scientific knowledge of species phenotypes for phylogenetic tree building.
For the field of crowdsourcing, we provide a rare study with ground truth, or an experimental control that many studies lack, and contribute new methods on how to coordinate the work of experts and nonexperts. We show that there are important instances in which crowd consensus is not a good proxy for correctness.
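The division of labor described in the abstract (crowd consensus for easy characters, experts for hard ones, both judged against expert scores) can be illustrated with a small sketch. It assumes majority-vote consensus for crowd cells, treats expert scores as ground truth, and takes a per-character difficulty score as a given input; the authors' own difficulty-prediction procedure is not reproduced here. All names (`majority_vote`, `route`, `hybrid_matrix`, `threshold`) and all data are hypothetical toy values, not results from the study.

```python
from collections import Counter

# Hypothetical toy data: (taxon, character) -> state code.
# Expert scores are treated as ground truth, as in the abstract's comparison.
expert = {("A", "c1"): 0, ("B", "c1"): 1, ("A", "c2"): 1, ("B", "c2"): 0}
crowd_votes = {
    ("A", "c1"): [0, 0, 1],
    ("B", "c1"): [1, 1, 0],
    ("A", "c2"): [0, 0, 1],
    ("B", "c2"): [1, 1, 1],  # unanimous crowd, yet wrong
}
# Assumed per-character difficulty scores (0 = easy, 1 = hard); a stand-in
# for the output of a difficulty-prediction step, not the paper's method.
difficulty = {"c1": 0.2, "c2": 0.8}


def majority_vote(votes):
    """Crowd consensus for one cell; ties go to the state seen first."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(matrix, truth):
    """Fraction of cells whose state matches the expert (ground-truth) state."""
    return sum(matrix[cell] == state for cell, state in truth.items()) / len(truth)


def route(difficulty, threshold):
    """Send characters harder than the threshold to experts, the rest to the crowd."""
    to_experts = [c for c, d in difficulty.items() if d > threshold]
    to_crowd = [c for c, d in difficulty.items() if d <= threshold]
    return to_experts, to_crowd


def hybrid_matrix(votes, expert_scores, difficulty, threshold):
    """Fill easy characters by crowd consensus and hard ones from expert scores."""
    to_experts, _ = route(difficulty, threshold)
    matrix = {}
    for cell, cell_votes in votes.items():
        _, char = cell
        if char in to_experts:
            matrix[cell] = expert_scores[cell]
        else:
            matrix[cell] = majority_vote(cell_votes)
    return matrix


crowd_only = {cell: majority_vote(v) for cell, v in crowd_votes.items()}
hybrid = hybrid_matrix(crowd_votes, expert, difficulty, threshold=0.5)
expert_chars, _ = route(difficulty, 0.5)
expert_share = sum(cell[1] in expert_chars for cell in hybrid) / len(hybrid)

print(accuracy(crowd_only, expert))  # 0.5
print(accuracy(hybrid, expert))      # 1.0
print(expert_share)                  # 0.5
```

In this toy matrix the cell ("B", "c2") is scored identically by every crowd member and is still wrong, which mirrors the abstract's point that crowd consensus is not always a good proxy for correctness. The threshold is the knob that trades expert workload against matrix accuracy.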

Details

Title: Crowds Replicate Performance of Scientific Experts Scoring Phylogenetic Matrices of Phenotypes
Creators: Maureen A O'Leary - Stony Brook University
Kenzley Alphonse - Kenx Technology, Inc., 1170 N. Milwaukee Ave. Chicago, IL 60642, USA.
Arce H Mariangeles - Stony Brook University
Dario Cavaliere - New York Botanical Garden
Andrea Cirranello - American Museum of Natural History
Thomas G Dietterich - Oregon State University
Matthew Julius - St. Cloud State University
Seth Kaufman - Whirl-i-gig, 109 South 5th Street, Suite 608, Brooklyn, NY 10012, USA.
Edith Law - University of Waterloo
Maria Passarotti - Whirl-i-gig, 109 South 5th Street, Suite 608, Brooklyn, NY 10012, USA.
Abigail Reft - The Ohio State University
Javier Robalino - Stony Brook University
Nancy B Simmons - American Museum of Natural History
Selena Y Smith - University of Michigan
Dennis W Stevenson - New York Botanical Garden
Ed Theriot - The University of Texas at Austin
Paúl M Velazco - American Museum of Natural History
Ramona L Walls - University of Arizona
Mengjie Yu - The University of Texas at Austin
Marymegan Daly - The Ohio State University
Publication Details: Systematic Biology, v 67(1), pp 49-60
Publisher: Oxford University Press
Resource Type: Journal article
Language: English
Academic Unit: Ichthyology; Academy of Natural Sciences of Drexel University
Web of Science ID: WOS:000419588000004
Scopus ID: 2-s2.0-85040133353
Other Identifier: 991019330808604721


InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types: Domestic collaboration; International collaboration
Web of Science research areas: Evolutionary Biology
