Information systems -- Information retrieval -- Evaluation of retrieval results -- Relevance assessment
Information systems -- Information retrieval -- Evaluation of retrieval results -- Retrieval effectiveness
Information systems -- Information retrieval -- Evaluation of retrieval results -- Retrieval efficiency
A number of information retrieval studies have assessed which statistical techniques are appropriate for comparing systems. However, these studies focus on TREC-style experiments, which typically have fewer than 100 topics. There is no similar line of work for large search and recommendation experiments; such experiments typically have thousands of topics or users and much sparser relevance judgements, so it is not clear whether recommendations for analyzing traditional TREC experiments apply to these settings. In this paper, we empirically study the behavior of significance tests with large search and recommendation evaluation data. Our results show that the Wilcoxon and sign tests have significantly higher Type I error rates for large sample sizes than the bootstrap, randomization, and t-tests, which were more consistent with the expected error rate. While the statistical tests displayed differences in power for smaller sample sizes, they showed no difference in power for large sample sizes. We recommend that the sign and Wilcoxon tests not be used to analyze large-scale evaluation results. Our results demonstrate that with Top-N recommendation and large search evaluation data, most tests would have a 100% chance of finding statistically significant results. Therefore, the effect size should be used to determine practical or scientific significance.
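The abstract compares paired significance tests and argues for reporting effect size alongside p-values. As a rough illustration only, the Python sketch below implements several of those procedures with the standard library: an exact sign test, a sign-flipping randomization test, a mean-shifted paired bootstrap test, and Cohen's d for paired differences. It is not the authors' experimental code. The function names, the 10,000-iteration default, and the synthetic scores are hypothetical; the paired t-test is replaced by its large-sample normal approximation because Python's standard library has no t distribution, and the Wilcoxon signed-rank test is omitted for brevity (SciPy provides `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`, and `scipy.stats.binomtest` for real analyses).

```python
import math
import random
import statistics


def _diffs(a, b):
    """Paired per-topic (or per-user) score differences, system A minus B."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    return [x - y for x, y in zip(a, b)]


def paired_z_test(a, b):
    """Two-sided large-sample normal approximation to the paired t-test.

    With thousands of topics the t distribution is nearly standard normal,
    so NormalDist stands in for the t CDF (the stdlib has no t CDF).
    """
    d = _diffs(a, b)
    mean_d, sd = statistics.fmean(d), statistics.stdev(d)
    if sd == 0:
        return 1.0 if mean_d == 0 else 0.0
    z = mean_d / (sd / math.sqrt(len(d)))
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


def sign_test(a, b):
    """Exact two-sided sign test; zero differences (ties) are dropped."""
    d = [x for x in _diffs(a, b) if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    k = min(sum(x > 0 for x in d), sum(x < 0 for x in d))
    # Exact binomial tail P(X <= k) with X ~ Binomial(n, 0.5), doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def randomization_test(a, b, n_iter=10_000, seed=0):
    """Paired randomization (sign-flipping permutation) test on the mean.

    Under the null hypothesis the two systems' scores for a topic are
    exchangeable, so each difference is equally likely to have either sign.
    """
    rng = random.Random(seed)
    d = _diffs(a, b)
    observed = abs(statistics.fmean(d))
    hits = 0
    for _ in range(n_iter):
        flipped = [x if rng.random() < 0.5 else -x for x in d]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


def bootstrap_test(a, b, n_iter=10_000, seed=0):
    """Paired bootstrap test: resample differences shifted to mean zero."""
    rng = random.Random(seed)
    d = _diffs(a, b)
    mean_d = statistics.fmean(d)
    centred = [x - mean_d for x in d]  # enforce the null of zero mean difference
    hits = 0
    for _ in range(n_iter):
        sample = rng.choices(centred, k=len(d))
        if abs(statistics.fmean(sample)) >= abs(mean_d):
            hits += 1
    return (hits + 1) / (n_iter + 1)


def cohens_d_paired(a, b):
    """Effect size: mean paired difference divided by its standard deviation."""
    d = _diffs(a, b)
    return statistics.fmean(d) / statistics.stdev(d)


if __name__ == "__main__":
    # Hypothetical data: 2,000 topics, true mean difference of only 0.01.
    rng = random.Random(42)
    base = [rng.random() for _ in range(2000)]
    sys_a = [min(1.0, max(0.0, s + 0.01 + rng.gauss(0, 0.05))) for s in base]
    sys_b = [min(1.0, max(0.0, s + rng.gauss(0, 0.05))) for s in base]
    for name, test in [("z (t approx.)", paired_z_test), ("sign", sign_test)]:
        print(f"{name:>14}: p = {test(sys_a, sys_b):.3g}")
    for name, test in [("randomization", randomization_test), ("bootstrap", bootstrap_test)]:
        print(f"{name:>14}: p = {test(sys_a, sys_b, n_iter=1000):.3g}")
    print(f"{'Cohen d':>14}: {cohens_d_paired(sys_a, sys_b):.3f}")
```

Running the script prints a p-value for each test and the effect size. With 2,000 topics, a difference as small as the hypothetical 0.01 will typically be flagged as significant by every test while Cohen's d stays small, which is the pattern that motivates judging practical significance by effect size rather than by the p-value alone.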
Details
Title
Inference at Scale: Significance Testing for Large Search and Recommendation Experiments
Creators
Ngozi Ihemelandu - Boise State University
Michael D. Ekstrand - Boise State University
Publication Details
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 2087-2091
Conference
SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)
Series
ACM Conferences
Publisher
ACM
Resource Type
Conference proceeding
Language
English
Academic Unit
Information Science (Informatics)
Web of Science ID
WOS:001118084002027
Scopus ID
2-s2.0-85168660208
Other Identifier
991021868092004721
Web of Science research areas
Computer Science, Information Systems
Computer Science, Theory & Methods