Recall, Robustness, and Lexicographic Evaluation

Fernando Diaz; Michael D. Ekstrand; Bhaskar Mitra

doi:10.1145/3728373

Journal article

Open access

ACM Transactions on Recommender Systems, v 4(1), pp 1-50

29 Jul 2025

DOI: https://doi.org/10.1145/3728373

Featured in Collection : Research Supported by Drexel Libraries' OA Programs

Files and links (1)

URL: https://doi.org/10.1145/3728373

Published, Version of Record (VoR). Open Access via Drexel Libraries Read and Publish Program 2025. CC BY V4.0, Open

Subjects: Information systems; Recommender systems; Retrieval effectiveness

Abstract

Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in rankings from a formal perspective. Our analysis is composed of three tenets: recall, robustness, and lexicographic evaluation. First, we formally define ‘recall-orientation’ as the sensitivity of a metric to a user interested in finding every relevant item. Second, we analyze recall-orientation from the perspective of robustness with respect to possible content consumers and providers, connecting recall to recent conversations about fair ranking. Finally, we extend this conceptual and theoretical treatment of recall by developing a practical preference-based evaluation method based on lexicographic comparison. Through extensive empirical analysis across multiple recommendation and retrieval tasks, we establish that our new evaluation method, lexirecall, has convergent validity (i.e., it is correlated with existing recall metrics) and exhibits substantially higher sensitivity in terms of discriminative power and stability in the presence of missing labels. Our conceptual, theoretical, and empirical analysis substantially deepens our understanding of recall and motivates its adoption through connections to robustness and fairness.
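
To make the idea of a lexicographic, preference-based comparison concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the procedure, so the sketch assumes that two rankings are compared by the rank positions of their relevant items, starting from the worst-ranked relevant item and moving upward until the positions differ. The function names (`recall_at_k`, `relevant_positions`, `lexi_compare`) and the convention of placing unranked relevant items just past the end of the ranking are hypothetical choices made for illustration.

```python
from typing import Hashable, List, Sequence, Set


def recall_at_k(ranking: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def relevant_positions(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> List[int]:
    """1-based ranks of the relevant items, sorted ascending.

    Assumption for this sketch: the ranking has no duplicate items, and a
    relevant item that is missing from the ranking is placed at
    len(ranking) + 1.
    """
    pos = {item: i for i, item in enumerate(ranking, start=1)}
    missing_rank = len(ranking) + 1
    return sorted(pos.get(item, missing_rank) for item in relevant)


def lexi_compare(a: Sequence[Hashable], b: Sequence[Hashable], relevant: Set[Hashable]) -> int:
    """Return 1 if ranking a is preferred, -1 if b is preferred, 0 if tied.

    Assumed rule: compare the position of the worst-ranked relevant item
    first; on a tie, move to the next-worst, and so on. A smaller rank
    position is better.
    """
    pa = relevant_positions(a, relevant)[::-1]  # worst-ranked first
    pb = relevant_positions(b, relevant)[::-1]
    for x, y in zip(pa, pb):
        if x != y:
            return 1 if x < y else -1
    return 0


relevant_items = {"d1", "d4"}
ranking_a = ["d1", "d2", "d3", "d4"]
ranking_b = ["d2", "d1", "d4", "d3"]

# Both rankings retrieve one of the two relevant items in the top 2.
tied_recall = recall_at_k(ranking_a, relevant_items, 2) == recall_at_k(ranking_b, relevant_items, 2)

# The lexicographic comparison still separates them: ranking_b reaches its
# last relevant item at rank 3, ranking_a only at rank 4.
preference = lexi_compare(ranking_a, ranking_b, relevant_items)
```

In this toy case recall at cutoff 2 cannot distinguish the two rankings, while the pairwise comparison under the assumed rule prefers `ranking_b`. This is only meant to illustrate why a preference-based comparison can produce fewer ties than a single-number recall metric; it does not reproduce the paper's experiments.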

Details

Title: Recall, Robustness, and Lexicographic Evaluation
Creators: Fernando Diaz - Carnegie Mellon University
Michael D. Ekstrand (Corresponding Author) - Drexel University
Bhaskar Mitra - Microsoft (Canada)
Publication Details: ACM Transactions on Recommender Systems, v 4(1), pp 1-50
Publisher: Association for Computing Machinery
Resource Type: Journal article
Language: English
Academic Unit: Information Science
Other Identifier: 991022047147104721
