The need for fair AI is increasingly clear in the era of general-purpose
systems such as ChatGPT, Gemini, and other large language models (LLMs).
However, the growing complexity of human-AI interaction and its social
impacts have raised questions about how fairness standards can be applied. Here,
we review the technical frameworks that machine learning researchers have used
to evaluate fairness, such as group fairness and fair representations, and find
that their application to LLMs faces inherent limitations. We show that each
framework either does not logically extend to LLMs or presents a notion of
fairness that is intractable for LLMs, primarily due to the multitude of
affected populations, sensitive attributes, and use cases. To address these
challenges, we develop guidelines for the more realistic goal of achieving
fairness in particular use cases. These guidelines emphasize the criticality of
context, the responsibility of LLM developers, and the need for stakeholder
participation in an iterative process of design and evaluation. Moreover, it may eventually be
possible and even necessary to use the general-purpose capabilities of AI
systems to address fairness challenges as a form of scalable AI-assisted
alignment.