Computer Science - Computer Vision and Pattern Recognition; Computer Science - Human-Computer Interaction; Computer Science - Neural and Evolutionary Computing
Intelligent assistance involves not only understanding but also action.
Existing egocentric video datasets contain rich annotations of the videos, but
not of actions that an intelligent assistant could perform in the moment. To
address this gap, we release PARSE-Ego4D, a new set of personal action
recommendation annotations for the Ego4D dataset. We take a multi-stage
approach to generating and evaluating these annotations. First, we used a
prompt-engineered large language model (LLM) to generate context-aware action
suggestions, yielding over 18,000 suggestions. While these
synthetic action suggestions are valuable, the inherent limitations of LLMs
necessitate human evaluation. To ensure high-quality and user-centered
recommendations, we conducted a large-scale human annotation study that
provides grounding in human preferences for all of PARSE-Ego4D. We analyze the
inter-rater agreement and evaluate subjective preferences of participants.
Based on our synthetic dataset and complete human annotations, we propose
several new tasks for action suggestions based on egocentric videos. We
encourage novel solutions that improve latency and energy requirements. The
annotations in PARSE-Ego4D will support researchers and developers who are
working on building action recommendation systems for augmented and virtual
reality.
Details
Title
PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos