GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality
arXiv - CS - Human-Computer Interaction Pub Date : 2024-04-12 , DOI: arxiv-2404.08213
Jaewook Lee, Jun Wang, Elizabeth Brown, Liam Chu, Sebastian S. Rodriguez, Jon E. Froehlich

Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; and (3) an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like quality of pronoun-driven queries, although pronoun use was sometimes counter-intuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how it performs in the wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.

Updated: 2024-04-15