Modeling social interaction dynamics using temporal graph networks

arXiv - CS - Social and Information Networks. Pub Date: 2024-04-05, DOI: arxiv-2404.06611
J. Taery Kim, Archit Naik, Isuru Jayarathne, Sehoon Ha, Jouh Yeong Chew

Integrating intelligent systems, such as robots, into dynamic group settings is challenging because human behaviors and internal states influence one another. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose an adapted Temporal Graph Network that represents social interaction dynamics comprehensively while remaining practical to implement. Our method incorporates temporal multi-modal behavioral data, including gaze interaction, voice activity, and environmental context. The representation is trained as a link prediction problem using annotated gaze interaction data. Its F1-score exceeds that of the baseline model by 37.0%. The improvement carries over to a secondary task, next speaker prediction, where it reaches 29.0%. Our contributions are two-fold. First, we provide a model for representing social interaction dynamics that can be used for many downstream human-robot interaction tasks, such as human state inference and next speaker prediction. Second, and more importantly, the model uses a more concise yet efficient message passing method, significantly reducing the message from 768 to 14 elements while still outperforming the baseline model.
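To make the evaluation setup in the abstract concrete, the following is a minimal sketch of how gaze interactions might be stored as timestamped directed links and how a link prediction could be scored with F1. It is not the authors' implementation: the names GazeEvent, links_in_window, and f1_score, the toy events, and the predicted links are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Link = Tuple[int, int]


class GazeEvent(NamedTuple):
    """One timestamped directed interaction (hypothetical schema)."""
    src: int    # participant who is looking
    dst: int    # participant being looked at
    t: float    # timestamp in seconds


def links_in_window(events: Iterable[GazeEvent], start: float, end: float) -> Set[Link]:
    """Collect the directed links observed in the half-open window [start, end)."""
    return {(e.src, e.dst) for e in events if start <= e.t < end}


def f1_score(predicted: Set[Link], actual: Set[Link]) -> float:
    """F1 over sets of directed links; 0.0 when there are no true positives."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy event stream for three participants (made-up values).
events = [
    GazeEvent(0, 1, 0.5),
    GazeEvent(1, 0, 0.8),
    GazeEvent(2, 0, 1.2),
    GazeEvent(0, 2, 2.1),
    GazeEvent(1, 0, 2.4),
]

# Ground-truth links in the window a model would be asked to predict.
actual = links_in_window(events, 2.0, 3.0)

# Hypothetical model output for the same window: one hit, one miss.
predicted = {(0, 2), (2, 0)}

score = f1_score(predicted, actual)
print(f"F1 = {score:.2f}")
```

In this toy case one of two predicted links is correct and one of two true links is recovered, so precision and recall are both 0.5. A temporal graph model would replace the hard-coded predicted set with links scored from node states built up over the earlier events.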

Updated: 2024-04-05
