Seq2Set2Seq: A Two-stage Disentangled Method for Reply Keyword Generation in Social Media, ACM Transactions on Asian and Low-Resource Language Information Processing

ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2024-03-09, DOI: 10.1145/3644074
Jie Liu₁, Yaguang Li₂, Shizhu He₃, Shun Wu₃, Kang Liu₃, Shenping Liu₄, Jiong Wang₂, Qing Zhang₅

Social media produces large amounts of content every day. Predicting the potential influence of this content from the perspective of reply feedback is a key issue that has not yet been explored. We therefore propose a novel task, reply keyword prediction in social media, which aims to predict keywords that cover as many aspects of the potential replies as possible. A prerequisite challenge is that no accessible social media dataset labels such keywords. To address this, we introduce a new dataset¹ for studying reply keyword prediction in social media. The task can be viewed as single-turn dialogue keyword prediction for open-domain dialogue systems. However, existing dialogue keyword prediction methods cannot be adopted directly because they have two main drawbacks. First, they provide no explicit mechanism for modeling topic complementarity between keywords, which is crucial in social media for controllably covering all aspects of the replies. Second, they do not explicitly model keyword collocations, which makes fine-grained prediction harder to optimize controllably, since far less context is available than in dialogue. To address these issues, we propose a two-stage disentangled framework that optimizes complementarity and collocation explicitly, in a disentangled fashion. In the first stage, a sequence-to-set paradigm based on multi-label prediction and determinantal point processes generates a set of keyword seeds that satisfy complementarity. In the second stage, a set-to-sequence paradigm uses a seq2seq model guided by the keyword seeds from the set to generate finer-grained keywords with collocations. Experiments show that this method generates keyword sets that are not only more diverse but also more relevant and consistent. Furthermore, the keywords obtained with this method yield better reply generation results in a retrieval-based system than those of other methods.
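The abstract states only that the first stage combines multi-label prediction with determinantal point processes (DPPs) to pick complementary keyword seeds. One common way to do this, assumed here and not confirmed as the authors' implementation, is a quality-diversity DPP kernel L_ij = q_i · S_ij · q_j, where q_i is a candidate's predicted label probability and S_ij is the cosine similarity of keyword embeddings, followed by greedy MAP selection that repeatedly adds the candidate maximizing det(L_Y). The helper names (`build_kernel`, `greedy_dpp_map`), the candidate keywords, probabilities, and 2-D embeddings below are all hypothetical toy values.

```python
import math
from typing import List, Sequence


def determinant(matrix: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    n = len(matrix)
    if n == 0:
        return 1.0
    a = [row[:] for row in matrix]  # work on a copy
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_kernel(quality: Sequence[float],
                 embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical quality-diversity DPP kernel: L_ij = q_i * S_ij * q_j."""
    n = len(quality)
    return [[quality[i] * cosine(embeddings[i], embeddings[j]) * quality[j]
             for j in range(n)] for i in range(n)]


def subset_det(kernel: List[List[float]], idx: List[int]) -> float:
    """det(L_Y) for the principal submatrix indexed by idx."""
    return determinant([[kernel[a][b] for b in idx] for a in idx])


def greedy_dpp_map(kernel: List[List[float]], k: int) -> List[int]:
    """Greedily add the item that maximizes det(L_Y), up to k items."""
    selected: List[int] = []
    remaining = list(range(len(kernel)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: subset_det(kernel, selected + [i]))
        if subset_det(kernel, selected + [best]) <= 1e-12:
            break  # no candidate adds a new "direction"; stop early
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy candidates with made-up multi-label probabilities and embeddings.
candidates = ["food", "restaurant", "weather", "price"]
quality = [0.9, 0.85, 0.6, 0.5]
embeddings = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0], [0.6, 0.8]]

chosen = greedy_dpp_map(build_kernel(quality, embeddings), k=2)
print([candidates[i] for i in chosen])  # ['food', 'weather']
```

Ranking by probability alone would return "food" and "restaurant", two near-synonymous seeds; the determinant penalizes that redundancy, so the toy run picks "food" and "weather" instead, which illustrates the complementarity the first stage targets. In the described framework, the selected seeds would then guide the second-stage seq2seq model; that step is not sketched here.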

Updated: 2024-03-09