Recognition algorithm for cross-texting in text chat conversations
Data & Knowledge Engineering (IF 2.5), Pub Date: 2023-12-10, DOI: 10.1016/j.datak.2023.102261
Da-Young Lee, Hwan-Gue Cho

With the development of the Internet and information technology, short-text communication has become more popular than voice communication. Chat-based communication enables rapid, brief, high-volume message exchange with many people, but it also creates new social problems. 'Cross-texting' is one of them: accidentally sending a text to an unintended recipient while holding concurrent, separate conversations with multiple people. Cross-texting can be a serious problem in languages where respectful expressions are required. As text-based communication grows more popular, preventing cross-texting by detecting it in advance is crucial in languages with honorific expressions, such as Korean. In this paper, we propose two methods for detecting cross-texts using deep learning models. The first is the formal feature vector model, which models a dialog by explicitly defining politeness and completeness features. The second is the graph2vec-based ChatGram-net model, which models a dialog based on syllable occurrence relationships. To evaluate detection performance, we propose a method for generating cross-text datasets from an actual messenger corpus. In experiments, both proposed models detected cross-texts effectively and outperformed the baseline models.
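
The abstract does not spell out how the syllable graph or the synthetic dataset is built, so the following Python sketch is only one plausible reading, not the authors' procedure. It assumes that each non-space character counts as one syllable (true for Hangul syllable blocks), that the "syllable occurrence relationship" can be approximated by counting adjacent syllable pairs, and that a cross-text sample can be simulated by inserting a message from one dialog into another. The function names `syllable_cooccurrence` and `inject_cross_text` are hypothetical.

```python
import random
from collections import Counter


def syllable_cooccurrence(messages):
    """Count adjacent syllable pairs within each message.

    Assumption: every non-space character is one syllable, which holds
    for Hangul syllable blocks. The counts can serve as weighted edges
    of a directed syllable graph.
    """
    edges = Counter()
    for msg in messages:
        syllables = [ch for ch in msg if not ch.isspace()]
        for a, b in zip(syllables, syllables[1:]):
            edges[(a, b)] += 1
    return edges


def inject_cross_text(dialog, other_dialog, rng):
    """Insert one message from other_dialog into a copy of dialog.

    Returns (new_dialog, labels), where labels[i] is 1 for the injected
    message and 0 for every original message. Both dialogs must be
    non-empty lists of strings.
    """
    stray = rng.choice(other_dialog)
    # Insert after at least one original message, possibly at the end.
    pos = rng.randrange(1, len(dialog) + 1)
    new_dialog = dialog[:pos] + [stray] + dialog[pos:]
    labels = [0] * len(new_dialog)
    labels[pos] = 1
    return new_dialog, labels


if __name__ == "__main__":
    rng = random.Random(0)
    formal = ["안녕하세요", "회의는 몇 시입니까", "감사합니다"]
    casual = ["야 뭐해", "밥 먹자"]
    mixed, labels = inject_cross_text(formal, casual, rng)
    print(mixed, labels)
    print(syllable_cooccurrence(mixed).most_common(3))
```

In this toy setup the formal dialog uses honorific endings and the casual one does not, which is the kind of mismatch a politeness feature would be expected to pick up.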

Updated: 2023-12-10