Artificially Generated Text Fragments Search in Academic Documents
Doklady Mathematics (IF 0.6), Pub Date: 2024-03-11, DOI: 10.1134/s1064562423701211
G. M. Gritsay, A. V. Grabovoy, A. S. Kildyakov, Yu. V. Chekhovich

Abstract

Recent advances in generative text models make it possible to create artificial texts that look human-written. Many methods for detecting texts produced by large language models have already been developed. However, detection methods improve in parallel with generation methods, so it is necessary to study new generative models and update existing approaches to detecting them. In this paper, we present an extensive analysis of existing detection methods, as well as a study of the lexical, syntactic, and stylistic features of generated fragments. Building on this analysis, we tested the methods for detecting machine-generated documents that we consider to be of the highest quality, with a view to applying them in the scientific domain. Experiments were conducted for Russian and English on the collected datasets. The developed methods raised detection quality to an F1-score of 0.968 for Russian and 0.825 for English. The described techniques can be applied to detect generated fragments in scientific, research, and graduate papers.
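
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score can be computed for a binary detection task. It assumes that label 1 means "generated" and 0 means "human-written", and that scoring is done per fragment; the abstract does not state the labeling scheme, the averaging mode, or the evaluation granularity, so the function name `f1_score` and the toy data are purely illustrative and are not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class.

    Assumption: `positive` marks machine-generated fragments.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

Because F1 combines precision and recall, a detector cannot score well by flagging everything as generated or by flagging nothing.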




Updated: 2024-03-11