Using word semantic concepts for plagiarism detection in text documents
Information Retrieval Journal (IF 2.5), Pub Date: 2021-07-14, DOI: 10.1007/s10791-021-09394-4
Chia-Yang Chang, Shie-Jue Lee, Chih-Hung Wu, Chih-Feng Liu, Ching-Kuan Liu

Plagiarism is a common problem in the modern age. With the advance of the Internet, it has become increasingly convenient to access other people's writings and publications. When someone uses the content of a text improperly, plagiarism may occur. Plagiarism infringes intellectual property rights, which makes it a serious problem today. However, detecting plagiarism effectively is a challenging task. Traditional methods, such as the vector space model or bag-of-words, fall short because they cannot handle the semantics of words satisfactorily; for example, they treat synonyms as unrelated terms. In this paper, we propose a new method for plagiarism detection. We use Word2vec to transform words into word vectors, which reveal the semantic relationships among different words. Based on these word vectors, words are clustered into concepts. Documents and their paragraphs are then represented in terms of concepts, which allows plagiarism detection to be done more effectively. A number of experiments are conducted to demonstrate the good performance of the proposed method.
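The pipeline described in the abstract has four steps. Words are embedded with Word2vec, the word vectors are clustered into concepts, documents and paragraphs are represented as concept counts, and those representations are compared. The following is a minimal Python sketch of that idea under several assumptions. The toy 3-dimensional vectors in `WORD_VECTORS` are hypothetical stand-ins for trained Word2vec embeddings. Plain k-means with cosine similarity is swapped in as the clustering step, and cosine similarity over concept counts is used as the comparison; the paper's own clustering algorithm and similarity measure may differ. All function names and the 0.8 flagging threshold are illustrative only.

```python
import math
from collections import Counter

# HYPOTHETICAL toy embeddings standing in for trained Word2vec vectors.
# Real Word2vec vectors typically have far more dimensions.
WORD_VECTORS = {
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.85, 0.15, 0.05),
    "fast": (0.1, 0.9, 0.0),
    "quick": (0.05, 0.85, 0.1),
    "river": (0.0, 0.1, 0.9),
    "lake": (0.05, 0.0, 0.95),
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(v, centroids):
    """Index of the centroid most cosine-similar to v."""
    return max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))


def kmeans(vectors, k, iterations=20):
    """Plain k-means (an assumed stand-in) with cosine assignment and farthest-point init."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector whose best similarity to any current centroid is lowest.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def build_concepts(word_vectors, k):
    """Map each word to a concept id by clustering its vector."""
    words = list(word_vectors)
    labels = kmeans([word_vectors[w] for w in words], k)
    return dict(zip(words, labels))


def concept_vector(text, word_to_concept, k):
    """Count concept occurrences in a text; words without a vector are ignored."""
    counts = Counter(word_to_concept[w] for w in text.lower().split() if w in word_to_concept)
    return [counts[j] for j in range(k)]


def similarity(text_a, text_b, word_to_concept, k):
    """Cosine similarity of two texts in concept space."""
    return cosine(concept_vector(text_a, word_to_concept, k),
                  concept_vector(text_b, word_to_concept, k))


def bow_similarity(text_a, text_b):
    """Baseline: cosine over raw word counts (bag-of-words)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(a) | set(b))
    return cosine([a[w] for w in vocab], [b[w] for w in vocab])


def flag_paragraphs(source_paras, suspect_paras, word_to_concept, k, threshold=0.8):
    """Return (suspect_idx, best_source_idx, score) for pairs at or above the threshold."""
    hits = []
    for i, para in enumerate(suspect_paras):
        scores = [similarity(para, src, word_to_concept, k) for src in source_paras]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            hits.append((i, j, round(scores[j], 3)))
    return hits


if __name__ == "__main__":
    concepts = build_concepts(WORD_VECTORS, k=3)
    original, suspect = "the car is fast", "the automobile is quick"
    print(round(bow_similarity(original, suspect), 3))           # 0.5
    print(round(similarity(original, suspect, concepts, 3), 3))  # 1.0
    print(flag_paragraphs([original, "a river and a lake"],
                          [suspect, "the quick lake"], concepts, 3))  # [(0, 0, 1.0)]
```

With these toy vectors, the two paraphrased sentences share no content words. A plain bag-of-words cosine therefore reaches only 0.5, and that score comes entirely from "the" and "is". Their concept vectors, by contrast, coincide. This is the kind of semantic match that the abstract says word-level models miss. A realistic version would load embeddings from a trained Word2vec model and use proper tokenization. This sketch only splits on whitespace.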

Updated: 2021-07-15