A reversible natural language watermarking for sensitive information protection, Information Processing & Management

Information Processing & Management (IF 8.6), Pub Date: 2024-01-25, DOI: 10.1016/j.ipm.2024.103661
Lingyun Xiang, Yangfan Liu, Zhongliang Yang

Existing reversible natural language watermarking methods have evolved from synonym substitution to arbitrary word substitution. However, they tend to overlook the sensitivity of the information carried by the original words and prefer non-sensitive words for substitution, which leaves sensitive information in the original text at risk of leakage. Furthermore, the pursuit of reversibility may inadvertently compromise the overall performance of the watermarking method. To address these problems, this paper proposes a novel reversible natural language watermarking method, KSPEE, which combines a Keyword Substitution scheme with a Prediction Error Expansion algorithm to protect sensitive information, verify content integrity, and protect copyright, among other uses. Specifically, KSPEE uses a keyword extraction algorithm to identify important content containing sensitive information in the original text, thereby determining the candidate positions for watermark embedding. A masked language model then predicts appropriate substitution words from the semantic context surrounding each embedding position. Next, the prediction error expansion algorithm selects which of the predicted words replace the original keywords, so that the watermark is embedded while the original keywords remain recoverable. Identifying and substituting keywords in this way protects the original sensitive information. Extensive experiments demonstrate that, under the premise of semantic distortion and lossless restoration of the original content, KSPEE achieves outstanding watermarked text quality, a higher watermark embedding rate, and strong security. More importantly, KSPEE effectively prevents the leakage of sensitive information.
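
The abstract does not spell out how prediction error expansion is applied to words, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows the classic integer form of prediction error expansion (double the prediction error and append one watermark bit) and one hypothetical way to map it onto word substitution: the rank of the original keyword in a candidate list is treated as the value, and rank 0 is used as the prediction. The fixed `candidates` list stands in for the ranked output of a masked language model, and it is assumed that embedder and extractor obtain the same list. All function names are illustrative.

```python
from typing import List, Tuple


def pee_embed(value: int, predicted: int, bit: int) -> int:
    """Classic prediction error expansion: double the error, append one bit."""
    error = value - predicted
    return predicted + 2 * error + bit


def pee_extract(marked: int, predicted: int) -> Tuple[int, int]:
    """Invert pee_embed and return (original value, embedded bit)."""
    expanded = marked - predicted
    bit = expanded % 2              # Python's % is non-negative for modulus 2
    error = (expanded - bit) // 2   # exact division after removing the bit
    return predicted + error, bit


def embed_word(candidates: List[str], keyword: str, bit: int) -> str:
    """Hypothetical mapping: encode (keyword rank, bit) as a new candidate rank."""
    rank = candidates.index(keyword)        # assumes the keyword is in the list
    marked_rank = pee_embed(rank, 0, bit)   # assumed prediction: top candidate
    if marked_rank >= len(candidates):
        raise ValueError("candidate list too short to embed at this position")
    return candidates[marked_rank]


def extract_word(candidates: List[str], marked_word: str) -> Tuple[str, int]:
    """Recover the original keyword and the bit from the substituted word."""
    rank, bit = pee_extract(candidates.index(marked_word), 0)
    return candidates[rank], bit


if __name__ == "__main__":
    # Stand-in for a masked language model's ranked predictions at one position.
    candidates = ["city", "town", "village", "capital", "region", "country"]
    marked = embed_word(candidates, "town", 1)
    print(marked)                            # capital
    print(extract_word(candidates, marked))  # ('town', 1)
```

In this toy mapping the substituted word alone is enough to recover both the watermark bit and the original keyword, which is the reversibility property the abstract describes. How KSPEE actually chooses candidates and predictions may differ.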

Updated: 2024-01-25
