Bilingual dictionary generation and enrichment via graph exploration
Semantic Web (IF 3), Pub Date: 2022-09-07, DOI: 10.3233/sw-222899
Shashwat Goel 1 , Jorge Gracia 2 , Mikel L. Forcada 3

Abstract

In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle density based method: partitioning the graph into biconnected components for a speed-up, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Over twenty-seven language pairs, our algorithm produces dictionaries about 70% the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use-case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software which can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.
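
As a rough illustration of the partitioning step mentioned in the abstract, the sketch below splits a small translation graph into biconnected components with a textbook Tarjan-style depth-first search. It is not the authors' released tool: the function name, the (language, word) node encoding, and the toy dictionary entries are all assumptions made for this example.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Return the biconnected components of an undirected graph as node sets.

    Illustrative helper (hypothetical name, not taken from the paper's tool).
    Uses a recursive DFS, so it is only suitable for small graphs.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    disc = {}          # discovery time of each node
    low = {}           # lowest discovery time reachable from the node's subtree
    edge_stack = []    # edges seen since the last completed component
    components = []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                # Tree edge: explore the child first.
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree: pop one whole component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif disc[v] < disc[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return components


# Toy translation graph (assumed data): nodes are (language, word) pairs,
# edges are dictionary entries between two languages.
edges = [
    (("en", "house"), ("es", "casa")),
    (("es", "casa"), ("ca", "casa")),
    (("ca", "casa"), ("en", "house")),
    (("en", "house"), ("fr", "maison")),
]

for component in biconnected_components(edges):
    print(sorted(component))
```

In this toy graph the English, Spanish, and Catalan words lie on a common cycle and end up in one component, while the French word hangs off a single edge and forms its own two-node component. Since every cycle lies entirely within one biconnected component, a cycle-based method can process each component independently, which is presumably where the speed-up described above comes from.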




Updated: 2022-09-07