


Deep author name disambiguation using DBLP data
International Journal on Digital Libraries, Pub Date: 2023-05-04, DOI: 10.1007/s00799-023-00361-6
Zeyd Boukhers, Nagaraj Bahubali Asundi

In the academic world, the number of scientists grows every year and so does the number of authors sharing the same names. Consequently, it is challenging to assign newly published papers to their respective authors. Therefore, author name ambiguity is considered a critical open problem in digital libraries. This paper proposes an author name disambiguation approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
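
The grouping step mentioned in the abstract (authors sharing the same last name and the same first-name initial) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `block_key` and `group_authors`, the key format, and the sample names are all assumptions made for illustration, and the sketch naively treats the first whitespace-separated token as the first name and the last token as the last name.

```python
from collections import defaultdict


def block_key(full_name: str) -> str:
    """Hypothetical blocking key: lowercase last name plus first-name initial."""
    parts = full_name.strip().split()
    if not parts:
        raise ValueError("empty author name")
    first_initial = parts[0][0].lower()  # initial of the first token
    last_name = parts[-1].lower()        # last token taken as the last name
    return f"{last_name}_{first_initial}"


def group_authors(names):
    """Group author name strings that share the same blocking key."""
    groups = defaultdict(list)
    for name in names:
        groups[block_key(name)].append(name)
    return dict(groups)


# Made-up names, used only to show the grouping behaviour.
names = ["Wei Wang", "Wen Wang", "Anna Schmidt", "Wei Zhang"]
groups = group_authors(names)
```

Here "Wei Wang" and "Wen Wang" land in the same group, which is the kind of ambiguous set that the paper's neural model would then resolve using co-authors and publication titles.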


Updated: 2023-05-05
