Automated Dewey Decimal Classification of Swedish library metadata using Annif software
Journal of Documentation (IF 2.034), Pub Date: 2024-04-02, DOI: 10.1108/jd-01-2022-0026
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard, Olof Osterman

Purpose

To estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the first four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted on a sample of 60 records to investigate the value and inter-rater agreement of automatically assigned classes.
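The abstract does not say how the ensemble combines its four backends. As a rough illustration only, the sketch below assumes a weighted mean of per-class scores. The function names, the example scores and the DDC numbers are all hypothetical, and this is not Annif's actual code.

```python
from collections import defaultdict


def ensemble_scores(backend_scores, weights=None):
    """Combine per-class scores from several backends by weighted mean.

    backend_scores: list of dicts mapping a DDC class to a score in [0, 1].
    A class missing from a backend's output counts as score 0 for it.
    """
    if weights is None:
        weights = [1.0] * len(backend_scores)
    total = sum(weights)
    combined = defaultdict(float)
    for scores, weight in zip(backend_scores, weights):
        for ddc, score in scores.items():
            combined[ddc] += weight * score / total
    return dict(combined)


def top_class(scores):
    """Return the class with the highest combined score."""
    return max(scores, key=scores.get)


# Hypothetical scores for one record from four backends (not real data).
lexical = {"839.7": 0.6, "025.4": 0.3}
svc = {"025.4": 0.7, "839.7": 0.2}
fasttext = {"025.4": 0.5, "004.6": 0.4}
omikuji = {"025.4": 0.8, "839.7": 0.1}

combined = ensemble_scores([lexical, svc, fasttext, omikuji])
print(top_class(combined))
```

With equal weights, a class that several backends rank moderately high can outscore one that a single backend ranks very high. This is one plausible reason an ensemble can outperform its individual members.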

Findings

The ensemble approach performed best, reaching 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
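To make the three-digit task concrete, here is a minimal sketch of one way to compute accuracy at that level. It assumes both the catalogued and the predicted DDC numbers are truncated to their first three digits before comparison. The abstract does not describe the exact evaluation protocol, so the helper names and the sample numbers below are hypothetical.

```python
def three_digit(ddc):
    """Truncate a DDC number such as '025.431' to its three-digit class '025'."""
    return ddc.strip().split(".")[0][:3].zfill(3)


def three_digit_accuracy(gold, predicted):
    """Share of records whose predicted class matches the gold class
    once both are truncated to three digits."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one record is required")
    hits = sum(three_digit(g) == three_digit(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical gold and predicted DDC numbers for four records.
gold = ["025.431", "839.7374", "004.678", "948.5"]
pred = ["025.04", "839.738", "006.3", "948.502"]

accuracy = three_digit_accuracy(gold, pred)
print(f"{accuracy:.2%}")
```

Under this assumption a prediction counts as correct even when the full numbers differ, as long as the three-digit class agrees. Accuracy at this level is therefore not directly comparable to accuracy measured on full DDC numbers.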

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.




Updated: 2024-04-02