What Is in a ? Cross-lingual Topic Detection & Information Retrieval in Archives Portal Europe
ACM Journal on Computing and Cultural Heritage (IF 2.4), Pub Date: 2024-03-26, DOI: 10.1145/3494572
Marta Musso, Kerstin Arnold, Federico Nanni, Beatrice Cannelli

Archives Portal Europe (APE, www.archivesportaleurope.net) is the portal of European archives: an aggregator that brings together, at a single access point, the catalogues and digitised archival material of all archives in and about Europe. It currently hosts material from more than 30 countries and from a variety of archival institutions (such as State archives, city archives, university and parish archives, private institutions, and more). It is maintained by the Archives Portal Europe Foundation, an international consortium of State archives and other archival institutions that aims to connect the archival material of individual institutions in one digital repository, allowing universal access to the archival heritage of Europe and promoting new forms of archival research beyond national or local boundaries. One of the research tools made available by Archives Portal Europe is research by topic; however, topics are currently maintained manually by archivists, and the vast amount of archival material ingested into the portal makes it impossible to maintain a body of topics that comprehensively describes the whole APE repository. Archives are traditionally organised not by subject content but around the entity (person, organization, body) that created and/or collected the documents in the course of its activities. While this is an undisputed pillar of archival management, online digital repositories require new tools for digital archival research, particularly when different archival traditions from different countries and different types of institutions are merged into a single research portal. Topic detection becomes a fundamental tool for guiding archival research and for making archives accessible to a potentially worldwide audience, in a situation where national and linguistic barriers blur or are redefined.
This article presents the preliminary results of, and the plan for future iterations of, an AI tool for automated topic detection in a multilingual environment, where human-created taxonomies serve as the basis on which the algorithms aggregate relevant material around a specific topic. The development is based on supervised machine learning, combining human input in different languages with the use of Wikipedia pages to model the relevant vocabulary and entities.




Updated: 2024-03-26