CSO Classifier 3.0: a scalable unsupervised method for classifying documents in terms of research topics
International Journal on Digital Libraries. Pub Date: 2021-07-22, DOI: 10.1007/s00799-021-00305-y
Angelo Salatino, Francesco Osborne, Enrico Motta

Classifying scientific articles, patents, and other documents according to the relevant research topics is an important task, which enables a variety of functionalities, such as categorising documents in digital libraries, monitoring and predicting research trends, and recommending papers relevant to one or more topics. In this paper, we present the latest version of the CSO Classifier (v3.0), an unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive taxonomy of research areas in the field of Computer Science. The CSO Classifier takes as input the textual components of a research paper (usually title, abstract, and keywords) and returns a set of research topics drawn from the ontology. This new version includes a new component for discarding outlier topics and offers improved scalability. We evaluated the CSO Classifier on a gold standard of manually annotated articles, demonstrating a significant improvement over alternative methods. We also present an overview of applications adopting the CSO Classifier and describe how it can be adapted to other fields.
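
To make the input/output contract described above concrete, the following is a minimal sketch, not the CSO Classifier itself and not the authors' method. It assumes a tiny hypothetical taxonomy (`TOY_TAXONOMY`) and a hypothetical `classify` function that only does exact label matching over the title, abstract, and keywords, then adds the broader topics of each match. The real system uses the full Computer Science Ontology and additional components, including the outlier-removal step mentioned in the abstract, none of which are reproduced here.

```python
import re

# Hypothetical toy taxonomy (an assumption for illustration only):
# topic label -> broader topic, with None marking the root.
TOY_TAXONOMY = {
    "computer science": None,
    "semantic web": "computer science",
    "ontology": "semantic web",
    "digital libraries": "computer science",
    "machine learning": "computer science",
    "neural networks": "machine learning",
}


def classify(paper, taxonomy=TOY_TAXONOMY):
    """Return topics whose labels occur verbatim in the paper's text,
    plus the broader topics inferred from the taxonomy."""
    # Concatenate the textual components the abstract lists as input.
    text = " ".join(paper.get(field, "") for field in ("title", "abstract", "keywords"))
    # Normalise to lowercase word tokens so punctuation does not block a match.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    padded = " " + " ".join(tokens) + " "

    # Exact phrase match on whole words only.
    matched = {label for label in taxonomy if f" {label} " in padded}

    # Walk up the taxonomy to collect broader topics of every match.
    broader = set()
    for label in matched:
        parent = taxonomy[label]
        while parent is not None:
            broader.add(parent)
            parent = taxonomy[parent]

    return {"matched": sorted(matched), "broader": sorted(broader - matched)}


paper = {
    "title": "Classifying papers with neural networks",
    "abstract": "We index documents in digital libraries.",
    "keywords": "ontology",
}
result = classify(paper)
print(result)
```

Exact matching of this kind misses inflected or paraphrased mentions (for example, "ontologies" would not match "ontology"), which is one reason a production classifier needs more than string lookup.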




Updated: 2021-07-22