Text-based paper-level classification procedure for non-traditional sciences using a machine learning approach,Knowledge and Information Systems



Knowledge and Information Systems (IF 2.7), Pub Date: 2023-12-13, DOI: 10.1007/s10115-023-02023-0
Daniela Moctezuma, Carlos López-Vázquez, Lucas Lopes, Norton Trevisan, José Pérez

Science as a whole is organized into broad fields and, as a consequence, research, resources, students, etc., are classified, assigned, or invited following a similar structure. Some fields have been established for centuries, while others are only now flourishing. Funding, staff, and other support are offered to a field if it shows some activity, commonly measured by the number of published scientific papers. How can those papers be found? There exist well-respected listings in which scientific journals are ascribed to one or more knowledge fields. Such lists are human-made, and complications arise when a field covers more than one area of knowledge. How can one discern whether a particular paper is devoted to a field not considered in such lists? In this work, we propose a methodology able to classify the universe of papers into two classes: those belonging to the field of interest and those that do not. The proposed procedure learns from the titles and abstracts of papers published in monothematic or “pure” journals. Provided that such journals exist, the procedure could be applied to any field of knowledge. We tested the process with Geographic Information Science. The field has points of contact with Computer Science, Mathematics, Cartography, and others, which makes the task very difficult. We also analyzed the results of our procedure under three different criteria, illustrating its power and capabilities. Our proposed solution reached results similar to those of human taggers and to those of state-of-the-art related work.
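The core idea in the abstract, a two-class classifier trained on titles and abstracts from "pure" journals, can be illustrated with a small sketch. The abstract does not say which model or text features the authors use, so the code below swaps in a multinomial naive Bayes over word counts purely as a stand-in. The class name `NaiveBayesTextClassifier`, the labels `in_field` and `out_of_field`, and the six training texts are all hypothetical and invented for illustration; they are not data from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        # Per-class word frequencies gathered from the training texts.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / self.total_docs)
            for t in tokens:
                smoothed = (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                score += math.log(smoothed)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training set: "in_field" texts play the role of papers from
# monothematic journals of the target field; "out_of_field" texts play the
# role of papers from journals clearly outside it.
train = [
    ("Spatial interpolation of elevation data in a geographic information system", "in_field"),
    ("Map generalization and cartographic spatial data quality", "in_field"),
    ("Geographic spatial analysis of road network maps", "in_field"),
    ("Protein folding dynamics in yeast cells", "out_of_field"),
    ("Clinical trial of a vaccine in adult patients", "out_of_field"),
    ("Gene expression in tumor cells of patients", "out_of_field"),
]

clf = NaiveBayesTextClassifier().fit(
    [text for text, _ in train], [label for _, label in train]
)

print(clf.predict("Spatial analysis of cartographic map data"))
print(clf.predict("Gene expression in tumor cells"))
```

In a realistic setting the training texts would be the concatenated title and abstract of each paper, and the interesting cases would be papers from multidisciplinary journals, where the journal-level listing gives no reliable label.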


Updated: 2023-12-14
