arXiv - CS - Digital Libraries: latest papers (Computer Science, Software Engineering journals)

Linked open data per la valorizzazione di collezioni culturali: il dataset mythLOD

arXiv.cs.DL Pub Date : 2024-04-10
Valentina Pasqual, Francesca Tomasi

The formal representation of cultural metadata has always been a challenge, considering both the heterogeneity of cultural objects and the need to document the interpretive act exercised by experts. This article provides an overview of the revalorization of the digital collection Mythologiae in Linked Open Data format. The research aims to explore the data of a collection of artworks (Mythologiae)

Updated: 2024-04-11

The Rise and Fall of the Initial Era

arXiv.cs.DL Pub Date : 2024-04-09
Simon J Porter, Daniel W Hook

Bibliographic data is a rich source of information that goes beyond the use cases of location and citation -- it also encodes both cultural and technological context. For most of its existence, the scholarly record has changed slowly and hence provides an opportunity to gain insight through its reflection of the cultural norms of the research community over the last four centuries. While it is often

Updated: 2024-04-10

A Knowledge Producer's View on the Knowledge Commons

arXiv.cs.DL Pub Date : 2024-04-09
Mathilde Noual

Hardin introduced the notorious concept of "tragedy of the commons". Worrying about the consequences of human overpopulation on the planet, he discussed "hard problems": problems with no technical solutions, that can only be addressed by way of an evolving morality. Hardin's tragedy of the commons predicts that the hard problem of human population growth directly implies a hard problem of overuse or

Updated: 2024-04-10

The role of non-scientific factors vis-a-vis the quality of publications in determining their scholarly impact

arXiv.cs.DL Pub Date : 2024-04-08
Giovanni Abramo, Ciriaco Andrea D'Angelo, Leonardo Grilli

In the evaluation of scientific publications' impact, the interplay between intrinsic quality and non-scientific factors remains a subject of debate. While peer review traditionally assesses quality, bibliometric techniques gauge scholarly impact. This study investigates the role of non-scientific attributes alongside quality scores from peer review in determining scholarly impact. Leveraging data

Updated: 2024-04-09

A Repository for Formal Contexts

arXiv.cs.DL Pub Date : 2024-04-05
Tom Hanika, Robert Jäschke

Data is always at the center of the theoretical development and investigation of the applicability of formal concept analysis. It is therefore not surprising that a large number of data sets are repeatedly used in scholarly articles and software tools, acting as de facto standard data sets. However, the distribution of the data sets poses a problem for the sustainable development of the research field

Updated: 2024-04-09

New fractional classifications of papers based on two generations of references and on the ASJC Scopus scheme

arXiv.cs.DL Pub Date : 2024-04-04
Jesus M. Alvarez Llorente, Vicente P. Guerrero-Bote, Felix de Moya-Anegon

This paper presents and evaluates a set of methods to classify individual Scopus publications using their references back to the second generation, where each publication can be assigned fractionally into up to five ASJC (All Science Journal Classifications) categories, excluding the Multidisciplinary area and the miscellaneous categories. Based on proposals by Glanzel et al. (1999a, 1999b, 2021),

Updated: 2024-04-05

Usage of OpenAlex for creating meaningful global overlay maps of science on the individual and institutional levels

arXiv.cs.DL Pub Date : 2024-04-03
Robin Haunschild, Lutz Bornmann

Global overlay maps of science use base maps that are overlaid by specific data (from single researchers, institutions, or countries) for visualizing scientific performance such as field-specific paper output. A procedure to create global overlay maps using OpenAlex is proposed. Six different global base maps are provided. Using one of these base maps, example overlay maps for one individual (the first

Updated: 2024-04-04

The open access coverage of OpenAlex, Scopus and Web of Science

arXiv.cs.DL Pub Date : 2024-04-02
Marc-Andre Simard, Isabel Basson, Madelaine Hare, Vincent Lariviere, Philippe Mongeon

Diamond open access (OA) journals offer a publishing model that is free for both authors and readers, but their lack of indexing in major bibliographic databases presents challenges in assessing the uptake of these journals. Furthermore, OA characteristics such as publication language and country of publication have often been used to support the argument that OA journals are more diverse and aim to

Updated: 2024-04-04

Analyzing the inter-domain vs intra-domain knowledge flows

arXiv.cs.DL Pub Date : 2024-04-02
Giovanni Abramo, Ciriaco Andrea D'Angelo

Similar to how innovations often find success in fields other than their original domains, in this study we explore whether the same holds true for scientific discoveries. We investigate the flow of knowledge across scientific disciplines, focusing on connections between citing and cited publications. Specifically, we analyze the connections among cited publications from 2015 indexed in the Web of

Updated: 2024-04-04

Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest

arXiv.cs.DL Pub Date : 2024-04-02
Walid Hariri

Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific

Updated: 2024-04-04

How biomedical papers accumulated their clinical citations: A large-scale retrospective analysis based on PubMed

arXiv.cs.DL Pub Date : 2024-04-01
Xin Li, Xuli Tang, Wei Lu

This paper explored the temporal characteristics of clinical citations of biomedical papers, including how long it takes to receive its first clinical citation (the initial stage) and how long it takes to receive two or more clinical citations after its first clinical citation (the build-up stage). Over 23 million biomedical papers in PubMed between 1940 and 2013 and their clinical citations are used

Updated: 2024-04-02

Forensic Scientometrics -- An emerging discipline to protect the scholarly record

arXiv.cs.DL Pub Date : 2024-03-30
Leslie D. McIntosh, Cynthia Hudson Vitale

Forensic Scientometrics (FoSci) is emerging as a vital discipline at the intersection of scientific integrity and security. Scholarship and scholarly communication are critical for maintaining scientific integrity, influencing public trust in science, health, technology, policy, and law. Yet, these foundations are threatened by the misuse of scientific research for personal, commercial, ideological

Updated: 2024-04-02

A First Ontological Model for the Description of the Art Market in the Semantic Web

arXiv.cs.DL Pub Date : 2024-03-30
Manuele Veggi

This dissertation presents the first version of a project at the Fondazione Federico Zeri, aimed at modelling the art market starting from the recognition of the peculiarities of this sector and relying on the data collected by this institute during its research activities on its documentary collection. Specifically, this study describes the development of an ontology, able to describe agents, events

Updated: 2024-04-02

Towards a Brazilian History Knowledge Graph

arXiv.cs.DL Pub Date : 2024-03-28
Valeria de Paiva, Alexandre Rademaker

This short paper describes the first steps in a project to construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata. We contend that large repositories of Brazilian-named entities (people, places, organizations, and political events and movements) would be beneficial for extracting information from Portuguese texts.

Updated: 2024-04-01

Understanding Archives: Towards New Research Interfaces Relying on the Semantic Annotation of Documents

arXiv.cs.DL Pub Date : 2024-03-28
Nicolas Gutehrlé (CRIT), Iana Atanassova (CRIT, STIH, TESNIERE, LaLIC)

The digitisation campaigns carried out by libraries and archives in recent years have facilitated access to documents in their collections. However, exploring and exploiting these documents remain difficult tasks due to the sheer quantity of documents available for consultation. In this article, we show how the semantic annotation of the textual content of study corpora of archival documents allow

Updated: 2024-03-29

Authorized Subject Headings in the Online Automatic catalog Environment An Empirical Study on a Sample of Arabic Records

arXiv.cs.DL Pub Date : 2024-03-26
Ahmed Ammar Hussein Hammam

Subject headings are very important to machine catalogs, given the importance of thematic research. This study aims to measure the quality of a group of authorized subject headings with a sample of Arabic bibliographic records on the catalog of Egyptian university libraries by identifying the most important practices, policies, procedures followed, and tools used. In addition to assessing the actual

Updated: 2024-03-29

Technical Report: Incorporating Blogs in Pollux

arXiv.cs.DL Pub Date : 2024-03-26
Tobias Holtdirk, Nina Smirnova

This technical report describes the incorporation of political blogs into Pollux, the Specialised Information Service (FID) for Political Science in Germany. Considering the widespread use of political blogs in political science research, we decided to include them in the Pollux search system to enhance the available information infrastructure. We describe the crawling and analyzing of the blogs and

Updated: 2024-03-27

Towards a FAIR Documentation of Workflows and Models in Applied Mathematics

arXiv.cs.DL Pub Date : 2024-03-26
Marco Reidelbach, Björn Schembera, Marcus Weber

Modeling-Simulation-Optimization workflows play a fundamental role in applied mathematics. The Mathematical Research Data Initiative, MaRDI, responded to this by developing a FAIR and machine-interpretable template for a comprehensive documentation of such workflows. MaRDMO, a Plugin for the Research Data Management Organiser, enables scientists from diverse fields to document and publish their workflows

Updated: 2024-03-27

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

arXiv.cs.DL Pub Date : 2024-03-25
Andrew Gray

The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature

Updated: 2024-03-26

pyKCN: A Python Tool for Bridging Scientific Knowledge

arXiv.cs.DL Pub Date : 2024-03-24
Zhenyuan Lu, Wei Li, Burcu Ozek, Haozhou Zhou, Srinivasan Radhakrishnan, Sagar Kamarthi

The study of research trends is pivotal for understanding scientific development on specific topics. Traditionally, this involves keyword analysis within scholarly literature, yet comprehensive tools for such analysis are scarce, especially those capable of parsing large datasets with precision. pyKCN, a Python toolkit, addresses this gap by automating keyword cleaning, extraction and trend analysis

Updated: 2024-03-26

SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation

arXiv.cs.DL Pub Date : 2024-03-25
Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

Detecting salient parts in text using natural language processing has been widely used to mitigate the effects of information overflow. Nevertheless, most of the datasets available for this task are derived mainly from academic publications. We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain. The text in SPACE-IDEAS varies greatly

Updated: 2024-03-26

(Non-)retracted academic papers in OpenAlex

arXiv.cs.DL Pub Date : 2024-03-20
Christian Hauschke, Serhii Nazarovets

The proliferation of scholarly publications underscores the necessity for reliable tools to navigate scientific literature. OpenAlex, an emerging platform amalgamating data from diverse academic sources, holds promise in meeting these evolving demands. Nonetheless, our investigation uncovered a flaw in OpenAlex's portrayal of publication status, particularly concerning retractions. Despite accurate

Updated: 2024-03-21

A proposal to improve the calculation of the disruption index

arXiv.cs.DL Pub Date : 2024-03-18
Christian Leibel, Lutz Bornmann

Wu et al. (2019) proposed the disruption index (DI1) as a bibliometric indicator that measures disruptive and consolidating research. Leibel and Bornmann (2024) recently published a literature overview on the disruption index research in Scientometrics. In this letter to the editor, we point out that the method of calculating the DI1 score of a focal paper contains a logical impact measurement error

Updated: 2024-03-19

Sentiment-aware Enhancements of PageRank-based Citation Metric, Impact Factor, and H-index for Ranking the Authors of Scholarly Articles

arXiv.cs.DL Pub Date : 2024-03-13
Shikha Gupta, Animesh Kumar

Heretofore, the only way to evaluate an author has been frequency-based citation metrics that assume citations to be of a neutral sentiment. However, considering the sentiment behind citations aids in a better understanding of the viewpoints of fellow researchers for the scholarly output of an author.

Updated: 2024-03-14

BrainKnow -- Extracting, Linking, and Associating Neuroscience Knowledge

arXiv.cs.DL Pub Date : 2024-03-07
Cunqing Huangfu, Yi Zeng, Yuwei Wang, Dongsheng Wang, Zizhe Ruan

The vast accumulation of neuroscience knowledge presents a challenge for researchers to timely and accurately locate the specific information they require. Constructing a knowledge engine that automatically extracts and organizes information from academic papers can provide researchers with timely and accurate informational services. We present the Brain Knowledge Engine (BrainKnow), which extracts

Updated: 2024-03-08

PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

arXiv.cs.DL Pub Date : 2024-03-05
Yoonjoo Lee, Hyeonsu B. Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue

With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles

Updated: 2024-03-07

AceMap: Knowledge Discovery through Academic Graph

arXiv.cs.DL Pub Date : 2024-03-05
Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, Chenghu Zhou

The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications

Updated: 2024-03-07

Preserving Tangible and Intangible Cultural Heritage: the Cases of Volterra and Atari

arXiv.cs.DL Pub Date : 2024-03-05
Maciej Grzeszczuk, Kinga Skorupska, Paweł Grabarczyk, Władysław Fuchs, Paul F. Aubin, Mark E. Dietrick, Barbara Karpowicz, Rafał Masłyk, Pavlo Zinevych, Wiktor Stawski, Stanisław Knapiński, Wiesław Kopeć

At first glance, the ruins of the Roman Theatre in the Italian town of Volterra have little in common with cassette tapes containing Atari games. One is certainly considered an important historical landmark, while the consensus on the importance of the other is partial at best. Still, both are remnants of times vastly different from the present and are at risk of oblivion. Unearthed architectural structures

Updated: 2024-03-07

Astronomy in Colombia: a bibliometric perspective

arXiv.cs.DL Pub Date : 2024-03-04
Sofía Guevara-Montoya, Felipe Ortiz-Ferreira, María Paula Silva-Arévalo, Paola A. Niño-Muñoz, Jaime E. Forero-Romero

In Colombia, astronomical research is experiencing accelerated growth. In order to better understand its evolution and current state, we conducted a bibliometric study using data from the Astrophysics Data System (ADS) and Web of Science (WoS). In ADS, we identified 422 peer-reviewed publications from 1980, the year of the first publication, until 2023, which was the cutoff date for our study. Among

Updated: 2024-03-07

It Takes a Village: A Distributed Training Model for AI-based Chatbots

arXiv.cs.DL Pub Date : 2024-03-03
Colleen Estes, Beth Twomey, Annie Johnson

In Summer 2023, staff from the information technology and reference departments at the University of Delaware Library, Museums and Press came together in a unique partnership to pilot a low-cost AI-powered chatbot. The goal of the pilot is to learn more about student and faculty interest in engaging with this tool, and to better understand the labor required on the staff side. Reference librarians

Updated: 2024-03-07

On The Peer Review Reports: Does Size Matter?

arXiv.cs.DL Pub Date : 2024-03-07
Abdelghani Maddi (GEMASS), Luis Miotti (CEPN)

Amidst the ever-expanding realm of scientific production and the proliferation of predatory journals, the focus on peer review remains paramount for scientometricians and sociologists of science. Despite this attention, there is a notable scarcity of empirical investigations into the tangible impact of peer review on publication quality. This study aims to address this gap by conducting a comprehensive

Updated: 2024-03-07

Talent hat, cross-border mobility, and career development in China

arXiv.cs.DL Pub Date : 2024-02-29
Yurui Huang, Xuesen Cheng, Chaolin Tian, Xunyi Jiang, Langtian Ma, Yifang Ma

This study aims to investigate the influence of cross-border recruitment program in China, which confers scientists with a 'talent hat' including a startup package comprising significant bonuses, pay, and funding, on their future performance and career development. By curating a unique dataset from China's 10-year talent recruitment program, we employed multiple matching designs to quantify the effects

Updated: 2024-03-04

A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions

arXiv.cs.DL Pub Date : 2024-03-01
Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Hal Daumé III, Kun Zhang, Nihar B. Shah

Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one

Updated: 2024-03-01

How open are hybrid journals included in transformative agreements?

arXiv.cs.DL Pub Date : 2024-02-28
Najko Jahn

The ongoing controversy surrounding transformative agreements, which aim to transition journal publishing to full open access, highlights the need for large-scale studies assessing the uptake of open access in hybrid journals. This includes evaluating the extent to which transformative agreements enabled open access. By combining publicly available data from various sources, including cOAlition S Journal

Updated: 2024-02-29

Handling Open Research Data within the Max Planck Society -- Looking Closer at the Year 2020

arXiv.cs.DL Pub Date : 2024-02-28
Martin Boosen, Michael Franke, Yves Vincent Grossmann, Sy Dat Ho, Larissa Leiminger, Jan Matthiesen

This paper analyses the practice of publishing research data within the Max Planck Society in the year 2020. The central finding of the study is that up to 40% of the empirical text publications had research data available. The aggregation of the available data is predominantly analysed. There are differences between the sections of the Max Planck Society but they are not as great as one might expect

Updated: 2024-02-29

PST-Bench: Tracing and Benchmarking the Source of Publications

arXiv.cs.DL Pub Date : 2024-02-25
Fanjin Zhang, Kun Cao, Yukuo Cen, Jifan Yu, Da Yin, Jie Tang

Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which automatic

Updated: 2024-02-27

Mapping Literacies in the Tourism Labor Market: A Cross-Database Comparison

arXiv.cs.DL Pub Date : 2024-02-23
Eddy Soria Leyva, Ana Beatriz Hernandez Lara

This book chapter conducts a comparative bibliometric analysis of literacies in the tourism labor market, drawing from the Web of Science (WoS) and Scopus databases. The objective is to assess scientific outputs and identify key patterns of scientific collaboration. Findings suggest a statistically significant difference between the two databases with an overlap level of 35.71%. However, there is a

Updated: 2024-02-26

Analyzing the Dynamics of COVID-19 Lockdown Success: Insights from Regional Data and Public Health Measures

arXiv.cs.DL Pub Date : 2024-02-25
Md. Motaleb Hossen Manik, Md. Ahsan Habib, Md. Zabirul Islam, Tanim Ahmed, Fabliha Haque

The COVID-19 pandemic caused by the coronavirus had a significant effect on social, economic, and health systems globally. The virus emerged in Wuhan, China, and spread worldwide resulting in severe disease, death, and social interference. Countries implemented lockdowns in various regions to limit the spread of the virus. Some of them were successful and some failed. Here, several factors played a

Updated: 2024-02-25

Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models

arXiv.cs.DL Pub Date : 2024-02-24
Chunhe Ni, Jiang Wu, Hongbo Wang, Wenran Lu, Chenwei Zhang

Large Language Models (LLMs) are a class of generative AI models built using the Transformer network, capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language. LLMs promise to revolutionize society, yet training these foundational models poses immense challenges. Semantic vector search within large language models is a potent technique that can significantly

Updated: 2024-02-24

EOSC CZ: Towards the development of Czech national ecosystem for FAIR research data

arXiv.cs.DL Pub Date : 2024-02-20
Matej Antol, Jiri Marek, Michaela Capandova, Jaroslav Juracek, Ludek Matyska

This short paper presents a compact overview of the Czech approach to implementing the European Open Science Cloud and plans for developing a Czech national infrastructure for FAIR research data. Its purpose is to provide an all-encompassing summary of the near future of research data management in Czechia. As such, we deliberately attempt to explain complicated concepts in minimum words, sacrificing

Updated: 2024-02-22

A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence

arXiv.cs.DL Pub Date : 2024-02-20
Penghai Zhao, Xin Zhang, Ming-Ming Cheng, Jian Yang, Xiang Li

By consolidating scattered knowledge, the literature review provides a comprehensive understanding of the investigated topic. However, excessive reviews, especially in the booming field of pattern analysis and machine intelligence (PAMI), raise concerns for both researchers and reviewers. In response to these concerns, this Analysis aims to provide a thorough review of reviews in the PAMI field from

Updated: 2024-02-21

Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession

arXiv.cs.DL Pub Date : 2024-02-19
Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, reveals

Updated: 2024-02-20

Thinking Outside the Black Box: Insights from a Digital Exhibition in the Humanities

arXiv.cs.DL Pub Date : 2024-02-19
Sebastian Barzaghi, Alice Bordignon, Bianca Gualandi, Silvio Peroni

One of the main goals of Open Science is to make research more reproducible. There is no consensus, however, on what exactly "reproducibility" is, as opposed for example to "replicability", and how it applies to different research fields. After a short review of the literature on reproducibility/replicability with a focus on the humanities, we describe how the creation of the digital twin of the temporary

Updated: 2024-02-20

Research status of the Mendeleev Periodic Table: a bibliometric analysis

arXiv.cs.DL Pub Date : 2024-02-18
Kamna Sharma, Deepak Kumar Das, Saibal Ray

In this paper, we present a bibliometric analysis of the Mendeleev Periodic Table. We have conducted a comprehensive analysis of the Scopus-based database using the keyword "Mendeleev Periodic Table". Our findings suggest that the Mendeleev Periodic Table is an influential topic in the field of Inorganic as well as Organic Chemistry. Future researchers may focus on expanding our analysis to include

Updated: 2024-02-20

Towards Development of Automated Knowledge Maps and Databases for Materials Engineering using Large Language Models

arXiv.cs.DL Pub Date : 2024-02-17
Deepak Prasad, Mayur Pimpude, Alankar Alankar

In this work a Large Language Model (LLM) based workflow is presented that utilizes OpenAI ChatGPT model GPT-3.5-turbo-1106 and Google Gemini Pro model to create summary of text, data and images from research articles. It is demonstrated that by using a series of processing, the key information can be arranged in tabular form and knowledge graphs to capture underlying concepts. Our method offers efficiency

Updated: 2024-02-20

HTML papers on arXiv -- why it is important, and how we made it happen

arXiv.cs.DL Pub Date : 2024-02-14
Charles Frankston, Jonathan Godfrey, Shamsi Brinn, Alison Hofer, Mark Nazzaro

In October 2023, arXiv made HTML formatted papers available to readers. This was the exciting outcome of over a year of accessibility research and development with the scientific community. Currently, only 2.4% of research outputs meet accessibility guidelines. Informed by scientists who rely on assistive technology, our analysis demonstrates that offering HTML is the most impactful step arXiv can

Updated: 2024-02-15

Interleaved snowballing: Reducing the workload of literature curators

arXiv.cs.DL Pub Date : 2024-02-13
Ralf Stephan

We formally define the literature (reference) snowballing method and present a refined version of it. We show that the improved algorithm can substantially reduce curator work, even before application of text classification, by reducing the number of candidates to classify. We also present a desktop application named LitBall that implements this and other literature collection methods, through access

Updated: 2024-02-14

Cultural gems linked open data: Mapping culture and intangible heritage in European cities

arXiv.cs.DL Pub Date : 2024-02-12
Sergio Consoli, Valentina Alberti, Cinzia Cocco, Francesco Panella, Valentina Montalto

The recovery and resilience of the cultural and creative sectors after the COVID-19 pandemic is a current topic with priority for the European Commission. Cultural gems is a crowdsourced web platform managed by the Joint Research Centre of the European Commission aimed at creating community-led maps as well as a common repository for cultural and creative places across European cities and towns. More

Updated: 2024-02-14

Ontology Engineering to Model the European Cultural Heritage: The Case of Cultural Gems

arXiv.cs.DL Pub Date : 2024-02-12
Valentina Alberti, Cinzia Cocco, Sergio Consoli, Valentina Montalto, Francesco Panella

Cultural gems is a web application conceived by the European Commission's Joint Research Centre (DG JRC), which aims at engaging people and organisations across Europe to create a unique repository of cultural and creative places. The main goal is to provide a vision of European culture in order to strengthen a sense of identity within a single European cultural realm. Cultural gems maps more than

Updated: 2024-02-14

A Maturity Model for Urban Dataset Meta-data

arXiv.cs.DL Pub Date : 2024-02-07
Mark S. Fox, Bart Gajderowicz, Dishu Lyu

In the current environment of data generation and publication, there is an ever-growing number of datasets available for download. This growth precipitates an existing challenge: sourcing and integrating relevant datasets for analysis is becoming more complex. Despite efforts by open data platforms, obstacles remain, predominantly rooted in inadequate metadata, unsuitable data presentation, complications

Updated: 2024-02-09

The Howard-Harvard effect: Institutional reproduction of intersectional inequalities

arXiv.cs.DL Pub Date : 2024-02-06
Diego Kozlowski, Thema Monroe-White, Vincent Larivière, Cassidy R. Sugimoto

The US higher education system concentrates the production of science and scientists within a few institutions. This has implications for minoritized scholars and the topics with which they are disproportionately associated. This paper examines topical alignment between institutions and authors of varying intersectional identities, and the relationship with prestige and scientific impact. We observe

Updated: 2024-02-08

[Citation needed] Data usage and citation practices in medical imaging conferences

arXiv.cs.DL Pub Date : 2024-02-05
Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina

Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets. There is, however, no adopted standard for citing the datasets used in scientific papers, leading to difficulty in tracking dataset usage. In

Updated: 2024-02-06

HERITRACE: Tracing Evolution and Bridging Data for Streamlined Curatorial Work in the GLAM Domain

arXiv.cs.DL Pub Date : 2024-02-01
Arcangelo Massari (Digital Humanities Advanced Research Centre; Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy), Silvio Peroni (Digital Humanities Advanced Research Centre; Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy)

HERITRACE is a semantic data management system tailored for the GLAM sector. It is engineered to streamline data curation for non-technical users while also offering an efficient administrative interface for technical staff. The paper compares HERITRACE with other established platforms such as Omeka S, Semantic MediaWiki, Research Space, and CLEF, emphasizing its advantages in user friendliness, provenance

Updated: 2024-02-02

University Students Motives and Challenges in Utilising Institutional Repository Resources

arXiv.cs.DL Pub Date : 2024-01-31
Suzan Masawe, Paul Muneja, Vincent Msonge

One of the core functions of an academic institution is to generate knowledge, disseminate it to the intended audiences, and preserve it for future use. Academic institutions are now establishing Institutional Repositories (IRs) to collect produced resources to facilitate accessibility, dissemination, utilization, and management of intellectual materials produced within an institution. This study aimed

Updated: 2024-02-01

Reading yesterday's news. Layout recognition by segmentation of historical newspaper pages

arXiv.cs.DL Pub Date : 2024-01-30
Christian Schultze (High-Performance Computing and Analytics), Niklas Kerkfeld (High-Performance Computing and Analytics), Kara Kuebart (Institut für Geschichtswissenschaft, Universität Bonn), Princilia Weber (Institut für Geschichtswissenschaft, Universität Bonn), Moritz Wolter (High-Performance Computing and Analytics), Felix Selgert (Institut für Geschichtswissenschaft, Universität Bonn)

Newspapers are important sources for historians interested in past societies' cultural values, social structures, and their changes. Since the 19th century, newspapers have been widely available and spread regionally. Today, historical newspapers are digitized but unavailable in a separate metadata-enhanced form. Machine-readable metadata, however, is a prerequisite for a mass statistical analysis

Updated: 2024-01-31

WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia

arXiv.cs.DL Pub Date : 2024-01-30
Johannes Stegmüller, Moritz Schubotz

MediaWiki and Wikipedia authors usually use LaTeX to define mathematical formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas were processed by a long cascade of web services and finally delivered to users' browsers in rendered form for visually readable representation as SVG. With the latest developments of supporting MathML Core in Chromium-based browsers, MathML continues

Updated: 2024-01-31

Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus

arXiv.cs.DL Pub Date : 2024-01-29
Jack Culbert, Anne Hobert, Najko Jahn, Nick Haupka, Marion Schmidt, Paul Donner, Philipp Mayr

OpenAlex is a promising open source of scholarly metadata, and competitor to the established proprietary sources, the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rapidly evolving source and the data contained within is expanding

Updated: 2024-01-30

Textual Entailment for Effective Triple Validation in Object Prediction

arXiv.cs.DL Pub Date : 2024-01-29
Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such an approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines

Updated: 2024-01-30

ChemDFM: Dialogue Foundation Model for Chemistry

arXiv.cs.DL Pub Date : 2024-01-26
Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu

Large language models (LLMs) have achieved great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative

Updated: 2024-01-29

Visualization of rank-citation curves for fast detection of h-index anomalies in university metrics

arXiv.cs.DL Pub Date : 2024-01-24
Serhii Nazarovets

University rankings, despite facing criticism, continue to maintain their popularity. In the 2023 Scopus Ranking of Ukrainian Universities, certain institutions stood out due to their high h-index, despite modest publication and citation numbers. This phenomenon can be attributed to influential research topics or involvement in international collaborative research. However, these results may also be

Updated: 2024-01-25
