-
Linked open data per la valorizzazione di collezioni culturali: il dataset mythLOD [Linked open data for the valorization of cultural collections: the mythLOD dataset] arXiv.cs.DL Pub Date : 2024-04-10 Valentina Pasqual, Francesca Tomasi
The formal representation of cultural metadata has always been a challenge, considering both the heterogeneity of cultural objects and the need to document the interpretive act exercised by experts. This article provides an overview of the revalorization of the digital collection Mythologiae in Linked Open Data format. The research aims to explore the data of a collection of artworks (Mythologiae)
-
The Rise and Fall of the Initial Era arXiv.cs.DL Pub Date : 2024-04-09 Simon J Porter, Daniel W Hook
Bibliographic data is a rich source of information that goes beyond the use cases of location and citation -- it also encodes both cultural and technological context. For most of its existence, the scholarly record has changed slowly and hence provides an opportunity to gain insight through its reflection of the cultural norms of the research community over the last four centuries. While it is often
-
A Knowledge Producer's View on the Knowledge Commons arXiv.cs.DL Pub Date : 2024-04-09 Mathilde Noual
Hardin introduced the notorious concept of "tragedy of the commons". Worrying about the consequences of human overpopulation on the planet, he discussed "hard problems": problems with no technical solutions, that can only be addressed by way of an evolving morality. Hardin's tragedy of the commons predicts that the hard problem of human population growth directly implies a hard problem of overuse or
-
The role of non-scientific factors vis-a-vis the quality of publications in determining their scholarly impact arXiv.cs.DL Pub Date : 2024-04-08 Giovanni Abramo, Ciriaco Andrea D'Angelo, Leonardo Grilli
In the evaluation of scientific publications' impact, the interplay between intrinsic quality and non-scientific factors remains a subject of debate. While peer review traditionally assesses quality, bibliometric techniques gauge scholarly impact. This study investigates the role of non-scientific attributes alongside quality scores from peer review in determining scholarly impact. Leveraging data
-
A Repository for Formal Contexts arXiv.cs.DL Pub Date : 2024-04-05 Tom Hanika, Robert Jäschke
Data is always at the center of the theoretical development and investigation of the applicability of formal concept analysis. It is therefore not surprising that a large number of data sets are repeatedly used in scholarly articles and software tools, acting as de facto standard data sets. However, the distribution of the data sets poses a problem for the sustainable development of the research field
-
New fractional classifications of papers based on two generations of references and on the ASJC Scopus scheme arXiv.cs.DL Pub Date : 2024-04-04 Jesus M. Alvarez Llorente, Vicente P. Guerrero-Bote, Felix de Moya-Anegon
This paper presents and evaluates a set of methods to classify individual Scopus publications using their references back to the second generation, where each publication can be assigned fractionally into up to five ASJC (All Science Journal Classifications) categories, excluding the Multidisciplinary area and the miscellaneous categories. Based on proposals by Glanzel et al. (1999a, 1999b, 2021),
-
Usage of OpenAlex for creating meaningful global overlay maps of science on the individual and institutional levels arXiv.cs.DL Pub Date : 2024-04-03 Robin Haunschild, Lutz Bornmann
Global overlay maps of science use base maps that are overlaid by specific data (from single researchers, institutions, or countries) for visualizing scientific performance such as field-specific paper output. A procedure to create global overlay maps using OpenAlex is proposed. Six different global base maps are provided. Using one of these base maps, example overlay maps for one individual (the first
-
The open access coverage of OpenAlex, Scopus and Web of Science arXiv.cs.DL Pub Date : 2024-04-02 Marc-Andre Simard, Isabel Basson, Madelaine Hare, Vincent Lariviere, Philippe Mongeon
Diamond open access (OA) journals offer a publishing model that is free for both authors and readers, but their lack of indexing in major bibliographic databases presents challenges in assessing the uptake of these journals. Furthermore, OA characteristics such as publication language and country of publication have often been used to support the argument that OA journals are more diverse and aim to
-
Analyzing the inter-domain vs intra-domain knowledge flows arXiv.cs.DL Pub Date : 2024-04-02 Giovanni Abramo, Ciriaco Andrea D'Angelo
Similar to how innovations often find success in fields other than their original domains, in this study we explore whether the same holds true for scientific discoveries. We investigate the flow of knowledge across scientific disciplines, focusing on connections between citing and cited publications. Specifically, we analyze the connections among cited publications from 2015 indexed in the Web of
-
Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest arXiv.cs.DL Pub Date : 2024-04-02 Walid Hariri
Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific
-
How biomedical papers accumulated their clinical citations: A large-scale retrospective analysis based on PubMed arXiv.cs.DL Pub Date : 2024-04-01 Xin Li, Xuli Tang, Wei Lu
This paper explored the temporal characteristics of clinical citations of biomedical papers, including how long it takes to receive its first clinical citation (the initial stage) and how long it takes to receive two or more clinical citations after its first clinical citation (the build-up stage). Over 23 million biomedical papers in PubMed between 1940 and 2013 and their clinical citations are used
-
Forensic Scientometrics -- An emerging discipline to protect the scholarly record arXiv.cs.DL Pub Date : 2024-03-30 Leslie D. McIntosh, Cynthia Hudson Vitale
Forensic Scientometrics (FoSci) is emerging as a vital discipline at the intersection of scientific integrity and security. Scholarship and scholarly communication are critical for maintaining scientific integrity, influencing public trust in science, health, technology, policy, and law. Yet, these foundations are threatened by the misuse of scientific research for personal, commercial, ideological
-
A First Ontological Model for the Description of the Art Market in the Semantic Web arXiv.cs.DL Pub Date : 2024-03-30 Manuele Veggi
This dissertation presents the first version of a project at the Fondazione Federico Zeri, aimed at modelling the art market starting from the recognition of the peculiarities of this sector and relying on the data collected by this institute during its research activities on its documentary collection. Specifically, this study describes the development of an ontology, able to describe agents, events
-
Towards a Brazilian History Knowledge Graph arXiv.cs.DL Pub Date : 2024-03-28 Valeria de Paiva, Alexandre Rademaker
This short paper describes the first steps in a project to construct a knowledge graph for Brazilian history based on the Brazilian Dictionary of Historical Biographies (DHBB) and Wikipedia/Wikidata. We contend that large repositories of Brazilian-named entities (people, places, organizations, and political events and movements) would be beneficial for extracting information from Portuguese texts.
-
Understanding Archives: Towards New Research Interfaces Relying on the Semantic Annotation of Documents arXiv.cs.DL Pub Date : 2024-03-28 Nicolas Gutehrlé (CRIT), Iana Atanassova (CRIT, STIH, TESNIERE, LaLIC)
The digitisation campaigns carried out by libraries and archives in recent years have facilitated access to documents in their collections. However, exploring and exploiting these documents remain difficult tasks due to the sheer quantity of documents available for consultation. In this article, we show how the semantic annotation of the textual content of study corpora of archival documents allow
-
Authorized Subject Headings in the Online Automatic catalog Environment: An Empirical Study on a Sample of Arabic Records arXiv.cs.DL Pub Date : 2024-03-26 Ahmed Ammar Hussein Hammam
Subject headings are very important to machine catalogs, given the importance of thematic research. This study aims to measure the quality of a group of authorized subject headings with a sample of Arabic bibliographic records on the catalog of Egyptian university libraries by identifying the most important practices, policies, procedures followed, and tools used. In addition to assessing the actual
-
Technical Report: Incorporating Blogs in Pollux arXiv.cs.DL Pub Date : 2024-03-26 Tobias Holtdirk, Nina Smirnova
This technical report describes the incorporation of political blogs into Pollux, the Specialised Information Service (FID) for Political Science in Germany. Considering the widespread use of political blogs in political science research, we decided to include them in the Pollux search system to enhance the available information infrastructure. We describe the crawling and analyzing of the blogs and
-
Towards a FAIR Documentation of Workflows and Models in Applied Mathematics arXiv.cs.DL Pub Date : 2024-03-26 Marco Reidelbach, Björn Schembera, Marcus Weber
Modeling-Simulation-Optimization workflows play a fundamental role in applied mathematics. The Mathematical Research Data Initiative, MaRDI, responded to this by developing a FAIR and machine-interpretable template for a comprehensive documentation of such workflows. MaRDMO, a Plugin for the Research Data Management Organiser, enables scientists from diverse fields to document and publish their workflows
-
ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature arXiv.cs.DL Pub Date : 2024-03-25 Andrew Gray
The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature
-
pyKCN: A Python Tool for Bridging Scientific Knowledge arXiv.cs.DL Pub Date : 2024-03-24 Zhenyuan Lu, Wei Li, Burcu Ozek, Haozhou Zhou, Srinivasan Radhakrishnan, Sagar Kamarthi
The study of research trends is pivotal for understanding scientific development on specific topics. Traditionally, this involves keyword analysis within scholarly literature, yet comprehensive tools for such analysis are scarce, especially those capable of parsing large datasets with precision. pyKCN, a Python toolkit, addresses this gap by automating keyword cleaning, extraction and trend analysis
-
SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation arXiv.cs.DL Pub Date : 2024-03-25 Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez
Detecting salient parts in text using natural language processing has been widely used to mitigate the effects of information overflow. Nevertheless, most of the datasets available for this task are derived mainly from academic publications. We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain. The text in SPACE-IDEAS varies greatly
-
(Non-)retracted academic papers in OpenAlex arXiv.cs.DL Pub Date : 2024-03-20 Christian Hauschke, Serhii Nazarovets
The proliferation of scholarly publications underscores the necessity for reliable tools to navigate scientific literature. OpenAlex, an emerging platform amalgamating data from diverse academic sources, holds promise in meeting these evolving demands. Nonetheless, our investigation uncovered a flaw in OpenAlex's portrayal of publication status, particularly concerning retractions. Despite accurate
-
A proposal to improve the calculation of the disruption index arXiv.cs.DL Pub Date : 2024-03-18 Christian Leibel, Lutz Bornmann
Wu et al. (2019) proposed the disruption index (DI1) as a bibliometric indicator that measures disruptive and consolidating research. Leibel and Bornmann (2024) recently published a literature overview on the disruption index research in Scientometrics. In this letter to the editor, we point out that the method of calculating the DI1 score of a focal paper contains a logical impact measurement error
-
Sentiment-aware Enhancements of PageRank-based Citation Metric, Impact Factor, and H-index for Ranking the Authors of Scholarly Articles arXiv.cs.DL Pub Date : 2024-03-13 Shikha Gupta, Animesh Kumar
Heretofore, the only way to evaluate an author has been frequency-based citation metrics that assume citations to be of a neutral sentiment. However, considering the sentiment behind citations aids in a better understanding of the viewpoints of fellow researchers for the scholarly output of an author.
-
BrainKnow -- Extracting, Linking, and Associating Neuroscience Knowledge arXiv.cs.DL Pub Date : 2024-03-07 Cunqing Huangfu, Yi Zeng, Yuwei Wang, Dongsheng Wang, Zizhe Ruan
The vast accumulation of neuroscience knowledge presents a challenge for researchers to timely and accurately locate the specific information they require. Constructing a knowledge engine that automatically extracts and organizes information from academic papers can provide researchers with timely and accurate informational services. We present the Brain Knowledge Engine (BrainKnow), which extracts
-
PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers arXiv.cs.DL Pub Date : 2024-03-05 Yoonjoo Lee, Hyeonsu B. Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue
With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper titles
-
AceMap: Knowledge Discovery through Academic Graph arXiv.cs.DL Pub Date : 2024-03-05 Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, Chenghu Zhou
The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications
-
Preserving Tangible and Intangible Cultural Heritage: the Cases of Volterra and Atari arXiv.cs.DL Pub Date : 2024-03-05 Maciej Grzeszczuk, Kinga Skorupska, Paweł Grabarczyk, Władysław Fuchs, Paul F. Aubin, Mark E. Dietrick, Barbara Karpowicz, Rafał Masłyk, Pavlo Zinevych, Wiktor Stawski, Stanisław Knapiński, Wiesław Kopeć
At first glance, the ruins of the Roman Theatre in the Italian town of Volterra have little in common with cassette tapes containing Atari games. One is certainly considered an important historical landmark, while the consensus on the importance of the other is partial at best. Still, both are remnants of times vastly different from the present and are at risk of oblivion. Unearthed architectural structures
-
Astronomy in Colombia: a bibliometric perspective arXiv.cs.DL Pub Date : 2024-03-04 Sofía Guevara-Montoya, Felipe Ortiz-Ferreira, María Paula Silva-Arévalo, Paola A. Niño-Muñoz, Jaime E. Forero-Romero
In Colombia, astronomical research is experiencing accelerated growth. In order to better understand its evolution and current state, we conducted a bibliometric study using data from the Astrophysics Data System (ADS) and Web of Science (WoS). In ADS, we identified 422 peer-reviewed publications from 1980, the year of the first publication, until 2023, which was the cutoff date for our study. Among
-
It Takes a Village: A Distributed Training Model for AI-based Chatbots arXiv.cs.DL Pub Date : 2024-03-03 Colleen Estes, Beth Twomey, Annie Johnson
In Summer 2023, staff from the information technology and reference departments at the University of Delaware Library, Museums and Press came together in a unique partnership to pilot a low-cost AI-powered chatbot. The goal of the pilot is to learn more about student and faculty interest in engaging with this tool, and to better understand the labor required on the staff side. Reference librarians
-
On The Peer Review Reports: Does Size Matter? arXiv.cs.DL Pub Date : 2024-03-07 Abdelghani Maddi (GEMASS), Luis Miotti (CEPN)
Amidst the ever-expanding realm of scientific production and the proliferation of predatory journals, the focus on peer review remains paramount for scientometricians and sociologists of science. Despite this attention, there is a notable scarcity of empirical investigations into the tangible impact of peer review on publication quality. This study aims to address this gap by conducting a comprehensive
-
Talent hat, cross-border mobility, and career development in China arXiv.cs.DL Pub Date : 2024-02-29 Yurui Huang, Xuesen Cheng, Chaolin Tian, Xunyi Jiang, Langtian Ma, Yifang Ma
This study aims to investigate the influence of cross-border recruitment program in China, which confers scientists with a 'talent hat' including a startup package comprising significant bonuses, pay, and funding, on their future performance and career development. By curating a unique dataset from China's 10-year talent recruitment program, we employed multiple matching designs to quantify the effects
-
A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions arXiv.cs.DL Pub Date : 2024-03-01 Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Hal Daumé III, Kun Zhang, Nihar B. Shah
Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one
-
How open are hybrid journals included in transformative agreements? arXiv.cs.DL Pub Date : 2024-02-28 Najko Jahn
The ongoing controversy surrounding transformative agreements, which aim to transition journal publishing to full open access, highlights the need for large-scale studies assessing the uptake of open access in hybrid journals. This includes evaluating the extent to which transformative agreements enabled open access. By combining publicly available data from various sources, including cOAlition S Journal
-
Handling Open Research Data within the Max Planck Society -- Looking Closer at the Year 2020 arXiv.cs.DL Pub Date : 2024-02-28 Martin Boosen, Michael Franke, Yves Vincent Grossmann, Sy Dat Ho, Larissa Leiminger, Jan Matthiesen
This paper analyses the practice of publishing research data within the Max Planck Society in the year 2020. The central finding of the study is that up to 40% of the empirical text publications had research data available. The aggregation of the available data is predominantly analysed. There are differences between the sections of the Max Planck Society but they are not as great as one might expect
-
PST-Bench: Tracing and Benchmarking the Source of Publications arXiv.cs.DL Pub Date : 2024-02-25 Fanjin Zhang, Kun Cao, Yukuo Cen, Jifan Yu, Da Yin, Jie Tang
Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which automatic
-
Mapping Literacies in the Tourism Labor Market: A Cross-Database Comparison arXiv.cs.DL Pub Date : 2024-02-23 Eddy Soria Leyva, Ana Beatriz Hernandez Lara
This book chapter conducts a comparative bibliometric analysis of literacies in the tourism labor market, drawing from the Web of Science (WoS) and Scopus databases. The objective is to assess scientific outputs and identify key patterns of scientific collaboration. Findings suggest a statistically significant difference between the two databases with an overlap level of 35.71%. However, there is a
-
Analyzing the Dynamics of COVID-19 Lockdown Success: Insights from Regional Data and Public Health Measures arXiv.cs.DL Pub Date : 2024-02-25 Md. Motaleb Hossen Manik, Md. Ahsan Habib, Md. Zabirul Islam, Tanim Ahmed, Fabliha Haque
The COVID-19 pandemic caused by the coronavirus had a significant effect on social, economic, and health systems globally. The virus emerged in Wuhan, China, and spread worldwide resulting in severe disease, death, and social interference. Countries implemented lockdowns in various regions to limit the spread of the virus. Some of them were successful and some failed. Here, several factors played a
-
Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models arXiv.cs.DL Pub Date : 2024-02-24 Chunhe Ni, Jiang Wu, Hongbo Wang, Wenran Lu, Chenwei Zhang
Large Language Models (LLMs) are a class of generative AI models built using the Transformer network, capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language. LLMs promise to revolutionize society, yet training these foundational models poses immense challenges. Semantic vector search within large language models is a potent technique that can significantly
-
EOSC CZ: Towards the development of Czech national ecosystem for FAIR research data arXiv.cs.DL Pub Date : 2024-02-20 Matej Antol, Jiri Marek, Michaela Capandova, Jaroslav Juracek, Ludek Matyska
This short paper presents a compact overview of the Czech approach to implementing the European Open Science Cloud and plans for developing a Czech national infrastructure for FAIR research data. Its purpose is to provide an all-encompassing summary of the near future of research data management in Czechia. As such, we deliberately attempt to explain complicated concepts in minimum words, sacrificing
-
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence arXiv.cs.DL Pub Date : 2024-02-20 Penghai Zhao, Xin Zhang, Ming-Ming Cheng, Jian Yang, Xiang Li
By consolidating scattered knowledge, the literature review provides a comprehensive understanding of the investigated topic. However, excessive reviews, especially in the booming field of pattern analysis and machine intelligence (PAMI), raise concerns for both researchers and reviewers. In response to these concerns, this Analysis aims to provide a thorough review of reviews in the PAMI field from
-
Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession arXiv.cs.DL Pub Date : 2024-02-19 Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
This study examines the tendency to cite older work across 20 fields of study over 43 years (1980–2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, reveals
-
Thinking Outside the Black Box: Insights from a Digital Exhibition in the Humanities arXiv.cs.DL Pub Date : 2024-02-19 Sebastian Barzaghi, Alice Bordignon, Bianca Gualandi, Silvio Peroni
One of the main goals of Open Science is to make research more reproducible. There is no consensus, however, on what exactly "reproducibility" is, as opposed for example to "replicability", and how it applies to different research fields. After a short review of the literature on reproducibility/replicability with a focus on the humanities, we describe how the creation of the digital twin of the temporary
-
Research status of the Mendeleev Periodic Table: a bibliometric analysis arXiv.cs.DL Pub Date : 2024-02-18 Kamna Sharma, Deepak Kumar Das, Saibal Ray
In this paper, we present a bibliometric analysis of the Mendeleev Periodic Table. We have conducted a comprehensive analysis of the Scopus-based database using the keyword "Mendeleev Periodic Table". Our findings suggest that the Mendeleev Periodic Table is an influential topic in the field of Inorganic as well as Organic Chemistry. Future researchers may focus on expanding our analysis to include
-
Towards Development of Automated Knowledge Maps and Databases for Materials Engineering using Large Language Models arXiv.cs.DL Pub Date : 2024-02-17 Deepak Prasad, Mayur Pimpude, Alankar Alankar
In this work, a Large Language Model (LLM) based workflow is presented that uses the OpenAI ChatGPT model GPT-3.5-turbo-1106 and the Google Gemini Pro model to create summaries of text, data and images from research articles. It is demonstrated that, through a series of processing steps, the key information can be arranged in tabular form and knowledge graphs to capture underlying concepts. Our method offers efficiency
-
HTML papers on arXiv -- why it is important, and how we made it happen arXiv.cs.DL Pub Date : 2024-02-14 Charles Frankston, Jonathan Godfrey, Shamsi Brinn, Alison Hofer, Mark Nazzaro
In October 2023, arXiv made HTML formatted papers available to readers. This was the exciting outcome of over a year of accessibility research and development with the scientific community. Currently, only 2.4% of research outputs meet accessibility guidelines. Informed by scientists who rely on assistive technology, our analysis demonstrates that offering HTML is the most impactful step arXiv can
-
Interleaved snowballing: Reducing the workload of literature curators arXiv.cs.DL Pub Date : 2024-02-13 Ralf Stephan
We formally define the literature (reference) snowballing method and present a refined version of it. We show that the improved algorithm can substantially reduce curator work, even before application of text classification, by reducing the number of candidates to classify. We also present a desktop application named LitBall that implements this and other literature collection methods, through access
-
Cultural gems linked open data: Mapping culture and intangible heritage in European cities arXiv.cs.DL Pub Date : 2024-02-12 Sergio Consoli, Valentina Alberti, Cinzia Cocco, Francesco Panella, Valentina Montalto
The recovery and resilience of the cultural and creative sectors after the COVID-19 pandemic is a current topic with priority for the European Commission. Cultural gems is a crowdsourced web platform managed by the Joint Research Centre of the European Commission aimed at creating community-led maps as well as a common repository for cultural and creative places across European cities and towns. More
-
Ontology Engineering to Model the European Cultural Heritage: The Case of Cultural Gems arXiv.cs.DL Pub Date : 2024-02-12 Valentina Alberti, Cinzia Cocco, Sergio Consoli, Valentina Montalto, Francesco Panella
Cultural gems is a web application conceived by the European Commission's Joint Research Centre (DG JRC), which aims at engaging people and organisations across Europe to create a unique repository of cultural and creative places. The main goal is to provide a vision of European culture in order to strengthen a sense of identity within a single European cultural realm. Cultural gems maps more than
-
A Maturity Model for Urban Dataset Meta-data arXiv.cs.DL Pub Date : 2024-02-07 Mark S. Fox, Bart Gajderowicz, Dishu Lyu
In the current environment of data generation and publication, there is an ever-growing number of datasets available for download. This growth precipitates an existing challenge: sourcing and integrating relevant datasets for analysis is becoming more complex. Despite efforts by open data platforms, obstacles remain, predominantly rooted in inadequate metadata, unsuitable data presentation, complications
-
The Howard-Harvard effect: Institutional reproduction of intersectional inequalities arXiv.cs.DL Pub Date : 2024-02-06 Diego Kozlowski, Thema Monroe-White, Vincent Larivière, Cassidy R. Sugimoto
The US higher education system concentrates the production of science and scientists within a few institutions. This has implications for minoritized scholars and the topics with which they are disproportionately associated. This paper examines topical alignment between institutions and authors of varying intersectional identities, and the relationship with prestige and scientific impact. We observe
-
[Citation needed] Data usage and citation practices in medical imaging conferences arXiv.cs.DL Pub Date : 2024-02-05 Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina
Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets. There is, however, no adopted standard for citing the datasets used in scientific papers, leading to difficulty in tracking dataset usage. In
-
HERITRACE: Tracing Evolution and Bridging Data for Streamlined Curatorial Work in the GLAM Domain arXiv.cs.DL Pub Date : 2024-02-01 Arcangelo Massari (Digital Humanities Advanced Research Centre; Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy), Silvio Peroni (Digital Humanities Advanced Research Centre; Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy)
HERITRACE is a semantic data management system tailored for the GLAM sector. It is engineered to streamline data curation for non-technical users while also offering an efficient administrative interface for technical staff. The paper compares HERITRACE with other established platforms such as OmekaS, Semantic MediaWiki, Research Space, and CLEF, emphasizing its advantages in user friendliness, provenance
-
University Students Motives and Challenges in Utilising Institutional Repository Resources arXiv.cs.DL Pub Date : 2024-01-31 Suzan Masawe, Paul Muneja, Vincent Msonge
One of the core functions of an academic institution is to generate knowledge, disseminate it to the intended audiences, and preserve it for future use. Academic institutions are now establishing Institutional Repositories (IRs) to collect produced resources to facilitate accessibility, dissemination, utilization, and management of intellectual materials produced within an institution. This study aimed
-
Reading yesterday's news. Layout recognition by segmentation of historical newspaper pages arXiv.cs.DL Pub Date : 2024-01-30 Christian Schultze (High-Performance Computing and Analytics), Niklas Kerkfeld (High-Performance Computing and Analytics), Kara Kuebart (Institut für Geschichtswissenschaft, Universität Bonn), Princilia Weber (Institut für Geschichtswissenschaft, Universität Bonn), Moritz Wolter (High-Performance Computing and Analytics), Felix Selgert (Institut für Geschichtswissenschaft, Universität Bonn)
Newspapers are important sources for historians interested in past societies' cultural values, social structures, and their changes. Since the 19th century, newspapers have been widely available and spread regionally. Today, historical newspapers are digitized but unavailable in a separate metadata-enhanced form. Machine-readable metadata, however, is a prerequisite for a mass statistical analysis
-
WikiTexVC: MediaWiki's native LaTeX to MathML converter for Wikipedia arXiv.cs.DL Pub Date : 2024-01-30 Johannes Stegmüller, Moritz Schubotz
MediaWiki and Wikipedia authors usually use LaTeX to define mathematical formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas were processed by a long cascade of web services and finally delivered to users' browsers in rendered form for visually readable representation as SVG. With the latest developments of supporting MathML Core in Chromium-based browsers, MathML continues
-
Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus arXiv.cs.DL Pub Date : 2024-01-29 Jack Culbert, Anne Hobert, Najko Jahn, Nick Haupka, Marion Schmidt, Paul Donner, Philipp Mayr
OpenAlex is a promising open source of scholarly metadata, and competitor to the established proprietary sources, the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rapidly evolving source and the data contained within is expanding
-
Textual Entailment for Effective Triple Validation in Object Prediction arXiv.cs.DL Pub Date : 2024-01-29 Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez
Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such an approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines
-
ChemDFM: Dialogue Foundation Model for Chemistry arXiv.cs.DL Pub Date : 2024-01-26 Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu
Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative
-
Visualization of rank-citation curves for fast detection of h-index anomalies in university metrics arXiv.cs.DL Pub Date : 2024-01-24 Serhii Nazarovets
University rankings, despite facing criticism, continue to maintain their popularity. In the 2023 Scopus Ranking of Ukrainian Universities, certain institutions stood out due to their high h-index, despite modest publication and citation numbers. This phenomenon can be attributed to influential research topics or involvement in international collaborative research. However, these results may also be