SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation
arXiv - CS - Machine Learning. Pub Date: 2024-03-26, DOI: arxiv-2403.17768
Dongqi Pu, Yifan Wang, Jia Loy, Vera Demberg

Scientific news reports serve as a bridge, adeptly translating complex research articles into reports that resonate with the broader public. The automated generation of such narratives enhances the accessibility of scholarly insights. In this paper, we present a new corpus to facilitate this paradigm development. Our corpus comprises a parallel compilation of academic publications and their corresponding scientific news reports across nine disciplines. To demonstrate the utility and reliability of our dataset, we conduct an extensive analysis, highlighting the divergences in readability and brevity between scientific news narratives and academic manuscripts. We benchmark our dataset employing state-of-the-art text generation models. The evaluation process involves both automatic and human evaluation, which lays the groundwork for future explorations into the automated generation of scientific news reports. The dataset and code related to this work are available at https://dongqi.me/projects/SciNews.

Updated: 2024-03-27