Top-down Text-Level Discourse Rhetorical Structure Parsing with Bidirectional Representation Learning
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2023-09-30, DOI: 10.1007/s11390-022-1167-0
Long-Yin Zhang, Xin Tan, Fang Kong, Pei-Feng Li, Guo-Dong Zhou

Early studies on discourse rhetorical structure parsing mainly adopted bottom-up approaches, which limit the parsing process to local information. Although current top-down parsers capture global information better and have achieved some success, local and global information differ in importance at different levels of discourse parsing. This paper argues that combining local and global information for discourse parsing is more sensible. To support this claim, we introduce a top-down discourse parser with bidirectional representation learning capabilities. Existing corpora based on Rhetorical Structure Theory (RST) are known to be very limited in size, which makes discourse parsing very challenging. To alleviate this problem, we leverage boundary features and a data augmentation strategy to tap the potential of our parser. We use two evaluation methods, and experiments on the RST-DT corpus show that our parser improves performance primarily through the effective combination of local and global information. The boundary features and the data augmentation strategy also contribute. With gold-standard elementary discourse units (EDUs), our parser significantly outperforms the baseline systems in nuclearity detection, with competitive results on the other three indicators (span, relation, and full). With automatically segmented EDUs, our parser still outperforms previous state-of-the-art work.
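
To make the term "top-down" concrete, the following is a minimal sketch of generic top-down span splitting over a sequence of EDUs. It is not the parser described in the abstract: the names `Node`, `parse_top_down`, `balanced_score`, and `spans` are hypothetical, and the scoring function is a toy stand-in for whatever learned model would rank candidate split points. Nuclearity and relation labels are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    start: int  # index of the first EDU in the span (inclusive)
    end: int    # index of the last EDU in the span (inclusive)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def parse_top_down(start: int, end: int,
                   score_split: Callable[[int, int, int], float]) -> Node:
    """Recursively split the EDU span [start, end] at the best-scoring point."""
    node = Node(start, end)
    if start == end:  # a single EDU is a leaf
        return node
    # Candidate k splits the span into [start, k] and [k + 1, end].
    best = max(range(start, end), key=lambda k: score_split(start, k, end))
    node.left = parse_top_down(start, best, score_split)
    node.right = parse_top_down(best + 1, end, score_split)
    return node


def balanced_score(start: int, k: int, end: int) -> float:
    """Toy scorer (assumption): prefer splits that balance the two halves."""
    left_size = k - start + 1
    right_size = end - k
    return -abs(left_size - right_size)


def spans(node: Optional[Node]) -> List[Tuple[int, int]]:
    """List the (start, end) spans of the tree in pre-order."""
    if node is None:
        return []
    return [(node.start, node.end)] + spans(node.left) + spans(node.right)


tree = parse_top_down(0, 3, balanced_score)
print(spans(tree))
```

The whole document span is split first, so every decision can in principle draw on the full text; a bottom-up parser would instead start by merging adjacent EDUs and only see larger context later.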




Updated: 2023-09-30