Leveraging Bidirectionl LSTM with CRFs for Pashto Tagging, ACM Transactions on Asian and Low-Resource Language Information Processing



ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2024-04-15, DOI: 10.1145/3649456
Farooq Zaman, Onaiza Maqbool, Jaweria Kanwal


Part-of-speech tagging plays a vital role in text processing and natural language understanding, yet few attempts have been made to tag Pashto parts of speech. In this work, we present a Long Short-Term Memory (LSTM)-based approach to Pashto part-of-speech tagging, with a special focus on ambiguity resolution. We first created a corpus of Pashto sentences containing words with multiple meanings, annotated with their tags. We then introduce a new sentence representation and a new architecture for Pashto text processing. We compare the accuracy of the proposed approach with that of a Hidden Markov Model (HMM), a widely used baseline for this task. Our model achieves 87.60% accuracy on all words excluding punctuation and 95.45% on ambiguous words, whereas the HMM achieves 78.37% and 44.72%, respectively. These results show that our approach outperforms the HMM for part-of-speech tagging of Pashto text.
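The abstract gives no implementation details, so the following is only a minimal sketch, assuming a standard linear-chain CRF decoder of the kind commonly placed on top of a BiLSTM, together with the two evaluation views the abstract reports (all tokens except punctuation, and ambiguous words only). The function names `viterbi_decode`, `ambiguous_words`, and `accuracy`, the toy scores, the `PUNCT` tag, and the definition of an ambiguous word as one observed with more than one tag in a tagged corpus are all hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Sequence, Set, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain model.

    emissions[t][j]:   score of tag j at position t (in a BiLSTM-CRF these
                       would typically come from the BiLSTM output layer)
    transitions[i][j]: score of tag i being immediately followed by tag j
    start[j]:          score of the sentence starting with tag j
    """
    n_tags = len(start)
    # score[j]: best total score of any tag path ending in tag j so far
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i from which to move into tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow back-pointers to recover the path
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def ambiguous_words(tagged_sentences: Iterable[Sequence[Tuple[str, str]]]) -> Set[str]:
    """Hypothetical definition: words observed with more than one tag."""
    tags_seen = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tags_seen[word].add(tag)
    return {word for word, tags in tags_seen.items() if len(tags) > 1}


def accuracy(
    words: Sequence[str],
    gold: Sequence[str],
    pred: Sequence[str],
    keep: Callable[[str, str], bool],
) -> float:
    """Token accuracy, counting only positions where keep(word, gold_tag) is True."""
    total = correct = 0
    for word, g, p in zip(words, gold, pred):
        if keep(word, g):
            total += 1
            correct += g == p
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy two-tag example: transition scores reward keeping the same tag.
    path = viterbi_decode(
        emissions=[[2.0, 0.0], [0.0, 0.5], [0.0, 0.5]],
        transitions=[[1.0, -2.0], [-2.0, 1.0]],
        start=[0.0, 0.0],
    )
    print(path)  # [0, 0, 0]; a per-position argmax would give [0, 1, 1]

    train = [[("a", "N"), ("b", "V")], [("a", "V"), (".", "PUNCT")]]
    amb = ambiguous_words(train)  # {"a"}
    words, gold, pred = ["a", "b", "."], ["N", "V", "PUNCT"], ["V", "V", "N"]
    print(accuracy(words, gold, pred, keep=lambda w, g: g != "PUNCT"))  # 0.5
    print(accuracy(words, gold, pred, keep=lambda w, g: w in amb))      # 0.0
```

The same Viterbi recursion also decodes an HMM when the scores are log initial, log transition, and log emission probabilities, so in a setup like this both taggers could share one evaluation path. The toy example shows why sequence-level decoding matters for ambiguity: transition scores can overturn a locally preferred tag.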


Last updated: 2024-04-15
