Leveraging Bidirectional LSTM with CRFs for Pashto Tagging
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2024-04-15, DOI: 10.1145/3649456
Farooq Zaman 1, Onaiza Maqbool 1, Jaweria Kanwal 2

Part-of-speech tagging plays a vital role in text processing and natural language understanding, yet very few attempts have been made at part-of-speech tagging for Pashto. In this work, we present a Long Short-Term Memory-based approach to Pashto part-of-speech tagging, with a special focus on ambiguity resolution. We first created a corpus of Pashto sentences containing words with multiple meanings, together with their tags. We then introduce a powerful sentence representation and a new architecture for Pashto text processing. The accuracy of the proposed approach is compared with that of a state-of-the-art Hidden Markov Model. Our model achieves 87.60% accuracy on all words excluding punctuation and 95.45% on ambiguous words, whereas the Hidden Markov Model achieves 78.37% and 44.72%, respectively. These results show that our approach outperforms the Hidden Markov Model in part-of-speech tagging of Pashto text.
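The two accuracy figures reported above (all words excluding punctuation, and ambiguous words only) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `tagging_accuracy`, the `(word, gold_tag, predicted_tag)` token layout, the `PUNCT` tag label, and the idea of passing ambiguous words as a set are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical token layout: (word, gold_tag, predicted_tag).
Token = Tuple[str, str, str]


def tagging_accuracy(
    tokens: Iterable[Token],
    ambiguous_words: Set[str],
    punct_tag: str = "PUNCT",  # assumed label for punctuation tokens
) -> Tuple[float, float]:
    """Return (accuracy over non-punctuation tokens,
    accuracy over tokens whose word is in ambiguous_words)."""
    all_total = all_correct = 0
    amb_total = amb_correct = 0
    for word, gold, pred in tokens:
        if gold == punct_tag:
            continue  # punctuation is excluded from both figures
        hit = gold == pred
        all_total += 1
        all_correct += hit
        if word in ambiguous_words:
            amb_total += 1
            amb_correct += hit
    overall = all_correct / all_total if all_total else 0.0
    ambiguous = amb_correct / amb_total if amb_total else 0.0
    return overall, ambiguous
```

Reporting the ambiguous-word figure separately matters because a tagger can score well overall while still failing on exactly the words whose tag depends on context.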




Updated: 2024-04-15