Tagging Singapore English
World Englishes (IF 1.154) · Pub Date: 2022-07-21 · DOI: 10.1111/weng.12597
Li Lin, Kunmei Han, Jiawen Hing, Luwen Cao, Vincent Ooi, Nick Huang, Zhiming Bao

It is well known that Outer Circle English has undergone extensive contact-induced lexical and grammatical restructuring. Can common NLP tools developed for Inner Circle English be used to process Outer Circle English texts? Here, we report our experience of using the Stanford PoS tagger to tag the Singaporean component of the International Corpus of English (ICE-SIN). We isolate two major contact-related causes of tagging errors: (1) lexical and grammatical loans borrowed directly from the local languages; and (2) English-origin words that have acquired new grammatical meanings from the local languages. While the first type may be easy to overcome, the second is intractable, adding an extra layer of morphosyntactic complexity. We achieved comparable accuracy rates in the more formal registers, and a lower but still decent 88% in the informal register of private conversations. A tagged ICE-SIN allows us to investigate lexical and grammatical restructuring in unprecedented detail.
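Tagging accuracy of the kind reported here is usually a token-level measure: the share of tokens whose assigned PoS tag matches a gold-standard tag. The sketch computes that ratio separately for each register, assuming hand-annotated gold tags are available for comparison. The function name, the (register, gold, predicted) data layout, and the toy Penn Treebank-style tags are hypothetical illustrations, not the authors' evaluation code or ICE-SIN data.

```python
from collections import defaultdict


def accuracy_by_register(tokens):
    """Token-level tagging accuracy per register (hypothetical helper).

    `tokens` is an iterable of (register, gold_tag, predicted_tag) triples.
    Returns a dict mapping each register to the fraction of its tokens
    whose predicted tag equals the gold tag.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for register, gold, predicted in tokens:
        total[register] += 1
        if gold == predicted:
            correct[register] += 1
    return {reg: correct[reg] / total[reg] for reg in total}


# Hypothetical toy data (not from ICE-SIN); the tags are illustrative only.
sample = [
    ("conversation", "UH", "NN"),    # one hypothetical tagging error
    ("conversation", "PRP", "PRP"),
    ("conversation", "VBD", "VBD"),
    ("conversation", "RB", "RB"),
    ("news", "DT", "DT"),
    ("news", "NN", "NN"),
]

print(accuracy_by_register(sample))
```

Keeping the counts per register, rather than pooling all tokens, is what makes a gap like the one between formal registers and private conversations visible in the first place.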

Updated: 2022-07-21