A corpus of English learners with Arabic and Hebrew backgrounds
Language Resources and Evaluation (IF 2.7) Pub Date: 2023-11-20, DOI: 10.1007/s10579-023-09692-x
Omaima Abboud, Batia Laufer, Noam Ordan, Uliana Sentsova, Shuly Wintner

Learner corpora, datasets that reflect the language of non-native speakers, are instrumental for research on language learning and development, as well as for practical applications, mainly in teaching and education. Such corpora now exist for a plethora of native–foreign language pairs, but until recently none of them reflected native Hebrew speakers, and very few reflected native Arabic speakers. We introduce a recently released corpus of English essays authored by learners in Israel. The corpus consists of two sub-corpora, one of Arabic native speakers and the other consisting mainly of Hebrew native speakers. We report on the composition and curation of the datasets; specifically, we processed the data so that both sub-corpora are now uniformly represented, facilitating seamless research and computational processing of the data. We provide statistical information on the corpora and outline a few research projects that have already used them. This is the first and only learner corpus in Israel that covers two major native languages of learners who study under the same English syllabus within a single educational system. All the resources related to the corpus are freely available.




Updated: 2023-11-23