AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata
Language Resources and Evaluation (IF 2.7), Pub Date: 2024-01-03, DOI: 10.1007/s10579-023-09702-y
Kerenza Doxolodeo, Adila Alfa Krisnadhi

Constructing a question-answering dataset can be prohibitively expensive, which makes it difficult for researchers to build one for an under-resourced language such as Indonesian. We create a novel Indonesian question-answering dataset that is produced automatically, end to end. The process uses a context-free grammar, the Wikipedia Indonesian corpus, and the concept of a proxy model. The dataset consists of 134 thousand simple questions and 60 thousand complex questions. It achieves grammatical and model accuracy competitive with the translated dataset, but suffers from some issues due to resource constraints.



