AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata, Language Resources and Evaluation



AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata
Language Resources and Evaluation (IF 2.7), Pub Date: 2024-01-03, DOI: 10.1007/s10579-023-09702-y
Kerenza Doxolodeo, Adila Alfa Krisnadhi

Constructing a question-answering dataset can be prohibitively expensive, which makes it difficult for researchers to build one for an under-resourced language such as Indonesian. We create a novel Indonesian question-answering dataset that is produced automatically end-to-end. The process uses a context-free grammar, the Indonesian Wikipedia corpus, and the concept of a proxy model. The dataset consists of 134 thousand simple questions and 60 thousand complex questions. It achieves grammatical and model accuracy competitive with the translated dataset but suffers from some issues due to resource constraints.
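
As a rough illustration of how a context-free grammar can turn a knowledge-base triple into question-answer pairs, the sketch below expands a toy grammar over one (subject, relation, answer) triple. The grammar rules, the `GRAMMAR`, `expand`, and `generate_questions` names, and the example triple are all assumptions made for illustration; they are not the grammar or pipeline used in the paper, and the Wikipedia context retrieval and proxy-model steps are not shown.

```python
from itertools import product

# Toy grammar: each nonterminal maps to a list of productions.
# Illustrative rules only, NOT the grammar from the paper.
GRAMMAR = {
    "Q": [["WH", "REL", "SUBJ", "?"]],
    "WH": [["Siapa"], ["Siapakah"]],       # "Who"
    "REL": [["penulis"], ["pengarang"]],   # "author of"
}


def expand(symbol, bindings):
    """Return every token sequence derivable from `symbol`."""
    if symbol in bindings:       # slot filled from the triple
        return [[bindings[symbol]]]
    if symbol not in GRAMMAR:    # terminal symbol
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        parts = [expand(s, bindings) for s in production]
        # Combine one expansion of each symbol in the production.
        for combo in product(*parts):
            results.append([tok for seq in combo for tok in seq])
    return results


def generate_questions(triple):
    """Build question-answer pairs for one (subject, relation, answer) triple."""
    subject, _relation, answer = triple
    questions = []
    for tokens in expand("Q", {"SUBJ": subject}):
        # Join words with spaces and attach the final "?" directly.
        text = " ".join(tokens[:-1]) + tokens[-1]
        questions.append({"question": text, "answer": answer})
    return questions


# Hypothetical triple in the style of a Wikidata statement.
pairs = generate_questions(("Laskar Pelangi", "author", "Andrea Hirata"))
for pair in pairs:
    print(pair["question"], "->", pair["answer"])
```

Each additional production multiplies the number of surface forms, which is one plausible way a small grammar could yield a dataset of the size reported above.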


Updated: 2024-01-04
