

Agreements ‘in the wild’: Standards and alignment in machine learning benchmark dataset construction
Big Data & Society (IF 8.731), Pub Date: 2024-04-03, DOI: 10.1177/20539517241242457
Isak Engdahl

This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.

Updated: 2024-04-03
