Paper Title

The Data-Production Dispositif

Authors

Milagros Miceli, Julian Posada

Abstract

Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource processes related to data work (i.e., generating and annotating data and evaluating outputs) through business process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML data work in Latin America by studying three platforms in Venezuela and a BPO in Argentina. We lean on the Foucauldian notion of dispositif to define the data-production dispositif as an ensemble of discourses, actions, and objects strategically disposed to (re)produce power/knowledge relations in data and labor. Our dispositif analysis comprises the examination of 210 data work instruction documents, 55 interviews with data workers, managers, and requesters, and participant observation. Our findings show that discourses encoded in instructions reproduce and normalize the worldviews of requesters. Precarious working conditions and economic dependency alienate workers, making them obedient to instructions. Furthermore, discourses and social contexts materialize in artifacts, such as interfaces and performance metrics, limiting workers' agency and normalizing specific ways of interpreting data. We conclude by stressing the importance of counteracting the data-production dispositif by fighting alienation and precarization, and empowering data workers to become assets in the quest for high-quality data.
