Paper Title

American == White in Multimodal Language-and-Image AI

Paper Authors

Robert Wolfe, Aylin Caliskan

Paper Abstract

Three state-of-the-art language-and-image AI models, CLIP, SLIP, and BLIP, are evaluated for evidence of a bias previously observed in social and experimental psychology: equating American identity with being White. Embedding association tests (EATs) using standardized images of self-identified Asian, Black, Latina/o, and White individuals from the Chicago Face Database (CFD) reveal that White individuals are more associated with collective in-group words than are Asian, Black, or Latina/o individuals. In assessments of three core aspects of American identity reported by social psychologists, single-category EATs reveal that images of White individuals are more associated with patriotism and with being born in America, but that, consistent with prior findings in psychology, White individuals are also associated with being less likely to treat people of all races and backgrounds equally. Three downstream machine learning tasks demonstrate biases associating American identity with being White. In a visual question answering task using BLIP, 97% of White individuals are identified as American, compared to only 3% of Asian individuals. When asked in what state the depicted individual lives, the model responds "China" 53% of the time for Asian individuals, but always names an American state for White individuals. In an image captioning task, BLIP remarks upon the race of Asian individuals as much as 36% of the time, but never remarks upon race for White individuals. Finally, provided with an initialization image from the CFD and the text "an American person," a synthetic image generator (VQGAN) using the text-based guidance of CLIP lightens the skin tone of individuals of all races (by 35% for Black individuals, based on pixel brightness). The results indicate that biases equating American identity with being White are learned by language-and-image AI, and propagate to downstream applications of such models.
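For context on the EAT results summarized above, the sketch below shows how a WEAT-style effect size is typically computed over embeddings. The effect-size formula follows Caliskan et al. (2017); the function names, the exact form of the single-category variant, and the random placeholder vectors are illustrative assumptions, not the authors' released code, and the test here assumes L2-normalized image and text embeddings from a model such as CLIP.

```python
import numpy as np

def association(w, A, B):
    # s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b)
    # Embeddings are assumed L2-normalized, so dot products are cosines.
    return (A @ w).mean() - (B @ w).mean()

def eat_effect_size(X, Y, A, B):
    # Effect size d of an embedding association test (Caliskan et al., 2017):
    # d = [mean_x s(x,A,B) - mean_y s(y,A,B)] / std_{w in X∪Y} s(w,A,B)
    # X, Y: (n, dim) embeddings for two target groups (e.g., face images);
    # A, B: (m, dim) embeddings for two attribute word sets.
    s_X = np.array([association(x, A, B) for x in X])
    s_Y = np.array([association(y, A, B) for y in Y])
    return (s_X.mean() - s_Y.mean()) / np.concatenate([s_X, s_Y]).std(ddof=1)

def sc_eat_effect_size(w, A, B):
    # Single-category EAT for one target embedding w (e.g., one face image):
    # d = [mean_a cos(w,a) - mean_b cos(w,b)] / std_{x in A∪B} cos(w,x)
    sims = np.concatenate([A @ w, B @ w])
    return ((A @ w).mean() - (B @ w).mean()) / sims.std(ddof=1)

# Illustrative usage with random placeholder vectors standing in for
# CLIP-style 512-dimensional image and text embeddings.
rng = np.random.default_rng(0)
unit = lambda M: M / np.linalg.norm(M, axis=1, keepdims=True)
X, Y = unit(rng.normal(size=(40, 512))), unit(rng.normal(size=(40, 512)))
A, B = unit(rng.normal(size=(8, 512))), unit(rng.normal(size=(8, 512)))
print(eat_effect_size(X, Y, A, B))
print(sc_eat_effect_size(X[0], A, B))
```

With real embeddings, a positive d would indicate that group X (or the single image w) is more associated with attribute set A than with B; the random inputs here should yield a value near zero.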
