Paper Title

Analysis of Models for Decentralized and Collaborative AI on Blockchain

Author

Harris, Justin D.

Abstract

Machine learning has recently enabled large advances in artificial intelligence, but these results can be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and maintain them. Published proposals to provide models and data for free for certain tasks include Microsoft Research's Decentralized and Collaborative AI on Blockchain. The framework allows participants to collaboratively build a dataset and use smart contracts to share a continuously updated model on a public blockchain. The initial proposal gave an overview of the framework but omitted many details of the models used and of the incentive mechanisms in real-world scenarios. In this work, we evaluate the use of several models and configurations in order to propose best practices when using the Self-Assessment incentive mechanism, so that models can remain accurate and well-intentioned participants who submit correct data have the chance to profit. We have analyzed simulations for each of three models: Perceptron, Naïve Bayes, and a Nearest Centroid Classifier, with three different datasets: predicting a sport with user activity from Endomondo, sentiment analysis on movie reviews from IMDB, and determining if a news article is fake. We compare several factors for each dataset when models are hosted in smart contracts on a public blockchain: their accuracy over time, balances of a good and bad user, and transaction costs (or gas) for deploying, updating, collecting refunds, and collecting rewards. A free and open source implementation for the Ethereum blockchain, along with simulations written in Python, is provided at https://github.com/microsoft/0xDeCA10B. This version has updated gas costs using newer optimizations written after the original publication.
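
To make the idea of a continuously updated model more concrete, the following is a minimal sketch of one of the three model types named in the abstract, a nearest centroid classifier, updated one sample at a time. It is an illustration only, not the code from the 0xDeCA10B repository: the class name `NearestCentroid`, its methods, and the toy feature vectors and labels are all assumptions made for this example. Each update touches a single class centroid, which suggests why such a model could be comparatively cheap to update inside a smart contract.

```python
from dataclasses import dataclass, field


@dataclass
class NearestCentroid:
    """Illustrative online nearest centroid classifier (hypothetical, not the repository's code)."""
    centroids: dict = field(default_factory=dict)  # label -> list of feature means
    counts: dict = field(default_factory=dict)     # label -> number of samples seen

    def update(self, x, label):
        """Fold one labeled sample into its class centroid as a running mean."""
        if label not in self.centroids:
            self.centroids[label] = list(x)
            self.counts[label] = 1
            return
        n = self.counts[label] + 1
        centroid = self.centroids[label]
        for i, xi in enumerate(x):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            centroid[i] += (xi - centroid[i]) / n
        self.counts[label] = n

    def predict(self, x):
        """Return the label whose centroid is closest in squared Euclidean distance."""
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=squared_distance)


# Toy data (made up for illustration): two features, two activity labels.
model = NearestCentroid()
for features, label in [([0.0, 0.0], "walk"), ([2.0, 2.0], "run"), ([0.0, 2.0], "walk")]:
    model.update(features, label)
```

Calling `model.predict([2.0, 3.0])` on this toy model picks the label of the nearer centroid. In the framework the abstract describes, a data contribution would presumably trigger an update of this kind on-chain, with the incentive mechanism deciding whether the contributor's deposit is refunded.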
