Paper Title

Towards Employing Recommender Systems for Supporting Data and Algorithm Sharing

Authors

Peter Müllner, Stefan Schmerda, Dieter Theiler, Stefanie Lindstaedt, Dominik Kowald

Abstract

Data and algorithm sharing is an imperative part of data and AI-driven economies. The efficient sharing of data and algorithms relies on the active interplay between users, data providers, and algorithm providers. Although recommender systems are known to effectively interconnect users and items in e-commerce settings, there is a lack of research on the applicability of recommender systems for data and algorithm sharing. To fill this gap, we identify six recommendation scenarios for supporting data and algorithm sharing, where four of these scenarios substantially differ from the traditional recommendation scenarios in e-commerce applications. We evaluate these recommendation scenarios using a novel dataset based on interaction data of the OpenML data and algorithm sharing platform, which we also provide for the scientific community. Specifically, we investigate three types of recommendation approaches, namely popularity-, collaboration-, and content-based recommendations. We find that collaboration-based recommendations provide the most accurate recommendations in all scenarios. Plus, the recommendation accuracy strongly depends on the specific scenario, e.g., algorithm recommendations for users are a more difficult problem than algorithm recommendations for datasets. Finally, the content-based approach generates the least popularity-biased recommendations that cover the most datasets and algorithms.
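
The abstract names three families of recommendation approaches (popularity-, collaboration-, and content-based) without spelling out how each one ranks items. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a toy interaction table in which each "source" (a user or a dataset, depending on the scenario) maps to the "targets" (for example, algorithms) it has interacted with. All entity names, tags, and the helper functions `most_popular`, `user_knn`, `content_based`, and `top_k` are hypothetical. User-based kNN with cosine similarity and tag-profile matching are common textbook instantiations of the collaboration- and content-based families, and are not necessarily the exact algorithms evaluated in the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy interactions: each "source" (a user or a dataset, depending
# on the scenario) maps to the set of "targets" (e.g., algorithms) it used.
interactions = {
    "u1": {"svm", "rf", "knn"},
    "u2": {"svm", "rf"},
    "u3": {"rf", "xgb"},
    "u4": {"nb"},
}

# Hypothetical content features (tags) describing each target.
item_tags = {
    "svm": {"kernel", "classification"},
    "rf": {"tree", "ensemble", "classification"},
    "knn": {"instance", "classification"},
    "xgb": {"tree", "ensemble", "boosting"},
    "nb": {"probabilistic", "classification"},
}


def top_k(scores, seen, k):
    """Rank unseen targets with a positive score; ties are broken by name."""
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if item not in seen and scores[item] > 0][:k]


def most_popular(source, k):
    """Popularity-based: recommend the most frequently used targets overall."""
    counts = Counter(item for items in interactions.values() for item in items)
    return top_k(counts, interactions[source], k)


def cosine(a, b):
    """Cosine similarity between two binary interaction sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def user_knn(source, k, neighbors=2):
    """Collaboration-based: sum the similarities of the nearest sources."""
    seen = interactions[source]
    nearest = sorted(
        ((cosine(seen, items), other)
         for other, items in interactions.items() if other != source),
        reverse=True,
    )[:neighbors]
    scores = Counter()
    for sim, other in nearest:
        for item in interactions[other]:
            scores[item] += sim
    return top_k(scores, seen, k)


def content_based(source, k):
    """Content-based: match target tags against the source's tag profile."""
    seen = interactions[source]
    profile = Counter(tag for item in seen for tag in item_tags[item])
    scores = {item: sum(profile[tag] for tag in tags)
              for item, tags in item_tags.items()}
    return top_k(scores, seen, k)


if __name__ == "__main__":
    for source in interactions:
        print(source, most_popular(source, 2), user_knn(source, 2),
              content_based(source, 2))
```

In this toy setting, `user_knn("u4", 2)` returns an empty list because `u4` shares no targets with any other source, whereas `content_based("u4", 2)` still returns `["knn", "rf"]` through tag overlap. Swapping what plays the role of source and target (for example, datasets as sources and algorithms as targets) lets the same functions cover the other recommendation scenarios.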
