Paper Title
Broken External Links on Stack Overflow
Paper Authors
Paper Abstract
Stack Overflow hosts valuable programming-related knowledge, including 11,926,354 links that reference third-party websites. Links to resources hosted outside Stack Overflow extend its knowledge base substantially. However, with the rapid development of programming-related knowledge, many resources hosted on the Internet are no longer available. Based on our analysis of the Stack Overflow data released on Jun. 2, 2019, 14.2% of the links on Stack Overflow are broken. Broken links can prevent viewers from obtaining the programming-related knowledge they are looking for, and can damage the reputation of Stack Overflow, as viewers might regard posts with broken links as obsolete. In this paper, we characterize the broken links on Stack Overflow. 65% of the broken links in our sampled questions are used to show examples, e.g., code examples. 70% of the broken links in our sampled answers are used to provide supporting information, e.g., explaining a certain concept or describing a step to solve a problem. Only 1.67% of the posts with broken links are flagged as such by viewers in the posts' comments, and the broken links have been removed from only 5.8% of the posts that contained them. Viewers cannot fully rely on vote scores to detect broken links, as broken links are common across posts with different vote scores. Broken links most often point to websites whose hosted resources are maintained by their users; GitHub is a prominent example. Posts and comments related to web technologies, i.e., JavaScript, HTML, CSS, and jQuery, are associated with more broken links. Based on our findings, we shed light on future directions and provide recommendations for practitioners and researchers.
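To make the notion of a broken-link rate concrete, the following is a minimal sketch of how a share such as the 14.2% figure could be computed over a set of URLs. The abstract does not state how links were classified, so the rule used here (no response, or any 4xx/5xx status, counts as broken) is an assumption, and the names `fetch_status`, `is_broken`, and `broken_share` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def is_broken(status: Optional[int]) -> bool:
    """Assumed rule (not from the paper): no response or 4xx/5xx is broken."""
    return status is None or status >= 400


def broken_share(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> float:
    """Fraction of urls judged broken; 0.0 for an empty input."""
    statuses = [fetch(url) for url in urls]
    if not statuses:
        return 0.0
    return sum(is_broken(s) for s in statuses) / len(statuses)
```

The `fetch` parameter is injectable so the classification logic can be exercised without network access. A real measurement would also need to handle servers that reject `HEAD` requests and pages that return 200 for missing content, neither of which this sketch attempts.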