Paper Title
A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis
Paper Authors
Paper Abstract
Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data. Unfortunately, such documents are often overly complicated and filled with legal jargon, making it difficult for users to fully grasp exactly what is being collected and why. Our solution to this problem is to provide users with a coverage analysis of a given website's privacy policy, using a wide range of classical machine learning and deep learning techniques. Given a website's privacy policy, the classifier identifies the associated data practice for each logical segment. These data practices (labels) are taken directly from the OPP-115 corpus. For example, the data practice "Data Retention" refers to how long a website stores a user's information. The coverage analysis allows users to determine how many of the ten possible data practices are covered, and to identify the sections that correspond to the data practices of particular interest.
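The coverage analysis described in the abstract can be illustrated with a minimal sketch. It assumes a policy has already been split into segments and that some classifier maps each segment to one data practice. The names `coverage_analysis` and `keyword_classifier` are hypothetical, the keyword rules are a toy stand-in for the paper's trained models, and the ten category names follow the usual OPP-115 listing, which the abstract itself does not enumerate.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Assumption: category names as commonly listed for OPP-115.
# The abstract only names "Data Retention" explicitly.
DATA_PRACTICES: List[str] = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "User Choice/Control",
    "User Access, Edit and Deletion",
    "Data Retention",
    "Data Security",
    "Policy Change",
    "Do Not Track",
    "International and Specific Audiences",
    "Other",
]


def coverage_analysis(
    segments: List[str], classify: Callable[[str], str]
) -> Dict[str, object]:
    """Label every segment, then report which data practices are covered."""
    by_practice: Dict[str, List[int]] = defaultdict(list)
    for index, segment in enumerate(segments):
        label = classify(segment)
        if label not in DATA_PRACTICES:
            raise ValueError(f"unknown data practice: {label!r}")
        by_practice[label].append(index)

    # Keep the canonical category order for both lists.
    covered = [p for p in DATA_PRACTICES if p in by_practice]
    missing = [p for p in DATA_PRACTICES if p not in by_practice]
    return {
        "covered": covered,
        "missing": missing,
        "coverage": len(covered) / len(DATA_PRACTICES),
        "segments": dict(by_practice),  # practice -> segment indices
    }


def keyword_classifier(segment: str) -> str:
    """Toy stand-in for a trained sequence classifier (illustration only)."""
    text = segment.lower()
    if "retain" in text or "stored for" in text:
        return "Data Retention"
    if "third part" in text:
        return "Third Party Sharing/Collection"
    if "collect" in text:
        return "First Party Collection/Use"
    return "Other"


example_segments = [
    "We collect your email address when you register.",
    "We may share usage data with third parties for advertising.",
    "We retain account records for two years after closure.",
]
report = coverage_analysis(example_segments, keyword_classifier)
print(f"{len(report['covered'])} of {len(DATA_PRACTICES)} data practices covered")
print("Data Retention segments:", report["segments"].get("Data Retention", []))
```

Passing the classifier in as a callable keeps the coverage logic independent of the model, so any classical or deep learning classifier with the same segment-to-label interface could be swapped in for the keyword rules.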