Title
AOBTM: Adaptive Online Biterm Topic Modeling for Version Sensitive Short-texts Analysis
Authors
Abstract
Analysis of mobile app reviews has been shown to play an important role in requirements engineering, software maintenance, and the evolution of mobile apps. Mobile app developers check their users' reviews frequently to clarify the issues experienced by users or to capture new issues introduced by a recent app update. App reviews are dynamic, and the topics they discuss change over time. Changes in the topics among reviews collected for different versions of an app can reveal important issues about an app update. A main technique in this analysis is topic modeling. However, app reviews are short texts, and it is challenging to unveil their latent topics over time. Conventional topic models suffer from the sparsity of word co-occurrence patterns when inferring topics for short texts. Furthermore, these algorithms cannot capture topics over numerous consecutive time-slices. Online topic modeling algorithms speed up the inference of topic models for the texts collected in the latest time-slice by saving a fraction of data from the previous time-slice. But these algorithms do not analyze the statistical data of all the previous time-slices, which can contribute to the topic distribution of the current time-slice. We propose the Adaptive Online Biterm Topic Model (AOBTM) to model topics in short texts adaptively. AOBTM alleviates the sparsity problem in short texts and considers the statistical data for an optimal number of previous time-slices. We also propose parallel algorithms to automatically determine the optimal number of topics and the best number of previous versions that should be considered in the topic inference phase. Automatic evaluation on collections of app reviews and real-world short text datasets confirms that AOBTM can find more coherent topics and outperforms the state-of-the-art baselines.
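
To illustrate the sparsity point, the sketch below shows the general biterm idea that biterm topic models build on: instead of counting word co-occurrences per document, every unordered word pair in a short text is pooled across the whole corpus. This is only a minimal illustration and not the AOBTM inference procedure; the function names `extract_biterms` and `biterm_counts` and the toy reviews are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(documents):
    """Pool biterm counts over the whole corpus (hypothetical helper)."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_biterms(tokens))
    return counts


# Toy, made-up app reviews, already tokenized.
reviews = [
    ["app", "crashes", "after", "update"],
    ["crashes", "after", "login"],
]
counts = biterm_counts(reviews)
```

Pooling pairs over the corpus gives each co-occurrence pattern more evidence than any single short review provides on its own, which is the intuition behind using biterms for short texts.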