Data-Driven Approach for Log Instruction Quality Assessment

Paper Title

Data-Driven Approach for Log Instruction Quality Assessment

Authors

Jasmin Bogatinovski, Sasho Nedelkoski, Alexander Acker, Jorge Cardoso, Odej Kao

Abstract

In the current IT world, developers write code while system operators mostly run that code as a black box. The connection between the two worlds is typically established through log messages: the developer gives the (unknown) operator hints about where the cause of an issue lies, and, conversely, the operator can report bugs found during operation. To fulfil this purpose, developers write log instructions, which are structured text commonly composed of a log level (e.g., "info", "error"), static text (e.g., "IP {} cannot be reached"), and dynamic variables (e.g., IP {}). However, in contrast to well-adopted coding practices, there are no widely adopted guidelines on how to write log instructions with good quality properties. For example, a developer may assign a high log level (e.g., "error") to a trivial event, which can confuse the operator and increase maintenance costs, or the static text may be insufficient to hint at a specific issue. In this paper, we address the problem of log quality assessment and provide the first step towards its automation. We start with an in-depth analysis of the quality properties of log instructions in nine software systems and identify two: 1) correct log level assignment, which assesses the correctness of the log level, and 2) sufficient linguistic structure, which assesses the minimal richness of the static text necessary for a verbose event description. Based on these findings, we developed a data-driven approach that adapts deep learning methods for each of the two properties. An extensive evaluation on large-scale open-source systems shows that our approach correctly assesses log level assignments with an accuracy of 0.88 and sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines. Our study shows the potential of data-driven methods for assessing the quality of log instructions and for aiding developers in comprehending and writing better code.
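
The three parts of a log instruction named in the abstract (log level, static text, dynamic variables) can be illustrated with a minimal Python sketch. The `LogInstruction` class and its field names are hypothetical and chosen only for illustration; they are not the paper's implementation, and the sketch assumes that dynamic variables appear as `{}` placeholders, as in the abstract's example.

```python
import re
from dataclasses import dataclass

# Assumption: dynamic variables are written as "{}" placeholders,
# as in the abstract's example "IP {} cannot be reached".
PLACEHOLDER = re.compile(r"\{\}")


@dataclass
class LogInstruction:
    """Hypothetical container for one log instruction."""
    level: str      # log level, e.g. "info" or "error"
    template: str   # message text including "{}" placeholders

    @property
    def static_text(self) -> str:
        # Drop the placeholders and normalize whitespace to keep only the fixed words.
        return " ".join(PLACEHOLDER.sub(" ", self.template).split())

    @property
    def num_variables(self) -> int:
        # Count the dynamic variables filled in at run time.
        return len(PLACEHOLDER.findall(self.template))


instr = LogInstruction(level="error", template="IP {} cannot be reached")
print(instr.level, "|", instr.static_text, "|", instr.num_variables)
```

In these terms, the first quality property concerns whether `level` suits the event, and the second concerns whether `static_text` is rich enough to describe it.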

Paper Title