Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction, Journal of Computer Science and Technology


Journal of Computer Science and Technology (IF 1.9), Pub Date: 2023-09-30, DOI: 10.1007/s11390-021-1691-3
Ran Tian, Zu-Long Diao, Hai-Yang Jiang, Gao-Gang Xie

Logs contain runtime information for both systems and users. Because many logs are written in natural language, a typical log-based analysis must first parse them into a structured format. Existing parsing approaches usually take two steps: first, they find similar words (tokens) or sentences; second, they extract log templates by replacing the tokens that differ with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs and lack a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and of similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose Cognition, an online log parsing approach. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. Then, it applies our template correction technique to merge and absorb similar templates. This eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation on 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also reduces time cost by up to 52.1% on average compared with the other approaches.
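The generic two-step pipeline that the abstract describes (group similar logs, then replace differing tokens with placeholders) can be illustrated with a minimal sketch. This is not the Cognition algorithm: the grouping key (token count plus first token), the `<*>` placeholder string, and all function names are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

PLACEHOLDER = "<*>"  # assumed placeholder notation, not taken from the paper


def group_logs(lines: List[str]) -> Dict[Tuple[int, str], List[List[str]]]:
    """Step 1 (illustrative): group logs by token count and first token."""
    groups: Dict[Tuple[int, str], List[List[str]]] = {}
    for line in lines:
        tokens = line.split()
        key = (len(tokens), tokens[0] if tokens else "")
        groups.setdefault(key, []).append(tokens)
    return groups


def extract_template(group: List[List[str]]) -> str:
    """Step 2 (illustrative): keep tokens shared by every log in the group,
    replace positions where the logs disagree with a placeholder."""
    template = []
    for position, token in enumerate(group[0]):
        if all(log[position] == token for log in group):
            template.append(token)
        else:
            template.append(PLACEHOLDER)
    return " ".join(template)


def parse(lines: List[str]) -> List[str]:
    """Run both steps and return one template per group, in first-seen order."""
    return [extract_template(g) for g in group_logs(lines).values()]


logs = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.2 port 8080",
    "Disk usage at 91 percent",
    "Disk usage at 47 percent",
]
print(parse(logs))
```

A naive extractor like this one marks any differing position as a variable, which is exactly where an ambiguous placeholder definition can lead to overused placeholders or wrongly split templates, the problems the abstract attributes to existing parsers.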


Updated: 2023-09-30
