Multi-Querying: A Subsequence Matching Approach to Support Multiple Queries


Informatica (IF 2.9), Pub Date: 2023-06-28, DOI: 10.15388/23-infor519
Wen Liu, Mingrui Ma, Peng Wang

The widespread use of sensors has produced an unprecedented amount of time series data. Time series mining has attracted a surge of interest, and subsequence matching is one of its most fundamental problems, serving as a foundation for many time series data mining techniques such as anomaly detection and classification. Many works in the literature study this problem. However, in many real applications it is difficult for users to express their query intent accurately and clearly with a single query sequence. In this paper, we address this issue by allowing users to submit a small query set instead of a single query, since multiple queries can capture the query intent better. In particular, we first propose a novel probability-based representation of the query set. A common segmentation that approximates all the queries well is generated, and each segment is described by a set of features. For each feature, the corresponding values across the multiple queries are represented as a Gaussian distribution. Then, based on this representation, we design a novel distance function to measure the similarity of a subsequence to the multiple queries. We also propose a breadth-first search strategy to find similar subsequences. We have conducted extensive experiments on both synthetic and real datasets, and the results verify the superiority of our approach.
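
The abstract does not give the segmentation method, the feature set, or the exact distance function, so the following Python sketch only illustrates the general idea of a per-feature Gaussian representation of a query set. Everything specific in it is an assumption made for illustration: equal-length segments stand in for the common segmentation, the features are the segment mean and slope, the distance is a sum of squared z-scores, and a brute-force sliding-window scan replaces the breadth-first search strategy. The function names are hypothetical.

```python
import numpy as np

def segment_features(seq, n_segments):
    """Return a (n_segments, 2) array of (mean, slope) per segment.

    Assumption: equal-length segments and these two features are
    placeholders, not the segmentation or features used in the paper.
    """
    feats = []
    for seg in np.array_split(np.asarray(seq, dtype=float), n_segments):
        x = np.arange(len(seg))
        # Least-squares line fit; the first coefficient is the slope.
        slope = np.polyfit(x, seg, 1)[0] if len(seg) > 1 else 0.0
        feats.append((seg.mean(), slope))
    return np.array(feats)

def fit_query_set(queries, n_segments, eps=1e-6):
    """Fit one Gaussian (mean, std) per segment feature across all queries."""
    feats = np.stack([segment_features(q, n_segments) for q in queries])
    # eps keeps the std strictly positive when all queries agree on a feature.
    return feats.mean(axis=0), feats.std(axis=0) + eps

def distance(subseq, mu, sigma):
    """Sum of squared z-scores of the subsequence's features.

    Assumption: a Mahalanobis-style stand-in for the paper's distance.
    """
    f = segment_features(subseq, mu.shape[0])
    return float(np.sum(((f - mu) / sigma) ** 2))

def top_k_matches(series, mu, sigma, length, k=1):
    """Brute-force scan over all windows (not the paper's breadth-first search)."""
    series = np.asarray(series, dtype=float)
    scores = [(distance(series[i:i + length], mu, sigma), i)
              for i in range(len(series) - length + 1)]
    return sorted(scores)[:k]

# Usage: three slightly shifted ramps form the query set.
base = np.linspace(0.0, 1.0, 20)
queries = [base, base + 0.01, base - 0.01]
mu, sigma = fit_query_set(queries, n_segments=4)

# A series with the ramp embedded at offset 30.
series = np.concatenate([np.zeros(30), base, np.zeros(30)])
best = top_k_matches(series, mu, sigma, length=20, k=1)
```

Features on which the queries agree closely get a small standard deviation and therefore penalize deviations heavily, while features that vary across the queries are tolerated more. That is one plausible reading of how a query set can express intent better than a single query.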


Updated: 2023-06-28
