Fast, accurate and explainable time series classification through randomization
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2023-10-16, DOI: 10.1007/s10618-023-00978-w
Nestor Cabello, Elham Naghizade, Jianzhong Qi, Lars Kulik

Time series classification (TSC) aims to predict the class label of a given time series, which is critical to a rich set of application areas such as economics and medicine. State-of-the-art TSC methods have mostly focused on classification accuracy, without considering classification speed. However, efficiency is important for big data analysis. Datasets with a large training size or long series challenge the use of the current highly accurate methods, because these methods are usually computationally expensive. Similarly, classification explainability, which is an important property required by modern big data applications such as appliance modeling and by legislation such as the European General Data Protection Regulation, has received little attention. To address these gaps, we propose a novel TSC method, the Randomized-Supervised Time Series Forest (r-STSF). r-STSF is extremely fast and achieves state-of-the-art classification accuracy. It is an efficient interval-based approach that classifies time series according to aggregate values of discriminatory sub-series (intervals). To achieve state-of-the-art accuracy, r-STSF builds an ensemble of randomized trees using the discriminatory sub-series. It uses four time series representations, nine aggregation functions, and a supervised binary-inspired search combined with a feature ranking metric to identify highly discriminatory sub-series. The discriminatory sub-series enable explainable classifications. Experiments on an extensive set of datasets show that r-STSF achieves state-of-the-art accuracy while being orders of magnitude faster than most existing TSC methods and enabling explanations of the classifier's decisions.
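The interval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the aggregate set below (mean, standard deviation, minimum, maximum, slope) is an illustrative subset rather than the paper's nine functions, the Fisher-score-style ratio stands in for the unspecified feature ranking metric, and `supervised_search` is a hypothetical reading of the "supervised binary-inspired search" that splits an interval at a random point and keeps the side that separates the classes better. The time series representations and the randomized tree ensemble are omitted.

```python
import random
import statistics


def slope(values):
    """Least-squares slope of the values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Illustrative aggregation functions (an assumption, not the paper's list).
AGGREGATES = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "min": min,
    "max": max,
    "slope": slope,
}


def fisher_score(column, labels):
    """Between-class spread divided by within-class spread for one feature."""
    overall = statistics.fmean(column)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        vals = [v for v, y in zip(column, labels) if y == cls]
        between += len(vals) * (statistics.fmean(vals) - overall) ** 2
        within += len(vals) * statistics.pvariance(vals)
    if within > 0:
        return between / within
    # Zero within-class spread: perfectly separated or completely uninformative.
    return float("inf") if between > 0 else 0.0


def supervised_search(dataset, labels, agg, rng, min_len=2):
    """Shrink the interval step by step, keeping the more discriminatory side.

    Returns a list of (start, end, score) tuples, one per step.
    """
    start, end = 0, len(dataset[0])
    found = []
    while end - start >= 2 * min_len:
        # Random cut point that leaves at least min_len points on each side.
        cut = rng.randrange(start + min_len, end - min_len + 1)
        left = fisher_score([agg(s[start:cut]) for s in dataset], labels)
        right = fisher_score([agg(s[cut:end]) for s in dataset], labels)
        if left >= right:
            end, score = cut, left
        else:
            start, score = cut, right
        found.append((start, end, score))
    return found


if __name__ == "__main__":
    # Toy data: class 1 differs from class 0 only on positions 12..15.
    flat = [0.0] * 32
    bump = [5.0 if 12 <= i < 16 else 0.0 for i in range(32)]
    data = [flat, flat, flat, bump, bump, bump]
    y = [0, 0, 0, 1, 1, 1]
    for start, end, score in supervised_search(data, y, AGGREGATES["mean"], random.Random(0)):
        print(f"interval [{start}, {end}) score={score}")
```

Because each retained interval is a concrete span of the original series, it can be shown to a user as the region that drove a prediction, which is the sense in which interval features support explanation.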




Updated: 2023-10-16