Abstract
Processing a time series becomes harder as the series grows. The problem is aggravated in online processing, where the series grows indefinitely. Reducing the number of points without significant loss of information is therefore an important field of research. This article proposes an optimal online segmentation method, called OSFS-OnL, which guarantees that the number of segments is minimal, that a preset error limit under the \(L^\infty\)-norm is not exceeded, and that, for that number of segments, the error under the \(L^2\)-norm is minimized. The proposal has been compared with the optimal offline segmentation method OSFS and has shown better computational performance, in addition to being flexible enough to be applied to either online or offline segmentation.
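To make the stated objective concrete, the following is a minimal brute-force sketch in Python. It is not the OSFS-OnL algorithm, whose details are not given in the abstract. It assumes that each segment is the straight line joining two samples of the series, and it uses an offline dynamic program to find the fewest segments whose maximum absolute error stays within a limit `eps`, breaking ties by the smallest sum of squared errors. The names `segment_errors`, `min_segments`, and `eps` are hypothetical and chosen only for illustration.

```python
import math


def segment_errors(y, i, j):
    """Return (max abs error, sum of squared errors) when points i..j are
    replaced by the straight line joining (i, y[i]) and (j, y[j]).
    Assumption: segments interpolate between samples of the series."""
    max_err = 0.0
    sq_err = 0.0
    for k in range(i + 1, j):
        # Linear interpolation between the two segment endpoints.
        approx = y[i] + (y[j] - y[i]) * (k - i) / (j - i)
        err = abs(y[k] - approx)
        max_err = max(max_err, err)
        sq_err += err * err
    return max_err, sq_err


def min_segments(y, eps):
    """Brute-force offline DP (illustration only, not OSFS-OnL):
    fewest segments with L-infinity error <= eps, ties broken by the
    smallest sum of squared errors. Returns (cut indices, squared error)."""
    n = len(y)
    # best[j] = (segment count, squared error, previous cut) for prefix 0..j
    best = [(math.inf, math.inf, -1)] * n
    best[0] = (0, 0.0, -1)
    for j in range(1, n):
        for i in range(j):
            max_err, sq_err = segment_errors(y, i, j)
            if max_err > eps:
                continue  # this segment would violate the error limit
            cand = (best[i][0] + 1, best[i][1] + sq_err, i)
            # Lexicographic comparison: segment count first, then error.
            if cand[:2] < best[j][:2]:
                best[j] = cand
    # Walk back through the stored predecessors to recover the cut points.
    cuts = [n - 1]
    while cuts[-1] != 0:
        cuts.append(best[cuts[-1]][2])
    cuts.reverse()
    return cuts, best[n - 1][1]
```

For example, `min_segments([0, 1, 2, 3, 2, 1, 0], 0.1)` selects the cut indices `[0, 3, 6]`, since the triangular series is reproduced exactly by two segments. This sketch evaluates every candidate segment from scratch and therefore runs in cubic time; it only illustrates the two-level objective and says nothing about how the proposed method achieves its computational performance.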
Data availability
The datasets supporting Fig. 4, as well as those used in the experiments, are publicly available at https://github.com/ma1capoa/OSFS_Method/tree/main/TimeSeriesFiles. No datasets were generated during the current work.
Acknowledgements
This work was supported by Research Project PID2019-103871GB-I00 of the Spanish Ministry of Economy, Industry and Competitiveness.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Carmona-Poyato, Á., Fernández-García, NL., Madrid-Cuevas, FJ. et al. Optimal online time-series segmentation. Knowl Inf Syst 66, 2417–2438 (2024). https://doi.org/10.1007/s10115-023-02029-8
DOI: https://doi.org/10.1007/s10115-023-02029-8