Optimal online time-series segmentation

  • Regular Paper
Knowledge and Information Systems

Abstract

Time series become harder to process as they grow, and the problem is aggravated when they are processed online, since their size increases indefinitely. Reducing the number of points without significant loss of information is therefore an important field of research. This article proposes an optimal online segmentation method, called OSFS-OnL, which guarantees that the number of segments is minimal, that a preset error limit under the \(L^\infty\)-norm is not exceeded, and that, for that number of segments, the error under the \(L^2\)-norm is minimized. The new proposal has been compared with the optimal offline segmentation method OSFS and has shown better computational performance, in addition to its flexibility to be applied to either online or offline segmentation.
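To make the two-level optimality criterion concrete, the following is a minimal sketch of a brute-force offline dynamic program that minimizes the number of segments under an \(L^\infty\) bound and breaks ties by the summed squared (\(L^2\)) error. It is not the OSFS-OnL algorithm, whose details are not given in this preview. It assumes that each segment is the straight line joining two retained samples, and the names `segment_errors` and `segment_series` are hypothetical.

```python
from math import inf


def segment_errors(y, i, j):
    """Return (max abs error, sum of squared errors) of the straight line
    joining samples i and j, measured at the samples strictly between them."""
    max_err, sq_err = 0.0, 0.0
    for k in range(i + 1, j):
        interp = y[i] + (y[j] - y[i]) * (k - i) / (j - i)
        d = abs(y[k] - interp)
        max_err = max(max_err, d)
        sq_err += d * d
    return max_err, sq_err


def segment_series(y, max_error):
    """Return the indices of the retained samples (breakpoints).

    Assumption: cost is compared lexicographically as
    (number of segments, summed squared error), subject to every
    segment having max abs error <= max_error (max_error >= 0).
    """
    n = len(y)
    if n == 0:
        return []
    best = [(inf, inf)] * n   # best[j]: cost of the best approximation of y[0..j]
    prev = [-1] * n           # prev[j]: breakpoint preceding j in that approximation
    best[0] = (0, 0.0)
    for j in range(1, n):
        for i in range(j):
            max_err, sq_err = segment_errors(y, i, j)
            if max_err > max_error:
                continue      # segment i..j violates the L-infinity bound
            cand = (best[i][0] + 1, best[i][1] + sq_err)
            if cand < best[j]:
                best[j] = cand
                prev[j] = i
    # Walk back from the last sample to recover the breakpoints.
    cuts = [n - 1]
    while cuts[-1] != 0:
        cuts.append(prev[cuts[-1]])
    return cuts[::-1]


if __name__ == "__main__":
    print(segment_series([0, 1, 2, 3, 2, 1, 0], max_error=0.5))
```

This sketch runs in cubic time and needs the whole series in memory, so it only illustrates the objective. An online method has to reach the same optimum while processing samples as they arrive.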

Data availability

The datasets supporting Fig. 4, as well as those used in the experiments, are publicly available at https://github.com/ma1capoa/OSFS_Method/tree/main/TimeSeriesFiles. No datasets were generated during the current work.

Acknowledgements

This work has been developed with the support of the Research Project PID2019-103871GB-I00 of the Spanish Ministry of Economy, Industry and Competitiveness.

Author information

Corresponding author

Correspondence to Ángel Carmona-Poyato.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Carmona-Poyato, Á., Fernández-García, NL., Madrid-Cuevas, FJ. et al. Optimal online time-series segmentation. Knowl Inf Syst 66, 2417–2438 (2024). https://doi.org/10.1007/s10115-023-02029-8

  • DOI: https://doi.org/10.1007/s10115-023-02029-8