
Attentive neural controlled differential equations for time-series classification and forecasting

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Neural networks inspired by differential equations have proliferated over the past several years, of which neural ordinary differential equations (NODEs) and neural controlled differential equations (NCDEs) are two representative examples. In theory, NCDEs exhibit better representation learning capability for time-series data than NODEs; in particular, NCDEs are known to be well suited to processing irregular time series. Whereas NODEs have been successfully extended to adopt attention, methods to integrate attention into NCDEs have not yet been studied. To this end, we present attentive neural controlled differential equations (ANCDEs) for time-series classification and forecasting, in which dual NCDEs are used: one for generating attention values and the other for evolving hidden vectors for a downstream machine learning task. We conduct experiments on 5 real-world time-series datasets against 11 baselines, and we additionally evaluate on irregular time series created by randomly dropping observations. Our method consistently shows the best accuracy in all cases by non-trivial margins. Our visualizations also show that the presented attention mechanism works as intended by focusing on crucial information.
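To make the dual-NCDE design concrete, below is a minimal PyTorch sketch that discretizes dz = f(z; θ) dX with a fixed-step Euler scheme: a "bottom" NCDE evolves along the raw path and emits attention values, which reweight the path increments driving a "top" NCDE whose final state feeds a classifier. All names (AttentiveNCDE, CDEFunc), the sigmoid attention head, and the Euler solver are illustrative assumptions based on the abstract's description, not the authors' released implementation.

```python
# A minimal sketch of the dual-NCDE idea described in the abstract, using a
# fixed-step Euler discretization of dz = f(z) dX on a regularly sampled
# series. Class/variable names and design details here are assumptions.
import torch
import torch.nn as nn


class CDEFunc(nn.Module):
    """Vector field f: maps a hidden state to a (hidden_dim x input_dim) matrix."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.Tanh(),
            nn.Linear(64, hidden_dim * input_dim), nn.Tanh(),
        )
        self.input_dim, self.hidden_dim = input_dim, hidden_dim

    def forward(self, z):
        return self.net(z).view(-1, self.hidden_dim, self.input_dim)


class AttentiveNCDE(nn.Module):
    """Dual NCDEs: a 'bottom' NCDE yields attention values that reweight the
    control-path increments driving a 'top' NCDE; the top NCDE's final hidden
    state is read out for classification."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Linear(input_dim, hidden_dim)
        self.bottom = CDEFunc(input_dim, hidden_dim)
        self.top = CDEFunc(input_dim, hidden_dim)
        self.to_attn = nn.Linear(hidden_dim, input_dim)  # hidden -> attention
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):  # x: (batch, seq_len, input_dim)
        dX = x[:, 1:] - x[:, :-1]          # Euler increments of the path
        h = a_h = self.embed(x[:, 0])      # initial hidden states
        for t in range(dX.size(1)):
            # Bottom NCDE: evolve along the raw path, emit attention in (0, 1).
            a_h = a_h + torch.bmm(self.bottom(a_h), dX[:, t].unsqueeze(-1)).squeeze(-1)
            attn = torch.sigmoid(self.to_attn(a_h))
            # Top NCDE: evolve along the attention-weighted path.
            h = h + torch.bmm(self.top(h), (attn * dX[:, t]).unsqueeze(-1)).squeeze(-1)
        return self.readout(h)


model = AttentiveNCDE(input_dim=3, hidden_dim=16, num_classes=2)
logits = model(torch.randn(8, 50, 3))  # toy batch: 8 series, 50 steps, 3 channels
print(logits.shape)                    # torch.Size([8, 2])
```

In practice one would replace the Euler loop with an adaptive ODE solver over an interpolated control path (e.g., cubic-spline interpolation, as is standard for NCDEs); the loop above is kept only to make the controlled dynamics explicit.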


Notes

  1. \(\mathbf{z}(t)\) can be either a raw multivariate time-series vector or a hidden vector created from the raw input.

  2. In NODEs, the time variable t corresponds to the layer of a neural network; in other words, NODEs have continuous depth (see the formulations sketched after these notes).

  3. A well-posed problem means that (i) its solution exists uniquely, and (ii) its solution changes continuously as the input data changes (a standard sufficient condition is sketched below).
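For concreteness, the standard formulations behind notes 2 and 3 can be written as follows. This is a sketch of well-known definitions from the NODE/NCDE literature, not material specific to this article; the Lipschitz condition is the usual sufficient condition, via the Picard–Lindelöf theorem, for well-posedness.

```latex
% NODE (note 2): the time variable t acts as continuous depth.
\[
  \mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f(\mathbf{z}(t);\theta)\,dt
\]
% NCDE: the same evolution, but driven (controlled) by a path X(t)
% interpolated from the discrete observations (a Riemann--Stieltjes integral).
\[
  \mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f(\mathbf{z}(t);\theta)\,dX(t)
\]
% Well-posedness (note 3): a unique solution that depends continuously on the
% input is guaranteed when f is Lipschitz continuous, i.e., for some L >= 0,
\[
  \lVert f(\mathbf{z}_1) - f(\mathbf{z}_2) \rVert \le L\,\lVert \mathbf{z}_1 - \mathbf{z}_2 \rVert .
\]
```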


Acknowledgements

Noseong Park is the corresponding author. This work was supported by IITP grants funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University); No. 2022-0-00113, Developing a Sustainable Collaborative Multi-modal Lifelong Learning Framework).

Author information


Corresponding author

Correspondence to Sheo Yon Jhin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jhin, S.Y., Shin, H., Kim, S. et al. Attentive neural controlled differential equations for time-series classification and forecasting. Knowl Inf Syst 66, 1885–1915 (2024). https://doi.org/10.1007/s10115-023-01977-5

