Abstract
Neural networks inspired by differential equations have proliferated over the past several years, of which neural ordinary differential equations (NODEs) and neural controlled differential equations (NCDEs) are two representative examples. In theory, NCDEs exhibit better representation learning capability for time-series data than NODEs; in particular, NCDEs are known to be well suited to processing irregular time-series data. Whereas NODEs have been successfully extended to adopt attention, methods to integrate attention into NCDEs have not yet been studied. To this end, we present attentive neural controlled differential equations (ANCDEs) for time-series classification and forecasting, in which dual NCDEs are used: one for generating attention values and the other for evolving hidden vectors for a downstream machine learning task. We conduct experiments on 5 real-world time-series datasets against 11 baselines, and additionally on irregular time series created by dropping some observed values. Our method consistently shows the best accuracy in all cases by non-trivial margins. Our visualizations also show that the presented attention mechanism works as intended by focusing on crucial information.
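To make the dual-NCDE design concrete, below is a minimal sketch in PyTorch. It assumes a simple Euler discretization of the CDE integrals over regularly sampled observations and a sigmoid gate on the path increments as the attention; the names (AttentiveNCDE, CDEFunc, attn_func, hidden_func) are illustrative only. The authors' actual model, like standard NCDEs, integrates over an interpolated control path with an ODE solver.

```python
# Minimal sketch of the dual-NCDE idea: a bottom NCDE produces attention values
# that gate the control path of a top NCDE. Euler discretization is assumed;
# this is an illustration, not the authors' implementation.
import torch
import torch.nn as nn


class CDEFunc(nn.Module):
    """Maps a hidden state z to a (hidden_dim x input_dim) matrix so that
    dz = f(z) dX, as in a neural controlled differential equation."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.Tanh(),
            nn.Linear(64, hidden_dim * input_dim), nn.Tanh(),
        )
        self.input_dim, self.hidden_dim = input_dim, hidden_dim

    def forward(self, z):
        return self.net(z).view(-1, self.hidden_dim, self.input_dim)


class AttentiveNCDE(nn.Module):
    """Dual NCDEs: one evolves an attention state, the other evolves the
    hidden state driven by the attended control path."""

    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.attn_func = CDEFunc(input_dim, hidden_dim)    # bottom NCDE
        self.hidden_func = CDEFunc(input_dim, hidden_dim)  # top NCDE
        self.embed = nn.Linear(input_dim, hidden_dim)
        self.attn_out = nn.Linear(hidden_dim, input_dim)
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, time, input_dim)
        z_a = self.embed(x[:, 0])               # initial states from first observation
        z_h = self.embed(x[:, 0])
        for k in range(x.size(1) - 1):
            dX = x[:, k + 1] - x[:, k]          # Euler increment of the control path
            # Bottom NCDE: evolve the attention state and compute attention values.
            z_a = z_a + torch.bmm(self.attn_func(z_a), dX.unsqueeze(-1)).squeeze(-1)
            a = torch.sigmoid(self.attn_out(z_a))            # attention in (0, 1)
            # Top NCDE: evolve the hidden state with the attended control.
            z_h = z_h + torch.bmm(self.hidden_func(z_h), (a * dX).unsqueeze(-1)).squeeze(-1)
        return self.readout(z_h)                # logits for a downstream task


# Example usage on a toy batch of regularly sampled series.
model = AttentiveNCDE(input_dim=3, hidden_dim=16, num_classes=2)
logits = model(torch.randn(8, 50, 3))
```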
Notes
\(\boldsymbol{z}(t)\) can be either a raw multi-variate time-series vector or a hidden vector created from the raw input.
In NODEs, the time variable t corresponds to the layer of a neural network. In other words, NODEs have continuous depth.
A well-posed problem means (i) its solution uniquely exists, and (ii) its solution continuously changes as input data changes.
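For reference, these notes rely on the standard formulations of NODEs and NCDEs from the literature, sketched here in the notation \(\boldsymbol{z}(t)\) above:

\[
\text{NODE:}\quad \boldsymbol{z}(T) = \boldsymbol{z}(0) + \int_0^T f(\boldsymbol{z}(t), t; \theta)\, dt,
\qquad
\text{NCDE:}\quad \boldsymbol{z}(T) = \boldsymbol{z}(0) + \int_0^T f(\boldsymbol{z}(t); \theta)\, dX(t),
\]

where \(X(t)\) is a continuous control path (e.g., a cubic-spline interpolation of the observations). The NCDE is driven by the data through \(dX(t)\) over the whole interval, which is what makes it suitable for irregular time series; the NODE depends on the data only through the initial value \(\boldsymbol{z}(0)\).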
Acknowledgements
Noseong Park is the corresponding author. This work was supported by an IITP grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) and (No. 2022-0-00113, Developing a Sustainable Collaborative Multi-modal Lifelong Learning Framework).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jhin, S.Y., Shin, H., Kim, S. et al. Attentive neural controlled differential equations for time-series classification and forecasting. Knowl Inf Syst 66, 1885–1915 (2024). https://doi.org/10.1007/s10115-023-01977-5