Abstract
Neural networks inspired by differential equations have proliferated over the past several years, of which neural ordinary differential equations (NODEs) and neural controlled differential equations (NCDEs) are two representative examples. In theory, NCDEs exhibit better representation learning capability for time-series data than NODEs; in particular, NCDEs are known to be well suited to processing irregular time-series data. Whereas NODEs have been successfully extended to adopt attention, methods to integrate attention into NCDEs have not yet been studied. To this end, we present attentive neural controlled differential equations (ANCDEs) for time-series classification and forecasting, in which dual NCDEs are used: one for generating attention values and the other for evolving hidden vectors for a downstream machine learning task. We conduct experiments on 5 real-world time-series datasets against 11 baselines, and additionally on irregular time series created by dropping some observed values. Our method consistently shows the best accuracy in all cases by non-trivial margins. Our visualizations also show that the presented attention mechanism works as intended by focusing on crucial information.
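To make the dual-NCDE design concrete, below is a minimal sketch in PyTorch. It assumes a simple Euler discretization of the CDE integrals over regularly sampled observations and a sigmoid gate on the path increments as the attention; the names (AttentiveNCDE, CDEFunc, attn_func, hidden_func) are illustrative only. The authors' actual model, like standard NCDEs, integrates over an interpolated control path with an ODE solver.

```python
# Minimal sketch of the dual-NCDE idea: a bottom NCDE produces attention values
# that gate the control path of a top NCDE. Euler discretization is assumed;
# this is an illustration, not the authors' implementation.
import torch
import torch.nn as nn


class CDEFunc(nn.Module):
    """Maps a hidden state z to a (hidden_dim x input_dim) matrix so that
    dz = f(z) dX, as in a neural controlled differential equation."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.Tanh(),
            nn.Linear(64, hidden_dim * input_dim), nn.Tanh(),
        )
        self.input_dim, self.hidden_dim = input_dim, hidden_dim

    def forward(self, z):
        return self.net(z).view(-1, self.hidden_dim, self.input_dim)


class AttentiveNCDE(nn.Module):
    """Dual NCDEs: one evolves an attention state, the other evolves the
    hidden state driven by the attended control path."""

    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.attn_func = CDEFunc(input_dim, hidden_dim)    # bottom NCDE
        self.hidden_func = CDEFunc(input_dim, hidden_dim)  # top NCDE
        self.embed = nn.Linear(input_dim, hidden_dim)
        self.attn_out = nn.Linear(hidden_dim, input_dim)
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, time, input_dim)
        z_a = self.embed(x[:, 0])               # initial states from first observation
        z_h = self.embed(x[:, 0])
        for k in range(x.size(1) - 1):
            dX = x[:, k + 1] - x[:, k]          # Euler increment of the control path
            # Bottom NCDE: evolve the attention state and compute attention values.
            z_a = z_a + torch.bmm(self.attn_func(z_a), dX.unsqueeze(-1)).squeeze(-1)
            a = torch.sigmoid(self.attn_out(z_a))            # attention in (0, 1)
            # Top NCDE: evolve the hidden state with the attended control.
            z_h = z_h + torch.bmm(self.hidden_func(z_h), (a * dX).unsqueeze(-1)).squeeze(-1)
        return self.readout(z_h)                # logits for a downstream task


# Example usage on a toy batch of regularly sampled series.
model = AttentiveNCDE(input_dim=3, hidden_dim=16, num_classes=2)
logits = model(torch.randn(8, 50, 3))
```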
Notes
\(\boldsymbol{z}(t)\) can be either a raw multi-variate time-series vector or a hidden vector created from the raw input.
In NODEs, the time variable t corresponds to the layer of a neural network. In other words, NODEs have continuous depth.
A well-posed problem means (i) its solution uniquely exists, and (ii) its solution continuously changes as input data changes.
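For reference, these notes rely on the standard formulations of NODEs and NCDEs from the literature, sketched here in the notation \(\boldsymbol{z}(t)\) above:

\[
\text{NODE:}\quad \boldsymbol{z}(T) = \boldsymbol{z}(0) + \int_0^T f(\boldsymbol{z}(t), t; \theta)\, dt,
\qquad
\text{NCDE:}\quad \boldsymbol{z}(T) = \boldsymbol{z}(0) + \int_0^T f(\boldsymbol{z}(t); \theta)\, dX(t),
\]

where \(X(t)\) is a continuous control path (e.g., a cubic-spline interpolation of the observations). The NCDE is driven by the data through \(dX(t)\) over the whole interval, which is what makes it suitable for irregular time series; the NODE depends on the data only through the initial value \(\boldsymbol{z}(0)\).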
Acknowledgements
Noseong Park is the corresponding author. This work was supported by an IITP grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) and (No. 2022-0-00113, Developing a Sustainable Collaborative Multi-modal Lifelong Learning Framework).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jhin, S.Y., Shin, H., Kim, S. et al. Attentive neural controlled differential equations for time-series classification and forecasting. Knowl Inf Syst 66, 1885–1915 (2024). https://doi.org/10.1007/s10115-023-01977-5