Multivariate count time series segmentation with “sums and shares” and Poisson lognormal mixture models: a comparative study using pedestrian flows within a multimodal transport hub

  • Regular Article
  • Published:
Advances in Data Analysis and Classification

Abstract

This paper deals with a clustering approach based on mixture models to analyze multidimensional mobility count time series within a multimodal transport hub. These time series are very likely to evolve depending on various periods characterized by strikes, maintenance works, or health measures against the Covid-19 pandemic. In addition, exogenous one-off factors, such as concerts and transport disruptions, can also impact mobility. Our approach flexibly detects time segments within which the very noisy count data are synthesized into regular spatio-temporal mobility profiles. At the upper level of the modeling, evolving mixing weights are designed to detect segments properly. At the lower level, segment-specific count regression models take into account correlations between series and overdispersion, as well as the impact of exogenous factors. For this purpose, we set up and compare two promising strategies that can address this issue, namely the “sums and shares” and “Poisson log-normal” models. The proposed methodologies are applied to actual data collected within a multimodal transport hub in the Paris region, namely ticketing logs and pedestrian counts provided by stereo cameras. Experiments are carried out to show the ability of the statistical models to highlight mobility patterns within the transport hub. One model is chosen based on its ability to detect the most continuous segments possible while fitting the count time series well. An in-depth analysis of the time segmentation, mobility patterns, and impact of exogenous factors obtained with the chosen model is finally performed.


Code availability

The R code and simulated data used are available in the GitHub repository https://github.com/pdenailly/segmentation_models

Notes

  1. For these sections, only \(\textbf{x}_{j,t}^{1,...,8}\) (see Table 3) are used to compute the results, in order to exclude non-calendar effects. The total profiles and spatial distributions presented in these sections are therefore identical across days.

References

  • Agard B, Morency C, Trépanier M (2006) Mining public transport user behaviour from smart card data. IFAC Proc Vol 39(3):399–404

  • Aitchison J, Ho C (1989) The multivariate Poisson-log normal distribution. Biometrika 76(4):643–653

  • Bai J (2010) Common breaks in means and variances for panel data. J Econom 157(1):78–92

  • Baid U, Talbar S (2016) Comparative study of k-means, Gaussian mixture model, fuzzy c-means algorithms for brain tumor segmentation. In: International conference on communication and signal processing 2016 (ICCASP 2016), Atlantis Press, pp 583–588

  • Balzotti C, Bragagnini A, Briani M et al (2018) Understanding human mobility flows from aggregated mobile phone data. IFAC-PapersOnLine 51(9):25–30

  • Bouveyron C, Celeux G, Murphy TB et al (2019) Model-based clustering and classification for data science: with applications in R, vol 50. Cambridge University Press, Cambridge

  • Briand AS, Côme E, Trépanier M et al (2017) Analyzing year-to-year changes in public transport passenger behaviour using smart card data. Transp Res Part C Emerg Technol 79:274–289

  • Briand AS, Côme E, Khouadjia M et al (2019) Detection of atypical events on a public transport network using smart card data. In: European transport conference 2019, Association for European Transport (AET)

  • Cecaj A, Lippi M, Mamei M et al (2021) Sensing and forecasting crowd distribution in smart cities: potentials and approaches. IoT 2(1):33–49

  • Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13(2):195–212

  • Chiquet J, Robin S, Mariadassou M (2019) Variational inference for sparse network reconstruction from count data. In: International conference on machine learning, PMLR, pp 1162–1171

  • Chiquet J, Mariadassou M, Robin S (2021) The Poisson-lognormal model as a versatile framework for the joint analysis of species abundances. Front Ecol Evol 9:188

  • Côme E, Oukhellou L (2014) Model-based count series clustering for bike sharing system usage mining: a case study with the Vélib’ system of Paris. ACM Trans Intell Syst Technol (TIST) 5(3):1–21

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–22

  • Fernández-Ares A, Mora A, Arenas MG et al (2017) Studying real traffic and mobility scenarios for a smart city using a new monitoring and tracking system. Futur Gener Comput Syst 76:163–179

  • Ghaemi MS, Agard B, Trépanier M et al (2017) A visual segmentation method for temporal smart card data. Transp A Transp Sci 13(5):381–404

  • Hilbe JM (2011) Negative binomial regression. Cambridge University Press, Cambridge

  • Holland PW, Welsch RE (1977) Robust regression using iteratively reweighted least-squares. Commun Stat Theory Methods 6(9):813–827

  • Jones M, Marchand É (2019) Multivariate discrete distributions via sums and shares. J Multivar Anal 171:83–93

  • Kim J, Zhang Y, Day J et al (2018) MGLM: an R package for multivariate categorical data analysis. R J 10(1):73

  • Kristoffersen MS, Dueholm JV, Gade R et al (2016) Pedestrian counting with occlusion handling using stereo thermal cameras. Sensors 16(1):62

  • Lange K, Hunter DR, Yang I (2000) Optimization transfer using surrogate objective functions. J Comput Graph Stat 9(1):1–20

  • Lashkari D, Golland P (2007) Convex clustering with exemplar-based models. Adv Neural Inf Process Syst 20

  • Li J, Zheng P, Zhang W (2020) Identifying the spatial distribution of public transportation trips by node and community characteristics. Transp Plan Technol 43(3):325–340

  • Li Y, Rahman T, Ma T et al (2021) A sparse negative binomial mixture model for clustering RNA-seq count data. Biostatistics 24(1):68–84

  • Magidson J, Vermunt J (2002) Latent class models for clustering: a comparison with k-means. Can J Marketing Res 20(1):36–43

  • Manley E, Zhong C, Batty M (2018) Spatiotemporal variation in travel regularity through transit user profiling. Transportation 45(3):703–732

  • McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annu Rev Stat Appl 6:355–378

  • Mohamed K, Côme E, Oukhellou L et al (2016) Clustering smart card data for urban mobility analysis. IEEE Trans Intell Transp Syst 18(3):712–728

  • Mützel CM, Scheiner J (2021) Investigating spatio-temporal mobility patterns and changes in metro usage under the impact of COVID-19 using Taipei Metro smart card data. Public Transp 1–24

  • de Nailly P, Côme E, Samé A et al (2021) What can we learn from 9 years of ticketing data at a major transport hub? A structural time series decomposition. Transp A Transp Sci 18(3):1445–1469

  • Pavlyuk D, Spiridovska N, Yatskiv I (2020) Spatiotemporal dynamics of public transport demand: a case study of Riga. Transport 35(6):576–587

  • Peláez G, Bacara D, de la Escalera A et al (2015) Road detection with thermal cameras through 3D information. In: 2015 IEEE intelligent vehicles symposium (IV), IEEE, pp 255–260

  • Peyhardi J, Fernique P, Durand JB (2021) Splitting models for multivariate count data. J Multivar Anal 181:104677

  • Ren B, Barnett I (2020) Autoregressive mixture models for serial correlation clustering of time series data. arXiv preprint arXiv:2006.16539

  • Ripley B, Venables B, Bates DM et al (2013) Package ‘MASS’. CRAN R 538:113–120

  • Ripley B, Venables W, Ripley MB (2016) Package ‘nnet’. R package version 7(3–12):700

  • Ronchi E, Scozzari R, Fronterrè M (2020) A risk analysis methodology for the use of crowd models during the COVID-19 pandemic. LUTVDG/TVBB (3235)

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Sibuya M, Yoshimura I, Shimizu R (1964) Negative multinomial distribution. Ann Inst Stat Math 16(1):409–426

  • Silva A, Rothstein SJ, McNicholas PD et al (2019) A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinf 20(1):1–11

  • Singh U, Determe JF, Horlin F et al (2020) Crowd forecasting based on WiFi sensors and LSTM neural networks. IEEE Trans Instrum Meas 69(9):6121–6131

  • Toqué F, Côme E, Oukhellou L et al (2018) Short-term multi-step ahead forecasting of railway passenger flows during special events with machine learning methods. In: CASPT 2018, conference on advanced systems in public transport and transitdata 2018, p 15

  • Truong C, Oudre L, Vayatis N (2020) Selective review of offline change point detection methods. Signal Process 167:107299

  • Wang Z, Liu H, Zhu Y et al (2021) Identifying urban functional areas and their dynamic changes in Beijing: using multiyear transit smart card data. J Urban Plan Dev 147(2):04021002

  • Winkelmann R (2008) Econometric analysis of count data. Springer Science and Business Media, Berlin

  • Zhang Y, Zhou H, Zhou J et al (2017) Regression models for multivariate count data. J Comput Graph Stat 26(1):1–13

  • Zhong C, Manley E, Arisona SM et al (2015) Measuring variability of mobility patterns from multiday smart-card data. J Comput Sci 9:125–130

  • Zhou M, Hannah L, Dunson D et al (2012) Beta-negative binomial process and Poisson factor analysis. In: Artificial intelligence and statistics, PMLR, pp 1462–1471


Author information

Corresponding author

Correspondence to Paul de Nailly.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Mixture of sums and shares model estimation

Given the covariates \(\textbf{x}_{j,t}\), the series \(\textbf{y}_{j,t}\) are distributed according to the following mixture model:

$$\begin{aligned} p(\textbf{y}_{j,t};{\varvec{\alpha }}, {\varvec{\gamma }}, r, {\varvec{\xi }}) = \sum _{s=1}^S \pi _s(j;{\varvec{\alpha }}) g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}_s, r_s) h(\textbf{y}_{j,t}|v_{j,t},\textbf{x}_{j,t}, {\varvec{\xi }}_s), \end{aligned}$$
(A1)

with \({\varvec{\gamma }} = ({\varvec{\gamma }}_s)_{s=1,...,S}\), \(r = (r_s)_{s=1,...,S}\) and \({\varvec{\xi }} = ({\varvec{\xi }}_s)_{s=1,...,S}\). The parameters of the model are estimated with the Expectation-Maximization (EM) algorithm (Dempster et al. 1977), which iteratively maximizes the expected complete data log-likelihood. The complete data log-likelihood can be written:

$$\begin{aligned} {\mathcal {L}}_c({\varvec{\alpha }}, {\varvec{\gamma }}, r, {\varvec{\xi }} ) = \sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T z_{j,s} \log \left( \pi _s(j;{\varvec{\alpha }}) g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}_s, r_s) h(\textbf{y}_{j,t}|v_{j,t}, \textbf{x}_{j,t}, {\varvec{\xi }}_s) \right) . \end{aligned}$$
(A2)

Given initial values of the parameters \({\varvec{\xi }}^{(0)}\), \({\varvec{\gamma }}^{(0)}\), \(r^{(0)}\) and \({\varvec{\alpha }}^{(0)}\), the following two steps are repeated until convergence.

  • Expectation step (E) The expectation of the complete data log-likelihood is evaluated given the observed data Y and the current parameters \({\varvec{\xi }}^{(c)}\), \({\varvec{\gamma }}^{(c)}\), \(r^{(c)}\) and \({\varvec{\alpha }}^{(c)}\):

    $$\begin{aligned} Q({\varvec{\alpha }}^{(c)}, {\varvec{\gamma }}^{(c)}, r^{(c)}, {\varvec{\xi }}^{(c)}) =&\sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T E_{{\varvec{\xi }}^{(c)}, {\varvec{\gamma }}^{(c)}, r^{(c)}, {\varvec{\alpha }}^{(c)}}[z_{j,s}|Y] \end{aligned}$$
    (A3)
    $$\begin{aligned}&\quad \times \log \left( \pi _s(j;{\varvec{\alpha }}^{(c)}) g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}^{(c)}_s, r^{(c)}_s) h(\textbf{y}_{j,t}|v_{j,t},\textbf{x}_{j,t}, {\varvec{\xi }}^{(c)}_s) \right) , \end{aligned}$$
    (A4)

    where

    $$\begin{aligned} E_{{\varvec{\xi }}^{(c)}, {\varvec{\gamma }}^{(c)}, r^{(c)}, {\varvec{\alpha }}^{(c)}}[z_{j,s}|Y]&= \tau ^{(c)}_{j,s} \end{aligned}$$
    (A5)
    $$\begin{aligned}&= \frac{\pi _s(j;{\varvec{\alpha }}^{(c)}) \prod _{t=1}^T g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}^{(c)}_s, r^{(c)}_s) h(\textbf{y}_{j,t}|\textbf{x}_{j,t}, v_{j,t}, {\varvec{\xi }}^{(c)}_s)}{ \sum _{s'=1}^S\pi _{s'}(j;{\varvec{\alpha }}^{(c)}) \prod _{t=1}^T g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}^{(c)}_{s'}, r^{(c)}_{s'}) h(\textbf{y}_{j,t}|\textbf{x}_{j,t}, v_{j,t}, {\varvec{\xi }}^{(c)}_{s'})}. \end{aligned}$$
    (A6)

    The a posteriori probabilities that each day j belongs to segment s, \(\tau ^{(c)}_{j,s}\), are updated at each iteration of step E.

  • Maximization step (M) The parameters \({\varvec{\xi }}^{(c+1)}\), \({\varvec{\gamma }}^{(c+1)}\), \(r^{(c+1)}\) and \({\varvec{\alpha }}^{(c+1)}\) that maximize \(Q({\varvec{\alpha }}^{(c)}, {\varvec{\gamma }}^{(c)}, r^{(c)}, {\varvec{\xi }}^{(c)})\) are computed. This quantity can be decomposed as:

    $$\begin{aligned} Q({\varvec{\alpha }}^{(c)}, {\varvec{\gamma }}^{(c)}, r^{(c)}, {\varvec{\xi }}^{(c)}) = Q_1({\varvec{\alpha }}^{(c)}) + Q_2({\varvec{\gamma }}^{(c)},r^{(c)}) + Q_3({\varvec{\xi }}^{(c)}) \end{aligned}$$
    (A7)

    where

    $$\begin{aligned} Q_1({\varvec{\alpha }}^{(c)}) = \sum _{s=1}^S\sum _{j=1}^J \tau ^{(c)}_{j,s} \log (\pi _s(j;{\varvec{\alpha }}^{(c)})) \end{aligned}$$
    (A8)
    $$\begin{aligned} Q_2({\varvec{\gamma }}^{(c)},r^{(c)}) = \sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T \tau ^{(c)}_{j,s} \log (g(v_{j,t}|\textbf{x}_{j,t}, {\varvec{\gamma }}^{(c)}_s, r^{(c)}_s)) \end{aligned}$$
    (A9)
    $$\begin{aligned} Q_3({\varvec{\xi }}^{(c)}) = \sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T\tau ^{(c)}_{j,s} \log (h(\textbf{y}_{j,t}|\textbf{x}_{j,t}, v_{j,t}, {\varvec{\xi }}^{(c)}_s)). \end{aligned}$$
    (A10)

    The maximization of \(Q_1\) amounts to solving a weighted multinomial logistic regression. New values of \({\varvec{\alpha }}\) can be found using iterative procedures such as iteratively reweighted least squares (IRLS) (Holland and Welsch 1977). This problem is solved with the multinom function of the nnet package (Ripley et al. 2016). \(Q_2\) is the log-likelihood of a negative binomial generalized linear model. Its maximization is carried out through the alternating iteration process provided by the glm.nb function of the MASS package (Ripley et al. 2013). Within each segment s, for a given value of \(r^{(c)}_s\), the linear model is fitted using an IRLS method. Then, for the fitted \({\varvec{\gamma }}^{(c)}_s\) parameters, the \(r^{(c)}_s\) parameter is estimated with score and information iterations. The two steps are alternated until convergence, yielding \({\varvec{\gamma }}^{(c+1)}_s\) and \(r^{(c+1)}_s\). Note that the \(\tau ^{(c)}_{j,s}\) are used as prior weights in the fitting process. The criterion \(Q_3\), which corresponds to a weighted Dirichlet multinomial regression model, is maximized with the MGLM package (Kim et al. 2018). Because the Dirichlet multinomial distribution does not belong to the exponential family, the IRLS method is not used, as the expected information matrix is difficult to calculate. The method used here combines the minorization-maximization (MM) algorithm (Lange et al. 2000) and Newton's method: both updates are computed at each iteration and the one yielding the higher log-likelihood is retained. A minimal R sketch of these weighted fits is given below.
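
    This sketch is illustrative only: the data layout (objects dat, Y and tau), the covariate names x1 and x2, and the weight argument passed to MGLMreg are assumptions made for the example, not the code from the authors' repository.

    library(nnet)   # multinom: weighted multinomial logistic regression (Q_1)
    library(MASS)   # glm.nb: weighted negative binomial regression (Q_2)
    library(MGLM)   # MGLMreg: Dirichlet multinomial regression (Q_3)

    set.seed(1)
    J <- 40; Tt <- 24; S <- 2                           # days, time slots, segments
    dat <- data.frame(j  = rep(seq_len(J), each = Tt),  # one row per (day, time slot)
                      x1 = rnorm(J * Tt),
                      x2 = rbinom(J * Tt, 1, 0.3))
    dat$v <- rnbinom(nrow(dat), size = 5, mu = 50)      # total counts v_{j,t} ("sums")
    Y <- t(sapply(dat$v, function(v) rmultinom(1, v, prob = c(0.5, 0.3, 0.2))))  # "shares"
    tau <- matrix(runif(J * S), J, S)                   # E-step responsibilities tau_{j,s}
    tau <- tau / rowSums(tau)

    ## Q_1: mixing weights pi_s(j; alpha), fitted as a multinomial logistic regression
    ## with one row per (day, segment) and the responsibilities used as case weights.
    mix_df <- data.frame(seg = factor(rep(seq_len(S), each = J)),
                         day = rep(seq_len(J), times = S),
                         w   = as.vector(tau))
    fit_alpha <- multinom(seg ~ day, data = mix_df, weights = w, trace = FALSE)

    ## Q_2 and Q_3: one weighted regression per segment, with tau[j, s] attached to
    ## every time slot of day j as a prior weight.
    for (s in seq_len(S)) {
      dat$w_s <- tau[dat$j, s]
      fit_nb <- glm.nb(v ~ x1 + x2, data = dat, weights = w_s)      # sums part
      fit_dm <- MGLMreg(Y ~ x1 + x2, data = dat, dist = "DM",       # shares part
                        weight = dat$w_s)
    }

    In a complete implementation these fits would be wrapped in the EM loop together with the update of \(\tau ^{(c)}_{j,s}\) given in (A6).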

Appendix B: Poisson log-normal mixture model estimation

The series \(\textbf{y}_{j,t}\) are distributed according to the following mixture model:

$$\begin{aligned} p(\textbf{y}_{j,t};{\varvec{\alpha }}, {\varvec{\rho }}, {\varvec{\Sigma }}) = \sum _{s=1}^S \pi _s(j;{\varvec{\alpha }}) \int _{R^P}\left[ \prod _{p=1}^P g(\textbf{y}_{j,t,p}|\theta _{j,t,p})\right] m(\theta _{j,t}|{\varvec{\rho }}_s, {\varvec{\Sigma }}_s)d\theta _{j,t}, \end{aligned}$$
(B11)

with \({\varvec{\rho }} = ({\varvec{\rho }}_s)_{s=1,...,S}\) and \({\varvec{\Sigma }} = ({\varvec{\Sigma }}_s)_{s=1,...,S}\), where g is a Poisson distribution and m a Gaussian density. The EM algorithm can be used for parameter estimation, but computing the expected complete data log-likelihood requires the conditional expectations \(\mathop {\mathbb {E}}(Z_{js}\theta _{j,t}|\textbf{y}_{j,t},{\varvec{\rho }}_s,{\varvec{\Sigma }}_s)\) and \(\mathop {\mathbb {E}}(Z_{js}\theta _{j,t}\theta '_{j,t}|\textbf{y}_{j,t},{\varvec{\rho }}_s,{\varvec{\Sigma }}_s)\), which are intractable. These conditional expectations can be approximated with an EM algorithm coupled with Markov chain Monte Carlo (MCMC-EM), as in Silva et al. (2019), but this comes with a heavy computational load. We rely instead on the approach of Chiquet et al. (2019), which uses a variational approximation. The idea behind variational inference is to approximate the intractable posterior distribution \(p(\theta )\) by a tractable density \(q(\theta )\), here Gaussian, chosen to minimize the Kullback–Leibler divergence between the two. The marginal log-likelihood for \(\textbf{y}_{j,t}\) can be written as

$$\begin{aligned} \log p(\textbf{y}_{j,t}) = F(q(\theta _{j,t}), \textbf{y}_{j,t}) + D_{KL}(q(\theta _{j,t})|p(\theta _{j,t})), \end{aligned}$$
(B12)

with \(D_{KL}(q(\theta _{j,t})|p(\theta _{j,t}))\) the Kullback–Leibler divergence between \(q(\theta _{j,t})\) and the posterior distribution of \(\theta _{j,t}\) given \(\textbf{y}_{j,t}\). \(F(q(\theta _{j,t}), \textbf{y}_{j,t})\) is the variational lower bound of the log-likelihood; it is the criterion maximized during parameter estimation. In the case of the Poisson log-normal model, q is assumed to be a Gaussian distribution:

$$\begin{aligned} q(\theta _{j,t}; \textbf{m}_{j,t}, \textbf{S}_{j,t}) = {\mathcal {N}}(\theta _{j,t}; \textbf{m}_{j,t}, \textbf{S}_{j,t}), \end{aligned}$$
(B13)

with \(\textbf{m}_{j,t}\) and the diagonal matrix \(\textbf{S}_{j,t} = diag(\textbf{s}_{j,t})\) the variational parameters associated with the observation \(\textbf{y}_{j,t}\) of day j and time slot t. Minimizing the Kullback–Leibler divergence amounts to maximizing the variational lower bound. The complete data log-likelihood can be written as follows:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_c({\varvec{\alpha }}, {\varvec{\rho }}, {\varvec{\Sigma }}, {\varvec{m}}, {\varvec{S}})&= \sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T z_{j,s} \log (\pi _s(j;{\varvec{\alpha }}))+\\ {}&\sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T z_{j,s}[F(q^{(s)}(\theta _{j,t}), \textbf{y}_{j,t}) + D_{KL}(q^{(s)}(\theta _{j,t})|p^{(s)}(\theta _{j,t}))], \end{aligned} \end{aligned}$$
(B14)

where \(D_{KL}(q^{(s)}(\theta _{j,t})|p^{(s)}(\theta _{j,t}))\) is the Kullback–Leibler divergence between \(p(\theta _{j,t}|\textbf{y}_{j,t}, z_{j}=s)\) and \(q^{(s)}(\theta _{j,t})\), with \(q^{(s)}(\theta _{j,t}) = {\mathcal {N}}(\textbf{m}_{j,t}^{(s)}, \textbf{S}_{j,t}^{(s)})\). The variational lower bound of the log-likelihood for each observation \(\textbf{y}_{j,t}\) is

$$\begin{aligned} \begin{aligned} & F(q^{(s)}(\theta _{j,t}), \textbf{y}_{j,t})= \frac{1}{2}\log |\textbf{S}_{j,t}^{(s)}|- \frac{1}{2}(\textbf{m}_{j,t}^{(s)} - \textbf{x}_{j,t}^T{\varvec{\rho }}_{s})'{\varvec{\Sigma }}_s^{-1}(\textbf{m}_{j,t}^{(s)} - \textbf{x}_{j,t}^T{\varvec{\rho }}_{s}) - tr({\varvec{\Sigma }}_s^{-1}\textbf{S}_{j,t}^{(s)}) - \\&\quad \frac{1}{2}\log |{\varvec{\Sigma }}_s|- \frac{P}{2} + (\textbf{m}^{(s)})^{'}_{j,t}\textbf{y}_{j,t} - \sum _{p=1}^P(\exp (m_{j,t,p}^{(s)} + \frac{1}{2}s_{j,t,p}^{(s)}) + \log (y_{j,t,p}!)). \end{aligned} \end{aligned}$$
(B15)
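
As a purely illustrative transcription of (B15), the following R function evaluates this bound for a single observation. The argument names are hypothetical, and the PLNmodels package implements an optimized version of this computation.

    ## variational lower bound F(q^{(s)}(theta_{j,t}), y_{j,t}) of (B15) for one observation:
    ## y and m are vectors of length P, S_diag holds the diagonal of S_{j,t}^{(s)},
    ## rho is the D x P matrix of regression coefficients and Sigma the P x P covariance
    elbo_obs <- function(y, x, m, S_diag, rho, Sigma) {
      P <- length(y)
      mu <- as.vector(crossprod(rho, x))            # x_{j,t}^T rho_s
      Sigma_inv <- solve(Sigma)
      0.5 * sum(log(S_diag)) -
        0.5 * as.numeric(t(m - mu) %*% Sigma_inv %*% (m - mu)) -
        sum(diag(Sigma_inv %*% diag(S_diag, nrow = P))) -          # tr(Sigma^{-1} S)
        0.5 * as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) - P / 2 +
        sum(m * y) -
        sum(exp(m + 0.5 * S_diag) + lgamma(y + 1))                 # log(y!) = lgamma(y + 1)
    }

    ## arbitrary values, only to show the call
    elbo_obs(y = c(4, 0, 7), x = c(1, 0.5), m = log(c(4, 0, 7) + 1),
             S_diag = rep(0.1, 3), rho = matrix(0.1, 2, 3), Sigma = diag(3))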

The EM algorithm is used to estimate the parameters and the following two steps are repeated until convergence.

  • Expectation step (E) The expectation of the complete data log-likelihood is evaluated given the observed data Y, the current parameters \({\varvec{\rho }}^{(c)}\), \({\varvec{\Sigma }}^{(c)}\) and \({\varvec{\alpha }}^{(c)}\), and the variational parameters \(\textbf{m}^{(c)}_{j,t}\), \(\textbf{S}^{(c)}_{j,t}\):

    $$\begin{aligned} \begin{aligned} Q({\varvec{\rho }}^{(c)},{\varvec{\Sigma }}^{(c)},{\varvec{\alpha }}^{(c)},\textbf{m}^{(c)},\textbf{S}^{(c)})&= \sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T \tau _{j,s}^{(c)} \log (\pi _s(j;{\varvec{\alpha }}^{(c)})) +\\&\sum _{s=1}^S\sum _{j=1}^J\sum _{t=1}^T \tau _{j,s}^{(c)} E_{{\varvec{\rho }}^{(c)}, {\varvec{\Sigma }}^{(c)}, {\varvec{\alpha }}^{(c)}, m^{(c)}_{j,t}, S^{(c)}_{j,t}}[F(q^{(s)}(\theta _{j,t}), \textbf{y}_{j,t}) +\\&D_{KL}(q^{(s)}(\theta _{j,t})|p^{(s)}(\theta _{j,t}))], \end{aligned} \end{aligned}$$
    (B16)

    with \( \tau _{j,s}^{(c)} = E_{{\varvec{\rho }}^{(c)}, {\varvec{\Sigma }}^{(c)}, {\varvec{\alpha }}^{(c)}, m^{(c)}_{j,t}, S^{(c)}_{j,t}}[z_{j,s}|Y]\). The variational lower bound of the log-likelihood is used to approximate \( \tau _{j,s}^{(c)}\):

    $$\begin{aligned} \tau _{j,s}^{(c)} = \frac{\pi _s(j;{\varvec{\alpha }}^{(c)})\prod _{t=1}^T \exp (F(q^{(s)}(\theta _{j,t}), \textbf{y}_{j,t}))}{\sum _{h=1}^S\pi _h(j;{\varvec{\alpha }}^{(c)})\prod _{t=1}^T \exp (F(q^{(h)}(\theta _{j,t}), \textbf{y}_{j,t}))}. \end{aligned}$$
    (B17)

    Note that this approximation is used in the R package PLNmodels; a brief usage sketch of this package is given after this list.

  • Maximization step (M) The maximization step is divided into two parts:

    • Conditionally on \({\varvec{\rho }}_s\) and \({\varvec{\Sigma }}_s\) and given \(\tau _{j,s}\), variational parameters \(\textbf{m}^{(c)}_{j,t}\) and \(\textbf{S}^{(c)}_{j,t}\) are updated. Because \(F(q^{(s)}(\theta _{j,t}), \textbf{y}_{j,t})\) is strictly concave with respect to \(\textbf{m}^{(c)}_{j,t}\) and \(\textbf{S}^{(c)}_{j,t}\), it is possible to obtain \(\textbf{S}^{(c+1)}_{j,t}\) with the fixed-point method and \(\textbf{m}^{(c+1)}_{j,t}\) with Newton’s method.

    • Knowing \(\tau _{j,s}^{(c)}\), \(\textbf{m}^{(c+1)}_{j,t}\) and \(\textbf{S}^{(c+1)}_{j,t}\), the parameters \({\varvec{\rho }}^{(c+1)}\), \({\varvec{\Sigma }}^{(c+1)}\) and \({\varvec{\alpha }}^{(c+1)}\) are obtained.
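
For reference, the sketch below shows how a single Poisson log-normal regression, the building block that the mixture model described above wraps inside the EM loop, can be fitted with the PLNmodels package. The count matrix, the covariate names and the dimensions are invented for the example, and the standard prepare_data/PLN workflow of the package is assumed.

    library(PLNmodels)

    ## simulated stand-in for the (J*T) x P matrix of pedestrian counts and the
    ## accompanying calendar covariates (names and sizes are hypothetical)
    n <- 200
    counts <- matrix(rpois(n * 3, lambda = 20), ncol = 3,
                     dimnames = list(paste0("obs", 1:n), c("gateA", "gateB", "gateC")))
    covariates <- data.frame(day_type = factor(sample(c("working", "weekend"), n, replace = TRUE)),
                             row.names = paste0("obs", 1:n))

    pln_data <- prepare_data(counts, covariates)
    fit <- PLN(Abundance ~ day_type, data = pln_data)
    coef(fit)    # regression coefficients, playing the role of rho_s
    sigma(fit)   # estimated covariance matrix, playing the role of Sigma_s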

Appendix C: Description of the time segments

See Table 5.

Table 5 Time segmentation

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

de Nailly, P., Côme, E., Oukhellou, L. et al. Multivariate count time series segmentation with “sums and shares” and Poisson lognormal mixture models: a comparative study using pedestrian flows within a multimodal transport hub. Adv Data Anal Classif (2023). https://doi.org/10.1007/s11634-023-00543-9


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11634-023-00543-9

Keywords

Mathematical Subject Classification
