Abstract
Geospatial analytics increasingly rely on data fusion methods to extract patterns from data; however, robust results are difficult to achieve because of the need for spatial and temporal regularization and the presence of latent structures within data. Tensor decomposition is a promising approach because it can accommodate the multidimensional structure of data (e.g., trajectory information about users, locations, and time periods). To address these challenges, we introduce Multi-Type Clustering using Regularized tensor Decomposition (MCRD), a method for data analysis that provides insight not just about groupings within data types (e.g., clusters of users), but also about the interactions between data types (e.g., clusters of users and locations) in the latent features of complex multi-type datasets. This is done by combining two innovations. First, a tensor representing spatiotemporal data is decomposed using a novel regularization method to account for structure within the data. Next, within- and cross-type groups are found by applying novel hypergraph community detection methods to the decomposed results. Experiments on both synthetic and real trajectory data demonstrate MCRD’s capacity to reveal the within- and cross-type groupings in data, and MCRD outperforms related methods including tensor decomposition without regularization, unfolding of tensors, Laplacian regularization, and tensor block models. The robust and versatile analysis provided by combining the new regularization and clustering techniques outlined in this paper likely has utility in geospatial analytics beyond the movement applications explicitly studied.
Availability of data and material
While the data is not publicly available at this time, the method for creating the synthetic data used is described in Section 4.1. As for the real-world data, we used a portion of the Porto dataset available at http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html.
Code Availability
The code is not publicly available at this time.
Notes
Because it is an internal clustering index, the CH criterion is not as meaningful for comparing the results of clustering elements based on factor matrices with different number of factors.
The data can be found at: http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html.
The code used for this can be found at https://github.com/ike002jp/npartite.
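To make the first note above concrete, the following is a minimal sketch of the Caliński-Harabasz (CH) criterion, assuming Euclidean distances and hard cluster labels. The function name `calinski_harabasz` and the toy rows are illustrative assumptions and are not taken from the paper's code.

```python
def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better).

    `points` is a list of equal-length numeric rows (e.g., rows of a factor
    matrix); `labels` assigns one cluster label to each row.
    """
    n = len(points)
    dim = len(points[0])

    # Group the rows by their cluster label.
    groups = {}
    for row, label in zip(points, labels):
        groups.setdefault(label, []).append(row)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    # Between-cluster dispersion: size-weighted squared distance of each
    # cluster centroid from the overall centroid.
    between = sum(len(rows) * sq_dist(mean(rows), overall)
                  for rows in groups.values())
    # Within-cluster dispersion: squared distance of each row from its
    # own cluster centroid.
    within = sum(sq_dist(row, mean(rows))
                 for rows in groups.values() for row in rows)
    if within == 0:
        raise ValueError("within-cluster dispersion is zero; index undefined")

    return (between / (k - 1)) / (within / (n - k))


# Toy example (hypothetical data): two well-separated clusters in 2-D.
rows = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
score = calinski_harabasz(rows, labels)  # 50.0
```

Both dispersions are measured in the space spanned by the factor columns, so changing the number of factors changes the geometry in which the score is computed. This is one way to read the note's caution about comparing CH values across factor matrices with different numbers of factors.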
Acknowledgements
This work was supported by the US Army Engineer Research and Development Center, Geospatial Research Engineering basic research program. Any opinions expressed in this paper are those of the authors, and are not to be construed as official positions of the funding agency or the Department of the Army unless so designated by other authorized documents.
Funding
This work was supported by the U.S. Army Engineer Research and Development Center, Geospatial Research Engineering basic research program.
Ethics declarations
Competing interests
The authors have no conflicts of interest or competing interests.
About this article
Cite this article
Ellison, C.L., Fields, W.R. Multi-type clustering using regularized tensor decomposition. Geoinformatica 26, 707–743 (2022). https://doi.org/10.1007/s10707-021-00457-8
DOI: https://doi.org/10.1007/s10707-021-00457-8