Abstract
The massive and increasing availability of mobility data enables the study and prediction of human mobility behavior and activities at various levels. In this paper, we tackle the problem of predicting the long-term crash risk of a car driver. This is a very challenging task, requiring deep knowledge of both the driver and their surroundings, yet it has several useful applications to public safety (e.g. by coaching high-risk drivers) and the insurance market (e.g. by adapting pricing to risk). We model each user with a data-driven approach based on a network representation of the user’s mobility. In addition, we represent the areas in which users move through a wide set of city indicators that capture different aspects of the city. These indicators are based on human mobility and are automatically computed from a set of different data sources, including mobility traces and road networks. Through these city indicators we develop a geographical transfer learning approach for the crash risk task, so that we can build effective predictive models for another area where labeled data is not available. Empirical results over real datasets show the superiority of our solution.
Availability of Data and Material
The vehicle datasets adopted in this work are private and were provided within the scope of the Track & Know project (https://trackandknowproject.eu/), while the city indicators can be requested through the project website.
Code Availability
The code is open source, and can be downloaded at: https://github.com/riccotti/CrashPrediction
Notes
We refer the interested reader to: https://christophm.github.io/interpretable-ml-book/shapley.html
The source code is available at: https://github.com/riccotti/CrashPrediction. The city indicators used in this paper can be obtained from the Track & Know project website (see next footnote), while the mobility datasets are proprietary, and cannot be publicly shared.
The drivers were sampled among those that had consistent data throughout the 12 months, while ensuring that all drivers with at least one crash in the year were kept. This latter step was not possible on Dataset 2; as a side effect, Dataset 1 has a higher percentage of crash events.
Cross-validation was also tested, but the results did not change in any significant way.
In particular, we used RF with 100 estimators, allowing only leaves that contain at least 1% of the training data, and with a cost matrix weighting a crash 100 times more than a non-crash.
An ablation study (omitted due to space limits) showed that both IMN- and context-based features contributed significantly to this performance.
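The random forest settings given in the notes above (100 estimators, leaves holding at least 1% of the training data, a crash weighted 100 times more than a non-crash) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier`, approximates the cost matrix with the `class_weight` parameter, and uses a hypothetical helper name (`build_crash_risk_forest`) and purely synthetic data in place of the private driver features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def build_crash_risk_forest(random_state=0):
    """Hypothetical helper: a forest configured along the lines of the notes."""
    return RandomForestClassifier(
        n_estimators=100,             # 100 trees
        min_samples_leaf=0.01,        # float: each leaf holds >= 1% of the samples
        class_weight={0: 1, 1: 100},  # assumption: class weights stand in for the cost matrix
        random_state=random_state,
    )


# Synthetic stand-in data (assumption): 1000 drivers, 5 features, about 5% crashes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.05).astype(int)

model = build_crash_risk_forest().fit(X, y)

# Estimated probability of the positive class (crash) for each driver.
risk = model.predict_proba(X)[:, 1]
```

Passing a float to `min_samples_leaf` makes scikit-learn interpret it as a fraction of the training samples, which matches the "1% of the training data" wording. Whether class weights reproduce the cost matrix used in the paper exactly is an assumption of this sketch.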
Acknowledgements
This work is partially supported by the European Community H2020 programme under the funding scheme Track&Know (Big Data for Mobility Tracking Knowledge Extraction in Urban Areas), G.A. 780754, https://trackandknowproject.eu/; and SoBigData++, G.A. 871042, http://www.sobigdata.eu.
Contributions
Mirco Nanni: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Writing, Supervision, Project administration, Funding acquisition. Riccardo Guidotti: Conceptualization, Methodology, Formal analysis, Investigation, Software, Writing, Supervision. Agnese Bonavita: Methodology, Software, Validation, Investigation, Writing, Visualization. Omid Isfahani Alamdari: Methodology, Software, Validation, Investigation, Writing, Visualization.
Ethics declarations
Conflicts of interest
Not applicable
Ethics Approval
Not applicable
Consent to Participate
Not applicable
Consent for Publication
Not applicable
About this article
Cite this article
Nanni, M., Guidotti, R., Bonavita, A. et al. City indicators for geographical transfer learning: an application to crash prediction. Geoinformatica 26, 581–612 (2022). https://doi.org/10.1007/s10707-022-00464-3
DOI: https://doi.org/10.1007/s10707-022-00464-3