City indicators for geographical transfer learning: an application to crash prediction

GeoInformatica

Abstract

The massive and growing availability of mobility data enables the study and prediction of human mobility behavior and activities at various levels. In this paper, we tackle the problem of predicting the long-term crash risk of a car driver. This is a very challenging task, requiring deep knowledge of both the driver and their surroundings, yet it has several useful applications to public safety (e.g. coaching high-risk drivers) and the insurance market (e.g. adapting pricing to risk). We model each user with a data-driven approach based on a network representation of the user’s mobility. In addition, we represent the areas in which users move through a wide set of city indicators that capture different aspects of the city. These indicators are based on human mobility and are automatically computed from a set of different data sources, including mobility traces and road networks. Using these city indicators, we develop a geographical transfer learning approach for the crash risk task, so that effective predictive models can be built for another area where labeled data is not available. Empirical results on real datasets show the superiority of our solution.
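As a rough illustration of the transfer idea in the abstract, the sketch below picks, for a target area without crash labels, the labelled source area whose city-indicator vector is closest after z-score standardisation. The area names, indicator names, values and the nearest-area rule are all hypothetical assumptions made for illustration; the paper's actual indicators and transfer procedure are not described in this excerpt.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical indicator values per area (illustrative only, not from the paper).
INDICATORS = {
    "area_A": {"road_density": 12.0, "avg_trip_km": 8.5, "traffic_entropy": 0.62},
    "area_B": {"road_density": 4.0, "avg_trip_km": 15.0, "traffic_entropy": 0.40},
    "area_C": {"road_density": 11.0, "avg_trip_km": 9.0, "traffic_entropy": 0.58},
}


def standardise(table):
    """Z-score every indicator across areas so that no unit dominates the distance."""
    names = sorted(next(iter(table.values())))
    scaled = {area: [] for area in table}
    for name in names:
        column = [row[name] for row in table.values()]
        mu, sigma = mean(column), pstdev(column)
        sigma = sigma if sigma > 0 else 1.0  # constant indicator: avoid division by zero
        for area, row in table.items():
            scaled[area].append((row[name] - mu) / sigma)
    return scaled


def closest_source(table, target, labelled_sources):
    """Return the labelled area whose standardised indicators are nearest to the target's."""
    z = standardise(table)

    def distance(area):
        return sqrt(sum((a - b) ** 2 for a, b in zip(z[area], z[target])))

    return min(labelled_sources, key=distance)


# area_C has no crash labels; pick the labelled area to borrow a model from.
best_source = closest_source(INDICATORS, "area_C", ["area_A", "area_B"])
print(best_source)
```

A model trained on the selected source area would then be applied to drivers in the target area. Whether this simple nearest-area rule matches the paper's procedure cannot be told from the abstract alone.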

Availability of Data and Material

The vehicle datasets used in this work are private and were provided within the scope of the Track & Know project (https://trackandknowproject.eu/); the city indicators can be requested through the project website.

Code Availability

The code is open source, and can be downloaded at: https://github.com/riccotti/CrashPrediction

Notes

  1. https://tinyurl.com/32k589z2

  2. We refer the interested reader to: https://christophm.github.io/interpretable-ml-book/shapley.html

  3. The source code is available at: https://github.com/riccotti/CrashPrediction. The city indicators used in this paper can be obtained from the Track & Know project website (see next footnote), while the mobility datasets are proprietary, and cannot be publicly shared.

  4. https://trackandknowproject.eu/

  5. The drivers were sampled among those that had consistent data throughout the 12 months, while keeping all those that had at least one crash in the year. The latter step was not possible on Dataset 2; as a side effect, Dataset 1 has a higher percentage of crash events.

  6. Cross-validation was also tested, but the results did not change significantly.

  7. https://lightgbm.readthedocs.io/en/latest/index.html

  8. https://scikit-learn.org/stable/

  9. https://keras.io/

  10. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

  11. In particular, we used a random forest (RF) with 100 estimators, a minimum leaf size of 1% of the training data, and a cost matrix that weights a crash 100 times more than a no-crash.

  12. An ablation study (omitted due to space limits) showed that both IMN- and context-based features contributed significantly to this performance.
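Note 11 mentions a cost matrix in which a crash weighs 100 times more than a no-crash. One common way to read such a matrix is as a lowered decision threshold on the predicted crash probability: a driver is flagged when the expected cost of missing a crash exceeds the expected cost of a false alarm. This reading is an assumption, since the note does not say how the costs enter the random forest. The function names and the example probabilities below are hypothetical.

```python
COST_MISSED_CRASH = 100.0  # predicting "no crash" for a driver who crashes (ratio from note 11)
COST_FALSE_ALARM = 1.0     # predicting "crash" for a driver who does not crash


def cost_threshold(cost_fn=COST_MISSED_CRASH, cost_fp=COST_FALSE_ALARM):
    """Crash probability above which predicting 'crash' has the lower expected cost.

    Predicting 'no crash' costs p * cost_fn in expectation, predicting 'crash'
    costs (1 - p) * cost_fp, so 'crash' wins when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_crash(p_crash, cost_fn=COST_MISSED_CRASH, cost_fp=COST_FALSE_ALARM):
    """Cost-sensitive decision for one driver given a predicted crash probability."""
    return p_crash > cost_threshold(cost_fn, cost_fp)


# With a 100:1 cost ratio the threshold drops from 0.5 to 1/101.
print(cost_threshold())
print(predict_crash(0.02), predict_crash(0.005))
```

With equal costs the threshold returns to the usual 0.5, which is why a strongly asymmetric cost matrix makes the classifier far more willing to flag a driver as high-risk.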

Acknowledgements

This work is partially supported by the European Community H2020 programme under the funding scheme Track&Know (Big Data for Mobility Tracking Knowledge Extraction in Urban Areas), G.A. 780754, https://trackandknowproject.eu/; and SoBigData++, G.A. 871042, http://www.sobigdata.eu.

Author information

Contributions

Mirco Nanni: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Writing, Supervision, Project administration, Funding acquisition. Riccardo Guidotti: Conceptualization, Methodology, Formal analysis, Investigation, Software, Writing, Supervision. Agnese Bonavita: Methodology, Software, Validation, Investigation, Writing, Visualization. Omid Isfahani Alamdari: Methodology, Software, Validation, Investigation, Writing, Visualization.

Corresponding author

Correspondence to Mirco Nanni.

Ethics declarations

Conflicts of interest

Not applicable

Ethics Approval

Not applicable

Consent to Participate

Not applicable

Consent for Publication

Not applicable

Cite this article

Nanni, M., Guidotti, R., Bonavita, A. et al. City indicators for geographical transfer learning: an application to crash prediction. Geoinformatica 26, 581–612 (2022). https://doi.org/10.1007/s10707-022-00464-3

  • DOI: https://doi.org/10.1007/s10707-022-00464-3
