Abstract
Accurately estimating package arrival times in e-commerce can enhance users’ shopping experience and improve the placement rate of products. This problem is often formalized as an Origin-Destination (OD)-based estimated time of arrival (ETA) prediction task, in which the delivery time is estimated mainly from the sender and receiver addresses and other context information. One inherent challenge of the OD-based ETA problem is that the delivery time depends heavily on the actual delivery trajectory, which is unknown at prediction time. In this article, we tackle this challenge by exploiting historical delivery trajectories. We propose a novel Knowledge Distillation Graph neural network-based package ETA prediction (KDG-ETA) model, which uses knowledge distillation during training to distill the knowledge of historical trajectories into OD pair embeddings. In KDG-ETA, a multi-level trajectory graph representation model exploits trajectory information at the node level, edge level, and path level. An adaptive attention module then combines the OD representations, now embedded with trajectory knowledge, with context embeddings from a feature extraction module to predict the delivery time. In our extensive empirical evaluation on three real-world Alibaba datasets, KDG-ETA consistently outperforms existing state-of-the-art OD-based ETA prediction methods, reducing the Mean Absolute Error (MAE) by 3.0%–39.1%.
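The distillation idea described in the abstract can be illustrated with a minimal sketch of a training objective. The abstract does not give the actual loss, so everything here is an assumption made for illustration: the function names, the use of mean squared error to pull the OD (student) embeddings toward the trajectory (teacher) embeddings, and the weight `alpha`. Only the MAE term corresponds to a quantity named in the abstract.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """MAE between predicted and observed delivery times."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def embedding_mse(student: Sequence[Sequence[float]],
                  teacher: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between student and teacher embeddings.

    Assumption: MSE is used as the distillation distance; the abstract
    does not state which distance KDG-ETA uses.
    """
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def distillation_loss(pred_eta: Sequence[float],
                      true_eta: Sequence[float],
                      od_emb: Sequence[Sequence[float]],
                      traj_emb: Sequence[Sequence[float]],
                      alpha: float = 0.5) -> float:
    """Task loss plus a weighted distillation term.

    `alpha` is a hypothetical weighting hyperparameter. The trajectory
    (teacher) embeddings are only available during training, because the
    delivery trajectory is unknown at prediction time.
    """
    task = mean_absolute_error(pred_eta, true_eta)
    distill = embedding_mse(od_emb, traj_emb)
    return task + alpha * distill


# Toy batch of two packages (all values are made up for illustration).
pred_eta = [50.0, 30.0]               # predicted delivery times
true_eta = [48.0, 33.0]               # observed delivery times
od_emb = [[1.0, 0.0], [0.0, 1.0]]     # student: OD pair embeddings
traj_emb = [[1.0, 1.0], [0.0, 0.0]]   # teacher: trajectory embeddings

loss = distillation_loss(pred_eta, true_eta, od_emb, traj_emb, alpha=0.5)
print(loss)  # 2.5 (MAE) + 0.5 * 0.5 (embedding MSE) = 2.75
```

In this sketch the distillation term is used only during training. At inference time the model would rely on the OD embeddings alone, which matches the abstract's point that the trajectory is not known when the prediction is made.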
Index Terms
- Package Arrival Time Prediction via Knowledge Distillation Graph Neural Network