research-article

Package Arrival Time Prediction via Knowledge Distillation Graph Neural Network

Authors:

Lei Zhang
School of Software, Shandong University, Jinan, China and Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
ORCID: 0000-0002-5808-5313

Yong Liu
Alibaba-NTU Singapore Joint Research Institute, Singapore, Singapore
ORCID: 0000-0001-9031-9696

Zhiwei Zeng
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
ORCID: 0000-0002-7787-5644

Yiming Cao
School of Software, Shandong University, Jinan, China and Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
ORCID: 0000-0003-4945-4049

Xingyu Wu
Alibaba Group, Hangzhou, China
ORCID: 0000-0001-7802-0931

Yonghui Xu
Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
ORCID: 0000-0002-1891-6186

Zhiqi Shen
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
ORCID: 0000-0001-7626-7295

Lizhen Cui
School of Software, Shandong University, Jinan, China and Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
ORCID: 0000-0002-8262-8883

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5, Article No. 108, pp. 1–19. https://doi.org/10.1145/3643033

Published: 28 February 2024

Abstract

Accurately estimating packages’ arrival time in e-commerce can enhance users’ shopping experience and improve the placement rate of products. This problem is often formalized as an Origin-Destination (OD)-based ETA (estimated time of arrival) prediction task, where the delivery time is estimated mainly from the sender and receiver addresses and other context information. One inherent challenge of the OD-based ETA problem is that the delivery time depends heavily on the actual delivery trajectory, which is unknown at the time of prediction. In this article, we tackle this challenge by effectively exploiting historical delivery trajectories. We propose a novel Knowledge Distillation Graph neural network-based package ETA prediction (KDG-ETA) model, which uses knowledge distillation in the training phase to distill the knowledge of historical trajectories into OD pair embeddings. In KDG-ETA, a multi-level trajectory graph representation model is proposed to fully exploit trajectory information at the node level, edge level, and path level. The OD representations embedded with trajectory knowledge are then combined with context embeddings from the feature extraction module by an adaptive attention module for delivery time prediction. Our extensive empirical evaluation on three real-world Alibaba datasets shows that KDG-ETA consistently outperforms existing state-of-the-art OD-based ETA prediction methods, reducing the Mean Absolute Error (MAE) by 3.0%–39.1%.
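To make the training-time versus inference-time split described above more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the squared-error distillation term between the student's OD-pair embedding and the teacher's trajectory embedding, the dot-product softmax attention used to fuse the OD and context embeddings, the linear prediction head, the weighting hyperparameter `lam`, and every function name are hypothetical choices made for illustration. The `mae` and `relative_reduction` helpers show how an MAE reduction such as the reported 3.0%–39.1% would be computed, assuming those percentages are relative reductions against each baseline.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mae(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean Absolute Error, the metric reported in the abstract."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


def relative_reduction(baseline_mae: float, model_mae: float) -> float:
    """Relative MAE reduction in percent, assuming (baseline - model) / baseline."""
    return (baseline_mae - model_mae) / baseline_mae * 100.0


def distill_loss(student: Vector, teacher: Vector) -> float:
    """Hypothetical distillation term: mean squared error pulling the student's
    OD-pair embedding towards the teacher's trajectory embedding."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def attention_fuse(embeddings: List[Vector], query: Vector) -> Vector:
    """Hypothetical adaptive attention: dot-product scores against a query,
    softmax-normalized, then a weighted sum of the input embeddings."""
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]


def predict(fused: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical linear head mapping the fused embedding to a delivery time."""
    return sum(w * x for w, x in zip(weights, fused)) + bias


def training_loss(pred: float, target: float, student: Vector,
                  teacher: Vector, lam: float = 0.5) -> float:
    """Hypothetical objective: absolute prediction error plus lam times the
    distillation term. Only used during training, when trajectories exist."""
    return abs(pred - target) + lam * distill_loss(student, teacher)


if __name__ == "__main__":
    od_embedding = [0.2, 0.4]          # student: OD-pair embedding (toy values)
    trajectory_embedding = [0.3, 0.1]  # teacher: historical-trajectory embedding
    context_embedding = [1.0, 0.0]     # context features (toy values)

    fused = attention_fuse([od_embedding, context_embedding], query=[1.0, 1.0])
    eta = predict(fused, weights=[2.0, 1.0], bias=1.0)
    print("predicted ETA:", round(eta, 3))
    print("training loss:", round(training_loss(eta, 2.0, od_embedding, trajectory_embedding), 3))
    print("MAE reduction %:", round(relative_reduction(10.0, 9.7), 1))
```

In this sketch the teacher embedding appears only inside `training_loss`; at prediction time only the student path (OD embedding, context embedding, attention fusion, prediction head) is evaluated, which mirrors the abstract's point that the real trajectory of a new package is not available when its arrival time is predicted.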


Index Terms

1. Applied computing
  1. Operations research
    1. Forecasting
2. Information systems
  1. Information systems applications
    1. Spatial-temporal systems
      1. Location based services


Published in
ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5, June 2024, 699 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3613659
Editor: Jian Pei, Duke University, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 February 2024
- Online AM: 24 January 2024
- Accepted: 16 January 2024
- Revised: 19 November 2023
- Received: 27 April 2023
Author Tags
Package arrival time prediction
graph neural network
trajectory data mining
knowledge distillation

Article Metrics
- Total Citations: 0
- Total Downloads: 210
- Downloads (Last 12 months): 210
- Downloads (Last 6 weeks): 96
