Abstract
Customer churn is the phenomenon of customers discontinuing their relationship with a company. This problem cuts across many industries, including the software industry. This study uses Machine Learning to build a predictive model that identifies potential churners in a Portuguese software house. Six popular Machine Learning models (Random Forest, AdaBoost, Gradient Boosting Machine, Multilayer Perceptron Classifier, XGBoost, and Logistic Regression) were developed to assess which would perform best. The experimental results show that boosting techniques such as XGBoost present the best predictive performance. The XGBoost model achieves a Recall of 0.85 and a ROC AUC of 0.86. Beyond model performance, the analysis of the model's feature importance revealed that factors such as the time to solve a support ticket, the type of application, the license age, and the number of incidents significantly influence customer churn. These insights can help the software industry identify key drivers of churn and prioritize retention efforts accordingly.
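For readers less familiar with the two reported metrics, the following is a minimal sketch of how Recall and ROC AUC can be computed from labels and model scores. It is not the authors' code: the function names, the toy data, and the 0.5 decision threshold are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual churners (label 1) that the model flagged as churners."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    if actual_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / actual_pos


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random churner outscores a random non-churner.

    Ties count as half a win (the Mann-Whitney U formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = churned, 0 = retained.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(recall(y_true, y_pred))   # 0.75
print(roc_auc(y_true, scores))  # 0.9375
```

Recall depends on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together for imbalanced problems such as churn.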
Funding
This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) under the project - UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
Ethics declarations
Conflict of interest
The authors declare that they have a relationship with the studied software house, but this relationship did not influence the work reported in this paper.
About this article
Cite this article
Dias, J.R., Antonio, N. Predicting customer churn using machine learning: A case study in the software industry. J Market Anal (2023). https://doi.org/10.1057/s41270-023-00269-9
DOI: https://doi.org/10.1057/s41270-023-00269-9