Predicting Customer Lifetime Value to Inform Product Investment Decision

Ayokunmi Sodamola1*, Emmanuella Wiafe2, Chukwuka Stanley Ekeocha3, Rianat Abbas4, & Mohamed Sheriff Jalloh5
1Department of Management Information Systems, Baylor University, Texas, United States
2Department of Biotechnology, Northeastern University, Boston, United States
3Department of Business Administration, Indiana University, Indiana, United States
4Department of Management Information Systems, Baylor University, Texas, United States
5Department of Business Administration, Westcliff University, California, United States
DOI: http://doi.org/10.37502/IJSMR.2026.9301


Abstract

The growing intensity of retail market competition has compelled organizations to move beyond transactional performance metrics toward forward-looking measures of customer economic value. Customer Lifetime Value (CLV) has emerged as a strategically significant construct for understanding the long-term revenue potential of individual customer relationships, yet its systematic application to product investment decision-making remains methodologically underdeveloped in the literature. This study addresses that gap by employing ensemble machine learning models to predict CLV tiers and mapping the resulting classifications onto a structured product investment framework. Using a publicly available retail dataset of 736 customers sourced from Kaggle, the study engineered CLV tiers as a composite of tenure-adjusted spend, average order value, and retention probability, and classified customers into Low, Medium, and High value segments. Two ensemble classifiers, Random Forest and XGBoost, were trained, tuned using grid search with five-fold cross-validation, and evaluated on a 30% holdout test set. Random Forest achieved the stronger overall performance, with an accuracy of 63% and a weighted F1 score of 0.62, and outperformed XGBoost across all evaluation metrics. Both models identified average order value and tenure months as the dominant predictors of CLV tier, indicating that spending intensity and relationship longevity are the primary drivers of long-term customer value. CLV-to-product investment mapping revealed that the Home and Garden and Groceries categories attract the highest concentration of high-value customers, while Electronics, despite lower penetration among high-value customers, generates the highest average spend within that segment.
These findings demonstrate that structured CLV assessment, operationalized through ensemble machine learning, provides organizations with a robust and evidence-based foundation for aligning product investment priorities with the customers most likely to generate sustainable long-term revenue growth.
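The abstract does not state how the three CLV components are normalized, weighted, or cut into tiers, so the following is only one plausible reading, not the authors' procedure. It assumes that "tenure-adjusted spend" means spend per month of tenure, that each component is min-max scaled to [0, 1], that the components are averaged with equal weights, and that customers are split into equal-sized terciles. The `Customer` fields and all function names are hypothetical, and retention probability is assumed to be supplied already.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Customer:
    # Hypothetical schema; the dataset's actual column names are not given.
    customer_id: str
    total_spend: float     # cumulative spend over the relationship
    order_count: int       # number of orders placed
    tenure_months: int     # months since first purchase
    retention_prob: float  # assumed pre-computed probability in [0, 1]


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def clv_scores(customers: Sequence[Customer],
               weights: tuple = (1 / 3, 1 / 3, 1 / 3)) -> List[float]:
    """Weighted composite of three min-max-scaled CLV components (assumed equal weights)."""
    # Assumption: "tenure-adjusted spend" = spend per month of tenure.
    tenure_spend = [c.total_spend / max(c.tenure_months, 1) for c in customers]
    # Average order value = spend per order (guarding against zero orders).
    aov = [c.total_spend / max(c.order_count, 1) for c in customers]
    retention = [c.retention_prob for c in customers]
    w_spend, w_aov, w_ret = weights
    return [w_spend * s + w_aov * a + w_ret * r
            for s, a, r in zip(min_max(tenure_spend), min_max(aov), min_max(retention))]


def assign_tiers(scores: Sequence[float]) -> List[str]:
    """Split customers into equal-sized Low/Medium/High terciles by score rank."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i])
    tiers = [""] * n
    for rank, idx in enumerate(ranked):
        if rank < n / 3:
            tiers[idx] = "Low"
        elif rank < 2 * n / 3:
            tiers[idx] = "Medium"
        else:
            tiers[idx] = "High"
    return tiers


if __name__ == "__main__":
    # Toy values for illustration only, not drawn from the study's dataset.
    sample = [
        Customer("A", 1200.0, 10, 24, 0.9),
        Customer("B", 300.0, 6, 12, 0.4),
        Customer("C", 50.0, 2, 3, 0.2),
    ]
    print(assign_tiers(clv_scores(sample)))  # ['High', 'Medium', 'Low']
```

Rank-based terciles keep the three classes balanced in the full dataset; fixed score thresholds would be an equally reasonable alternative and would produce uneven tier sizes.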
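The weighted F1 score reported for Random Forest averages the per-class F1 scores, weighting each class by its share of the true labels. This differs from a plain macro average whenever the Low, Medium, and High tiers are unevenly represented in the 30% holdout set. The minimal standard-library sketch below computes it alongside accuracy. It follows the same definition as scikit-learn's `f1_score(y_true, y_pred, average="weighted")`, but the function names are illustrative and this is not the authors' evaluation code.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true tier."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to true-class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    predicted = Counter(y_pred)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    # Labels that appear only in y_pred have zero support, so they add nothing.
    return score


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["Low", "Low", "Medium", "High", "High", "Medium"]
    y_pred = ["Low", "Medium", "Medium", "High", "Low", "Medium"]
    print(round(accuracy(y_true, y_pred), 3))     # 0.667
    print(round(weighted_f1(y_true, y_pred), 3))  # 0.656
```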
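The CLV-to-product mapping separates three per-category quantities: the share of all high-value customers that a category attracts (concentration), the fraction of the category's own customers who are high-value (penetration), and the average spend of its high-value customers. The sketch below assumes that each customer is assigned a single primary product category and uses a hypothetical `(tier, category, spend)` record layout. The paper's actual mapping procedure may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (clv_tier, primary_category, spend) per customer.
Record = Tuple[str, str, float]


def category_summary(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Per-category concentration, penetration, and average spend of High-tier customers."""
    customers = defaultdict(int)     # all customers per category
    high = defaultdict(int)          # High-tier customers per category
    high_spend = defaultdict(float)  # total spend of High-tier customers per category
    for tier, category, spend in records:
        customers[category] += 1
        if tier == "High":
            high[category] += 1
            high_spend[category] += spend
    total_high = sum(high.values())
    summary = {}
    for category, n_customers in customers.items():
        n_high = high.get(category, 0)
        summary[category] = {
            # Share of all High-tier customers found in this category.
            "concentration": n_high / total_high if total_high else 0.0,
            # Fraction of this category's customers who are High-tier.
            "penetration": n_high / n_customers,
            # Mean spend of this category's High-tier customers.
            "avg_high_spend": high_spend.get(category, 0.0) / n_high if n_high else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Toy records for illustration only, not the study's data.
    sample = [
        ("High", "Electronics", 900.0),
        ("Low", "Electronics", 50.0),
        ("Low", "Electronics", 40.0),
        ("High", "Groceries", 300.0),
        ("High", "Groceries", 200.0),
        ("Medium", "Groceries", 100.0),
    ]
    ranked = sorted(category_summary(sample).items(),
                    key=lambda kv: kv[1]["concentration"], reverse=True)
    for name, stats in ranked:
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

In the toy data, Groceries leads on concentration and penetration, while Electronics has fewer high-value customers but the highest average spend among them. This is the kind of split between the two measures that the abstract describes.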

Keywords: Customer Lifetime Value, Ensemble Machine Learning, Product Investment Decision, Random Forest, XGBoost
