Predictive Absenteeism Modeling and Behavioral Profile Clustering: A People Analytics Approach to Organizational Attendance Management

Authors

DOI:

https://doi.org/10.22481/recic.v8i1.19583

Keywords:

People Analytics, Absenteeism, Machine Learning, CRISP-DM, Random Forest, K-Means, Clustering, Risk Score

Abstract

Absenteeism is one of the most costly and complex phenomena in people management, affecting productivity, organizational climate, and indirect costs such as team overload and emergency overtime. Traditional attendance-management approaches operate in a descriptive and reactive manner, limiting the capacity for preventive intervention. This work proposes an integrated People Analytics approach to the predictive management of absenteeism, following the CRISP-DM methodology and combining two complementary fronts: (i) an Absenteeism Risk Score, based on a Random Forest algorithm, which estimates the probability of absence events occurring within windows of 7, 30, and 90 days; and (ii) a hybrid K-Means-based clustering that segments employees into four distinct behavioral profiles (Stable, Recurrent, Seasonal, and Severe). The study was conducted in a Brazilian public export-promotion organization, using 14,533 absenteeism events from the 2019-2025 period, distributed across 676 employees. The predictive model was selected among 14 candidates through stratified cross-validation, achieving an AUC of 0.855, an accuracy of 80.2%, and a recall of 63.1% on the temporal validation set. The clustering was validated through the Silhouette Score, the Calinski-Harabasz index, the Davies-Bouldin index, and an ANOVA test, with statistically significant differences (p < 0.001) among the clusters. The results were integrated into an interactive dashboard prototype in Power BI, designed for operational use by the Human Resources department, covering the visualization of individual scores, behavioral profiles, and aggregate indicators; effective adoption in decision-making routines and the longitudinal assessment of impact remain as future work. 
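As a rough illustration of how a risk score of this kind can be produced and checked on a temporal hold-out, the sketch below trains a scikit-learn Random Forest on synthetic data and reports AUC, accuracy, and recall. The features, the 80/20 chronological cut, the 0.5 decision threshold, and the hyperparameters are all assumptions made for the example. They are not the study's pipeline, and the printed numbers have no relation to the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for employee-period rows (NOT the study's data).
n_rows = 2000
X = rng.normal(size=(n_rows, 5))            # hypothetical behavioral features
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]       # hypothetical underlying signal
# Target: 1 if an absence occurs within the chosen window (e.g. 30 days).
y = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Temporal validation: rows are assumed to be ordered by date, so the model
# is fitted on the earliest 80% and evaluated on the most recent 20%.
cut = int(0.8 * n_rows)
X_train, X_valid = X[:cut], X[cut:]
y_train, y_valid = y[:cut], y[cut:]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# The risk score is the predicted probability of the positive class.
risk_score = model.predict_proba(X_valid)[:, 1]
y_pred = (risk_score >= 0.5).astype(int)    # assumed decision threshold

auc = roc_auc_score(y_valid, risk_score)
acc = accuracy_score(y_valid, y_pred)
rec = recall_score(y_valid, y_pred)
print(f"AUC={auc:.3f}  accuracy={acc:.3f}  recall={rec:.3f}")
```

One such model per horizon (7, 30, and 90 days) would differ only in how the target column is built; the fitting and evaluation steps stay the same.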
The main contributions include: (a) a reproducible pipeline for engineering behavioral variables, highlighting seasonality features based on multi-year consistency; (b) a hybrid clustering approach that combines deterministic criteria and unsupervised learning; and (c) an integrated operational instrument that can be replicated in other organizations.
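A hybrid clustering of this kind, with a deterministic rule followed by K-Means and the four validation measures named in the abstract, might be sketched as follows. The two features, the `SEVERE_DAYS` threshold, and the choice to apply the rule before clustering are hypothetical, chosen only to make the example run. The abstract does not specify the deterministic criteria, so this should not be read as the authors' method.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-employee features (NOT the study's data):
# column 0 = absence events per year, column 1 = mean days per event.
centers = np.array([[2.0, 1.0], [10.0, 1.5], [5.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(100, 2)) for c in centers])
long_absences = np.array([[6.0, 30.0]]) + rng.normal(scale=2.0, size=(20, 2))
X = np.vstack([X, long_absences])

# Deterministic step (hypothetical rule): very long mean duration is
# assigned directly to its own profile, labelled 3 here.
SEVERE_DAYS = 15.0
is_severe = X[:, 1] >= SEVERE_DAYS

# Unsupervised step: K-Means with k=3 on the remaining employees.
X_rest = StandardScaler().fit_transform(X[~is_severe])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_rest)

labels = np.full(len(X), 3)
labels[~is_severe] = km.labels_

# Internal validation of the combined four-group partition.
X_all = StandardScaler().fit_transform(X)
sil = silhouette_score(X_all, labels)        # higher is better, in [-1, 1]
ch = calinski_harabasz_score(X_all, labels)  # higher is better
db = davies_bouldin_score(X_all, labels)     # lower is better

# One-way ANOVA: does the event frequency differ across the four groups?
groups = [X[labels == k, 0] for k in range(4)]
f_stat, p_value = f_oneway(*groups)
print(f"silhouette={sil:.3f}  CH={ch:.1f}  DB={db:.3f}  ANOVA p={p_value:.2e}")
```

Applying the rule first keeps a small, extreme group from distorting the K-Means centroids, which is one plausible reason to combine the two steps.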

Author Biographies

César Antônio Ciuffo Moreira, Universidade de Brasília

Professional with over 24 years of experience in strategic management, data science, business intelligence, and project management. He holds a Master’s degree in Applied Computing with an emphasis on Data Science from the University of Brasília (UnB) and is currently a PhD candidate in Applied Computing at the same institution. He has a consolidated track record in team building and capacity building, having led training programs for more than 250 professionals and coordinated data analyst training in partnership with the Instituto Tecnológico de Aeronáutica (ITA). His research focuses on People Analytics, Data Science applied to organizational management, and Business Intelligence, and he holds both PMP and CBPP certifications.

Brenno Lopes, Instituto Brasileiro de Ensino

Undergraduate student in Software Engineering at IDP (Instituto de Desenvolvimento e Pesquisa) in Brasília and a builder of technology products focused on AI, data, software, and business. At 20, he works at the intersection of the academic, corporate, and entrepreneurial worlds, with a trajectory that combines technical depth and strategic vision.

Pedro Cella, Universidade Cruzeiro do Sul

Holds a degree in Software Engineering from UnB (Universidade de Brasília) and is a data scientist. He works at ApexBrasil on people analytics topics, in both academic and corporate development, focusing on building solutions and supporting decision-making.

References

D. A. Harrison and J. J. Martocchio, “Time for absenteeism: A 20-year review of origins, offshoots, and outcomes,” Journal of Management, vol. 24, no. 3, pp. 305–350, 1998.

I. Bierla, B. Huver, and S. Richard, “New evidence on absenteeism and presenteeism,” International Journal of Human Resource Management, vol. 24, no. 7, pp. 1536–1550, 2013.

S. Markussen, K. Røed, O. J. Røgeberg, and S. Gaure, “The anatomy of absenteeism,” Journal of Health Economics, vol. 30, no. 2, pp. 277–292, 2011.

J. H. Marler and J. W. Boudreau, “An evidence-based review of HR analytics,” International Journal of Human Resource Management, vol. 28, no. 1, pp. 3–26, 2017.

T. Rasmussen and D. Ulrich, “Learning from practice: How HR analytics avoids being a management fad,” Organizational Dynamics, vol. 44, no. 3, pp. 236–242, 2015.

A. Tursunbayeva, S. Di Lauro, and C. Pagliari, “People analytics: A scoping review of conceptual boundaries and value propositions,” International Journal of Information Management, vol. 43, pp. 224–247, 2018.

P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, “CRISP-DM 1.0: Step-by-step data mining guide,” SPSS Inc. / CRISP-DM Consortium, Tech. Rep., 2000. [Online]. Available: https://www.kde.cs.uni-kassel.de/wp-content/uploads/lehre/ws2012-13/kdd/files/CRISPWP-0800.pdf

R. Wirth and J. Hipp, “CRISP-DM: Towards a standard process model for data mining,” in Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK, 2000, pp. 29–39. [Online]. Available: https://cs.unibo.it/~danilo.montesi/CBD/Beatriz/10.1.1.198.5133.pdf

T. H. Davenport, “Competing on analytics,” Harvard Business Review, vol. 84, no. 1, pp. 98–107, 2006. [Online]. Available: https://hbr.org/2006/01/competing-on-analytics

T. H. Davenport, J. Harris, and J. Shapiro, “Competing on talent analytics,” Harvard Business Review, vol. 88, no. 10, pp. 52–58, 2010. [Online]. Available: https://hbr.org/2010/10/competing-on-talent-analytics

M. A. Huselid, “The science and practice of workforce analytics: Introduction to the HRM special issue,” Human Resource Management, vol. 57, no. 3, pp. 679–684, 2018.

D. B. Minbaeva, “Building credible human capital analytics for organizational competitive advantage,” Human Resource Management, vol. 57, no. 3, pp. 701–713, 2018.

P. Van der Laken, J. W. Boudreau, and J. H. Marler, “Data-driven human resources analytics: A review and new research directions,” Personnel Review, vol. 47, no. 5, pp. 991–1006, 2018.

R. M. Steers and S. R. Rhodes, “Major influences on employee attendance: A process model,” Journal of Applied Psychology, vol. 63, no. 4, pp. 391–407, 1978.

J. J. Martocchio, “Age-related differences in employee absenteeism: A meta-analysis,” Psychology and Aging, vol. 4, no. 4, pp. 409–414, 1989.

M. Laaksonen, P. Martikainen, O. Rahkonen, and E. Lahelma, “Explanations for gender differences in sickness absence: Evidence from middle-aged municipal employees from Finland,” Occupational and Environmental Medicine, vol. 65, no. 5, pp. 325–330, 2008.

P. Allebeck and A. Mastekaasa, “Risk factors for sick leave: General studies,” Scandinavian Journal of Public Health, vol. 32, no. 5 suppl, pp. 49–108, 2004.

A. Martiniano, R. P. Ferreira, R. J. Sassi, and C. Affonso, “Application of a neuro fuzzy network in prediction of absenteeism at work,” in 7th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 2012, pp. 1–4. Absenteeism at Work dataset available in the UCI Machine Learning Repository, DOI: 10.24432/C5X882.

K. Tewari, S. Vandita, and S. Jain, “Predictive analysis of absenteeism in MNCs using machine learning algorithm,” in Proceedings of ICRIC 2019, ser. Lecture Notes in Electrical Engineering, P. K. Singh, A. K. Kar, Y. Singh, M. H. Kolekar, and S. Tanwar, Eds. Springer, Cham, 2020, vol. 597, pp. 3–14.

P. Llamas Blázquez, “Predicting workplace absenteeism using machine learning: a pilot study in occupational health,” Journal of Occupational Medicine and Toxicology, vol. 20, no. 38, 2025.

R. Punnoose and P. Ajit, “Prediction of employee turnover in organizations using machine learning algorithms,” International Journal of Advanced Research in Artificial Intelligence, vol. 5, no. 9, pp. 22–26, 2016.

F. Fallucchi, M. Coladangelo, R. Giuliano, and E. William De Luca, “Predicting employee attrition using machine learning techniques,” Computers, vol. 9, no. 4, p. 86, 2020.

C. Schröer, F. Kruse, and J. M. Gomez, “A systematic literature review on applying CRISP-DM process model,” Procedia Computer Science, vol. 181, pp. 526–534, 2021.

D. W. Hosmer, S. Lemeshow, and R. X. Sturdivant, Applied Logistic Regression, 3rd ed. Hoboken, NJ: Wiley, 2013.

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, “Bias in random forest variable importance measures: Illustrations, sources and a solution,” BMC Bioinformatics, vol. 8, no. 1, p. 25, 2007.

J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.

Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,” Machine Learning, vol. 63, no. 1, pp. 3–42, 2006.

T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.

T. Saito and M. Rehmsmeier, “The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,” PLOS ONE, vol. 10, no. 3, p. e0118432, 2015.

D. M. W. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation,” Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.

R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), vol. 2, 1995, pp. 1137–1143. [Online]. Available: https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf

J. MacQueen, “Some methods for classification and analysis of multivariate observations,” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967. [Online]. Available: https://projecteuclid.org/ebooks/berkeley-symposium-on-mathematical-statistics-and-probability/Proceedings-of-the-Fifth-Berkeley-Symposium-on-Mathematical-Statistics-and/chapter/Some-methods-for-classification-and-analysis-of-multivariate-observations/bsmsp/1200512992

S. P. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.

P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.

T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Communications in Statistics, vol. 3, no. 1, pp. 1–27, 1974.

D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.

L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, no. 1, pp. 193–218, 1985.

M. Kuhn and K. Johnson, Applied Predictive Modeling. New York: Springer, 2013.

A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Sebastopol, CA: O’Reilly Media, 2018. [Online]. Available: https://www.oreilly.com/library/view/feature-engineering-for/9781491953235/

H. Blockeel and L. De Raedt, “Top-down induction of first-order logical decision trees,” Artificial Intelligence, vol. 101, no. 1–2, pp. 285–297, 1998.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011. [Online]. Available: https://www.jmlr.org/papers/v12/pedregosa11a.html

E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach,” Biometrics, vol. 44, no. 3, pp. 837–845, 1988.

Q. McNemar, “Note on the sampling error of the difference between correlated proportions or percentages,” Psychometrika, vol. 12, no. 2, pp. 153–157, 1947.

M. B. Suehara and M. C. P. d. Silva, “Prevalence of airborne fungi in Brazil and correlations with respiratory diseases and fungal infections,” Ciência & Saúde Coletiva, vol. 28, no. 11, pp. 3289–3300, 2023.

Published

2026-06-30

How to Cite

CIUFFO MOREIRA, César Antônio; LOPES, Brenno; CELLA, Pedro. Predictive Absenteeism Modeling and Behavioral Profile Clustering: A People Analytics Approach to Organizational Attendance Management. Journal of Computer Science, [S. l.], v. 8, n. 1, p. e19583, 2026. DOI: 10.22481/recic.v8i1.19583. Available at: https://periodicos2.uesb.br/recic/article/view/19583. Accessed: 30 Jun. 2026.