Predictive Modeling of Absenteeism and Clustering of Behavioral Profiles: A People Analytics Approach Applied to Organizational Attendance Management

Authors

DOI:

https://doi.org/10.22481/recic.v8i1.19583

Keywords:

People Analytics, Absenteeism, Machine Learning, CRISP-DM, Random Forest, K-Means, Clustering, Risk Score

Abstract

Absenteeism is one of the most costly and complex phenomena in people management: it affects productivity and organizational climate and generates indirect costs such as team overload and emergency overtime. Traditional attendance-management approaches are descriptive and reactive, which limits the capacity for preventive intervention. This work proposes an integrated People Analytics approach to predictive absenteeism management that follows the CRISP-DM methodology and combines two complementary components: (i) an Absenteeism Risk Score, based on the Random Forest algorithm, that estimates the probability of an absence event occurring within 7-, 30- and 90-day windows; and (ii) a hybrid K-Means-based clustering that segments employees into four distinct behavioral profiles (Stable, Recurrent, Seasonal and Severe). The study was conducted in a Brazilian public export-promotion organization, using 14,533 absenteeism events recorded between 2019 and 2025 for 676 employees. The predictive model was selected from 14 candidates through stratified cross-validation and achieved an AUC of 0.855, an accuracy of 80.2% and a recall of 63.1% on the temporal validation set. The clustering was validated with the Silhouette Score, the Calinski-Harabasz index, the Davies-Bouldin index and an ANOVA test, which showed statistically significant differences between clusters (p < 0.001). The results were integrated into a prototype interactive Power BI dashboard designed for operational use by the Human Resources department, which displays individual scores, behavioral profiles and aggregate indicators; effective adoption in day-to-day decision-making and a longitudinal evaluation of its impact remain future work.
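The abstract reports AUC, accuracy and recall for the risk score on a temporal validation set. The sketch below is a minimal, stdlib-only illustration of how these three metrics can be computed from predicted probabilities and observed outcomes; it is not the authors' code. The function names `roc_auc` and `accuracy_recall`, the 0.5 decision threshold and the toy data are assumptions for illustration, and the stratified cross-validation used for model selection is not reproduced.

```python
"""Minimal sketch (not the authors' code): AUC, accuracy and recall for a
binary absence-risk score. Names, threshold and toy data are hypothetical."""

from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_recall(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and recall after turning scores into 0/1 predictions.
    The 0.5 threshold is an assumption; the study's cut-off is not stated."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return correct / len(labels), recall


if __name__ == "__main__":
    # Toy validation fold: 1 = absence event observed inside the window.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    risk = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.45]
    print(roc_auc(y_true, risk))          # 0.8666666666666667
    print(accuracy_recall(y_true, risk))  # (0.75, 0.6666666666666666)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small validation fold; scikit-learn's `roc_auc_score`, `accuracy_score` and `recall_score` compute the same quantities in production code. AUC does not depend on a threshold, whereas accuracy and recall shift as the cut-off moves.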
The main contributions are: (a) a reproducible pipeline for engineering behavioral variables, with emphasis on seasonality features based on multi-year consistency; (b) a hybrid clustering approach that combines deterministic criteria with unsupervised learning; and (c) an integrated operational instrument that can be replicated in other organizations.
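Contributions (a) and (b) can be illustrated with a small, stdlib-only sketch. It assumes one possible definition of multi-year seasonal consistency (for each calendar month, the share of observed years with at least one absence in that month) and a hybrid rule in which employees above a total-days cut-off are assigned to the Severe profile before K-Means clusters the rest. The function names, the 60-day cut-off, the two-feature input and the first-k-points initialization are all hypothetical, not the authors' published pipeline.

```python
"""Hedged sketch of (a) a multi-year seasonal-consistency feature and
(b) a hybrid segmentation: a deterministic rule first, then K-Means.
Feature definition, cut-off and names are hypothetical, not the authors' spec."""

from datetime import date
from math import dist
from typing import Dict, List, Sequence, Set, Tuple


def seasonal_consistency(absences: Sequence[date], years: Sequence[int]) -> Tuple[float, int]:
    """For each calendar month, the share of observed years with at least one
    absence in that month. Returns (max share, month with that share); a score
    of 1.0 means the same month was affected in every observed year."""
    if not years:
        raise ValueError("need at least one observed year")
    hits: Dict[int, Set[int]] = {m: set() for m in range(1, 13)}
    for d in absences:
        if d.year in years:
            hits[d.month].add(d.year)
    shares = {m: len(ys) / len(years) for m, ys in hits.items()}
    peak = max(shares, key=shares.get)
    return shares[peak], peak


def kmeans(points: List[Tuple[float, ...]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd iterations with a deterministic init (the first k points)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new == labels:  # converged: assignments stopped changing
            break
        labels = new
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hybrid_segments(features: Dict[str, Tuple[float, float]],
                    total_days: Dict[str, int],
                    severe_days: int = 60, k: int = 3) -> Dict[str, str]:
    """Rule first: employees with total_days >= severe_days (a hypothetical
    cut-off) are labelled 'severe'; K-Means then clusters everyone else."""
    out: Dict[str, str] = {}
    rest: List[str] = []
    for emp in sorted(features):
        if total_days[emp] >= severe_days:
            out[emp] = "severe"
        else:
            rest.append(emp)
    if rest:
        labels = kmeans([features[e] for e in rest], min(k, len(rest)))
        for emp, lab in zip(rest, labels):
            out[emp] = f"cluster_{lab}"
    return out


if __name__ == "__main__":
    july_habit = [date(2019, 7, 3), date(2020, 7, 10), date(2021, 7, 1), date(2021, 2, 5)]
    print(seasonal_consistency(july_habit, range(2019, 2023)))  # (0.75, 7)
    # Hypothetical (annual frequency, seasonal consistency) per employee.
    feats = {"a": (1.0, 0.1), "b": (1.2, 0.0), "c": (8.0, 0.2), "d": (8.5, 0.1),
             "e": (3.0, 1.0), "f": (3.2, 0.9), "g": (5.0, 0.5)}
    days = {"a": 5, "b": 4, "c": 30, "d": 35, "e": 12, "f": 10, "g": 90}
    print(hybrid_segments(feats, days))
```

In practice the features would be standardized before K-Means, and the numeric cluster labels would still have to be mapped to named profiles such as Stable, Recurrent or Seasonal by inspecting the centroids; internal indices such as the Silhouette Score would then be computed on the resulting partition.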

Author Biographies

César Antônio Ciuffo Moreira, Universidade de Brasília

A professional with more than 24 years of experience in strategic management, data science, commercial intelligence and project management. He holds a Master's degree in Applied Computing with an emphasis on Data Science from the Universidade de Brasília (UnB) and is currently a doctoral candidate in Applied Computing at the same institution. He has extensive experience in team training and development, having led training programs for more than 250 professionals and coordinated the training of data analysts in partnership with the Instituto Tecnológico de Aeronáutica (ITA). His research covers People Analytics, Data Science applied to organizational management, and Commercial Intelligence, and he holds the PMP and CBPP certifications.

Brenno Lopes, Instituto Brasileiro de Ensino

Software Engineering undergraduate at IDP (Instituto de Desenvolvimento e Pesquisa) in Brasília and builder of technology products focused on AI, data, software and business. At 20, he works at the intersection of academia, industry and entrepreneurship, with a track record that combines technical depth and strategic vision.

Pedro Cella, Universidade Cruzeiro do Sul

Holds a degree in Software Engineering from UnB (Universidade de Brasília) and works as a data scientist at ApexBrasil, where he develops people analytics initiatives. He is active in both academic and corporate settings, focusing on building solutions and supporting decision-making.

Published

2026-06-30

How to Cite

CIUFFO MOREIRA, César Antônio; LOPES, Brenno; CELLA, Pedro. Modelado Predictivo del Absentismo y Agrupamiento de Perfiles Conductuales: Un Enfoque de People Analytics Aplicado a la Gestión de la Asistencia Organizacional. Revista de Ciência da Computação, [S. l.], v. 8, n. 1, p. e19583, 2026. DOI: 10.22481/recic.v8i1.19583. Available at: https://periodicos2.uesb.br/recic/article/view/19583. Accessed: 30 Jun. 2026.