Data Lake Implementation and Data Visualization for Public Auditing at the Controladoria-Geral do Estado de Mato Grosso
DOI: https://doi.org/10.22481/recic.v8i1.19165
Keywords: data lake, public audit, data analysis
Abstract
This study presents the implementation and evaluation of a Data Lake at the Controladoria-Geral do Estado de Mato Grosso (CGE-MT), with the objective of optimizing auditing processes and data analysis. It first introduces the set of open-source systems that make up the environment (such as Apache HDFS, Spark, and Trino) and then evaluates the environment against two criteria: technical and operational. The results demonstrated that the implemented infrastructure is efficient for data analysis activities and provides a secure environment for data storage and processing while ensuring data integrity. Furthermore, CGE-MT built the “CGE Alerta” system on top of the Data Lake; the system automated monitoring processes and enabled a 51% reduction in absenteeism-related irregularities across the State Secretariats of Mato Grosso. The long-term viability of the solution was also demonstrated: the available storage capacity allows for approximately 15 years of data retention without the need for immediate investment.
Copyright (c) 2026 Journal of Computer Science

This work is licensed under a Creative Commons Attribution 4.0 International License.