Basic Elements of Machine Learning

Authors

DOI:

https://doi.org/10.22481/intermaths.v4i2.13422

Keywords:

Machine Learning (ML), Artificial Intelligence (AI), Educational Data Mining (EDM), School Dropout Prediction, Neural Networks (NN)

Abstract

In this work we review six scientific articles on school dropout that use Machine Learning as a methodology to identify possible causes. The research is bibliographical: we examine the themes and technical elements of each article, especially those related to machine learning. The results describe the main elements that structure the machine learning algorithms used in the six articles. The subject is broad and draws on several areas and subareas of scientific knowledge, such as Linear Algebra, Matrices, Theory of Computation, Computability, Models of Computation, Formal Languages and Automata, Analysis of Algorithms, and Computational Complexity. Educational Data Mining (EDM) techniques are used to find patterns in large volumes of data. These patterns can be explanatory, describing the relationships between data segments, or predictive, estimating future values from previous data. By the end, the reader will have a broad view of the methodological process by which machine learning models are produced and constructed.

References

DEL BONIFRO, Francesca et al. Student dropout prediction. In: Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020, Proceedings, Part I 21. Springer International Publishing, 2020. p. 129-140.

YU, Renzhe; LEE, Hansol; KIZILCEC, René F. Should college dropout prediction models include protected attributes? In: Proceedings of the Eighth ACM Conference on Learning @ Scale. 2021. p. 91-100.

COSTA, Stella F.; DINIZ, Michael M. Application of logistic regression to predict the failure of students in subjects of a mathematics undergraduate course. Education and Information Technologies, v. 27, n. 9, p. 12381-12397, 2022.

MÁRQUEZ‐VERA, Carlos et al. Early dropout prediction using data mining: a case study with high school students. Expert Systems, v. 33, n. 1, p. 107-124, 2016.

KARIMI-HAGHIGHI, Marzieh; CASTILLO, Carlos; HERNÁNDEZ-LEO, Davinia. A causal inference study on the effects of first year workload on the dropout rate of undergraduates. In: International Conference on Artificial Intelligence in Education. Cham: Springer International Publishing, 2022. p. 15-27.

KOTSIANTIS, Sotiris B.; PIERRAKEAS, C. J.; PINTELAS, Panayiotis E. Preventing student dropout in distance learning using machine learning techniques. In: Knowledge-Based Intelligent Information and Engineering Systems: 7th International Conference, KES 2003, Oxford, UK, September 2003. Proceedings, Part II 7. Springer Berlin Heidelberg, 2003. p. 267-274.

CHANG, Chih-Chung; LIN, Chih-Jen. LIBSVM: a library for support vector machines. ACM transactions on intelligent systems and technology (TIST), v. 2, n. 3, p. 1-27, 2011.

BREIMAN, Leo. Bagging predictors. Machine learning, v. 24, p. 123-140, 1996.

MUSCHELLI III, John. ROC and AUC with a binary predictor: a potentially misleading metric. Journal of classification, v. 37, n. 3, p. 696-708, 2020.

KOIZUMI, Yuma et al. SNIPER: Few-shot learning for anomaly detection to minimize false-negative rate with ensured true-positive rate. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. p. 915-919.

MURTHY, Sreerama K. Automatic construction of decision trees from data: A multi-disciplinary survey. Data mining and knowledge discovery, v. 2, p. 345-389, 1998.

MITCHELL, Tom M. Machine learning. 1997.

DOMINGOS, Pedro; PAZZANI, Michael. On the optimality of the simple Bayesian classifier under zero-one loss. Machine learning, v. 29, p. 103-130, 1997.

AHA, D. Lazy Learning. Kluwer Academic Publishers. 1997.

LONG, J. Scott. Regression models for categorical and limited dependent variables. Advanced quantitative techniques in the social sciences, v. 7, 1997.

BURGES, Christopher J C. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, v. 2, n. 2, p. 121-167, 1998.

POWERS, David MW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061, 2020.

FREEMAN, Elizabeth A.; MOISEN, Gretchen G. A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa. Ecological modelling, v. 217, n. 1-2, p. 48-58, 2008.

LÓPEZ, Victoria et al. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information sciences, v. 250, p. 113-141, 2013.

BRADLEY, Andrew P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition, v. 30, n. 7, p. 1145-1159, 1997.

SPACKMAN, Kent A. Signal detection theory: Valuable tools for evaluating inductive learning. In: Proceedings of the sixth international workshop on Machine learning. Morgan Kaufmann, 1989. p. 160-163.

HOSMER, D. W.; LEMESHOW, Stanley. Applied logistic regression. New York: John Wiley & Sons, 2000.

ROSENBAUM, Paul R.; RUBIN, Donald B. The central role of the propensity score in observational studies for causal effects. Biometrika, v. 70, n. 1, p. 41-55, 1983.

BRAY, Bethany C. et al. Inverse propensity score weighting with a latent class exposure: Estimating the causal effect of reported reasons for alcohol use on problem alcohol use 16 years later. Prevention Science, v. 20, p. 394-406, 2019.

GLYNN, Adam N.; QUINN, Kevin M. An introduction to the augmented inverse propensity weighted estimator. Political analysis, v. 18, n. 1, p. 36-56, 2010.

VARELLA, Carlos Alberto Alves. Análise multivariada aplicada as ciências agrárias. Seropédica: Universidade Federal Rural do Rio de Janeiro, 2008.

ASSUNÇÃO, R. Linear Discriminant Analysis. Minas Gerais: DCC-UFMG, 2020.

MENOTTI, D. Classificação. Universidade Federal do Paraná (UFPR). Especialização em Engenharia Industrial 4.0. Paraná.

Published

2023-12-30

How to Cite

Ferreira, J. S. P. (2023). Basic Elements of Machine Learning. INTERMATHS, 4(2), 54-74. https://doi.org/10.22481/intermaths.v4i2.13422

Issue

Section

Articles