Investigating the Role of LASSO in Feature Selection for Educational Data Mining (EDM) Applications

Authors

  • Mustafa Ahmed Khan College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan https://orcid.org/0009-0008-5656-9163
  • Khalid Mahboob College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan https://orcid.org/0000-0001-7431-4430
  • Urooj Yousuf College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan https://orcid.org/0000-0002-9514-0078
  • Muhammad Ramzan College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan
  • Muhammad Taha Shaikh College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan
  • Salman Akber College of Computer Science and Information Systems, Institute of Business Management, Karachi, Pakistan https://orcid.org/0009-0000-6159-1914

DOI:

https://doi.org/10.21015/vtse.v13i2.2111

Abstract

With the advent of digitalization, education-related activities have started generating massive amounts of data from many sources, such as student interactions, assessments, and learning management systems. These data provide fertile ground for Educational Data Mining (EDM) to reveal insights that can drive improvements in academic outcomes and personalized learning experiences. However, the high dimensionality and redundancy of educational data pose considerable challenges to the accuracy, interpretability, and computational efficiency of models. The Least Absolute Shrinkage and Selection Operator (LASSO) is a powerful technique that performs regression and feature selection simultaneously. LASSO minimizes the residual sum of squares plus a penalty proportional to the sum of the absolute values of the regression coefficients. This penalty induces sparsity by shrinking the coefficients of uninformative features exactly to zero. This property is useful in EDM, where relevant indicators such as attendance, quiz scores, or study patterns must be distinguished from noisy or redundant variables. This paper systematically investigates the application of LASSO in EDM, presenting its mathematical background and geometric interpretation along with practical usage recommendations. LASSO's performance is also evaluated on synthetic and real datasets, including the well-known UCI Student Performance dataset. The findings show that LASSO enhances model interpretability and predictive accuracy while reducing model complexity. Finally, the paper discusses limitations, practical considerations, and future directions for applying LASSO in next-generation educational analytics.
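To make the sparsity mechanism concrete, here is a minimal sketch. It is not the authors' implementation. The sketch fits LASSO by cyclic coordinate descent in plain Python and minimizes (1/(2n))·‖y − Xβ‖² + λ‖β‖₁, the same scaling used by scikit-learn's `Lasso`. The synthetic data, the penalty λ = 0.3, and the function names `soft_threshold`, `center`, and `lasso_cd` are hypothetical choices made for illustration. Only the first two of six features influence the outcome. The soft-thresholding step sets the coefficients of the other four to exactly zero.

```python
import random


def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def center(columns):
    """Subtract each column's mean so that no intercept term is needed."""
    return [[v - sum(col) / len(col) for v in col] for col in columns]


def lasso_cd(X_cols, y, lam, n_iter=200):
    """Cyclic coordinate descent for
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept).

    X_cols is a list of centred feature columns; y is centred.
    """
    n = len(y)
    b = [0.0] * len(X_cols)
    col_sq = [sum(v * v for v in col) / n for col in X_cols]
    residual = list(y)  # r = y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j, col in enumerate(X_cols):
            # Correlation of feature j with the partial residual (r + x_j * b_j)
            rho = sum(x * (r + x * b[j]) for x, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient
                residual = [r - x * delta for x, r in zip(col, residual)]
            b[j] = new_bj
    return b


# Hypothetical synthetic data: 6 candidate features, only the first two matter.
rng = random.Random(0)
n, p = 200, 6
X_cols = center([[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)])
y_raw = [3.0 * X_cols[0][i] - 2.0 * X_cols[1][i] + rng.gauss(0, 0.5) for i in range(n)]
y = center([y_raw])[0]

coef = lasso_cd(X_cols, y, lam=0.3)
print([round(c, 2) for c in coef])  # the four irrelevant coefficients are 0.0
```

Raising λ (for example to 10.0) drives every coefficient to zero. Lowering it toward 0 lets LASSO approach ordinary least squares. In practice λ is usually chosen by cross-validation.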

Published

2025-05-04

How to Cite

Khan, M. A., Mahboob, K., Yousuf, U., Ramzan, M., Shaikh, M. T., & Akber, S. (2025). Investigating the Role of LASSO in Feature Selection for Educational Data Mining (EDM) Applications. VFAST Transactions on Software Engineering, 13(2), 56–67. https://doi.org/10.21015/vtse.v13i2.2111

Section

Articles