Harnessing Machine Learning for Accurate Smog Level Prediction: A Study of Air Quality in India

Authors

  • Sahil Jatoi, Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Sindh, Pakistan. https://orcid.org/0009-0004-2947-5808
  • Bushra Abro, Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Sindh, Pakistan. https://orcid.org/0009-0001-7474-8674
  • Sanam Narejo, Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Sindh, Pakistan. https://orcid.org/0000-0002-3537-8949
  • Yaqoob Ali Baloch, Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Sindh, Pakistan.
  • Kehkashan Asma, Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro 76062, Sindh, Pakistan. https://orcid.org/0000-0003-2841-2464

DOI:

https://doi.org/10.21015/vtcs.v13i1.2077

Keywords:

Air Quality Index (AQI); Machine Learning (ML) Models; Smog Prediction; Automation and Environmental Monitoring

Abstract

Accurate prediction of smog concentrations is needed to mitigate the harm that air pollution (AP) causes to public health and the environment. This research proposes a new method that combines machine learning (ML) models with live data from the Central Pollution Control Board (CPCB) to close gaps in smog prediction accuracy. The data consist of hourly Air Quality Index (AQI) readings from different towns in India, which were preprocessed to handle missing values and normalized before being passed to the ML models. Eight ML algorithms were tested, and their hyperparameters were tuned using the GridSearchCV method. The results show that the XGBoost Regressor (XGBR) and Extra Tree Regressor (ETR) models significantly surpass the other ML algorithms and traditional techniques in smog prediction accuracy. These results can help policymakers and environmental agencies implement sustainable air quality management.
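
The workflow summarized in the abstract (handle missing values, normalize, then tune a regressor with GridSearchCV) might look roughly like the following scikit-learn sketch. It is not the authors' code: the data are synthetic stand-ins for hourly pollutant readings, and the median imputation, min-max scaling, parameter grid, and the helper names `make_synthetic_aqi` and `tune_extra_trees` are all assumptions made for illustration. Only the Extra Trees model is shown; an XGBoost regressor could be placed in the same pipeline slot.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_aqi(n_rows=400, seed=0):
    """Build a synthetic table: 4 hypothetical pollutant columns and an AQI-like target."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 200.0, size=(n_rows, 4))
    # Assumed linear relationship plus noise, purely for demonstration.
    y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + 0.1 * X[:, 3]
    y = y + rng.normal(0.0, 5.0, size=n_rows)
    # Knock out about 5% of the readings to mimic missing sensor values.
    X[rng.random(X.shape) < 0.05] = np.nan
    return X, y


def tune_extra_trees(X, y, seed=0):
    """Impute, scale, and grid-search an Extra Trees regressor; return search and test R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    pipeline = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),  # assumed imputation choice
            ("scale", MinMaxScaler()),                     # assumed normalization choice
            ("model", ExtraTreesRegressor(random_state=seed)),
        ]
    )
    # Small illustrative grid; a real study would search a wider space.
    param_grid = {
        "model__n_estimators": [50, 100],
        "model__max_depth": [None, 8],
    }
    search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    # score() uses the refit best pipeline and the same R^2 scoring.
    return search, search.score(X_test, y_test)


X, y = make_synthetic_aqi()
search, test_r2 = tune_extra_trees(X, y)
print("best parameters:", search.best_params_)
print("held-out R^2:", round(test_r2, 3))
```

Wrapping the imputer and scaler in the same pipeline as the model means they are refit inside every cross-validation fold, so statistics from the validation split do not leak into training.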

Published

2025-05-06

How to Cite

Jatoi, S., Abro, B., Narejo, S., Baloch, Y. A., & Asma, K. (2025). Harnessing Machine Learning for Accurate Smog Level Prediction: A Study of Air Quality in India. VAWKUM Transactions on Computer Sciences, 13(1), 106–119. https://doi.org/10.21015/vtcs.v13i1.2077