Covid-19 Sentiment Analysis on X (formerly Twitter) Using Machine Learning Classifiers: Performance Comparison and Key Insights

DOI:

https://doi.org/10.21015/vtse.v13i2.2103

Abstract

Widely used platforms such as X (formerly Twitter) make it possible to study public attitudes toward important topics, including the COVID-19 outbreak. In this paper, machine learning (ML) approaches are used to build a sentiment analysis system for COVID-19 hashtagged tweets. We employed four ML classifiers, namely Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), to classify the tweets into positive, negative, and neutral sentiments. The dataset comprises 178,240 COVID-19-related tweets, which were preprocessed using natural language processing techniques. To assess the performance of the classifiers, we used accuracy, precision, recall, and F1 score. The results show that the DT classifier achieved the highest accuracy (94%) among the models compared, with precision and recall also taken into account. Undersampling and oversampling were examined as techniques for addressing class imbalance. These findings suggest that ML classifiers, especially SVM and DT, can be useful for future large-scale public sentiment analysis during a pandemic. The paper also includes recommendations for further improving sentiment analysis approaches and their use in monitoring people's reactions on social media during a pandemic.
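The evaluation metrics and the class-balancing step named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names `evaluate` and `resample`, the macro-averaging of precision, recall, and F1 across the three sentiment classes, and the use of purely random duplication and removal for over- and undersampling are all assumptions made for illustration.

```python
import random

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the abstract does not say how the
    per-class scores were combined.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def resample(samples, mode="over", seed=0):
    """Balance (text, label) pairs by random over- or undersampling.

    "over" duplicates random items until every class matches the largest
    class; "under" keeps a random subset sized to the smallest class.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    sizes = [len(group) for group in by_label.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    balanced = []
    for group in by_label.values():
        if mode == "over":
            # Keep every original item, then add random duplicates.
            extra = rng.choices(group, k=target - len(group))
            balanced.extend(group + extra)
        else:
            balanced.extend(rng.sample(group, target))
    return balanced


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    y_true = ["positive", "positive", "negative", "neutral"]
    y_pred = ["positive", "negative", "negative", "neutral"]
    print(evaluate(y_true, y_pred))
```

In a setup like this, resampling would normally be applied to the training split only, so that duplicated tweets cannot leak into the data used to compute the scores.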

Published

2025-05-03

How to Cite

Shaikh, M. A., Ismail, M., Bakhsh, P., Jokhio, S., Khan, M. A., & Ali, K. (2025). Covid-19 Sentiment Analysis on X (formerly Twitter) Using Machine Learning Classifiers: Performance Comparison and Key Insights. VFAST Transactions on Software Engineering, 13(2), 13–27. https://doi.org/10.21015/vtse.v13i2.2103
