Sentiment Analysis of Multilingual Roman Text for E-Commerce Reviews using Machine Learning Approaches

Authors

  • Sana Riaz, Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, Pakistan
  • Sarfraz Natha, Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, Pakistan & Department of Software Engineering, Sir Syed University of Engineering & Technology, Karachi, Pakistan, https://orcid.org/0009-0008-3254-8720
  • Asghar Ali Chandio, School of Engineering and Information Technology, University of New South Wales, Canberra, Australia & Department of Artificial Intelligence, Quaid-e-Awam University of Engineering, Science & Technology, Pakistan, https://orcid.org/0000-0001-8821-2355
  • Mehwish Leghari, Department of Data Science, Quaid-e-Awam University of Engineering, Science & Technology, Pakistan, https://orcid.org/0000-0002-0756-6336
  • Abeer Javed Syed, Department of Computer Science, IQRA University, Pakistan

DOI:

https://doi.org/10.21015/vtse.v13i1.2067

Abstract

Sentiment analysis, a branch of natural language processing (NLP), analyzes text to identify and extract subjective information such as attitudes, opinions, and feelings. Applied to multilingual product reviews, it can be used to examine audience feedback. In this paper, a sentiment analysis model based on machine learning approaches is developed for multilingual product reviews written in Roman Urdu or Sindhi, in order to determine how the public feels about particular posts and products. Sentiment analysis of product reviews written in Roman script across several languages is valuable because it offers insight into what the audience likes and dislikes. To this end, a dataset of reviews in Roman Urdu and Sindhi was collected from diverse online platforms and social media sources, including YouTube, Facebook, TikTok, Daraz, and Instagram. The Term Frequency-Inverse Document Frequency (TF-IDF) method was used to extract the features needed to categorize reviews by polarity as negative, positive, or neutral. For classification, five machine learning classifiers were used: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN). The classification results were measured in terms of precision, recall, and F1-score. With TF-IDF features, SVM and LR outperformed the other classifiers, obtaining F1-scores of 0.77 and 0.78, respectively. To further improve classification accuracy, the Synthetic Minority Over-sampling Technique (SMOTE) was used to address the class imbalance problem. With SMOTE, the classification accuracy of LR and SVM improved to 0.79 and 0.80, respectively.
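The two techniques named in the abstract, TF-IDF weighting and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation, which this page does not describe. The function names `tfidf` and `smote_like`, the smoothed idf variant, and the toy reviews are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf is the term count divided by document length; idf is the smoothed
    form ln((1 + N) / (1 + df)) + 1, one common variant among several.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [
        {t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()}
        for doc in docs
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Interpolate n_new synthetic points between minority-class samples.

    Each point lies on the segment between a random sample and one of its
    k nearest minority neighbours. Needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy reviews (not from the paper's dataset), already tokenized.
reviews = [
    ["bohat", "acha", "product"],
    ["bilkul", "bekar", "product"],
    ["acha", "hai"],
]
vectors = tfidf(reviews)

# Dense rows over a fixed vocabulary so the vectors can be interpolated.
vocab = sorted({t for doc in reviews for t in doc})
dense = [[v.get(t, 0.0) for t in vocab] for v in vectors]
extra = smote_like(dense[:2], n_new=3, k=1)
print(len(extra), "synthetic vectors of length", len(vocab))
```

A term that appears in every review, such as "product" here, gets the lowest idf, so rarer and more discriminative words carry more weight. In practice, library implementations such as scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE` would normally replace this hand-written sketch, and oversampling should be applied to the training split only.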

Published

2025-03-28

How to Cite

Riaz, S., Natha, S., Ali Chandio, A., Leghari, M., & Syed, A. J. (2025). Sentiment Analysis of Multilingual Roman Text for E-Commerce Reviews using Machine Learning Approaches. VFAST Transactions on Software Engineering, 13(1), 131–140. https://doi.org/10.21015/vtse.v13i1.2067