Urdu-Punjabi Code Switched Sentiment Analysis Empowered by a Deep Learning Framework Integrating XLM-R, and GPT

Muzammal Hussain; Saddam Ali; Hina Sattar; Ali Raza; Muhammad Hamza Akbar; Muhammad Ahsan Rafiq

doi:10.21015/vtcs.v13i2.2144

Authors

Muzammal Hussain Department of Computer Science, Government College University Faisalabad, Sahiwal Campus, Pakistan https://orcid.org/0000-0002-7320-4413
Saddam Ali Department of Computer Science, Government College University Faisalabad, Sahiwal Campus, Pakistan https://orcid.org/0009-0001-9928-0115
Hina Sattar Department of Computer Science, University of Sahiwal, Pakistan https://orcid.org/0009-0009-9928-2370
Ali Raza Faculty of Computer Science & Information Technology, Superior University, Pakistan https://orcid.org/0009-0006-9108-2962
Muhammad Hamza Akbar Department of Computer Science, Government College University Faisalabad, Sahiwal Campus, Pakistan https://orcid.org/0009-0005-4060-1497
Muhammad Ahsan Rafiq Department of Electrical Engineering, The University of Faisalabad, Pakistan https://orcid.org/0009-0007-4356-5166

Abstract

Sentiment analysis uses computational methods, textual analysis, and natural language processing to derive insights from text. It detects and quantifies the attitudes, opinions, and emotional states that individuals convey in writing. Most existing sentiment analysis work centers on English, leaving low-resource languages largely underexplored. Sentiment analysis for low-resource languages is challenging because extensive datasets and associated resources are unavailable. To address this gap, we propose the Large Urdu-Punjabi code switched Corpus for Sentiment Analysis (LUPCSA-25), which comprises over 1,000,000 user reviews in Urdu and Punjabi (Shahmukhi). Urdu and Punjabi domain specialists enrolled in PhD programs provided additional annotations for the dataset. In this research, we examine how head-pruning strategies can enhance both the predictive accuracy and computational efficiency of transformer architectures, specifically XLM-R and GPT-2, for sentiment classification of Urdu–Punjabi code-switched text. After the text is preprocessed, BERT embeddings are produced and passed to the proposed classification model to determine sentiment. The performance of the proposed classifier is assessed by comparing it with baseline classifiers. The results demonstrate that the proposed classifiers with the head-pruning technique surpass current state-of-the-art models, achieving a precision of 96.4%.
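The abstract names head pruning without specifying the procedure. As a rough illustration only, the sketch below assumes that each attention head has already been given an importance score and that the lowest-scoring heads are masked out before the head outputs are concatenated. The function names, scores, and toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List


def head_mask_from_importance(importance: List[float], keep: int) -> List[int]:
    """Return a 0/1 mask that keeps the `keep` highest-scoring heads.

    `importance` is an assumed per-head score; how it is computed is
    outside the scope of this sketch.
    """
    if not 0 < keep <= len(importance):
        raise ValueError("keep must be between 1 and the number of heads")
    # Rank head indices by score, highest first; ties go to the lower index.
    ranked = sorted(range(len(importance)), key=lambda h: (-importance[h], h))
    kept = set(ranked[:keep])
    return [1 if h in kept else 0 for h in range(len(importance))]


def apply_head_mask(head_outputs: List[List[float]], mask: List[int]) -> List[float]:
    """Concatenate per-head output vectors, zeroing the pruned heads."""
    if len(head_outputs) != len(mask):
        raise ValueError("one mask entry is required per head")
    merged: List[float] = []
    for vector, keep_flag in zip(head_outputs, mask):
        merged.extend(value * keep_flag for value in vector)
    return merged


# Toy example: four heads, each producing a 2-dimensional output (made-up values).
scores = [0.9, 0.1, 0.5, 0.3]
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = head_mask_from_importance(scores, keep=2)
pruned = apply_head_mask(outputs, mask)
```

Masking leaves the layer shape unchanged, which makes it easy to compare accuracy before and after pruning. Physically removing the masked heads is what would reduce computation.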
