Model Interpretability with XAI for Sarcastic Behavior Detection in Low-Resource Roman Urdu Language using Machine Learning and Ensembled Approaches
DOI:
https://doi.org/10.21015/vtse.v14i2.2343
Abstract
This work proposes a framework for sarcasm detection in Roman Urdu using machine learning and ensemble approaches. To prepare the dataset, we extracted data from X and manually annotated a corpus of 11,320 tweets into two classes: sarcastic and non-sarcastic. Various classifiers were implemented using lexical features related to sarcastic patterns. The proposed preprocessing framework and ensemble-based model use language-specific normalization strategies, such as spelling variation standardization and slang expansion, to address orthographic variation and code-mixing within the text. Experimental findings show that ensemble learning models, especially Random Forest and XGBoost, obtained the most accurate results, with an accuracy of 87% and an F1-score of 88%, thus providing a reliable baseline for sarcasm detection in Roman Urdu. Furthermore, to enhance transparency and trust in the model's decisions, we employed an Explainable AI (XAI) technique, LIME (Local Interpretable Model-agnostic Explanations), to interpret the predictions of our best-performing model. These findings emphasize the potential of ensemble approaches to handle the complexity of sarcasm in low-resource code-mixed languages, and XAI offers essential insights into the model's decision-making process.
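The normalization step named in the abstract (spelling variation standardization and slang expansion) can be illustrated with a minimal Python sketch. The lookup tables, the `normalize` function, and the cleanup rules below are hypothetical and chosen only for illustration; the abstract does not specify the lexicons or rules the authors actually used.

```python
import re

# Hypothetical lookup tables for illustration only; the paper's actual
# lexicons are not given in the abstract.
SPELLING_VARIANTS = {
    "nhi": "nahi", "nahin": "nahi",
    "bht": "bohat", "bohot": "bohat", "bahut": "bohat",
    "achha": "acha", "accha": "acha",
}
SLANG_EXPANSIONS = {
    "plz": "please",
    "thx": "thanks",
    "gr8": "great",
}


def normalize(text):
    """Lowercase a tweet, strip URLs and mentions, collapse elongated
    characters, then expand slang and map spelling variants to one form."""
    text = text.lower()
    # Drop URLs and @mentions, which carry no lexical sarcasm cue here.
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Collapse any character repeated three or more times ("achaaaa" -> "acha").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    tokens = re.findall(r"[a-z0-9']+", text)
    normalized = []
    for token in tokens:
        token = SLANG_EXPANSIONS.get(token, token)
        token = SPELLING_VARIANTS.get(token, token)
        normalized.append(token)
    return " ".join(normalized)


print(normalize("Wah bht achaaaa kaam kiya, thx @ali!"))
# prints: wah bohat acha kaam kiya thanks
```

Text normalized this way could then be turned into lexical features and passed to an ensemble classifier; that stage is not shown here because the abstract does not describe the feature set or hyperparameters.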
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License CC BY