Ontology based Semantic Analysis framework in Sindhi Language

Authors

  • Anisha Ali Department of Information Technology, Quaid-e-Awam University of Engineering Science and Technology, Nawabshah, Pakistan https://orcid.org/0009-0002-1374-2591
  • Malik Ghaffar Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan https://orcid.org/0009-0008-4863-5351
  • Saima Siraj Somroo Department of Information Technology, Quaid-e-Awam University of Engineering Science and Technology, Nawabshah, Pakistan
  • Anwar Ali Sanjrani Department of Computer Science and Information Technology, University of Balochistan, Quetta, Pakistan
  • Tooba Ali Department of Information Technology, Quaid-e-Awam University of Engineering Science and Technology, Nawabshah, Pakistan https://orcid.org/0009-0002-4574-2975
  • Tabasum Jalbani Department of Information Technology, Quaid-e-Awam University of Engineering Science and Technology, Nawabshah, Pakistan https://orcid.org/0009-0008-5031-8670

DOI:

https://doi.org/10.21015/vtse.v13i1.2080

Abstract

Sentiment analysis, the task of identifying polarity (Positive, Negative, or Neutral) in textual data, is a crucial aspect of natural language understanding. Its implementation in low-resource languages like Sindhi presents significant challenges due to linguistic diversity and a limited amount of labeled data. This work addresses these challenges by proposing an ontology-driven sentiment analysis framework that integrates domain-specific ontological knowledge with the DistilBERT model for efficient sentiment classification. We constructed a custom Sindhi sentiment dataset comprising 123 sentences annotated with three sentiment classes: Positive, Negative, and Neutral. The DistilBERT model was employed for tokenization and sequence classification because of its efficiency and adaptability in resource-constrained settings. Using PyTorch and the Hugging Face Transformers library, we trained the model in a supervised setting with the Trainer API and its training arguments. Additionally, a domain-specific ontology was developed to capture complex linguistic relationships and enrich the model's semantic understanding, enabling it to handle diverse sentiment-bearing expressions. In our experiments, the ontology-driven model achieved an accuracy of 93%, compared with 82% for the baseline model. This improvement underscores the value of integrating ontological knowledge, particularly in addressing the nuances of low-resource languages like Sindhi. Precision, recall, and F1 score further support the advantage of the ontology-driven framework over the baseline. This study presents a solution for sentiment analysis in Sindhi and lays the groundwork for future research in Natural Language Processing (NLP) for low-resource languages. Expanding the ontology to include more sentiment contexts and exploring hybrid deep learning approaches for sentiment classification offer promising directions for future work.
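
The abstract reports accuracy, precision, recall, and F1 score over three sentiment classes. As a minimal sketch of how such metrics can be computed, and not the authors' evaluation code, the following Python uses only the standard library. The function names and the toy labels are hypothetical and chosen for illustration; they are not drawn from the Sindhi dataset or the paper's results.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("Positive", "Negative", "Neutral")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(report):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in report.values()) / len(report)


# Toy gold labels and predictions (illustrative only, not from the paper).
y_true = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Positive"]

report = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
print(f"macro F1 = {macro_f1(report):.3f}")
```

Reporting per-class scores alongside accuracy is useful on a small three-class dataset, because a single accuracy figure can hide weak performance on one class.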

Published

2025-03-31

How to Cite

Ali, A., Ghaffar, M., Somroo, S. S., Ali Sanjrani, A., Ali, T., & Jalbani, T. (2025). Ontology based Semantic Analysis framework in Sindhi Language. VFAST Transactions on Software Engineering, 13(1), 193–206. https://doi.org/10.21015/vtse.v13i1.2080