An NLP Approach to Predict and Suggest Next Word In Urdu Typing

Muhammad Hassan; Saad Ahmed; Rohail Qamar; Saman Hina; Hira Farman

doi:10.21015/vtse.v12i4.2011

Authors

Muhammad Hassan, Department of Computer Science & Information Technology, NED University of Engineering & Technology, Karachi, Pakistan, https://orcid.org/0000-0002-0628-533X
Saad Ahmed, Department of Computer Science, IQRA University, Karachi, Pakistan, https://orcid.org/0000-0001-6121-8124
Rohail Qamar, Department of Computer Science & Information Technology, NED University of Engineering & Technology, Karachi, Pakistan, https://orcid.org/0000-0001-8697-6706
Saman Hina, Independent Researcher, London, England, https://orcid.org/0000-0002-7649-4372
Hira Farman, Department of Computer Science, IQRA University, Karachi, Pakistan, https://orcid.org/0000-0002-9026-0935

DOI:

https://doi.org/10.21015/vtse.v12i4.2011

Abstract

Fast typing is essential for digitizing content in any language. Urdu, a prominent language of South Asia, is also being digitized, but a lack of available resources and low typing speeds have hampered the process. The high demand for digitized Urdu content makes this work more expensive still. In this research we examined various aspects of the Urdu language and identified several limitations that obstruct high-speed Urdu typing. Urdu has more than 35 letters, whereas international ISO-standard keyboards are laid out around the 26-letter English alphabet, a difference of about 10 letters. Typing these additional letters requires pressing and holding the SHIFT key, which wastes time and slows typing; we sought to address this problem while leaving the keyboard standard unchanged. This paper presents word prediction and suggestion for the Urdu Language (UL) based on a stochastic model: a Hidden Markov Model is used to predict the next word, a unigram model is used to suggest the current word and the upcoming word, and an N-gram model is applied with N=2. The main contribution of this paper is the use of POS tagging: each suggestion and prediction also draws on tagged words, using a dataset of thousands of tag combinations ranked by their frequency of occurrence in the test data. A tool implementing this concept for UL was developed and tested with both regular and new Urdu content writers to measure improvements in their typing speed. The resulting programs let users type less and choose more.
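
To make the N=2 idea concrete, the following is a minimal sketch of bigram next-word prediction combined with unigram-based completion of the current word. It is not the authors' tool: the three-sentence corpus, the function names (`train`, `complete_word`, `predict_next`), and the tie-breaking rule are all assumptions made for illustration, and the POS-tag weighting described in the abstract is left out.

```python
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration): already tokenised Urdu sentences.
corpus = [
    ["میں", "اسکول", "جاتا", "ہوں"],
    ["میں", "گھر", "جاتا", "ہوں"],
    ["وہ", "اسکول", "جاتا", "ہے"],
]


def train(sentences):
    """Count word frequencies (unigrams) and adjacent word pairs (bigrams)."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for prev_word, next_word in zip(tokens, tokens[1:]):
            bigrams[prev_word][next_word] += 1
    return unigrams, bigrams


def complete_word(unigrams, prefix, k=3):
    """Suggest up to k known words that start with the typed prefix,
    most frequent first (ties broken by string order)."""
    matches = [(w, c) for w, c in unigrams.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:k]]


def predict_next(bigrams, prev_word, k=3):
    """Rank up to k followers of prev_word by the relative frequency
    count(prev_word, w) / count(prev_word, any follower)."""
    followers = bigrams.get(prev_word, Counter())
    total = sum(followers.values())
    ranked = sorted(followers.items(), key=lambda wc: (-wc[1], wc[0]))
    return [(w, c / total) for w, c in ranked[:k]]


unigrams, bigrams = train(corpus)

if __name__ == "__main__":
    print(complete_word(unigrams, "جا"))   # candidates for a partially typed word
    print(predict_next(bigrams, "جاتا"))   # candidates for the word that follows
```

In a fuller system, each candidate's score could additionally be weighted by how often its POS tag follows the tag of the previous word, which is the role the abstract assigns to the tag-combination dataset.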
