Enhancing Interpretability in Anxiety Detection on Reddit: A Machine Learning Approach with LIME and Topic Modeling

DOI:

https://doi.org/10.21015/vtse.v13i2.2139

Abstract

Mental disorders, particularly anxiety, are an increasingly prevalent concern in modern society. Individuals express their opinions and feelings on social media platforms such as Reddit, which offer valuable information for understanding mental health. This study applies BERTopic and Local Interpretable Model-agnostic Explanations (LIME) to show how machine learning models for anxiety detection can be interpreted. To identify and analyze linguistic patterns, a novel dataset was collected from multiple Reddit communities (subreddits) devoted to anxiety and to casual conversation. BERTopic was used for topic modeling to discover the key topics in these discussions. In addition, TF-IDF features were used to train a Random Forest classifier, which achieved an accuracy of 88% in classifying posts as anxiety or non-anxiety. Furthermore, to make the model's decision-making process transparent, LIME was used to examine the textual features that influence the model's predictions. The study emphasizes the importance of explainability in AI-assisted mental health solutions and demonstrates the usefulness of social media data for analyzing how anxiety is articulated and how language use differs between anxiety-related and casual posts.
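The classification step described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. The toy posts, the hyperparameters (`n_estimators`, `stop_words`), and the helper names `anxiety_probability` and `word_influence` are all hypothetical and are not taken from the paper. The leave-one-word-out scoring is a simplified stand-in for LIME, which instead fits a local surrogate model over many perturbed versions of a post.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy posts invented for illustration only; NOT the study's Reddit dataset.
posts = [
    "I keep panicking before every exam and cannot sleep",
    "my heart races and I worry about everything all day",
    "constant worry and panic attacks are ruining my week",
    "I feel anxious and scared whenever I leave the house",
    "what is your favourite pizza topping and why",
    "just watched a great movie with friends tonight",
    "any recommendations for a weekend hiking trip",
    "my cat learned a funny new trick this morning",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = anxiety, 0 = non-anxiety

# TF-IDF features feeding a Random Forest; hyperparameters are assumptions.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(posts, labels)


def anxiety_probability(text):
    """Return the model's probability that `text` is an anxiety post."""
    return float(model.predict_proba([text])[0][1])


def word_influence(text):
    """Leave-one-word-out scores (a crude stand-in for LIME, not LIME itself).

    Each score is the drop in anxiety probability when that word is removed,
    so a positive score means the word pushed the prediction toward anxiety.
    """
    words = text.split()
    base = anxiety_probability(text)
    scores = {}
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores[word] = base - anxiety_probability(reduced)
    return scores


example = "I worry and panic before work"
print(anxiety_probability(example))
print(word_influence(example))
```

In practice the `lime` package's text explainer and the `bertopic` package would replace the hand-rolled scoring and supply the topic modeling step. Both are left out here so that the sketch depends on scikit-learn alone.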

Published

2025-06-30

How to Cite

Memon, G. F., Syed, M. S. S., Saba, E., & Laghari, S. A. (2025). Enhancing Interpretability in Anxiety Detection on Reddit: A Machine Learning Approach with LIME and Topic Modeling. VFAST Transactions on Software Engineering, 13(2), 245–254. https://doi.org/10.21015/vtse.v13i2.2139

Section

Articles