Hybrid Model for Real-Time Mobile Snatching Detection in Video Surveillance Using Time-Distributed CNN and Attention-Based LSTM
DOI: https://doi.org/10.21015/vtse.v14i1.2279

Abstract
We propose a hybrid approach that combines Time-Distributed CNNs with an attention-embedded LSTM network to identify mobile-phone snatching in video surveillance footage. Phone-snatching incidents are rising rapidly worldwide; in Pakistan, police have compiled data on roughly 1,700 snatched mobile phones as part of their effort to curb the problem. Our model addresses this challenge by combining the spatial feature extraction power of CNNs with the temporal relation modeling ability of LSTMs. An attention mechanism that directs focus to salient cues in video sequences further enhances its effectiveness. The system was trained and evaluated on a real-world dataset of snatching events reported on social media, where it achieved an accuracy of 96.45%. This work highlights the potential of social media platforms as effective instruments for crime prevention and identification, advancing the field of artificial intelligence-driven crime detection. We intend to release the algorithm's source code and dataset publicly to support wider use and further research in this area.
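To make the architecture concrete, the sketch below illustrates the core of such an attention mechanism: a CNN backbone (applied in a time-distributed fashion) yields one feature vector per frame, and a learned scoring vector assigns each frame a softmax-normalized weight, so that salient frames dominate the pooled clip descriptor fed onward. This is a minimal NumPy illustration with made-up dimensions and a random scoring vector, not the authors' exact model; the function and variable names (`attention_pool`, `w`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(frame_features, w):
    """Collapse a (T, D) sequence of per-frame CNN features into a
    single (D,) clip descriptor via attention-weighted averaging."""
    scores = frame_features @ w        # (T,) relevance score per frame
    alphas = softmax(scores)           # attention weights, sum to 1
    context = alphas @ frame_features  # (D,) weighted temporal average
    return context, alphas

rng = np.random.default_rng(0)
T, D = 16, 128                         # e.g. 16 frames, 128-dim features
feats = rng.normal(size=(T, D))        # stand-in for per-frame CNN output
w = rng.normal(size=D)                 # scoring vector (learned in practice)
ctx, alphas = attention_pool(feats, w)
```

In the full model, the weighted context (or the reweighted frame sequence) would feed the LSTM, letting the recurrent layer concentrate on frames the attention scores flag as informative.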
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License CC BY