The Role of CNN and RNN in the Classification of Audio Music Genres
DOI: https://doi.org/10.21015/vtse.v10i2.793

Abstract
This study aims to determine how various types of neural networks can be used to categorize music files. We used the GTZAN dataset, which contains several genres of traditional music. Every genre shares some common characteristics that can be treated as features, and classifying music genres from such features is a challenging task. Deep neural architectures such as the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) have been applied to music analysis. However, these architectures are data-intensive and prone to overfitting. To address this issue, we present a framework combining a CNN and an RNN with Long Short-Term Memory (LSTM) units across multiple layers to categorize music genres while mitigating overfitting. Our experiments also revealed the strengths and limitations of deep learning. The CNN performed best among the state-of-the-art models considered, achieving training and test accuracies of 86.53% and 81.90%, respectively.
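The core CNN idea described above can be illustrated with a minimal NumPy sketch: learned 2-D filters are convolved over a spectrogram-like input, the resulting feature maps are pooled, and a linear layer with softmax maps the pooled features to genre probabilities. This is not the authors' implementation; all shapes, filter counts, and variable names here are illustrative assumptions (128 mel bands, 130 frames, 10 genres as in GTZAN).

```python
import numpy as np

def conv2d_relu(x, kernels, stride=1):
    """Valid 2-D convolution of a (H, W) map with (n, kh, kw) filters, then ReLU."""
    n, kh, kw = kernels.shape
    H, W = x.shape
    oh = (H - kh) // stride + 1
    ow = (W - kw) // stride + 1
    out = np.zeros((n, oh, ow))
    for k in range(n):
        for i in range(oh):
            for j in range(ow):
                patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[k, i, j] = np.sum(patch * kernels[k])
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((128, 130))   # assumed: 128 mel bands x 130 frames
kernels = rng.standard_normal((8, 3, 3)) * 0.1  # assumed: 8 learnable 3x3 filters
features = conv2d_relu(spectrogram, kernels)    # (8, 126, 128) feature maps
pooled = features.mean(axis=(1, 2))             # global average pooling -> (8,)
W = rng.standard_normal((10, 8)) * 0.1          # classifier weights, 10 GTZAN genres
probs = softmax(W @ pooled)                     # genre probability distribution
```

In a trained network the filters and classifier weights would be learned by backpropagation rather than drawn at random; the sketch only shows how the data flows from spectrogram to genre probabilities.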
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgment of its authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License (CC BY).