The Role of CNN and RNN in the Classification of Audio Music Genres
DOI: https://doi.org/10.21015/vtse.v10i2.793

Abstract
This study aims to determine how various types of neural networks can be used to categorize music files. We used the GTZAN dataset, which contains several genres of traditional music. Every genre shares some common characteristics that can be treated as features, and classifying music genres from such features is a challenging task. Deep neural architectures such as the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) have been applied to music analysis. However, these architectures are data-intensive and prone to overfitting. To address this issue, we present a framework combining a CNN and an RNN with Long Short-Term Memory (LSTM) units across multiple layers to categorize music genres while mitigating overfitting. Our experiments also revealed the strengths and limitations of deep learning. The CNN performed best among the state-of-the-art models considered, achieving training and test accuracies of 86.53% and 81.90%, respectively.
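The core CNN idea described above can be illustrated with a minimal NumPy sketch: learned 2-D filters are convolved over a spectrogram-like input, the resulting feature maps are pooled, and a linear layer with softmax maps the pooled features to genre probabilities. This is not the authors' implementation; all shapes, filter counts, and variable names here are illustrative assumptions (128 mel bands, 130 frames, 10 genres as in GTZAN).

```python
import numpy as np

def conv2d_relu(x, kernels, stride=1):
    """Valid 2-D convolution of a (H, W) map with (n, kh, kw) filters, then ReLU."""
    n, kh, kw = kernels.shape
    H, W = x.shape
    oh = (H - kh) // stride + 1
    ow = (W - kw) // stride + 1
    out = np.zeros((n, oh, ow))
    for k in range(n):
        for i in range(oh):
            for j in range(ow):
                patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[k, i, j] = np.sum(patch * kernels[k])
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((128, 130))   # assumed: 128 mel bands x 130 frames
kernels = rng.standard_normal((8, 3, 3)) * 0.1  # assumed: 8 learnable 3x3 filters
features = conv2d_relu(spectrogram, kernels)    # (8, 126, 128) feature maps
pooled = features.mean(axis=(1, 2))             # global average pooling -> (8,)
W = rng.standard_normal((10, 8)) * 0.1          # classifier weights, 10 GTZAN genres
probs = softmax(W @ pooled)                     # genre probability distribution
```

In a trained network the filters and classifier weights would be learned by backpropagation rather than drawn at random; the sketch only shows how the data flows from spectrogram to genre probabilities.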
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgment of its authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License (CC BY).