Optimized Music Classification with a Hybrid VGG16-RNN Using Mel-Spectrogram and MFCC Features

Mohsin Ashraf; Saima Ashraf

doi:10.21015/vtcs.v12i2.1962

Authors

Mohsin Ashraf Department of Computer Science, University of Central Punjab, Lahore 54700, Pakistan https://orcid.org/0000-0001-9984-3400
Saima Ashraf Department of Computer Science, University of Central Punjab, Lahore 54700, Pakistan

Abstract

Music classification using deep neural networks has gained considerable attention in recent years, owing to the difficulty of capturing every essential aspect of music in a feature representation and of making classifiers interpretable. Research on integrating VGG16 with recurrent neural networks (RNNs) is limited, and researchers have found that few classifiers accurately capture intrinsic musical characteristics. Previous work in this field has focused primarily on spectral features, which has constrained overall performance. To address this issue, we propose a novel hybrid neural architecture based on Visual Geometry Group 16 (VGG16), which is highly effective at extracting important features from musical variations. We combine VGG16 with several RNN variants, namely the Gated Recurrent Unit (GRU), Bidirectional GRU (BiGRU), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM), and compare their performance on the GTZAN dataset using both Mel-spectrogram and Mel-Frequency Cepstral Coefficient (MFCC) features. Our results indicate that the VGG16+GRU model achieved the highest accuracy, 89.60% with Mel-spectrograms and 82.70% with MFCC features. These findings demonstrate the effectiveness of combining advanced feature extraction techniques with deep learning models for music genre classification.
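For readers less familiar with the two input representations, the plain-Python sketch below shows how they are conventionally related for a single audio frame: a Mel-spectrogram frame is the set of triangular Mel-filter responses to a power spectrum, and MFCCs are obtained by taking the logarithm of those band energies and applying a type-II discrete cosine transform (DCT). The HTK-style Mel formula, the unnormalized filters, the natural logarithm, the frame parameters, and all function names are illustrative assumptions; the abstract does not state which parameters or library the authors used.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale (assumed convention): mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: float,
                   f_min: float = 0.0, f_max: Optional[float] = None) -> List[List[float]]:
    """Unnormalized triangular filters: one row per Mel band, one column per FFT bin."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_mels + 2 points equally spaced on the Mel scale give each triangle's edges
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_spectrogram_frame(power_spectrum: List[float], bank: List[List[float]]) -> List[float]:
    """Mel-band energies for one frame: filter-weighted sums of the power spectrum."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


def mfcc_from_mel_frame(mel_energies: List[float], n_mfcc: int, eps: float = 1e-10) -> List[float]:
    """MFCCs for one frame: natural log of the Mel energies, then an orthonormal DCT-II."""
    n = len(mel_energies)
    if not 0 < n_mfcc <= n:
        raise ValueError("n_mfcc must be between 1 and the number of Mel bands")
    log_e = [math.log(e + eps) for e in mel_energies]  # eps guards against log(0)
    coeffs = []
    for k in range(n_mfcc):
        s = sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


# Example: a flat power spectrum for one 512-point frame stands in for |FFT|^2,
# which is assumed to be computed elsewhere.
bank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=22050)
mfcc = mfcc_from_mel_frame(mel_spectrogram_frame([1.0] * 257, bank), n_mfcc=13)
```

Because the DCT concentrates the smooth log-Mel envelope in the leading coefficients, keeping only the first `n_mfcc` values yields a much smaller input than the full Mel-spectrogram frame.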
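The abstract does not describe how VGG16's convolutional output is passed to the recurrent layer. One common pattern in CNN-RNN hybrids, assumed here, is to average the final feature map over its frequency axis and treat each remaining time column as one step of the input sequence. The plain-Python sketch below illustrates that bridge, a single-layer GRU using the standard update-gate and reset-gate equations, and a softmax head over the ten GTZAN genres. The VGG16 backbone itself is omitted, the weights are random rather than trained, and names such as `feature_map_to_sequence`, `GRUCell`, and `classify` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# The ten genre labels of the GTZAN dataset
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for both signs of x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(v: Vector) -> Vector:
    top = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def feature_map_to_sequence(fmap: List[Matrix]) -> List[Vector]:
    """Hypothetical bridge: average a [channel][freq][time] feature map over
    frequency, giving one channel vector per time step."""
    n_time = len(fmap[0][0])
    return [[sum(row[t] for row in ch) / len(ch) for ch in fmap] for t in range(n_time)]


class GRUCell:
    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        bound = 1.0 / math.sqrt(hidden_size)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input weights W, recurrent weights U and bias b for gates z, r and candidate h
        self.W = {g: rand_matrix(hidden_size, input_size) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size) for g in "zrh"}
        self.b = {g: [0.0] * hidden_size for g in "zrh"}

    def _affine(self, gate: str, x: Vector, h: Vector) -> Vector:
        return [a + c + d for a, c, d in zip(matvec(self.W[gate], x),
                                             matvec(self.U[gate], h), self.b[gate])]

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in self._affine("z", x, h)]  # update gate
        r = [sigmoid(v) for v in self._affine("r", x, h)]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._affine("h", x, reset_h)]  # candidate state
        # Blend the previous state and the candidate according to the update gate
        return [(1.0 - zi) * ci + zi * hi for zi, ci, hi in zip(z, cand, h)]

    def run(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_size
        for x in seq:
            h = self.step(x, h)
        return h  # the final hidden state summarizes the whole sequence


def classify(fmap: List[Matrix], cell: GRUCell, head_w: Matrix, head_b: Vector) -> Vector:
    """Genre probabilities from the GRU's final hidden state via a linear + softmax head."""
    h = cell.run(feature_map_to_sequence(fmap))
    logits = [a + b for a, b in zip(matvec(head_w, h), head_b)]
    return softmax(logits)
```

Swapping the GRU for an LSTM, or running a second GRU over the reversed sequence and concatenating the two final states (a BiGRU), changes only the recurrent stage; the bridge and the classification head stay the same.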
