Breast Cancer Classification Using Gene Expression Profile Data with a Deep Learning Framework
DOI:
https://doi.org/10.21015/vtse.v13i4.2255
Abstract
Breast cancer is a life-threatening disease and a continuing contributor to rising mortality. Classifying gene expression profile data is a challenging task for machine learning models because the data are complex. Conventional machine learning models take a long time to train and require large amounts of training data, and their accuracy is inconsistent. The main purpose of this research is to build a deep neural network model that accurately classifies cancerous and non-cancerous patients using gene expression profile datasets of breast cancer. This study presents a novel deep learning framework designed specifically for the classification of breast cancer from gene expression profile data. Unlike conventional machine learning approaches, which require extensive preprocessing and manual feature extraction, the proposed model automatically learns discriminative features from high-dimensional genomic data. The proposed model achieves 95% accuracy, with a precision of 0.91, a recall of 1.00, and an F1 score of 0.95. The macro-average and weighted-average scores both indicate that the model also performs well on average across classes, which can help medical practitioners with early breast cancer prognosis.
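The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch, not the authors' evaluation code; the function names are illustrative assumptions, and the two averaging helpers only show how macro and weighted averages are usually defined (the abstract does not give per-class scores or class sizes).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_class_scores: list[float]) -> float:
    """Unweighted mean of per-class scores (every class counts equally)."""
    return sum(per_class_scores) / len(per_class_scores)


def weighted_average(per_class_scores: list[float], supports: list[int]) -> float:
    """Mean of per-class scores weighted by each class's sample count."""
    total = sum(supports)
    return sum(s * n for s, n in zip(per_class_scores, supports)) / total


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.91, 1.00)
print(f"{reported_f1:.2f}")  # 0.95
```

With precision 0.91 and recall 1.00, the harmonic mean is 1.82 / 1.91, which rounds to 0.95 and so agrees with the F1 score stated in the abstract. The macro and weighted averages coincide when classes are balanced and diverge when one class dominates.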
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License CC BY