Identifying Key Genes of Liver Cancer by Using Random Forest Classification

Adeel Ashraf, Muhammad Sohaib Roomi, Muhammad Sohaib Akram


Liver cancer is considered one of the deadliest cancers. Devising an effective treatment requires identifying potential biomarkers that play an important role in the development of liver cancer. To identify the relevant pathways and key genes, we applied enrichment analysis techniques, including pathway analysis and functional analysis. To identify biomarkers, we constructed a protein-protein interaction (PPI) network and analysed it by selecting key network nodes. Our results show that we successfully identified potential liver cancer biomarkers such as ESR1 and TOP2. In addition, our method can be applied to datasets for other diseases to select key genes.
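The abstract does not spell out the statistics behind its enrichment and network steps. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It scores pathway over-representation with a one-sided hypergeometric test, a common choice for enrichment analysis. It then ranks proteins in a PPI network by degree, which is one simple way of "selecting network nodes". The function names `hypergeom_enrichment_p` and `degree_hubs`, the toy gene identifiers (G1 to G6) and the interaction list are hypothetical and are not taken from the paper. The random forest step named in the title is not reproduced here, and multiple-testing correction is omitted.

```python
from collections import defaultdict
from math import comb  # requires Python 3.8+


def hypergeom_enrichment_p(n_total, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_total    -- genes in the background (universe)
    n_pathway  -- background genes annotated to the pathway
    n_selected -- genes in the candidate list (e.g. differentially expressed)
    n_overlap  -- candidate genes that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    # Sum integer counts first so that only one floating-point division occurs.
    hits = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(max(n_overlap, 0), upper + 1)
    )
    return hits / comb(n_total, n_selected)


def degree_hubs(edges, top_k):
    """Rank nodes of an undirected PPI network by degree; return the top_k."""
    seen = set()
    degree = defaultdict(int)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        pair = frozenset((a, b))
        if pair in seen:
            continue  # count each undirected interaction only once
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable result.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical numbers and toy interactions, for illustration only.
    print(hypergeom_enrichment_p(n_total=10, n_pathway=3, n_selected=3, n_overlap=3))
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
                 ("G2", "G3"), ("G2", "G1"), ("G5", "G5"), ("G4", "G6")]
    print(degree_hubs(toy_edges, top_k=2))
```

With these toy inputs, the first call returns 1/120. All three selected genes fall in a three-gene pathway drawn from a ten-gene background. G1 has three distinct partners, so it ranks as the top hub. Degree is the simplest centrality measure, and betweenness or closeness centrality could be substituted without changing the overall structure.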


This work is licensed under a Creative Commons Attribution 3.0 License.