DeepImmuno-PSSM: Identification of Immunoglobulin based on Deep learning and PSSM-Profiles

Authors

  • Ali Ghulam Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
  • Zar Nawab Khan Swati Department of Computer Science, Karakoram International University, Gilgit, Pakistan
  • Farman Ali Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa, Pakistan
  • Saima Tunio Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Nida Jabeen College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Natasha Iqbal Department of Botany, Government College University of Faisalabad, 37000, Faisalabad, Pakistan

DOI:

https://doi.org/10.21015/vtcs.v11i1.1396

Abstract

Immunoglobulins are closely connected to a number of disorders and are important in both biological and medical contexts. Efficient techniques that increase the classification accuracy of immunoglobulins are therefore crucial for disease research. A small number of studies have used computational models to address this problem, but their prediction accuracy is not good enough. We therefore use a deep learning technique based on convolutional neural networks to improve performance. In this study, immunoglobulin features were extracted using the dipeptide composition (DPC) with the position-specific scoring matrix (DPC-PSSM) and the position-specific scoring matrix-transition probability composition (PSSM-TPC) methods. We feed the features extracted from the DPC-PSSM and PSSM-TPC profiles into a 1D convolutional neural network (CNN). The results showed that the DeepImmuno-PSSM method, based on sequential minimal optimization, predicted immunoglobulins with an accuracy of 93.44% using DPC-PSSM features and 89.92% using the best feature subset produced by the PSSM-TPC feature extraction approach. Our findings indicate that the model is useful for improving the prediction of immunoglobulin proteins. They also suggest that combining sequence data, deep learning, and PSSM-based features may open up a new path for biochemical modelling.
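As an illustration of the DPC-PSSM feature named in the abstract, the following is a minimal sketch assuming the commonly used definition, in which entry (i, j) of a 20 x 20 matrix averages the product of the PSSM score for amino acid i at position k and the score for amino acid j at position k+1. The function name `dpc_pssm` and the toy input are hypothetical, and any PSSM normalization the authors may have applied is not shown; this is not necessarily their exact pipeline.

```python
import numpy as np


def dpc_pssm(pssm):
    """Return a 400-dimensional DPC-PSSM vector for an L x 20 PSSM.

    Assumed definition: D[i, j] = (1 / (L - 1)) * sum_k P[k, i] * P[k + 1, j].
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[1] != 20:
        raise ValueError("expected an L x 20 matrix")
    length = pssm.shape[0]
    if length < 2:
        raise ValueError("need at least two residues")
    # Rows 0..L-2 paired with rows 1..L-1 give all consecutive positions;
    # the matrix product sums the pairwise score products over k.
    pair_scores = pssm[:-1].T @ pssm[1:]
    return (pair_scores / (length - 1)).ravel()


# Toy example: a random matrix stands in for a real PSI-BLAST profile.
rng = np.random.default_rng(0)
toy_pssm = rng.normal(size=(50, 20))
features = dpc_pssm(toy_pssm)
```

The resulting fixed-length vector does not depend on sequence length, which is what allows proteins of different lengths to share one input shape for a downstream classifier such as a 1D CNN.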

Published

2023-03-17

How to Cite

Ghulam, A., Nawab Khan Swati, Z., Ali, F., Tunio, S., Jabeen, N., & Iqbal, N. (2023). DeepImmuno-PSSM: Identification of Immunoglobulin based on Deep learning and PSSM-Profiles. VAWKUM Transactions on Computer Sciences, 11(1), 54–66. https://doi.org/10.21015/vtcs.v11i1.1396