Analyzing updates in Amino Acid Composition and Translation Algorithm towards Predicting Membrane Proteins using Machine Learning Approaches

Abdulsalam Mohammed Alfarsi, Abdulrahman Mohammed Alghanmi


Membrane proteins come in different types, each with different functions. Classifying the protein sequences in a data set is important for understanding cell functions, disease prevention, and drug discovery. Traditional methods were initially used for transmembrane protein classification. However, advances in technology and new research have enlarged transmembrane protein datasets by thousands of sequences, which makes accurate results almost impossible to obtain with traditional methods. Computational methods are therefore very useful for membrane protein classification. Several methods, such as Pseudo Amino Acid Composition (PseAAC), can extract many salient features of a protein sequence. In this work, we aim to modify an existing amino acid composition and translation algorithm to extract membrane protein features with better accuracy. To validate our algorithm, we will use Support Vector Machine (SVM) and k-nearest neighbors (KNN) classifiers.
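To make the idea of composition-based features concrete, the following is a minimal sketch of plain amino acid composition (AAC), the 20-dimensional frequency vector that PseAAC extends. It is not the modified algorithm described in this work; the function name, the residue ordering, and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
# The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fractional frequency of each standard residue.

    Non-standard characters (e.g. X, B, Z, gaps) are ignored, which is
    one possible convention and an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from any dataset used in the paper.
features = amino_acid_composition("MKTLLLTLVVVTIVCLDLGYT")
```

A vector like `features` could then be passed, one row per protein, to a classifier such as an SVM or KNN; the classifier settings would have to be chosen separately.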




This work is licensed under a Creative Commons Attribution 3.0 License.