
Evaluating EIIP and CPNR Methods for Protein Hotspot Prediction

G.S.L.B.V. Prashanthi, Sk. Minhaz, Md. Parvez, D. Javed Ali

Abstract


This project predicts protein hotspots using two numerical mapping techniques, EIIP and CPNR. Hotspots are the residues involved in protein-protein interactions within biological cells that contribute most to binding. The main objective is to locate these hotspots and to compare the two methods on precision, accuracy, F1 score, and recall.
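As a rough illustration of the four metrics, the sketch below scores a set of predicted hotspot positions against a set of known ones. The function name `hotspot_metrics`, the position-set representation, and the choice to count every non-hotspot residue as a true negative are assumptions made for this example, not details taken from the project.

```python
def hotspot_metrics(predicted, actual, sequence_length):
    """Score predicted hotspot positions against known ones.

    Hypothetical helper: positions are residue indices, and every residue
    that is neither predicted nor a known hotspot counts as a true negative.
    """
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # predicted and truly a hotspot
    fp = len(predicted - actual)   # predicted but not a hotspot
    fn = len(actual - predicted)   # hotspot that was missed
    tn = sequence_length - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / sequence_length
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Calling the function once per mapping technique on the same protein would give two sets of scores that can be compared directly.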

The project takes a protein sequence as input: a 40-letter string in which each letter represents an amino acid. The sequence is then converted to numbers with each of the two mapping techniques. In CPNR-based mapping, each amino acid is assigned a prime number. In the EIIP technique, each amino acid is mapped to its Electron-Ion Interaction Potential value. Not every peak in the output is considered a hotspot; only peaks above the characteristic frequency are. To analyze the performance of the methods, machine learning techniques are employed and scored on four metrics: precision, accuracy, F1 score, and recall. This allows the two methods to be compared and helps determine which one yields better results.
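The mapping step can be sketched as a table lookup followed by a frequency analysis. The EIIP values below are the commonly tabulated ones and should be checked against the reference in use. The CPNR table is not reproduced here, but it could be passed to the same `encode` function. The abstract does not say how peaks are computed, so the discrete Fourier transform in `power_spectrum` is an assumption borrowed from common practice in EIIP-based analysis, and both function names are hypothetical.

```python
import cmath

# Commonly tabulated EIIP values per amino acid (assumed; verify before use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def encode(sequence, table):
    """Map each amino-acid letter to its numeric value from `table`."""
    return [table[aa] for aa in sequence.upper()]

def power_spectrum(signal):
    """Return |X[k]|^2 for k = 0..N//2 of the mean-removed signal.

    Uses a naive O(N^2) DFT, which is adequate for short sequences.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                 for t, x in enumerate(centered))
        spectrum.append(abs(xk) ** 2)
    return spectrum
```

For example, `power_spectrum(encode(seq, EIIP))` gives one spectrum per sequence; swapping in a CPNR table would produce the second representation for comparison.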

In summary, this project focuses on predicting hotspots in protein sequences using EIIP and CPNR mapping techniques. By evaluating the performance of these methods, it aims to identify the more effective approach for hotspot prediction.




