
Protein Secondary Structure Prediction Using ESMFold: A Deep Learning-Based Bioinformatics Web Application

Tarun G S, Tarun Sai, Vedant Habbu, Siddharth Holla, Rohan K B

Abstract


Protein secondary structure prediction remains a cornerstone problem in computational biology, underpinning applications in drug discovery, genetic disease research, vaccine design, and protein engineering. Traditional approaches rely on evolutionary information derived from multiple sequence alignments, which makes them computationally expensive. This paper presents a complete bioinformatics web application built on ESMFold, a structure prediction model from Meta AI that uses the pretrained ESM-2 protein language model, to predict per-residue secondary structure labels (alpha helix, H; beta strand, E; coil, C) directly from amino acid sequences in a zero-shot fashion, without alignment databases or model fine-tuning. The system supports three input modalities: manual sequence entry, FASTA file upload, and live retrieval by Protein Data Bank (PDB) ID. Predictions are presented as color-coded sequence maps, per-residue pLDDT confidence heatmaps, structural composition charts, and an interactive 3D molecular visualization. The application achieved Q3 accuracy between 81.4% and 87.2% on benchmark proteins spanning all major fold classes, consistent with published ESMFold performance. The tool is intended for both research and teaching, and is delivered through an accessible Streamlit interface.
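The Q3 metric and the structural composition mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the reduction of DSSP's eight states to H/E/C shown here is one common convention that the abstract does not confirm. Q3 is the percentage of residues whose predicted three-state label matches the reference label.

```python
from collections import Counter

# Assumed reduction of DSSP 8-state codes to three states (a common
# convention; the paper's actual mapping is not stated in the abstract).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp: str) -> str:
    """Map each DSSP code to H, E, or C; anything unmapped becomes coil."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp)


def q3_accuracy(predicted: str, reference: str) -> float:
    """Percentage of residues whose predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


def composition(labels: str) -> dict:
    """Percentage of residues assigned to each of the three states."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {state: 100.0 * counts.get(state, 0) / len(labels) for state in "HEC"}


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the paper.
    reference = reduce_to_three_states("HGIEBTS-")
    predicted = "HHHECCCC"
    print(q3_accuracy(predicted, reference))
    print(composition(predicted))
```

Under these assumptions, a reported Q3 of 87.2% would mean that roughly 87 of every 100 residues received the same H/E/C label as the reference assignment.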



