Result: Protein Secondary Structure Prediction Using Soft Computing Techniques.

Title:
Protein Secondary Structure Prediction Using Soft Computing Techniques.
Authors:
K., Sajani1 (AUTHOR), Yaduvanshi, Pragyendu1 (AUTHOR), Masood, Sarfaraz2 (AUTHOR) smasood@jmi.ac.in, Singh, Prithvi3 (AUTHOR) prithvi.mastermind@gmail.com
Source:
Biotechnology & Applied Biochemistry. Jan2026, p1. 10p. 4 Illustrations.
Database:
Academic Search Index

More Information

ABSTRACT Accurate prediction of protein secondary structure is a critical step toward understanding protein function and facilitating structure-based drug discovery. We present a template-independent, single-sequence method utilizing a shallow feed-forward artificial neural network (ANN) with one-hot (binary) amino acid encoding and a sliding-window input. The network is trained and evaluated on two datasets: (i) a curated, nonhomologous Protein Data Bank (PDB) set with a strict ≤25% maximum pairwise sequence identity, annotated with STRIDE; and (ii) a homologous human papillomavirus (HPV) set (L1, L2, E1–E7) whose labels are obtained from the Proteus predictor and used solely for a system-specific, post hoc analysis. To improve transparency regarding generalization, we report the all-vs-all sequence identity distribution for the nonhomologous set (matrix and histogram). The model achieves competitive Q3 accuracy on the nonhomologous PDB benchmark and yields 82.2% Q3 agreement with Proteus on the HPV case study. We explicitly frame the HPV evaluation as agreement with a labeling tool rather than accuracy versus experiment. Despite its simplicity and lack of evolutionary profiles, the ANN demonstrates robust sequence-only performance, offering a lightweight baseline that is easy to reproduce and deploy on a CPU. We discuss limitations (dataset size, lack of cross-tool bake-offs, absence of long-range features) and delineate concrete avenues for future work. [ABSTRACT FROM AUTHOR]
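The abstract names two ingredients that can be sketched briefly: one-hot amino acid encoding over a sliding window, and the Q3 score. The Python below is an illustrative sketch only, not the authors' implementation. The window size of 13, the extra padding slot, and the function names `one_hot`, `encode_windows`, and `q3` are all assumptions, since the abstract does not specify them.

```python
# Illustrative sketch; window size and padding scheme are assumptions,
# not details taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOT = len(AMINO_ACIDS) + 1  # extra slot marks padding or unknown residues


def one_hot(residue):
    """Return a 21-element binary vector for one residue."""
    vec = [0] * SLOT
    vec[AA_INDEX.get(residue, SLOT - 1)] = 1
    return vec


def encode_windows(sequence, window=13):
    """Encode each residue as the concatenated one-hot vectors of its window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    # Pad both ends so terminal residues also get a full-width window.
    padded = "-" * half + sequence + "-" * half
    features = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + window]:
            row.extend(one_hot(residue))
        features.append(row)
    return features


def q3(predicted, reference):
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```

Each row produced by `encode_windows` would serve as the input vector for the residue at the window's center. When the reference labels come from a predictor such as Proteus rather than from experiment, the value returned by `q3` measures agreement with that tool, which is the distinction the abstract draws for the HPV set.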