Please use this identifier to cite or link to this item: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1185675
Title: Protein family membership governs exosite predictability across the structural proteome.
Authors: OMAGE, F. B.
MAZONI, I.
YANO, I. H.
NESHICH, G.
Affiliation: FOLORUNSHO BRIGHT OMAGE, UNIVERSIDADE ESTADUAL DE CAMPINAS; IVAN MAZONI, CNPTIA; INACIO HENRIQUE YANO, CNPTIA; GORAN NESHICH, CNPTIA.
Date Issued: 2026
Citation: Artificial Intelligence in the Life Sciences, v. 9, 100166, June 2026.
Description: Exosites, defined as protein surface regions that mediate macromolecular recognition at sites distinct from catalytic centers, represent emerging targets for selective drug design, yet their structural diversity has precluded systematic computational identification. Here we demonstrate that exosite prediction performance varies substantially across protein families, ranging from a Matthews correlation coefficient (MCC) of 0.47 for coagulation factors to 0.14 for kinases. Using ExositeDB, we developed STINGExoFind, a gradient boosting framework leveraging 87 structural descriptors from the STINGRDB2 database, and evaluated 180 proteins under leave-one-protein-out cross-validation (LOPO-CV). Coagulation proteases achieved a 50% success rate at the MCC ≥ 0.5 threshold, whereas kinases and caspases remained largely unpredictable. Ten structures spanning six families reached MCC ≥ 0.7, including MAPK/ERK2 (MCC = 0.86) within the otherwise challenging kinase family, indicating that high-confidence predictions remain achievable for specific proteins even in poorly performing families. These results establish exosite prediction as a family-specific rather than universal challenge: computational approaches can meaningfully guide experimental validation for coagulation factors and similarly consistent protein families, while structurally diverse families require experimental characterization. STINGExoFind is provided as a community resource to support future method development and exosite-targeting drug discovery.
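The abstract reports per-protein MCC under leave-one-protein-out cross-validation and family-level success rates at MCC ≥ 0.5, but this record does not describe how STINGExoFind computes them. The following is a minimal Python sketch under stated assumptions: each residue carries a binary exosite label, MCC is computed over the residues of each held-out protein, and a family's success rate is taken to be the fraction of its proteins whose MCC reaches the threshold. The names `run_lopo`, `fit_predict`, `lopo_splits`, and `family_success_rates` are hypothetical, and `toy_fit_predict` is a simple threshold classifier standing in for the gradient boosting model; it is not gradient boosting and not the authors' method.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical model interface: fit on training residues, return 0/1 labels for test residues.
FitPredict = Callable[[List[List[float]], List[int], List[List[float]]], List[int]]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary residue labels (1 = exosite)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def lopo_splits(protein_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out protein, train indices, test indices), one fold per protein."""
    for held_out in sorted(set(protein_ids)):
        train = [i for i, p in enumerate(protein_ids) if p != held_out]
        test = [i for i, p in enumerate(protein_ids) if p == held_out]
        yield held_out, train, test


def run_lopo(X, y, protein_ids, fit_predict: FitPredict) -> Dict[str, float]:
    """Score each protein with a model trained only on residues of the other proteins."""
    scores: Dict[str, float] = {}
    for held_out, train, test in lopo_splits(protein_ids):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores[held_out] = mcc([y[i] for i in test], preds)
    return scores


def family_success_rates(scores: Dict[str, float], family_of: Dict[str, str],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Fraction of proteins in each family whose held-out MCC is >= threshold."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for protein, score in scores.items():
        family = family_of[protein]
        totals[family] += 1
        if score >= threshold:
            hits[family] += 1
    return {family: hits[family] / totals[family] for family in totals}


def toy_fit_predict(X_train, y_train, X_test):
    """Toy stand-in for a gradient boosting model (NOT gradient boosting):
    thresholds feature 0 at the midpoint of the two class means.
    Assumes both classes occur in the training fold."""
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [1 if x[0] >= cut else 0 for x in X_test]


if __name__ == "__main__":
    # Made-up residues: one descriptor each, two residues per protein.
    X = [[0.9], [0.1], [0.8], [0.2], [0.7], [0.3]]
    y = [1, 0, 1, 0, 1, 0]
    ids = ["A", "A", "B", "B", "C", "C"]
    scores = run_lopo(X, y, ids, toy_fit_predict)
    print(scores)  # {'A': 1.0, 'B': 1.0, 'C': 1.0}
    print(family_success_rates(scores, {"A": "fam1", "B": "fam1", "C": "fam2"}))
```

Grouping folds by protein rather than by residue keeps residues from the same structure out of both the training and test sets at once, which is the purpose of LOPO-CV. In practice a real gradient boosting model over the structural descriptors would be passed as `fit_predict`.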
NAL Thesaurus: Protein structure
Keywords: Machine learning
Protein structure
Gradient boosting
Nanoenvironment descriptors
Drug discovery
Exosite prediction
ISSN: 2667-3185
DOI: https://doi.org/10.1016/j.ailsci.2026.100166
Type of Material: Journal article
Access: openAccess
Appears in Collections: Article in indexed journal (CNPTIA)

Files in This Item:
File | Size | Format
AA-Protein-family-2026.pdf | 925.33 kB | Adobe PDF
