Datum in ura / Date and time: 9.3.26 (11:00-12:00)
Predavalnica / Location: FAMNIT-MP6
Predavatelj / Lecturer: Janez Konc (University of Primorska and National Institute of Chemistry)
Naslov / Title: Graph-Theoretic Analysis of Water Binding Sites and SNP Pathogenicity: Maximum-Clique Methods and Insights from ProBiS on Human Proteins
Vsebina / Abstract: We investigate how different classes of protein binding sites, particularly conserved water binding sites, affect the pathogenicity of single-nucleotide polymorphisms (SNPs) in human proteins. Our analysis is grounded in a graph-theoretic framework implemented within the ProBiS platform, where protein surface regions are represented as similarity graphs and structurally conserved binding sites are identified via maximum-clique computations. In this work, we apply and further develop efficient maximum-clique algorithms tailored to large-scale protein structural comparison, enabling the processing of a comprehensive dataset of human protein structures from the Protein Data Bank (PDB).
Conserved water molecules, which correspond to highly stable and topologically coherent vertices within these similarity graphs, are essential for maintaining protein stability, mediating ligand recognition, and shaping conformational dynamics. Despite their central biochemical role, the relationship between water binding sites and the pathogenicity of nearby genetic variants has remained largely uncharacterized.
Our results demonstrate that SNPs located within or proximal to conserved water binding sites exhibit significantly elevated pathogenicity relative to SNPs in other structural environments. This previously unreported association provides new mechanistic insights into how perturbations of water-mediated interaction networks contribute to human disease. To facilitate further research, we provide an openly accessible dataset containing over 40,000 SNPs mapped to conserved water binding sites and more than 500,000 SNPs associated with other binding site categories. Coupled with our improved maximum-clique methodology, this dataset forms a foundation for advancing mathematical and computational approaches to predicting the pathogenicity of newly observed genetic variants and may ultimately support the rational design of targeted therapeutic strategies.
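For readers unfamiliar with the maximum-clique formulation mentioned in the abstract, the following is a minimal illustrative sketch in Python. It is not the ProBiS implementation or the algorithms presented in the talk. It assumes a toy setting in which two surface patches are given as labelled 3D points, builds a product (similarity) graph whose vertices are pairs of same-label points and whose edges join pairs with matching inter-point distances, and finds a maximum clique with a textbook Bron-Kerbosch search with pivoting and a simple size bound. The labels, coordinates, the tolerance `tol`, and the function names are all hypothetical.

```python
from itertools import combinations
from math import dist


def build_product_graph(site_a, site_b, tol=1.0):
    """Build a toy similarity graph for two labelled point sets.

    A vertex (i, j) pairs point i of site_a with point j of site_b when
    their labels agree. Two vertices are adjacent when the distance between
    the two points in site_a matches the distance between the two points in
    site_b to within `tol` (an assumed tolerance).
    """
    vertices = [(i, j)
                for i, (label_a, _) in enumerate(site_a)
                for j, (label_b, _) in enumerate(site_b)
                if label_a == label_b]
    adj = {v: set() for v in vertices}
    for (i, j), (k, l) in combinations(vertices, 2):
        if i == k or j == l:
            continue  # a point may be matched at most once
        d_a = dist(site_a[i][1], site_a[k][1])
        d_b = dist(site_b[j][1], site_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximum_clique(adj):
    """Return one maximum clique (Bron-Kerbosch with pivoting and a bound)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Hypothetical toy data: the first three points of each site are the same
# triangle up to a translation; the fourth point of each site is unrelated.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
          ("aromatic", (0, 4, 0)), ("donor", (9, 9, 9))]
site_b = [("donor", (1, 1, 1)), ("acceptor", (4, 1, 1)),
          ("aromatic", (1, 5, 1)), ("acceptor", (20, 0, 0))]

clique = maximum_clique(build_product_graph(site_a, site_b))
print(sorted(clique))
```

In this toy example the maximum clique corresponds to the largest set of point pairs that can be superimposed consistently, which is the sense in which a clique in a similarity graph marks a structurally conserved region. The exhaustive search shown here has exponential worst-case running time; the abstract's emphasis on efficient algorithms reflects the need to scale this kind of comparison to the full set of human structures in the PDB.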