PCD v1.0.0 · S.Y.A.L.I.S Labs · ClinVar-derived · AlphaFold2

Pharmacological Chaperone Database

Proteome-scale computational database of de novo pharmacological chaperones for pathogenic protein misfolding variants, generated by the REFOLD pipeline.

Build date: 2026-05-19
Total entries: 12,196
Diseases covered: 2,304
Avg. druggability (fpocket drug score): 0.834
ClinVar pathogenic missense (target): 178,597
Progress toward target: 6.8288% (LIVE)
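
The 6.8288% figure is consistent with the ratio of total entries to the ClinVar target count. A minimal check, using only the two numbers quoted in the summary (the variable names are illustrative):

```python
# Figures copied from the database summary.
total_entries = 12_196
target_variants = 178_597

# Share of the ClinVar pathogenic missense target that has an entry so far.
coverage_pct = 100 * total_entries / target_variants
print(f"{coverage_pct:.4f}%")
```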

Protein Misfolding & the Pharmacological Chaperone Gap

40–60% of pathogenic missense variants cause disease not by disrupting a protein's active site, but by triggering premature proteasomal degradation of the misfolded intermediate before it can reach its functional destination. The endoplasmic reticulum quality-control machinery flags structurally aberrant conformers and routes them for degradation via the ubiquitin-proteasome system.

The key mechanistic insight: these proteins retain native-like function; they simply never arrive. A small molecule that binds the misfolded intermediate and shifts the folding free-energy landscape toward the native basin constitutes a pharmacological chaperone. No systematic computational framework for their design has previously been described. REFOLD provides this framework.
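
The idea of shifting the folding landscape can be illustrated with a textbook two-state model. This is a sketch under simplifying assumptions that are not taken from REFOLD itself: the protein is either native or misfolded, and the compound binds only the native-like state with dissociation constant `kd`. Under those assumptions, ligand binding lowers the apparent folding free energy by RT·ln(1 + [L]/Kd). All names and example values below are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_native(dg_fold, temp_k=310.0):
    """Two-state native population; dg_fold = G_native - G_misfolded in kcal/mol."""
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temp_k)))


def dg_fold_with_ligand(dg_fold, ligand_conc, kd, temp_k=310.0):
    """Apparent folding free energy when the ligand binds only the native-like state.

    ligand_conc and kd must share the same units (for example, molar).
    """
    return dg_fold - R_KCAL * temp_k * math.log(1.0 + ligand_conc / kd)


# Hypothetical destabilized variant: native state disfavoured by 1 kcal/mol.
apo = fraction_native(1.0)
holo = fraction_native(dg_fold_with_ligand(1.0, ligand_conc=10e-6, kd=1e-6))
print(f"native fraction without ligand: {apo:.2f}, with ligand: {holo:.2f}")
```

In this toy model a compound at ten times its Kd is enough to make the native state the majority population, which is the qualitative behaviour expected of a pharmacological chaperone.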

Three FDA-approved pharmacological chaperones currently exist, each discovered by brute-force high-throughput screening against a single target at a cost exceeding $1B per molecule. REFOLD designs them from structural first principles, for every protein in the human proteome, fully automatically.

[Figure: 3D molecular rendering of a pharmacological chaperone bound in a transient pocket]

Representative transient pocket (cyan surface) identified in the misfolded intermediate of CFTR-F508del. De novo chaperone shown as amber sticks. Generated by REFOLD Stage 2–3.

[Figure: The Pharmacological Chaperone Gap and REFOLD Pipeline systemic overview]

Top: Native vs. mutant folding free-energy landscape. Bottom: REFOLD Stage 1→2→3 pipeline producing the proteome-wide chaperone map.

Stage | Objective | Method | Key Output | Performance
Stage 1 | Rescue amenability filter | GNN-LM Fusion Classifier | Rescue score ∈ [0,1]; threshold ≥ 0.70 | AUC 0.88 (Gaucher), 0.79 (CF)
Stage 2 | Transient pocket identification | ANM conformational sampling + fpocket α-sphere scoring | Cryptic pocket geometry, E_ij matrix, best conformation PDB | Druggability ≥ 0.75; absent in WT
Stage 3 | De novo chaperone design | SE(3)-equivariant diffusion on pocket E_ij conditioning | SMILES, physicochemical properties, composite score | Score = 0.45·affinity + 0.30·QED + 0.15·SA + 0.10·logP
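
The thresholds and the Stage 3 weighting in the table can be written out as a short sketch. The weights and cut-offs are taken from the table. Everything else is an assumption made for illustration: the function names, and in particular the idea that each of the four terms has already been normalized to [0, 1] with higher meaning better. The table does not say how affinity, SA, or logP are scaled.

```python
RESCUE_THRESHOLD = 0.70        # Stage 1 cut-off from the table
DRUGGABILITY_THRESHOLD = 0.75  # Stage 2 cut-off from the table

# Stage 3 weights from the table; they sum to 1.0.
WEIGHTS = {"affinity": 0.45, "qed": 0.30, "sa": 0.15, "logp": 0.10}


def passes_stage1(rescue_score):
    """A variant proceeds only if its rescue score meets the threshold."""
    return rescue_score >= RESCUE_THRESHOLD


def passes_stage2(druggability, present_in_wt):
    """A pocket qualifies if it is druggable enough and absent in the wild type."""
    return druggability >= DRUGGABILITY_THRESHOLD and not present_in_wt


def composite_score(affinity, qed, sa, logp):
    """Weighted sum of four terms, each ASSUMED to be pre-normalized to [0, 1]."""
    terms = {"affinity": affinity, "qed": qed, "sa": sa, "logp": logp}
    for name, value in terms.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in terms.items())


# Hypothetical candidate with made-up normalized terms.
example = composite_score(affinity=0.8, qed=0.6, sa=0.5, logp=0.5)
```

Because the weights sum to 1.0, a candidate that scores 1.0 on every normalized term receives a composite score of 1.0 under this assumed scaling.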
ML Architecture — Technical Detail

Part A: GNN feature extractor with atom-edge message passing + Transformer LM fusion → rescuability logits.

Part B: SE(3)-equivariant denoising network for structure-conditioned molecular generation (T=100→T=1 diffusion trajectory).

[Figure: GNN-LM Fusion Classifier and SE(3) Diffusion Model technical architecture]

Inputs: AlphaFold2 structure, ESM-2 embeddings (1280-d), FoldX ΔΔG, PSSM conservation scores. Conditioning: E_ij Cα–Cα distance matrix from Stage 2 pocket geometry.
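
As an illustration of the conditioning input, a Cα–Cα distance matrix can be computed directly from the Cα coordinates of the pocket residues. This sketch assumes E_ij is the plain Euclidean distance between Cα atoms i and j. The function name and the toy coordinates are hypothetical, and any further processing REFOLD applies to the matrix is not described here.

```python
import math


def ca_distance_matrix(ca_coords):
    """Return the symmetric matrix E with E[i][j] = distance between Cα i and Cα j.

    ca_coords is a sequence of (x, y, z) tuples, typically in ångströms.
    """
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy coordinates for three residues (not taken from any real structure).
pocket_ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
E = ca_distance_matrix(pocket_ca)
```

A distance matrix is unchanged by rotating or translating the structure, which makes it a natural conditioning signal for an SE(3)-equivariant model.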

[Figure: REFOLD pipeline end-to-end overview]
Aaryan Senthilvanan
Principal Investigator
S.Y.A.L.I.S Labs

REFOLD — Proteome-Scale Pharmacological Chaperone Design

REFOLD is a fully automated, end-to-end computational pipeline that accepts any human disease-associated missense mutation as input and produces: (1) a binary prediction of whether the variant drives pathogenesis via protein misfolding rather than active-site disruption; (2) the 3D transient-conformation structure of the mutant protein with cryptic binding pockets resolved; (3) de novo designed small-molecule candidates that bind those pockets and thermodynamically stabilize the native fold.

The PCD is the product of continuously running REFOLD across the complete human proteome — processing every pathogenic missense variant in ClinVar sequentially, with each result automatically injected into this living database.
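
The sequential process-and-inject loop described above might look roughly like the following. This is a hypothetical sketch, not the actual daemon: the table schema, the function name, and the `pipeline` callable (a stand-in for the three REFOLD stages, returning `None` for variants that are filtered out) are all assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3


def process_variants(variants, pipeline, conn):
    """Run `pipeline` on each variant in order and store every non-empty result.

    `pipeline(variant)` is assumed to return (smiles, score) or None.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "variant TEXT PRIMARY KEY, smiles TEXT, score REAL)"
    )
    written = 0
    for variant in variants:
        result = pipeline(variant)
        if result is None:
            continue  # variant was filtered out or produced no candidate
        smiles, score = result
        conn.execute(
            "INSERT OR REPLACE INTO entries (variant, smiles, score) VALUES (?, ?, ?)",
            (variant, smiles, score),
        )
        conn.commit()  # commit per variant so each result is visible immediately
        written += 1
    return written
```

Committing after every variant trades throughput for immediacy, which matches the description of each result entering the database as soon as it is produced.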

Human protein-coding genes: 20,430
ClinVar pathogenic missense variants: 178,597
Genes with known pathogenic variants: 5,992
Pipeline runtime per variant (avg.): ~4–8 min
Database update frequency: Continuous (daemon)
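
For a sense of scale, the quoted runtime and variant count imply the following wall-clock time for one full pass. This back-of-the-envelope sketch assumes strictly sequential processing on a single worker with no parallelism. The page does not state the actual hardware or degree of concurrency.

```python
variants = 178_597                 # ClinVar pathogenic missense variants
minutes_low, minutes_high = 4, 8   # average runtime range per variant


def sequential_days(n_variants, minutes_each):
    """Days needed to process n_variants one after another."""
    return n_variants * minutes_each / 60 / 24


low_days = sequential_days(variants, minutes_low)
high_days = sequential_days(variants, minutes_high)
print(f"{low_days:.1f} to {high_days:.1f} days")
```

Under that single-worker assumption, one pass over all variants would take on the order of one and a half to three years.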