S.Y.A.L.I.S Labs · Preprint · 2026
Computational Drug Discovery · Proteomics · Machine Learning
R.E.F.O.L.D
Proteome-Scale De Novo Design of Pharmacological Chaperones
for Context-Locked Cryptic Pockets
Aaryan Senthilvanan  ·  S.Y.A.L.I.S Labs  ·  2026
668+ Database Entries
231+ Diseases Covered
0.835 Avg. Druggability Score
178,597 ClinVar Variants

Abstract

Pathogenic missense mutations frequently cause disease through accelerated protein misfolding rather than direct functional disruption — a mechanism that leaves the vast majority of rare disease variants without any approved pharmacological intervention. Traditional structure-based drug discovery remains largely blind to these variants, relying on static wild-type templates that fail to capture the transient conformational trajectories unique to destabilized mutant ensembles.

To overcome this, we introduce REFOLD, a fully automated computational framework for the proteome-scale de novo design of pharmacological chaperones. REFOLD couples a multi-modal GNN-Transformer fusion classifier for rescue amenability scoring with an Anisotropic Network Model (ANM) that simulates 20 mutant transition-state conformations at 1.5 Å RMSD to expose hidden allosteric cryptic pockets that are undetectable in wild-type structures. A pharmacophore-guided evolutionary SMILES search, scored against RDKit molecular descriptors and the E_ij pairwise Cα–Cα distance matrix of pocket-lining residues, then designs druglike, synthetically accessible small molecules tailored to these coordinate constraints entirely from scratch.

We validate REFOLD computationally on two historically intractable Class II misfolding targets. For the GBA1 L444P variant (Gaucher disease Type I), the model bypassed the collapsed canonical active site (WT fpocket druggability: 0.11), identified a distal allosteric hinge at residues A40–R41–P42–C43–D63–S64–F65–R87–M88–E89–L90 (mutant pocket druggability: 0.926, volume: 462.6 ų), and generated a bis-piperidine aryl amide scaffold with MW 317 Da, SA score 2.7, and QED 0.795. For the CFTR G85E variant (Cystic Fibrosis), REFOLD similarly bypassed the VX-809 canonical binding site (druggability 0.003 in the mutant) and instead targeted the exposed ICL1-TM2 junction (druggability: 0.807, volume: 484.0 ų) with a fluorinated piperazine scaffold of MW 316 Da, SA score 1.7, and QED 0.869. In both cases, the molecules are designed to act as physical “kinetic splints” that stabilize misfolded intermediates and rescue structural integrity before ER quality control-mediated degradation.

Scaled across the full ClinVar pathogenic missense catalog (178,597 variants), REFOLD has to date generated 668 complete chaperone entries spanning 231 distinct diseases, with a mean pocket druggability of 0.835 across all accepted variants. All results are continuously published to the open-access Pharmacological Chaperone Database (PCD), providing interactive 3D transient-conformation visualizations, full Eij matrices, molecular property profiles, and SMILES strings for every entry — establishing a scalable blueprint for zero-shot orphan disease drug discovery.

Pipeline Architecture

INPUT: Pathogenic missense variant
→ RESCUE FILTER (Stage 1): GNN-LM Classifier · Rescue score ≥ 0.70 · Multi-modal GNN × Transformer, sequence + structure fusion → amenable
→ POCKET DETECTION (Stage 2): ANM + fpocket · 20 conformations · 1.5 Å RMSD · druggability threshold > 0.70 · Anisotropic Network Model · Voronoi alpha-sphere clustering → cryptic pocket
→ CHAPERONE DESIGN (Stage 3): SMILES Evolution · MW < 350 Da · SA < 3.5 · QED > 0.7 · Evolutionary search · RDKit pharmacophore scoring
→ OUTPUT: Chaperone + PDB atlas
Stage 1 · Rescue Filter
GNN-LM Fusion Classifier
A multi-modal graph neural network encodes the local residue contact graph of the mutant structure while a Transformer-based language model captures sequence-level evolutionary context. The fused representation produces a rescue amenability score; only variants with score ≥ 0.70 proceed. This eliminates loss-of-function variants that chaperones cannot rescue.
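The gating step can be illustrated with a minimal sketch. The text does not specify how the two representations are fused, so the concatenation-plus-logistic head below is an assumption for illustration only; the function names, embeddings, weights, and bias are all hypothetical. Only the 0.70 cutoff comes from the description above.

```python
import math

RESCUE_THRESHOLD = 0.70  # cutoff stated in the Stage 1 description


def rescue_score(graph_emb, seq_emb, weights, bias):
    """Fuse two embeddings by concatenation and apply a logistic head.

    Assumption: late fusion by concatenation. The real classifier may
    fuse the GNN and Transformer representations differently.
    """
    fused = list(graph_emb) + list(seq_emb)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_amenable(score, threshold=RESCUE_THRESHOLD):
    """Only variants scoring at or above the threshold proceed to Stage 2."""
    return score >= threshold


# Toy inputs (hypothetical values, not model outputs).
graph_emb = [0.8, -0.1]   # stands in for the residue contact graph encoding
seq_emb = [0.5, 0.3]      # stands in for the sequence language model encoding
weights = [1.5, 0.2, 1.0, 0.5]
bias = 0.0

score = rescue_score(graph_emb, seq_emb, weights, bias)
print(f"rescue score = {score:.3f}, amenable = {is_amenable(score)}")
```

The comparison is inclusive (≥), matching the stated rule, so a variant scoring exactly 0.70 would proceed.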
Stage 2 · Pocket Detection
ANM + fpocket
An Anisotropic Network Model samples 20 mutant transition-state conformations along low-frequency eigenmodes at 1.5 Å RMSD from the AlphaFold structure. fpocket runs Voronoi alpha-sphere clustering on each conformation; pockets absent from the WT ensemble but present in ≥ 3 mutant conformations are flagged as cryptic. The E_ij pairwise Cα–Cα distance matrix of lining residues is extracted as a pharmacophore conditioning signal.
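As a rough sketch of the two rules described above, the snippet below computes a pairwise Cα–Cα distance matrix and applies the cryptic-pocket flag (absent from every WT conformation, present in at least 3 mutant conformations). How a pocket is matched across conformations is not specified in the text, so the per-conformation booleans are assumed to be supplied by an upstream step; the function names and coordinates are hypothetical.

```python
import math

MIN_MUTANT_HITS = 3  # "present in >= 3 mutant conformations" from the text


def distance_matrix(ca_coords):
    """Pairwise C-alpha to C-alpha distances (in Å) for pocket-lining residues."""
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
        for i in range(n)
    ]


def is_cryptic(wt_hits, mutant_hits, min_mutant_hits=MIN_MUTANT_HITS):
    """Flag a pocket as cryptic.

    wt_hits / mutant_hits hold one boolean per conformation, True when
    the pocket was detected in that conformation (assumed precomputed).
    """
    return (not any(wt_hits)) and sum(mutant_hits) >= min_mutant_hits


# Toy C-alpha coordinates for three lining residues (hypothetical values).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
E = distance_matrix(coords)

# Pocket never seen in WT, seen in 4 of 20 mutant conformations.
flag = is_cryptic([False] * 20, [True] * 4 + [False] * 16)
print(E[0][1], E[1][2], flag)
```

The resulting matrix is symmetric with a zero diagonal, so only the upper triangle would need to be stored if it is used as a conditioning signal.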
Stage 3 · Chaperone Design
Evolutionary SMILES Search
Starting from a diverse seed population, an evolutionary SMILES search applies mutation, crossover, and elite selection over multiple generations. Each candidate is scored on a composite of fpocket affinity estimate, Lipinski Ro5, Veber oral bioavailability, SA score (< 3.5), QED (> 0.7), and pharmacophore complementarity to the E_ij matrix. Top-10 candidates per variant are stored.
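The selection step might look roughly like the sketch below, which applies the stated property thresholds (MW < 350 Da, SA < 3.5, QED > 0.7) as a hard gate and then keeps the top-k candidates by fitness. This is an assumption about how the pieces combine: the text describes a composite score, and the `fitness` field here is a hypothetical placeholder for it. Descriptor values are assumed to be precomputed (for example with RDKit); the `Candidate` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str
    mw: float       # molecular weight in Da
    sa: float       # synthetic accessibility score
    qed: float      # quantitative estimate of druglikeness
    fitness: float  # hypothetical composite score (placeholder)


def passes_property_gate(c, max_mw=350.0, max_sa=3.5, min_qed=0.7):
    """Thresholds taken from the pipeline description."""
    return c.mw < max_mw and c.sa < max_sa and c.qed > min_qed


def select_elites(population, k=10):
    """Drop candidates that fail the gate, then keep the k fittest."""
    viable = [c for c in population if passes_property_gate(c)]
    return sorted(viable, key=lambda c: c.fitness, reverse=True)[:k]


population = [
    # Properties as reported for the two validation chaperones;
    # the fitness values are placeholders, not results.
    Candidate("O=C(NCc1ccc(O)c(N2CCCCC2)c1)C1CCNCC1", 317.0, 2.7, 0.795, 0.5),
    Candidate("O=C(c1cc(F)ccc1F)N1CCN(Cc2ccccc2)CC1", 316.0, 1.7, 0.869, 0.6),
    # Hypothetical candidate that is too heavy and too hard to synthesize.
    Candidate("<hypothetical>", 420.0, 4.1, 0.55, 0.9),
]

elites = select_elites(population, k=10)
for c in elites:
    print(c.smiles, c.fitness)
```

Gating before ranking means a candidate with a high fitness but out-of-range properties can never be stored, which is one simple way to honor the thresholds.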

Validation Cases

GBA1 L444P · Gaucher Disease Type I
Druggability: 0.926
MW (Da): 317
SA Score: 2.7
QED: 0.795
Cryptic Pocket: Distal allosteric hinge, absent in WT (druggability 0.11), exposed by the L444P ensemble
Pocket-Lining Residues: A40, R41, P42, C43, D63, S64, F65, R87, M88, E89, L90
Generated Chaperone (SMILES): O=C(NCc1ccc(O)c(N2CCCCC2)c1)C1CCNCC1
CFTR G85E · Cystic Fibrosis
Druggability: 0.807
MW (Da): 316
SA Score: 1.7
QED: 0.869
Cryptic Pocket: ICL1-TM2 junction; the VX-809 canonical site is destroyed in the mutant (druggability 0.003)
Pocket-Lining Residues: I70, L73, R74, F77, F78, F81, L195
Generated Chaperone (SMILES): O=C(c1cc(F)ccc1F)N1CCN(Cc2ccccc2)CC1
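As a quick consistency check on the reported molecular weights, the sketch below sums standard atomic masses for each chaperone. The atom counts are an assumption: they were read off by hand from the two SMILES strings above (including implicit hydrogens) and are not taken from the source. In practice a toolkit such as RDKit would derive them directly from the SMILES.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998}


def molecular_weight(atom_counts):
    """Sum atomic masses over an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in atom_counts.items())


# Hand-derived molecular formulas (assumption, see lead-in).
gba1_chaperone = {"C": 18, "H": 27, "N": 3, "O": 2}          # C18H27N3O2
cftr_chaperone = {"C": 18, "H": 18, "F": 2, "N": 2, "O": 1}  # C18H18F2N2O

gba1_mw = molecular_weight(gba1_chaperone)
cftr_mw = molecular_weight(cftr_chaperone)
print(f"GBA1 L444P chaperone: {gba1_mw:.2f} Da")
print(f"CFTR G85E chaperone:  {cftr_mw:.2f} Da")
```

Under these formulas both values round to the figures on the cards (317 Da and 316 Da) and sit below the 350 Da design ceiling.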

Key Design Principles

Context-locked cryptic pockets
These pockets appear only in mutant conformational trajectories; they are not visible in WT crystal structures or static AlphaFold models. REFOLD specifically targets this structurally dynamic regime.
Kinetic splint mechanism
Generated chaperones are not designed to restore enzymatic function directly; they stabilize misfolded intermediates long enough to evade ERAD and traffic correctly to their destination organelle.
Fully de novo, zero-shot
No experimental binding data, crystal co-complex, or known ligand scaffold is used. Every chaperone is generated from scratch conditioned solely on the cryptic pocket geometry.
Proteome-scale automation
The pipeline runs unattended: ClinVar variant queue → AlphaFold fetch → ANM sampling → fpocket screening → SMILES evolution → GitHub push → live website update. One full entry takes ~2 minutes.
All data, code, and generated structures are openly available.