OpenADMET PXR Structure Challenge: Final Model Report

General Strategy

The goal was to create a simple model with few heuristics. I wanted to benchmark every method I could get my hands on and apply limited, physically defensible perturbations to that model.

Clean the chemistry: Identify the dominant tautomer and protonation state at pH 7.4 and use that ligand state for modeling.
Try dozens of base models: Test different cofolding methods, pose generators, and scoring functions to build base pose predictions.
Filter/fix: Remove poses that are physically implausible.
Principled swaps: Apply perturbations where physically defensible and highly supported by prior information and model consensus.
- Multivalent binding (11): Small ligands can bind multiple copies in PXR. Where this was indicated, the best two-copy model was used.
- PDB substructure support (1): Pin the ligand pose to the PDB pose if there is a similar substructure and overwhelming modeling support.
Minimize final pose: Remove ligand strain with constrained minimization.

Detailed Methods

1. Chemistry Cleaning

RDKit canonicalization.
Uni-pKa tautomer and protonation computation at pH 7.4.
Choose the most dominant state as the canonical ligand state for modeling.

The pocket of PXR likely has a much lower dielectric than bulk solvent. This approach is probably overstating the charge of these molecules. However, this seemed to work better than letting RDKit or the cofolding models try to infer chemistry.

2. Finding the Best Base Model: ESMFold2 with no MSA

I wanted to see if any cofolding method was clearly better than another at predicting PXR poses and scoring them. I ran a benchmark against both the PDB structures, after filtering out rifamycin-like structures and 8SV* structures, and the external scorer.

External Scoring Benchmark

Best-of-25 models by iPTM, after filtering.

Method	LDDT-PLI	BiSymRMSD	LDDT-LP
Chai1	0.416	4.593	0.902
Boltz2.1	0.434	4.479	0.895
OpenFold3	0.475	4.286	0.897
Boltz2	0.478	4.157	0.915
Protenix	0.504	3.879	0.910
ESMFold2 (no-MSA)	0.545	3.634	0.915

ESMFold2, computed without MSA because PXR plus ligand just barely fit on my 16 GB GPU, is head-and-shoulders above everything else. Adding MSA resulted in OOM every time. This was the biggest finding of my entire campaign and quite unexpected.

With cofolding, I have always seen that sampling more produces better results. I therefore increased sampling to 100 total models for each ligand with ESMFold2 and used that as my base set of structures.

3. Filtering on Physics

Cofolding can produce some interesting chemical and physical artifacts and happily score them quite well. I used PoseBusters to filter out poses with clashes and bad ligand geometry and required poses to pass every filter except non-aromatic_ring_non-flatness, which failed on roughly 80% of poses. Cofolding often produces slightly distorted ring configurations, but these do not affect the pose and are easily fixed by restrained minimization.

I tested some hydrogen-bond filters to remove obviously terrible interactions, including buried unsaturated hydrogen bonds and like-charged atom interactions, but these did not help scoring.

4. Principled Swaps

Multivalency: 11 swaps.

The PXR binding pocket is large and I wondered if multiple copies of a ligand could bind simultaneously in the pocket. This is seen in crystal structures, where a number of small molecules co-bind with estradiol. It is also hinted at by the organizers, who mention that any pose will be scored against the closest copy in the experimental structures. Cofolding produced meaningfully different poses in some cases when folding in two copies instead of one.

I used my best cofolding models, Boltz2 and ESMFold2, to cofold two copies of each ligand with MW < 302 Da. Against the ESMFold2 parent top model, I compared any two-copy model where the highest individual protein-ligand iPTM was greater than the iPTM from the single-copy model. I chose those poses to replace the single-copy model. This resulted in 11 swaps. The analogous Boltz2 experiment identified 10 swap candidates, all in the set of 11 from ESMFold2.

PDB anchor similarity: 1 swap.

A number of ligands (40) were identified with a moiety similar to one in a PXR PDB structure that was buried in a specific PXR sub-pocket. While it would have been tempting to dock all of these with restraints, many had small substitutions in the moiety or in other parts of the molecule that made me question their fit. I did GNINA docking (100 poses) on the matched crystal template for each of the 40 challenge ligands I identified. Out of these 40, only 8 found the native pocket of the analogous crystal moiety consistently (>10% of poses) and with at least one high GNN score in the top 10. In the end, only one ligand overwhelmingly preferred the crystal pose, so I decided to swap only the one I was absolutely confident in, x03406-1. I am sure this means it will be wrong.

5. Final Pose Minimization

Raw cofolding poses produce both small clashes and small ligand abnormalities that would not be seen in a crystal structure. To relax these, I did a three-step minimization, keeping heavy atoms highly restrained and then slowly relaxing those restraints to prevent large movements. My average RMSD before and after minimization is roughly 0.2 angstroms.

Other Approaches I Tried

Learning a pose-picking method from the PDB structures. I generated extensive cofolding of the set of re-refined crystal structures of PXR and set up an AI-assisted autoresearch loop to find the best or closest model out of 400 cross-method poses using 5-fold cross-validation. The results did not generalize to the external scorer, even when only applied to drug-like compounds. ESMFold2 performed better.
Pose consistency across methods. I expected good poses to be consistent across cofolding and docking methods. This was the case for ligands where ESMFold2 had very high confidence anyway. For anything with low confidence, the different cofolding methods were often in completely different areas of the pocket or in ProLIF fingerprint space.
Higher-order scoring functions. CNN-score and MM-GBSA were not helpful according to the external scorer.

Final Thoughts

I am extremely surprised at the ESMFold2 model being the best-performing method. I am interested to see how it performs against the larger suite of drug-like compounds, both in pose prediction and in correctly building the binding site.

Thanks so much to the organizers for putting together a great competition and to the rest of the participants for interesting discussions about methods. PXR is the most difficult target I have ever worked with, given the large amount of information available; I am happy to have had the chance to get to know it better.

Todos

Deposit all cofolding models.
Organize scripts to recreate the above and publish on GitHub.