Phase 1 Post-Mortem
Submission: activity_20260525_allraw_top12_activefit
Final Phase 1 Results: 2026-05-27
Model Architecture: CheMeleon fine-tuned representations + NNLS stacking + dual perturbation layers
Summary Metrics
| Metric | Value |
|---|---|
| Final rank | 14 / 338 |
| MAE | 0.4291 |
| R² | 0.6061 |
| Spearman ρ | 0.7947 (weakest metric — see below) |
| Activity call precision/recall (pEC₅₀ ≥ 5.0) | 0.848 |
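For reference, the MAE, R², and activity-call figures in this table can be recomputed from paired measured and predicted pEC₅₀ values roughly as follows. This is a minimal sketch using only the Python standard library; the function names and the toy values are illustrative assumptions, not the challenge's scoring code.

```python
from statistics import mean


def regression_metrics(y_true, y_pred):
    """Return (MAE, R^2) for paired measured and predicted pEC50 values."""
    mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, r2


def activity_call_metrics(y_true, y_pred, threshold=5.0):
    """Return (precision, recall) for the binary call pEC50 >= threshold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t >= threshold and p >= threshold)
    fp = sum(1 for t, p in pairs if t < threshold and p >= threshold)
    fn = sum(1 for t, p in pairs if t >= threshold and p < threshold)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy values for illustration only (not challenge data).
measured = [6.0, 5.5, 4.0, 2.0]
predicted = [5.5, 4.8, 5.1, 3.5]
print(regression_metrics(measured, predicted))
print(activity_call_metrics(measured, predicted))
```

A single value for both precision and recall, as reported in the table, occurs whenever the number of false positives equals the number of false negatives.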
Initial Thoughts
As soon as I saw the scatter plot, I smacked my head. I had spent the better part of the last two weeks chasing what I thought was active-region compression, which is clearly an issue (blue dots under the diagonal). However, the larger problem was at the bottom end, which I had neglected to investigate fully. Obviously there would be some duds in the analog expansion, yet my model was predicting pEC₅₀ > 3.0 for almost all compounds, while the training set shows that the assay happily reports values under 2.
What Worked
Active-compound prediction. For confirmed actives (pEC₅₀ ≥ 5.0, n=112) the model MAE is 0.292 with an underprediction bias of −0.207. The perturbation layers (the analog-series gate and the GNINA/UniMol2 docking lift) improved fidelity in the active tail.
Binary activity classification (threshold pEC₅₀ ≥ 5.0) came back at 84.8% precision and recall: the model correctly learns what an active PXR compound looks like even though the training set is heavily enriched for actives. Rank ordering within the mid-range pEC₅₀ (3.5–5.5) is also strong at Spearman ≈ 0.52–0.53 in-window.
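The in-window rank correlations quoted here can be computed by restricting Spearman ρ to a potency window. The sketch below is one way to do that in plain Python. It assumes the window is defined on the measured pEC₅₀ (not the predicted value) and is half-open, `[low, high)`; both choices, along with the function names and toy values, are assumptions for illustration.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_in_window(y_true, y_pred, low, high):
    """Spearman rho over compounds with low <= measured pEC50 < high."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t < high]
    if len(pairs) < 3:
        return float("nan")  # too few points for a meaningful rank correlation
    t, p = zip(*pairs)
    return spearman(list(t), list(p))


# Toy values for illustration only (not challenge data).
measured = [1.5, 2.5, 3.0, 4.0, 5.0, 6.0]
predicted = [3.9, 3.7, 3.6, 4.2, 4.9, 5.8]
print(spearman_in_window(measured, predicted, float("-inf"), 3.5))
print(spearman_in_window(measured, predicted, 3.5, 7.0))
```

Splitting the range this way makes it visible when a respectable global ρ hides a window in which the ordering is uninformative or reversed.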
What Failed
The inactive region. For pEC₅₀ < 3.5, Spearman ρ = −0.14, meaning the predicted rank order is weakly anti-correlated with the measured one. All 10 worst absolute errors fall in this range, with predictions overshooting the true potencies by 1.6–2.8 pEC₅₀ units. The model predicted a floor of ~3.5–4.0 for almost everything, while the true values go well below 2.0.
Root cause: The dose-response training set (4,139 compounds) is active-enriched — median pEC₅₀ 4.65, few confirmed inactives below 3.0. The NNLS positive-only constraint creates a prediction floor. The perturbation layers have no mechanism for recognizing true inactives and amplify the problem instead of correcting it.
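The floor effect can be illustrated with the stacking weights listed in the Architecture Recap. With non-negative weights, a blend can never fall below the sum of the weights times the lowest head prediction. The sketch below assumes the blend has no intercept and that every head emits a value on the pEC₅₀ scale; neither is confirmed here, and the per-head predictions are hypothetical.

```python
# Stacking weights as reported in the Architecture Recap.
WEIGHTS = {
    "tabicl_continuous": 0.311,
    "tabpfn_continuous": 0.204,
    "lgbm_continuous": 0.189,
    "lgbm_binary": 0.146,
    "auxmt_anchor": 0.143,
}


def blend(head_preds, weights=WEIGHTS):
    """Weighted sum of head predictions (assumes no intercept term)."""
    return sum(weights[name] * head_preds[name] for name in weights)


def blend_bounds(head_preds, weights=WEIGHTS):
    """Bounds on the blend that hold whenever all weights are non-negative."""
    total = sum(weights.values())
    lowest = min(head_preds[name] for name in weights)
    highest = max(head_preds[name] for name in weights)
    return total * lowest, total * highest


# Hypothetical head outputs for a truly inactive compound (illustration only).
heads = {
    "tabicl_continuous": 3.6,
    "tabpfn_continuous": 3.8,
    "lgbm_continuous": 3.7,
    "lgbm_binary": 4.0,
    "auxmt_anchor": 3.9,
}
low, high = blend_bounds(heads)
print(f"blend={blend(heads):.3f}, bounds=({low:.3f}, {high:.3f})")
```

Under these assumptions, if no head ever predicts below roughly 3.5, the blend cannot either, regardless of how the weights are fitted. Getting below the floor requires at least one head that predicts low values or a separate correction layer.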
Plate 1 top compounds. I generally consider it a success if a model correctly triages the top molecules into the first experimental batch. By this metric the model is not great:
- Top molecule (OADMET-0006546) is ranked 22nd by the model
- Second most potent is ranked 66th
- Only 2 of the top 8 true actives land in the predicted top 8
- Only 7 of the top 24 true actives land in the predicted top 24
In resource-constrained settings, this model does not do a satisfactory job of triaging the best compounds to the top.
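The triage numbers above are top-k overlap counts plus the predicted rank of individual compounds. A minimal sketch of both follows, assuming measured and predicted pEC₅₀ values are held in dictionaries keyed by compound ID; the IDs and values shown are made up for illustration, and ties are broken by insertion order.

```python
def top_k_ids(scores, k):
    """IDs of the k highest-scoring compounds (scores: dict of id -> pEC50)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_overlap(measured, predicted, k):
    """How many of the true top-k compounds also appear in the predicted top-k."""
    return len(top_k_ids(measured, k) & top_k_ids(predicted, k))


def predicted_rank(compound_id, predicted):
    """1-based position of a compound when sorted by predicted pEC50, descending."""
    ordered = sorted(predicted, key=predicted.get, reverse=True)
    return ordered.index(compound_id) + 1


# Hypothetical compounds for illustration only.
measured = {"cpd_a": 6.8, "cpd_b": 6.1, "cpd_c": 5.0, "cpd_d": 3.2, "cpd_e": 2.1}
predicted = {"cpd_a": 5.2, "cpd_b": 5.9, "cpd_c": 5.5, "cpd_d": 4.1, "cpd_e": 3.8}

print(top_k_overlap(measured, predicted, k=2))
print(predicted_rank("cpd_a", predicted))
```

Reporting the overlap at the batch sizes actually used for synthesis or testing (8 and 24 above) keeps the metric tied to the experimental decision it is meant to support.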
Phase 2 Strategy
- Inactive-aware tail perturbation — add a dedicated layer that downshifts predictions for molecules resembling training-set inactives (structural motifs associated with inactivity, binary classifier for “inactive probability,” conditional negative perturbation)
- Emax-derived features — the unblinded data includes Emax values (peak fold-change vs baseline) that can distinguish flat dose-response (true inactive) from steep-but-low response (weak active)
- Calibration post-hoc — isotonic regression on out-of-fold predictions to fix the CI coverage gap without touching point predictions
- Rank-aware loss — add model heads trained on pairwise ranking objectives so the ensemble has explicit incentive to correctly order compounds below the training floor
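As one possible shape for the inactive-aware tail perturbation in the first item, the sketch below applies a conditional negative shift driven by an "inactive probability". Everything here is an assumption for illustration: the gate, the maximum shift, the linear ramp, and the function name are hypothetical, and the classifier producing `p_inactive` is not shown.

```python
def inactive_aware_shift(pred, p_inactive, gate=0.5, max_shift=1.5):
    """Downshift a predicted pEC50 when an inactive classifier is confident.

    pred       -- blended pEC50 prediction before the correction
    p_inactive -- classifier probability that the compound is inactive
    gate       -- probability below which no shift is applied (hypothetical)
    max_shift  -- shift in pEC50 units applied at p_inactive == 1 (hypothetical)
    """
    if not 0.0 <= p_inactive <= 1.0:
        raise ValueError("p_inactive must be a probability in [0, 1]")
    if not 0.0 <= gate < 1.0:
        raise ValueError("gate must be in [0, 1)")
    if p_inactive <= gate:
        return pred  # leave likely actives and uncertain compounds untouched
    # Ramp linearly from 0 at the gate to max_shift at p_inactive == 1.
    strength = (p_inactive - gate) / (1.0 - gate)
    return pred - max_shift * strength


# Hypothetical predictions for illustration only.
for p in (0.2, 0.6, 0.9, 1.0):
    print(p, round(inactive_aware_shift(3.8, p), 3))
```

Because the shift is never positive, a layer of this form can only pull predictions down, so it leaves the active tail alone as long as the classifier stays below the gate for actives. The gate and maximum shift would need to be tuned on out-of-fold predictions.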
Architecture Recap
- PXR foundation models — ChemProp v2 / CheMeleon fine-tuned on 10,870 HTS molecules, one to a continuous target (log2fc_median) and one to a binary target (log2fc_gt_0.75)
- Blended regression — NNLS stacking of 5 heads on Butina out-of-fold predictions: TabICL continuous (0.311), TabPFN continuous (0.204), LGBM continuous (0.189), LGBM binary (0.146), auxMT anchor (0.143)
- Analog-series active perturbation — small perturbation for molecules with high RDKit2D/DrugLike similarity to high-active compounds (pEC₅₀ > 5.5)
- Docking-enhanced lift — GNINA cross-docking against 8 PXR crystal structures; top 12 poses embedded in UniMol2 and regressed against dose-response actives; perturbations capped at ±0.05 pEC₅₀ units
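The cap on the docking-derived perturbation amounts to clipping the adjustment before adding it to the blended prediction. A minimal sketch, assuming the raw adjustment arrives as a single number in pEC₅₀ units; the function name and the example values are hypothetical, and how the raw adjustment is regressed from the UniMol2 pose embeddings is not modeled here.

```python
DOCKING_CAP = 0.05  # cap stated in the recap, in pEC50 units


def apply_capped_lift(base_pred, raw_delta, cap=DOCKING_CAP):
    """Add a docking-derived adjustment, clipped to [-cap, +cap]."""
    if cap < 0:
        raise ValueError("cap must be non-negative")
    delta = max(-cap, min(cap, raw_delta))
    return base_pred + delta


# Hypothetical values: even a strongly negative raw adjustment moves the
# prediction by at most the cap.
print(round(apply_capped_lift(3.8, -2.0), 3))
print(round(apply_capped_lift(3.8, 0.02), 3))
```

With a cap of this size, the layer can refine ordering among close predictions, but it cannot by itself correct overshoots of 1.6–2.8 pEC₅₀ units like those seen in the inactive region.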