Phase 1 Post-Mortem

Submission: activity_20260525_allraw_top12_activefit
Final Phase 1 Results: 2026-05-27
Model Architecture: CheMeleon fine-tuned representations + NNLS stacking + dual perturbation layers

Full interactive report: predicted vs. true scatter, rank-order plot, waterfall decomposition, and per-compound hover tooltips.

Summary Metrics

Metric	Value
Final rank	14 / 338
MAE	0.4291
R²	0.6061
Spearman ρ	0.7947 (weakest metric — see below)
Activity call precision/recall (pEC₅₀ ≥ 5.0)	0.848
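For reference, the error and classification metrics in the table can be recomputed from paired true and predicted pEC₅₀ values with something like the sketch below. It assumes the standard definitions of MAE, R², precision, and recall; the challenge's exact scoring code is not shown here, so treat the function names and the toy values as illustrative only.

```python
from math import fsum

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted pEC50."""
    return fsum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination (1 - SS_res / SS_tot)."""
    mean_t = fsum(y_true) / len(y_true)
    ss_res = fsum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = fsum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def precision_recall(y_true, y_pred, threshold=5.0):
    """Precision and recall of the 'active' call at pEC50 >= threshold."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t >= threshold and p >= threshold)
    n_pred_active = sum(1 for p in y_pred if p >= threshold)
    n_true_active = sum(1 for t in y_true if t >= threshold)
    precision = tp / n_pred_active if n_pred_active else float("nan")
    recall = tp / n_true_active if n_true_active else float("nan")
    return precision, recall

# Toy values (made up, not challenge data).
y_true = [6.1, 5.4, 4.8, 3.0, 1.9]
y_pred = [5.8, 5.1, 5.2, 3.9, 3.6]

toy_mae = mae(y_true, y_pred)
toy_r2 = r2(y_true, y_pred)
toy_precision, toy_recall = precision_recall(y_true, y_pred)
```

Precision and recall coincide whenever the number of false positives equals the number of false negatives, which is one way a single figure can describe both.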

Initial Thoughts

As soon as I saw the scatter plot, I smacked my head. I had spent the better part of the last two weeks chasing what I thought was active-region compression, which is clearly an issue (blue dots under the diagonal). However, I had significantly bigger problems at the bottom end, which I had neglected to investigate fully. Obviously there would be some duds in the analogue expansion, yet my model predicted pEC₅₀ > 3.0 for almost all compounds, even though the training set shows the assay happily reports values under 2.

What Worked

Potency prediction for actives. For confirmed actives (pEC₅₀ ≥ 5.0, n=112) the model MAE is 0.292 with a healthy underprediction bias (−0.207). The perturbation layers (the analog-series gate and the GNINA/UniMol2 docking lift) successfully improved fidelity in the active tail.

Binary activity classification (threshold pEC₅₀ ≥ 5.0) came back at 84.8% precision and recall: the model correctly learns what an active PXR compound looks like even though the training set is heavily enriched for actives. Rank ordering within the mid-range pEC₅₀ (3.5–5.5) is also strong at Spearman ≈ 0.52–0.53 in-window.

What Failed

The inactive region. For pEC₅₀ < 3.5, Spearman ρ = −0.14: the rank order is slightly anti-correlated with the truth. All 10 worst absolute errors fall in this range, with predictions overshooting the true potencies by 1.6–2.8 pEC₅₀ units. The model predicted a floor of ~3.5–4.0 for almost everything, while the true values go well below 2.0.
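The in-window rank correlations quoted here can be reproduced along the lines of the sketch below. It assumes the window is defined on the true pEC₅₀ values and uses average ranks for ties; both are assumptions rather than details confirmed by the scoring pipeline, and the toy numbers are invented to mimic a prediction floor.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_in_window(y_true, y_pred, lo=float("-inf"), hi=float("inf")):
    """Spearman rho restricted to compounds with lo <= true value < hi."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if lo <= t < hi]
    if len(pairs) < 3:
        return float("nan")
    t, p = zip(*pairs)
    return spearman(list(t), list(p))

# Toy values: predictions sit on a floor near 3.6-3.9 for the true inactives.
y_true = [1.8, 2.4, 2.9, 3.3, 4.6, 5.2, 6.0]
y_pred = [3.9, 3.7, 3.8, 3.6, 4.5, 5.0, 5.7]

rho_all = spearman_in_window(y_true, y_pred)
rho_inactive = spearman_in_window(y_true, y_pred, hi=3.5)
```

In this toy case the overall correlation stays clearly positive while the inactive window is negative, which is the same qualitative pattern as the Phase 1 result.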

Root cause: The dose-response training set (4,139 compounds) is active-enriched, with a median pEC₅₀ of 4.65 and few confirmed inactives below 3.0. The NNLS positive-only constraint creates a prediction floor: with non-negative weights, the blend behaves like a weighted average of the heads and cannot fall much below what the heads themselves predict. The perturbation layers have no mechanism for recognizing true inactives and amplify the problem instead of correcting it.
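The floor can be illustrated with the stacking weights listed in the Architecture Recap. The sketch below assumes the blend is a plain weighted sum with no intercept and that every head outputs values on the pEC₅₀ scale; neither is confirmed above (the binary head in particular may be on a different scale), and the dictionary keys are hypothetical names.

```python
# Stacking weights from the Architecture Recap; key names are hypothetical.
WEIGHTS = {
    "tabicl_cont": 0.311,
    "tabpfn_cont": 0.204,
    "lgbm_cont": 0.189,
    "lgbm_binary": 0.146,
    "auxmt_anchor": 0.143,
}

def blend(head_preds, weights=WEIGHTS):
    """Weighted sum of head predictions (assumes no intercept term)."""
    return sum(weights[name] * head_preds[name] for name in weights)

def blend_lower_bound(head_preds, weights=WEIGHTS):
    """With non-negative weights and non-negative predictions, the blend
    cannot go below sum(weights) * min(head predictions)."""
    return sum(weights.values()) * min(head_preds[name] for name in weights)

# Hypothetical true inactive: one head gets it right (2.0),
# the other four sit near the training-set floor (3.8).
example = {
    "tabicl_cont": 2.0,
    "tabpfn_cont": 3.8,
    "lgbm_cont": 3.8,
    "lgbm_binary": 3.8,
    "auxmt_anchor": 3.8,
}

blended = blend(example)
floor = blend_lower_bound(example)
weight_sum = sum(WEIGHTS.values())
```

Even when the highest-weighted head predicts 2.0 in this made-up case, the blend stays above 3.2, because no head can be given a negative weight to pull the result further down.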

Plate 1 top compounds. I generally consider it a success if a model correctly triages the top molecules into the first experimental batch. By this metric the model is not great:

Top molecule (OADMET-0006546) is ranked 22nd by the model
Second most potent is ranked 66th
Only 2 of the top 8 true actives land in the predicted top 8
Only 7 of the top 24 true actives land in the predicted top 24
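The triage counts above are top-k overlaps between the true and predicted rankings. A minimal sketch of that calculation follows; the function names and the toy values are illustrative, and the way ties are broken (by input order) is an assumption.

```python
def top_k_indices(values, k):
    """Indices of the k largest values (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]

def top_k_overlap(y_true, y_pred, k):
    """How many of the true top-k compounds also land in the predicted top-k."""
    return len(set(top_k_indices(y_true, k)) & set(top_k_indices(y_pred, k)))

def predicted_rank(index, y_pred):
    """1-based rank the model assigns to the compound at `index`."""
    return top_k_indices(y_pred, len(y_pred)).index(index) + 1

# Toy values (made up): six compounds, most potent first in y_true.
y_true = [7.2, 6.8, 6.5, 6.1, 5.0, 4.2]
y_pred = [5.9, 6.3, 5.5, 6.0, 6.1, 4.0]

overlap_top3 = top_k_overlap(y_true, y_pred, k=3)
rank_of_best = predicted_rank(0, y_pred)
```

Setting k to the size of the first experimental batch (8 or 24 above) gives the "n of top k" figures directly.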

In resource-constrained settings, this model does not do a satisfactory job of triaging the best compounds to the top.

Phase 2 Strategy

Inactive-aware tail perturbation — add a dedicated layer that downshifts predictions for molecules resembling training-set inactives (structural motifs associated with inactivity, binary classifier for “inactive probability,” conditional negative perturbation)
Emax-derived features — the unblinded data includes Emax values (peak fold-change vs baseline) that can distinguish flat dose-response (true inactive) from steep-but-low response (weak active)
Calibration post-hoc — isotonic regression on out-of-fold predictions to fix the CI coverage gap without touching point predictions
Rank-aware loss — add model heads trained on pairwise ranking objectives so the ensemble has explicit incentive to correctly order compounds below the training floor
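As a rough illustration of the first item, a conditional negative perturbation could look like the sketch below. Everything in it is hypothetical: the gate, the maximum shift, and the linear ramp are placeholder choices, and the inactive probability is assumed to come from a separately trained binary classifier that does not exist yet.

```python
def inactive_aware_shift(pred, p_inactive, gate=0.5, max_shift=1.5):
    """Downshift a pEC50 prediction when the inactive classifier is confident.

    pred       : blended pEC50 prediction
    p_inactive : probability of inactivity from a (hypothetical) classifier
    gate       : no shift is applied at or below this probability
    max_shift  : shift in pEC50 units applied when p_inactive reaches 1.0
    """
    if not 0.0 <= gate < 1.0:
        raise ValueError("gate must be in [0, 1)")
    if p_inactive <= gate:
        return pred
    # Linear ramp from 0 at the gate to max_shift at p_inactive = 1.
    scale = (p_inactive - gate) / (1.0 - gate)
    return pred - max_shift * scale

# Hypothetical compounds sitting on the prediction floor.
likely_inactive = inactive_aware_shift(3.8, p_inactive=0.9)
likely_active = inactive_aware_shift(3.8, p_inactive=0.3)
```

Unlike the Phase 1 perturbation layers, this one only ever moves predictions down, and only for compounds the classifier flags, so predictions for actives would be left untouched.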

Architecture Recap

PXR foundation models — ChemProp v2 / CheMeleon fine-tuned on 10,870 HTS molecules, one to a continuous target (log2fc_median) and one to a binary target (log2fc_gt_0.75)
Blended regression — NNLS stacking of 5 heads on Butina out-of-fold predictions: TabICL continuous (0.311), TabPFN continuous (0.204), LGBM continuous (0.189), LGBM binary (0.146), auxMT anchor (0.143)
Analog-series active perturbation — small perturbation for molecules with high RDKit2D/DrugLike similarity to high-active compounds (pEC₅₀ > 5.5)
Docking-enhanced lift — GNINA cross-docking against 8 PXR crystal structures; top 12 poses embedded in UniMol2 and regressed against dose-response actives; perturbations capped at ±0.05 pEC₅₀ units
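The ±0.05 cap on the docking lift limits how far that layer can move any single prediction. The sketch below assumes the cap is a hard clip applied to the raw perturbation before it is added to the blend; the actual implementation may differ, and the function name is illustrative.

```python
def apply_capped_perturbation(base_pred, raw_delta, cap=0.05):
    """Add a perturbation to the blended prediction, clipped to +/- cap."""
    delta = max(-cap, min(cap, raw_delta))
    return base_pred + delta

# Hypothetical true inactive (pEC50 2.0) predicted at the floor (3.8).
true_value = 2.0
base_pred = 3.8
adjusted = apply_capped_perturbation(base_pred, raw_delta=-1.5)
remaining_error = adjusted - true_value
```

Even if the docking regressor asked for a large negative correction here, the capped layer could move the prediction by at most 0.05, which is small next to the 1.6–2.8 unit overshoots seen in the inactive region.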