
When PMH helps (and when it does not)

Adoption path

First: Pick a task (does your setup fit?). This page: honest expectations.

PMH is a matching principle for the training loss (main.pdf): estimate $\Sigma_{\text{task}}$, the geometry of label-preserving change between training and deployment; penalize the encoder Jacobian along a matched $\Sigma'$; and falsify with wrong-direction and isotropic arms before you trust a deployment metric. Pick your situation from the T1–T7 table in the README.
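One way to read "penalize the encoder Jacobian along a matched $\Sigma'$" is the quadratic form $\operatorname{tr}(J\,\Sigma' J^\top)$ with $\Sigma' = \hat W \hat W^\top$ for an orthonormal nuisance basis $\hat W$, which equals $\lVert J \hat W \rVert_F^2$. The NumPy sketch below assumes that form (check main.pdf for the exact objective) and estimates it with central finite differences. `jacobian_penalty_fd` is a hypothetical name, not part of the `pmh` API.

```python
import numpy as np


def jacobian_penalty_fd(encoder, x, W, eps=1e-4):
    """Hypothetical sketch: batch mean of ||J(x) W||_F^2 = tr(J W W^T J^T).

    encoder: maps an (n, d) array to an (n, m) array.
    x:       (n, d) batch of inputs.
    W:       (d, r) orthonormal nuisance basis, so Sigma' = W W^T (assumed form).
    """
    per_sample = np.zeros(x.shape[0])
    for k in range(W.shape[1]):
        w = W[:, k]
        # Central difference: directional derivative J(x) w for every sample.
        jw = (encoder(x + eps * w) - encoder(x - eps * w)) / (2 * eps)
        per_sample += np.sum(jw ** 2, axis=1)
    return float(per_sample.mean())


# Check against the closed form for a linear encoder h(x) = A x, whose Jacobian is A.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))


def encoder(x):
    return x @ A.T


W, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # orthonormal rank-2 basis
x = rng.normal(size=(8, 5))

fd = jacobian_penalty_fd(encoder, x, W)
exact = np.linalg.norm(A @ W, "fro") ** 2
print(fd, exact)   # equal up to floating-point error

# Directions in the null space of A cost nothing: the encoder ignores them.
null_basis = np.linalg.svd(A)[2][3:].T        # (5, 2) basis with A @ null_basis ~ 0
print(jacobian_penalty_fd(encoder, x, null_basis))
```

The penalty only charges the encoder for sensitivity inside the span of $\hat W$, which is why the choice of $\hat W$ (matched versus wrong-W) matters.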

This page is the honesty layer: the theory does not guarantee higher accuracy on every benchmark. Instead, it names failure modes (e.g. the Lemma D1 eigengap on Office-31; label-changing shifts are out of scope) and requires the Step 5 controls, so that any gain is tied to the estimated nuisance geometry rather than to generic training noise.
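To see what an eigengap measures, here is a minimal sketch under an assumed estimator (not necessarily the one in `pmh` or main.pdf): $\hat W$ is the top-`rank` eigenvectors of the covariance of paired label-preserving differences $x_B - x_A$, and the eigengap is $\lambda_r - \lambda_{r+1}$. When that gap is small, small perturbations of the data can rotate $\hat W$ substantially, which is the identifiability problem a preflight check is meant to flag. `estimate_nuisance_basis` is a hypothetical helper.

```python
import numpy as np


def estimate_nuisance_basis(x_a, x_b, rank):
    """Hypothetical estimator: top-`rank` eigenvectors of Cov(x_b - x_a).

    Returns (W_hat, eigenvalues in descending order, eigengap), where the
    eigengap is lambda_rank - lambda_{rank+1}.
    """
    cov = np.cov(x_b - x_a, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)            # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]    # largest first
    gap = evals[rank - 1] - evals[rank] if rank < len(evals) else np.inf
    return evecs[:, :rank], evals, gap


# Synthetic paired data: the label-preserving change is mostly along one axis.
rng = np.random.default_rng(1)
n, d = 500, 6
true_dir = np.eye(d)[0]
x_a = rng.normal(size=(n, d))
x_b = (x_a
       + rng.normal(scale=3.0, size=(n, 1)) * true_dir   # large nuisance moves
       + rng.normal(scale=0.1, size=(n, d)))            # small isotropic jitter

W_hat, evals, gap = estimate_nuisance_basis(x_a, x_b, rank=1)
alignment = abs(W_hat[:, 0] @ true_dir)   # 1.0 means the direction is recovered exactly
print(round(float(alignment), 3), round(float(gap), 2))
```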


Quick decision

```mermaid
flowchart TD
  Q1{Same labels on A and B?}
  Q1 -->|No| NO1["Do not use PMH: fix labels or use label-shift methods"]
  Q1 -->|Yes| Q2{New classes only at deploy?}
  Q2 -->|Yes| NO2["Do not use PMH: open-set / label shift"]
  Q2 -->|No| Q3{Any target / deploy signal?}
  Q3 -->|No unlabeled target| NO3[Collect target data or use style pairs for LLMs]
  Q3 -->|Yes| Q4{Representation h you can hook?}
  Q4 -->|No| NO4["Extract features first: sklearn G2 path"]
  Q4 -->|Yes| GO[Run check_applicability + preflight + controls]
```
```python
from pmh import check_applicability

print(check_applicability(
    stack="pytorch",  # or "sklearn" / "hf"
    n_source=500,
    n_target=200,
    has_target_labels=False,  # unlabeled target OK for estimation
).summary())
```
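If you want the same gate in code (for example in a notebook or a CI step), the flowchart maps onto a plain function. `quick_decision` is a hypothetical helper that mirrors the four questions; it is not part of `pmh` and does not replace `check_applicability`, which additionally takes the stack and sample sizes as inputs.

```python
def quick_decision(same_labels, new_classes_at_deploy, has_target_signal, can_hook_h):
    """Hypothetical mirror of the Quick decision flowchart (not part of pmh)."""
    if not same_labels:
        return "no: fix labels or use label-shift methods"
    if new_classes_at_deploy:
        return "no: open-set / label shift"
    if not has_target_signal:
        return "no: collect target data (or style pairs for LLMs)"
    if not can_hook_h:
        return "no: extract features first (sklearn G2 path)"
    return "go: run check_applicability + preflight + controls"


print(quick_decision(True, False, True, True))
# -> go: run check_applicability + preflight + controls
```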

When you are more likely to see a benefit

| Signal | Why |
|---|---|
| Clear domain shift (camera, site, corpus style) with the same semantics | PMH estimates directions that vary between A and B but should not change the label |
| Enough target data to estimate the geometry | Rule of thumb: 50+ unlabeled target samples; 200+ for stable D1/D4; more for high rank |
| ERM underperforms on target but source training looks fine | There is room to move the representation without destroying the source fit |
| Preflight passes (`pass` or a strong eigengap) | The estimated nuisance subspace is identifiable (check with `pmh-train doctor`) |
| Matched beats wrong-W and isotropic in your control table | The gain is tied to the estimated nuisance story, not to arbitrary regularization |
| End-to-end fine-tuning (Mode A) with a hook on h | The Jacobian penalty can reshape the representation itself; with frozen features and a linear probe there is much less it can change beyond ERM |
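The control-table row above depends on how the three penalty geometries are built. A minimal NumPy sketch, assuming (this is not the library's definition) that wrong-W is a random basis of the same rank drawn orthogonal to $\hat W$, and that the isotropic arm spreads the same total weight uniformly over all $d$ dimensions, so the arms differ in direction rather than strength. `control_arms` is a hypothetical helper.

```python
import numpy as np


def control_arms(W_hat, rng):
    """Hypothetical Sigma' for the matched, wrong-W and isotropic arms.

    W_hat: (d, r) orthonormal estimate of the nuisance basis, with r <= d - r.
    All three arms get the same trace (r): same strength, different directions.
    """
    d, r = W_hat.shape
    matched = W_hat @ W_hat.T
    # Wrong-W: a random rank-r basis drawn from the orthogonal complement of W_hat.
    complement = np.eye(d) - matched
    Q, _ = np.linalg.qr(complement @ rng.normal(size=(d, r)))
    wrong_w = Q @ Q.T
    # Isotropic: no preferred direction, same total weight.
    isotropic = np.eye(d) * (r / d)
    return {"matched": matched, "wrong_w": wrong_w, "isotropic": isotropic}


rng = np.random.default_rng(2)
W_hat, _ = np.linalg.qr(rng.normal(size=(8, 2)))
arms = control_arms(W_hat, rng)
for name, sigma in arms.items():
    print(name, round(float(np.trace(sigma)), 6))   # each trace is 2.0
```

Matching the trace keeps the comparison fair: if matched beats wrong-W at equal strength, the difference comes from direction, not from how hard the encoder is regularized.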

Toy sanity check: `PMH_QUICK=1 python scripts/demos/first_run_domain_shift.py` runs a synthetic shift on which PMH often beats ERM on target accuracy, in one minute on CPU.


When not to expect much (or use something else)

| Situation | What usually happens | Better approach |
|---|---|---|
| Frozen features + linear head on an easy DA benchmark | Small or no accuracy gain; CORAL may match or beat projection | See the Office-31 table below; try Mode A fine-tuning if possible |
| Very small target pool (< ~30) | Unstable $\hat{W}$, marginal preflight | More target data, lower rank, or a simpler nuisance (D2/D4) |
| Target already near source accuracy | Little headroom | ERM, and report that PMH was unnecessary |
| Label shift / new classes | PMH is the wrong tool | Open-set methods, class-balanced reweighting, separate heads |
| Only generic noise robustness | Isotropic arm may look similar to matched | Augmentation, adversarial training |
| No target domain at all | Cannot estimate the deployment nuisance | Collect unlabeled target batches |
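The "very small target pool" row is easy to reproduce on synthetic data. This sketch assumes $\hat W$ is the top eigenvector of a sample covariance with one direction of excess variance, and reports the average sine of the angle to the true direction over repeated draws. `mean_subspace_error` is a hypothetical helper; its numbers are illustrative and are not where the table's thresholds come from.

```python
import numpy as np


def mean_subspace_error(n, d=20, signal_var=4.0, trials=200, seed=0):
    """Average sin(angle) between the true nuisance direction and the top
    eigenvector of a sample covariance built from n synthetic samples."""
    rng = np.random.default_rng(seed)
    true_dir = np.zeros(d)
    true_dir[0] = 1.0
    errors = []
    for _ in range(trials):
        # Unit variance in every direction, plus `signal_var` extra along true_dir.
        z = rng.normal(size=(n, d))
        z[:, 0] *= np.sqrt(1.0 + signal_var)
        _, evecs = np.linalg.eigh(np.cov(z, rowvar=False))
        top = evecs[:, -1]                      # eigenvector of the largest eigenvalue
        cos = min(abs(top @ true_dir), 1.0)
        errors.append(np.sqrt(1.0 - cos ** 2))
    return float(np.mean(errors))


for n in (20, 50, 200, 1000):
    print(n, round(mean_subspace_error(n), 3))
```

The error falls as the pool grows; a weaker excess variance (a smaller eigengap) needs more samples for the same error.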

Honest reference numbers (do not cherry-pick)

Office-31 (T1, frozen ResNet-18 features, Amazon → DSLR)

Protocol: `paper_protocol=True`, preset `t1_office31_sklearn`, rank 32. Runbook: T1 classical · `scripts/demos/office31_sklearn.py`

| Arm | Target accuracy (holdout) | Comment |
|---|---|---|
| B0 (ERM) | 0.224 | Baseline |
| Matched PMH | 0.216 | Slightly below B0 on accuracy alone |
| CORAL | 0.268 | Strong on this linear frozen-feature setup |
| Isotropic control | 0.184 | Different objective, not “free accuracy” |

Takeaway: on this benchmark, matched projection does not beat ERM on accuracy, and CORAL scores highest of the four arms. PMH is still useful here for replication, geometry metrics (TDI, $D_N/D_S$), and falsification (wrong-W should not beat matched on both accuracy and geometry). Do not use this table as a marketing headline.

Synthetic domain shift (first-run demo)

A controlled shift in input space with a trainable backbone: here PMH often shows higher target accuracy than ERM because the representation can move. This is the right mental model for Mode A end-to-end training.
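The same mechanism shows up in a deliberately tiny model. The NumPy sketch below is not the first-run demo script and has no trainable backbone: it fits a linear logistic model, estimates a nuisance direction $u$ (by assumption) as the normalized difference between unlabeled target and source feature means, and adds the penalty $\lambda (w^\top u)^2$, which is the squared Jacobian of the logit along $u$. In the source, coordinate `x1` is a low-noise proxy for the label; at deployment it is shifted without changing any label. All function names are hypothetical.

```python
import numpy as np


def make_domain(n, offset, rng):
    """x0 carries the label with noise; x1 is a sharp label proxy whose
    location moves by `offset` at deployment (labels unchanged)."""
    y = rng.integers(0, 2, size=n)
    s = 2.0 * y - 1.0
    x0 = s + rng.normal(size=n)
    x1 = s + offset + rng.normal(scale=0.3, size=n)
    return np.column_stack([x0, x1]), y


def train_logreg(x, y, u=None, lam=0.0, lr=0.1, steps=3000):
    """Logistic regression by gradient descent. The logit's Jacobian w.r.t. x
    is w, so the optional term lam * (w @ u)**2 penalizes sensitivity along u."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad_w = x.T @ (p - y) / len(y)
        grad_b = float(np.mean(p - y))
        if u is not None:
            grad_w += 2.0 * lam * (w @ u) * u
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy(x, y, w, b):
    return float(np.mean(((x @ w + b) > 0) == y))


rng = np.random.default_rng(0)
x_src, y_src = make_domain(2000, offset=0.0, rng=rng)
x_tgt, y_tgt = make_domain(2000, offset=3.0, rng=rng)   # y_tgt used only for scoring

# Nuisance direction from unlabeled features only: normalized shift of the means.
u = x_tgt.mean(axis=0) - x_src.mean(axis=0)
u /= np.linalg.norm(u)

w_erm, b_erm = train_logreg(x_src, y_src)
w_pen, b_pen = train_logreg(x_src, y_src, u=u, lam=5.0)
acc_erm = accuracy(x_tgt, y_tgt, w_erm, b_erm)
acc_pen = accuracy(x_tgt, y_tgt, w_pen, b_pen)
print(f"target accuracy  ERM: {acc_erm:.3f}  penalized: {acc_pen:.3f}")
```

ERM leans on `x1` because it is nearly perfect in the source, so the deployment offset pushes most negatives across the threshold; the penalized fit keeps almost no weight along $u$ and falls back on `x0`.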


PMH vs other approaches (same goal, different lever)

| Approach | What moves | Controls built in? | Typical best when |
|---|---|---|---|
| ERM (source only) | Task loss on A | | Strong baseline; document target metric |
| Fine-tune on target | All weights on labeled B | | Many target labels available |
| CORAL / moment match | Feature covariance toward target | Optional baseline arm | Frozen features + linear classifier |
| DANN / domain adversary | Encoder vs domain classifier | External | Unlabeled target, classic DA setup |
| matching-pmh (matched) | Penalize sensitivity along $\hat\Sigma_{\text{task}}$ | wrong-W, isotropic arms | Same labels, target signal, hook on h, need for a credible claim |

On frozen features, compare the PMH arms with CORAL via `compare_arms_sklearn(..., include_coral=True)`; see T1 classical.
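For comparison, the standard CORAL recipe on frozen features fits in a few lines: whiten the source features with their own regularized covariance, re-color them with the target's, then train the linear head on the transformed source. This NumPy sketch follows that textbook recipe; it is not necessarily how `compare_arms_sklearn(..., include_coral=True)` builds its CORAL arm, and `coral` / `_sym_power` are hypothetical names.

```python
import numpy as np


def _sym_power(c, power):
    """Power of a symmetric positive-definite matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(c)
    return (evecs * evals ** power) @ evecs.T


def coral(x_source, x_target, eps=1.0):
    """Return source features re-colored to match the target covariance.

    eps * I regularizes both covariances (the original recipe adds the identity).
    """
    d = x_source.shape[1]
    c_s = np.cov(x_source, rowvar=False) + eps * np.eye(d)
    c_t = np.cov(x_target, rowvar=False) + eps * np.eye(d)
    return x_source @ _sym_power(c_s, -0.5) @ _sym_power(c_t, 0.5)


rng = np.random.default_rng(3)
x_s = rng.normal(size=(400, 4))
x_t = rng.normal(size=(300, 4)) @ np.diag([3.0, 1.0, 0.5, 2.0])
x_s_aligned = coral(x_s, x_t, eps=1e-8)   # tiny eps so the match is near exact
print(np.round(np.cov(x_s_aligned, rowvar=False), 2))
print(np.round(np.cov(x_t, rowvar=False), 2))
# A linear head would then be fit on (x_s_aligned, y_source) and scored on the target.
```

CORAL moves the features, not the encoder, which is one reason it can be strong in the frozen-feature, linear-head setting of the Office-31 table.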


How to know it “worked” (beyond accuracy)

  1. Preflight: `pass` before large training runs (`artifact.preflight` / `pmh-train doctor`).
  2. Target metric: accuracy / AUROC on held-out target, not source only.
  3. Falsification: matched > wrong-W on the deployment metric; isotropic should not beat matched on both accuracy and geometry (`evaluate_baseline_vs_pmh` / `evaluate_robust_fit`).
  4. Geometry (optional): `tdi_cls` and $D_N/D_S$ from `pmh.tdi` / `compare_arms_sklearn(..., include_geometry=True)`.
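```python
# sklearn (frozen features)
# Assumes x_source, y_source, x_target, y_target are already defined:
# frozen feature arrays and labels that share one label space.
from pmh import evaluate_baseline_vs_pmh

report = evaluate_baseline_vs_pmh(
    x_source, y_source, x_target, y_target,
    compare_to=("coral",),  # include CORAL as a baseline arm
)
print(report.summary())
```

```python
# PyTorch (ERM vs PMH on labeled target val_loader)
# Assumes model, classifier, train_loader, val_loader and the source/target
# batches (src, tgt) come from your own training setup.
from pmh import evaluate_robust_fit

report = evaluate_robust_fit(
    model, train_loader, val_loader,
    source_batches=src, target_batches=tgt,
    hook="auto", head=classifier, epochs=10,
)
print(report.summary())
# Then: compare_arms(...) for matched / wrong_w / isotropic
```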
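The falsification rules in steps 3 and 4, together with the checklist rule about isotropic-only wins, can be expressed as a small check over a control table. `interpret_controls` is a hypothetical helper, not part of `pmh`; it assumes higher is better for both the deployment metric and a single geometry score, so adapt it to how your geometry metric is oriented.

```python
def interpret_controls(arms):
    """Hypothetical reading of a control table (not part of pmh).

    `arms` maps arm name -> {"acc": deployment metric, "geom": geometry score},
    with higher assumed better for both. Expected keys: b0, matched, wrong_w, isotropic.
    """
    matched, wrong, iso = arms["matched"], arms["wrong_w"], arms["isotropic"]
    if matched["acc"] <= wrong["acc"]:
        return "falsified: wrong-W matches or beats matched on the deployment metric"
    if iso["acc"] > matched["acc"] and iso["geom"] > matched["geom"]:
        return "generic regularization: isotropic beats matched on accuracy and geometry"
    if matched["acc"] <= arms["b0"]["acc"]:
        return "consistent geometry, but no gain over B0"
    return "supported: matched beats B0 and wrong-W; isotropic does not dominate"


example = {  # made-up numbers, for illustration only
    "b0": {"acc": 0.70, "geom": 0.0},
    "matched": {"acc": 0.78, "geom": 0.60},
    "wrong_w": {"acc": 0.71, "geom": 0.20},
    "isotropic": {"acc": 0.74, "geom": 0.40},
}
print(interpret_controls(example))
# -> supported: matched beats B0 and wrong-W; isotropic does not dominate
```

The falsification check runs first on purpose: if wrong-W does as well as matched, a gain over B0 says nothing about the estimated nuisance geometry.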

Minimum checklist before production claims

  - [ ] `check_applicability` is `go` (not `no_go`)
  - [ ] Target holdout evaluated with the same label space as source
  - [ ] Matched compared to B0 and at least one control arm
  - [ ] If only isotropic wins, treat the gain as generic regularization
  - [ ] Report the protocol (Mode A vs B, rank, pool size) for reproducibility

Next steps

| Goal | Doc |
|---|---|
| Install and first run | QUICKSTART.md |
| Pick a paper task | 13 tasks |
| T1 Office-31 + sklearn | t01-classical.md |
| API reference | api/index.md |