Install with `pip install matching-pmh` and start on the demo loaders. Then use the notebooks and `pmh-train try` on
your own stack; expect to iterate until Step 5 passes on your deploy holdout.
See docs/START.md.
Summary: 12 blocks pass, 1 partial / limit (pre-registered criteria).
The Perturbation Matching Hypothesis (PMH) treats label-preserving deploy change as one estimation problem: learn the geometry of nuisance variation, train with a matched penalty, and falsify with wrong-direction and isotropic controls before claiming deploy gains.
**12 of 13** pre-registered blocks meet their pass criteria in the paper ([main.pdf](main.pdf)); see [docs/findings.html](docs/findings.html) for a block summary. Wins span classical projection, ViT noise robustness, pose and depth, domain adaptation (DomainNet, Cityscapes), molecules, code renames, speech, HAR, LLM style, and PGD robustness.
**T1 / Office-31** is the honest partial case: on frozen ResNet-18 features, CORAL can beat projection-only PMH on accuracy; PMH still beats ERM and wrong-W controls — illustrating Lemma D1 eigengap limits, not a silent library bug.
The falsification arms (matched vs wrong-W vs isotropic) recur across blocks and show that the gains are tied to the estimated nuisance geometry, not to generic regularization.
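To make the three-arm comparison concrete, here is a minimal NumPy sketch on synthetic data. It is not the `matching-pmh` API and does not reproduce any block in the paper: the data generator, the ridge-style closed form, and every name in it (`sample`, `estimate_nuisance_basis`, `fit`, `deploy_mse`) are assumptions made for illustration. It estimates a nuisance subspace from label-preserving pairs, fits a linear model with a penalty on that subspace, and compares it with a wrong-direction penalty and an isotropic penalty of equal trace.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 3, 2000  # feature dim, nuisance rank, samples per split

# Hypothetical ground truth: the first k basis vectors span the nuisance
# subspace, the next one carries the label signal.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
W_true, u_signal = Q[:, :k], Q[:, k]

def sample(n_samples, spurious):
    """Draw (X, y). In training the nuisance coordinates track the label
    (a spurious shortcut); at deploy time they vary independently."""
    y = rng.standard_normal(n_samples)
    if spurious:
        a = y[:, None] + 0.3 * rng.standard_normal((n_samples, k))
    else:
        a = rng.standard_normal((n_samples, k))
    s = y + 0.5 * rng.standard_normal(n_samples)  # noisy view of the label
    X = a @ W_true.T + np.outer(s, u_signal)
    X += 0.1 * rng.standard_normal((n_samples, d))
    return X, y

def estimate_nuisance_basis(n_pairs=500):
    """Estimate the nuisance subspace from differences of label-preserving
    pairs, keeping the top-k right singular vectors."""
    deltas = rng.standard_normal((n_pairs, k)) @ W_true.T
    deltas += 0.05 * rng.standard_normal((n_pairs, d))
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    return vt[:k].T

def fit(X, y, P, lam):
    """Minimise mean squared error plus lam * w^T P w (closed form)."""
    m = len(y)
    return np.linalg.solve(X.T @ X / m + lam * P, X.T @ y / m)

W_hat = estimate_nuisance_basis()
P_matched = W_hat @ W_hat.T

# Wrong-W control: a random rank-k subspace orthogonal to the estimate.
R = rng.standard_normal((d, k))
R -= P_matched @ R
W_wrong, _ = np.linalg.qr(R)
P_wrong = W_wrong @ W_wrong.T

# Isotropic control: same total penalty (trace k), spread over all directions.
P_iso = (k / d) * np.eye(d)

X_train, y_train = sample(n, spurious=True)
X_deploy, y_deploy = sample(n, spurious=False)

arms = {
    "erm": (np.zeros((d, d)), 0.0),
    "matched": (P_matched, 100.0),
    "wrong_W": (P_wrong, 100.0),
    "isotropic": (P_iso, 100.0),
}
deploy_mse = {}
for name, (P, lam) in arms.items():
    w = fit(X_train, y_train, P, lam)
    deploy_mse[name] = float(np.mean((X_deploy @ w - y_deploy) ** 2))

for name, mse in deploy_mse.items():
    print(f"{name:10s} deploy MSE = {mse:.3f}")
```

In this toy setup only the matched arm removes the shortcut, so it should show the lowest deploy error; if the wrong-W or isotropic arm did as well, the gain could not be attributed to the estimated geometry.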
| Block | Task | Lemma · backend | Headline result (paper) | Status | Task doc |
|---|---|---|---|---|---|
| T1 (`t01-classical`) | Classical ML + matched projection (ridge, SVM, k-NN, logistic) | Lemma D1 · sklearn | Ridge theorem + oracle-W on MNIST/Fashion/SVHN; Office-31: CORAL > PMH on frozen ResNet, PMH > B0 on SVM (**documented D1 eigengap** case). | Partial / honest limit | docs/tasks/t01-classical.md |
| T2A (`t02a-vit-isotropic`) | ViT / image classifier: isotropic sensor noise | Lemma D2 · pytorch | ViT-B/16 isotropic PMH: **+4.29 pp** mean ImageNet-C; TDI **−58%** at σ=0.10. | Pass | docs/tasks/t02a-vit-isotropic.md |
| T2B (`t02b-chexpert-isotropic`) | Medical imaging: hospital / scanner embedding shift | Lemma D2 · pytorch | CheXpert E1: best saliency **0.723**; ~**9×** lower embedding drift vs baseline. | Pass | docs/tasks/t02b-chexpert-isotropic.md |
| T3A (`t03a-pose-gradient`) | Pose / keypoints: camera & studio shift | Lemma D3 · pytorch | COCO pose E1_aniso: **54.49%** PCK@0.05 (+22.4 pp vs baseline 32.07%). | Pass | docs/tasks/t03a-pose-gradient.md |
| T3B (`t03b-depth-augmentation`) | Depth estimation: photometric shift | Lemma D3 · pytorch | Depth photometric hard stress: E1_aniso AbsRel **0.2152** (wins on combined_hard). | Pass | docs/tasks/t03b-depth-augmentation.md |
| T4A (`t04a-vision-domain`) | Vision domain shift (single-layer / ResNet) | Lemma D4 · pytorch | DomainNet real→sketch E1_multiscale: **42.15%** acc (+3.31 pp vs B0 38.84%). | Pass | docs/tasks/t04a-vision-domain.md |
| T4B (`t04b-multilayer-vision`) | Vision domain shift (multilayer FPN / U-Net) | Lemma D4 · pytorch | GTA5→Cityscapes rare-5 mIoU **30.75%** (+11.1 pp vs B0 19.68%). | Pass | docs/tasks/t04b-multilayer-vision.md |
| T5A (`t05a-qm9-molecule`) | Molecules / graphs (QM9-style) | Lemma D5 · pytorch | QM9 position PMH: clean MAE **24.921**; robust under σ=0.2 Å noise. | Pass | docs/tasks/t05a-qm9-molecule.md |
| T5B (`t05b-code-tokens`) | Code models: token-group shift | Lemma D5 · pytorch | Code rename stress: E1 rename_bacc_ratio **0.9383** vs B0 **0.8297**; wrong blocks fail. | Pass | docs/tasks/t05b-code-tokens.md |
| T6A (`t06a-speech-whisper`) | Speech / ASR: mic & room shift | Lemma D6 · pytorch | Whisper/Libri content-residual: other-WER **14.63%** (−8.6 pp vs 23.26%). | Pass | docs/tasks/t06a-speech-whisper.md |
| T6B (`t06b-temporal-har`) | Time-series / HAR: sensor drift | Lemma D6 · pytorch | HAR stress 3.0: balanced acc **0.4099** vs baseline **0.2794** (3 seeds). | Pass | docs/tasks/t06b-temporal-har.md |
| T7A (`t07a-llm-style`) | LLM: format / tone / template | Lemma D7 · hf | Style RM + DPO: sycophancy **38.5%→13.5%**; margin_pmh Style TDI **1.836**. | Pass | docs/tasks/t07a-llm-style.md |
| T7B (`t07b-adversarial-pgd`) | Adversarial / PGD perturbations | Lemma D7 · pytorch | CIFAR PGD-W pmh_aniso: TDI **0.878** (−19% vs 1.090); clean **80.9%**. | Pass | docs/tasks/t07b-adversarial-pgd.md |

- `pmh-train try --quick` or `try_pmh(...)` → ship verdict
- `report.save_html("deploy_report.html")` after `evaluate_robust_fit`
- [main.pdf](main.pdf) + task pages under docs/tasks/