PMH paper — block findings

Summary: 12 blocks pass, 1 is a partial / honest limit (pre-registered criteria)

The Perturbation Matching Hypothesis (PMH) treats robustness to label-preserving deployment change as a single estimation problem. It learns the geometry of nuisance variation and trains with a penalty matched to that geometry. Wrong-direction and isotropic controls then serve as falsification tests before any deployment gain is claimed.
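
The first two steps can be illustrated on a toy linear-regression problem. The sketch below is ours, not the library's `try_pmh` implementation. It rests on these assumptions:

- Nuisance geometry is estimated as the top-k eigenvectors of the covariance of paired differences `x_pert - x`, where `x_pert` is a label-preserving perturbation of `x`.
- The "matched penalty" is a quadratic ridge term `mu * ||W^T b||^2`.
- `estimate_nuisance_basis` and `matched_ridge` are hypothetical names.

```python
import numpy as np

def estimate_nuisance_basis(x, x_pert, k):
    """Top-k principal directions of the paired differences x_pert - x (toy estimator)."""
    d = x_pert - x
    cov = np.cov(d, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return evecs[:, ::-1][:, :k]            # columns = top-k nuisance directions

def matched_ridge(x, y, W, lam=1.0, mu=1e4):
    """Ridge regression plus mu * ||W^T b||^2, penalizing sensitivity along W (assumed form)."""
    p = x.shape[1]
    penalty = lam * np.eye(p) + mu * (W @ W.T)
    return np.linalg.solve(x.T @ x + penalty, x.T @ y)

rng = np.random.default_rng(0)
n, p = 500, 5
s = rng.normal(size=n)                        # label signal
x = rng.normal(size=(n, p))
x[:, 0] = s + 0.3 * rng.normal(size=n)        # causal feature
x[:, 1] = s + 0.3 * rng.normal(size=n)        # nuisance that happens to track the label in training
y = s

# Step 1: label-preserving perturbations move only the nuisance coordinate.
x_pert = x.copy()
x_pert[:, 1] += rng.normal(scale=2.0, size=n)
W = estimate_nuisance_basis(x, x_pert, k=1)

# Step 2: plain ridge (ERM) vs. the matched penalty.
beta_erm = np.linalg.solve(x.T @ x + np.eye(p), x.T @ y)
beta_pmh = matched_ridge(x, y, W)

# Deploy: the nuisance coordinate is now pure noise, unrelated to the label.
s_dep = rng.normal(size=2000)
x_dep = rng.normal(size=(2000, p))
x_dep[:, 0] = s_dep + 0.3 * rng.normal(size=2000)
mse_erm = float(np.mean((s_dep - x_dep @ beta_erm) ** 2))
mse_pmh = float(np.mean((s_dep - x_dep @ beta_pmh) ** 2))
print(np.round(W.ravel(), 3), round(mse_erm, 3), round(mse_pmh, 3))
```

In this toy, the estimated `W` recovers the perturbed coordinate. ERM splits its weight between the causal and the spurious feature. The matched penalty pushes the spurious coefficient toward zero, which lowers deploy error once the nuisance stops tracking the label.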

**12 of 13** pre-registered blocks meet their pass criteria in the paper ([main.pdf](main.pdf)); see [docs/findings.html](docs/findings.html) for a per-block summary. Wins span classical projection, ViT noise robustness, medical imaging (CheXpert), pose and depth, domain adaptation (DomainNet, Cityscapes), molecules, code renames, speech, HAR, LLM style, and PGD robustness.

**T1 / Office-31** is the honest partial case: on frozen ResNet-18 features, CORAL can beat projection-only PMH on accuracy; PMH still beats ERM and wrong-W controls — illustrating Lemma D1 eigengap limits, not a silent library bug.
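
Lemma D1 itself is stated in the paper and is not reproduced here. The NumPy sketch below gives a generic illustration of why a small eigengap limits any projection estimated from finite samples. It is our own toy, not the paper's Office-31 experiment, and it works as follows:

- Paired-difference samples are drawn whose true nuisance subspace has k directions with variance `1 + gap`; all other directions have variance 1.
- The top-k eigenvectors of the sample covariance are estimated.
- The sketch reports the estimated eigengap λ_k − λ_{k+1} and the sine of the largest principal angle to the true subspace.

Reading D1's condition as a Davis–Kahan-style gap is an assumption, and the function names are hypothetical.

```python
import numpy as np

def top_k_basis(cov, k):
    """Eigenvalues in descending order and the top-k eigenvectors."""
    evals, evecs = np.linalg.eigh(cov)          # ascending order
    return evals[::-1], evecs[:, ::-1][:, :k]

def subspace_error(U, V):
    """Sine of the largest principal angle between the column spaces of orthonormal U and V."""
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sqrt(max(0.0, 1.0 - cosines.min() ** 2)))

def simulate(gap, n=200, p=10, k=2, seed=0):
    """Estimate a k-dim nuisance subspace from n samples; return (estimated gap, subspace error)."""
    rng = np.random.default_rng(seed)
    scales = np.ones(p)
    scales[:k] = 1.0 + gap                      # true nuisance directions are the first k axes
    diffs = rng.normal(size=(n, p)) * np.sqrt(scales)
    evals, W_hat = top_k_basis(np.cov(diffs, rowvar=False), k)
    W_true = np.eye(p)[:, :k]
    return float(evals[k - 1] - evals[k]), subspace_error(W_true, W_hat)

for gap in (5.0, 0.05):
    est_gap, err = simulate(gap)
    print(f"true gap {gap:>4}: estimated gap {est_gap:.3f}, sin(max angle) {err:.3f}")
```

With a wide gap, the estimated subspace is close to the truth. With a gap near zero, the top-k directions are essentially arbitrary, so a projection built on them may remove the wrong variation.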

Falsification arms (matched vs. wrong-W vs. isotropic penalties) recur across blocks. They let gains be attributed to the estimated nuisance geometry rather than to generic regularization.
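
The three falsification arms can be compared on a toy linear-regression problem. This NumPy sketch rests on our own assumptions, and none of the names come from the library:

- The nuisance basis `W` is the top eigenvector of paired perturbation differences.
- Each arm is ridge regression with a quadratic penalty `b^T P b`.
- The wrong-W arm applies the same strength along a random unit direction orthogonal to `W`.
- The isotropic arm spreads the same total penalty (same trace) evenly over all directions.

```python
import numpy as np

def generalized_ridge(x, y, penalty):
    """Solve min_b ||y - x b||^2 + b^T penalty b."""
    return np.linalg.solve(x.T @ x + penalty, x.T @ y)

def make_data(rng, n, spurious=True):
    """Feature 0 is causal; feature 1 tracks the label only when spurious=True."""
    s = rng.normal(size=n)
    x = rng.normal(size=(n, 5))
    x[:, 0] = s + 0.3 * rng.normal(size=n)
    if spurious:
        x[:, 1] = s + 0.3 * rng.normal(size=n)
    return x, s

rng = np.random.default_rng(0)
x, y = make_data(rng, 500)                            # training distribution
x_dep, y_dep = make_data(rng, 2000, spurious=False)   # deploy: feature 1 is pure noise

# Matched basis: estimated from label-preserving perturbations along feature 1.
x_pert = x.copy()
x_pert[:, 1] += rng.normal(scale=2.0, size=len(x))
evals, evecs = np.linalg.eigh(np.cov(x_pert - x, rowvar=False))
W = evecs[:, [-1]]                     # eigh sorts ascending, so the last column is the top direction

# Wrong-W control: a random unit direction orthogonal to W (same rank, same strength).
v = rng.normal(size=(5, 1))
v -= W @ (W.T @ v)
W_wrong = v / np.linalg.norm(v)

p, k, lam, mu = 5, 1, 1.0, 1e4
arms = {
    "matched": lam * np.eye(p) + mu * W @ W.T,
    "wrong_W": lam * np.eye(p) + mu * W_wrong @ W_wrong.T,
    "isotropic": (lam + mu * k / p) * np.eye(p),      # same trace as the matched penalty
}
mse = {}
for name, P in arms.items():
    b = generalized_ridge(x, y, P)
    mse[name] = float(np.mean((y_dep - x_dep @ b) ** 2))
print({name: round(val, 3) for name, val in mse.items()})
```

The two controls rule out different explanations:

- If the matched arm won only through extra shrinkage, the isotropic arm should do about as well.
- If any low-rank penalty would do, the wrong-W arm should do about as well.

In this toy, only the matched arm removes the reliance on the spurious feature, so its deploy MSE is clearly lower than both controls.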

Thirteen blocks (T1–T7)

| Block | Task | Lemma · framework | Headline result (paper) | Status | Task doc |
|---|---|---|---|---|---|
| T1 `t01-classical` | Classical ML + matched projection (ridge, SVM, k-NN, logistic) | D1 · sklearn | Ridge theorem + oracle-W on MNIST/Fashion/SVHN; Office-31: CORAL > PMH on frozen ResNet, PMH > B0 on SVM; documented D1 eigengap case. | Partial / honest limit | `docs/tasks/t01-classical.md` |
| T2A `t02a-vit-isotropic` | ViT / image classifier: isotropic sensor noise | D2 · pytorch | ViT-B/16 isotropic PMH: +4.29 pp mean ImageNet-C; TDI −58% at σ=0.10. | Pass | `docs/tasks/t02a-vit-isotropic.md` |
| T2B `t02b-chexpert-isotropic` | Medical imaging: hospital / scanner embedding shift | D2 · pytorch | CheXpert E1: best saliency 0.723; ~9× lower embedding drift vs baseline. | Pass | `docs/tasks/t02b-chexpert-isotropic.md` |
| T3A `t03a-pose-gradient` | Pose / keypoints: camera & studio shift | D3 · pytorch | COCO pose E1_aniso: 54.49% PCK@0.05 (+22.4 pp vs baseline 32.07%). | Pass | `docs/tasks/t03a-pose-gradient.md` |
| T3B `t03b-depth-augmentation` | Depth estimation: photometric shift | D3 · pytorch | Depth photometric hard stress: E1_aniso AbsRel 0.2152 (wins on combined_hard). | Pass | `docs/tasks/t03b-depth-augmentation.md` |
| T4A `t04a-vision-domain` | Vision domain shift (single-layer / ResNet) | D4 · pytorch | DomainNet real→sketch E1_multiscale: 42.15% acc (+3.31 pp vs B0 38.84%). | Pass | `docs/tasks/t04a-vision-domain.md` |
| T4B `t04b-multilayer-vision` | Vision domain shift (multilayer FPN / U-Net) | D4 · pytorch | GTA5→Cityscapes rare-5 mIoU 30.75% (+11.1 pp vs B0 19.68%). | Pass | `docs/tasks/t04b-multilayer-vision.md` |
| T5A `t05a-qm9-molecule` | Molecules / graphs (QM9-style) | D5 · pytorch | QM9 position PMH: clean MAE 24.921; robust under σ=0.2 Å noise. | Pass | `docs/tasks/t05a-qm9-molecule.md` |
| T5B `t05b-code-tokens` | Code models: token-group shift | D5 · pytorch | Code rename stress: E1 rename_bacc_ratio 0.9383 vs B0 0.8297; wrong blocks fail. | Pass | `docs/tasks/t05b-code-tokens.md` |
| T6A `t06a-speech-whisper` | Speech / ASR: mic & room shift | D6 · pytorch | Whisper/Libri content-residual: other-WER 14.63% (−8.6 pp vs 23.26%). | Pass | `docs/tasks/t06a-speech-whisper.md` |
| T6B `t06b-temporal-har` | Time-series / HAR: sensor drift | D6 · pytorch | HAR stress 3.0: balanced acc 0.4099 vs baseline 0.2794 (3 seeds). | Pass | `docs/tasks/t06b-temporal-har.md` |
| T7A `t07a-llm-style` | LLM: format / tone / template | D7 · hf | Style RM + DPO: sycophancy 38.5%→13.5%; margin_pmh Style TDI 1.836. | Pass | `docs/tasks/t07a-llm-style.md` |
| T7B `t07b-adversarial-pgd` | Adversarial / PGD perturbations | D7 · pytorch | CIFAR PGD-W pmh_aniso: TDI 0.878 (−19% vs 1.090); clean 80.9%. | Pass | `docs/tasks/t07b-adversarial-pgd.md` |
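
The headline column mixes two kinds of deltas:

- Absolute differences in percentage points (pp), used for accuracy, PCK, mIoU and WER.
- Relative changes (%), used for TDI.

A short Python check using only numbers quoted in the table makes the distinction explicit. The helper names are ours.

```python
def pp_delta(new_pct, old_pct):
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct

def rel_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old

print(round(pp_delta(54.49, 32.07), 2))    # T3A PCK@0.05 -> 22.42
print(round(pp_delta(42.15, 38.84), 2))    # T4A accuracy -> 3.31
print(round(pp_delta(14.63, 23.26), 2))    # T6A other-WER -> -8.63
print(round(rel_change(0.878, 1.090), 1))  # T7B TDI, relative -> -19.4
```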

Use the library

- Golden path: `pmh-train try --quick` or `try_pmh(...)` → ship verdict
- Your own metrics: call `report.save_html("deploy_report.html")` after `evaluate_robust_fit`
- Read a block: [main.pdf](main.pdf) plus the task pages under `docs/tasks/`