
13 paper tasks (T1 → T7)

Tasks are listed in paper order. Your pipeline does not need to match a paper ID: pick the row whose deploy change sounds most like yours, open its notebook, Run All on the demo, then edit §8 to use your own data.

For full examples and estimation details, see the README section "Find your deployment story".

Matching principle (main.pdf): estimate $\Sigma_{\text{task}}$ → apply matched PMH on $h$ → Step 5 (compare matched vs. wrong vs. isotropic on a deploy holdout).
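The three steps can be illustrated with a small NumPy sketch under loud assumptions. Here we treat $\Sigma_{\text{task}}$ as the covariance of paired deltas $h_{\text{deploy}} - h$, build a "wrong" covariance (same spectrum, randomly rotated directions) and an isotropic one (same trace), and score each by the average Gaussian log-likelihood of holdout deltas. That score is a hypothetical stand-in for Step 5, not the paper's metric, the sketch does not implement PMH itself, and all function names are illustrative rather than the package API.

```python
import numpy as np


def estimate_sigma_task(h_clean: np.ndarray, h_deploy: np.ndarray) -> np.ndarray:
    """Assumed estimator: covariance of paired deltas h_deploy - h_clean (rows = samples)."""
    deltas = h_deploy - h_clean
    deltas = deltas - deltas.mean(axis=0, keepdims=True)
    return deltas.T @ deltas / (len(deltas) - 1)


def baselines(sigma: np.ndarray, rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Matched, 'wrong' (same spectrum, random directions) and isotropic (same trace)."""
    d = sigma.shape[0]
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    return {
        "matched": sigma,
        "wrong": q @ sigma @ q.T,
        "isotropic": np.trace(sigma) / d * np.eye(d),
    }


def mean_gaussian_loglik(deltas: np.ndarray, sigma: np.ndarray, ridge: float = 1e-6) -> float:
    """Average log N(delta; 0, sigma) over holdout rows (hypothetical Step 5 score)."""
    d = sigma.shape[0]
    s = sigma + ridge * np.eye(d)  # small ridge keeps s invertible
    _, logdet = np.linalg.slogdet(s)
    mahal = np.einsum("ij,ij->i", deltas, np.linalg.solve(s, deltas.T).T)
    return float(np.mean(-0.5 * (mahal + logdet + d * np.log(2 * np.pi))))


rng = np.random.default_rng(0)
d, n = 16, 2000
basis, _ = np.linalg.qr(rng.standard_normal((d, 3)))  # synthetic 3-dim nuisance subspace


def make_pairs(num: int) -> tuple[np.ndarray, np.ndarray]:
    """Synthetic (h, h_deploy) pairs whose deltas live mostly in `basis`."""
    h = rng.standard_normal((num, d))
    delta = 2.0 * rng.standard_normal((num, 3)) @ basis.T + 0.1 * rng.standard_normal((num, d))
    return h, h + delta


h_tr, h_tr_dep = make_pairs(n)  # used to estimate Sigma_task
h_ho, h_ho_dep = make_pairs(n)  # deploy holdout for the comparison
sigma = estimate_sigma_task(h_tr, h_tr_dep)
scores = {
    name: mean_gaussian_loglik(h_ho_dep - h_ho, s)
    for name, s in baselines(sigma, rng).items()
}
print(scores)  # on this synthetic data the matched covariance scores highest
```

Rotating the matched covariance keeps its eigenvalues but points them in the wrong directions, and the isotropic baseline spends the same total variance evenly; both lose likelihood whenever the real nuisance is anisotropic.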

| # | Task | Page | Notebook |
|---|------|------|----------|
| 1 | T1 Classical ML + matched projection (ridge, SVM, k-NN, logistic) | t01-classical.md | t01-classical.ipynb |
| 2 | T2A ViT / image classifier | t02a-vit-isotropic.md | t02a-vit-isotropic.ipynb |
| 3 | T2B Medical imaging | t02b-chexpert-isotropic.md | t02b-chexpert-isotropic.ipynb |
| 4 | T3A Pose / keypoints | t03a-pose-gradient.md | t03a-pose-gradient.ipynb |
| 5 | T3B Depth estimation | t03b-depth-augmentation.md | t03b-depth-augmentation.ipynb |
| 6 | T4A Vision domain shift (single-layer / ResNet) | t04a-vision-domain.md | t04a-vision-domain.ipynb |
| 7 | T4B Vision domain shift (multilayer FPN / U-Net) | t04b-multilayer-vision.md | t04b-multilayer-vision.ipynb |
| 8 | T5A Molecules / graphs (QM9-style) | t05a-qm9-molecule.md | t05a-qm9-molecule.ipynb |
| 9 | T5B Code models | t05b-code-tokens.md | t05b-code-tokens.ipynb |
| 10 | T6A Speech / ASR | t06a-speech-whisper.md | t06a-speech-whisper.ipynb |
| 11 | T6B Time-series / HAR | t06b-temporal-har.md | t06b-temporal-har.ipynb |
| 12 | T7A LLM | t07a-llm-style.md | t07a-llm-style.ipynb |
| 13 | T7B Adversarial / PGD perturbations | t07b-adversarial-pgd.md | t07b-adversarial-pgd.ipynb |

Which task fits your deploy change?

| Task | What changes at deploy | Examples | What we estimate | `nuisance=` |
|------|------------------------|----------|------------------|-------------|
| T1 | Frozen embeddings shift between sites | Office-31; two hospitals’ features; lab A→B tabular | Source-target subspace on features | `subspace` |
| T2A | Generic input noise / corruption | ImageNet-C; camera noise; blur/JPEG | Isotropic noise level σ | `isotropic` |
| T2B | Scanner / hospital appearance on X-ray | CheXpert site shift; DICOM pipeline change | Isotropic σ (medical deploy stress) | `isotropic` |
| T3A | Camera/lighting; same keypoints | Studio→in-the-wild pose; broadcast→fan photos | Augmentation feature deltas | `augmentation` |
| T3B | Photometric shift; depth meaning fixed | Lighting on depth maps; synthetic→real RGB-D | Augmentation deltas | `augmentation` |
| T4A | New camera, site, or visual domain | Photo→sketch; warehouse A→B; day→night cls | Train vs deploy feature Gram | `domain_shift` |
| T4B | Sim→real texture + layout (segmentation) | GTA5→Cityscapes; synthetic IR→real seg | Domain Gram (multilayer in paper) | `domain_shift` |
| T5A | Atom positions move; property label fixed | QM9 conformers; docked poses | Nuisance coordinates (positions) | `compositional` |
| T5B | Token groups change; task label fixed | Renames; comment strip; obfuscation | Nuisance token/block indices | `compositional` |
| T6A | Mic, room, codec; same words | Libri conditions; new microphone | Temporal / content-residual (see doc) | `temporal` |
| T6B | Sensor drift over time | HAR placement; IMU aging | Temporal residual on sequences | `temporal` |
| T7A | Tone/format; facts unchanged | Bulleted vs prose; formal vs casual bot | Style pairs (same content) | `style` |
| T7B | Adversarial perturbations at deploy | PGD robustness; spoof patches | Subspace from attack deltas | `style` (PGD path) |
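The "What we estimate" column can be read as a small family of covariance estimators, one per `nuisance=` value. The sketch below is our guess at minimal NumPy versions; the helper names, reading "temporal residual" as first differences along time, and reading "feature Gram" as the positive part of the deploy-minus-train Gram are all assumptions, not the package's code. `subspace` (T1) is left out because it describes a subspace rather than a covariance.

```python
import numpy as np


def cov_of_deltas(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """nuisance=augmentation / style (T3A, T3B, T7A): covariance of paired b - a."""
    d = b - a
    d = d - d.mean(axis=0)
    return d.T @ d / (len(d) - 1)


def isotropic_sigma(dim: int, sigma: float) -> np.ndarray:
    """nuisance=isotropic (T2A, T2B): sigma^2 * I from a single noise level."""
    return sigma**2 * np.eye(dim)


def domain_shift_sigma(h_train: np.ndarray, h_deploy: np.ndarray) -> np.ndarray:
    """nuisance=domain_shift (T4A, T4B): PSD part of deploy Gram minus train Gram."""
    g_train = h_train.T @ h_train / len(h_train)
    g_deploy = h_deploy.T @ h_deploy / len(h_deploy)
    w, v = np.linalg.eigh(g_deploy - g_train)
    return (v * np.clip(w, 0.0, None)) @ v.T  # keep only directions gaining energy


def compositional_sigma(dim: int, nuisance_idx: list[int], scale: float = 1.0) -> np.ndarray:
    """nuisance=compositional (T5A, T5B): variance only on known nuisance coordinates."""
    diag = np.zeros(dim)
    diag[nuisance_idx] = scale
    return np.diag(diag)


def temporal_sigma(seqs: np.ndarray) -> np.ndarray:
    """nuisance=temporal (T6A, T6B): covariance of h[t+1] - h[t]; seqs is (n_seq, T, dim)."""
    resid = np.diff(seqs, axis=1).reshape(-1, seqs.shape[-1])
    resid = resid - resid.mean(axis=0)
    return resid.T @ resid / (len(resid) - 1)


rng = np.random.default_rng(1)
h = rng.standard_normal((500, 8))
sig_aug = cov_of_deltas(h, h + 0.5 * rng.standard_normal((500, 8)))
h_dep = rng.standard_normal((500, 8))
h_dep[:, 0] *= 3.0  # extra deploy energy on coordinate 0
sig_dom = domain_shift_sigma(h, h_dep)
sig_comp = compositional_sigma(8, [0, 1, 2])
sig_temp = temporal_sigma(rng.standard_normal((20, 50, 8)).cumsum(axis=1))  # random walks
```

Each helper returns a `dim × dim` matrix, so any of them could feed the same downstream matching step; only the way the nuisance is observed (pairs, two sites, known indices, sequences) changes.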

T1 bundles seven classical subtasks in one notebook; T2–T7 map to blocks in main.pdf. Any row can be cloned for a similar deploy change, not only for the benchmark named in the paper.
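For the T1 row, one plausible reading of "matched projection" is: estimate the source-target subspace from features at the two sites, project it out with $P = I - UU^\top$, then fit the classical model on projected features. The sketch below assumes that reading, uses closed-form ridge regression as the model, and runs on synthetic data where one coordinate leaks label noise at the source site only. Helper names are hypothetical, and the T1 notebook may estimate the subspace differently.

```python
import numpy as np


def source_target_subspace(x_src: np.ndarray, x_tgt: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal d x k basis for the directions where the two sites differ most."""
    dmu = x_tgt.mean(axis=0) - x_src.mean(axis=0)
    dcov = np.cov(x_tgt, rowvar=False) - np.cov(x_src, rowvar=False)
    m = np.outer(dmu, dmu) + dcov @ dcov  # PSD; dcov @ dcov keeps dcov's eigenvectors
    w, v = np.linalg.eigh(m)
    return v[:, np.argsort(w)[::-1][:k]]  # top-k eigenvectors


def complement_projector(u: np.ndarray) -> np.ndarray:
    """P = I - U U^T removes the estimated nuisance subspace."""
    return np.eye(u.shape[0]) - u @ u.T


def fit_ridge(x: np.ndarray, y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge regression without intercept."""
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)


rng = np.random.default_rng(0)
n, d = 2000, 8
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])


def make_site(leak: bool, shift: float) -> tuple[np.ndarray, np.ndarray]:
    x = rng.standard_normal((n, d))
    eps = rng.standard_normal(n)
    y = x @ w_true + eps
    # Coordinate 5 is site-specific: it leaks the label noise at the source
    # site and is an independent, shifted reading at the target site.
    x[:, 5] = eps if leak else shift + rng.standard_normal(n)
    return x, y


x_s, y_s = make_site(leak=True, shift=0.0)
x_t, y_t = make_site(leak=False, shift=4.0)

p = complement_projector(source_target_subspace(x_s, x_t, k=1))
mse_raw = np.mean((x_t @ fit_ridge(x_s, y_s) - y_t) ** 2)
mse_matched = np.mean((x_t @ p @ fit_ridge(x_s @ p, y_s) - y_t) ** 2)
print(f"target MSE raw={mse_raw:.2f} matched={mse_matched:.2f}")
```

On this toy setup the raw model leans on the leaking coordinate and pays for it at the target site, while the projected model never sees that direction. Swapping `fit_ridge` for an SVM, k-NN, or logistic model leaves the projection step unchanged.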

Regenerate: `python scripts/render_handcrafted_tasks.py`
