13 paper tasks (T1 → T7)
Tasks are listed in paper order. Your pipeline does not need to match a paper ID: pick the row whose deploy change most resembles yours, open its notebook, choose Run All on the demo, then edit §8 to use your data.
For full examples and estimation details, see README, "Find your deployment story".
Matching principle (main.pdf): estimate $\Sigma_{\text{task}}$ → matched PMH on h → Step 5 (matched vs wrong vs isotropic on deploy holdout).
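The first step of the matching principle, estimating $\Sigma_{\text{task}}$, can be illustrated with a minimal sketch. It assumes paired features (the same inputs with and without the nuisance applied) and uses the uncentered second moment of the feature deltas as the estimate; both choices are assumptions made here for illustration, not the estimator defined in main.pdf. The helper names `delta_covariance` and `isotropic_covariance` are hypothetical, and the matched PMH step itself is not reproduced.

```python
import numpy as np


def delta_covariance(h_clean, h_nuisance):
    """Second-moment matrix of paired feature deltas (hypothetical helper).

    Assumes row i of both arrays comes from the same input, once without
    and once with the nuisance applied.
    """
    deltas = np.asarray(h_nuisance, dtype=float) - np.asarray(h_clean, dtype=float)
    return deltas.T @ deltas / deltas.shape[0]


def isotropic_covariance(sigma, dim):
    """Isotropic baseline: sigma^2 times the identity (hypothetical helper)."""
    return (sigma ** 2) * np.eye(dim)


# Synthetic demo data: the nuisance only moves features along coordinate 0.
rng = np.random.default_rng(0)
h_clean = rng.normal(size=(200, 4))
shift = np.zeros((200, 4))
shift[:, 0] = rng.normal(scale=2.0, size=200)
h_aug = h_clean + shift

sigma_task = delta_covariance(h_clean, h_aug)
sigma_iso = isotropic_covariance(0.5, 4)
```

In this toy setup the estimated matrix concentrates its mass on the one direction the nuisance actually moves, whereas the isotropic baseline spreads it evenly. That contrast is what the matched vs isotropic comparison in Step 5 is meant to probe.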
Which task fits your deploy change?
| Task | What changes at deploy | Examples | What we estimate | nuisance= |
|---|---|---|---|---|
| T1 | Frozen embeddings shift between sites | Office-31; two hospitals’ features; lab A→B tabular | Source−target subspace on features | subspace |
| T2A | Generic input noise / corruption | ImageNet-C; camera noise; blur/JPEG | Isotropic noise level σ | isotropic |
| T2B | Scanner / hospital appearance on X-ray | CheXpert site shift; DICOM pipeline change | Isotropic σ (medical deploy stress) | isotropic |
| T3A | Camera/lighting; same keypoints | Studio→in-the-wild pose; broadcast→fan photos | Augmentation feature deltas | augmentation |
| T3B | Photometric shift; depth meaning fixed | Lighting on depth maps; synthetic→real RGB-D | Augmentation deltas | augmentation |
| T4A | New camera, site, or visual domain | Photo→sketch; warehouse A→B; day→night classification | Train vs deploy feature Gram | domain_shift |
| T4B | Sim→real texture + layout (segmentation) | GTA5→Cityscapes; synthetic IR→real segmentation | Domain Gram (multilayer in paper) | domain_shift |
| T5A | Atom positions move; property label fixed | QM9 conformers; docked poses | Nuisance coordinates (positions) | compositional |
| T5B | Token groups change; task label fixed | Renames; comment strip; obfuscation | Nuisance token/block indices | compositional |
| T6A | Mic, room, codec — same words | Libri conditions; new microphone | Temporal / content-residual (see doc) | temporal |
| T6B | Sensor drift over time | HAR placement; IMU aging | Temporal residual on sequences | temporal |
| T7A | Tone/format; facts unchanged | Bulleted vs prose; formal vs casual bot | Style pairs (same content) | style |
| T7B | Adversarial perturbations at deploy | PGD robustness; spoof patches | Subspace from attack deltas | style (PGD path) |
T1 bundles seven classical subtasks in one notebook. T2–T7 map to blocks in main.pdf. Any row can be cloned for a similar deploy change; you are not limited to the benchmark named in the paper.
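For quick reference, the `nuisance=` column of the table can be written down as a small lookup. This is only a transcription of the table; the dictionary name `TASK_NUISANCE` and the helper `tasks_for` are hypothetical, and how the notebooks actually consume `nuisance=` is not shown here.

```python
# Task ID -> nuisance= value, transcribed from the table above.
TASK_NUISANCE = {
    "T1": "subspace",
    "T2A": "isotropic",
    "T2B": "isotropic",
    "T3A": "augmentation",
    "T3B": "augmentation",
    "T4A": "domain_shift",
    "T4B": "domain_shift",
    "T5A": "compositional",
    "T5B": "compositional",
    "T6A": "temporal",
    "T6B": "temporal",
    "T7A": "style",
    "T7B": "style",  # the table marks this row as the PGD path
}


def tasks_for(nuisance):
    """Return the task IDs whose nuisance= value equals `nuisance`."""
    return [task for task, name in TASK_NUISANCE.items() if name == nuisance]


if __name__ == "__main__":
    # List the rows that share the temporal nuisance type.
    print(tasks_for("temporal"))
```

Grouping by nuisance type makes it easier to see which rows are close relatives when no example matches your deploy change exactly.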
Regenerate: `python scripts/render_handcrafted_tasks.py`