template

Paper evidence: main.pdf · Block findings

Lemma: D7 · Stack: hf · Nuisance key: style

Production change: Same facts, different surface form (JSON, bullets, tone).

Notebook (Run All, built-in demo): t07a-llm-style.ipynb

pip install "matching-pmh[hf]"
# Open the notebook and Run All

What this task achieved (headline)

Matched $\Sigma_{\text{style}}$ RM: sycophancy 38.5%→13.5%, style gap 2.199→0.803; margin_pmh DPO Style TDI 1.836.

matched MC1	sycophancy	style gap
0.548	13.5%	0.803
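
The headline figures can also be read as absolute and relative reductions. The following is a minimal sketch that uses only the numbers quoted above; the helper name `reduction` is illustrative and is not part of `pmh`.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop as a fraction of `before`)."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


# Figures quoted in the headline (baseline -> matched RM).
syc_abs, syc_rel = reduction(38.5, 13.5)    # sycophancy, in percent
gap_abs, gap_rel = reduction(2.199, 0.803)  # style gap

print(f"sycophancy: -{syc_abs:.1f} points ({syc_rel:.1%} relative)")
print(f"style gap:  -{gap_abs:.3f} ({gap_rel:.1%} relative)")
```

Both metrics fall by a bit under two thirds relative to their starting values.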

Paper preset: t7a_style_d7 · from pmh.benchmark.presets import get_preset

Matched sycophancy 13.5%.

Preset: t7a_style_d7

margin_pmh Style TDI 1.836.

Preset: t7a_style_d7


from pmh import PMHTrainer, evaluate_robust_fit
# nuisance="style"

Factual drift or new knowledge at deployment.

Style-pair JSONL (same content, two surfaces) → estimate_style_sigma / D7 trainer.
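
As an illustration of the input described above, here is a minimal sketch of a style-pair JSONL file: one JSON object per line, each holding the same content in two surface forms. The field names (`content_id`, `surface_a`, `surface_b`) and the file name are assumptions made for this example, not the schema that `estimate_style_sigma` requires; check the notebook for the expected format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: same facts, two surface forms per record.
pairs = [
    {
        "content_id": "fact-001",
        "surface_a": "The capital of France is Paris.",
        "surface_b": json.dumps({"country": "France", "capital": "Paris"}),
    },
    {
        "content_id": "fact-002",
        "surface_a": "Water boils at 100 °C at sea level.",
        "surface_b": "- Substance: water\n- Boiling point: 100 °C (sea level)",
    },
]

# Hypothetical file name, written to a scratch directory.
path = Path(tempfile.mkdtemp()) / "style_pairs.jsonl"
with path.open("w", encoding="utf-8") as f:
    for pair in pairs:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# Read the file back and check that every record has two distinct surfaces.
with path.open(encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

for record in loaded:
    assert record["surface_a"] != record["surface_b"]
```

Keeping a stable identifier per record makes it easier to confirm later that both surfaces of a pair really carry the same facts.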