T7A — LLM — format / tone / template
Paper evidence: main.pdf · Block findings
Lemma: D7 · Stack: hf
Nuisance key: style
Production change: Same facts, different surface form (JSON, bullets, tone).
Notebook (Run All, built-in demo): t07a-llm-style.ipynb
pip install "matching-pmh[hf]"
# Open the notebook and Run All
What this task achieved (headline)
With the matched $\Sigma_{\text{style}}$ RM, sycophancy drops from 38.5% to 13.5% and the style gap from 2.199 to 0.803; margin_pmh DPO reaches a Style TDI of 1.836.
| matched MC1 | sycophancy | style gap |
|---|---|---|
| 0.548 | 13.5% | 0.803 |
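To put the headline figures in proportion, the short calculation below derives the absolute and relative reductions from the numbers quoted above. It is only arithmetic on those reported values; the variable and function names are illustrative and not part of matching-pmh.

```python
# Headline numbers copied from the text above (before -> after matching).
sycophancy_before, sycophancy_after = 38.5, 13.5  # percent
style_gap_before, style_gap_after = 2.199, 0.803


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the starting value that was removed."""
    return (before - after) / before


syc_abs_drop = sycophancy_before - sycophancy_after  # percentage points
syc_rel_drop = relative_reduction(sycophancy_before, sycophancy_after)
gap_abs_drop = style_gap_before - style_gap_after
gap_rel_drop = relative_reduction(style_gap_before, style_gap_after)

print(f"sycophancy: -{syc_abs_drop:.1f} pp ({syc_rel_drop:.1%} relative)")
print(f"style gap:  -{gap_abs_drop:.3f} ({gap_rel_drop:.1%} relative)")
# sycophancy: -25.0 pp (64.9% relative)
# style gap:  -1.396 (63.5% relative)
```

Both metrics shrink by roughly two thirds relative to their starting values.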
Paper preset: t7a_style_d7 · from pmh.benchmark.presets import get_preset
Subtasks (paper)
RM behavioral eval (TQA n=500)
Matched sycophancy 13.5%.
Preset: t7a_style_d7
Geometric DPO + style geometry
margin_pmh Style TDI 1.836.
Preset: t7a_style_d7
Synthetic alignment pipeline
Preset: t7a_style_d7
Run with matching-pmh
from pmh import PMHTrainer, evaluate_robust_fit
# nuisance="style"
Do not use PMH when
Deployment involves factual drift or new knowledge, not just a change of surface form.
Replace demo data with yours
Provide a style-pair JSONL file (each pair carries the same content in two surface forms) as input to estimate_style_sigma / the D7 trainer.
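A minimal sketch of what such a style-pair file could look like, using only the Python standard library. The field names (`id`, `surface_a`, `surface_b`) and the helper functions are assumptions made for illustration; the schema that estimate_style_sigma actually expects is not stated here, so check the notebook before adapting this.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: these field names are assumptions, not the library's format.
REQUIRED_FIELDS = {"id", "surface_a", "surface_b"}

# Each record states the same fact in two surface forms (plain prose vs. JSON / bullet).
pairs = [
    {
        "id": "ex-001",
        "surface_a": "The capital of France is Paris.",
        "surface_b": '{"question": "capital of France", "answer": "Paris"}',
    },
    {
        "id": "ex-002",
        "surface_a": "Water boils at 100 degrees Celsius at sea level.",
        "surface_b": "- Boiling point of water (sea level): 100 degrees Celsius",
    },
]


def write_style_pairs(path: Path, records: list) -> None:
    """Write one JSON object per line (JSONL)."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_style_pairs(path: Path) -> list:
    """Read a JSONL file back, rejecting records that lack a required field."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            records.append(record)
    return records


# Round-trip through a temporary file to check the format.
with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = Path(tmp_dir) / "style_pairs.jsonl"
    write_style_pairs(jsonl_path, pairs)
    loaded = read_style_pairs(jsonl_path)

print(f"loaded {len(loaded)} style pairs")
```

Keeping both surfaces in one record makes it hard for a pair to drift apart in content, which is the property the style-pair format relies on.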