Contract Testing for LLM Writing Style Across Prompt Versions
When you update prompts, tone drift is almost guaranteed unless you test style like a product requirement. Treat writing style as a contract with pass/fail rules.
1) Define a style rubric as testable constraints
{
"required_sections": ["motivation", "step", "pitfall"],
"forbidden_phrases": ["Final teaching note", "In conclusion"],
"max_bullet_density": 0.35
}
2) Run prompt fixtures through old and new templates
def run_fixture(model, prompt_template, fixture):
out = model.generate(prompt_template.format(**fixture))
return evaluate_style(out)