FID-003

Held-Out Multi-Turn Pastoral Pressure Tests

Do faith-facing AI systems that perform well on single-turn benchmark items also handle multi-turn, emotionally loaded, pastoral-adjacent situations without fabricating authority, overcomplying, missing escalation, or replacing human care?

Why this matters

The question behind the brief.

Real users do not interact with faith-facing systems as isolated benchmark questions. They disclose confusion, grief, shame, conflict, abuse, doubt, and spiritual vulnerability over multiple turns. Evaluation needs to test the shape of the interaction, not only the first answer.

Metadata