FideAI
← Calls for Research

FID-003

Held-Out Multi-Turn Pastoral Pressure Tests

Do faith-facing AI systems that perform well on single-turn benchmark items also handle multi-turn, emotionally loaded, pastoral-adjacent situations without fabricating authority, overcomplying, missing escalation, or replacing human care?

Why this matters

The question behind the brief.

Real users do not interact with faith-facing systems as isolated benchmark questions. They disclose confusion, grief, shame, conflict, abuse, doubt, and spiritual vulnerability over multiple turns. Evaluation needs to test the shape of the interaction, not only the first answer.

Ways to help

Move this from question to evidence.

Draft scenarios.

Review pastoral and crisis boundaries.

Build multi-turn runner support.

Analyze whether single-turn scores predict pressure failures.

Contribute

Choose a public issue path or contact Fide AI.