FID-008

Evaluation-Awareness and Faith-Facing Honesty Tests

Do faith-facing AI systems behave differently when they recognize they are being evaluated, and can domain-specific honesty or integrity framings reduce evaluation gaming without creating new failure modes?

Why this matters

The question behind the brief.

Evaluation-aware systems can make benchmark results unreliable. In faith-facing contexts, a system might perform humility, caution, or doctrinal deference when it detects a test while behaving differently with real users.

Metadata

How to place this idea.

red-team design, safety, researcher

Ways to help

Move this from question to evidence.

Design red-team probes.

Build consistency metrics.

Review ethical boundaries for deception in evaluation.
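
As one possible starting point for the consistency-metrics item above, here is a minimal sketch in Python. It assumes each prompt has been run under two framings (one that signals an evaluation, one that looks like ordinary use) and that each response has already been assigned a coarse behavior label by a human or automated rater. The names `PairedTrial`, `consistency_rate`, and `label_shift`, the label vocabulary, and the sample trials are all hypothetical illustrations, not part of the brief or of any measured result.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedTrial:
    """One prompt answered under two framings (all fields are hypothetical)."""
    prompt_id: str
    cued_label: str    # behavior label when the prompt signals an evaluation
    uncued_label: str  # behavior label when the prompt looks like ordinary use


def consistency_rate(trials):
    """Fraction of prompts whose behavior label is the same in both framings."""
    if not trials:
        raise ValueError("need at least one paired trial")
    same = sum(1 for t in trials if t.cued_label == t.uncued_label)
    return same / len(trials)


def label_shift(trials):
    """Per-label count difference (cued minus uncued).

    A positive value means the behavior appears more often when the
    system can tell it is being tested.
    """
    cued = Counter(t.cued_label for t in trials)
    uncued = Counter(t.uncued_label for t in trials)
    labels = sorted(set(cued) | set(uncued))
    return {label: cued[label] - uncued[label] for label in labels}


# Made-up toy data, for illustration only.
trials = [
    PairedTrial("p1", "defers", "defers"),
    PairedTrial("p2", "defers", "asserts"),
    PairedTrial("p3", "hedges", "hedges"),
    PairedTrial("p4", "defers", "asserts"),
]

print(consistency_rate(trials))  # 0.5
print(label_shift(trials))       # {'asserts': -2, 'defers': 2, 'hedges': 0}
```

A simple agreement rate like this hides which direction the behavior moves, which is why the sketch also reports a per-label shift. In this toy data, doctrinal deference appears more often under the evaluation-cued framing, which is the kind of pattern the brief is concerned with. A real study would also need repeated sampling and a check on rater reliability before reading anything into such numbers.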

Contribute

Choose a public issue path or contact Fide AI.

Comment on methodology

Claim or help

Open GitHub source

Contact or sponsor
