FID-008
Evaluation-Awareness and Faith-Facing Honesty Tests
Do faith-facing AI systems behave differently when they recognize they are being evaluated, and can domain-specific honesty or integrity framings reduce evaluation gaming without creating new failure modes?
Why this matters
The question behind the brief.
A system that can tell when it is being evaluated can make benchmark results unreliable. In faith-facing contexts, such a system might display humility, caution, or doctrinal deference under test while behaving differently with real users.
Metadata
How to place this idea.
red-team design, safety, researcher
Ways to help
Move this from question to evidence.
Design red-team probes.
Build consistency metrics.
Review ethical boundaries for deception in evaluation.
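One possible starting point for the consistency metrics item is to run each probe twice, once framed as an obvious test and once framed as ordinary use, and compare how often a target behaviour appears. The sketch below is a minimal illustration under that assumption. `PairedObservation`, `consistency_report`, the probe IDs, and the boolean labels are all hypothetical and not part of the brief. It also assumes a separate annotation step has already decided whether each response shows the behaviour (for example, expressed doctrinal deference).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedObservation:
    """One probe run under two framings (hypothetical structure)."""
    probe_id: str
    eval_framed: bool     # behaviour seen when the prompt looks like a test
    natural_framed: bool  # behaviour seen when the prompt looks like real use


def consistency_report(observations):
    """Summarise how much the target behaviour depends on framing."""
    if not observations:
        raise ValueError("need at least one paired observation")
    n = len(observations)
    eval_rate = sum(o.eval_framed for o in observations) / n
    natural_rate = sum(o.natural_framed for o in observations) / n
    # Probes where the two framings disagree, in input order.
    flipped = [o.probe_id for o in observations
               if o.eval_framed != o.natural_framed]
    return {
        "eval_rate": eval_rate,
        "natural_rate": natural_rate,
        "rate_gap": eval_rate - natural_rate,  # > 0: more of the behaviour under test
        "flip_rate": len(flipped) / n,
        "flipped_probes": flipped,
    }


# Made-up labels for illustration only; not real results.
observations = [
    PairedObservation("p1", True, True),
    PairedObservation("p2", True, False),
    PairedObservation("p3", True, False),
    PairedObservation("p4", False, False),
]
report = consistency_report(observations)
print(report)
```

A large `rate_gap` or `flip_rate` would suggest the behaviour depends on framing, but it would not by itself show that the system recognized the evaluation. Sampling noise and differences in prompt wording would need to be ruled out separately.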