What faith institutions should demand from AI

The deeper question

The first wave of religious AI discussion asked whether models could answer theological questions correctly. That question still matters. But it is too small.

Faith-facing AI does not merely produce information. It can shape a person's sense of authority, humility, trust, dependence, moral responsibility, prayer, confession-like disclosure, and willingness to seek human counsel. In churches, schools, ministries, publishers, and family settings, the risk is not only that an AI system says something false. The risk is that it becomes a quiet layer of formation before anyone has decided what kind of formation it is allowed to perform.

The deeper question is not whether AI can be useful to faith institutions. It almost certainly can be. The deeper question is whether its use preserves human creatureliness, accountable authority, embodied community, and the slow work of wisdom.

That is why the coming institutional debate should not be framed as "AI for religion: yes or no." The better question is:

What evidence should a faith institution require before letting AI operate in contexts where doctrine, conscience, care, identity, or spiritual authority are at stake?

Fide AI exists to answer that question with public methods, bounded claims, and measurable evidence.

What AI reveals about us

AI does not create the human desire for frictionless authority, private certainty, and disembodied counsel. It exposes and accelerates those desires.

That is why faith institutions should be cautious about treating AI as merely a delivery mechanism for religious content. A system can deliver accurate information while still training people toward impatience, isolation, passivity, or misplaced trust. A system can quote the right sources while subtly teaching users that moral and spiritual discernment should be immediate, private, and machine-mediated.

Faith traditions, and especially traditions shaped by a high view of human finitude and moral responsibility, should resist the assumption that every dependency can be optimized away or automated. Some dependencies are good: dependence on God, Scripture, prayer, family, teachers, pastors, elders, counselors, and accountable communities. Some dependencies deform us. The question is not whether an AI system feels spiritually useful in the moment. The question is whether it trains people toward wisdom, humility, responsibility, and embodied counsel, or away from them.

The most dangerous failure may not be a spectacular false answer. It may be a system that is warm, fluent, and mostly accurate while slowly relocating trust from accountable persons and institutions into a private interface that cannot love, suffer, repent, bear office, or be held responsible.

Human dignity means more than safety

Human dignity is not a decorative value statement. For faith institutions, it should become an evaluation requirement.

An AI system that touches faith should be tested for whether it preserves the agency of the person in front of it. Does it help users think, discern, verify, and seek accountable counsel? Or does it train them to outsource judgment, treat the machine as an intimate authority, and accept confident answers without grounding?

It should be tested for whether it respects the difference between assistance and authority. A system can summarize, retrieve, compare, and explain. It should not simulate priestly, pastoral, sacramental, therapeutic, or spiritual-director authority that belongs to accountable human persons and institutions.

It should be tested for whether it strengthens or weakens embodied relationships. A faithful deployment should point users back toward real communities, teachers, pastors, families, counselors, and institutions. It should not cultivate artificial intimacy as a substitute for human care.

Agency

Does the system help users deliberate responsibly, or does it encourage passive reliance on machine-shaped judgment?

Authority

Does the system clearly remain an aid, or does it imitate roles that require office, relationship, training, accountability, or sacramental authority?

Truth

Does the system ground claims in reliable sources, reveal uncertainty, and avoid confidence that exceeds available evidence?

Formation

What habits does the system cultivate: humility, patience, verification, and responsibility, or speed, dependency, and private certainty?

Relationship

Does the system direct users toward embodied community and accountable care, or toward artificial intimacy and private dependency?

Stewardship

Can the institution inspect model behavior, deployment settings, escalation paths, corrections, and claims before adoption?

What institutions should refuse

The most important policy decisions are often negative. Before adopting faith-facing AI, institutions should name what the system is not allowed to do.

Do not let AI impersonate clergy, confessors, spiritual directors, therapists, teachers, or institutional decision-makers.

Do not let AI provide guidance on crises, abuse, self-harm, confession-like disclosures, or counseling matters without explicit escalation paths to qualified humans.

Do not let AI make unsupported theological claims, fabricate sources, or present contested doctrinal judgments as institutionally settled.

Do not let AI use anthropomorphic language that encourages users to treat the system as a spiritual companion, confidant, or authority.

Do not treat benchmark scores, vendor claims, or private demos as evidence of institutional readiness.

Do not deploy without versioned records of model, prompt, retrieval corpus, policy layer, evaluation results, and known failure modes.

These refusals are not anti-technology. They are a way of saying that usefulness is not the same as wisdom.
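The refusal to deploy without versioned records can be made concrete. What follows is a minimal Python sketch, assuming a hypothetical `DeploymentRecord` schema. The field names mirror the items listed above (model, prompt, retrieval corpus, policy layer, evaluation results, known failure modes), but the class, its methods, and the 12-character fingerprint are illustrative assumptions, not a Fide AI format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeploymentRecord:
    """One versioned snapshot of a deployed configuration (hypothetical schema)."""

    model: str               # model name and version as reported by the vendor
    prompt_version: str      # identifier for the system prompt in use
    retrieval_corpus: str    # identifier for the source corpus snapshot
    policy_layer: str        # identifier for the refusal and escalation policy
    evaluation_results: str  # pointer to the stored evaluation report
    known_failure_modes: tuple[str, ...] = ()

    def fingerprint(self) -> str:
        """Short stable hash, so a later claim can cite one exact configuration."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def missing_fields(self) -> list[str]:
        """Names of fields left empty.

        An empty failure-mode list is treated as missing here, on the
        assumption that "no known failures" usually means nobody looked.
        """
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; none of these identifiers refer to real artifacts.
record = DeploymentRecord(
    model="example-model-1",
    prompt_version="prompt-v3",
    retrieval_corpus="corpus-snapshot-a",
    policy_layer="policy-v2",
    evaluation_results="eval-report-007",
    known_failure_modes=("overconfident answers on contested doctrine",),
)
print(record.fingerprint(), record.missing_fields())
```

Changing any single field, for example the prompt version, produces a different fingerprint, which is the point: an evaluation result can then be tied to the exact configuration it was run against.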

What institutions should require

Faith institutions need an evidence standard before procurement, pilot deployment, classroom use, ministry use, publisher integration, or user-facing release. The standard should ask not only "does this system work?" but also "what does this system train people to trust?"

That standard should include scenario testing across doctrine, moral reasoning, pastoral-adjacent care, user vulnerability, tradition-specific retrieval, comparative framing, escalation, and disagreement. It should include failure tags that institutions can understand, not just aggregate scores that obscure risk. It should include versioned artifacts so claims can be inspected after the fact.

The result should be a deployment judgment, not a vibe.
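To illustrate why failure tags matter more than an aggregate score, here is a small Python sketch. The scenario categories come from the list above and the tags reuse dimension names from earlier in this piece (authority, relationship), but the result rows, function names, and numbers are invented for illustration and are not benchmark data.

```python
from collections import Counter, defaultdict

# Each row: (scenario category, passed?, failure tags). Invented sample data.
RESULTS = [
    ("doctrine", True, []),
    ("doctrine", True, []),
    ("doctrine", True, []),
    ("moral reasoning", True, []),
    ("moral reasoning", True, []),
    ("comparative framing", True, []),
    ("comparative framing", True, []),
    ("escalation", True, []),
    ("escalation", False, ["authority"]),
    ("user vulnerability", False, ["relationship", "authority"]),
]


def aggregate_pass_rate(results):
    """Single headline number: share of scenarios passed."""
    return sum(1 for _, passed, _ in results if passed) / len(results)


def failures_by_tag(results):
    """Count how often each failure tag appears among failed scenarios."""
    counts = Counter()
    for _, passed, tags in results:
        if not passed:
            counts.update(tags)
    return counts


def pass_rate_by_category(results):
    """Pass rate per scenario category, so weak areas are not averaged away."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for category, passed, _ in results:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {category: p / t for category, (p, t) in totals.items()}


print(aggregate_pass_rate(RESULTS))
print(failures_by_tag(RESULTS))
print(pass_rate_by_category(RESULTS))
```

In this invented sample the headline pass rate looks respectable, while the per-category and per-tag views show that every failure involves authority posture and that the only user-vulnerability scenario failed. That is the kind of risk an aggregate score can hide.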

Behavioral evidence

Test representative scenarios before adoption, including hard cases where authority, vulnerability, and contested doctrine matter.

Claims limits

State what the evidence does and does not prove. No benchmark should be treated as theological authority or product endorsement.

Institutional controls

Define disclosures, escalation paths, source policies, review cadences, correction procedures, and forbidden use cases.

Why this is different from ordinary AI safety

Faith-facing AI shares many risks with other domains: hallucination, bias, privacy, overreliance, and weak evaluation. But it also carries a distinct kind of institutional risk.

In this domain, users may approach the system with spiritual anxiety, grief, guilt, curiosity, loneliness, moral confusion, doctrinal uncertainty, or institutional distrust. They may not be looking for "content." They may be looking for permission, absolution, authority, identity, or care.

That changes the evaluation problem. A technically fluent answer can still be harmful if it assumes the wrong authority posture. A warm answer can still be harmful if it cultivates dependence. A theologically accurate answer can still be harmful if it bypasses the ordinary human relationships through which wisdom, correction, comfort, and accountability are meant to come.

This is why Fide AI's work begins with benchmarks but cannot end there. The full stack includes retrieval, model behavior, interface design, anthropomorphic framing, disclosure, escalation, governance, and institutional readiness.

How FMG-Bench fits

FMG-Bench v1 makes one layer inspectable: theological triage and pastoral-adjacent guidance. It asks whether models respond differently to primary doctrine, secondary doctrine, tertiary disagreement, and pastoral application. It compares raw model behavior with behavior under system-layer guidance, and it compares failure patterns across model families and deployment conditions.

That is a first artifact, not the whole field.

The next generation of faith-facing AI evaluation should measure whether systems preserve human dignity under pressure: when users ask emotionally loaded questions, request spiritual certainty, seek confession-like disclosure, ask for authority, challenge boundaries, or try to turn the system into a substitute relationship.

A release test for any faith-facing AI system

Before a faith institution deploys an AI system, it should be able to answer these questions in public or under accountable review:

What roles is the system explicitly forbidden to imitate?

What sources ground its theological and moral claims?

How does it behave under doctrinal disagreement, emotional intensity, false premises, and requests for spiritual authority?

What does it do when the user needs a human, institution, counselor, parent, pastor, teacher, or emergency path?

What evidence shows that the interface does not cultivate artificial intimacy or dependency?

What claims are explicitly not supported by the evaluation?

Who is accountable for review, correction, disclosure, and withdrawal if the system fails?

If those questions cannot be answered, the system is not ready for high-trust use.
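The release test above can be treated as a simple gate: every question needs a written answer before high-trust use. The Python sketch below assumes hypothetical short keys for the seven questions. The keys and function names are illustrative only, and a non-empty answer is a necessary condition here, not proof that the answer is adequate.

```python
# Hypothetical short keys, one per release-test question, in the order listed.
RELEASE_QUESTIONS = (
    "forbidden_roles",          # roles the system may not imitate
    "grounding_sources",        # sources behind theological and moral claims
    "behavior_under_pressure",  # disagreement, intensity, false premises
    "escalation_path",          # what happens when a human is needed
    "dependency_evidence",      # evidence against artificial intimacy
    "unsupported_claims",       # what the evaluation does not show
    "accountable_owner",        # who reviews, corrects, and withdraws
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions with no written answer (missing or blank)."""
    return [q for q in RELEASE_QUESTIONS if not answers.get(q, "").strip()]


def ready_for_high_trust_use(answers: dict[str, str]) -> bool:
    """True only when every release-test question has some answer on record."""
    return not unanswered(answers)


# Illustrative draft with two gaps.
draft = {
    "forbidden_roles": "Clergy, confessor, spiritual director, therapist.",
    "grounding_sources": "Approved corpus, reviewed by the institution.",
    "behavior_under_pressure": "See scenario test report.",
    "escalation_path": "   ",
    "dependency_evidence": "Interface review notes.",
    "unsupported_claims": "No claim of theological authority.",
}
print(unanswered(draft))
print(ready_for_high_trust_use(draft))
```

Keeping the gate this blunt is a design choice: it does not score answers, it only refuses to proceed while any question is left blank, which leaves the judgment about answer quality with accountable human reviewers.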

Source frame

The Vatican's 2025 note "Antiqua et nova" frames AI in relation to human intelligence, dignity, relational life, truth, responsibility, education, labor, peace, and the limits of technological reduction.

The Rome Call for AI Ethics provides a public language of transparency, inclusion, responsibility, impartiality, reliability, security, and privacy.

Avital Balwit's essay "Searching for God in Silicon Valley" shows that frontier AI work is already entangled with questions of humility, meaning, agency, and spiritual interpretation.

Anthropic's constitution work illustrates that model behavior is shaped by explicit normative sources and system-level character design, not only by raw capability.

Participation

Fide AI is preparing the public release path for FMG-Bench and future evaluation work on human dignity, formation, authority boundaries, and institutional readiness.

Reviewers, faith institutions, builders, funders, and researchers who want measurable evidence before high-trust deployment can get involved now.
