Most enterprise AI incidents are not surprises. The failure mode existed before go-live. It was present in the architecture, the data layer, the permission model, or the absence of a monitoring plan. Nobody surfaced it because no structured pre-deployment review existed, and nobody thought to ask the five questions that would have identified it.
The discovery happens in production, the most expensive place to make it.
The Pre-Deployment Gap
The pre-deployment gap is not a technical problem. It is a process problem. The engineering team built the system to specification. The vendor delivered the integration. The project sponsor signed off. And nobody asked the questions that distinguish a system ready for production from a system that will create an incident within ninety days.
The five questions below are not a compliance checklist. They are a structural diagnosis of whether the organization has what is required for the deployment to succeed and remain trustworthy over time. They apply whether the AI system is a vendor-supplied SaaS integration, an internally built RAG application, or an agentic workflow built on top of a foundation model API.
The cost of skipping the audit is asymmetric. The gap discovered after an incident arrives with legal exposure, reputational cost, regulatory notification obligations, and a remediation timeline that runs backward through the deployment history. The gap discovered before go-live is a design problem.
Question One: Who Owns the Output?
For every output this AI system produces, who is responsible for its quality, and who is accountable when it is wrong?
The ghost system is the most common failure mode in early AI deployments: a system deployed without a named output owner accumulates errors without feedback. Nobody’s job includes reviewing whether the outputs are still correct. Nobody notices when quality degrades because quality is nobody’s metric. Nobody escalates when the system produces a harmful output because nobody is watching.
The output owner is not the vendor who built the system. It is not the IT team that operates the infrastructure. It is the business function whose process the system serves, represented by a named individual who understands the system’s failure modes and has a defined review cadence.
Before go-live, verify: the output owner is identified by name, has been briefed on how the system fails (not just how it succeeds), has a review schedule that predates the first user query, and has a clear escalation path for the scenario where the system produces something that should not have reached a user.
If the answer to “who owns the output” is “the vendor” or “the IT team,” the system is not ready for production. Business ownership of AI output is not optional in an AI-First deployment.
Question Two: What Data Is the System Using and Who Controls It?
What data sources does the system access? Who controls access to that data? What happens when that data changes?
Enterprise data changes continuously. Schema updates in the ERP, new document formats in the knowledge base, API version changes in connected systems, document retraction when contracts are superseded — any of these can silently degrade AI system quality without triggering an alert. The system continues to run. The outputs continue to be generated. The quality decline is invisible until a user acts on an output based on stale or incorrect data.
The data provenance requirement: every data source the system touches should be documented with four properties before go-live — the source owner, the access control mechanism, the update frequency, and the quality characteristics the system was tested against.
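One way to make the provenance requirement operational is to record those four properties in a machine-readable lineage entry per source, reviewable alongside the go-live checklist. The sketch below is illustrative only; the record structure, field names, and example values are not a prescribed schema.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class DataSourceRecord:
    """One entry in the pre-go-live data lineage map (illustrative schema)."""
    name: str                    # e.g. "contracts_knowledge_base"
    owner: str                   # named individual or team accountable for the source
    access_control: str          # how access is granted and revoked
    update_frequency: str        # how often the upstream data changes
    tested_quality_profile: str  # the data quality the system was validated against


sources = [
    DataSourceRecord(
        name="contracts_knowledge_base",
        owner="Legal Operations",
        access_control="SharePoint group, membership reviewed quarterly",
        update_frequency="ad hoc; documents superseded without notice",
        tested_quality_profile="includes scanned PDFs with OCR errors and missing metadata",
    ),
]

# Emitting the lineage map as JSON makes it reviewable by the output owner and auditors.
print(json.dumps([asdict(s) for s in sources], indent=2))
```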
The change management risk is specifically the gap between the data the system was tested on and the data it will encounter in production. Clean, well-structured test data is not representative of production data quality. A system that performs well on the test corpus will encounter production edge cases within weeks.
Before go-live, verify: a data lineage map exists for every data source; change notification mechanisms are in place for each upstream source; the system has been tested against a realistic sample of production data quality variations, including documents with formatting errors, incomplete records, and field values outside the expected range.
Question Three: What Can the System Do That It Should Not?
What actions can this system take autonomously, and are any of those actions irreversible?
The permission model audit starts with a complete list of every action the AI system can execute: write to a database, send a communication, modify a record, trigger a workflow, call an external API, create or delete a file. For each action, classify it on two axes: reversible or irreversible, low-impact or high-impact.
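A minimal sketch of what the two-axis classification can look like in practice, with hypothetical action names for a customer-support assistant; the inventory structure is illustrative, not a prescribed format:

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    impact: Impact


# Hypothetical inventory for a customer-support assistant.
inventory = [
    Action("read_ticket_history", reversible=True, impact=Impact.LOW),
    Action("draft_reply_for_review", reversible=True, impact=Impact.LOW),
    Action("send_customer_email", reversible=False, impact=Impact.HIGH),
    Action("delete_account_record", reversible=False, impact=Impact.HIGH),
]

# Any irreversible or high-impact action should surface for an explicit approval gate.
needs_approval_gate = [a for a in inventory if not a.reversible or a.impact is Impact.HIGH]
for action in needs_approval_gate:
    print(f"requires human approval before execution: {action.name}")
```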
The default principle for AI-First deployments follows from the Twelve-Factor Agents framework: AI systems should start with the minimum permissions required for the first use case. Permissions are expanded only when the system has demonstrated reliability at the current scope and the expansion has been explicitly approved by someone who understands the risk surface of the expansion.
The MCP governance implication is specific: if the system uses MCP servers to connect to external tools or internal systems, each server’s available tool set should be whitelisted to the functions required for the current use case. Granting broad access because the MCP server supports broad access is the configuration equivalent of deploying a system without a permission model. The April 2026 disclosures around MCP remote code execution vulnerabilities confirm that the attack surface of over-permissioned MCP configurations is not theoretical.
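A sketch of the allowlist principle, assuming the client can enumerate a server’s advertised tools before registering them with the agent; the server names, tool names, and discovery mechanism are placeholders for whatever MCP client integration is in use:

```python
# Only explicitly approved tools from each MCP server are exposed to the agent.
# The allowlist contents below are hypothetical.
APPROVED_TOOLS = {
    "crm-server": {"lookup_customer", "list_open_cases"},  # read-only functions only
    "docs-server": {"search_documents"},
}


def filter_tools(server_name: str, discovered_tools: list[str]) -> list[str]:
    """Return only the tools explicitly approved for the current use case."""
    allowed = APPROVED_TOOLS.get(server_name, set())
    rejected = [t for t in discovered_tools if t not in allowed]
    if rejected:
        # Rejected tools are logged rather than silently dropped, so the
        # permission review can see what the server offered but was denied.
        print(f"[{server_name}] not exposed to the agent: {sorted(rejected)}")
    return [t for t in discovered_tools if t in allowed]


# Example: a server advertises write-capable tools the current use case does not need.
exposed = filter_tools("crm-server", ["lookup_customer", "update_customer", "delete_customer"])
print("exposed to agent:", exposed)
```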
Before go-live, verify: an explicit permission inventory exists for every action the system can take; irreversible actions require a human approval gate before execution; the permission model has been reviewed by someone not on the implementation team, who was specifically asked to find cases where the system could take an action the business would not want it to take.
Question Four: How Will You Know When It Fails?
What monitoring is in place? What are the alert thresholds? Who receives the alerts?
AI systems fail differently from traditional software. A traditional application either works or throws an error. An AI system can produce outputs that are plausible but wrong, confident but hallucinated, relevant-seeming but stale — without triggering any error state. The failure is silent, gradual, and invisible to any monitoring that only checks for system errors rather than output quality.
The minimum viable monitoring stack for an enterprise AI deployment has three components: structured logging for every significant AI step (the query, the retrieved context, the generated output, the human action taken), a quality metric baseline established before go-live against which production quality is compared weekly, and alert thresholds that flag anomalies in query volume, response latency, human escalation rate, or quality metric decline.
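A minimal sketch of the first and third components, assuming JSON-lines logs and illustrative threshold values that would in practice come from the baseline:

```python
import json
import time
import uuid

LOG_PATH = "ai_interactions.jsonl"

# Illustrative thresholds; real values are derived from the pre-go-live baseline.
ESCALATION_RATE_ALERT = 0.15    # alert if >15% of interactions are escalated to a human
LATENCY_P95_ALERT_SECONDS = 8.0


def log_interaction(query: str, retrieved_context: list[str], output: str,
                    human_action: str, latency_seconds: float) -> None:
    """Append one structured record per significant AI step."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "query": query,
        "retrieved_context": retrieved_context,
        "output": output,
        "human_action": human_action,  # e.g. "accepted", "edited", "escalated"
        "latency_seconds": latency_seconds,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def check_weekly_thresholds(records: list[dict]) -> list[str]:
    """Return alert messages for the weekly review; an empty list means no anomaly."""
    alerts = []
    if records:
        escalation_rate = sum(r["human_action"] == "escalated" for r in records) / len(records)
        if escalation_rate > ESCALATION_RATE_ALERT:
            alerts.append(f"escalation rate {escalation_rate:.0%} exceeds threshold")
        latencies = sorted(r["latency_seconds"] for r in records)
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        if p95 > LATENCY_P95_ALERT_SECONDS:
            alerts.append(f"p95 latency {p95:.1f}s exceeds threshold")
    return alerts
```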
The quality metric baseline requires an evaluation harness. Before the first production query, a representative sample of expected queries should be run against the system, the outputs scored against defined quality criteria, and the scores recorded as the baseline. The RAGAS framework provides a set of retrieval-augmented generation quality metrics — faithfulness, answer relevance, context precision, context recall — that are measurable without human review of every output. Establishing the baseline before go-live makes post-launch quality monitoring quantitative rather than impressionistic.
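A sketch of a baseline run against the RAGAS metrics, assuming the ragas 0.1-style evaluate API and an LLM judge configured in the environment (ragas defaults to OpenAI); column names, imports, and metric names vary across library versions:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Representative queries run through the system before go-live, with the retrieved
# contexts, generated answers, and reference answers recorded. One sample shown.
baseline_samples = {
    "question": ["What is the notice period in the Acme master services agreement?"],
    "contexts": [["Section 12.3: Either party may terminate with ninety days written notice."]],
    "answer": ["The notice period is ninety days, per section 12.3."],
    "ground_truth": ["Ninety days written notice (section 12.3)."],
}

baseline = evaluate(
    Dataset.from_dict(baseline_samples),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)

# Persist these scores; the weekly production quality review compares against them.
print(baseline)
```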
Before go-live, verify: logging infrastructure is operational and has been tested against the production configuration; alert thresholds are defined and have triggered correctly in the test environment; there is a named person who will review the weekly quality report, whose calendar already has the recurring review scheduled.
Question Five: What Is the Rollback Plan?
If this system needs to be shut down tomorrow morning, what is the rollback plan and how long does it take?
Every AI system that enters production should have a defined fallback state for the process it supports when the AI is removed. This may be the previous manual process, a simplified version of it, a read-only mode, or a parallel system that was not retired when the AI deployment went live. The rollback should be executable within a defined time window, by a named team, without requiring an emergency change control process.
Rollback planning matters for governance because a system that cannot be rolled back cannot be stopped when a problem is found, except at the cost of a larger operational disruption. The inability to stop creates pressure to tolerate known problems, which is the structural condition under which AI incidents escalate from manageable quality failures to visible harms.
The staged deployment pattern naturally produces rollback capability: read-only first, supervised write second, autonomous write third. At each stage, the previous stage is the rollback target. The organization can revert to the previous stage without reconstructing a capability that no longer exists.
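One lightweight way to keep the previous stage available as a rollback target is a stage gate read from configuration, so reverting is a configuration change rather than a redeploy. The stage names and the environment-variable mechanism below are illustrative.

```python
import os
from enum import IntEnum


class Stage(IntEnum):
    """Deployment stages; each stage's rollback target is the stage below it."""
    READ_ONLY = 1         # system drafts and retrieves; humans perform all writes
    SUPERVISED_WRITE = 2  # system writes, but every write requires human approval
    AUTONOMOUS_WRITE = 3  # system writes within the approved permission inventory


# The active stage is read from configuration so an emergency rollback does not
# require a redeploy. The environment variable name is illustrative.
ACTIVE_STAGE = Stage(int(os.environ.get("AI_DEPLOYMENT_STAGE", Stage.READ_ONLY)))


def may_write(human_approval_granted: bool) -> bool:
    """Gate every write action on the active deployment stage."""
    if ACTIVE_STAGE is Stage.READ_ONLY:
        return False
    if ACTIVE_STAGE is Stage.SUPERVISED_WRITE:
        return human_approval_granted
    return True
```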
Before go-live, verify: the rollback plan is documented, has been tested, and is known to the incident response team; the rollback decision authority is assigned to a named person who can authorize an emergency shutdown without convening a change control board; the team responsible for executing the rollback has the access and the documentation required to do so at any hour.
The five questions do not require sophisticated tooling or extended timelines. They require the decision to ask them before the system goes live, rather than discovering the answers in the course of an incident.
An AI-First deployment culture asks them as a matter of routine. That is what distinguishes organizations that move fast with confidence from organizations that move fast until the first incident.
Terraris.ai conducts AI readiness audits before production deployments and builds the governance infrastructure that makes the five questions answerable before go-live. Start with a scoping call.