Functional Quality
"Does it work?"
This is the pillar closest to traditional QE, but it requires a fundamental shift in thinking. With AI, "works" isn't binary. You're not checking whether outputs match expected values; you're evaluating whether outputs are fit for purpose.
A language model that produces grammatically correct but misleading text has "worked" in one sense and failed in another. A recommendation engine that technically functions but consistently makes poor suggestions isn't providing value. Functional quality for AI means evaluating outputs against real-world fitness, not just technical correctness.
What this looks like in practice
- Evaluation rubrics that define "good enough" for your specific use case, with clear criteria for human reviewers (a minimal rubric sketch follows this list)
- Representative test sets that cover edge cases, adversarial inputs, and realistic variation, not just happy paths
- Consistency testing to understand how much outputs vary and whether that variation is acceptable (see the consistency sketch below)
- Baseline comparisons to measure whether the AI actually improves on non-AI alternatives (see the baseline sketch below)
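To make "good enough" concrete, here is a minimal sketch of a rubric-based scoring harness. The criteria, weights, and pass threshold are hypothetical examples, not a standard; define your own against your use case.

```python
# A minimal rubric-scoring sketch. All criteria, weights, and the
# PASS_THRESHOLD are illustrative assumptions, not prescribed values.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str         # what the reviewer is judging
    description: str  # guidance shown to the human reviewer
    weight: float     # relative importance of this criterion

RUBRIC = [
    Criterion("accuracy", "Output is factually correct and not misleading", 0.5),
    Criterion("relevance", "Output addresses the user's actual request", 0.3),
    Criterion("tone", "Output is clear and appropriately worded", 0.2),
]

PASS_THRESHOLD = 0.8  # weighted score at or above this counts as fit for purpose

def weighted_score(reviewer_scores: dict[str, float]) -> float:
    """Combine per-criterion reviewer scores (each 0.0-1.0) into one number."""
    return sum(c.weight * reviewer_scores[c.name] for c in RUBRIC)

def fit_for_purpose(reviewer_scores: dict[str, float]) -> bool:
    return weighted_score(reviewer_scores) >= PASS_THRESHOLD

# Example: one reviewer's scores for a single model output.
scores = {"accuracy": 1.0, "relevance": 0.5, "tone": 1.0}
print(weighted_score(scores), fit_for_purpose(scores))  # 0.85 True
```

The point of encoding the rubric is less the arithmetic than the discipline: reviewers score against the same written criteria, and "fit for purpose" becomes a number you can track across model versions.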
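Consistency testing can be as simple as calling the system repeatedly on the same input and measuring how much outputs vary. In this sketch, `generate` is a stand-in for your model call, and word-set Jaccard similarity is one illustrative measure of variation among many.

```python
# A consistency-testing sketch: repeated runs on one prompt, mean pairwise
# similarity as the variation metric. `generate` is a placeholder.
from itertools import combinations

def generate(prompt: str) -> str:
    """Placeholder for the real (nondeterministic) model call."""
    raise NotImplementedError

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over word sets: 1.0 means identical vocabulary."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency(prompt: str, n_runs: int = 10) -> float:
    """Mean pairwise similarity across repeated runs on the same prompt."""
    outputs = [generate(prompt) for _ in range(n_runs)]
    pairs = list(combinations(outputs, 2))
    return sum(token_overlap(a, b) for a, b in pairs) / len(pairs)

# Flag prompts whose outputs drift more than your use case tolerates;
# the 0.7 threshold here is an arbitrary example.
# if consistency("Summarize this refund policy...") < 0.7:
#     print("Output variation exceeds acceptable threshold")
```

What counts as acceptable variation depends on the task: paraphrase-level drift may be fine for a writing assistant and unacceptable for a policy summarizer.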
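For baseline comparisons, score the AI system and a simpler non-AI alternative on the same labeled test set. The functions and test cases below are hypothetical placeholders for your own system and data.

```python
# A baseline-comparison sketch. `ai_recommend`, `rule_based_recommend`,
# and TEST_CASES are hypothetical stand-ins.

def ai_recommend(user: str) -> str:
    """Placeholder for the AI system under test."""
    raise NotImplementedError

def rule_based_recommend(user: str) -> str:
    """Placeholder for the non-AI baseline, e.g. a popularity heuristic."""
    raise NotImplementedError

# Each case pairs an input with the outputs reviewers judged acceptable.
# Include edge cases and adversarial inputs alongside happy paths.
TEST_CASES = [
    ("new user, no history", {"starter-bundle", "top-seller"}),
    ("power user, niche tastes", {"niche-pick-a", "niche-pick-b"}),
]

def pass_rate(system, cases) -> float:
    """Fraction of cases where the system's output is judged acceptable."""
    hits = sum(1 for inp, acceptable in cases if system(inp) in acceptable)
    return hits / len(cases)

# ai = pass_rate(ai_recommend, TEST_CASES)
# baseline = pass_rate(rule_based_recommend, TEST_CASES)
# print(f"AI: {ai:.0%}  Baseline: {baseline:.0%}  Lift: {ai - baseline:+.0%}")
```

If the AI can't beat a cheap heuristic on a representative test set, it isn't providing value, no matter how impressive its outputs look in isolation.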
Standards Alignment
Maps to the NIST AI RMF MEASURE function (MEASURE 2.1-2.4) and to ISO/IEC 42001 requirements for AI system performance evaluation and validation.