Our Assurance Model

The Four Pillars of AI Quality

A practical framework for assuring AI systems. Grounded in ISO 42001 and NIST AI RMF. Four questions that move you from verification to genuine assurance.

From Verification to Assurance

Traditional quality engineering asks one fundamental question: "Did we build what was specified?" Requirements go in, testing validates outputs, and a pass/fail verdict comes out. That model works when systems are deterministic.

AI systems break this model. Outputs vary. Behaviour emerges from training data rather than specifications. What counts as "correct" becomes a judgment call, not a binary.

Verification Thinking

"Did we build what was specified?"

Assurance Thinking

"What's our confidence band, and can we justify it?"

This shift matters because AI will sometimes be wrong. That's not a defect; it's a consequence of probabilistic design. The question isn't whether errors occur, but whether you understand their impact, can detect them, and can explain your decisions when stakeholders ask.

Four Questions. Complete Coverage.

Each pillar addresses a dimension of AI quality that traditional testing frameworks overlook. Together, they provide the foundation for genuine assurance.

Pillar 1

Functional

"Does it work?"

Pillar 2

Behavioural

"Does it behave?"

Pillar 3

Operational

"Does it last?"

Pillar 4

Governance

"Can we answer for it?"

1

Functional Quality

"Does it work?"

This is the pillar closest to traditional QE, but it requires a fundamental shift in thinking. With AI, "works" isn't a binary. You're not checking whether outputs match expected values. You're evaluating whether outputs are fit for purpose.

A language model that produces grammatically correct but misleading text has "worked" in one sense and failed in another. A recommendation engine that technically functions but consistently makes poor suggestions isn't providing value. Functional quality for AI means evaluating outputs against real-world fitness, not just technical correctness.

What this looks like in practice

  • Evaluation rubrics that define "good enough" for your specific use case, with clear criteria for human reviewers
  • Representative test sets that cover edge cases, adversarial inputs, and realistic variation, not just happy paths
  • Consistency testing to understand how much outputs vary and whether that variation is acceptable (sketched after this list)
  • Baseline comparisons to measure whether the AI actually improves on non-AI alternatives
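
As a concrete anchor for the consistency-testing point above, here is a minimal sketch in Python. The `generate` callable is a stand-in for whatever model client you use, and the similarity metric and 0.8 threshold are illustrative assumptions, not recommendations:

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

def consistency_score(generate, prompt: str, runs: int = 5) -> float:
    """Run the same prompt several times and return the mean pairwise
    similarity of the outputs (1.0 means identical on every run)."""
    outputs = [generate(prompt) for _ in range(runs)]
    return mean(SequenceMatcher(None, a, b).ratio()
                for a, b in combinations(outputs, 2))

# Hypothetical usage: fail the check when variation exceeds what the
# use case tolerates.
# assert consistency_score(my_model, "Summarise this policy") >= 0.8
```

The value is less in the metric than in making "acceptable variation" an explicit, testable number rather than a reviewer's gut feel.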

Standards Alignment

Maps to NIST AI RMF MEASURE function (2.1-2.4) and ISO 42001 requirements for AI system performance evaluation and validation.

2

Behavioural Quality

"Does it behave?"

This pillar asks questions that traditional QE never had to consider. Safety. Fairness. Boundaries. An AI system can be functionally accurate while being unsafe, biased, or harmful.

Consider a hiring AI that correctly predicts job performance but systematically disadvantages certain groups. Or a customer service bot that answers questions accurately but can be manipulated into revealing sensitive information. These aren't functional failures. They're behavioural failures.

What this looks like in practice

  • Safety testing to identify outputs that could cause harm, even when the AI is "working correctly"
  • Fairness evaluation across protected characteristics and demographic groups (see the sketch after this list)
  • Boundary testing to understand what the AI refuses to do and whether those boundaries hold
  • Adversarial testing to probe how the system responds to manipulation attempts
  • Red teaming to actively try to make the system behave inappropriately
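
To illustrate the fairness-evaluation point, a hedged sketch of the common "four-fifths" selection-rate comparison. The record fields and the 0.8 rule of thumb are assumptions for the example; real fairness work needs domain and legal input, not just a ratio:

```python
from collections import defaultdict

def disparate_impact(records, group_key="group", outcome_key="selected"):
    """Return (ratio, per-group rates): the lowest group's positive-outcome
    rate divided by the highest group's. A ratio below roughly 0.8 is a
    common warning sign, not a verdict."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for record in records:
        counts[record[group_key]][0] += int(bool(record[outcome_key]))
        counts[record[group_key]][1] += 1
    rates = {g: pos / total for g, (pos, total) in counts.items()}
    return min(rates.values()) / max(rates.values()), rates

# Example: group B is selected half as often as group A, so ratio = 0.5.
ratio, rates = disparate_impact([
    {"group": "A", "selected": True}, {"group": "A", "selected": True},
    {"group": "B", "selected": True}, {"group": "B", "selected": False},
])
```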

Standards Alignment

Maps to NIST AI RMF principles on fairness, safety, and security (MAP 1.5, MEASURE 2.10-2.11) and ISO 42001 requirements for bias assessment and harm mitigation.

3

Operational Quality

"Does it last?"

Traditional software doesn't degrade on its own. If it worked at release, it works today (bugs aside). AI is different. Models drift as the world changes. Training data becomes stale. What passed testing last quarter may be failing silently now.

This pillar recognises that AI quality isn't a release gate. It's a continuous discipline. The system you deployed isn't the system running today, and it won't be the system running next month.

What this looks like in practice

  • Drift detection to identify when model performance degrades over time
  • Continuous evaluation against held-out test sets, not just release-time validation
  • Input monitoring to detect when real-world data diverges from the training distribution (sketched after this list)
  • Performance tracking against defined SLAs with automatic alerting
  • Incident response plans specific to AI failure modes
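
For the input-monitoring point above, here is one common drift signal, the Population Stability Index (PSI), sketched for a single numeric feature. The binning scheme and the usual 0.1/0.25 alert levels are rules of thumb, not requirements from either standard:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (`expected`,
    e.g. training data) and a live sample (`actual`). Higher means more drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bin index for v
        # Floor empty bins at a tiny value so the log term stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

# Rule of thumb: < 0.1 stable, 0.1-0.25 worth investigating, > 0.25 drifted.
```

Wired to a scheduler and an alert channel, a check like this turns "failing silently" into a ticket.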

Standards Alignment

Maps to NIST AI RMF MANAGE function (4.1-4.3) and ISO 42001 requirements for monitoring, measurement, and continual improvement of AI systems.

4

Governance Quality

"Can we answer for it?"

When an AI system makes a consequential decision, someone will eventually ask: "Why?" This pillar ensures you have an answer. Not "the AI did it" but "here's exactly how that decision was made, who approved it, and what controls were in place."

Accountability requires traceability. You need to know what data trained the model, what tests it passed, who signed off on deployment, and how decisions are being logged. When the regulator calls (and increasingly they will), you need documentation, not explanations.

What this looks like in practice

  • Decision documentation capturing who approved what, when, and based on what evidence
  • Audit trails that trace AI outputs back to inputs, models, and human oversight (see the sketch after this list)
  • Risk assessments completed before deployment and reviewed regularly
  • Human oversight mechanisms appropriate to the stakes of the decisions
  • Regulatory mapping to current and emerging AI governance requirements
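
To make the audit-trail point concrete, a minimal sketch of one decision record written to an append-only log. Every field name here is hypothetical, for illustration only; the schema should follow your own risk assessment and retention rules:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, TextIO

@dataclass
class DecisionRecord:
    decision_id: str               # stable ID for this decision
    timestamp: str                 # when the output was produced (UTC, ISO 8601)
    model_version: str             # exact model/prompt version in production
    input_ref: str                 # hash or ID tracing back to the input
    output_summary: str            # what the system decided or recommended
    human_reviewer: Optional[str]  # who oversaw the decision, if anyone
    risk_tier: str                 # tier from the pre-deployment assessment

def log_decision(record: DecisionRecord, sink: TextIO) -> None:
    """Append one decision as a JSON line; the log, not memory, answers 'why?'."""
    sink.write(json.dumps(asdict(record)) + "\n")

# Hypothetical usage:
# with open("decisions.jsonl", "a") as f:
#     log_decision(DecisionRecord("loan-4821", "2025-06-01T12:00:00Z",
#                                 "credit-model-v3.2", "sha256:9f1c",
#                                 "declined", "j.smith", "high"), f)
```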

Standards Alignment

Maps directly to ISO 42001 management system requirements and NIST AI RMF GOVERN function (1.1-1.7), including roles, responsibilities, documentation, and accountability structures.

Put This Framework Into Practice

Our workshops take your team from understanding the Four Pillars to applying them in your context. Half-day or full-day sessions that build real capability, not just awareness.