QA Copilot Test Overview
What Are Copilot QA Tests?
Copilot QA Tests verify the functionality, performance, and reliability of AI assistants (“copilots”) within the ai12z platform. By running these tests, you can be confident that your copilot:
- Responds accurately and helpfully to user prompts.
- Follows any system prompt or policy rules.
- Correctly handles forms and data collection (if applicable).
- Executes more complex workflows that may require multiple steps, clarifications, or logic.
These tests automate the process of checking whether the copilot does what it should in realistic conversational scenarios.
Why Use QA Copilot Tests?
- Validation of Agent Behavior
  - Tests assess how well your copilot handles various inputs and contexts.
  - Ensures the copilot meets your standards for clarity, completeness, and accuracy.
- Validation of System Prompt Behavior
  - Lets you confirm that any custom system prompts (e.g., business rules or style guidelines) are being respected by the copilot’s responses.
- Validation of Forms
  - Confirms that if your copilot collects data (like a form for check-out requests or reservations), it does so properly and accurately.
- Validation of Workflows
  - Checks that your copilot can guide a user through a multi-step process.
  - Ensures branching logic and follow-up queries are handled correctly.
How Do QA Copilot Tests Work?
Think of it as “a copilot talking to another copilot.” Specifically:
- QA Copilot – The testing agent, simulating how a user (or another bot) would interact. It poses questions or follows the scenario you specify.
- Your Copilot – The target AI assistant you are testing. It responds to the QA Copilot’s queries exactly as it would respond to a real user.
The QA Copilot will:
- Execute the conversation scenario.
- Collect the responses from your copilot.
- Analyze the returned data to decide:
  - If the conversation should continue with follow-up questions.
  - Or if the interaction is complete and can be judged as Passed or Failed against the test instructions.
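Conceptually, that loop looks something like the sketch below. This is not the ai12z API: `qa_copilot`, `target_copilot`, and every method on them are hypothetical stand-ins used only to illustrate the flow described above.

```python
# Conceptual sketch only -- not the ai12z API. qa_copilot and
# target_copilot are hypothetical objects that stand in for the
# testing agent and the copilot under test.

def run_qa_test(qa_copilot, target_copilot, instruction, max_turns=10):
    """Drive a conversation between the QA Copilot and the target copilot."""
    transcript = []
    # The opening question comes from the test instructions / scenario.
    message = qa_copilot.first_message(instruction)

    for _ in range(max_turns):
        # Your copilot answers exactly as it would for a real user.
        reply = target_copilot.respond(message)
        transcript.append({"qa_copilot": message, "your_copilot": reply})

        # The QA Copilot analyzes the reply: keep the conversation going,
        # or judge the whole interaction against the instructions.
        verdict = qa_copilot.evaluate(instruction, transcript)
        if verdict.finished:
            return {
                "result": "Passed" if verdict.passed else "Failed",
                "iterations": len(transcript),
                "conversation_log": transcript,
            }

        # Not finished yet: ask the next follow-up question.
        message = qa_copilot.next_message(instruction, transcript)

    # Safety net: no clear judgment within the turn limit.
    return {"result": "Failed", "iterations": len(transcript), "conversation_log": transcript}
```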
Creating a QA Test
- Navigate to “Copilot QA Tests.”
  - You will see a listing of all existing tests (if any), each with its Name, Instructions, Last Status, Modified date, and an Action menu.
- Click “Create” (New QA Test).
  - Provide a Name for your test (e.g., “Airport”).
  - In the Instruction field, describe in plain English what you want the QA Copilot to check. For example:
    Validate when someone asks "What are the directions from the airport?"
    That the bot returns with directions.
- Save the test.
  - Once saved, the QA Copilot knows the test scenario and success criteria.
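A saved test therefore boils down to the two things you provide in the form: a Name and a plain-English Instruction. As a rough mental model only (the class and field names below are assumptions, not the platform’s actual schema), you can picture it like this:

```python
from dataclasses import dataclass

# Illustrative model of what a saved QA test captures.
# The class and field names are assumptions, not the platform's schema.
@dataclass
class QATest:
    name: str         # e.g., "Airport"
    instruction: str  # plain-English scenario and success criteria

airport_test = QATest(
    name="Airport",
    instruction=(
        'Validate when someone asks "What are the directions from the airport?" '
        "That the bot returns with directions."
    ),
)
```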
What Goes Into the Test Instructions?
You can make your test instructions as simple or complex as you like:
- Simple single-step: “Ask for directions from the airport and ensure the bot responds with accurate directions.”
- Multi-step scenario: “Ask for a late check-out, confirm the date and time, then see if the bot accurately finalizes the request.”
The QA Copilot will follow these instructions step-by-step, carrying on a conversation with the target copilot to verify each part of the test.
Running QA Tests
There are three ways to run your QA Copilot Tests:
- Run an Individual Test
  - Click the Run Test button next to a single QA Test.
  - You will see the conversation flow between the QA Copilot and your copilot, plus the pass/fail result.
- Run All Tests
  - Use this option to run your entire suite of QA Copilot Tests at once.
  - The QA Copilot will move through each test scenario, recording results in sequence.
- Schedule Tests
  - Set up a periodic schedule (e.g., daily, weekly) so QA tests run automatically.
  - This is useful if your copilot is in production and you want to continuously verify behavior and catch regressions early.
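Conceptually, Run All Tests walks the suite in order and records a verdict for each test, and a schedule simply triggers that same sweep on a timer. The sketch below assumes a generic `run_single_test` callable (for instance, a wrapper around the earlier `run_qa_test` sketch); real scheduling is configured in the ai12z UI, not in code.

```python
import time

# Rough sketch of "Run All Tests" plus a naive periodic sweep.
# run_single_test is any callable that executes one QA test and returns
# "Passed" or "Failed". Real scheduling is configured in the ai12z UI;
# this loop only illustrates the idea of a recurring run.

def run_all_tests(tests, run_single_test):
    """tests: iterable of (name, instruction) pairs; returns name -> verdict."""
    return {name: run_single_test(instruction) for name, instruction in tests}

def run_on_schedule(tests, run_single_test, every_seconds=24 * 60 * 60):
    while True:
        print(run_all_tests(tests, run_single_test))
        time.sleep(every_seconds)  # e.g., once per day
```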
Viewing Test Results
After a test runs, you’ll be able to see:
- Conversation Log – Shows exactly what the QA Copilot asked and how your copilot responded, including timestamps.
- Duration – How long the test took.
- Number of Iterations – How many back-and-forth exchanges took place between the QA Copilot and your copilot.
- Result (Pass/Fail) – A final verdict based on your instructions and the QA Copilot’s analysis.
If a test fails, you’ll likely see a short summary of why the QA Copilot considered it unsuccessful (e.g., “The response did not include directions,” or “The form was not filled correctly”).
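Taken together, a single run produces roughly the following record. The field names and sample values below are illustrative only, not the exact ai12z output format.

```python
# Illustrative shape of one test result; field names and sample values
# are examples only, not the exact ai12z output.
example_result = {
    "test_name": "Airport",
    "result": "Failed",
    "duration_seconds": 12.4,
    "iterations": 2,
    "conversation_log": [
        {
            "timestamp": "2025-01-15T10:15:02Z",
            "qa_copilot": "What are the directions from the airport?",
            "your_copilot": "I'm sorry, I can't help with that.",
        },
    ],
    "failure_summary": "The response did not include directions.",
}
```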
Example: Directions From the Airport
Below is a quick illustration of a QA Copilot Test:
- Test Name: “Airport”
- Instruction: “Validate when someone asks ‘What are the directions from the airport?’ That the bot returns directions.”
When you run this test:
- The QA Copilot plays the User role, asking:
  “What are the directions from the airport?”
- Your copilot should reply with step-by-step directions or a relevant answer.
- The QA Copilot checks whether directions were indeed provided.
- If yes, it marks the test as Passed; if not, it marks it as Failed.
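In practice the QA Copilot judges the reply with its own language-model reasoning rather than simple pattern matching; the crude keyword check below is only meant to illustrate the shape of that pass/fail decision.

```python
# Greatly simplified stand-in for the QA Copilot's judgment: the real
# QA Copilot evaluates the reply with an LLM, not keyword matching.
def looks_like_directions(reply: str) -> bool:
    hints = ("turn", "exit", "head ", "mile", "km", "route", "take ")
    return any(hint in reply.lower() for hint in hints)

reply = "Take Route 1 north for five miles, then turn left at the main entrance."
print("Passed" if looks_like_directions(reply) else "Failed")  # prints: Passed
```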
Best Practices
- Keep Tests Focused – Each QA Test should cover a clear, self-contained scenario. This makes it easier to diagnose which part of the copilot’s logic might need refinement.
- Build Complex Scenarios Gradually – If you need multi-step validations (e.g., ask a question, confirm a user detail, handle a form), break them down to ensure that each step is clearly tested.
- Check System Prompt Behavior – If you have special rules or disclaimers in your system prompts, ensure your tests confirm they are followed (for example, your copilot must never reveal certain confidential info).
- Automate Regularly – Run scheduled tests to continuously monitor your copilot’s behavior and quickly catch any regressions or performance issues.
Summary
QA Copilot Tests are a vital feature for maintaining quality and consistency in your AI copilot interactions. By having one copilot (the QA Copilot) simulate user queries and evaluate responses from your target copilot, you can easily:
- Ensure the copilot follows expected behaviors and policies.
- Validate system prompt compliance and forms.
- Verify multi-step workflows work reliably.
- Identify failures quickly, address them, and retest.
Use the Create, Run, and Schedule workflow to build a robust QA process that keeps your AI assistants performing at their best.