QA Copilot Test Overview
What Are Copilot QA Tests?
Copilot QA Tests verify the functionality, performance, and reliability of AI assistants (“copilots”) within the ai12z platform. By running these tests, you can be confident that your copilot:
- Responds accurately and helpfully to user prompts.
- Follows any system prompt or policy rules.
- Correctly handles forms and data collection (if applicable).
- Executes more complex workflows that may require multiple steps, clarifications, or logic.
These tests automate the process of checking whether the copilot does what it should in realistic conversational scenarios.
Why Use QA Copilot Tests?
- Validation of Agent Behavior
  - Tests assess how well your copilot handles various inputs and contexts.
  - Ensures the copilot meets your standards for clarity, completeness, and accuracy.
- Validation of System Prompt Behavior
  - Lets you confirm that any custom system prompts (e.g., business rules or style guidelines) are being respected by the copilot’s responses.
- Validation of Forms
  - Confirms that if your copilot collects data (like a form for check-out requests or reservations), it does so properly and accurately.
- Validation of Workflows
  - Checks that your copilot can guide a user through a multi-step process.
  - Ensures branching logic and follow-up queries are handled correctly.
How Do QA Copilot Tests Work?
Think of it as “a copilot talking to another copilot.” Specifically:
- QA Copilot – The testing agent, simulating how a user (or another bot) would interact. It poses questions or follows the scenario you specify.
- Your Copilot – The target AI assistant you are testing. It responds to the QA Copilot’s queries exactly as it would respond to a real user.
The QA Copilot will:
- Execute the conversation scenario.
- Collect the responses from your copilot.
- Analyze the returned data to decide:
  - If the conversation should continue with follow-up questions.
  - Or if the interaction is complete and can be judged as Passed or Failed against the test instructions.
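Conceptually, that loop looks something like the sketch below. This is not the ai12z API: `qa_copilot`, `target_copilot`, and every method on them are hypothetical stand-ins used only to illustrate the flow described above.

```python
# Conceptual sketch only -- not the ai12z API. qa_copilot and
# target_copilot are hypothetical objects that stand in for the
# testing agent and the copilot under test.

def run_qa_test(qa_copilot, target_copilot, instruction, max_turns=10):
    """Drive a conversation between the QA Copilot and the target copilot."""
    transcript = []
    # The opening question comes from the test instructions / scenario.
    message = qa_copilot.first_message(instruction)

    for _ in range(max_turns):
        # Your copilot answers exactly as it would for a real user.
        reply = target_copilot.respond(message)
        transcript.append({"qa_copilot": message, "your_copilot": reply})

        # The QA Copilot analyzes the reply: keep the conversation going,
        # or judge the whole interaction against the instructions.
        verdict = qa_copilot.evaluate(instruction, transcript)
        if verdict.finished:
            return {
                "result": "Passed" if verdict.passed else "Failed",
                "iterations": len(transcript),
                "conversation_log": transcript,
            }

        # Not finished yet: ask the next follow-up question.
        message = qa_copilot.next_message(instruction, transcript)

    # Safety net: no clear judgment within the turn limit.
    return {"result": "Failed", "iterations": len(transcript), "conversation_log": transcript}
```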
Creating a QA Test
- Navigate to “Copilot QA Tests.”
  - You will see a listing of all existing tests (if any), each with its Name, Instructions, Last Status, Modified date, and an Action menu.
- Click “Create” (New QA Test).
  - Provide a Name for your test (e.g., “Airport”).
  - In the Instruction field, describe in plain English what you want the QA Copilot to check. For example:
    Validate when someone asks "What are the directions from the airport?"
    That the bot returns with directions.
- Save the test.
  - Once saved, the QA Copilot knows the test scenario and success criteria.
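A saved test therefore boils down to the two things you provide in the form: a Name and a plain-English Instruction. As a rough mental model only (the class and field names below are assumptions, not the platform’s actual schema), you can picture it like this:

```python
from dataclasses import dataclass

# Illustrative model of what a saved QA test captures.
# The class and field names are assumptions, not the platform's schema.
@dataclass
class QATest:
    name: str         # e.g., "Airport"
    instruction: str  # plain-English scenario and success criteria

airport_test = QATest(
    name="Airport",
    instruction=(
        'Validate when someone asks "What are the directions from the airport?" '
        "That the bot returns with directions."
    ),
)
```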
What Goes Into the Test Instructions?
You can make your test instructions as simple or complex as you like:
- Simple single-step: “Ask for directions from the airport and ensure the bot responds with accurate directions.”
- Multi-step scenario: “Ask for a late check-out, confirm the date and time, then see if the bot accurately finalizes the request.”
The QA Copilot will follow these instructions step-by-step, carrying on a conversation with the target copilot to verify each part of the test.
Running QA Tests
There are three ways to run your QA Copilot Tests:
- Run an Individual Test
  - Click the Run Test button next to a single QA Test.
  - You will see the conversation flow between the QA Copilot and your copilot, plus the pass/fail result.
- Run All Tests
  - Use this option to run your entire suite of QA Copilot Tests at once.
  - The QA Copilot will move through each test scenario, recording results in sequence.
- Schedule Tests
  - Set up a periodic schedule (e.g., daily, weekly) so QA tests run automatically.
  - This is useful if your copilot is in production and you want to continuously verify behavior and catch regressions early.
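Conceptually, Run All Tests walks the suite in order and records a verdict for each test, and a schedule simply triggers that same sweep on a timer. The sketch below assumes a generic `run_single_test` callable (for instance, a wrapper around the earlier `run_qa_test` sketch); real scheduling is configured in the ai12z UI, not in code.

```python
import time

# Rough sketch of "Run All Tests" plus a naive periodic sweep.
# run_single_test is any callable that executes one QA test and returns
# "Passed" or "Failed". Real scheduling is configured in the ai12z UI;
# this loop only illustrates the idea of a recurring run.

def run_all_tests(tests, run_single_test):
    """tests: iterable of (name, instruction) pairs; returns name -> verdict."""
    return {name: run_single_test(instruction) for name, instruction in tests}

def run_on_schedule(tests, run_single_test, every_seconds=24 * 60 * 60):
    while True:
        print(run_all_tests(tests, run_single_test))
        time.sleep(every_seconds)  # e.g., once per day
```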
Viewing Test Results
After a test runs, you’ll be able to see:
- Conversation Log – Shows exactly what the QA Copilot asked and how your copilot responded, including timestamps.
- Duration – How long the test took.
- Number of Iterations – How many back-and-forth exchanges took place between the QA Copilot and your copilot.
- Result (Pass/Fail) – A final verdict based on your instructions and the QA Copilot’s analysis.
If a test fails, you’ll likely see a short summary of why the QA Copilot considered it unsuccessful (e.g., “The response did not include directions,” or “The form was not filled correctly”).
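Taken together, a single run produces roughly the following record. The field names and sample values below are illustrative only, not the exact ai12z output format.

```python
# Illustrative shape of one test result; field names and sample values
# are examples only, not the exact ai12z output.
example_result = {
    "test_name": "Airport",
    "result": "Failed",
    "duration_seconds": 12.4,
    "iterations": 2,
    "conversation_log": [
        {
            "timestamp": "2025-01-15T10:15:02Z",
            "qa_copilot": "What are the directions from the airport?",
            "your_copilot": "I'm sorry, I can't help with that.",
        },
    ],
    "failure_summary": "The response did not include directions.",
}
```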
Example: Directions From the Airport
Below is a quick illustration of a QA Copilot Test:
- Test Name: “Airport”
- Instruction: “Validate when someone asks ‘What are the directions from the airport?’ That the bot returns directions.”
When you run this test:
- The QA Copilot plays the User role, asking:
  “What are the directions from the airport?”
- Your copilot should reply with step-by-step directions or a relevant answer.
- The QA Copilot checks whether directions were indeed provided.
- If yes, it marks the test as Passed; if not, it marks it as Failed.
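In practice the QA Copilot judges the reply with its own language-model reasoning rather than simple pattern matching; the crude keyword check below is only meant to illustrate the shape of that pass/fail decision.

```python
# Greatly simplified stand-in for the QA Copilot's judgment: the real
# QA Copilot evaluates the reply with an LLM, not keyword matching.
def looks_like_directions(reply: str) -> bool:
    hints = ("turn", "exit", "head ", "mile", "km", "route", "take ")
    return any(hint in reply.lower() for hint in hints)

reply = "Take Route 1 north for five miles, then turn left at the main entrance."
print("Passed" if looks_like_directions(reply) else "Failed")  # prints: Passed
```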
Best Practices
- Keep Tests Focused – Each QA Test should cover a clear, self-contained scenario. This makes it easier to diagnose which part of the copilot’s logic might need refinement.
- Build Complex Scenarios Gradually – If you need multi-step validations (e.g., ask a question, confirm a user detail, handle a form), break them down to ensure that each step is clearly tested.
- Check System Prompt Behavior – If you have special rules or disclaimers in your system prompts, ensure your tests confirm they are followed (for example, your copilot must never reveal certain confidential info).
- Automate Regularly – Run scheduled tests to continuously monitor your copilot’s behavior and quickly catch any regressions or performance issues.
Summary
QA Copilot Tests are a vital feature for maintaining quality and consistency in your AI copilot interactions. By having one copilot (the QA Copilot) simulate user queries and evaluate responses from your target copilot, you can easily:
- Ensure the copilot follows expected behaviors and policies.
- Validate system prompt compliance and forms.
- Verify multi-step workflows work reliably.
- Identify failures quickly, address them, and retest.
Use the Create, Run, and Schedule workflow to build a robust QA process that keeps your AI assistants performing at their best.