At Cognigy, we understand the importance of robust quality assurance (QA) processes when deploying conversational AI solutions. To facilitate efficient and comprehensive testing, we have developed Playbooks, a powerful yet underrated feature designed specifically for automated QA testing. In this article, we provide recommendations on how to leverage Playbooks effectively for QA purposes.
1. Design Comprehensive Playbooks
Playbooks should cover all functional use cases of your virtual agent. These automated tests ensure that critical functionality is thoroughly regression-tested, providing confidence during the QA process. As your project's scope expands, your Playbooks should expand with it: make it your conversational AI team's responsibility to extend Playbooks with test coverage for new intents and use cases as the project grows. By designing comprehensive Playbooks, you can identify and address issues or inconsistencies before your customers encounter them.
2. Leverage Various Playbook Creation Methods
Cognigy.AI provides multiple ways to create Playbooks, catering to different workflows and requirements. You can manually create Playbooks using the Cognigy agent menu or leverage automated methods, such as:
- Interaction Panel: Create a playbook from a conversation.
- Insights Transcripts: Generate a playbook from a conversation transcript.
- Playbooks API: Design a custom approach to create Playbooks using the Playbooks API remotely. Refer to the OpenAPI documentation for details.
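As a sketch of the remote-creation approach, the following Python helper assembles a "create Playbook" request. The endpoint path, header name, and payload fields below are illustrative assumptions, not the exact Cognigy schema; consult the OpenAPI documentation of your installation for the real contract.

```python
import json

# Hypothetical helper that builds a "create Playbook" request for the
# Cognigy Playbooks REST API. The endpoint path, payload fields, and
# header names are ASSUMPTIONS for illustration only -- check your
# installation's OpenAPI documentation for the actual schema.

API_BASE = "https://api-trial.cognigy.ai/new/v2.0"  # adjust to your environment

def build_create_playbook_request(api_key: str, project_id: str,
                                  name: str, steps: list) -> dict:
    """Assemble method, URL, headers, and JSON body for creating a Playbook."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/playbooks",          # assumed endpoint path
        "headers": {
            "X-API-Key": api_key,                # assumed API key header
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "name": name,
            "projectId": project_id,
            "steps": steps,  # ordered user inputs (and, later, assertions)
        }),
    }

# Example: a two-step smoke test for a greeting conversation.
request = build_create_playbook_request(
    api_key="YOUR_API_KEY",
    project_id="YOUR_PROJECT_ID",
    name="Greeting smoke test",
    steps=[{"text": "Hello"}, {"text": "I need help with my order"}],
)
# Send it with the HTTP client of your choice, e.g.:
# requests.request(request["method"], request["url"],
#                  headers=request["headers"], data=request["body"])
```

Separating request construction from sending, as above, also makes the payload easy to unit-test before it ever reaches your Cognigy environment.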
3. Choose the Most Effective Playbook Run Method
There are several means to activate Playbooks, allowing you to incorporate them seamlessly into your deployment process and QA workflows. Consider the following activation methods based on your specific requirements:
- Manual Playbook Running via Interaction Panel: Execute Playbooks manually using the Interaction Panel.
- Triggering Playbook Runs from the Playbook Menu: Initiate Playbook runs directly from the Playbook menu.
- Schedule a Playbook Run via API: Automate Playbook runs by scheduling them using the Playbooks API. Refer to the API documentation for implementation details.
- Bulk Run Playbooks Using the CLI: Execute a series of Playbooks in bulk using the Cognigy CLI. See the CLI documentation on GitHub for more information.
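The scheduled and bulk options above boil down to the same pattern: trigger a set of playbook runs on a timer or from a pipeline step. Below is a minimal, hedged sketch of that loop; `trigger_run` stands in for whatever actually starts a run (an HTTP call to the Playbooks API or a CLI invocation), since the exact endpoint and CLI command syntax depend on your installation and its documentation.

```python
import time

def run_playbooks(playbook_ids, trigger_run, rounds=1, interval_s=0.0):
    """Trigger each playbook `rounds` times, pausing `interval_s` between rounds.

    `trigger_run` is injected so the same loop works whether runs are
    started via the Playbooks API or the Cognigy CLI (hypothetical
    example: trigger_run = lambda pb_id: requests.post(run_url(pb_id), ...)).
    """
    triggered = []
    for _ in range(rounds):
        for pb_id in playbook_ids:
            trigger_run(pb_id)
            triggered.append(pb_id)
        if interval_s:
            time.sleep(interval_s)
    return triggered
```

In practice you would invoke such a loop from a cron job or a CI/CD pipeline stage, so every release candidate is exercised by the full playbook suite automatically.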
5. Use Playbook Assertions to Drive Outcomes
Playbook "assertions" enable spot testing of specific data points within the input, context, or profile. Some examples of how assertions can be used are listed below:
- Validate that the expected intent was found.
- Validate that a specific slot was found.
- Validate that a specific result was received by a third-party system / API.
- Validate data required for flow execution, for example, that expected profile values exist.
You can add as many assertions to a single input as needed, but be careful not to over-restrict your tests: run results must remain meaningful so that issues can be identified quickly. At the end of a Playbook run, an overall "score" is provided based on the number of successful assertions. Failed assertions indicate discrepancies or unexpected behavior, highlighting areas where the QA process requires attention.
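Conceptually, each assertion checks one data point in the input, context, or profile against an expected value, and the run score is the share of assertions that passed. The sketch below models that idea with simplified stand-in structures; it is not Cognigy's real data schema, just an illustration of the mechanism.

```python
# Conceptual sketch of Playbook assertion evaluation. The `turn` dict is a
# simplified stand-in for Cognigy's input/context/profile objects, not the
# real schema.

def evaluate_assertions(turn, assertions):
    """Return (score, failures) for a list of (path, expected) assertions.

    `path` is a dotted path into the turn data, e.g. "input.intent" or
    "profile.email"; `expected` is the value the assertion checks for.
    """
    failures = []
    for path, expected in assertions:
        value = turn
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if value != expected:
            failures.append((path, expected, value))
    passed = len(assertions) - len(failures)
    score = passed / len(assertions) if assertions else 1.0
    return score, failures

turn = {
    "input": {"intent": "OrderStatus", "slots": {"orderId": "12345"}},
    "context": {"apiResult": "OK"},
    "profile": {"email": "test@example.com"},
}
assertions = [
    ("input.intent", "OrderStatus"),        # expected intent was found
    ("input.slots.orderId", "12345"),       # specific slot was filled
    ("context.apiResult", "OK"),            # third-party API responded as expected
    ("profile.email", "test@example.com"),  # required profile value exists
]
score, failures = evaluate_assertions(turn, assertions)
# score == 1.0, failures == []
```

A failed assertion reports the path, the expected value, and what was actually found, which is exactly the information you need to localize a regression quickly.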
6. Understand the Playbook Test Scope
It is essential to note that Playbook runs are completed server-side. While they test the functionality of Cognigy NLU and flow execution, they do not validate interactions upstream or downstream of the Cognigy endpoint. Additionally, playbooks cannot be used to test handover integrations with human agent systems. Therefore, you should consider alternative solutions for testing the interactions between your front end and the Cognigy endpoint, as well as for human handovers.
6. Test API Integrations
When testing with API integrations, it is crucial to use static data. Playbook assertions perform 1:1 tests, meaning the API response must remain consistent during each test run. To ensure reliability, employ a standard dummy testing data set with an expected consistent response. Additionally, API responses can sometimes take longer than standard flow executions. Ensure you accommodate these longer response times by increasing the allowable timeout on playbook steps that trigger external systems and may take longer to execute.
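One simple way to keep API-backed assertions stable is a deterministic test fixture: either seed the real third-party system with a fixed dummy record, or point the flow's HTTP step at a stub like the one below during QA runs. The field names and record here are invented for illustration.

```python
# Deterministic stub for an external order-lookup API used during QA runs.
# The record below is dummy test data (field names are illustrative), so
# every Playbook run receives an identical response and 1:1 assertions
# stay reliable.

TEST_ORDERS = {
    "TEST-0001": {"status": "shipped", "carrier": "DHL", "eta": "2024-01-15"},
}

def lookup_order(order_id):
    """Return the same canned response on every call for known test IDs."""
    order = TEST_ORDERS.get(order_id)
    if order is None:
        return {"error": "not_found"}
    return dict(order)  # return a copy so callers cannot mutate the fixture
```

Because the stub answers instantly, it also sidesteps the timeout issue; when you test against the real system instead, remember to raise the timeout on the playbook steps that call it.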
7. Regression-Test NLU Models
Playbooks are the perfect tool for regression testing the NLU model during your version release process. The NLU playbook should contain a series of utterances that provide comprehensive coverage of hitting all intents in the model at least once. Each utterance should have an assertion for the expected intent that it will trigger.
The key is to avoid scripting playbook utterances as 1:1 copies of intent example sentences. Instead, ensure the utterances hit the intended intents without a 100% score, but above your configured intent threshold. In other words, don't copy your example sentences into Playbooks; write utterances that are similar but unique to ensure meaningful testing. Running these tests before deploying a newly trained NLU model provides an effective regression test, alerting you when changes to the NLU model degrade its performance relative to its original state.
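The per-utterance check described above can be sketched as follows. The threshold value is an example; use the intent confidence threshold configured in your own flow.

```python
# Sketch of a single NLU regression assertion: the utterance must resolve
# to the expected intent with confidence above the configured threshold,
# while a perfect 1.0 score is flagged because it usually means the test
# utterance is a verbatim copy of a training example sentence.

INTENT_THRESHOLD = 0.7  # example value; match your flow's intent threshold setting

def check_nlu_result(expected_intent, predicted_intent, score,
                     threshold=INTENT_THRESHOLD):
    """Return (ok, reason) for one NLU regression check."""
    if predicted_intent != expected_intent:
        return False, f"expected {expected_intent}, got {predicted_intent}"
    if score < threshold:
        return False, f"score {score:.2f} below threshold {threshold:.2f}"
    if score == 1.0:
        return False, "perfect score: utterance likely copies a training sentence"
    return True, "ok"
```

Running this check over every utterance in the NLU playbook gives you a pass/fail verdict per intent, so a model retrain that quietly weakens one intent surfaces immediately.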
8. Playbooks Are Not Counted Towards License Consumption
You can run Playbooks without incurring any impact on your Cognigy license consumption. Take advantage of this free feature and run Playbooks as frequently as needed, even on production systems. This allows you to test your virtual agent's functionality thoroughly without budgetary constraints.
By following these recommendations, you can effectively leverage Cognigy.AI Playbooks as a key tool for automated QA testing during your virtual agent deployment process. Thorough testing with Playbooks will enhance the quality and reliability of your conversational AI solutions, providing a seamless user experience and reducing the likelihood of regressions reaching production. With Playbooks, you can be confident that your virtual agent delivers an exceptional user experience.