This article describes strategies to improve the speech-to-text recognition of information required for user verification and other processes.

In voice projects, it might be necessary to recognize statements that aren't part of the standard vernacular, that is, phrases and statements that the native STT systems do not recognize on their own. A few examples are:

Uncommon names or names which are spelled with an accent
Addresses, especially street names
Product names
Uncommon spelling in general

Often these need to be recognized as part of a user verification process. For example, the user must give their name and address to confirm their identity.

Anyone who has tried to set this up knows that voice in particular can cause many problems here, which is why we have developed several strategies to compensate. They combine functionality already available in the STT systems with additional Cognigy-specific functionality.

Speech to Text Provider Information used in this article

The functionality described on this page uses Microsoft Azure as an example. However, similar strategies should be possible with Google and other speech providers.

Context Phrases/Hints

Cognigy's native Voice Gateway allows you to use so-called "Context Phrases" or "Hints" to train the virtual agent to recognize certain phrases.

In other words, you are telling the Speech to Text provider to expect a certain word that it otherwise might not recognize. The use cases below show how.

Static Phrases within Context

The settings for this in Cognigy Voice Gateway can be found in the "Set Session Config" Node in the "Recognizer (STT)" section. You can simply add phrases to the "STT Hints" fields. In this example, we are using company names. These fields increase dynamically, and each phrase needs to be added to its own field.

If you already have a custom speech model in your Microsoft Azure instance, you can add these settings in the "Enable Advanced STT" section:

Dynamic Phrases within Context

You might also have situations where you don't know ahead of time which phrases need to be recognized. For example, a verification process might require a user's street address or name.

If you can retrieve the user's information, for example through an API, you can also set the phrases to be recognized dynamically.

In both cases, we assume you can create an array of phrases in a pattern similar to this, which can be saved to the context:

{
  "names":[
    "Heltewig",
    "Satoshi"
  ]
}
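As a minimal sketch of how such an array could be built from user data, the following collects the relevant fields into an object in the same pattern as above. The record shape, the field names, and the helper `buildHintPhrases` are assumptions made for illustration; they are not part of any Cognigy API, and storing the result in the context is left to your Flow.

```javascript
// Hypothetical customer record, e.g. as returned by your own API.
const customerRecord = {
  firstName: "Satoshi",
  lastName: "Heltewig",
  middleName: "", // empty values are skipped below
};

// Collect the non-empty, de-duplicated string values that the STT should expect.
function buildHintPhrases(record, fields) {
  const phrases = fields
    .map((field) => record[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim());
  return [...new Set(phrases)];
}

// Object in the same pattern as the JSON above, ready to be saved to the context.
const hintContext = {
  names: buildHintPhrases(customerRecord, ["lastName", "firstName", "middleName"]),
};

console.log(JSON.stringify(hintContext));
```

Filtering out empty values and duplicates keeps the list of hints short, which matters when every phrase ends up in its own hint field.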

Dynamic Phrases: Cognigy Voice Gateway

In the native Cognigy Voice Gateway, we can again use the "Recognizer (STT)" section of the "Set Session Config" Node.

[Screenshot: Set Session Config Node with dynamic STT hints]

In this case, we assume we already have an array of phrases in the context under the variable "names".

Regarding Names

Something to remember when recognizing common names with uncommon spellings (e.g. Patrick vs. Patrik, Eddie vs. Edi): even with context phrases/hints defined, the more "common" spelling usually wins in the recognition. This means that you might need to use fuzzy matching (described below) in addition to context phrases/hints.

Fuzzy Search / Fuzzy Matching

If using context phrases still doesn't work for your use case, it is also possible to use the Fuzzy Search Node to estimate what the user said. In other words, the node matches what the STT understood against what is expected.

The Fuzzy Search Node also expects an array containing the values to match against, in a pattern similar to the array we used for the context phrases:

[
    "Heltewig",
    "Satoshi"
]

Fuzzy Search: Static Phrases

The node compares the input with each value in the array and returns the values together with a score indicating how close each value is to the input. The closer the score is to 1, the closer the match.

Of course, the value in "Search Pattern" can also be replaced by Tokens or CognigyScript.
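To illustrate the idea of scoring how close an input is to each expected value, here is a minimal sketch based on Levenshtein edit distance, normalized so that 1 means identical. This is an assumption chosen for illustration only: the algorithm and result format used by the Fuzzy Search Node are not described in this article and may differ. The function names are hypothetical.

```javascript
// Classic Levenshtein edit distance (insertions, deletions, substitutions).
function levenshtein(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Normalize the distance to a similarity between 0 and 1 (1 = identical).
function similarity(input, candidate) {
  const a = input.toLowerCase();
  const b = candidate.toLowerCase();
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Score every expected value and sort the best match first.
function rankMatches(input, candidates) {
  return candidates
    .map((value) => ({ value, score: similarity(input, value) }))
    .sort((x, y) => y.score - x.score);
}

// The STT returned the common spelling; the expected values use the uncommon ones.
const ranked = rankMatches("Patrick", ["Patrik", "Edi"]);
console.log(ranked);
```

In a verification flow, you would typically accept the best match only if its score is above a threshold you have tuned on real transcriptions, and ask the user to repeat or spell the value otherwise.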

Fuzzy Search: Dynamic Phrases

If you wish to add the values in the array dynamically, replace the array in the Source Data field with the following:

{
  "$cs":{
    "script":"context.names",
    "type":"array"
  }
}

This will tell the search to look in the "names" array in the context object.

NLU Transformers

You can also use NLU transformers to dynamically change how certain phrases are recognized. NLU transformers are often used to integrate external NLU systems into Cognigy. However, they can also be used to manipulate data directly in the Cognigy NLU.
You can read more about transformers here.

A good example of when to use Transformers is numbers which aren't always transcribed properly by the STT.

Here is a short example of using a Pre-NLU transformer to recognize numbers such as 'Septante' and 'Nonante' in Belgian French, which often present a challenge to STT systems.

preNlu: async ({ text, data, language }) => {
    // Only process French input (adjust the language code to your project)
    if (language === 'fr-FR') {
        // Spoken number words and their values; "et" is mapped to "" so it gets dropped
        const numbers = {
            "zéro": 0,
            "un": 1,
            "deux": 2,
            "trois": 3,
            "quatre": 4,
            "cinq": 5,
            "six": 6,
            "sept": 7,
            "huit": 8,
            "neuf": 9,
            "septante": 70,
            "nonante": 90,
            "et": ""
        };

        // Replace number words with digits and split digit groups into single digits
        function convertToDigit(customerNumber) {
            return customerNumber
                .split(" ")
                .map(word => word.toLowerCase())
                .map(word => {
                    if (Object.prototype.hasOwnProperty.call(numbers, word)) {
                        return numbers[word].toString();
                    }
                    if (!isNaN(parseInt(word))) {
                        // e.g. "123" becomes "1 2 3"
                        return parseInt(word).toString().split("").join(" ");
                    }
                    return word;
                })
                .filter(word => word !== "") // remove dropped words such as "et"
                .join(" ");
        }

        // Convert once and use the result for both the suggestion and the text
        const converted = convertToDigit(text);
        data["suggestion"] = converted;
        text = converted;
    }
    return {
        data,
        text
    };
}

This changes the user input before it is sent to the virtual agent, so the virtual agent itself never sees the original wording. If you need help setting this up, please contact your consultant or our support team.
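Note that the transformer above converts each number word on its own, so a compound such as "septante cinq" comes out as "70 5". If you need the combined value instead, a variant along the following lines could merge a tens word with the unit that follows it. This is only a sketch under stated assumptions: the helper name `combineTensAndUnits` is hypothetical, it covers only the words from the table above, and it expects space-separated words as input.

```javascript
// Number words from the transformer example, split into units and tens.
const units = new Map([
  ["zéro", 0], ["un", 1], ["deux", 2], ["trois", 3], ["quatre", 4],
  ["cinq", 5], ["six", 6], ["sept", 7], ["huit", 8], ["neuf", 9],
]);
const tens = new Map([["septante", 70], ["nonante", 90]]);

function combineTensAndUnits(text) {
  const words = text.split(/\s+/).filter((word) => word !== "");
  const result = [];
  for (let i = 0; i < words.length; i++) {
    const word = words[i].toLowerCase();
    if (!tens.has(word)) {
      // Not a tens word: convert single units, keep everything else as it is.
      result.push(units.has(word) ? String(units.get(word)) : words[i]);
      continue;
    }
    // Skip an optional "et" between the tens word and the unit ("septante et un").
    let next = i + 1;
    if (words[next] !== undefined && words[next].toLowerCase() === "et") {
      next++;
    }
    const unit = words[next] === undefined ? undefined : units.get(words[next].toLowerCase());
    if (unit !== undefined && unit > 0) {
      result.push(String(tens.get(word) + unit));
      i = next; // the unit (and "et", if present) has been consumed
    } else {
      result.push(String(tens.get(word)));
    }
  }
  return result.join(" ");
}

console.log(combineTensAndUnits("septante cinq"));
console.log(combineTensAndUnits("nonante et un"));
```

Whether you want combined values or single digits depends on what the user is reading out (an amount versus a customer number), so test both variants against real transcriptions before deciding.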

Lexicons

Lexicons can also be used to recognize phrases that sound similar. A great explanation can be found here: Voice Gateway – Handling Homophones.
