Imagine you want to deploy a single Cognigy virtual voice agent across multiple countries, or in a country with several official languages that all need to be supported.
With Cognigy it is possible to recognize the language the user is speaking and to change the STT (Speech-To-Text) and TTS (Text-To-Speech) automatically.
This can be done with either Cognigy's native Voice Gateway or AudioCodes.
For Voice Gateway
Setup Language Provider
To use this functionality, you need to be using either Microsoft Azure or Google as your TTS and STT provider. In our example, we will be using Microsoft, but the basic principle is the same.
To configure the speech service, you need to open your Voice Gateway UI/Self Service Portal and select Speech on the left-hand side.
- Click Add Speech Service and select Microsoft Azure Speech Services as the vendor.
- Choose the account you want to use it with and select both TTS and STT (although in this case only STT is relevant).
Continue by selecting Use Hosted Azure Service, choosing a region, and entering the API key. (A self-hosted/Docker Azure instance is also possible, but this example uses the public cloud version.)
Save your changes by clicking Save.
- Once finished, add this service to the application you wish to use it in (this can also be done directly in the Set Session Config Node).
Overview of the speech settings
Adding to the Flow
Set Session Config
Go into your flow and click on the "+" where you want to add the language recognition functionality.
We suggest placing it after a point in the flow where the user is expected to say a longer sentence, as this helps with language recognition.
Navigate to Extension - Voice Gateway - Set Session Config to add the Set Session Config Node.
Next, open the node and click on the Recognizer (STT) drop-down menu.
Scroll down until you see the Recognize Language check box and click it to activate it. Additional fields will appear below the check box, where you can define which languages you wish to detect. In this example, it's set to look for German, Japanese, or French as alternative languages to English.
The result of the language recognition will be stored in the output as:
The format is usually the ISO 639 Language Code followed by a dash ("-") and then the ISO 3166 Country Code.
For example:
"de-DE" = German (Germany)
"fr-FR" = French (France)
"ja-JP" = Japanese (Japan)
"en-US" = English (United States)
Flow Structure and Switching Languages
After setting up the Set Session Config Node, you can configure a simple Lookup Node to switch to the proper language when it's detected. To switch the language, you will have to set up another Set Session Config Node, in which you define the detected language as the main language. That would look something like this:
Don't forget to deactivate language detection in the new Set Session Config Node; otherwise, language detection will continue throughout the call.
This can add latency to the bot and may also lead to unexpected behavior.
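The lookup logic described above can be sketched in plain Python. This is an illustration only, not Cognigy code: `SUPPORTED_LANGUAGES`, `DEFAULT_LANGUAGE`, `next_session_config`, and the dictionary keys are hypothetical names chosen for this sketch. In a real flow, these settings are made in the Lookup Node and the second Set Session Config Node.

```python
from typing import Optional

# Hypothetical values mirroring the example: English plus three alternatives.
DEFAULT_LANGUAGE = "en-US"
SUPPORTED_LANGUAGES = {"en-US", "de-DE", "fr-FR", "ja-JP"}


def next_session_config(detected: Optional[str]) -> dict:
    """Return the settings for the follow-up Set Session Config step."""
    # Default case of the lookup: fall back when the language is missing or unknown.
    language = detected if detected in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
    return {
        "language": language,         # the detected language becomes the main language
        "recognize_language": False,  # switch detection off for the rest of the call
    }


print(next_session_config("de-DE"))  # {'language': 'de-DE', 'recognize_language': False}
```

The default case matters: if no language was detected, or one outside the configured list is reported, the call simply continues in the main language.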
Redetecting Languages
If you wish to redetect the language, you will need to add another Set Session Config Node which reactivates language detection.
In countries with more than one national language, the user might interact with the bot in one language but give a particular answer in another (for example, a native English speaker in Quebec trying to pronounce French street names). For such cases, you can also set this up as an Activity Parameter for that one specific question.
For AudioCodes
AudioCodes Version Information
The functionality described on this page is only available in version 3.4 or later of AudioCodes Voice Gateway Connection Enterprise/VAIC.
To use this functionality, you need to be using Microsoft Azure as your TTS and STT provider.
Set Session Parameters
Go into your flow and click on the "+" where you want to add the language recognition functionality. Choose Extension - AudioCodes - Set Session Parameters to add the Set Session Parameters node:
Image 1: Set Session Parameters
Open the node, scroll down to Advanced, and expand the section to see the Additional Session Parameters JSON field.
Image 2: Where to Find Advanced Settings
Image 3: JSON field for Advanced Settings
Define Languages
You can use the Additional Session Parameters field to add additional AudioCodes settings not available in any of the other fields of the Set Session Parameters node.
In this situation we will add language recognition for speech to text (Microsoft) as described in the AudioCodes speech customization documentation.
As an example, let's say our main language is English, but we want to try to recognize German, French, or Japanese. We can then add the following JSON to the Additional Session Parameters field:
{
  "languageDetectionActivate": true, // Activate language detection
  "languageDetectionAutoSwitch": true, // Tell AudioCodes to switch language automatically
  "alternativeLanguages": [
    {
      "language": "de-DE", // Tell voice agent to listen for a specific language
      "voiceName": "de-DE-KatjaNeural" // Tell voice agent which voice to use if this language is recognized (languageDetectionAutoSwitch must be set to true)
    },
    {
      "language": "fr-FR",
      "voiceName": "fr-FR-DeniseNeural"
    },
    {
      "language": "ja-JP",
      "voiceName": "ja-JP-KeitaNeural"
    }
  ]
}
The // comments are explanatory only; remove them before pasting, since standard JSON does not allow comments.
So in this example, if the user were to speak German after the Set Session Parameters node, the agent would switch to the Katja Neural voice.
A list of the voices provided by Microsoft can be found here. Please keep in mind that some voices are only available in certain Azure regions. Please talk to your AudioCodes engineer for more details.
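If you maintain many language and voice pairs, the JSON for the Additional Session Parameters field can be generated instead of typed by hand, which avoids missing braces or commas. The following is a minimal sketch: the helper name `build_session_parameters` is hypothetical, while the three JSON keys are the ones used in the example above.

```python
import json
from typing import Dict


def build_session_parameters(voices: Dict[str, str], auto_switch: bool = True) -> str:
    """Build the JSON text from a mapping of language code to voice name."""
    payload = {
        "languageDetectionActivate": True,
        "languageDetectionAutoSwitch": auto_switch,
        "alternativeLanguages": [
            {"language": language, "voiceName": voice}
            for language, voice in voices.items()
        ],
    }
    # json.dumps emits valid JSON without comments, ready to paste into the field.
    return json.dumps(payload, indent=2)


print(build_session_parameters({
    "de-DE": "de-DE-KatjaNeural",
    "fr-FR": "fr-FR-DeniseNeural",
    "ja-JP": "ja-JP-KeitaNeural",
}))
```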
Reactivating Language Detection
After a language has been set, language recognition will be deactivated to make sure the language does not accidentally change a second time. If for some reason you need to reactivate language recognition, you will need to add another Set Session Parameters node with the same settings.
Data Received from AudioCodes
Even if you do not want to switch the TTS and STT automatically, you can still use the language recognition settings. In both cases, we will receive data from AudioCodes telling us what language was spoken.
This can be found in the user input data object in the following field and will return the language in the Azure format (en-US, de-DE etc.):
You can then use this data, for example, to switch the locale of the agent or set the language for automatic translation, if so desired.
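As a sketch of how the detected language could be mapped to one of the agent's locales, the function below tries an exact match first, then a locale with the same language but a different region, and finally a default. The function name `pick_agent_locale` and its parameters are hypothetical. Whether a region-independent match is desirable is a design choice for your agent.

```python
from typing import List


def pick_agent_locale(detected: str, agent_locales: List[str], default: str) -> str:
    """Choose the agent locale that best fits a detected tag such as "de-DE"."""
    # 1. Exact match, e.g. "de-DE" is configured on the agent.
    if detected in agent_locales:
        return detected
    # 2. Same language, different region, e.g. "de-AT" falls back to "de-DE".
    language = detected.split("-")[0].lower()
    for locale in agent_locales:
        if locale.split("-")[0].lower() == language:
            return locale
    # 3. Nothing fits: keep the default locale.
    return default


print(pick_agent_locale("de-AT", ["en-US", "de-DE"], "en-US"))  # de-DE
```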
Conversational Design Considerations
As a last note, the most important thing to consider here is the length of the user input. If the user's sentence is too short, the language cannot be recognized reliably. To account for this, design the conversation so that the user says a longer sentence at the point where detection is active. A simple greeting like "Hallo", for example, will not be recognized easily in an agent where both Dutch and German need to be recognized.