Cognigy Voice Gateway - Advanced Configuration

After connecting Cognigy VG to your phone number/SIP trunk and building a first simple Flow, you might want to take the next step and configure advanced settings to create even better user experiences. Some of these advanced settings are the topic of this article.

Overview

Cognigy VG comes with a large number of configuration settings which can be controlled directly from within your Cognigy Flow. These settings can be applied individually to two scopes:

  • sessionParams - the settings apply to the whole session from the time of applying them
  • activityParams - the settings apply only to the current activity (e.g. sendMessage)

To apply these settings, use the Cognigy VG Extension.

 

Setting Session Parameters

Session parameters can be set with the "Set Session Parameters" Node. Once the Node is executed, the settings apply for the remainder of the session.

 

Setting Activity Parameters

Activity parameters are set per activity. For example, if they are set on a "Send Message" activity, they affect only the execution of that activity. A typical use is setting "Barge In" to on only for a long message, which allows the user to interrupt the voicebot during this message, but not afterwards.
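The relationship between the two scopes can be pictured as a simple override: activity parameters win for one activity, and the session parameters stay untouched. The following is a minimal sketch of that idea, not the Voice Gateway's actual implementation; the function name and the keys "bargeIn" and "language" are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional


def effective_params(session_params: Dict[str, Any],
                     activity_params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the settings in force for a single activity.

    Activity parameters override session parameters for this activity only;
    the session parameters themselves are left unchanged.
    """
    merged = dict(session_params)          # copy, so the session scope is not mutated
    merged.update(activity_params or {})   # per-activity values take precedence
    return merged


# Hypothetical keys, used only to illustrate the two scopes.
session = {"bargeIn": False, "language": "en-GB"}
long_message = effective_params(session, {"bargeIn": True})  # barge-in for this message only
next_message = effective_params(session)                     # back to the session values
print(long_message["bargeIn"], next_message["bargeIn"])
```

The session dictionary is copied rather than modified, which mirrors the statement that activity parameters do not affect later activities.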

 

Parameter Details

STT Settings

These settings apply to the Speech-to-Text engine (e.g. Azure Speech Services).

Parameter Type Description
Language Code Text

Defines the language (e.g., "en-ZA" for South African English) of the voicebot conversation and is used for TTS and STT functionality. The value is obtained from the service provider.

  STT:
  Azure: The parameter is configured with the value from the 'Locale' column in Azure's Speech-Text table (e.g., "en-GB").
  Google: The parameter is configured with the value from the 'languageCode' (BCP-47) column in Google's Cloud Speech-to-Text table (e.g., "nl-NL").

 

  TTS:
  Azure: The parameter is configured with the value from the 'Locale' column in Azure's Text-to-Speech table (e.g., "it-IT").
  Google: The parameter is configured with the value from the 'Language code' column in Google's Cloud Text-to-Speech table (e.g., "en-US").
  AWS: The parameter is configured with the value from the 'Language' column in Amazon's Polly TTS table (e.g., "de-DE").

 

Disable STT Punctuation Toggle

Removes punctuation marks from the STT result that the Voice Gateway sends to the Flow.

  on: Enabled. Punctuation is excluded.
  off: (Default) Disabled. Punctuation is included.

Note: This setting requires support from the STT engine.

 

TTS Settings

These settings apply to the Text-to-Speech engine (e.g. Azure Speech Services). 

Parameter Type Description
Voice Name Text

Defines the voice name for the TTS service.

  Azure: The parameter is configured with the value from the 'Short voice name' column in Azure's Text-to-Speech table (e.g., "it-IT-ElsaNeural").
  Google: The parameter is configured with the value from the 'Voice name' column in Google's Cloud Text-to-Speech table (e.g., "en-US-Wavenet-A").
  AWS: The parameter is configured with the value from the 'Name/ID' column in Amazon's Polly TTS table (e.g., "Hans").
  Almagu: The parameter is configured with the value from the 'Voice' column in Almagu's TTS table (e.g., "Osnat").
Disable TTS Cache Toggle

Disables caching of TTS (audio) results. When caching is active and the Voice Gateway needs TTS for a text that has been requested before, it retrieves the result from its cache instead of requesting it again from the TTS provider.

  on: Enabled. TTS results are not cached.
  off: (Default) Disabled. TTS results are cached.

 

 

DTMF

These settings apply to the DTMF (dual tone multi frequency) features of Voice Gateway.

Parameter Type Description
Send DTMF Toggle

Enables the sending of DTMF events to the Flow.

  on: Enabled
  off: (Default) Disabled

Note: For configuring the DTMF collection and sending method, see the dtmfCollect parameter.

DTMF Collect Toggle

 

Defines the DTMF digit collection and sending method.

  on: Enabled. The Voice Gateway first collects all the DTMF digits entered by the user, and only then sends them all together to the Flow.
  off: (Default) Disabled. As soon as the Voice Gateway receives a DTMF digit entered by the user, it sends that single digit to the Flow. In other words, it sends the DTMF digits to the Flow one at a time.

Note:

  When enabled, you can configure additional settings using the following parameters: dtmfCollectInterDigitTimeoutMS, dtmfCollectMaxDigits, and dtmfCollectSubmitDigit.
  If the sendDTMF parameter is configured to off (default), incoming DTMF digits are ignored by the Voice Gateway even if the dtmfCollect parameter is configured to on.

 

DTMF Collect Timeout Number

Defines the time (in milliseconds) that the Voice Gateway waits for the user to press another digit before it sends all the collected digits to the Flow. If the timeout expires after the last digit entered by the user, the Voice Gateway sends all the collected digits to the Flow (as a DTMF message), without waiting for the maximum number of expected digits or for the "submit" digit. The timer starts when the user enters the first DTMF digit and is reset after each digit.

The valid value range is 0 to unlimited. The default is 2000.

DTMF Collect Max Digits Number

Defines the maximum number of DTMF digits that the Voice Gateway expects to receive from the user. Once the Voice Gateway receives and collects this number of digits entered by the user, it immediately sends all the digits to the Flow (as a DTMF message), without waiting for the timeout to expire or for the "submit" digit.

The valid value range is 0 (disabled) to unlimited. The default is 5. If configured to 0, the DTMF collection and sending method is according to dtmfCollectInterDigitTimeoutMS or dtmfCollectSubmitDigit.

DTMF Collect Submit Digit Text

Defines a special DTMF "submit" digit. When the Voice Gateway receives this digit from the user, it immediately sends all the collected digits to the Flow (as a DTMF message), without waiting for the timeout to expire or for the maximum number of expected digits.

The valid value is any symbol on a phone keypad. The default is # (pound key). If you want to disable this parameter, configure it to "" (empty string).
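The interplay of the three collection settings (inter-digit timeout, maximum digits, and submit digit) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Voice Gateway's implementation: the function name and input format are hypothetical, and it assumes that the submit digit itself is not included in the message sent to the Flow. It does not model any special behavior for a timeout of 0.

```python
from typing import List, Tuple


def collect_dtmf(presses: List[Tuple[int, str]],
                 inter_digit_timeout_ms: int = 2000,
                 max_digits: int = 5,
                 submit_digit: str = "#") -> List[str]:
    """Group timestamped key presses into the DTMF messages sent to the Flow.

    presses: (time_ms, digit) pairs in chronological order.
    """
    messages: List[str] = []
    buffer = ""
    last_time = 0
    for time_ms, digit in presses:
        # The timeout expired since the previous digit: flush what was collected.
        if buffer and time_ms - last_time > inter_digit_timeout_ms:
            messages.append(buffer)
            buffer = ""
        last_time = time_ms
        # Submit digit: flush immediately (assumption: the digit itself is dropped).
        if submit_digit and digit == submit_digit:
            if buffer:
                messages.append(buffer)
                buffer = ""
            continue
        buffer += digit
        # Maximum reached: flush immediately. A value of 0 disables this limit.
        if max_digits and len(buffer) >= max_digits:
            messages.append(buffer)
            buffer = ""
    if buffer:  # the timeout eventually expires after the final digit
        messages.append(buffer)
    return messages


presses = [(0, "1"), (500, "2"), (900, "#"), (5000, "4"), (5100, "5"), (9000, "6")]
print(collect_dtmf(presses))  # ['12', '45', '6']
```

In this run, "12" is sent because of the submit digit, "45" because more than 2000 ms pass before the next key press, and "6" once the timeout expires after the last digit.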

 

Barge In

Barge In stands for the ability of the user to interrupt the voicebot by speaking during a running prompt.

Parameter Type Description
Barge In Toggle

Enables the Barge-In feature.

  on: Enabled. When the voicebot is playing a response to the user (playback of Flow message), the user can "barge-in" (interrupt) and start speaking. This terminates the voicebot response, allowing the voicebot to listen to the new speech input from the user (i.e., Voice Gateway sends detected utterance to the Flow).
  off: (Default) Disabled. The Voice Gateway doesn't expect speech input from the user until the voicebot has finished playing its response to the user. In other words, the user can't "barge-in" until the voicebot message response has finished playing.
Barge In on DTMF Toggle

Enables the Barge-In on DTMF feature.

  on: (Default) Enabled. When the voicebot is playing a response to the user (playback of Flow message), the user can "barge-in" (interrupt) with a DTMF digit. This terminates the voicebot response, allowing the voicebot to listen to and process the digits sent from the user.
  off: Disabled. The Voice Gateway doesn't expect DTMF input from the user until the voicebot has finished playing its response to the user. In other words, the user can't "barge-in" until the voicebot message response has finished playing.

Note:

  If you enable this feature (i.e., bargeInOnDTMF configured to true), you also need to enable the sending of DTMF digits (see the sendDTMF parameter).
Number

Defines the minimum number of words that the user must say for the Voice Gateway to consider it a barge-in. For example, if configured to 4 and the user only says 3 words during the bot's playback response, no barge-in occurs.

The valid range is 1 to 5. The default is 1.
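The minimum-word rule can be expressed as a short check. This is only a sketch: the function name is hypothetical, and counting words by splitting on whitespace is an assumption about how the STT result is evaluated.

```python
def is_barge_in(utterance: str, min_words: int = 1) -> bool:
    """Return True when the utterance has enough words to count as a barge-in."""
    if not 1 <= min_words <= 5:
        raise ValueError("min_words must be between 1 and 5")
    # Assumption: words are separated by whitespace in the recognized text.
    return len(utterance.split()) >= min_words


print(is_barge_in("stop that now", min_words=4))         # False: only 3 words
print(is_barge_in("please stop that now", min_words=4))  # True: 4 words
```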

 

Continuous ASR

These settings relate to the Continuous ASR feature of the Voice Gateway.

Parameter Type Description

Enable Continuous ASR Toggle

Enables the Continuous ASR feature. Continuous ASR enables the Voice Gateway to concatenate multiple STT recognitions of the user and then send them as a single textual message to the bot.

  on: Enabled
  off: (Default) Disabled

 

Continuous ASR Digits Text

This parameter is applicable when the Continuous ASR feature is enabled.

Defines a special DTMF key which, if pressed, causes the Voice Gateway to immediately send the accumulated recognitions of the user to the Flow. For example, if configured to "#" and the user presses the pound key (#) on the phone's keypad, the Voice Gateway concatenates the accumulated recognitions and then sends them as one single textual message to the Flow.

The default is "#".

Note: Using this feature incurs an additional delay from the user’s perspective because the speech is not sent immediately to the Flow after it has been recognized. To overcome this delay, configure the parameter to a value that is appropriate to your environment.

Continuous ASR Timeout Number

This parameter is applicable when the Continuous ASR feature is enabled.

Defines the automatic speech recognition (ASR) timeout (in milliseconds). When the Voice Gateway detects silence from the user for the duration configured by this parameter, it concatenates all the accumulated STT recognitions and sends them as one single textual message to the Flow.

The valid value is 2,500 (i.e., 2.5 seconds) to 60,000 (i.e., 1 minute). The default is 3,000.
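The way Continuous ASR groups recognitions can be sketched as a small simulation. The function and event format are hypothetical, and two simplifications are assumed: silence is measured from the timestamp of the previous recognition, and recognitions are joined with a single space.

```python
from typing import List, Tuple


def continuous_asr(events: List[Tuple[int, str, str]],
                   timeout_ms: int = 3000,
                   submit_digit: str = "#") -> List[str]:
    """Concatenate recognitions into the messages sent to the Flow.

    events: (time_ms, kind, value) in chronological order, where kind is
    "speech" (value = recognized text) or "dtmf" (value = pressed key).
    """
    if not 2500 <= timeout_ms <= 60000:
        raise ValueError("timeout_ms must be between 2500 and 60000")
    messages: List[str] = []
    parts: List[str] = []
    last_speech = 0
    for time_ms, kind, value in events:
        # Silence lasted at least timeout_ms: send what has accumulated.
        if parts and time_ms - last_speech >= timeout_ms:
            messages.append(" ".join(parts))
            parts = []
        if kind == "speech":
            parts.append(value)
            last_speech = time_ms
        elif kind == "dtmf" and value == submit_digit and parts:
            # The configured key sends the accumulated text immediately.
            messages.append(" ".join(parts))
            parts = []
    if parts:  # the silence timeout eventually expires after the last recognition
        messages.append(" ".join(parts))
    return messages


events = [(0, "speech", "my number is"), (1500, "speech", "one two three"),
          (6000, "speech", "thank you"), (6500, "dtmf", "#")]
print(continuous_asr(events))  # ['my number is one two three', 'thank you']
```

The first two recognitions arrive within 3000 ms of each other and are merged; the third is sent early because the user presses the configured key.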

 

User Timeouts

Parameter Type Description
No User Input Timeout (ms) Number

Defines the maximum time (in milliseconds) that the Voice Gateway waits for input from the user.

If no input is received when this timeout expires, you can configure the Voice Gateway to play a textual (see the "No User Input Prompt" parameter) or an audio (see the "No User Input URL" parameter) prompt to ask the user to say something. If there is still no input from the user, you can configure the Voice Gateway to prompt the user again. The number of times to prompt is configured by the "No User Input Retries" parameter.

If the "Send No User Input Event" parameter is configured to "on" and the timeout expires, the Voice Gateway sends an event to Cognigy.AI, indicating how many times the timer has expired.

The default is 0 (i.e., feature disabled).

Note:

  DTMF (any input) is considered as user input (in addition to user speech) if the "Send DTMF" parameter is configured to "on".
  If you have configured a prompt to play when the timeout expires, the timer is triggered only after playing the prompt to the user.
No User Input Retries Number

Defines the maximum number of allowed timeouts (configured by the "No User Input Timeout" parameter) for no user input. If you have configured a prompt to play (see the "No User Input Prompt" or "No User Input URL" parameter), the prompt is played each time the timeout expires.

The default is 0 (i.e., only one timeout).

For more information on the no user input feature, see the "No User Input Timeout" parameter.

Note: If you have configured a prompt to play upon timeout expiry, the timer is triggered only after playing the prompt to the user.

Send No User Input Event Toggle

Enables the Voice Gateway to send an event message to the Flow if there is no user input for the duration configured by the "No User Input Timeout" parameter, indicating how many times the timer has expired ('value' field):

    {
      "type": "event",
      "name": "noUserInput",
      "value": 1
    }
  on: Enabled.
  off: (Default) Disabled.
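As an illustration, the sequence of events a Flow might receive while the user stays silent can be sketched as follows. The function name is hypothetical, and the sketch assumes that a retries value of N results in N + 1 timeouts in total (based on the statement that 0 retries means only one timeout).

```python
from typing import Any, Dict, List


def no_user_input_events(timeout_ms: int, retries: int,
                         send_event: bool) -> List[Dict[str, Any]]:
    """Events the Flow would receive if the user stays silent throughout."""
    if timeout_ms == 0 or not send_event:
        return []  # a timeout of 0 disables the feature; off means no events
    # Assumption: retries = 0 means a single timeout, each retry adds one more.
    return [{"type": "event", "name": "noUserInput", "value": count}
            for count in range(1, retries + 2)]


for event in no_user_input_events(timeout_ms=5000, retries=2, send_event=True):
    print(event["name"], event["value"])
```

In a Flow, the 'value' field can then be used to react differently to the first and to later timeouts.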

 

No User Input Prompt Text

Defines the textual prompt to play to the user when the timeout (configured by "No User Input Timeout") expires without any input from the user.

By default, the parameter is not configured.

The prompt can be configured in plain text or in Speech Synthesis Markup Language (SSML) format:

  Plain-text example:
    {
      "name": "LondonTube",
      "provider": "my_azure",
      "displayName": "London Tube",
      "userNoInputTimeoutMS": 5000,
      "userNoInputSpeech":
        "Hi there. Please say something"
    }
  SSML example:
    {
      "name": "LondonTube",
      "provider": "my_azure",
      "displayName": "London Tube",
      "userNoInputTimeoutMS": 5000,
      "userNoInputSpeech":
        "<speak>This is <say-as interpret-as=\"characters\">SSML</say-as></speak>"
    }
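Because SSML markup contains double quotes, it must be escaped when it is embedded in a JSON string. One way to avoid escaping by hand is to let a JSON library serialize the value. The sketch below uses the two keys from the examples above; the helper name is hypothetical.

```python
import json


def no_input_config(prompt: str, timeout_ms: int = 5000) -> str:
    """Serialize the prompt and timeout; json.dumps escapes the quotes inside SSML."""
    config = {"userNoInputTimeoutMS": timeout_ms, "userNoInputSpeech": prompt}
    return json.dumps(config, indent=2)


ssml = '<speak>This is <say-as interpret-as="characters">SSML</say-as></speak>'
print(no_input_config(ssml))
```

The same approach works for plain-text prompts, where no escaping is usually needed.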

For more information on the no user input feature, see the "No User Input Timeout" parameter.

Note:

  If you have also configured to play an audio prompt (see the "No User Input URL" parameter), the "No User Input Prompt" takes precedence.
  The supported SSML elements depend on the text-to-speech provider:
  Google: https://cloud.google.com/text-to-speech/docs/ssml
  Azure: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup#supported-ssml-elements
  AWS: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
No User Input URL Text

Defines the URL of the audio prompt that is played to the user when the timeout (configured by "No User Input Timeout") expires without any input from the user.

By default, the parameter is not configured.

For more information on the no user input feature, see the "No User Input Timeout" parameter.

Note: If you have also configured to play a textual prompt (see the "No User Input Prompt" parameter), the "No User Input Prompt" takes precedence.
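The precedence rule between the two prompt types can be summarized in a few lines. This is a sketch with a hypothetical function name and return format, not part of the Voice Gateway API.

```python
from typing import Optional, Tuple


def select_prompt(prompt_text: Optional[str],
                  prompt_url: Optional[str]) -> Optional[Tuple[str, str]]:
    """Pick the prompt to play on timeout; the textual prompt takes precedence."""
    if prompt_text:
        return ("text", prompt_text)   # synthesized through the TTS provider
    if prompt_url:
        return ("audio", prompt_url)   # audio played from the configured URL
    return None                        # nothing configured: no prompt is played


print(select_prompt("Are you still there?", "https://example.com/prompt.wav"))
```

The URL shown is a placeholder. The same rule applies to the "No Flow Input Prompt" and "No Flow Input URL" parameters.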

 

Bot Timeouts

Parameter Type Description
No Flow Input Timeout (ms) Number

Defines the maximum time (in milliseconds) that the Voice Gateway waits for input from the Flow.

If no input is received from the Flow when this timeout expires, you can configure the Voice Gateway to play a textual (see the "No Flow Input Prompt" parameter) or an audio (see the "No Flow Input URL" parameter) prompt to the user.

The default is 0 (i.e., feature disabled).

No Flow Input Retries Number

Defines the maximum number of allowed timeouts (configured by the "No Flow Input Timeout" parameter) for Flow input. If you have configured a textual prompt (see the "No Flow Input Prompt" parameter) or an audio prompt (see the "No Flow Input URL" parameter), it is played to the user each time the timeout expires.

The default is 0 (i.e., only one timeout – no retries).

For more information on the no flow input feature, see the "No Flow Input Timeout" parameter.

Note: If you have configured a prompt to play upon timeout expiry, the timer is triggered only after playing the prompt to the user.

No Flow Input Prompt Text

Defines the textual prompt to play to the user when the timeout (configured by "No Flow Input Timeout") expires without any input from the Flow.

The prompt can be configured in plain text or in Speech Synthesis Markup Language (SSML) format:

  Plain-text example:
    {
      "name": "LondonTube",
      "provider": "my_azure",
      "displayName": "London Tube",
      "botNoInputTimeoutMS": 5000,
      "botNoInputSpeech":
        "Please wait for bot input"
    }
  SSML example:
    {
      "name": "LondonTube",
      "provider": "my_azure",
      "displayName": "London Tube",
      "botNoInputTimeoutMS": 5000,
      "botNoInputSpeech":
        "<speak>This is <say-as interpret-as=\"characters\">SSML</say-as></speak>"
    }

By default, the parameter is not configured.

Note:

  For more information on the no flow input feature, see the "No Flow Input Timeout" parameter.
  If you have also configured to play an audio prompt (see the "No Flow Input URL" parameter), the "No Flow Input Prompt" takes precedence.
  This feature requires a text-to-speech provider. It will not work when the speech is synthesized by the flow framework.
  The supported SSML elements depend on the text-to-speech provider:
  Google: https://cloud.google.com/text-to-speech/docs/ssml
  Azure: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup#supported-ssml-elements
  AWS: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
No Flow Input URL Text

Defines the URL of the audio prompt that is played to the user when the timeout (configured by "No Flow Input Timeout") expires without any input from the Flow.

By default, the parameter is not configured.

For more information on the no Flow input feature, see the "No Flow Input Timeout" parameter.

Note: If you have also configured to play a textual prompt (see the "No Flow Input Prompt" parameter), the "No Flow Input Prompt" takes precedence.

 

Azure Configuration

Parameter Type Description
Azure STT Mode Select

Defines the Azure STT recognition mode.

  conversation (default)
  dictation
  interactive

Note: The parameter is applicable only to the Microsoft Azure STT service.

Azure STT Context ID Text
Azure speech-to-text engine: This parameter controls Azure's Custom Speech model. The parameter can be set to the endpoint ID that is used when accessing the STT engine. For more information on how to obtain the endpoint ID, go to https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-deploy-model.

Note:

  The parameter can be used by the Flow, as long as the STT engine is Azure or AudioCodes DNN.
  For Azure STT, the Custom Speech model must be deployed on the same subscription used for the Azure STT engine.
  When using other STT engines, the parameter has no effect.
Enable Audio Logging Toggle

Enables recording and logging of audio from the user (endpoint) that the Voice Gateway sends to the STT engine. The recording is done by the STT engine and stored on the STT engine.

  on: Instructs the STT engine to enable audio logging.
  off: Instructs the STT engine to disable audio logging.

When the parameter is not defined (default), audio logging is according to the STT engine.

Note: The parameter and audio logging are applicable only when using Azure STT.

 

Google Configuration

Parameter Type Description
Select Defines the Google STT interaction type. For more information, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#InteractionType.
Google Cloud STT Context Phrases Text Array

When using Google's Cloud STT engine, this parameter controls Speech Context phrases.

The parameter lists phrases or words that are passed to the STT engine as "hints" for improving the accuracy of speech recognition.

For more information on speech context (speech adaptation), as well as details regarding tokens (class tokens) that can be used in phrases, go to https://cloud.google.com/speech-to-text/docs/speech-adaptation.

For example, if speakers frequently say "weather", you want the STT engine to transcribe it as "weather" and not "whether". To do this, the parameter can be used to create a context for this word (and other similar phrases associated with weather):

"sttContextPhrases": ["weather"]

Note:

  The parameter can be used when the STT engine is Google.
  When using other STT engines, the parameter has no effect.
Google Cloud STT Context Boost Number

Defines the boost number for context recognition of the speech context phrase configured by sttContextPhrases. Speech-adaptation boost allows you to increase the recognition model bias by assigning more weight to some phrases than others. For example, when users say "weather" or "whether", you may want the STT to recognize the word as weather.

For more information, see https://cloud.google.com/speech-to-text/docs/context-strength.  

Note:

  The parameter can be used when the STT engine is Google.
  When using other STT engines, the parameter has no effect.
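The two Google context settings are typically used together: the phrases tell the engine what to listen for, and the boost tells it how strongly to prefer them. The sketch below builds such a parameter set. The key "sttContextPhrases" is taken from this article; the key "sttContextBoost", the function name, and the boost value are assumptions made for illustration and should be checked against the Extension's actual field names.

```python
import json
from typing import Any, Dict, List, Optional


def google_stt_context(phrases: List[str],
                       boost: Optional[float] = None) -> Dict[str, Any]:
    """Build the speech-context parameters for Google Cloud STT."""
    params: Dict[str, Any] = {"sttContextPhrases": list(phrases)}
    if boost is not None:
        # Hypothetical key name for the "Google Cloud STT Context Boost" setting.
        params["sttContextBoost"] = boost
    return params


print(json.dumps(google_stt_context(["weather", "forecast"], boost=10)))
```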

 

