In this episode of Cognigy Sessions we’ll show you how to build your own Phone Bot with Cognigy Voice Gateway. You will learn how to create voice experiences within the Cognigy.AI Flow Editor, how to connect your Virtual Agent to the phone network and how to design outstanding experiences for end users with voice automation.
Welcome to Cognigy Sessions. Cognigy Sessions is a Techinar series from Conversational AI experts for experts. Each episode will give you deep insights into a new Cognigy.AI topic. Not only will it show you advanced concepts and the art of the possible in a hands-on demo, it will also empower you to apply those learnings to your own projects by going far beyond any marketing collateral.
Today's session is about building a phone bot with Cognigy Voice Gateway. Voice Gateway is a software offering from Cognigy that allows you to deploy Virtual Agents to phone lines. It provides out-of-the-box connectors to contact center infrastructure and works seamlessly together with Cognigy.AI, which many of you are already familiar with. Here's what you will learn in this episode: We'll show you step-by-step how to build a voice Agent. We'll share some design best practices for voice experiences, and you'll get deep insights into the machinery to create the most advanced bots technology can provide today. This recording is also available in our Help Center. Please follow the link to access additional resources, go to our community for questions and discussions, or to start a free trial if you're not already a Cognigy.AI user. And with that, let's start the session.
Hi, everyone, my name is Phil, and today I'm going to be taking you through building phone bots with Cognigy Voice Gateway. Now, before we get started, let's take a look at a high-level architecture. Your customer will be calling either from a mobile phone, a landline, or a browser, and maybe they end up at a contact center solution like Avaya, Genesys, Cisco or RingCentral, or any other. Now, what you want is for those customers to talk to a Virtual Agent, which you have built in Cognigy.AI, going through the NLU and the Flows, maybe being connected to backend systems to retrieve order status information or to trigger an action in a backend system.
Now, in order to connect the two, the Caller and Cognigy.AI, you need the Cognigy Voice Gateway. This is the tool that can process the audio stream, turn it into text and then pass that on to Cognigy.AI for further processing. Now, in order to turn the audio to text, or the speech to text, we use what's called a Speech to Text solution. And here we integrate with various solutions, such as Google, Azure, Yandex, Amazon, or others. Now, on the way back, when Cognigy answers something, this is text that reaches the Cognigy Voice Gateway, which we then turn from Text to Speech, again with this variety of solutions that we're integrating with, and then the customer will hear the answer. And today we're going to look at how to build such voicebots and what the capabilities are that Cognigy.AI in conjunction with Cognigy Voice Gateway delivers.
If you want to follow along, you need two things. You need access to Cognigy.AI, which if you don't have it already, you can sign up for at "signup.cognigy.ai". You also need access to a Cognigy Voice Gateway, which you can either get directly from Cognigy or if you don't have the opportunity for that, you can get something similar at "pnc.audiocodes.com" because Cognigy Voice Gateway is 100% compatible with the Audiocodes protocol. So you can use pnc.audiocodes.com and connect it to Cognigy Voice Gateway Endpoints inside Cognigy.AI. And with that, let's get started.
So what we have here is a Cognigy Flow that is connected to the Voice Gateway using what we call an Endpoint. So we've created a Voice Gateway Endpoint, and when someone calls this number, this number here, then it lands on this Flow. And that's the conversation we're having. Now, we've started by putting in a Once Node, which is a Node that, only once at the beginning of the session or once when it's hit, goes through this On First Time branch, and then afterwards it goes through this one. The reason why we've done that is because the Voice Gateway conversation always starts with a data-only message, a message that doesn't contain any text. And if we want to return a welcome, then we can do that here in this part and then stop the Flow. And then the next message that's coming in from the user is going to go through this Afterwards branch.
Now we pass through this Once Node, and then here we reach the Voice Gateway Send Message Node, and you can see we have added some SSML; we have an SSML Editor up here that lets you add this Speech Synthesis Markup Language in a very easy fashion. And if we want to put a break in, for example, then we can put that in here. We can just edit the text like this. And then here we have another message, which says "Next message, and you said" plus the text. So presumably when we call it, it's going to say "Welcome to Cognigy". And then when I answer anything to that, it's just going to mirror that back to me using this message here. So let's try that out.
Welcome to Cognigy.
Next message, and you said "Hello"
My name is Philipp.
Next message, and you said, "my name is Philipp"
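As a rough sketch of what the SSML Editor produces when you add a break: `<speak>` and `<break>` are standard SSML elements, while the helper function below is purely illustrative and not part of Cognigy.

```typescript
// Build an SSML message with a pause between two phrases.
// The editor writes this markup for you; this helper just shows its shape.
function ssmlWithBreak(before: string, after: string, ms: number): string {
  return `<speak>${before}<break time="${ms}ms"/>${after}</speak>`;
}

const message = ssmlWithBreak("Welcome", "to Cognigy", 500);
// message: <speak>Welcome<break time="500ms"/>to Cognigy</speak>
```

The TTS engine receives the whole `<speak>` document and renders the pause itself.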
So that works really well. Let's look at some more advanced features that we would have in here. You can find all the Voice Gateway Nodes here under Extensions, under VG, which stands for Voice Gateway. And here we would have, for example, also the Set Session Parameters Node. Now, there are a lot of configuration options on the Voice Gateway, and you can set all of those on a session level or on an activity level. So a message would be an activity. If you set them on a session level, they are active for the remainder of the session. So, for example, I could say Disable STT Punctuation. What this does is strip the full stop from the end of the Speech to Text conversion. So if I say something like "airport", it's not going to return "airport full stop", it's just going to return "airport", which can actually be quite handy. Now, if I save this, the setting will be active for the remainder of the session from this moment on.
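The effect of Disable STT Punctuation can be sketched like this; it is an illustration of the transformation, not Cognigy's actual implementation:

```typescript
// With the setting enabled, the transcript reaches the Flow without a
// trailing full stop (or other terminal punctuation).
function stripTrailingPunctuation(transcript: string): string {
  return transcript.replace(/[.!?]+$/, "");
}

stripTrailingPunctuation("airport."); // "airport"
stripTrailingPunctuation("airport");  // "airport" (unchanged)
```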
And I can give you an example. We just heard the voice. We can, for example, also change the Text-to-Speech voice. So let's change this to Guy. Now, when I call this bot, the voice will be different from the beginning on, because we are running into this Set Session Parameters Node here first.
Welcome to Cognigy.
If I drag this Node below, then this will only happen after the activity.
Welcome to Cognigy.
Next message and you said "Hello".
Right, so we can control what happens in the conversation here. And what we can also do is control it on an activity basis, meaning we only want the settings that I'm going to make to be active for this specific activity, like Send Message. For that, we activate the Set Activity Parameters, and then here we can, for example, activate something like Barge in. Barge in stands for the ability to interrupt the bot. So we say, I want to be able to interrupt the bot. Minimum words are zero, so anything I say is going to interrupt the bot and it's going to go to the next message. Now, this really only makes sense if this is a very long message. So let's just copy this piece here: "Welcome to Cognigy... Welcome to Cognigy... Welcome to Cognigy". And now Barge in is active. I will be able to interrupt the bot on this message only. On this message here I will not be able to interrupt the bot, because Barge in is not active for this message. So let's try that out.
Welcome to Cognigy... Welcome to Cognigy...
Next message, and you said "Hey! Hey!"
My name is Philipp.
So, you could see I could interrupt the bot on this message because we set the activity parameter Barge in to True. I could no longer do this here because we haven't set it on a full session basis.
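The scoping rule we just saw can be sketched as follows; the types and field names are illustrative, not the actual Voice Gateway data model:

```typescript
// Session parameters persist for the rest of the call; activity
// parameters apply to a single message only.
interface VoiceParams {
  bargeIn?: boolean;
  sttLanguage?: string;
}

function effectiveParams(session: VoiceParams, activity?: VoiceParams): VoiceParams {
  // Activity-level settings override session-level ones for one message,
  // then the session settings apply again on the next message.
  return { ...session, ...activity };
}

const session: VoiceParams = { bargeIn: false, sttLanguage: "en-US" };
effectiveParams(session, { bargeIn: true }); // barge-in for this message only
effectiveParams(session);                    // back to the session defaults
```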
Now, there's a plethora of options that we can set in the Set Session Parameters Node. And I'm just going to go through some of those and I'll show you some fun stuff we can do with that.
When you create the bot and set it up in the Voice Gateway, it will have a Speech to Text engine set, a language set for the detection, and also a language set for the Text to Speech. So the STT and TTS engines are set on a voicebot level, but you can override them from within here, as we have seen earlier when we switched the voice to Guy, for example. So you can put in another Language code, and from that moment on, the Speech to Text will recognize in that language. You can disable the STT punctuation, which we already looked at, you can change the voice name for the Text to Speech, and you can disable the TTS Cache. The TTS Cache means that when we send, for example, a "Hello World" to the Text to Speech engine and it retrieves the speech for that, so the sound "Hello World", it's not going to do that every single time; it's really only going to do it once and cache it in the Voice Gateway. Now, if we don't want it to be cached for whatever reason, for example, for privacy reasons, we can set this toggle here, and then it's not going to cache anything for the Text to Speech.
Now, let's have some fun with that. I'm just going to clean this up here so we don't have to listen to this all the time. So "Welcome to Cognigy" is fine, and we are not going to listen to this message here. But I've actually prepared something further down here, which is an If Node; let me just change this back. I'm just doing a mod here, so it's this on one input and then this on another input. So the first time we're coming in, it's going to say "currently you're speaking in English and you said X, but please say something in German now", and then we're setting a session parameter where we're changing the STT to German. And we're changing the TTS also to German, but Swiss German. Then, when we say the next thing in German, it's going to say something like "Du hast X gesagt, bitte jetzt wieder in Englisch", and it's changing the STT and TTS engines back to English. So let's try that out.
Welcome to Cognigy.
Currently, you're speaking in English and you said "Hello", but please say something in German now.
Du hast "Guten Tag" gesagt, jetzt bitte wieder in Englisch
Oh, wow, that was amazing.
Currently, you're speaking in English and you said "oh wow that was amazing", but please say something in German now.
Das ist ja wirklich der Wahnsinn!
Du hast "Das ist ja wirklich der Wahnsinn" gesagt, jetzt bitte wieder in Englisch.
So as we can see, we can switch the languages for both the STT and TTS back and forth using the Set Session Parameters Node, or of course, you could do this also in the activity parameters here if you want.
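The alternating pattern behind the If Node can be sketched as follows; the language and voice codes are examples, and the actual values depend on your STT/TTS vendor:

```typescript
// A turn counter mod 2 decides which STT/TTS language to set next,
// which is the "mod" logic driving the If Node in the demo.
function nextLanguageParams(turn: number): { sttLanguage: string; ttsVoice: string } {
  return turn % 2 === 0
    ? { sttLanguage: "de-DE", ttsVoice: "de-CH-LeniNeural" } // switch to (Swiss) German
    : { sttLanguage: "en-US", ttsVoice: "en-US-GuyNeural" }; // back to English
}

nextLanguageParams(0).sttLanguage; // "de-DE"
nextLanguageParams(1).sttLanguage; // "en-US"
```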
Right. So for one activity, you could say something in German, for the next activity, you could say something in English, and you would always have the right accents and so on. Now, let's disable this again and just put another Send Message Node here to look at some of the other parameters that we have. We'll just mirror back the text that's being said.
Now, what we also have here is DTMF. DTMF (dual-tone multi-frequency) is the signaling behind the numbers on your phone's dial pad. So you can activate DTMF, and now if I press a button on my phone, it's going to send that number. So if I press one, it's going to send one. If I press two, it's going to send two. It's going to arrive in Cognigy as input.text. There's also a specific data event that's coming in. But if I press one-two, this is going to be two messages: it's going to send one and it's going to send two. Now let's see what that sounds like.
Welcome to Cognigy.
You said five.
You said two.
So when it said "you said two", I really pressed three and two and it only sent me the last one. So of course that's not that useful. Sometimes we want to collect more than one number, so we can activate what's called DTMF Collect. DTMF Collect has a timeout, so it stops collecting numbers after X milliseconds; it's set to 2000 here, so after up to two seconds it's going to stop collecting. It's also going to stop collecting after I have typed five numbers, or if I press the DTMF Collect Submit Digit, which is this character here. So let's try that out with five numbers at once, like, for example, a zip code.
Welcome to Cognigy.
You said forty thousand two hundred and twenty-one.
Now I'm going to type only two.
You said 12.
Yeah, so I typed one and two, and then I pressed #, and then it ended, because those are the settings that we made here in the DTMF settings.
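The DTMF Collect rules we just used can be sketched like this; it is a simplified model, and the real inter-digit timeout is handled by the Voice Gateway and omitted here:

```typescript
// Collection ends when maxDigits digits have been entered or when the
// submit digit is pressed; the submit digit itself is not included.
function collectDigits(presses: string[], maxDigits: number, submitDigit: string): string {
  let collected = "";
  for (const key of presses) {
    if (key === submitDigit) break;        // submit digit ends collection
    collected += key;
    if (collected.length >= maxDigits) break;
  }
  return collected;
}

collectDigits(["4", "0", "2", "2", "1"], 5, "#"); // "40221" - five digits reached
collectDigits(["1", "2", "#", "9"], 5, "#");      // "12"    - ended by "#"
```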
Barge in we already looked at. I can activate Barge in, and I can set a minimum number of words that need to be detected in order for Barge in to work. And I can also Barge in on DTMF, so that I can interrupt with DTMF characters if I want. There's a lot more stuff to look at.
I just want to quickly highlight one or two more. Depending on the STT engine you're using, you have specific configuration options. So on Azure you have an STT mode; there are different modes, dictation, interactive, conversation, and so on. I would ask you to look at the Azure documentation to see what those actually do, but they impact how the Speech to Text happens. And if you're using Azure custom speech, then you have a context ID, and you can switch between your different Azure custom speech models.
If you're using Google Speech-to-Text, then again, you have different dictation types, different interaction types, and you have context phrases which you can inject. There's actually a really cool feature that will also soon be in Azure, but right now it's only in Google, where I can say, OK, for example, my last name is something that Speech to Text engines have a hard time recognizing. So I could put my last name in here. And this will then be sent to the Google STT engine, saying, hey, this is a word that could come up now. You can send that as a hint, essentially, and you can say, I want to boost that word with a number of ten. So this word will be boosted tenfold in comparison to other words that might be detected.
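The hint-and-boost idea maps to something along these lines; treat the exact field names as an assumption and check the Google Cloud Speech-to-Text documentation for the current request format:

```typescript
// A speech context with a boosted phrase, in the spirit of Google's
// speechContexts / phrases / boost request fields. The phrase here is
// just an example of a word the engine might otherwise mishear.
const speechContext = {
  phrases: ["Cognigy"], // an uncommon word the engine should expect
  boost: 10,            // weight this phrase tenfold against alternatives
};
```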
There's always the option to add more Session Parameters down here, and you can do that in those settings here. But right now, I don't want to set any more. Now, there are, of course, also other Nodes that we haven't looked at yet, and I'm just quickly highlighting those.
We have a Hang Up Node that is going to hang up the call after my first message. So pretty standard stuff. Let's quickly turn that off again and do the hang-up.
Welcome to Cognigy.
You said "Hello".
So after I said "Hello", it hung up.
You also have Handover, which hands the call to a different SIP target. So you can enter a number here, with the tel-target URI, and then it's going to call that number. You can enter a SIP target as well, and you can add a Referral URL, which is passed on to the SIP target. So, you can use that, or you can have specific SIP headers that you can send into the Voice Gateway as well. But that's really a little bit too advanced for now, so I'm not going to highlight that more.
We then also have a Node called Play URL, and this Node lets you play a WAV file from the Internet. So, what you do here is you just add a media URL, and this can be a WAV file or a raw file. We're taking a WAV file here, and you can say whether it should cache the audio and whether you can interrupt it. So maybe interrupting is actually a good one here; we allow Barge in while this file is played. Right. Let's try this out.
Welcome to Cognigy.
You said Hello. "Great, we're like the Consumer Reports for Investors...".
So, there I barged in and it stopped the playback. By the way, you can do the same thing that you can do here also in the SSML. So, you can say "and I will play audio now", and then you have an audio tag here, and you can put the file into the source. This will do the same thing - not technically, but to the user it will do the same thing. So, what's going to happen here? It will send all of this to your Text to Speech engine of choice, for example, Microsoft Azure, and then Microsoft will retrieve this file and pipe it into your audio stream, whereas this Play URL Node actually retrieves the file and plays it directly from the Voice Gateway. And then, for example, here you also have the specific settings for the audio portion of the message, which you would not have otherwise. Let me just change that back; otherwise, in my other examples, we will always hear this audio.
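The SSML alternative to the Play URL Node can be sketched like this; `<audio>` is a standard SSML element, and the URL below is a placeholder:

```typescript
// Embed an audio file in the spoken message via SSML, so the TTS engine
// (rather than the Voice Gateway) fetches and plays the file.
function ssmlAudio(text: string, src: string): string {
  return `<speak>${text}<audio src="${src}"/></speak>`;
}

ssmlAudio("and I will play audio now", "https://example.com/clip.wav");
// <speak>and I will play audio now<audio src="https://example.com/clip.wav"/></speak>
```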
Now, there are other Nodes in here as well, like, for example, Call Recording or Agent Assist. These are for specific use cases. For Call Recording you require a call recording server that you can do a SIP refer to; we don't have this in this example. And for Agent Assist you also need this configured on your Voice Gateway and your SBC, and we don't have that either. And maybe the last one to show here is Send Meta Data. This sends data via SIP Info messages. So, if your contact center can receive those messages and do something with them, like show them to an Agent, then this is something you can do through Send Meta Data.
We're also getting information from the customer. We're not just getting their phone number; the Cognigy Voice Gateway is actually extracting a lot of information from that phone number and exposing it in the Flow to help with organizing your call Flows. We can see the information in the Input object, but we can also use it through some Cognigy Tokens that we have. So if you go, for example, into a condition field and press Control-Space, you see all the Tokens, and if you type VG for Voice Gateway, you can see some of the Tokens that are being exposed. In that way you could, for example, say, well, if the caller is calling from a mobile, which is a caller number type of "mobile", then we send out an SMS. We can say if the caller type is mobile, and then here, for example, we could send out an SMS via Twilio or another provider. You can get more information about this feature in the documentation: you can go to Endpoints, Voice Gateway, and see the number metadata, all the information that is being exposed.
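The mobile-caller routing idea can be sketched as follows; the function and the value "mobile" are illustrative, with the real value coming from the Voice Gateway Tokens exposed on the Input object:

```typescript
// Route to an SMS follow-up only when the caller number type is "mobile",
// mirroring the If Node condition built from the VG Tokens.
function shouldSendSms(callerNumberType: string): boolean {
  return callerNumberType === "mobile";
}

shouldSendSms("mobile");   // true  - follow up with an SMS
shouldSendSms("landline"); // false - skip the SMS step
```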
With that, I want to say thank you very much. I hope you like what you saw. I hope it intrigued you to build phone bots. It is really a lot of fun and we are looking forward to seeing what you will be building with it. Thank you very much.
Thank you very much for watching our session on Cognigy Voice Gateway. Please don't forget to visit our Cognigy.AI Help Center under support.cognigy.com for additional resources and information and a link to our community where you can ask questions or leave us feedback for this episode. Thank you very much for watching and see you next time.