If you are building a voice agent with LLM and Knowledge AI Integration, you may find it difficult to control how Text-to-Speech (TTS) services pronounce certain words, such as product names and company names.
You might also want to define a universal way of saying certain words or abbreviations, for example making sure "etc." is always pronounced "et cetera", or that "Ins. No." in your documents is pronounced "Insurance number".
A good way to do this is via SSML and an XML-based lexicon.
What is a lexicon?
Warning!
Please be aware this has no relation to lexicons in Cognigy for slot matching.
Lexicons are XML files which contain a list of words and their pronunciation. A great article from Microsoft describing these lexicons can be found here:
In short, a lexicon is referenced from SSML and tells the Text-to-Speech service how to pronounce the words it lists.
There are a few key things to keep in mind when using lexicons:
- They must be self-hosted: Cognigy does not offer hosting for lexicons, so you will need to save the file in an external service.
- They must be publicly accessible.
- Not all Text-to-Speech services can use lexicons.
Hosting Information
The TTS lexicon needs to be hosted in a high-availability setup, such as a CDN or a website subpage.
How to set up a lexicon
Let's assume you have a place to save the lexicon. Now we just need to create the template. As mentioned before, this needs to be an XML file with the following format:
<?xml version="1.0" encoding="utf-8"?>
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
         version="1.0"
         alphabet="ipa"
         xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>Cognigy</grapheme>
    <phoneme>ˈkɔːɡnɪˌdʒi</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>AI</grapheme>
    <alias>Ei Eye</alias>
  </lexeme>
  <lexeme>
    <grapheme>Insurance No.</grapheme>
    <alias>Insurance Number</alias>
  </lexeme>
</lexicon>
In this example we are defining two types of pronunciation:
- Phoneme - Uses the International Phonetic Alphabet in order to define the way the word should be pronounced.
- Alias - Uses the standard pronunciation of the target language to define how the word or phrase should be pronounced.
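The abbreviations mentioned in the introduction can be handled with alias entries alone. The following is a minimal sketch that reuses the same root element as the template above; the two entries are illustrative, and whether a grapheme containing periods is matched exactly as written may depend on the Text-to-Speech service you use.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: alias entries for the abbreviations from the introduction. -->
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
         version="1.0"
         alphabet="ipa"
         xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <!-- "etc." should always be spoken as "et cetera" -->
  <lexeme>
    <grapheme>etc.</grapheme>
    <alias>et cetera</alias>
  </lexeme>
  <!-- "Ins. No." should be spoken as "Insurance number" -->
  <lexeme>
    <grapheme>Ins. No.</grapheme>
    <alias>Insurance number</alias>
  </lexeme>
</lexicon>
```

Because an alias is plain text, it is usually the easier option when the replacement can be spelled out in ordinary words; a phoneme is only needed when no ordinary spelling produces the right sound.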
Language dependency
The phonemes which can be used are dependent on the language defined in the XML. In our example this is en-US, for US English. An article describing which phonemes can be used with which language can be found here: SSML Phonetic Alphabets
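A lexicon for another language follows the same template with a different xml:lang value. The sketch below assumes a German (de-DE) voice and uses alias entries only, so it does not depend on any particular phoneme set; the abbreviations are illustrative and not taken from a real project. Microsoft's TTS documentation, for example, describes a custom lexicon as limited to one locale, so it is safest to keep one lexicon file per language.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: same structure as the en-US template, with xml:lang set to de-DE (assumed voice locale). -->
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
         version="1.0"
         alphabet="ipa"
         xml:lang="de-DE"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <!-- Illustrative entry: "z. B." spoken as "zum Beispiel" -->
  <lexeme>
    <grapheme>z. B.</grapheme>
    <alias>zum Beispiel</alias>
  </lexeme>
  <!-- Illustrative entry: "usw." spoken as "und so weiter" -->
  <lexeme>
    <grapheme>usw.</grapheme>
    <alias>und so weiter</alias>
  </lexeme>
</lexicon>
```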
Integrating lexicons into a conversation
You can add the lexicon to your bot directly in a Say or Question Node using the following pattern:
<lexicon uri="https://example.com/yourLexicon.xml"/>I am the Cognigy AI insurance bot. Please tell me your Insurance No.
You can also use CognigyScript if you want to add more dynamic information, for example an answer via generative AI:
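A minimal sketch of what this could look like, assuming the generated answer has already been stored under a context key. The key name generatedAnswer is hypothetical; replace it with wherever your Flow actually stores the text. The URL is the same placeholder used in the pattern above.

```xml
<!-- Sketch only: context.generatedAnswer is a hypothetical key, not a built-in Cognigy field. -->
<!-- Only the line below goes into the Say or Question Node; these comments are for illustration. -->
<lexicon uri="https://example.com/yourLexicon.xml"/>{{context.generatedAnswer}}
```

The lexicon element comes first so that it applies to all of the text that follows, whether that text is static or inserted via CognigyScript.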
Now, when the bot is called, it will use the pronunciations in the lexicon by default.