Introduction to Speech Synthesis Markup Language - SSML
Speech Synthesis Markup Language (SSML) is an XML-based markup language that lets the Web and other applications deliver their content as synthesized speech. Authors use it to control how an application or web content is spoken, which makes applications built with SSML richer in their interaction with the user.
SSML gives authors control over many aspects of speech. Pronunciation, volume, the gender of the voice, and other properties of speech can all be controlled with it. SSML is a W3C Recommendation produced by the Voice Browser Working Group.
The purpose of SSML is to assist the synthesis process that renders an SSML document as speech. Different elements of SSML assist at different stages of that process, so it helps to know what the stages are. The stages of the synthesis process are:
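1. XML parse
2. Structure analysis
3. Text normalization
4. Text-to-phoneme conversion
5. Prosody analysis
6. Waveform production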
The above six stages of synthesis are all needed to convert an SSML document into voice output.
The first stage of synthesis is the XML parse, in which an XML parser extracts the document tree and content from the SSML document. In the next stage, structure analysis, the structure of the document is determined from the extracted content. Structure analysis influences the resulting voice output: the order in which the output is spoken depends on this stage.
The third stage of the synthesis process is text normalization. This stage determines what should be spoken for the written text in the document. The same written form can be read aloud in different ways depending on the language and the context, so the choice has to be made at this stage. For example, if the document contains 3/4, the voice output could be "three quarters", "third of April", or "fourth of March". Decisions of this kind are made at this stage of the synthesis process.
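An author can settle this kind of ambiguity in the markup instead of leaving the choice to the synthesis processor. The following is a minimal sketch, assuming an SSML 1.0 processor: the sub element replaces the text to be spoken, while say-as hints at how the text should be interpreted. The interpret-as and format values shown are not defined by the SSML Recommendation itself, so support for them is an assumption that depends on the processor. The sentences are illustrative.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- sub: speak the alias text instead of the written form -->
  <s>Add <sub alias="three quarters">3/4</sub> of a cup of flour.</s>
  <!-- say-as: hint that 3/4 is a date; "date" and "dm" (day, month) are
       assumed values that a given processor may or may not support -->
  <s>The meeting is on <say-as interpret-as="date" format="dm">3/4</say-as>.</s>
</speak>
```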
In the next stage, text-to-phoneme conversion, the words decided on in the earlier stages are broken into phonemes, which are the basic units of pronunciation. Prosody analysis is the stage in which the pitch, timing, pausing, and emphasis of the words are determined. These properties are called prosodic features and are very important for speech output. Elements such as emphasis, break, and prosody are used in the SSML document to assist this stage of synthesis.
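As an illustration of how markup can assist these two stages, the sketch below uses the emphasis, break, and prosody elements named above, together with the phoneme element, which SSML also defines for supplying a pronunciation directly. The sentences and attribute values are illustrative only, and the IPA string is an assumed transcription.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- phoneme: give the pronunciation directly (text-to-phoneme stage) -->
  <s>I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.</s>
  <!-- emphasis and break: stress a word, then pause for half a second -->
  <s>This is <emphasis level="strong">very</emphasis> important.<break time="500ms"/></s>
  <!-- prosody: speak the enclosed text more slowly and at a lower pitch -->
  <s><prosody rate="slow" pitch="low">Please listen carefully.</prosody></s>
</speak>
```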
The final stage is waveform production, which generates the output as audio. The information obtained in the fourth and fifth stages, namely text-to-phoneme conversion and prosody analysis, is used in this final stage to produce the audio output. The voice element can request a particular type of voice, such as male or female, and the audio element inserts audio files that are to be played.
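The two elements mentioned for this stage might be used as in the following sketch. The audio file URL is hypothetical, and whether a matching female voice is available depends on the synthesis processor.

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- voice: request a female voice for the enclosed content -->
  <voice gender="female">Welcome to the help line.</voice>
  <!-- audio: play a recorded file; the enclosed text is spoken
       instead if the file cannot be played -->
  <audio src="http://www.example.com/sounds/chime.wav">A short chime.</audio>
</speak>
```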
The structure of an SSML document is easiest to understand through an example, such as the one given below:
<sub alias="International Phonetic Association">IPA</sub>
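A complete document containing the elements discussed next might look like the sketch below. It is a minimal illustration under stated assumptions, not a definitive listing: the lexicon URI and the spoken text are hypothetical, and the namespace and schema location are the ones published for SSML 1.0.

```xml
<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="en-US">
  <!-- Pronunciation lexicon (hypothetical URI) -->
  <lexicon uri="http://www.example.com/lexicon.file"/>
  <p>
    <s>The symbols come from the
       <sub alias="International Phonetic Association">IPA</sub>.</s>
    <!-- Request a female voice for one sentence -->
    <s><voice gender="female">This sentence is read by a female voice.</voice></s>
  </p>
</speak>
```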
The <speak> element is the root element of an SSML document. Note that the version of SSML is given as the value of the version attribute of the <speak> element. The appropriate namespaces and the location of the SSML schema are also included in this element. The language in which the voice is to be produced is given in the xml:lang attribute. The xml:lang attribute is also allowed on the <p> and <s> elements. The <p> element is a paragraph element, which may contain sentences marked up with <s> elements.
Another element in the SSML document is the <lexicon> element. This element gives the URI of the pronunciation lexicon document. There may be cases where you would like to use a female voice to render some content. In such a scenario you can use the <voice> element, which has a gender attribute. If you set the gender attribute to female, the content is delivered in a female voice. The voice element also has a name attribute, where you specify the name of the voice that should be used.
SSML has many more elements that help render rich voice content for the Web or other applications. For details of all the elements of SSML, see the specification at http://www.w3.org/TR/speech-synthesis/.