<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="EN">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Speech Synthesis Markup Language Version 1.1 Requirements</title>
<style type="text/css" xml:space="preserve">
/**/
pre.example {
   font-family: monospace;
   white-space: pre;
   background: #CCCCFF;
   border: solid black thin;
   margin-left: 0;
   padding: 0.5em;
   font-size: 85%;
   width: 97%;
}
pre.dtd {
   font-family: "Lucida Console", "Courier New", monospace;
   white-space: pre;
   background: #CCFFCC;
   border: solid black thin;
   margin-left: 0;
   padding: 0.5em;
}

.ipa { font-family: "Lucida Sans Unicode", monospace; }

table { width: 100% }
td { background: #EAFFEA }

.tocline { list-style: disc; list-style: none; }
.hide { display: none }
.issues { font-style: italic; color: green }

.recentremove {
    text-decoration: line-through;
    color: black;
}
.recentnew {
        color: red;
}

.remove {
    text-decoration: line-through;
    color: maroon;
}
.new {
        color: fuchsia;
}
.elements {
    font-family: monospace;
    font-weight: bold;
}
.attributes {
    font-family: monospace;
    font-weight: bold;
}
code.att {
    font-family: monospace;
        font-weight: bold;
}
a.adef {
    font-family: monospace;
    font-weight: bold;
}
a.aref {
    font-family: monospace;
    font-weight: bold;
}
a.edef {
    font-family: monospace;
    font-weight: bold;
}
a.eref {
    font-family: monospace;
    font-weight: bold;
}
.tocline1 {list-style: disc; list-style: none; }


    /**/
</style>
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
  </head>

<body>
  <div class="head">
    <p>
      <a href="http://www.w3.org/" shape="rect">
      <img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home" /></a>    </p>

    <h1 class="notoc" id="h1">Speech Synthesis Markup Language Version 1.1 Requirements</h1>
    <h2 class="notoc" id="date">W3C Working Draft <i> <span>11 June</span> 200</i><span>7</span></h2>

    <dl>
      <dt>This version:</dt>
      
      
      <dd>
        <a href="http://www.w3.org/TR/2007/WD-ssml11reqs-20070611/" shape="rect">http://www.w3.org/TR/2007/WD-ssml11reqs-20070611/</a>
	  </dd>

      <dt>Latest version:</dt>
      <dd>
        <a href="http://www.w3.org/TR/ssml11reqs/" shape="rect">http://www.w3.org/TR/ssml11reqs/</a>
      </dd>

      <dt>Previous version:</dt>
      
      <dd>
        <a href="http://www.w3.org/TR/2006/WD-ssml11reqs-20061219/" shape="rect">http://www.w3.org/TR/2006/WD-ssml11reqs-20061219/</a>
	  </dd>

	  <dt>Editors:</dt>
      <dd>Daniel C. Burnett, Nuance </dd>
      <dd>双志伟 (Zhi Wei Shuang), IBM</dd>

	  <dt>Authors:</dt>
      <dd>Scott McGlashan, HP</dd>
	  <dd>Andrew Wahbe, Genesys</dd>
      <dd>夏海荣 (Hairong Xia), Panasonic</dd>
      <dd>严峻 (Yan Jun), iFLYTEK</dd>
      <dd>吴志勇 (Zhiyong Wu), Chinese University of Hong Kong</dd>
    </dl>

<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &copy; 2007 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>&reg;</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>

    <hr />
  </div>

<h2><a id="abstract" name="abstract" shape="rect">Abstract</a></h2>

<p>In 2005, 2006, and 2007 the W3C held workshops to understand the ways, if any, in which the design of SSML 1.0 limited its usefulness for authors of applications in Asian, Eastern European, and Middle Eastern languages. In 2006 an SSML subgroup of the W3C Voice Browser Working Group was formed to review this input and develop requirements for changes necessary to support those languages. This document contains those requirements. </p>
  
<h2 class="notoc">Status of this Document</h2>

<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/" shape="rect">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>

<p> This is the 11 June 2007 W3C Working Draft of "Speech Synthesis Markup Language Version 1.1 Requirements".</p>
<p>This document describes the requirements for changes to the SSML 1.0 specification needed to fulfill the charter given in [<a href="#charter" shape="rect">Section 1.2</a>].
  This is the second Working Draft. The group does not expect this document to become a W3C Recommendation. Changes since the previous version are listed in <a href="#changes" shape="rect">Appendix A</a>.</p>
<p>This document has been produced as part of the <a href="http://www.w3.org/Voice/Activity.html" shape="rect">W3C Voice Browser Activity</a>,
following the procedures set out for the <a href="http://www.w3.org/Consortium/Process/" shape="rect">W3C Process</a>. The
authors of this document are members of the <a href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working
Group</a>. You are encouraged to subscribe to
the public discussion list &lt;<a href="mailto:www-voice@w3.org" shape="rect">www-voice@w3.org</a>&gt; and to mail us
your comments. To subscribe, send an email to &lt;<a href="mailto:www-voice-request@w3.org" shape="rect">www-voice-request@w3.org</a>&gt; with the word <em>subscribe</em> in the subject line
(include the word <em>unsubscribe</em> if you want to unsubscribe).
A <a href="http://lists.w3.org/Archives/Public/www-voice/" shape="rect">public
archive</a> is available online.</p>

<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/" shape="rect">5 February 2004 W3C Patent Policy</a>. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34665/status" shape="rect">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential" shape="rect">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure" shape="rect">section 6 of the W3C Patent Policy</a>. </p>

<p>Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>

<h2>
  <a id="toc" name="toc" shape="rect">Table of Contents</a></h2>
<ul class="toc">
  <li class="tocline">1. <a href="#intro" shape="rect">Introduction</a></li>
  <li class="tocline">2. <a href="#general" shape="rect">General Requirements</a></li>
  <li class="tocline">3. <a href="#consistency" shape="rect">Speech Interface Framework Consistency  Requirements</a></li>
  <li class="tocline">4. <a href="#wordboundary" shape="rect">Token/Word Boundary Requirements</a></li>
  <li class="tocline">5. <a href="#script" shape="rect">Phonetic Alphabet and Pronunciation Script Requirements</a></li>
  <li class="tocline">6. <a href="#lang" shape="rect">Language Category Requirements</a></li>
  <li class="tocline">7. <a href="#name" shape="rect">Name/Proper Noun Identification Requirements</a></li>
  <li class="tocline">8. <a href="#future" shape="rect">Future Study</a></li>
  <li class="tocline">9. <a href="#ref" shape="rect">References</a></li>
  <li class="tocline">10. <a href="#acks" shape="rect">Acknowledgements</a></li>
  <li class="tocline">Appendix A <a href="#changes" shape="rect">Changes since the previous version </a></li>
</ul>
<h2 id="g1">
  <a id="intro" name="intro" shape="rect">1. Introduction</a></h2>

<p>This document establishes a prioritized
list of requirements for speech synthesis markup which any
proposed markup language should address. This document addresses
both procedure and requirements for the specification development.
In addition to general requirements, the requirements are addressed in separate sections on Speech Interface Framework Consistency, 
Token/Word Boundary, Phonetic Alphabet and Pronunciation Script, Language Category, and Name/Proper Noun Identification Requirements, followed by <a href="#future" shape="rect">Future Study</a> and <a href="#acks" shape="rect">Acknowledgements</a> sections.</p>

<h3 id="g11"><a id="background" name="background" shape="rect">1.1 Background and motivation </a></h3>
<p>As a W3C standard, one of the aims of SSML (see [<a href="#ref-SSML" shape="rect">SSML</a>] for description) is to be suitable and convenient for use by application authors and vendors worldwide. A brief review of the most broadly-spoken world languages [<a href="#ref-languages" shape="rect">LANGUAGES</a>] shows a number of languages that are in large commercial or emerging markets for speech synthesis technologies but for which there was limited or no participation by either native speakers or experts during the development of SSML 1.0. To determine in what ways, if any, SSML is limited by its design with respect to supporting these languages, the W3C held three workshops on the Internationalization of SSML. The first workshop [<a href="#ref-WS" shape="rect">WS</a>], in Beijing, PRC, in October 2005, focused primarily on Chinese, Korean, and Japanese languages, and the second [<a href="#ref-WS2" shape="rect">WS2</a>], in Crete, Greece, in May 2006, focused primarily on Arabic, Indian, and Eastern European languages. The third workshop [<a href="#ref-WS3" shape="rect">WS3</a>], in Hyderabad, India, in January 2007, focused heavily on Indian and Middle Eastern languages.</p>
<p>These three workshops resulted in excellent suggestions for changes to SSML, describing the ways in which SSML 1.0 has been extended and enhanced around the world. An encouraging result from the workshops was that many of the problems might be solvable using similar, if not identical, solutions. In fact, it may be possible to increase dramatically the usefulness of SSML for many application authors around the world by making a limited number of carefully-planned changes to SSML 1.0. That is the goal of this effort. </p>

<h3 id="g12"><a name="charter" id="charter" shape="rect" />1.2 SSML 1.1 subgroup charter</h3>
<p>The scope for a W3C recommendation for SSML 1.1 is modifications to SSML 1.0 to </p>
<ol>
  <li>Provide broadened language support
    <ol type="a">
        <li>For Mandarin, Cantonese, Hindi*, Arabic*, Russian*, Korean*, and Japanese, we will identify and address the language phenomena that must be handled to enable support for the language. Where possible we will address these phenomena in a way that is most broadly useful across many languages. We have chosen these languages because of their economic impact and expected group expertise and contribution. </li>
        <li>We will also consider phenomena of other languages for which  there is both sufficient economic impact and group expertise and  contribution.</li>
    </ol>
  </li>
  <li>Fix incompatibilities with other Voice Browser Working Group languages, including Pronunciation Lexicon Specification [<a href="#ref-PLS" shape="rect">PLS</a>], Speech Recognition Grammar Specification [<a href="#ref-SRGS" shape="rect">SRGS</a>], and VoiceXML 2.0/2.1 (e.g., caching attributes and error processing) [<a href="#ref-VXML2" shape="rect">VXML2</a>, <a href="#ref-VXML21" shape="rect">VXML21</a>].</li>
</ol>

<p>VCR-like controls are out of scope for SSML 1.1. We may discuss  &lt;say-as&gt; (see [<a href="#ref-sayas" shape="rect">SAYAS</a>]) issues that are related to the SSML 1.1 work above and  collect requirements for the next document that addresses  &lt;say-as&gt; values. We will not create specifications for additional  &lt;say-as&gt; values but may publish a separate Note containing the  &lt;say-as&gt; requirements specifically related to the SSML 1.1 work.  We will follow <a href="http://www.w3.org/2005/10/Process-20051014/tr.html#Reports" shape="rect">standard W3C procedures</a>.</p>
<p>* provided there is sufficient group expertise and contribution for these languages</p>
  <h3 id="g13"><a name="process" id="process" shape="rect" />1.3 Requirements development process </h3>
<p>The <a href="#general" shape="rect"><span>G</span>eneral <span>R</span>equirements</a> in section 2 arose out of SSML-specific and general Voice Browser Working Group discussions. The  <a href="#consistency" shape="rect">Speech Interface Framework Consistency Requirements</a>  in section 3 were generated by the Voice Browser Working Group. The SSML subgroup developed the <a href="#charter" shape="rect">charter</a>. The remaining requirements were  then developed as follows:</p>
<p>First, the SSML subgroup grouped topics presented and discussed at the workshops (see <a href="#background" shape="rect">Section 1.1</a>)  into the following categories: </p>
<dl>
  <dt>Short-term</dt>
  <dd>The group agrees to work on these topics. </dd>
  <dt>Long-term</dt>
  <dd>After the short-term work is complete the group will revisit these topics to determine whether or not they belong in the scope of SSML 1.1 and can be completed by the SSML subgroup. </dd>
  <dt>Experts needed</dt>
  <dd>We need experts in other relevant languages to actively participate in the subgroup before we can make the decision to work on these topics in this subgroup. </dd>
  <dt>Other SSML work  </dt>
  <dd>These topics are out of scope for SSML 1.1. These items belong in SSML 2.0 or later, a separate &lt;say-as&gt; Note (see [<a href="#ref-sayas" shape="rect">SAYAS</a>]), etc. </dd>
</dl>
<p>The following table shows how the topics were categorized. There is no implied ordering within each column. </p>
<table width="100%" border="1" summary="Table of topics by category">
  <tr>
    <th scope="col" rowspan="1" colspan="1">Short-term (group agrees to work on this) </th>
    <th scope="col" rowspan="1" colspan="1">Long-term (after short-term work will revisit to determine if belongs in group) </th>
    <th scope="col" rowspan="1" colspan="1">Experts needed (in order to make decision to work on this in this subgroup) </th>
    <th scope="col" rowspan="1" colspan="1">Other SSML work (SSML 2.0 or later, &lt;say-as&gt; Note, etc. </th>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">Token/word boundaries </td>
    <td rowspan="1" colspan="1">Tones</td>
    <td rowspan="1" colspan="1">Providing number, case, gender agreement info </td>
    <td rowspan="1" colspan="1">Special words </td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">Phonetic alphabets </td>
    <td rowspan="1" colspan="1">Expand Part-Of-Speech support</td>
    <td rowspan="1" colspan="1">Syllable markup </td>
    <td rowspan="1" colspan="1">Tone sandhi </td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">Verify that RFC3066 language categories are complete enough that we do not need anything new beyond xml:lang to identify languages and dialects </td>
    <td rowspan="1" colspan="1">Text with multiple languages (changing xml:lang without changing voice; separately specifying language of content and language to speak) </td>
    <td rowspan="1" colspan="1">Diacritics, SMS text, simplified/alternate text </td>
    <td rowspan="1" colspan="1">Enhance prosody rate to include "speech units per time unit" where speech units would be syllable, mora, phoneme, foot, etc. and time unit would be seconds, ms, minutes, etc.(would address mora/sec request) </td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">Chinese names (say-as requirements) </td>
    <td rowspan="1" colspan="1"> </td>
    <td rowspan="1" colspan="1">Sub-word unit demarcation and annotation </td>
    <td rowspan="1" colspan="1">Background sound (may be handled best by VoiceXML3 work) </td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1">Ruby</td>
    <td rowspan="1" colspan="1"> </td>
    <td rowspan="1" colspan="1">Transliteration</td>
    <td rowspan="1" colspan="1">Expressive elements </td>
  </tr>
  <tr>
    <td rowspan="1" colspan="1"> </td>
    <td rowspan="1" colspan="1"> </td>
    <td rowspan="1" colspan="1"> </td>
    <td rowspan="1" colspan="1">Sentence structure </td>
  </tr>
</table>
<p>Next, for each topic in the Short-term list, we developed one or more problem statements. Where applicable, the problem statements have been included in this document.<br clear="none" />
We then generated requirements to address the problem statements.</p>
<p>It is interesting to note that the three Long-term topics have been addressed by the requirements developed while working on the Short-term topics: Tones are addressed via the pronunciation alphabets, Part-Of-Speech support may be at least partially addressed via requirement 4.2.3, and Text with multiple languages is being addressed as part of the language category requirements.</p>
<p>The topics in the remaining two categories (Experts needed and Other SSML work) are listed and briefly described in the <a href="#future" shape="rect">Future Study</a> section. </p>

<h2 id="g2"><a name="general" shape="rect">2. General Requirements</a></h2>

<h3 id="g21">2.1 Backwards <span>c</span>ompatibility</h3>
<p>SSML 1.1 should be backwards compatible with SSML 1.0 except where modification is necessary to satisfy other requirements in this document.</p>

<h3 id="g22">2.2 Use of IRIs instead of URIs</h3>
<p>SSML 1.1 may use Internationalized Resource Identifiers [<a href="#ref-rfc3987" shape="rect">RFC3987</a>] instead of URIs.    </p>

<h2 id="g3"><a name="consistency" shape="rect">3. Speech Interface Framework Consistency Requirements </a> </h2>
<p>This section must include requirements that make SSML consistent with the other Speech Interface Framework specifications, including VoiceXML 2.0/2.1, PLS, SRGS, and SISR in both behavior and syntax, where possible. </p>

<h3 id="g31">3.1 Caching attributes</h3>

<h4 id="g311">3.1.1 &lt;audio&gt; caching attributes</h4>
<p>SSML must support the maxage and maxstale attributes for the &lt;audio&gt; element as supported in VoiceXML 2.1.<br clear="none" />
  SSML lacks these attributes, so it is not clear how SSML enforces
(or even has) a caching model for audio resources.</p>
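<p>A minimal sketch of how such markup might look, borrowing the maxage/maxstale semantics of VoiceXML 2.1; the placement of these attributes on &lt;audio&gt; is an assumption for illustration, not defined SSML 1.0 syntax:</p>
<pre class="example">&lt;speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  &lt;!-- Hypothetical attributes: accept a cached copy up to 60 seconds
       old (maxage), or up to 10 seconds past expiration (maxstale) --&gt;
  &lt;audio src="http://www.example.com/greeting.wav"
         maxage="60" maxstale="10"&gt;
    Welcome!  &lt;!-- fallback content if the fetch fails --&gt;
  &lt;/audio&gt;
&lt;/speak&gt;</pre>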

<h4 id="g312">3.1.2 &lt;lexicon&gt; caching attributes</h4>
<p>SSML must support the maxage and maxstale attributes for the &lt;lexicon&gt; element.</p>

<h4 id="g313">3.1.3 Caching defaults</h4>
<p>SSML should provide a mechanism for an author to set default values for the maxage and maxstale attributes. </p>

<h3 id="g32">3.2 Error messages in VoiceXML 3.0</h3>
<p> SSML should provide error messages that include detail.</p>
<p>SSML 1.0 defines <em>error</em> [SSML §1.5] as "Results are undefined. A conforming synthesis processor may detect and
  report an error and may recover from it." Note that in the case of an &lt;audio&gt;
  element where there is a protocol error fetching the URI resource, or where
  the resource cannot be played, VoiceXML might log this information in its
  session variables. The error information likely to be required includes the URI itself, the protocol response
  code, and a reason (textual description). It is expected that the SSML
  processor would recover from this error (play fallback content if
  specified, or ignore the element). </p>
  
<h3 id="g33">3.3 <span>"</span>type<span>"</span> attribute</h3>
<p>The &lt;audio&gt; element should be extended with a type attribute to 
  indicate
the media type of the URI. It may be used</p>
<ol type="a">
  <li>to indicate to the web
    server a preferred mime type, and</li>
  <li> to indicate the type of resource
    where such information isn't already covered by the protocol (e.g. file
    protocol).</li>
</ol>
<p> The handling of the requested type versus an authoritative
type returned by a protocol would follow the same approach described for the type attribute of &lt;lexicon&gt; [<a href="#ref-SSML" shape="rect">SSML</a> Section 3.1.4]. On a type mismatch, the processor should play the audio if it can. </p>
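<p>For illustration, an &lt;audio&gt; element carrying the proposed type attribute; this attribute is not part of SSML 1.0, and the behavior sketched in the comment follows the &lt;lexicon&gt; precedent described above:</p>
<pre class="example">&lt;!-- Hypothetical "type" attribute: hints the expected media type for
     a file: URI, where no protocol supplies one; an authoritative type
     returned by a protocol would take precedence --&gt;
&lt;audio src="file:///prompts/welcome" type="audio/basic"&gt;
  Welcome to the system.
&lt;/audio&gt;</pre>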

<h3 id="g34">3.4 VCR controls in VoiceXML</h3>
<p>SSML should be modified as necessary to operate effectively with the VCR controls that VoiceXML is looking to introduce.</p>
<p>3.4.1 SSML 1.1 should provide a mechanism to indicate that only a subset of the entire &lt;speak&gt; content is to be rendered. This mechanism should allow designation of  the start and end of the subset  based on time  offsets from the beginning of the &lt;speak&gt; content, the end of the &lt;speak&gt; content, and marks within the content.</p>
<p>3.4.2 It would be nice if SSML 1.1 provided a mechanism to indicate that only a subset of the content of an &lt;audio&gt; element is to be rendered. This mechanism, if provided, should allow designation of the start and end of the subset based on time offsets from the beginning of the &lt;audio&gt; content, the end of the &lt;audio&gt; content, and marks within the content.</p>
<p>3.4.3
  SSML 1.1 should provide a mechanism to adjust the speed of the rendered &lt;speak&gt; content.</p>
<p>3.4.4 It would be nice if SSML 1.1 provided a mechanism to either adjust or set the average pitch of the rendered &lt;speak&gt; content. </p>
<p>3.4.5 SSML 1.1 should provide a mechanism to either adjust or set the volume of the rendered &lt;speak&gt; content. </p>
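<p>The following sketch shows one way requirements 3.4.1, 3.4.3, and 3.4.5 might surface in markup; every attribute shown here is invented for illustration (clipBegin/clipEnd are borrowed from SMIL), and none is defined by SSML 1.0:</p>
<pre class="example">&lt;!-- Hypothetical syntax: render only the portion from 2.5 seconds in
     up to the mark named "summary", at a faster rate and louder volume --&gt;
&lt;speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"
       clipBegin="2.5s" clipEnd="summary"
       rate="+20%" volume="+6dB"&gt;
  This opening material would be skipped on fast-forward.
  &lt;mark name="summary"/&gt;
  The summary follows.
&lt;/speak&gt;</pre>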

<h3 id="g35">3.5 Lexicon synchronization</h3>
<p>Authors must be given explicit control over which &lt;lexicon&gt;-specified lexicons are active for which portions of the document. This will allow explicit activation/deactivation of lexicons. </p>
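<p>A sketch of explicit lexicon activation under an assumed syntax; SSML 1.0 provides no such control, and the &lt;uselexicon&gt; element, its attributes, and the xml:id on &lt;lexicon&gt; below are invented purely to illustrate the requirement:</p>
<pre class="example">&lt;speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  &lt;lexicon uri="http://www.example.com/medical.pls" xml:id="medical"/&gt;
  &lt;!-- Hypothetical scoping element: the medical lexicon is active
       only for the content it encloses --&gt;
  &lt;uselexicon lexicon="medical"&gt;
    The patient presented with acute dyspnea.
  &lt;/uselexicon&gt;
  This text is rendered with the medical lexicon inactive.
&lt;/speak&gt;</pre>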

<h3 id="g36">3.6 Prefetching <span>s</span>upport</h3>
<p> It would be nice if SSML were modified to support prefetching of audio as defined by the "fetchhint" attribute of the &lt;audio&gt; tag  in VoiceXML 2.0 [<a href="#ref-VXML2" shape="rect">VXML2</a>]. The exact mechanism used by the VoiceXML interpreter to instruct the SSML processor to prefetch audio may be out of scope. However, SSML should at a minimum recommend behavior for asserting audio resource freshness at the point of playback. This clarifies  how audio resource prefetching and caching behaviors interact.</p>
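<p>VoiceXML 2.0 expresses the prefetch request with the fetchhint attribute; if SSML adopted the same attribute, an author might write the following (its availability in SSML is the assumption here):</p>
<pre class="example">&lt;!-- fetchhint="prefetch" asks the platform to fetch the audio as early
     as possible; "safe" defers the fetch until the audio is needed.
     The attribute is defined in VoiceXML 2.0, not in SSML 1.0. --&gt;
&lt;audio src="http://www.example.com/music/intro.wav" fetchhint="prefetch"&gt;
  Introductory music is unavailable.
&lt;/audio&gt;</pre>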

<h3 id="g37">3.7 External reference to text structure </h3>
<p>SSML 1.1 must provide a way to uniquely reference &lt;p&gt;, &lt;s&gt;, and the new word-level element (see <a href="#wordboundary" shape="rect">Section 4</a>) for cross-referencing by external documents.</p>
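<p>One plausible realization, assumed here only for illustration, is to permit xml:id on the structure elements so that external documents can address them by fragment identifier:</p>
<pre class="example">&lt;speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  &lt;p xml:id="intro"&gt;
    &lt;s xml:id="intro.s1"&gt;An external document could reference this
    sentence as, for example, prompts.ssml#intro.s1.&lt;/s&gt;
  &lt;/p&gt;
&lt;/speak&gt;</pre>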

<h2 id="g4"> <a id="wordboundary" name="wordboundary" shape="rect">4. Token/Word Boundary Requirements</a> </h2>
<p>This section must include requirements that address the following problem statement:</p>
<blockquote>
  <p>All TTS systems make use of word boundaries to do synthesis.  All Chinese/Thai/Japanese systems today must do additional processing to identify word boundaries because white-space is not normally used as a boundary identifier in written language.  Errors in this processing can cause poorer output quality and even misunderstandings.  Overall TTS performance for these systems can be improved if document authors can hand-label the word boundaries where errors are expected or found to occur. </p>
</blockquote>

<h3 id="g41">4.1 Word boundary disambiguation </h3>
<p>SSML 1.1 must provide a mechanism to eliminate word segmentation ambiguities. This is necessary in order to render languages</p>
<ul>
  <li> that frequently do not use white-space as a boundary identifier, such as Chinese, Thai, and Japanese</li>
  <li>that use white space for syllable segmentation, such as Vietnamese</li>
  <li>that use white space for other purposes, such as Urdu</li>
</ul>
<p>Resulting benefits include improved cues for prosodic control (e.g., pausing), and the markup may assist the synthesis processor in selecting the correct pronunciation for homographs.</p>
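<p>To make the problem concrete, the sketch below uses a hypothetical word-level element (written &lt;w&gt; purely for illustration) to mark the intended segmentation of a Chinese phrase whose characters admit more than one grouping:</p>
<pre class="example">&lt;!-- Without explicit boundaries, 新西兰花 could be segmented as
     新西兰 + 花 ("New Zealand" + "flower") or as 新 + 西兰花
     ("new" + "broccoli"); the hypothetical &lt;w&gt; element removes
     the ambiguity --&gt;
&lt;speak version="1.1" xml:lang="zh-CN"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  我喜欢&lt;w&gt;新西兰&lt;/w&gt;&lt;w&gt;花&lt;/w&gt;
&lt;/speak&gt;</pre>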

<h3 id="g42">4.2 Annotation of words </h3>
<p>4.2.1 SSML 1.1 must provide a mechanism for annotating words.</p>
<p>4.2.2 SSML 1.1 must standardize an annotation of the language using mechanisms similar to those used elsewhere in the specification to identify language.</p>
<p>4.2.3 SSML 1.1 must standardize a mechanism to refer to the correct pronunciation in the Pronunciation Lexicon Specification, in particular when there are multiple pronunciations for the same orthography. This will enhance the existing implied correspondence between words and pronunciation lexicons.</p>
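<p>Continuing with the same hypothetical element, requirement 4.2.2 might be met by allowing xml:lang on it, and 4.2.3 by an attribute (called "ref" below, a name invented for this sketch) that selects one of several lexicon pronunciations for an orthography:</p>
<pre class="example">&lt;speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  &lt;lexicon uri="http://www.example.com/homographs.pls"/&gt;
  &lt;!-- "ref" (hypothetical) picks the fish sense of "bass" from the
       PLS lexicon rather than the musical sense --&gt;
  He caught a &lt;w ref="bass-fish"&gt;bass&lt;/w&gt; this morning.
&lt;/speak&gt;</pre>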

<h2 id="g5"> <a id="script" name="script" shape="rect">5. Phonetic Alphabet and Pronunciation Script Requirements</a> </h2>
<p>This section must include requirements that address the following problem statement:</p>
<blockquote>
  <p>Although IPA (and its textual equivalents) provides a way to write every pronunciation for every language, for some languages there are alternative pronunciation scripts (not necessarily phonetic/phonemic) that are already widely known and used; these scripts may still require some modifications to be useful within SSML.  SSML requires support for IPA and permits any string to be used as the value of the "alphabet" attribute in the &lt;phoneme&gt; element.  However, TTS vendors for these languages want a standard reference for their pronunciation scripts.  This might require extra work to define a standard reference. </p>
</blockquote>

<h3 id="g51">5.1 Registry for <span>a</span>lternative <span>p</span>ronunciation <span>s</span>cripts</h3>
<p>5.1.1 SSML 1.1 must enable the use of values for the "alphabet" attribute of the &lt;phoneme&gt; element that are defined in a registry that can be updated independently of SSML. This registry and its registration policy must be defined by the SSML subgroup.</p>
<p>The intent of this change is to encourage the standardization of alternative pronunciation scripts, for example Pinyin for Mandarin, Jyutping for Cantonese, and Ruby for Japanese. </p>
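<p>SSML 1.0 already admits vendor-specific alphabet values under the "x-" prefix convention; the sketch below uses such a value for Pinyin, with the understanding that requirement 5.1.1 would replace it with a registered, vendor-neutral name:</p>
<pre class="example">&lt;!-- "x-pinyin" follows the SSML 1.0 vendor-prefix convention; a
     registry entry would allow a portable name instead --&gt;
&lt;phoneme alphabet="x-pinyin" ph="shang4 hai3"&gt;上海&lt;/phoneme&gt;</pre>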
<p>As part of the discussion on the registration policy, the SSML subgroup should consider the following: </p>
<ul>
  <li>Permitting non-conflicting vendor-specific entries in the registry </li>
  <li>For some languages, their alternative pronunciation scripts were created before the widespread usage of computers. These scripts cannot easily be input on computers. As a result, there are some widely accepted modified versions of these pronunciation scripts designed for computer input. For example, when typing Mandarin Pinyin into a computer, “v” is usually used to replace “ü”. We may support the widely accepted modified versions of pronunciation scripts. </li>
</ul>
<p>5.1.2 The registry named in 5.1.1 should be maintained through IANA. </p>

<h2 id="g6"> <a id="lang" name="lang" shape="rect">6. Language Category Requirements</a> </h2>
<p>This section must include requirements that address the following problem statement:</p>
<blockquote>
  <p>The xml:lang attribute in SSML is the only way to identify the language.  It represents both the natural (human) language of the text content and the natural (human) language the synthesis processor is to produce.  For languages whose scripts are ideographs rather than pronunciation-related, we are not sure that the permitted values for xml:lang, as specified by RFC3066, are detailed enough to distinguish among languages (and their dialects) that use the same ideographs. </p>
</blockquote>

<h3 id="g61">6.1 Successor to RFC3066 support </h3>
<p>SSML 1.1 must ensure the use of a version of xml:lang that uses the successor specification to RFC3066 [<a href="#ref-rfc3066" shape="rect">RFC3066</a>] (for example, BCP47 [<a href="#ref-bcp47" shape="rect">BCP47</a>]).</p>
<p>This will provide sufficient flexibility to indicate all of the needed languages, scripts, dialects, and their variants. </p>

<h3 id="g62">6.2 xml:lang requirements</h3>
<p>6.2.1 SSML 1.1 must clearly state that the 'xml:lang' attribute  identifies the language of the content.</p>
<p>6.2.2 SSML 1.1 must clearly state that processors are expected to determine how to render the content based on the value of the 'xml:lang' attribute and must document expected rendering behavior for the xml:lang values they support. </p>
<p>6.2.3 SSML 1.1 must specify that selection of xml:lang and voice are independent. It is the responsibility of the TTS vendor to decide and document which languages are supported by which voices and in what way. </p>
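<p>For example, successor specifications to RFC3066 provide tags that can distinguish spoken Chinese languages sharing the same ideographs; the subtags below are illustrative, and the exact values depend on the registry current at the time:</p>
<pre class="example">&lt;!-- Illustrative tags: zh-cmn for Mandarin, zh-yue for Cantonese.
     The same characters 你好 receive different readings. --&gt;
&lt;speak version="1.1" xml:lang="zh-cmn"
       xmlns="http://www.w3.org/2001/10/synthesis"&gt;
  你好
  &lt;voice xml:lang="zh-yue"&gt;你好&lt;/voice&gt;
&lt;/speak&gt;</pre>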

<h2 id="g7"><a id="name" name="name" shape="rect">7. Name/Proper Noun Identification Requirements</a></h2>
<p>This section must include requirements on a future version of &lt;say-as&gt; to support better interpretation of Chinese names and Korean proper nouns.</p>
<p> In some languages, it is necessary to do some special handling to identify names/proper nouns. For example, in some Asian languages, the pronunciation of characters used in Chinese surnames and Korean proper nouns changes. If the name/proper noun is properly marked, there is a predictable pronunciation for it. Such a requirement is crucial and must be satisfied because, in languages such as Chinese and Korean, there is no obvious cue to distinguish names/proper nouns from other content (e.g., there is no capitalization as used in English), and it is often difficult for the speech synthesis processor to automatically identify all the names/proper nouns properly.</p>
<p>It is also important to identify which part of a name is the surname and which part(s) is/are the given name(s), since there can be several different surname/given name patterns. For example,</p>
<ul>
  <li> the Chinese name "司馬光" (in Chinese pinyin /si1 ma3 guang1/) has the surname "司馬" (two characters, /si1 ma3/) and given name "光" (one character, /guang1/)</li>
  <li> the Chinese name "司先超" (in Chinese pinyin /si1 xian1 chao1/) has the surname "司" (only one character, /si1/) and given name "先超" (two characters, /xian1 chao1/). </li>
</ul>

<h3 id="g71">7.1 Identify content as proper noun </h3>
<p>A future version of SSML must provide a mechanism to identify content as a proper noun.</p>

<h3 id="g72">7.2 Identify content as name</h3>
<p>A future version of SSML must provide a mechanism to identify content as a name. This might be done by creating a new "name" value for the interpret-as attribute of the &lt;say-as&gt; element, along with appropriate values for the format and detail attributes. </p>

<h3 id="g73">7.3 Identify name sub-content as surname</h3>
<p>A future version of SSML must provide a mechanism to identify a portion of a name as  the surname. </p>
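<p>If these mechanisms were realized through &lt;say-as&gt;, the markup might resemble the sketch below; the interpret-as value "name" and the format and detail values shown are proposals invented for illustration, not defined values:</p>
<pre class="example">&lt;!-- Hypothetical values: interpret-as="name" marks the content as a
     personal name, and detail="2" (usage invented here) states that
     the first two characters form the surname, so 司馬 in 司馬光
     receives its surname reading --&gt;
&lt;say-as interpret-as="name" format="surname-given" detail="2"&gt;司馬光&lt;/say-as&gt;</pre>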

<h2 id="g8"><a id="future" name="future" shape="rect">8. Future Study</a></h2>
<p>This section contains issues that were identified during
requirements capture but which have not been directly incorporated
into the current set of requirements. The descriptions are not intended to be exhaustive but rather to give a brief explanation of the core idea(s) of the topics. </p>

<h3 id="g81">8.1 Number, gender, case agreement </h3>
<p>Japanese, Hungarian, and Arabic words all vary by number, gender, case, and/or category. An example difficulty occurs in reading numeric values from news feeds, since the actual spoken numbers may change based on implied context. By providing this context, an author enables the synthesizer to generate the proper word form.</p>

<h3 id="g82">8.2 Syllable markup</h3>
<p>The two main use cases/motivations for this capability are </p>
<ol>
		  <li>boundary delineation/foundational unit: For languages that are syllable-based or for which syllable boundaries are important (e.g., for morphological analysis), this capability could be quite useful. It may be that other existing requirements for arbitrary pronunciation alphabets can mitigate this somewhat by allowing authors to use a boundary-marking alphabet targeted at their own language. </li>
		  <li>desire for prosodic or other markup at this level: This is a special case of <a href="#subword" shape="rect">Section 8.4</a>, below. </li>
</ol>
		<p>The current belief is that this markup is not needed in order to accomplish the stated objectives of SSML 1.1. Since markup of syllables and particularly the use of prosodic markup at a syllable level challenges the implicit word-level foundation of SSML 1.0, changes of this nature are likely to be far-reaching in consequence for the language. Unless this is later discovered to be necessary, this work should wait for a fuller rewrite of SSML than is anticipated for SSML 1.1. </p>
		
<h3 id="g83">8.3 Diacritics, SMS text, simplified/alternate text </h3>
<p>There are a number of cases where SSML is used to render other-than-traditional forms of text. The most common of these appears to be mobile text messages. It is fairly common to see significantly abbreviated text (such as "cul8r" for "see you later" in English) and, for non-English languages, text that does not properly use native character sets. Examples include dropped diacritics in Polish (e.g., the word pączek written as the word paczek) or the use of the three-symbol string '}|{' to represent the Russian letter 'Ж'.</p>

<h3 id="g84"><a name="subword" shape="rect">8.4 Sub-word unit demarcation and annotation</a></h3>
<p>In Chinese, the foundational writing unit is the character, and although there may be many different pronunciations for a given character, each pronunciation is only a single syllable. It is thus common in Chinese synthesis processors to be able to control prosodic information such as contrastive stress at the syllable level.</p>
		<p>Hungarian is a highly agglutinative language whose significant morphological variations are represented in the orthography. Thus, contrastive stress may need to be marked at a sub-word level. For example, “Nem a dobo<strong>zon</strong>, hanem a dobo<strong>zban</strong> van a könyv” means “The book is not <strong>on</strong> the box, but <strong>in</strong> the box.”</p>
		<p>Note that the approaches currently being considered to address the requirements in <a href="#wordboundary" shape="rect">Section 4</a> may provide a limited ability to do sub-word prosodic annotation. </p>

<div>
<h3 id="g85">8.5 Transliteration </h3>

<p>Many of the languages on the Indian subcontinent are based on a common set of underlying phonemic units and have writing systems (scripts) that are based on these underlying units. The scripts for these languages may differ substantially from one another, however, and from the historic Indian script specifically designed for writing pronunciations. Additionally, because of the spread of communication systems in which it is easier to write in Latin scripts (or ASCII, in particular) than in native scripts, India has seen a proliferation of ASCII-based writing systems that are also based on the same underlying phonemic units. Unfortunately, these ASCII-based writing systems are not standardized.</p>

<p>The challenge for speech synthesis systems today is that the system will often use several lexicons, each of which uses a different pronunciation writing system. Pronunciations given inline by an author may also be in a different (and potentially non-standard) writing system. This challenge is currently addressed for Indian speech synthesis systems by using <em>transliteration</em> among <em>code pages</em>. Each <em>code page</em> describes how a particular writing system maps into a canonical writing system. It is thus possible for a synthesis processor to know how to convert any text into a representation of pronunciation that can be looked up in a lexicon. </p>

<p>Although the need to use different pronunciation alphabets will be addressed for standard alphabets (i.e., those for the different Indian languages), a more generic mapping facility might be needed to address the user-specific ASCII representations. Such a capability might also address the common issue of how to map mobile phone short message text into the standard grapheme representations used in a lexicon.</p>
</div>

<h3 id="g86">8.<span>6</span> Special words </h3>

        <p>Many new values for the "interpret-as" attribute of the &lt;say-as&gt; element have been suggested. Common ones include URI, email address, and postal address. Although clearly useful, these values are similar, if not identical, to ones considered during the development of the Say-as Note [<a href="#ref-sayas" shape="rect">SAYAS</a>]. It is not clear which, if any, of the values suggested are critically, or at least more, necessary for languages other than those for which SSML 1.0 works well today. These suggestions from the workshops may be incorporated into future work on the &lt;say-as&gt; element, which is outside the scope of the SSML 1.1 effort.</p>

        <h3 id="g87">8.<span>7</span>  Tone Sandhi</h3>
<p>When the nominal tones of sequences of syllables in Chinese match certain patterns, the actual spoken tones change in predictable ways. For example, in Mandarin if two tone 3 syllables occur together, the first will actually be pronounced as tone 2 instead of tone 3. Similar, but different, rules apply for Cantonese and for the many other spoken languages that use the written Han characters. This need may be addressed sufficiently by other requirements in this document.</p>

<h3 id="g88">8.<span>8</span> More flexible prosody rate</h3>
<p>The rate attribute of the &lt;prosody&gt; element in SSML 1.0 only allows for relative changes to the speech rate, not absolute settings. A primary reason for this was lack of agreement on what units would be used to set the rate -- phonemes, syllables, words, etc. With the feedback received so far, it would be possible to enhance the prosody rate to permit absolute values of the form "X speech units per time unit", where speech units could be selected by the author to be syllable, mora, phoneme, foot, etc. and time units could be selected by the author to be seconds, ms, minutes, etc. This is a good example of a feature that should be considered if and when an SSML 2.0 is developed.</p>
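<p>Under such an enhancement, a rate value might read as in the speculative fragment below; SSML 1.0 permits only labels such as "fast" or relative changes such as "+10%", so the unit syntax shown is entirely hypothetical:</p>
<pre class="example">&lt;!-- Speculative absolute rate: five syllables per second --&gt;
&lt;prosody rate="5syllables/s"&gt;
  这是一个按音节数控制语速的例子。
&lt;/prosody&gt;</pre>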

<h3 id="g89">8.<span>9</span> Background sound</h3>
<p>There are many requests to permit a separate audio track to be established to provide background speech, music, or other audio. This feature is about audio mixing rather than speech synthesis, so either it should be handled outside of SSML (via SMIL [<a href="#ref-smil2" shape="rect">SMIL2</a>] or via a future version of VoiceXML) or a more thorough analysis of what audio mixing capabilities are desired should be done as part of a future version of SSML. </p>

<h3 id="g810">8.<span>10</span> Expressive elements</h3>
<p>There are requests for speaking style ("news", "sports", etc.) and emotion portrayal ("angry", "joyful", "sad") that represent high-level requests that result in rather sophisticated speech production changes, and historically there has been insufficient agreement on how these styles would be rendered. However, this is slowly changing -- see, for example<span>, the W3C Emotion Incubator Group</span> [<a href="#ref-emotion" shape="rect">EMOTION</a>]. This category of request most definitely should be considered when developing a future version of SSML.</p>

<h3 id="g811">8.1<span>1</span> Sentence structure</h3>
<p>SSML 1.0 has only two explicit logical structure elements: &lt;paragraph&gt; and &lt;sentence&gt;. In addition, whitespace is used as an implicit word boundary. There have been requests to provide other sub-sentence structure such as phrase markers (and explicit word marking, one of the requirements earlier in this document). The motivations for such features vary slightly but usually center around providing improved prosodic control. This is a good topic to reconsider in a future, possibly completely rewritten, version of SSML. </p>

<h2 id="g9"><a id="ref" name="ref" shape="rect">9. References</a></h2>
<dl>
  <dt>
    <a id="ref-bcp47" name="ref-bcp47" shape="rect">[BCP47]</a>  </dt>
  <dd><a href="http://www.ietf.org/rfc/bcp/bcp47.txt" shape="rect">IETF BCP47</a>, currently represented by <cite><a href="http://www.ietf.org/rfc/rfc4646.txt" shape="rect">Tags for the Identification of Languages</a></cite>,
	 A. Phillips, M. Davis, Editors. IETF, September 2006. This RFC is available 
at <a href="http://www.ietf.org/rfc/rfc4646.txt" shape="rect">http://www.ietf.org/rfc/rfc4646.txt</a>.</dd>
  <dt>
    <a id="ref-emotion" name="ref-emotion" shape="rect">[EMOTION]</a>  </dt>
  <dd>
    <a href="http://www.w3.org/2005/Incubator/emotion/" shape="rect">W3C Emotion Incubator Group</a>, World Wide Web Consortium. The group's website is available 
at <a href="http://www.w3.org/2005/Incubator/emotion/" shape="rect">http://www.w3.org/2005/Incubator/emotion/</a>.</dd>
  <dt>
    <a id="ref-ipahndbk" name="ref-ipahndbk" shape="rect">[IPA]</a>  </dt>
  <dd><cite>
    <a href="http://www2.arts.gla.ac.uk/ipa/handbook.html" shape="rect">Handbook of the 
International Phonetic Association</a>
    </cite>, International Phonetic Association, 
Editors. Cambridge University Press, July 1999. Information on the Handbook is available 
at <a href="http://www2.arts.gla.ac.uk/ipa/handbook.html" shape="rect">http://www2.arts.gla.ac.uk/ipa/handbook.html</a>.</dd>
  <dt>
    <a id="ref-languages" name="ref-languages" shape="rect">[LANGUAGES]</a>  </dt>
  <dd><cite>
    <a href="http://www.krysstal.com/spoken.html" shape="rect">The 30 Most Spoken Languages of the World</a>
    </cite>, KryssTal, 2006.  The website is available  
at <a href="http://www.krysstal.com/spoken.html" shape="rect">http://www.krysstal.com/spoken.html</a>.</dd>
  <dt>
    <a id="ref-PLS" name="ref-pls" shape="rect">[PLS]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/" shape="rect">Pronunciation Lexicon Specification (PLS) Version 1.0</a></cite>,
    Paolo Baggia, Editor. World Wide Web Consortium, 26 October 2006. This version of
      the PLS Working Draft is <a href="http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/" shape="rect">http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/</a> and is a Work in Progress. The latest version is available at <a href="http://www.w3.org/TR/pronunciation-lexicon/" shape="rect">http://www.w3.org/TR/pronunciation-lexicon/</a>.</dd>
  <dt>
    <a id="ref-rfc3066" name="ref-rfc3066" shape="rect">[RFC3066]</a>  </dt>
  <dd><cite>
    <a href="http://www.ietf.org/rfc/rfc3066.txt" shape="rect">Tags for the Identification 
of Languages</a></cite>,
    H. Alvestrand, Editor. IETF, January 2001. This RFC is available 
at <a href="http://www.ietf.org/rfc/rfc3066.txt" shape="rect">http://www.ietf.org/rfc/rfc3066.txt</a>.</dd>
  <dt>
    <a id="ref-rfc3987" name="ref-rfc3987" shape="rect">[RFC3987]</a>  </dt>
  <dd><cite>
    <a href="http://www.ietf.org/rfc/rfc3987.txt" shape="rect">Internationalized Resource Identifiers (IRIs)</a></cite>,
	 M. Duerst and M. Suignard, Editors. IETF, January 2005. This RFC is available 
at <a href="http://www.ietf.org/rfc/rfc3987.txt" shape="rect">http://www.ietf.org/rfc/rfc3987.txt</a>.</dd>
  <dt>
    <a id="ref-SRGS" name="ref-srgs" shape="rect">[SRGS]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/" shape="rect">Speech Recognition 
Grammar Specification Version 1.0</a>
    </cite>, Andrew Hunt and Scott McGlashan, Editors. World Wide
Web Consortium, 16 March 2004. This version of the SRGS 1.0 Recommendation is 
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/" shape="rect">http://www.w3.org/TR/2004/REC-speech-grammar-20040316/</a>.
The latest version is available at <a href="http://www.w3.org/TR/speech-grammar/" shape="rect">http://www.w3.org/TR/speech-grammar/</a>.</dd>
  <dt>
    <a id="ref-sayas" name="ref-sayas" shape="rect">[SAYAS]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/" shape="rect">SSML 1.0 say-as attribute values</a>
    </cite>, Daniel C. Burnett and Paolo Baggia, Editors. World Wide
Web Consortium, 26 May 2005. This version of the Say-as Note is 
<a href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/" shape="rect">http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/</a>.
The latest version is available at <a href="http://www.w3.org/TR/ssml-sayas/" shape="rect">http://www.w3.org/TR/ssml-sayas/</a>.</dd>
  <dt>
    <a id="ref-smil2" name="ref-smil2" shape="rect">[SMIL2]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2005/REC-SMIL2-20051213/" shape="rect">Synchronized Multimedia Integration Language</a>
	</cite>, Dick Bulterman, et al., Editors. World Wide
Web Consortium, 13 December 2005. This version of the SMIL 2 Recommendation is 
<a href="http://www.w3.org/TR/2005/REC-SMIL2-20051213/" shape="rect">http://www.w3.org/TR/2005/REC-SMIL2-20051213/</a>.
The latest version is available at <a href="http://www.w3.org/TR/SMIL2/" shape="rect">http://www.w3.org/TR/SMIL2/</a>.</dd>
  <dt>
    <a id="ref-SSML" name="ref-ssml" shape="rect">[SSML]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/" shape="rect">Speech Synthesis 
Markup Language (SSML) Version 1.0</a>
	</cite>, Daniel C. Burnett, et al., Editors. World Wide
Web Consortium, 7 September 2004. This version of the SSML 1.0 Recommendation is 
<a href="http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/" shape="rect">http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/</a>.
The latest version is available at <a href="http://www.w3.org/TR/speech-synthesis/" shape="rect">http://www.w3.org/TR/speech-synthesis/</a>.</dd>
  <dt>
    <a id="ref-VXML2" name="ref-vxml2" shape="rect">[VXML2]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/" shape="rect">Voice Extensible Markup Language (VoiceXML) Version 2.0</a>
	</cite>, Scott McGlashan, et al., Editors. World Wide
Web Consortium, 16 March 2004. This version of the VoiceXML 2.0 Recommendation is 
<a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/" shape="rect">http://www.w3.org/TR/2004/REC-voicexml20-20040316/</a>.
The latest version is available at <a href="http://www.w3.org/TR/voicexml20/" shape="rect">http://www.w3.org/TR/voicexml20/</a>.</dd>
  <dt>
    <a id="ref-VXML21" name="ref-vxml21" shape="rect">[VXML21]</a>  </dt>
  
  <dd><cite>
    <a href="http://www.w3.org/TR/2007/PR-voicexml21-20070425/" shape="rect">Voice Extensible Markup Language (VoiceXML) 2.1</a></cite>,
	 Matt Oshry, et al., Editors. World Wide Web Consortium, 25 April 2007. This version of the VoiceXML 2.1 Proposed Recommendation is <a href="http://www.w3.org/TR/2007/PR-voicexml21-20070425/" shape="rect">http://www.w3.org/TR/2007/PR-voicexml21-20070425/</a>.
The latest version is available at <a href="http://www.w3.org/TR/voicexml21/" shape="rect">http://www.w3.org/TR/voicexml21/</a>.</dd>
  <dt>
    <a id="ref-WS" name="ref-WS" shape="rect">[WS]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/2005/08/SSML/ssml-workshop-agenda.html" shape="rect">Minutes</a></cite>,
	 W3C Workshop on Internationalizing the Speech Synthesis Markup Language, 2-3 November 2005. The agenda and minutes are available at  
<a href="http://www.w3.org/2005/08/SSML/ssml-workshop-agenda.html" shape="rect">http://www.w3.org/2005/08/SSML/ssml-workshop-agenda.html</a>.</dd>
  <dt>
    <a id="ref-WS2" name="ref-WS2" shape="rect">[WS2]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/2006/02/SSML/minutes.html" shape="rect">Minutes</a></cite>,
	 W3C Workshop on Internationalizing the Speech Synthesis Markup Language, 30-31 May 2006. The agenda is available at <a href="http://www.w3.org/2006/02/SSML/agenda.html" shape="rect">http://www.w3.org/2006/02/SSML/agenda.html</a>.
	 The minutes are available at  
<a href="http://www.w3.org/2006/02/SSML/minutes.html" shape="rect">http://www.w3.org/2006/02/SSML/minutes.html</a>.</dd>

  <dt>
    <a id="ref-WS3" name="ref-WS3" shape="rect">[WS3]</a>  </dt>
  <dd><cite>
    <a href="http://www.w3.org/2006/10/SSML/minutes.html" shape="rect">Minutes</a></cite>,
	 W3C Workshop on Internationalizing the Speech Synthesis Markup Language, 13-14 January 2007. The agenda is available at <a href="http://www.w3.org/2006/10/SSML/agenda.html" shape="rect">http://www.w3.org/2006/10/SSML/agenda.html</a>.
	 The minutes are available at  
<a href="http://www.w3.org/2006/02/SSML/minutes.html" shape="rect">http://www.w3.org/2006/10/SSML/minutes.html</a>.</dd>
</dl>
		
<h2 id="g10"><a id="acks" name="acks" shape="rect">10. Acknowledgements</a></h2>
<p>The editors wish to thank the members of the <a href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working Group</a> involved in this activity <i>(listed in family name alphabetical order)</i>:</p>
<dl>
  <dd>芦村和幸 (Kazuyuki Ashimura), W3C</dd>
  <dd>Paolo Baggia, Loquendo</dd>
  <dd>Paul Bagshaw, France Telecom</dd>
  <dd>Jerry Carter, Nuance</dd>
  <dd>馮恬瑩 (Tiffany Fung), Chinese University of Hong Kong</dd>
  <dd>黄力行 (Lixing Huang), Chinese Academy of Sciences</dd>
  <dd>Jim Larson, Intel</dd>
  <dd>楼晓雁 (Lou Xiaoyan), Toshiba</dd>
  <dd>蒙美玲 (Helen Meng), Chinese University of Hong Kong</dd>
  <dd>陶建华 (JianHua Tao), Chinese Academy of Sciences</dd>
  <dd>王霞 (Wang Xia), Nokia</dd>
</dl>

<h2 id="gA"><a id="changes" name="changes" shape="rect">Appendix A. Changes since the previous version</a></h2>
<ul>
  <li>Updated Abstract and Section 1.1 to include the third workshop.</li>
  <li>Added Transliteration to the categorization table in Section 1.3. Described it in new section 8.5. Renumbered prior sections 8.5-8.10 to be 8.6-8.11.</li>
  <li>Noted in Section 8.4 that expected changes may partially address this need.  </li>
</ul>
</body>
</html>