<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="EN" lang="EN">
<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
  <title>Voice Extensible Markup Language (VoiceXML) 3.0 Requirements</title>
  <style type="text/css" xml:space="preserve">
.add   { background-color: #FFFF99; }
.remove   { background-color: #FF9999; text-decoration: line-through }
.issues { font-style: italic; font-weight: bold; color: green }

.tocline { list-style: none; }</style>
  <link rel="stylesheet" type="text/css"
  href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
</head>

<body>

<div class="head">
<p><a href="http://www.w3.org/"><img alt="W3C"
src="http://www.w3.org/Icons/w3c_home" height="48" width="72" /></a></p>

<h1 class="notoc" id="h1">Voice Extensible Markup Language (VoiceXML) 3.0
Requirements</h1>

<h2 class="notoc" id="date">W3C Working Draft <i>8 August 2008</i></h2>
<dl>
  <dt>This version:</dt>
    <dd><a
      href="http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/">http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/
      </a></dd>
  <dt>Latest version:</dt>
    <dd><a
      href="http://www.w3.org/TR/vxml30reqs/">http://www.w3.org/TR/vxml30reqs/
      </a></dd>
  <dt>Previous version:</dt>
    <dd>This is the first version. </dd>
  <dt>Editors:</dt>
    <dd>Jeff Hoepfinger, SandCherry</dd>
    <dd>Emily Candell, Comverse</dd>
  <dt>Authors:</dt>
    <dd>Jim Barnett, Aspect</dd>
    <dd>Mike Bodell, Microsoft</dd>
    <dd>Dan Burnett, Voxeo</dd>
    <dd>Jerry Carter, Nuance</dd>
    <dd>Scott McGlashan, HP</dd>
    <dd>Ken Rehor, Cisco</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© 2008 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>
<hr />

<h2 class="notoc"><a id="abstract" name="abstract">Abstract</a></h2>

<p>The W3C Voice Browser working group aims to develop specifications to
enable access to the Web using spoken interaction. This document is part of a
set of requirement studies for voice browsers, and provides details of the
requirements for marking up spoken dialogs.</p>

<h2><a id="status" name="status">Status of this document</a></h2>

<p><em>This section describes the status of this document at the time of its
publication. Other documents may supersede this document. A list of current
W3C publications and the latest revision of this technical report can be
found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>
at http://www.w3.org/TR/.</em></p>

<p>This is the 8 August 2008 W3C Working Draft of "Voice Extensible Markup
Language (VoiceXML) 3.0 Requirements".</p>

<p>This document describes the requirements for marking up dialogs for spoken
interaction required to fulfill the charter given in <a
href="http://www.w3.org/2006/12/voice-charter.html#scope">the Voice Browser
Working Group Charter</a>, and indicates how the W3C Voice Browser Working
Group has satisfied these requirements via the publication of working drafts
and recommendations. This is a First Public Working Draft. The group does not
expect this document to become a W3C Recommendation.</p>

<p>This document has been produced as part of the <a
href="http://www.w3.org/Voice/Activity.html" shape="rect">W3C Voice Browser
Activity</a>, following the procedures set out for the <a
href="http://www.w3.org/Consortium/Process/" shape="rect">W3C Process</a>.
The authors of this document are members of the <a
href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working Group</a>.
You are encouraged to subscribe to the public discussion list &lt;<a
href="mailto:www-voice@w3.org" shape="rect">www-voice@w3.org</a>&gt; and to
mail us your comments. To subscribe, send an email to &lt;<a
href="mailto:www-voice-request@w3.org"
shape="rect">www-voice-request@w3.org</a>&gt; with the word
<em>subscribe</em> in the subject line (include the word <em>unsubscribe</em>
if you want to unsubscribe). A <a
href="http://lists.w3.org/Archives/Public/www-voice/" shape="rect">public
archive</a> is available online.</p>

<p>This specification is a Working Draft of the Voice Browser working group
for review by W3C members and other interested parties. It is a draft
document and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference material or
to cite them as other than "work in progress".</p>

<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does not expect this document to become a W3C Recommendation. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34665/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>

<p>Publication as a Working Draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite this
document as other than work in progress.</p>

<h2><a id="toc" name="toc" shape="rect">Table of Contents</a></h2>
<ul class="toc">
  <li class="tocline">0. <a href="#intro" shape="rect">Introduction</a></li>
  <li class="tocline">1. <a href="#modality-reqs" shape="rect">Modality
    Requirements</a></li>
  <li class="tocline">1.1 <a href="#mod-csmo" shape="rect">Coordinated,
    Simultaneous Multimodal Output</a></li>
  <li class="tocline">1.2 <a href="#mod-usmo" shape="rect">Uncoordinated,
    Simultaneous Multimodal Output</a></li>
  <li class="tocline">2. <a href="#functional-reqs" shape="rect">Functional
    Requirements</a></li>
  <li class="tocline">2.1 <a href="#funct-vcr" shape="rect">VCR
  Controls</a></li>
  <li class="tocline">2.2 <a href="#funct-media" shape="rect">Media
    Control</a></li>
  <li class="tocline">2.3 <a href="#funct-siv" shape="rect">Speaker
    Verification</a></li>
  <li class="tocline">2.4 <a href="#funct-event" shape="rect">External Event
    Handling while a dialog in progress</a></li>
  <li class="tocline">2.5 <a href="#funct-pls" shape="rect">Pronunciation
    Lexicon Specification</a></li>
  <li class="tocline">2.6 <a href="#funct-emma" shape="rect">EMMA</a></li>
  <li class="tocline">2.7 <a href="#funct-upload" shape="rect">Synchronous
    Upload of Recordings</a></li>
  <li class="tocline">2.8 <a href="#funct-speed" shape="rect">Speed
    Control</a></li>
  <li class="tocline">2.9 <a href="#funct-volume" shape="rect">Volume
    Control</a></li>
  <li class="tocline">2.10 <a href="#funct-record" shape="rect">Media
    Recording</a></li>
  <li class="tocline">2.11 <a href="#funct-mediaformat" shape="rect">Media
    Formats</a></li>
  <li class="tocline">2.12 <a href="#funct-datamodel" shape="rect">Data
    Model</a></li>
  <li class="tocline">2.13 <a href="#funct-submitprocessing"
    shape="rect">Submit Processing</a></li>
  <li class="tocline">3. <a href="#format-reqs" shape="rect">Format
    Requirements</a></li>
  <li class="tocline">3.1 <a href="#format-flow" shape="rect">Flow
    Language</a></li>
  <li class="tocline">3.2 <a href="#format-semmod" shape="rect">Semantic
    Model Definition</a></li>
  <li class="tocline">4. <a href="#other-reqs" shape="rect">Other
    Requirements</a></li>
  <li class="tocline">4.1 <a href="#other-vxml" shape="rect">Consistent with
    other Voice Browser Working Group Specs</a></li>
  <li class="tocline">4.2 <a href="#other-other" shape="rect">Consistent with
    other Specs</a></li>
  <li class="tocline">4.3 <a href="#other-simplify" shape="rect">Simplify
    existing VoiceXML Tasks</a></li>
  <li class="tocline">4.4 <a href="#other-maintain" shape="rect">Maintain
    Functionality from Previous VXML Versions</a></li>
  <li class="tocline">4.5 <a href="#other-crs" shape="rect">Address Change
    Requests from Previous VXML Versions</a></li>
  <li class="tocline">5. <a href="#acknowledgments"
    shape="rect">Acknowledgments</a></li>
  <li class="tocline">Appendix A. <a href="#prev-reqs" shape="rect">Previous
    Requirements</a></li>
</ul>

<h2><a id="intro" name="intro">0. Introduction</a></h2>

<p>The main goal of this activity is to establish the current status of the
Voice Browser Working Group activities relative to the requirements defined
in the <a href="http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223">Previous
Requirements Document</a> and to define additional requirements to drive
future Voice Browser Working Group activities, based on Voice Community
experience with existing standards.</p>

<p>The process will consist of the following steps:</p>
<ol>
  <li>Identify how the existing requirements have been satisfied by the
    standards defined by the Voice Browser Working Group, other W3C Working
    Groups or other standards bodies. Note that references to VoiceXML 2.0
    imply that VoiceXML 2.1 also satisfies the requirement.</li>
  <li>Identify the requirements that have not yet been satisfied and
    determine whether they are still valid requirements.</li>
  <li>Identify new requirements based on input from working group members and
    submissions to the W3C Voice Browser Public Mailing List &lt;<a
    href="mailto:www-voice@w3.org">www-voice@w3.org</a>&gt; (<a
    href="http://www.w3.org/Archives/Public/www-voice/">archive</a>).</li>
  <li>Prioritize the remaining requirements and identify a road map by which
    the Voice Browser Working Group plans to address these items.</li>
</ol>

<h3><a id="S0_1" name="S0_1"></a>0.1 Scope</h3>

<p>The previous requirements definition activity focused on defining three
types of requirements on the voice markup language: modality, functional, and
format.</p>
<ul>
  <li><b>Modality</b> requirements concern the types of modalities (media in
    combination with an input/output mechanism) supported by the markup
    language for user input and system output. (For the Voice Browser Working
    Group, the modalities supported are speech, video and DTMF. Requirements
    regarding other modalities will be handled by the <a
    href="http://www.w3.org/2002/mmi/">Multimodal Interaction Working
    Group.</a>)</li>
  <li><b>Functional</b> requirements concern the behavior (or operational
    semantics) which results from interpreting a voice markup language.</li>
  <li><b>Format</b> requirements constrain the format (or syntax) of the
    voice markup language itself.</li>
</ul>

<p>The environment and capabilities of the voice browser interpreting the
markup language affect these requirements. There may be differences in the
modality and functional requirements for desktop versus telephony-based
environments (and, in the latter case, between fixed, mobile and Internet
telephony environments). The capabilities of the voice browser device also
affect the requirements. Requirements affected by the environment or
capabilities of the voice browser device will be explicitly marked as
such.</p>

<h3><a id="S0_2" name="S0_2"></a>0.2 Terminology</h3>

<p>Although defining a dialog is highly problematic, some basic definitions
must be provided to establish a common basis of understanding and avoid
confusion. The following terminology is based upon an event-driven model of
dialog interaction.<br />
<br />
</p>

<table summary="first column gives term, second gives description" border="1"
cellpadding="6" width="85%">
  <tbody>
    <tr>
      <th>Voice Markup Language</th>
      <td>a language in which voice dialog behavior is specified. The
        language may include reference to style and scripting elements which
        can also determine dialog behavior.</td>
    </tr>
    <tr>
      <th>Voice Browser</th>
      <td>a software device which interprets a voice markup language and
        generates a dialog with voice output and/or input, and possibly other
        modalities.</td>
    </tr>
    <tr>
      <th>Dialog</th>
      <td>a model of interactive behavior underlying the interpretation of
        the markup language. The model consists of states, variables, events,
        event handlers, inputs and outputs.</td>
    </tr>
    <tr>
      <th>State</th>
      <td>the basic interactional unit defined in the markup language; for
        example, an &lt;input&gt; element in HTML. A state can specify
        variables, event handlers, outputs and inputs. A state may describe
        output content to be presented to the user, input which the user can
        enter, and event handlers describing, for example, which variables to
        bind and which state to transition to when an event occurs.</td>
    </tr>
    <tr>
      <th>Events</th>
      <td>generated when a state is executed by the voice browser; for
        example, when outputs or inputs in a state are rendered or
        interpreted. Events are typed and may include information; for
        example, an input event generated when an utterance is recognized may
        include the string recognized, an interpretation, confidence score,
        and so on.</td>
    </tr>
    <tr>
      <th>Event Handlers</th>
      <td>are specified in the voice markup language and describe how events
        generated by the voice browser are to be handled. Interpretation of
        events may bind variables, or map the current state into another
        state (possibly itself).</td>
    </tr>
    <tr>
      <th>Output</th>
      <td>content specified in an element of the markup language for
        presentation to the user. The content is rendered by the voice
        browser; for example, audio files or text rendered by a TTS. Output
        can also contain parameters for the output device; for example,
        volume of audio file playback, language for TTS, etc. Events are
        generated when, for example, the audio file has been played.</td>
    </tr>
    <tr>
      <th>Input</th>
      <td>content (and its interpretation) specified in an element of the
        markup language which can be given as input by a user; for example, a
        grammar for DTMF and speech input. Events are generated by the voice
        browser when, for example, the user has spoken an utterance and
        variables may be bound to information contained in the event. Input
        can also specify parameters for the input device; for example,
        timeout parameters, etc.</td>
    </tr>
  </tbody>
</table>

<p>The dialog requirements for the voice markup language are annotated with
the following priorities. If a feature is deferred from the initial
specification to a future release, consideration may be given to leaving open
a path for future incorporation of the feature.<br />
<br />
</p>

<table summary="first column gives priority name, second its description"
border="1" cellpadding="6" width="85%">
  <tbody>
    <tr>
      <th>must have</th>
      <td>The first official specification must define the feature.</td>
    </tr>
    <tr>
      <th>should have</th>
      <td>The first official specification should define the feature if
        feasible but may defer it until a future release.</td>
    </tr>
    <tr>
      <th>nice to have</th>
      <td>The first official specification may define the feature if time
        permits; however, its priority is low.</td>
    </tr>
    <tr>
      <th>future revision</th>
      <td>It is not intended that the first official specification include
        the feature.</td>
    </tr>
  </tbody>
</table>

<h2><a id="modality-reqs" name="modality-reqs">1. Modality
Requirements</a></h2>
<!-- <p><span class="owner">Owner: Scott McGlashan</span><br /> -->
<!-- <span class="note">Note: These requirements will be coordinated with the -->
<!-- Multimodal Interaction Subgroup.</span></p> -->

<h3><a id="mod-csmo" name="mod-csmo">1.1 Coordinated, Simultaneous Multimodal
Output (nice to have)</a></h3>

<p>1.1.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is coordinated. For example, graphical output on a cellular
telephone display is coordinated with spoken output.</p>

<h3><a id="mod-usmo" name="mod-usmo">1.2 Uncoordinated, Simultaneous
Multimodal Output (nice to have)</a></h3>

<p>1.2.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is uncoordinated. For example, graphical output on a cellular
telephone display is uncoordinated with spoken output.</p>

<h2><a id="functional-reqs" name="functional-reqs">2. Functional
Requirements</a></h2>

<p>These requirements are intended to ensure that the markup language is
capable of specifying cooperative dialog behavior characteristic of
state-of-the-art spoken dialog systems. In general, the voice browser should
compensate for its own limitations in knowledge and performance compared with
equivalent human agents; for example, compensate for limitations in speech
recognition capability by confirming spoken user input when necessary.</p>

<h3><a id="funct-vcr" name="funct-vcr">2.1 VCR Controls (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete.</span></p> -->

<h4><a id="S2_1_1" name="S2_1_1"></a>2.1.1 VoiceXML 3.0 MUST provide a
mechanism giving an application developer a high-level of control of audio
and video playback.</h4>

<h4><a id="S2_1_1_1" name="S2_1_1_1"></a>2.1.1.1 It MUST be possible to
invoke media controls by DTMF or speech input (other input mechanisms may be
supported).</h4>

<h4><a id="S2_1_1_2" name="S2_1_1_2"></a>2.1.1.2 Media controls MUST not
disable normal user input: i.e. input for media control and input for
application input MUST be possible simultaneously.</h4>

<h4><a id="S2_1_1_3" name="S2_1_1_3"></a>2.1.1.3 Input associated with media
controls MUST be treated in the same way as other inputs. Resolution of best
match follows standard VoiceXML 2.0 precedence and scoping rules.</h4>

<h4><a id="S2_1_1_4" name="S2_1_1_4"></a>2.1.1.4 It MUST be possible for user
input to be interpreted as seek controls -- fast forward and rewind -- during
media output playback.</h4>

<h4><a id="S2_1_1_5" name="S2_1_1_5"></a>2.1.1.5 The seek control MUST allow
fast forward and rewind to be specified in time - seconds, milliseconds -
relative to the current playback position.</h4>

<h4><a id="S2_1_1_6" name="S2_1_1_6"></a>2.1.1.6 The seek control MUST allow
fast forward and rewind to be specified relative to &lt;mark&gt; elements in
the output.</h4>
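<p>As a non-normative illustration of 2.1.1.6, the prompt below embeds SSML
&lt;mark&gt; elements in the output; a seek control could then fast forward
or rewind relative to named positions such as "verse2". The &lt;prompt&gt;,
&lt;audio&gt; and &lt;mark&gt; elements are existing VoiceXML 2.x/SSML
markup; the seek-control syntax itself is not yet defined and is therefore
not shown.</p>
<pre>
&lt;prompt&gt;
  &lt;mark name="verse1"/&gt;
  &lt;audio src="verse1.wav"/&gt;
  &lt;mark name="verse2"/&gt;
  &lt;audio src="verse2.wav"/&gt;
&lt;/prompt&gt;
</pre>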

<h4><a id="S2_1_1_7" name="S2_1_1_7"></a>2.1.1.7 The seek control MUST not
affect the selection of alternative content: i.e. the same (alternative)
content MUST be used.</h4>

<h4><a id="S2_1_1_8" name="S2_1_1_8"></a>2.1.1.8 It MUST be possible for user
input to be interpreted as pause/resume during media output playback.</h4>

<h4><a id="S2_1_1_9" name="S2_1_1_9"></a>2.1.1.9 It MUST be possible for the
different inputs to control pause and resume.</h4>

<h3><a id="funct-media" name="funct-media">2.2 Media Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These requirements were reversed engineered from the -->
<!-- VoiceXML 3.0 spec editor's draft.</span></p> -->

<h4><a id="S2_2_1_" name="S2_2_1_"></a>2.2.1. It MUST be possible to specify
a media clip begin value, specified in time, as an offset from the start of
the media clip to begin playback.</h4>

<h4><a id="S2_2_2_" name="S2_2_2_"></a>2.2.2. It MUST be possible to specify
a media clip end value, specified in time, as an offset from the start of the
media clip to end playback.</h4>

<h4><a id="S2_2_3_" name="S2_2_3_"></a>2.2.3. It MUST be possible to specify
a repeat duration, specified in time, as the amount of time the media file
will repeat playback.</h4>

<h4><a id="S2_2_4_" name="S2_2_4_"></a>2.2.4. It MUST be possible to specify
a repeat count, specified as a non-negative integer, as the number of times
the media file will repeat playback.</h4>

<h4><a id="S2_2_5_" name="S2_2_5_"></a>2.2.5. It MUST be possible to specify
a gain , specified as a percentage, as the percent to adjust the amplitude
playback of the original waveform.</h4>

<h4><a id="S2_2_6_" name="S2_2_6_"></a>2.2.6. It MUST be possible to specify
a speed, specified as a percentage, as the percent to adjust the speed
playback of the original waveform.</h4>
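<p>A minimal sketch of how requirements 2.2.1 through 2.2.6 might surface in
markup is shown below. The &lt;media&gt; element and its attribute names are
hypothetical; the names are borrowed from SMIL purely for illustration and
are not defined by this document.</p>
<pre>
&lt;!-- Hypothetical media playback element; attribute names are borrowed
     from SMIL for illustration only.  Mapping to requirements:
     clipBegin = 2.2.1, clipEnd = 2.2.2, repeatDur = 2.2.3,
     repeatCount = 2.2.4, soundLevel = 2.2.5, speed = 2.2.6. --&gt;
&lt;media src="promo.wav" clipBegin="2.5s" clipEnd="30s"
       repeatDur="60s" repeatCount="2"
       soundLevel="80%" speed="120%"/&gt;
</pre>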

<h3><a id="funct-siv" name="funct-siv">2.3 Speaker Verification (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span><br /> -->
<!-- <span class="note">Note: Ken reviewed and thought these were -->
<!-- complete</span></p> -->

<h4><a id="S2_3_1" name="S2_3_1"></a>2.3.1 The markup language MUST provide
the ability to verify a speaker's identity through a dialog containing both
acoustic verification and knowledge verification.</h4>

<p>The acoustic verification may compare speech samples to an existing model
(kept in some, possibly external, repository) of that speaker's voice. A
verification result returns a value indicating whether the acoustic and
knowledge tests were accepted or rejected. Results for verification and
results for recognition may be returned simultaneously.</p>

<h4><a id="S2_3_1_1" name="S2_3_1_1"></a>2.3.1.1 VoiceXML 3.0 MUST support
SIV for end-user dialogs</h4>

<p>Note: The security administrator's interface is out-of-scope for
VoiceXML.</p>

<h4><a id="S2_3_1_2" name="S2_3_1_2"></a>2.3.1.2 SIV features MUST be
integrated with VoiceXML 3.0.</h4>

<p>SIV features such as enrollment and verification are voice dialogs. SIV
must be compatible with, and complementary to, other VoiceXML 3.0 dialog
constructs such as speech recognition.</p>

<h4><a id="S2_3_1_3" name="S2_3_1_3"></a>2.3.1.3 VoiceXML 3.0 MUST be able to
be used without SIV.</h4>

<p>SIV features must be part of VoiceXML 3.0 but may not be needed in all
application scenarios or implementations. Not all voice dialogs need SIV.</p>

<h4><a id="S2_3_1_4" name="S2_3_1_4"></a>2.3.1.4 SIV MUST be able to be used
without other input modalities.</h4>

<p>Some SIV processing techniques operate without using any ASR.</p>

<h4><a id="S2_3_1_5" name="S2_3_1_5"></a>2.3.1.5 SIV features MUST be able to
operate in multi-factor environments.</h4>

<p>Some applications require the use of SIV along with other means of
authentication: biometric (e.g. fingerprint, hand, retina, DNA) or
non-biometric (e.g. caller ID, geolocation, personal knowledge, etc.).</p>

<h4><a id="S2_3_1_6" name="S2_3_1_6"></a>2.3.1.6 SIV-specific events MUST be
defined.</h4>

<p>SIV processing engines and network protocols (e.g. MRCP) generate events
related to their operation and use. These events must be made available in a
manner consistent with other VoiceXML events. Event naming structure must
allow for vendor-specific and application-specific events.</p>

<h4><a id="S2_3_1_7" name="S2_3_1_7"></a>2.3.1.7 SIV-specific properties MUST
be defined.</h4>

<p>These properties are provided to configure the operation of the SIV
processing engines (analogous to "Generic Speech Recognition Properties"
defined in <a href="http://www.w3.org/TR/voicexml20/#dml6.3.2">VoiceXML 2.0
Section 6.3.2</a>).</p>

<h4><a id="S2_3_1_8" name="S2_3_1_8"></a>2.3.1.8 The SIV result MUST be
available in the result structure used by the host environment (e.g. VoiceXML
3.0, MMI).</h4>

<p>Note that this does not require EMMA in all cases, such as non-VoiceXML
3.0 environments. This also does not specify the version of EMMA.</p>

<h4><a id="S2_3_1_8_1" name="S2_3_1_8_1"></a>2.3.1.8.1 VoiceXML 3.0 SIV
result MUST be representable in EMMA.</h4>

<p>VoiceXML 3.0 must specify the format of the result structure and version
of EMMA.</p>
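<p>A minimal, non-normative sketch of how an SIV verification result might
be represented in EMMA 1.0 is shown below. The &lt;emma:emma&gt; and
&lt;emma:interpretation&gt; elements and the emma:confidence, emma:medium and
emma:mode annotations are existing EMMA markup; the payload elements
(&lt;decision&gt;, &lt;score&gt;) and their namespace are hypothetical
placeholders for an SIV result structure that VoiceXML 3.0 has not yet
defined.</p>
<pre>
&lt;emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://example.org/siv"&gt;
  &lt;!-- hypothetical application payload for a verification decision --&gt;
  &lt;emma:interpretation id="verify1" emma:medium="acoustic"
      emma:mode="voice" emma:confidence="0.93"&gt;
    &lt;decision&gt;accepted&lt;/decision&gt;
    &lt;score&gt;0.93&lt;/score&gt;
  &lt;/emma:interpretation&gt;
&lt;/emma:emma&gt;
</pre>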

<h4><a id="S2_3_1_9" name="S2_3_1_9"></a>2.3.1.9 SIV syntax SHOULD adhere to
the W3C guidelines for security handling.</h4>

<p>This includes:</p>
<ul>
  <li>XML encryption</li>
  <li>XML signature processing</li>
  <li>possibly TLS or non-XML security, such as the NIST SP 800-63 guideline
    for remote authentication.</li>
</ul>

<p>The following security aspects are out-of-charter for VoiceXML:<br />
</p>
<ul>
  <li>The security administrator's interface</li>
  <li>Whether security aspects may be modified by the security
  administrators</li>
  <li>Requirements for securing the SIV data</li>
</ul>

<h4><a id="S2_3_1_11" name="S2_3_1_11"></a>2.3.1.11 SIV features MUST support
enrollment.</h4>

<p>Enrollment is the process of collecting voice samples from a person and
the subsequent generation and storage of voice reference models associated
with that person.</p>

<h4><a id="S2_3_1_12" name="S2_3_1_12"></a>2.3.1.12 SIV features MUST support
verification.</h4>

<p>Verification is the process of comparing an utterance against a single
reference model based on a single claimed identity (e.g., user ID, account
number). A verification result includes both a score and a decision.</p>

<h4><a id="S2_3_1_13" name="S2_3_1_13"></a>2.3.1.13 SIV features MUST support
identification.</h4>

<p>Identification is verification with multiple identity claims. An
identification result includes both the verification results for all of the
individual identity claims, and the identifier of a single reference model
that matches the input utterance best.</p>

<h4><a id="S2_3_1_14" name="S2_3_1_14"></a>2.3.1.14 SIV features SHOULD
support supervised adaptation.</h4>

<p>The application should have control over whether a voice model is updated
or modified based on the results of a verification.<br />
</p>

<h4><a id="S2_3_1_15" name="S2_3_1_15"></a>2.3.1.15 SIV features MUST support
concurrent SIV processing.</h4>

<p>An application developer must be able to specify at the individual turn
level that one or more of the following types of processing need to be
performed concurrently:</p>
<ul>
  <li>ASR</li>
  <li>Audio recording</li>
  <li>Buffering (SIV)</li>
  <li>Authentication (SIV)</li>
  <li>Enrollment (SIV)</li>
  <li>Adaptation (SIV)</li>
</ul>
Note: "Concurrent" means at the dialog specification level. A platform may
choose to implement these functions sequentially. 

<h4><a id="S2_3_1_15_1" name="S2_3_1_15_1"></a>2.3.1.15.1 SIV features SHOULD
support other concurrent audio processing.</h4>

<p>Concurrent processing of other forms of audio processing (e.g., channel
detection, gender detection) should also be permitted but remain optional.</p>

<h4><a id="S2_3_1_16" name="S2_3_1_16"></a>2.3.1.16 SIV features MUST be able
to accept text from the application for presentation to the user.</h4>

<p>Text-prompted SIV applications require prompts to match the expected
response. The application is responsible for the content of the dialog but
VoiceXML is responsible for the presentation.</p>

<h4><a id="S2_3_1_16_1" name="S2_3_1_16_1"></a>2.3.1.16.1 SIV SHOULD be
architecturally agnostic</h4>

<p>Many different SIV processing technologies exist. The VoiceXML 3.0 SIV
architecture should avoid dependencies upon specific engine technologies.</p>

<h3><a id="funct-event" name="funct-event">2.4 External Event handling while
a dialog is in progress (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->

<h4><a id="S2_4_1" name="S2_4_1"></a>2.4.1 It MUST be possible for external
entities to inject events into running dialogs. The dialog author MUST be
able to control when such events are processed and what actions are taken
when they are processed.</h4>

<h4><a id="S2_4_2" name="S2_4_2"></a>2.4.2 Among the possible results of
processing such events MUST be pausing, resuming, and terminating the dialog.
The VoiceXML 3.0 specification MAY define default handlers for certain such
external events.</h4>

<h4><a id="S2_4_3" name="S2_4_3"></a>2.4.3 It MUST be possible for running
dialogs to send events into the <a
href="http://www.w3.org/TR/mmi-arch/">Multimodal Interaction
Framework.</a></h4>
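<p>As a rough sketch of 2.4.1 and 2.4.2, an author-defined handler for an
externally injected event might reuse the existing VoiceXML
&lt;catch&gt;/&lt;log&gt; markup, as shown below. The event name
"external.supervisor.pause" and the mechanism for injecting it from outside
the dialog are hypothetical; VoiceXML 2.x does not define such a
mechanism.</p>
<pre>
&lt;!-- Hypothetical external event name; &lt;catch&gt; and &lt;log&gt;
     are existing VoiceXML 2.x elements. --&gt;
&lt;catch event="external.supervisor.pause"&gt;
  &lt;log&gt;External pause request received&lt;/log&gt;
  &lt;!-- author-defined behavior, e.g. pause or terminate the dialog --&gt;
&lt;/catch&gt;
</pre>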

<h3><a id="funct-pls" name="funct-pls"></a>2.5 <a
href="http://www.w3.org/TR/pronunciation-lexicon/">Pronunciation Lexicon
Specification (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: There was some discussion in Orlando F2F on being -->
<!-- able to define lexicons using normal scoping rules, but there was no -->
<!-- agreement reached</span> </p> -->

<h4><a id="S2_5_1" name="S2_5_1"></a>2.5.1 The author MUST be able to define
lexicons that span an entire VoiceXML application.</h4>
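<p>The sketch below shows a standard PLS 1.0 lexicon document together with a
hypothetical application-scoped reference to it. The lexicon document itself
follows the Pronunciation Lexicon Specification; placing a &lt;lexicon&gt;
reference directly in the VoiceXML application root document is not defined
by VoiceXML 2.x and is shown only as an assumption of how this requirement
might be met.</p>
<pre>
&lt;!-- names.pls: a PLS 1.0 pronunciation lexicon --&gt;
&lt;lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"&gt;
  &lt;lexeme&gt;
    &lt;grapheme&gt;tomato&lt;/grapheme&gt;
    &lt;phoneme&gt;t&#x259;m&#x251;to&#x28A;&lt;/phoneme&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;

&lt;!-- Hypothetical application-wide reference in the application root
     document (syntax not defined in VoiceXML 2.x): --&gt;
&lt;vxml version="3.0" xmlns="http://www.w3.org/2001/vxml"&gt;
  &lt;lexicon uri="names.pls"/&gt;
  ...
&lt;/vxml&gt;
</pre>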

<h3><a id="funct-emma" name="funct-emma"></a>2.6 <a
href="http://www.w3.org/TR/emma/">EMMA Specification (must have)</a></h3>

<h4><a id="S2_6_1_" name="S2_6_1_"></a>2.6.1. The application author MUST be
able to specify the preferred format of the input result within VoiceXML. If
not specified, the default format is EMMA.</h4>

<h4><a id="S2_6_2" name="S2_6_2"></a>2.6.2 All available semantic information
(ie. content that could have meaning) from the input MUST be accessible to
the application author. This result MUST be navigable by the application
author.</h4>

<p>The exact form of navigation will depend on the format and decisions
around the preferred data model made by the working group. If the result is a
string, string processing functions are expected to be available. If the
result is an XML document, DOM or E4X-like functions are expected to be
supported.</p>
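<p>In VoiceXML 2.x, for example, a semantic interpretation result is already
navigable as an ECMAScript object via application.lastresult$, as sketched
below; the property name "city" depends entirely on the grammar's semantic
interpretation tags and is shown only as an example.</p>
<pre>
&lt;!-- Existing VoiceXML 2.x markup: navigating the semantic
     interpretation of the last recognition result. --&gt;
&lt;log expr="'city = ' + application.lastresult$.interpretation.city"/&gt;
</pre>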

<h4><a id="S2_6_3_" name="S2_6_3_"></a>2.6.3. VoiceXML 3 (or profiles) MUST
describe how the default result format is mapped into the application's data
model.</h4>

<p>VoiceXML 3 will declare one or more mandatory result formats.</p>

<h4><a id="S2_6_4" name="S2_6_4"></a>2.6.4 The application author SHOULD be
able to specify specific result content not to be logged.</h4>

<p>This will allow the author to prevent logging of confidential or sensitive
information.</p>

<h3><a id="funct-upload" name="funct-upload">2.7 Synchronous Upload of
Recordings (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->

<h4><a id="S2_7_1" name="S2_7_1"></a>2.7.1 VoiceXML 3.0 MUST enable
synchronous uploads of recordings while the recording is in progress</h4>

<h4><a id="S2_7_1_1" name="S2_7_1_1"></a>2.7.1.1 It MUST be possible to
specify the upload destination of the recording in the &lt;record&gt;
element</h4>

<h4><a id="S2_7_1_2" name="S2_7_1_2"></a>2.7.1.2 The upload destination MUST
be an HTTP URI</h4>

<h4><a id="S2_7_1_3" name="S2_7_1_3"></a>2.7.1.3 The application developer
MAY specify HTTP PUT or HTTP POST as the recording upload method</h4>

<h4><a id="S2_7_1_4" name="S2_7_1_4"></a>2.7.1.4 This feature MUST be
backward compatible with VoiceXML 2.0/2.1 record functionality</h4>
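<p>A minimal sketch of requirements 2.7.1.1 through 2.7.1.3, assuming
hypothetical "dest" and "method" attributes on the existing &lt;record&gt;
element (VoiceXML 2.x defines no such attributes):</p>
<pre>
&lt;!-- "dest" and "method" are hypothetical attributes illustrating
     synchronous upload; the other attributes are VoiceXML 2.x. --&gt;
&lt;record name="message" beep="true" maxtime="60s"
        dest="http://example.com/uploads/message-1234"
        method="put"/&gt;
</pre>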

<h3><a id="funct-speed" name="funct-speed">2.8 Speed Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->

<h4><a id="S2_8_1" name="S2_8_1"></a>2.8.1 It MUST be possible for user input
to change the speed of media output playback.</h4>

<h4><a id="S2_8_2" name="S2_8_2"></a>2.8.2 It MUST be possible to map the
values for speed control to the rate attribute of prosody</h4>

<h4><a id="S2_8_3" name="S2_8_3"></a>2.8.3 Values for speed controls MAY be
specified as properties which follow the standard VoiceXML scoping model.
Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>
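<p>A minimal sketch, assuming a hypothetical property name "mediaspeed" for
requirement 2.8.3; the &lt;property&gt; element and the SSML &lt;prosody&gt;
rate attribute referenced by 2.8.2 are existing markup.</p>
<pre>
&lt;!-- Hypothetical property name "mediaspeed"; &lt;property&gt; and
     SSML &lt;prosody&gt; are existing markup. --&gt;
&lt;property name="mediaspeed" value="fast"/&gt;
&lt;prompt&gt;
  &lt;prosody rate="fast"&gt;This sentence is rendered at the faster
  rate.&lt;/prosody&gt;
&lt;/prompt&gt;
</pre>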

<h3><a id="funct-volume" name="funct-volume">2.9 Volume Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->

<h4><a id="S2_9_1" name="S2_9_1"></a>2.9.1 It MUST be possible for user input
to change the volume of media output playback.</h4>

<h4><a id="S2_9_1_1" name="S2_9_1_1"></a>2.9.1.1 Values for volume controls
MAY be specified as properties which follow the standard VoiceXML scoping
model. Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>

<h4><a id="S2_9_1_2" name="S2_9_1_2"></a>2.9.1.2 It MUST be possible to map
the values for volume control to the volume attribute of prosody in SSML.</h4>
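<p>Similarly for volume, a minimal sketch assuming a hypothetical
"mediavolume" property for 2.9.1.1; the SSML &lt;prosody&gt; volume
attribute referenced by 2.9.1.2 is existing markup.</p>
<pre>
&lt;!-- Hypothetical property name "mediavolume". --&gt;
&lt;property name="mediavolume" value="loud"/&gt;
&lt;prompt&gt;
  &lt;prosody volume="loud"&gt;This sentence is rendered more
  loudly.&lt;/prosody&gt;
&lt;/prompt&gt;
</pre>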

<h3><a id="funct-record" name="funct-record">2.10 Media Recording (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span></p> -->

<h4><a id="S2_10_1" name="S2_10_1"></a>2.10.1 Recording Modes</h4>

<p>Form item recording mode (requirements 2.10.1.1 and 2.10.1.2)
captures media from the caller (only) during the collect phase of a dialog.
Partial- and whole-session recording captures media from the caller, system,
and/or called party (in the case of a transferred endpoint) in a
multichannel or single (mixed) channel recording. The duration of these
recordings depends on the type.</p>

<h4><a id="S2_10_1_1" name="S2_10_1_1"></a>2.10.1.1 Form Item equivalent
(e.g. VoiceXML 2.0 &lt;record&gt;)</h4>
<!-- <span class="note">Note: Audio endpointing controls are defined in Section
2.10.3.</span> -->

<h4><a id="S2_10_1_1_1" name="S2_10_1_1_1"></a>2.10.1.1.1 VoiceXML 3.0 MUST
be able to record input from a user.</h4>

<h4><a id="S2_10_1_2" name="S2_10_1_2"></a>2.10.1.2 Utterance Recording</h4>
<!-- <span class="note">Note: Should this be generalized to handle other media -->
<!-- like video?<br /> -->
<!-- Note: Should this be supported in the case of DTMF-only?</span> -->

<p>Utterance recording mode is recording that occurs during an ASR or SIV
form item. The audio may be endpointed, usually by the speech engine.</p>

<h4><a id="S2_10_1_2_1" name="S2_10_1_2_1"></a>2.10.1.2.1 VoiceXML 3.0 MUST
support recording of a user's utterance during an form item
[recordutterance]</h4>

<h4><a id="S2_10_1_2_2" name="S2_10_1_2_2"></a>2.10.1.2.2 VoiceXML 3.0 MUST
support the control of utterance recording via a &lt;property&gt;.</h4>

<h4><a id="S2_10_1_2_3" name="S2_10_1_2_3"></a>2.10.1.2.3 VoiceXML 3.0 MUST
support the control of utterance recording via an attribute on input
items.</h4>
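<p>For reference, VoiceXML 2.1 already provides property-based utterance
recording, as sketched below; whether VoiceXML 3.0 also adds an attribute on
input items (2.10.1.2.3) is not yet defined, so no such attribute is
shown.</p>
<pre>
&lt;!-- Existing VoiceXML 2.1 markup: record the utterance spoken during
     a field; the captured audio is then available in
     application.lastresult$.recording. --&gt;
&lt;property name="recordutterance" value="true"/&gt;
&lt;property name="recordutterancetype" value="audio/basic"/&gt;
&lt;field name="account"&gt;
  &lt;grammar src="account.grxml" type="application/srgs+xml"/&gt;
&lt;/field&gt;
</pre>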

<h4><a id="S2_10_1_3" name="S2_10_1_3"></a>2.10.1.3 Session Recording</h4>

<p>Session recording begins with a start command. It continues until:</p>
<ul>
  <li>a pause command (a subsequent resume command continues recording);</li>
  <li>a stop command;</li>
  <li>the end of the VoiceXML session; or</li>
  <li>an error occurs.</li>
</ul>

<p>Recording configuration and parameter requirements are defined in Section
2.10.2.</p>

<h4><a id="S2_10_1_3_1" name="S2_10_1_3_1"></a>2.10.1.3.1 VoiceXML 3.0 MUST
be able to record part of a VoiceXML session.</h4>

<h4><a id="S2_10_1_3_2" name="S2_10_1_3_2"></a>2.10.1.3.2 VoiceXML 3.0 MUST
be able to record an entire dialog.</h4>

<h4><a id="S2_10_1_4" name="S2_10_1_4"></a>2.10.1.4 Restricted Session
Recording</h4>

<p>Restricted session recording begins with a start command and continues
until:</p>
<ul>
  <li>the end of the session;</li>
  <li>an error occurs.</li>
</ul>

<p>See Table 1 for applicable controls.</p>

<h4><a id="S2_10_1_5" name="S2_10_1_5"></a>2.10.1.5 Multiple instances</h4>

<h4><a id="S2_10_1_5_1" name="S2_10_1_5_1"></a>2.10.1.5.1 VoiceXML 3.0 MUST
be able to support multiple simultaneous recordings of different types during
a call.</h4>

<h4><a id="S2_10_2_" name="S2_10_2_"></a>2.10.2. Recording Configuration and
Parameters</h4>

<p>This matrix specifies which features apply to which recording types.</p>

<table style="text-align: left; width: 722px; height: 166px;" border="1"
cellpadding="1" cellspacing="0">
  <tbody>
    <tr>
      <td>Feature Requirement /<br />
        Recording type</td>
      <td>Dialog</td>
      <td>Utterance</td>
      <td>Session</td>
      <td>Restricted<br />
        Session</td>
    </tr>
    <tr>
      <td>2.10.2.1 Recording starts when caller begins speaking</td>
      <td>Y</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.2 Initial silence interval cancels recording</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.3 Final silence ends recording</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.4 Maximum recording time</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.5 Terminate recording with DTMF input</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.6 Grammar control: modal operation</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.7 Media format</td>
      <td>Y</td>
      <td>Y</td>
      <td>Y</td>
      <td>Y</td>
    </tr>
    <tr>
      <td>2.10.2.8 Recording indicator</td>
      <td>N</td>
      <td>N</td>
      <td>Y</td>
      <td>N</td>
    </tr>
    <tr>
      <td>2.10.2.9 Channel assignment</td>
      <td>N</td>
      <td>N</td>
      <td>Y</td>
      <td>Y</td>
    </tr>
    <tr>
      <td>2.10.2.10 Channel groups</td>
      <td>N</td>
      <td>N</td>
      <td>Y</td>
      <td>Y</td>
    </tr>
    <tr>
      <td>2.10.2.11 Buffer control</td>
      <td>Y</td>
      <td>Y</td>
      <td>N</td>
      <td>N</td>
    </tr>
  </tbody>
</table>

<p>Table 1: Recording Configuration and Parameter Application</p>

<p>(Attributes from VoiceXML 2.0 are indicated in brackets [].)</p>
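<p>For orientation, the sketch below shows the existing VoiceXML 2.0
form-item recording markup from which the bracketed names in Table 1 are
taken; the attribute values are illustrative only.</p>
<pre>
&lt;!-- Mapping to Table 1: timeout property = 2.10.2.2,
     finalsilence = 2.10.2.3, maxtime = 2.10.2.4, dtmfterm = 2.10.2.5,
     modal = 2.10.2.6, type = 2.10.2.7, beep = 2.10.2.8.1 --&gt;
&lt;property name="timeout" value="5s"/&gt;
&lt;record name="greeting" beep="true" maxtime="30s" finalsilence="2s"
        dtmfterm="true" modal="true" type="audio/basic"/&gt;
</pre>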

<h4><a id="S2_10_2_1" name="S2_10_2_1"></a>2.10.2.1 Recording starts when
caller begins speaking</h4>

<p>VoiceXML 3.0 must support dynamic start-of-recording based on when a
caller starts to speak.</p>

<p>Voice Activity Detection is used to determine when to initiate
recording. This feature can be disabled.</p>

<h4><a id="S2_10_2_2" name="S2_10_2_2"></a>2.10.2.2 Initial silence interval
cancels recording</h4>

<p>VoiceXML 3.0 must support specification of an interval of silence at the
beginning of the recording cycle to terminate recording [timeout].</p>

<p>A noinput event will be thrown if no audio is collected.</p>

<h4><a id="S2_10_2_3" name="S2_10_2_3"></a>2.10.2.3 Final silence ends
recording</h4>

<p>VoiceXML 3.0 must support specification of an interval of silence that
indicates end of speech to terminate recording [finalsilence].</p>

<p>Voice Activity Detection is used to determine when to stop recording. This
feature can be disabled.</p>

<p>The finalsilence interval may be used to specify the amount of silent
audio to be removed from the recording.</p>

<h4><a id="S2_10_2_4" name="S2_10_2_4"></a>2.10.2.4 Maximum recording
time</h4>

<p>VoiceXML 3.0 must support specification of the maximum allowable recording
time [maxtime].</p>

<h4><a id="S2_10_2_5" name="S2_10_2_5"></a>2.10.2.5 Terminate recording via
DTMF input</h4>

<p>VoiceXML 3.0 must provide a mechanism to control DTMF termination of an
active recording [dtmfterm].</p>

<h4><a id="S2_10_2_6" name="S2_10_2_6"></a>2.10.2.6 Grammar control: Modal
operation</h4>

<h4><a id="S2_10_2_6_1" name="S2_10_2_6_1"></a>2.10.2.6.1 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local DTMF grammars are active
during recording [modal]</h4>

<h4><a id="S2_10_2_6_2" name="S2_10_2_6_2"></a>2.10.2.6.2 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local speech recognition grammars
are active during recording [modal]</h4>

<h4><a id="S2_10_2_7" name="S2_10_2_7"></a>2.10.2.7 Media format</h4>

<p>VoiceXML 3.0 must enable specification of the media type of the recording
[type].</p>

<h4><a id="S2_10_2_8" name="S2_10_2_8"></a>2.10.2.8 Recording Indicator</h4>

<h4><a id="S2_10_2_8_1" name="S2_10_2_8_1"></a>2.10.2.8.1 VoiceXML 3.0 MUST
optionally support playing a beep tone to the user before recording begins.
[beep]</h4>

<h4><a id="S2_10_2_8_2" name="S2_10_2_8_2"></a>2.10.2.8.2 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user before
recording begins.</h4>

<h4><a id="S2_10_2_8_3" name="S2_10_2_8_3"></a>2.10.2.8.3 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user during
recording.</h4>

<p>Use cases:</p>
<ol>
  <li>Display a countdown timer to indicate when recording will begin (could
    be accomplished by playing a file immediately before the record
  function)</li>
  <li>Display an indicator while recording is active (e.g. full screen,
    partial screen, icon, etc.)</li>
</ol>

<h4><a id="S2_10_2_9" name="S2_10_2_9"></a>2.10.2.9 Channel Assignment</h4>

<h4><a id="S2_10_2_9_1" name="S2_10_2_9_1"></a>2.10.2.9.1 VoiceXML 3.0 MUST
be able to record and store each media path independently.</h4>

<h4><a id="S2_10_2_9_2" name="S2_10_2_9_2"></a>2.10.2.9.2 VoiceXML 3.0 MUST
enable each media path to be recorded in the same multi-channel file.</h4>

<h4><a id="S2_10_2_9_3" name="S2_10_2_9_3"></a>2.10.2.9.3 VoiceXML 3.0 MUST
enable each media path to be recorded into separate files.</h4>

<h4><a id="S2_10_2_9_4" name="S2_10_2_9_4"></a>2.10.2.9.4 VoiceXML 3.0 MAY be
able to mix all voice paths into a single recording channel.</h4>

<h4><a id="S2_10_2_10" name="S2_10_2_10"></a>2.10.2.10 Channel Groups</h4>

<h4><a id="S2_10_2_10_1" name="S2_10_2_10_1"></a>2.10.2.10.1 One or more
channels within the same session MUST be controllable as a group.</h4>

<p>These groups can be used to simultaneously apply other recording controls
to more than one media channel (e.g. mute two channels simultaneously). This
applies whether the channels are in the same file or in separate files (which
implies the concept of a group of channels that are <em>not</em> part of the
same file).</p>

<p>A command to "start recording" must specify the details for that recording
session:</p>
<ul>
  <li>media type</li>
  <li>number of channels and channel assignment (e.g. channel x, group y
     represented as a variable of the format x.y)</li>
  <li>channel assignment</li>
  <li>(specific parameters to be determined)</li>
</ul>

<h4><a id="S2_10_2_11" name="S2_10_2_11"></a>2.10.2.11 Buffer Controls</h4>

<h4><a id="S2_10_2_11_1" name="S2_10_2_11_1"></a>2.10.2.11.1 VoiceXML 3.0
MUST provide a mechanism to enable additional recording time before the start
of speaking ("pre" buffer)</h4>

<h4><a id="S2_10_2_11_2" name="S2_10_2_11_2"></a>2.10.2.11.2 VoiceXML 3.0
MUST provide a mechanism to enable specification of additional recording time
after the end of speaking ("post" buffer).</h4>

<h4><a id="S2_10_2_11_3" name="S2_10_2_11_3"></a>2.10.2.11.3 VoiceXML 3.0 MAY
provide a mechanism to enable specification of the pre and post recording
duration.</h4>

<p>The duration provided by the platform is up to the amount of audio the
application requested. If that amount of audio is not available, the platform
is required to provide the amount of audio that is available.</p>
<!-- <span class="note">Note: Should this feature be under developer or platform -->
<!-- control?</span> -->

<h4><a id="S2_10_3_1" name="S2_10_3_1"></a>2.10.3.1 Audio Muting</h4>

<h4><a id="S2_10_3_1_1" name="S2_10_3_1_1"></a>2.10.3.1.1 VoiceXML 3.0 MUST
enable muting of an audio recording at any time for a specified length of
time or until otherwise indicated to un-mute.</h4>

<h4><a id="S2_10_3_1_2" name="S2_10_3_1_2"></a>2.10.3.1.2 Audio to insert
while muting can optionally be specified via a URI.</h4>
<!-- <span class="note">Note: Issues arise if inserted audio is shorter than mute -->
<!-- duration.</span>  -->

<h4><a id="S2_10_3_1_3" name="S2_10_3_1_3"></a>2.10.3.1.3 Optionally record
the mute duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
<!-- <span class="note">Note: Is it a breach of security to keep track of the -->
<!-- mute/blank/pause duration?</span>  -->

<h4><a id="S2_10_3_1_5" name="S2_10_3_1_5"></a>2.10.3.1.5 Mute MUST be
controllable for each channel independently.</h4>

<h4><a id="S2_10_3_1_6" name="S2_10_3_1_6"></a>2.10.3.1.6 Mute MUST be
controllable for all channels in a group.</h4>

<h4><a id="S2_10_3_2" name="S2_10_3_2"></a>2.10.3.2 Blanking</h4>

<h4><a id="S2_10_3_2_1" name="S2_10_3_2_1"></a>2.10.3.2.1 VoiceXML 3.0 MUST
enable blanking of a video recording at any time for a specified length of
time or until otherwise indicated to un-blank.</h4>

<h4><a id="S2_10_3_2_2" name="S2_10_3_2_2"></a>2.10.3.2.2 A video or still
image to replace video stream while blanking can be optionally specified via
a URI.</h4>

<h4><a id="S2_10_3_2_2_1" name="S2_10_3_2_2_1"></a>2.10.3.2.2.1 An error will
be thrown in the case of platforms that cannot handle the media type referred
to by the URI.</h4>

<h4><a id="S2_10_3_2_3" name="S2_10_3_2_3"></a>2.10.3.2.3 The media inserted
by default MUST be the same length as the blank duration.</h4>

<p>If the inserted media is video, it repeats until un-blanked.</p>

<h4><a id="S2_10_3_2_4" name="S2_10_3_2_4"></a>2.10.3.2.4 The video being
inserted MUST optionally be specified to span a length less than the actual
mute/un-mute duration.</h4>

<h4><a id="S2_10_3_2_5" name="S2_10_3_2_5"></a>2.10.3.2.5 Blanking MUST be
controllable separately from other media channels.</h4>

<h4><a id="S2_10_3_3" name="S2_10_3_3"></a>2.10.3.3 Grouped Blanking and
Muting</h4>

<h4><a id="S2_10_3_3_1" name="S2_10_3_3_1"></a>2.10.3.3.1 It MUST be possible
to simultaneously blank video and mute audio that are in the same media
group.</h4>

<h4><a id="S2_10_3_4" name="S2_10_3_4"></a>2.10.3.4 Pause and Resume</h4>

<h4><a id="S2_10_3_4_1" name="S2_10_3_4_1"></a>2.10.3.4.1 VoiceXML 3.0 MUST
enable a recording to be paused until explicitly restarted.</h4>

<h4><a id="S2_10_3_4_2" name="S2_10_3_4_2"></a>2.10.3.4.2 VoiceXML 3.0 MUST
enable an indicator to be optionally specified in the file to denote that
recording was paused, then resumed.</h4>

<h4><a id="S2_10_3_4_3" name="S2_10_3_4_3"></a>2.10.3.4.3 VoiceXML 3.0 MAY
optionally enable the notation of the pause duration either in the recorded
data or in associated meta data (e.g. a mark (out of band) or via a log
channel or some other method)</h4>

<p>The mechanism is platform-specific.</p>

<h4><a id="S2_10_3_5" name="S2_10_3_5"></a>2.10.3.5 Arbitrary Start, Stop,
Restart/append</h4>

<h4><a id="S2_10_3_5_1" name="S2_10_3_5_1"></a>2.10.3.5.1 VoiceXML 3.0 MUST
be able to start a recording at any time.</h4>

<h4><a id="S2_10_3_5_2" name="S2_10_3_5_2"></a>2.10.3.5.2 VoiceXML 3.0 MUST
be able to stop an active recording at any time.</h4>

<h4><a id="S2_10_3_5_3" name="S2_10_3_5_3"></a>2.10.3.5.3 VoiceXML 3.0 MUST
be able to restart / append to a previously active recording at any time.
(during the session via reference to the recording)</h4>

<h4><a id="S2_10_3_5_4" name="S2_10_3_5_4"></a>2.10.3.5.4 optionally record
the pause duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>

<p>A recording is available for playback or upload once it has been
'stopped'.</p>

<p>If a recording was stopped and uploaded, and later appended to, the
application will need to keep track of when to upload the new version.</p>

<h4><a id="S2_10_4_" name="S2_10_4_"></a>2.10.4. Media types</h4>

<h4><a id="S2_10_4_1" name="S2_10_4_1"></a>2.10.4.1 Audio recording</h4>

<h4><a id="S2_10_4_1_1" name="S2_10_4_1_1"></a>2.10.4.1.1 VoiceXML 3.0 MUST
be able to record an incoming audio stream.</h4>

<h4><a id="S2_10_4_2" name="S2_10_4_2"></a>2.10.4.2 Video recording</h4>

<h4><a id="S2_10_4_2_1" name="S2_10_4_2_1"></a>2.10.4.2.1 VoiceXML 3.0 MUST
support recording of an incoming video stream.</h4>

<h4><a id="S2_10_4_2_2" name="S2_10_4_2_2"></a>2.10.4.2.2 VoiceXML 3.0 MUST
support recording of an incoming video stream with synchronized audio.</h4>

<h4><a id="S2_10_4_3" name="S2_10_4_3"></a>2.10.4.3 Media Type
specification</h4>

<h4><a id="S2_10_4_3_1" name="S2_10_4_3_1"></a>2.10.4.3.1 VoiceXML 3.0 MUST
be able to set the format of the media type of the recording according to
IETF RFC 4288 [RFC4288].</h4>

<h4><a id="S2_10_4_4" name="S2_10_4_4"></a>2.10.4.4 Media formats and
codecs</h4>

<h4><a id="S2_10_4_4_1" name="S2_10_4_4_1"></a>2.10.4.4.1 VoiceXML 3.0 MUST
support specification of the media format and corresponding codec.</h4>

<h4><a id="S2_10_4_5" name="S2_10_4_5"></a>2.10.4.5 Platform support of media
types</h4>

<h4><a id="S2_10_4_5_1" name="S2_10_4_5_1"></a>2.10.4.5.1 VoiceXML 3.0
platforms MUST support all media types that are indicated as required by the
VoiceXML 3.0 Recommendation (types to be determined).</h4>

<p>Note: This does not mean all possible media types are supported on all
platforms.</p>

<h4><a id="S2_10_5_" name="S2_10_5_"></a>2.10.5. Media Processing</h4>

<h4><a id="S2_10_5_1" name="S2_10_5_1"></a>2.10.5.1 Media processing MAY
occur either in real-time or as a post-processing function.</h4>

<p>DEFAULT: specific to each processing type</p>

<h4><a id="S2_10_5_2" name="S2_10_5_2"></a>2.10.5.2 Tone Clamping</h4>

<p>Use cases:</p>
<ol>
  <li>Voicemail terminated with DTMF.</li>
  <li>Whole-session recording where DTMF input must be removed for privacy or
    other reasons.</li>
</ol>

<h4><a id="S2_10_5_2_1" name="S2_10_5_2_1"></a>2.10.5.2.1 VoiceXML 3.0 MAY
optionally provide a means to specify if DTMF tones are to be removed from
the recording.</h4>

<p>DEFAULT: Tones are not removed from the recording</p>

<p>DEFAULT: If tone clamping is enabled, it is performed after recording has
completed (not in real-time).</p>

<h4><a id="S2_10_5_3" name="S2_10_5_3"></a>2.10.5.3 Audio Processing Mode</h4>

<h4><a id="S2_10_5_3_1" name="S2_10_5_3_1"></a>2.10.5.3.1 VoiceXML 3.0 MUST
optionally provide a means to specify if automatic audio level controls (e.g.
Dynamic Range Compression, Limiting, Automatic Gain Control (AGC), etc.) are
to be applied to the recording or if  the recording is to be raw.</h4>

<p>DEFAULT: raw</p>
<p>Editor's note: how to specify:</p>
<ul>
  <li>raw or processed</li>
  <li>type of processing</li>
  <li>parameters specific to each processor or implementation</li>
  <li>multiple processing operations (?)</li>
  <li>real-time or post-processing</li>
</ul>

<h4><a id="S2_10_6_" name="S2_10_6_"></a>2.10.6. Recording data</h4>

<h4><a id="S2_10_6_1" name="S2_10_6_1"></a>2.10.6.1 The following information
MUST be reported after recording has completed.</h4>
<ul>
  <li>Recording duration in milliseconds</li>
  <li>Recording size in bytes</li>
  <li>DTMF terminating string if recording was terminated via DTMFTERM, or
    DTMF input available in application.lastresult</li>
  <li>Indication if recording was terminated due to reaching maxtime</li>
  <li>Format of the recording, as specified by RFC 4288</li>
</ul>
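<p>Most of this information is already exposed in VoiceXML 2.0 through the
shadow variable of &lt;record&gt;. The following non-normative sketch logs the
corresponding values (prompt text and timings are illustrative):</p>
<pre>
&lt;record name="msg" maxtime="30s" dtmfterm="true" type="audio/x-wav"&gt;
  &lt;prompt&gt;Record your message, then press pound.&lt;/prompt&gt;
  &lt;filled&gt;
    &lt;log&gt;duration in ms: &lt;value expr="msg$.duration"/&gt;&lt;/log&gt;
    &lt;log&gt;size in bytes: &lt;value expr="msg$.size"/&gt;&lt;/log&gt;
    &lt;log&gt;terminating DTMF: &lt;value expr="msg$.termchar"/&gt;&lt;/log&gt;
    &lt;log&gt;maxtime reached: &lt;value expr="msg$.maxtime"/&gt;&lt;/log&gt;
  &lt;/filled&gt;
&lt;/record&gt;
</pre>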

<h4><a id="S2_10_7" name="S2_10_7"></a>2.10.7 Upload, Storage, Caching</h4>

<h4><a id="S2_10_7_1" name="S2_10_7_1"></a>2.10.7.1 Destination</h4>

<h4><a id="S2_10_7_1_1" name="S2_10_7_1_1"></a>2.10.7.1.1 VoiceXML 3.0 MUST
support specification of the destination of the recording buffer [dest].</h4>
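<p>In VoiceXML 2.0 the destination is expressed indirectly, by submitting the
recording variable to a server. A non-normative sketch of that baseline is
shown below (the form id and the destination URI "upload.jsp" are
illustrative):</p>
<pre>
&lt;form id="voicemail"&gt;
  &lt;record name="msg" type="audio/x-wav"&gt;
    &lt;prompt&gt;Record your message.&lt;/prompt&gt;
    &lt;filled&gt;
      &lt;!-- the audio is posted as multipart form data to the destination --&gt;
      &lt;submit next="upload.jsp" method="post"
              enctype="multipart/form-data" namelist="msg"/&gt;
    &lt;/filled&gt;
  &lt;/record&gt;
&lt;/form&gt;
</pre>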

<h4><a id="S2_10_7_3" name="S2_10_7_3"></a>2.10.7.3 A local cache of the
recording MUST be optionally available to the application (e.g. V2 semantics
of form item)</h4>

<h4><a id="S2_10_7_4" name="S2_10_7_4"></a>2.10.7.4 It MUST be possible to
specify the upload to be either a synchronous or asynchronous operation.</h4>

<h4><a id="S2_10_7_5" name="S2_10_7_5"></a>2.10.7.5 It MUST be possible to
select the upload to be available realtime, at the end of the call, or
indefinitely after the end of the call.</h4>

<h4><a id="S2_10_7_6" name="S2_10_7_6"></a>2.10.7.6 All modes other than
indefinite upload shall expose any errors in recording or upload to the
application.</h4>

<h4><a id="S2_10_8_" name="S2_10_8_"></a>2.10.8. Errors and Events</h4>

<p>Errors and events resulting from media recording MUST be presented to the
application.</p>

<p>Examples of types of errors possibly reported:</p>
<ul>
  <li>error.unsupported.format (the requested media type is not
  supported)</li>
  <li>error.unavailable.format (the requested media type is currently not
    available)</li>
  <li>error during upload</li>
  <li>disk full, or other disk errors</li>
  <li>permission errors, e.g. error.noauthorization (or error.noresource if
    the failure should be hidden from a potential attacker)</li>
</ul>
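<p>Two of the names above (error.unsupported.format and error.noauthorization)
already exist as predefined events in VoiceXML 2.0 and can be handled with
&lt;catch&gt;; error.unavailable.format is only proposed here. A non-normative
sketch of such handlers (prompt wording is illustrative):</p>
<pre>
&lt;catch event="error.unsupported.format"&gt;
  &lt;prompt&gt;Sorry, recording is not available in that format.&lt;/prompt&gt;
  &lt;exit/&gt;
&lt;/catch&gt;
&lt;catch event="error.noauthorization"&gt;
  &lt;prompt&gt;Sorry, you are not authorized to record here.&lt;/prompt&gt;
  &lt;exit/&gt;
&lt;/catch&gt;
</pre>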

<h3><a id="funct-mediaformat" name="funct-mediaformat">2.11 Media
Formats</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These were recently added on the 6/24/2008 -->
<!-- call.</span></p> -->

<h4><a id="CS2_10_8_" name="CS2_10_8_"></a>VoiceXML 3 MUST support these
categories of media capabilities:</h4>
<ul>
  <li>Audio Basic: audio only, with header or not (e.g. RIFF or AU
  header)</li>
  <li>Audio Rich: audio (one or more channels), plus meta data (e.g. header,
    marks, transcription, etc.)</li>
  <li>Multi-media: one or more media channels (e.g. audio, video, images,
    etc.) plus meta data (e.g. header, marks, transcription, etc.)</li>
</ul>

<p>This does not imply platform support requirements. For example, a
particular platform may support Audio Basic but not Audio Rich. Another might
support Audio Rich but not all meta data elements.</p>

<h3><a id="funct-datamodel" name="funct-datamodel">2.12 Data Model (must
have)</a></h3>

<p>TBD.</p>

<h3><a id="funct-submitprocessing" name="funct-submitprocessing">2.11 Submit
Processing (must have)</a></h3>

<p>TBD.</p>

<h2><a id="format-reqs" name="format-reqs">3. Format Requirements</a></h2>

<h3><a id="format-flow" name="format-flow">3.1 Flow Language (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->

<p>A flow control language will be developed in conjunction with VoiceXML 3.0
(i.e. <a href="http://www.w3.org/TR/scxml/">SCXML</a>)</p>

<h4><a id="S3_1_1" name="S3_1_1"></a>3.1.1 The flow control language will
allow the separation of business logic from media control and user
interaction.</h4>

<h4><a id="S3_1_2" name="S3_1_2"></a>3.1.2 The flow control language will be
able to invoke VoiceXML 3.0 scripts, passing data into them and receiving
results back when the scripts terminate.</h4>
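<p>As a non-normative illustration of this requirement, the sketch below uses
the SCXML draft's &lt;invoke&gt; mechanism to run a VoiceXML document, pass a
parameter in, and read a result back; the invoke type value "vxml3", the
document name collect_pin.vxml, and the data field names are illustrative
assumptions, not defined syntax:</p>
<pre>
&lt;state id="CollectPIN"&gt;
  &lt;invoke type="vxml3" src="collect_pin.vxml"&gt;
    &lt;!-- data passed into the VoiceXML script --&gt;
    &lt;param name="maxTries" expr="3"/&gt;
    &lt;!-- data returned when the script terminates --&gt;
    &lt;finalize&gt;
      &lt;assign location="pin" expr="_event.data.pin"/&gt;
    &lt;/finalize&gt;
  &lt;/invoke&gt;
  &lt;transition event="done.invoke" target="VerifyPIN"/&gt;
&lt;/state&gt;
</pre>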

<h4><a id="S3_1_3" name="S3_1_3"></a>3.1.3 The flow control language will be
suitable for use as an Interaction Manager in the Multimodal Architecture
Framework.</h4>

<h4><a id="S3_1_4" name="S3_1_4"></a>3.1.4 The flow control language will be
based on state-machine concepts.</h4>

<h4><a id="S3_1_5" name="S3_1_5"></a>3.1.5 The flow control language will be
able to receive asynchronous messages from external entities.</h4>

<h4><a id="S3_1_6" name="S3_1_6"></a>3.1.6 The flow control language will be
able to send messages to external entities.</h4>

<h4><a id="S3_1_7" name="S3_1_7"></a>3.1.7 The flow control language will not
contain any media-specific concepts such as ASR or TTS.</h4>

<h3><a id="format-semmod" name="format-semmod">3.2 Semantic Model Definition
(must have)</a></h3>
<!-- <p><span class="owner">Owner: Mike Bodell</span></p> -->

<h4><a id="S3_2_1" name="S3_2_1"></a>3.2.1 The precise semantics of all VXML
3.0 tags MUST be provided</h4>

<h4><a id="S3_2_2" name="S3_2_2"></a>3.2.2 The semantic model MUST be the
authoritative description of VXML 3.0 functionality</h4>

<h4><a id="S3_2_3" name="S3_2_3"></a>3.2.3 Different conformance profiles
MUST be possible, but they MUST be defined in terms of the semantic
model.</h4>

<h4><a id="S3_2_4" name="S3_2_4"></a>3.2.4 The semantic model descriptions of
VXML 3.0 MUST be able to express all of the functionality of VXML 2.1</h4>

<h4><a id="S3_2_5" name="S3_2_5"></a>3.2.5 Extensions to VXML 3.0 SHOULD be
able to build on the semantic model descriptions</h4>

<h2><a id="other-reqs" name="other-reqs">4. Other Requirements</a></h2>

<h3><a id="other-vxml" name="other-vxml">4.1 Consistent with other Voice
Browser Working Group specs (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett</span></p> -->

<h4><a id="S4_1_1" name="S4_1_1"></a>4.1.1 Wherever similar functionality to
that of another Voice Browser Working Group specification is available, this
language MUST use a syntax similar to that used in the relevant
specification.</h4>

<h4><a id="S4_1_2" name="S4_1_2"></a>4.1.2 For data that is likely to be
represented in another Voice Browser Working Group markup language (e.g., SRGS
or EMMA) or used by another Voice Browser Working Group language, there MUST
be a clear definition of the mapping between the two data
representations.</h4>

<h4><a id="S4_1_3" name="S4_1_3"></a>4.1.3 It MUST be possible to pass
Internet-related document and server information (caching parameters,
xml:base, etc.) from this language to other VBWG language processors for
embedded VBWG languages.</h4>

<h3><a id="other-other" name="other-other">4.2 Consistent with other specs
(XML, MMI, I18N, Accessibility, MRCP, Backplane Activities) (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->

<h4><a id="S4_2_1" name="S4_2_1"></a>4.2.1 MRCP</h4>

<h4><a id="S4_2_1_1" name="S4_2_1_1"></a>4.2.1.1 This language MUST support a
profile that can be implemented using MRCPv2.</h4>

<h4><a id="S4_2_1_2" name="S4_2_1_2"></a>4.2.1.2 Where possible, this
language SHOULD remain compatible with MRCPv2 in terms of data formats (SRGS,
SSML).</h4>

<h4><a id="S4_2_2_" name="S4_2_2_"></a>4.2.2. <a
href="http://www.w3.org/TR/mmi-arch/">MMI</a></h4>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span></p> -->

<p>There must be at least one profile of VoiceXML 3.0 in which all of the
following requirements are supported.</p>

<h4><a id="S4_2_2_1" name="S4_2_2_1"></a>4.2.2.1 It MUST be possible for
VoiceXML 3.0 implementations to receive, process, and generate MMI life cycle
events. Some events may be handled automatically, while others may be under
author control.</h4>

<h4><a id="S4_2_2_2" name="S4_2_2_2"></a>4.2.2.2 VoiceXML 3.0 MUST provide a
way for the author to specify the exact functions required for the
application such that the platform can allocate the minimum necessary
resources.</h4>

<h4><a id="S4_2_2_3" name="S4_2_2_3"></a>4.2.2.3 VoiceXML 3.0 MUST be able to
provide EMMA-formatted information inside the data field of MMI life cycle
events.</h4>

<h4><a id="S4_2_2_4" name="S4_2_2_4"></a>4.2.2.4 VoiceXML 3.0 platforms MUST
specify one or more event I/O processors for interoperable exchange of life
cycle events. The Voice Browser Group requests public comment on what such
event processors should be or whether they should be part of the language at
all.</h4>

<h3><a id="other-simplify" name="other-simplify">4.3 Simplify Existing
VoiceXML Tasks (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->

<h4><a id="S4_3_1" name="S4_3_1"></a>4.3.1 This language MUST provide a
mechanism for authors to develop dialog managers (state-based, task-based,
rule-based, etc.) that are easily used and configured by other authors.</h4>

<h4><a id="S4_3_2" name="S4_3_2"></a>4.3.2 This language MUST provide
mechanisms to simplify authoring of these common tasks: (we need to collect a
list of common tasks)</h4>

<h3><a id="other-maintain" name="other-maintain">4.4 Maintain Functionality
from Previous VXML Versions</a></h3>

<h4><a id="S4_4_1" name="S4_4_1"></a>4.4.1 New features added in VoiceXML 3.0
MUST be backward compatible with previous VoiceXML versions</h4>

<h4><a id="S4_4_1_1" name="S4_4_1_1"></a>4.4.1.1 Functionality available in
VoiceXML 2.0 and VoiceXML 2.1 MUST be available in VoiceXML 3.0.</h4>

<h4><a id="S4_4_1_2" name="S4_4_1_2"></a>4.4.1.2 Applications written in
VoiceXML 2.0/2.1 MUST be portable to VoiceXML 3.0 without losing application
capabilities.</h4>

<h3><a id="other-crs" name="other-crs">4.5 Address Change Requests from
previous VoiceXML Versions (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Reviewed all deferred and open change requests from VXML -->
<!-- 2.0/2.1</span></p> -->

<h4><a id="S4_5_1" name="S4_5_1"></a>4.5.1 Deferred change requests from VXML
2.0 and 2.1 reevaluated for VXML 3.0</h4>

<p>In particular, the following deferred CRs will be reevaluated: R51, R92,
R104, R113, R145, R155, R156, R186, R230, R233, R348, R394, R528, R541, and
R565.</p>

<h4><a id="S4_5_2" name="S4_5_2"></a>4.5.2 Unassigned change requests from
VXML 2.0 and 2.1 reevaluated for VXML 3.0</h4>

<p>In particular, the following unassigned CRs will be reevaluated: R600,
R614, R619, R620, R622, R623, R624, R625, R626, R627, R628, R629, R631, and
R632.</p>

<h2><a id="acknowledgments" name="acknowledgments">5. Acknowledgments</a></h2>

<p>TBD</p>

<h2><a id="prev-reqs" name="prev-reqs">Appendix A. Previous
Requirements</a></h2>

<p>The following requirements have been satisfied by previous Voice Browser
Working Group Specifications</p>

<h3><a id="A_1_1" name="A_1_1"></a>A.1.1 Audio Modality Input and Output
(must have) FULLY COVERED</h3>

<p>The markup language can specify which spoken user input is interpreted by
the voice browser, as well as the content rendered as spoken output by the
voice browser.</p>

<h4><a id="CA_1_1" name="CA_1_1"></a>Requirement Coverage</h4>

<p>Audio output: &lt;prompt&gt;, &lt;audio&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>Audio input: &lt;grammar&gt; <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<h3><a id="A_1_2" name="A_1_2"></a>A.1.2 Sequential multi-modal Input (must
have) FULLY COVERED</h3>

<p>The markup language specifies that user input from multiple modalities is
to be interpreted by the voice browser. There is no requirement that the
input modalities are simultaneously active. For example, a voice browser
interpreting the markup language in a telephony environment could accept DTMF
input in one dialog state, and spoken input in another.</p>

<h4><a id="CA_1_2" name="CA_1_2"></a>Requirement Coverage</h4>

<p>&lt;grammar&gt; mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<h3><a id="A_1_3" name="A_1_3"></a>A.1.3 Unco-ordinated, Simultaneous,
Multi-modal Input (should have) FULLY COVERED</h3>

<p>The markup language specifies that user input from different modalities is
to be interpreted at the same time. There is no requirement that
interpretation of the input modalities is co-ordinated. For example, a voice
browser in a desktop environment could accept keyboard input or spoken input
in the same dialog state.</p>

<h4><a id="CA_1_3" name="CA_1_3"></a>Requirement Coverage</h4>

<p>&lt;grammar&gt; mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<p>&lt;field&gt; defining multiple &lt;grammar&gt;s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_1_4" name="A_1_4"></a>A.1.4 Co-ordinated, Simultaneous
Multi-modal Input (nice to have) FULLY COVERED</h3>

<p>The markup language specifies that user input from multiple modalities is
interpreted at the same time and that interpretation of the inputs are
co-ordinated by the voice browser. For example, in a telephony environment,
the user can type <em>200</em> on the keypad and say <em>transfer to checking
account</em> and the interpretations are co-ordinated so that they are
understood as <em>transfer 200 to checking account</em>.</p>

<h4><a id="CA_1_4" name="CA_1_4"></a>Requirement Coverage</h4>

<p>&lt;grammar&gt; mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<p>&lt;field&gt; defining multiple &lt;grammar&gt;s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_1_5" name="A_1_5"></a>A.1.5 Sequential multi-modal Output (must
have) FULLY COVERED</h3>

<p>The markup language specifies that content is rendered in multiple
modalities by the voice browser. There is no requirement that the output
modalities are rendered simultaneously. For example, a voice browser could
output speech in one dialog state, and graphics in another.</p>

<h4><a id="CA_1_5" name="CA_1_5"></a>Requirement Coverage</h4>

<p>&lt;prompt&gt;, &lt;audio&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_1_6" name="A_1_6"></a>A.1.6 Unco-ordinated, Simultaneous,
Multi-modal Output (nice to have) FULLY COVERED</h3>

<p>The markup language specifies that content is rendered in multiple
modalities at the same time. There is no requirement that the rendering of the
output modalities be co-ordinated. For example, a voice browser in a desktop
environment could display graphics and provide audio output at the same
time.</p>

<h4><a id="CA_1_6" name="CA_1_6"></a>Requirement Coverage</h4>

<p>&lt;prompt&gt;, &lt;audio&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_1_7" name="A_1_7"></a>A.1.7 Co-ordinated, Simultaneous
Multi-modal Output (nice to have) FULLY COVERED</h3>

<p>The markup language specifies that content is to be simultaneously
rendered in multiple modalities and that output rendering is co-ordinated.
For example, graphical output on a cellular telephone display is co-ordinated
with spoken output.</p>

<h4><a id="CA_1_7" name="CA_1_7"></a>Requirement Coverage</h4>

<p>&lt;prompt&gt;, &lt;audio&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_1" name="A_2_1"></a>A.2.1 Mixed Initiative: Form Level (must
have) FULLY COVERED</h3>

<p>Mixed initiative refers to dialog where one participant takes the
initiative by, for example, asking a question and expects the other
participant to respond to this initiative by, for example, answering the
question. The other participant, however, responds instead with an initiative
by asking another question. Typically, the first participant then responds to
this initiative, before the second participant responds to the original
initiative. This behavior is illustrated below:<br />
<br />
<em>S-A1: When do you want to fly to Paris?<br />
U-B1: What did you say?<br />
S-B2: I said when do you want to fly to Paris?<br />
U-A2: Tuesday.</em></p>

<p>where A1 is responded to in A2 after a nested interaction, or sub-dialog
in B1 and B2. Note that the B2 response itself could have been another
initiative leading to further nesting of the interaction.</p>

<p>The form-level mixed initiative requirement is that the markup language
can specify to the voice browser that it can take the initiative when the user
expects a response, and also allow the user to take the initiative when the
browser expects a response, where the content of these initiatives is relevant to the
task at hand, contains navigation instructions or concerns general
meta-communication issues. This mixed initiative requirement is particularly
important when processing form input (hence the name) and is further
elaborated in requirements A.2.1.1, A.2.1.2, A.2.1.3 and A.2.1.4 below.</p>

<h4><a id="CA_2_1" name="CA_2_1"></a>Requirement Coverage</h4>

<p>&lt;field&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;noinput&gt;, &lt;nomatch&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h4><a id="A_2_1_1" name="A_2_1_1"></a>A.2.1.1 Clarification Subdialog (must
have) FULLY COVERED</h4>

<p>The markup language can specify that a clarification sub-dialog should be
performed when the user provides incomplete, form-related information. For
example, in a flight enquiry service, the departure city and date may be
required but the user does not always provide all the information at once:<br
/>
<br />
<em>S1: How can I help you?<br />
U1: I want to fly to Paris.<br />
S2: When?<br />
U2: Monday</em></p>

<p>U1 is incomplete (or 'underinformative') with respect to the service (or
form) and the system then initiates a sub-dialog in S2 to collect the
required information. If additional parameters are required, further
sub-dialogs may be initiated.</p>

<h4><a id="CA_2_1_1" name="CA_2_1_1"></a>Requirement Coverage</h4>

<p>&lt;initial&gt;, &lt;field&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h4><a id="A_2_1_2" name="A_2_1_2"></a>A.2.1.2 Confirmation Subdialog (must
have) FULLY COVERED</h4>

<p>The markup language can specify that a confirmation sub-dialog is to be
performed when the confidence associated with the interpretation of the user
input is too low.<br />
<br />
<em>U1: I want to fly to Paris.<br />
S1: Did you say 'I want to fly to Paris'?<br />
U2: Yes.<br />
S2: When?<br />
U3: ...</em></p>

<p>Note confirmation sub-dialogs take precedence over clarification
sub-dialogs.</p>

<h4><a id="CA_2_1_2" name="CA_2_1_2"></a>Requirement Coverage</h4>

<p>&lt;field&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p><i>name$</i>.confidence shadow variable <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h4><a id="A_2_1_3" name="A_2_1_3"></a>A.2.1.3 Over-informative Input:
corrective (must have) FULLY COVERED</h4>

<p>The markup language can specify that unsolicited user input in a
sub-dialog which corrects earlier input is to be interpreted appropriately.
For example, in a confirmation sub-dialog users may provide corrective
information relevant to the form:<br />
<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: No, from Perros.</em> (modification) <em><br />
U1': Yes, from Paris</em> (repetition)</p>

<h4><a id="CA_2_1_3" name="CA_2_1_3"></a>Requirement Coverage</h4>

<p>&lt;field&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>$GARBAGE rule <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<h4><a id="A_2_1_4" name="A_2_1_4"></a>A.2.1.4 Over-informative Input:
additional (nice to have) FULLY COVERED</h4>

<p>The markup language can specify that unsolicited user input in a
sub-dialog which is not corrective but additional, relevant information for
the current form is to be interpreted appropriately. For example, in a
confirmation sub-dialog users may provide additional information relevant to
the form:<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: Yes, I want to fly to Paris on Monday around 11.30</em></p>

<h4><a id="CA_2_1_4" name="CA_2_1_4"></a>Requirement Coverage</h4>

<p>&lt;initial&gt;, &lt;field&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>form level &lt;grammar&gt;s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>

<h3><a id="A_2_2" name="A_2_2"></a>A.2.2 Mixed Initiative: Task Level (must
have) FULLY COVERED</h3>

<p>The markup language needs to address mixed initiative in dialogs which
involve more than one task (or topic). For example, a portal service may
allow the user to interact with a number of specific services such as car
hire, hotel reservation, flight enquiries, etc, which may be located on the
different web sites or servers. This requirement is further elaborated in
requirements A.2.2.1, A.2.2.2, A.2.2.3, A.2.2.4 and A.2.2.5 below.</p>

<h4><a id="A_2_2_1" name="A_2_2_1"></a>A.2.2.1 Explicit Task Switching (must
have) FULLY COVERED</h4>

<p>The markup language can specify how users can explicitly switch from one
task to another. For example, by means of a set of global commands which are
active in all tasks and which take the user to a specific task; e.g. <em>Take
me to car hire</em>, <em>Go to hotel reservations</em>.</p>

<h4><a id="CA_2_2_1" name="CA_2_2_1"></a>Requirement Coverage</h4>

<p>&lt;link&gt;, &lt;goto&gt;, &lt;submit&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>form level &lt;grammar&gt;s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
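<p>A non-normative sketch of an explicit task-switching command using
&lt;link&gt;; placed in the application root document, the grammar is active
in every dialog state (the target URI and command phrase are illustrative):</p>
<pre>
&lt;link next="http://www.example.com/carhire.vxml"&gt;
  &lt;grammar mode="voice" version="1.0" root="cmd" xml:lang="en-US"&gt;
    &lt;rule id="cmd"&gt;take me to car hire&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
</pre>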

<h4><a id="A_2_2_2" name="A_2_2_2"></a>A.2.2.2 Implicit Task Switching
(should have) FULLY COVERED</h4>

<p>The markup language can specify how users can implicitly switch from one
task to another. For example, by means of simply uttering a phrases relevant
to another task; <em>I want to reserve a McLaren F1 in Monaco next
Wednesday</em>.</p>

<h4><a id="CA_2_2_2" name="CA_2_2_2"></a>Requirement Coverage</h4>

<p>&lt;link&gt;, &lt;goto&gt;, &lt;submit&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>form level &lt;grammar&gt;s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>

<h4><a id="A_2_2_3" name="A_2_2_3"></a>A.2.2.3 Manual Return from Task Switch
(must have) FULLY COVERED</h4>

<p>The markup language can specify how users can explicitly return to a
previous task at any time. For example, by means of global task navigation
commands such as <em>previous task</em>.</p>

<h4><a id="CA_2_2_3" name="CA_2_2_3"></a>Requirement Coverage</h4>

<p>&lt;link&gt;, &lt;goto&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h4><a id="A_2_2_4" name="A_2_2_4"></a>A.2.2.4 Automatic Return from Task
Switch (should have) FULLY COVERED</h4>

<p>The markup language can specify that users can automatically return to the
previous task upon completion or explicit cancellation of the current
task.</p>

<h4><a id="CA_2_2_4" name="CA_2_2_4"></a>Requirement Coverage</h4>

<p>&lt;link&gt;, &lt;goto&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h4><a id="A_2_2_5" name="A_2_2_5"></a>A.2.2.5 Suspended Tasks (should have)
FULLY COVERED</h4>

<p>The markup language can specify that when task switching occurs the
previous task is suspended rather than canceled. Thus when the user returns
to the previous task, the interaction is resumed at the point it was
suspended.</p>

<h4><a id="CA_2_2_5" name="CA_2_2_5"></a>Requirement Coverage</h4>

<p>&lt;link&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_3" name="A_2_3"></a>A.2.3 Help Behavior (should have) FULLY
COVERED</h3>

<p>The markup language can specify help information when requested by the
user. Help information should be available in all dialog states.<br />
<em>S1: How can I help you?<br />
U1: What can you do?<br />
S2: I can give you flight information about flights between major cities
world-wide just like a travel agent. How can I help you?<br />
U1: I want a flight to Paris ...</em><br />
</p>

<p>Help information can be tapered so that it can be elaborated upon on
subsequent user requests.</p>

<h4><a id="CA_2_3" name="CA_2_3"></a>Requirement Coverage</h4>

<p>&lt;help&gt; using count attribute for tapering <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_4" name="A_2_4"></a>A.2.4 Error Correction Behavior (must
have) FULLY COVERED</h3>

<p>The markup language can specify how error events generated by the voice
browser are to be handled. For example, by initiating a sub-dialog to
describe and correct the error:<br />
<em>S1: How can I help you?<br />
U1: &lt;audio but no interpretation&gt;<br />
S2: Sorry, I didn't understand that. Where do you want to travel to?<br />
U2: Paris</em></p>

<p>The markup language can specify how specific types of errors encountered
in spoken dialog (e.g. no audio, too loud/soft, no interpretation, internal
error, etc.) are to be handled, as well as providing a general 'catch
all' method.</p>

<h4><a id="CA_2_4" name="CA_2_4"></a>Requirement Coverage</h4>

<p>&lt;error&gt;, &lt;nomatch&gt;, &lt;noinput&gt;, &lt;catch&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_5" name="A_2_5"></a>A.2.5 Timeout Behavior (must have) FULLY
COVERED</h3>

<p>The markup language can specify what to do when the voice browser times
out waiting for input; for example, a timeout event can be handled by
repeating the current dialog state:<br />
<em>S1: Did you say Monday?<br />
U1: &lt;timeout&gt;<br />
S2: Did you say Monday?</em><br />
</p>

<p>Note that the strategy may be dependent upon the environment; in a desktop
environment, repetition for example may be irritating.</p>

<h4><a id="CA_2_5" name="CA_2_5"></a>Requirement Coverage</h4>

<p>&lt;noinput&gt;, &lt;catch&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_6" name="A_2_6"></a>A.2.6 Meta-Commands (should have) FULLY
COVERED</h3>

<p>The markup language specifies a set of meta-command functions which are
available in all dialog states; for example, repeat, cancel, quit, operator,
etc.</p>

<p>The precise set of meta-commands will be co-ordinated with the Telephony
Speech Standards Committee.</p>

<p>The markup language should specify how the scope of meta-commands like
'cancel' is resolved.</p>

<h4><a id="CA_2_6" name="CA_2_6"></a>Requirement Coverage</h4>

<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_7" name="A_2_7"></a>A.2.7 Barge-in Behavior (should have)
FULLY COVERED</h3>

<p>The markup language specifies when the user is able to barge in on the
system output and when it is not allowed.</p>

<p>Note: The output device may generate timestamped events when barge-in
occurs (see A.3.9).</p>

<h4><a id="CA_2_7" name="CA_2_7"></a>Requirement Coverage</h4>

<p>bargein property <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_8" name="A_2_8"></a>A.2.8 Call Transfer (should have) FULLY
COVERED</h3>

<p>The markup language specifies a mechanism to allow transfer of the caller
to another line in a telephony environment. For example, in cases of dialog
breakdown, the user can be transferred to an operator (cf. 'callto' in HTML).
The markup language also provides a mechanism to deal with transfer failures
such as when the called line is busy or engaged.</p>

<h4><a id="CA_2_8" name="CA_2_8"></a>Requirement Coverage</h4>

<p>&lt;transfer&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;createcall&gt;, &lt;redirect&gt; <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>

<h3><a id="A_2_9" name="A_2_9"></a>A.2.9 Quit Behavior (must have) FULLY
COVERED</h3>

<p>The markup language provides a mechanism to terminate the session (cf.
user-terminated sessions via a 'quit' meta-command in A.2.6).</p>

<h4><a id="CA_2_9" name="CA_2_9"></a>Requirement Coverage</h4>

<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_2_10" name="A_2_10"></a>A.2.10 Interaction with External
Components (must have) FULLY COVERED</h3>

<p>The markup language must support a generic component interface to allow
for the use of external components on the client and/or server side. The
interface provides a mechanism for transferring data between the markup
language's variables and the component. Examples of such data are:
configuration parameters (such as timeouts), and events for data input and
error codes. Except for event handling, a call to an external component does
not directly change the dialog state, i.e. the dialog continues in the state
from which the external component was called.</p>

<p>Examples of external components are pre-built dialog components and server
scripts. Pre-built dialogs are further described in Section A.3.3. Server
scripts can be used to interact with remote services, devices or
databases.</p>

<h4><a id="CA_2_10" name="CA_2_10"></a>Requirement Coverage</h4>

<p>&lt;property&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;submit&gt; namelist attribute, &lt;submit&gt;, &lt;goto&gt; query
string <a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>

<h3><a id="A_3_1" name="A_3_1"></a>A.3.1 Ease of Use (must have) FULLY
COVERED</h3>

<p>The markup language should be easy for designers to understand and author
without special tools or knowledge of vendor technology or protocols (dialog
design knowledge is still essential).</p>

<h4><a id="CA_3_1" name="CA_3_1"></a>Requirement Coverage</h4>

<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_2" name="A_3_2"></a>A.3.2 Simplicity and Power (must have)
FULLY COVERED</h3>

<p>The markup language allows designers to rapidly develop simple dialogs
without the need to worry about interactional details but also allow
designers to take more control over interaction to develop complex
dialogs.</p>

<h4><a id="CA_3_2" name="CA_3_2"></a>Requirement Coverage</h4>

<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_3" name="A_3_3"></a>A.3.3 Support for Modularity and Re-use
(should have) FULLY COVERED</h3>

<p>The markup language complies with the requirements of the Reusable Dialog
Components Subgroup.</p>

<p>The markup language can specify a number of pre-built dialog components.
This enables one to build a library of reusable 'dialogs'. This is useful for
handling both application-specific input types, such as telephone numbers,
credit card numbers, etc., as well as those that are more generic, such as
times, dates, numbers, etc.</p>

<h4><a id="CA_3_3" name="CA_3_3"></a>Requirement Coverage</h4>

<p>&lt;subdialog&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_4" name="A_3_4"></a>A.3.4 Naming (must have) FULLY COVERED</h3>

<p>Dialogs, states, inputs and outputs can be referenced by a URI in the
markup language.</p>

<h4><a id="CA_3_4" name="CA_3_4"></a>Requirement Coverage</h4>

<p>&lt;form&gt; id attribute, form item name attribute <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_5" name="A_3_5"></a>A.3.5 Variables (must have) FULLY
COVERED</h3>

<p>Variables can be defined and assigned values.</p>

<p>Variables can be scoped within namespaces: for example, state-level,
dialog-level, document-level, application-level or session-level. The markup
language defines the precise scope of all variables.</p>

<p>The markup language must specify if variables are atomic or structured.</p>

<p>Variables can be assigned default values. Assignment may be optional; for
example, in a flight reservation form, a 'special meal' variable need not be
assigned a value by the user.</p>

<p>Variables may be referred to in the output content of the markup
language.</p>

<p>The precise requirements on variables may be affected by W3C work on
modularity and XML schema datatypes.</p>

<h4><a id="CA_3_5" name="CA_3_5"></a>Requirement Coverage</h4>

<p>&lt;var&gt;, &lt;assign&gt;, &lt;script&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_6" name="A_3_6"></a>A.3.6 Variable Binding (must have) FULLY
COVERED</h3>

<p>User input can bind one or more state variables. A single input may bind a
single variable or it may bind multiple variables in any order; for example,
the following utterances result in the same variable bindings<br />
</p>
<ul>
  <li>Transfer $200 from savings to checking</li>
  <li>Transfer $200 to checking from savings</li>
  <li>Transfer from savings $200 to checking</li>
</ul>

<h4><a id="CA_3_6" name="CA_3_6"></a>Requirement Coverage</h4>

<p>application.lastresult$.interpretation <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_7" name="A_3_7"></a>A.3.7 Event Handler (must have) FULLY
COVERED</h3>

<p>The markup language provides an explicit event handling mechanism for
specifying actions to be carried out when events are generated in a dialog
state.</p>

<p>Event handlers can be ordered so that if multiple event handlers match the
current event, only the handler with the highest ranking is executed. By
default, event handler ranking is based on proximity and specificity: i.e.
the handler closest in the event hierarchy with the most specific matching
conditions.</p>

<p>Actions can be conditional upon variable assignments, as well as the type
and content of events (e.g. input events specifying media, content,
confidence, and so on).</p>

<p>Actions include: the binding of variables with information, for example,
information contained in events; transition to another dialog state
(including the current state).</p>

<h4><a id="CA_3_7" name="CA_3_7"></a>Requirement Coverage</h4>

<p>&lt;catch&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;transition&gt; <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>

<h3><a id="A_3_8" name="A_3_8"></a>A.3.8 Builtin Event Handlers (should have)
FULLY COVERED</h3>

<p>The markup language can provide implicit event handlers which provide
default handling of, for example, timeout and error events as well as
handlers for situations, such as confirmation and clarification, where there
is a transition to an implicit dialog state. For example, there can be a
default handler for user input events such that if the recognition confidence
score is below a given threshold, then the input is confirmed in a
sub-dialog.</p>

<p>Properties of implicit event handlers (thresholds, counters, locale, etc)
can be explicitly customized in the markup language.</p>

<p>Implicit event handlers are always overridden by explicit handlers.</p>

<h4><a id="CA_3_8" name="CA_3_8"></a>Requirement Coverage</h4>

<p>Default event handlers (nomatch, noinput, error, etc...) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<h3><a id="A_3_9" name="A_3_9"></a>A.3.9 Output Content and Events (must
have) FULLY COVERED</h3>

<p>The markup language complies with the requirements developed by the Speech
Synthesis Markup Subgroup for output text content and parameter settings for
the output device. Requirements on multimodal output will be co-ordinated by
the Multimodal Interaction Subgroup (cf. Section 1).</p>

<p>In addition, the markup supports the following output features (if not
already defined in the Synthesis Markup):</p>
<ol>
  <li>Pre-recorded audio file output</li>
  <li>Streamed audio</li>
  <li>Playing/synthesizing sounds such as tones and beeps</li>
  <li>variable level of detail control over structured text</li>
</ol>

<p>The output device generates timestamped events including error events and
progress events (output started/stopped, current position).</p>

<h4><a id="CA_3_9" name="CA_3_9"></a>Requirement Coverage</h4>

<p>&lt;audio&gt;, &lt;prompt&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;speak&gt; and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>

<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>

<h3><a id="A_3_10" name="A_3_10"></a>A.3.10 Richer Output (nice to have)
FULLY COVERED</h3>

<p>The markup language allows for richer output than variable substitution in
the output content. For example, natural language generation of output
content.</p>

<h4><a id="CA_3_10" name="CA_3_10"></a>Requirement Coverage</h4>

<p>&lt;prompt&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;speak&gt; and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>

<h3><a id="A_3_11" name="A_3_11"></a>A.3.11 Input Content and Events (must
have) FULLY COVERED</h3>

<p>The markup language complies with the requirements developed by the
Grammar Representation Subgroup for the representation of speech grammar
content. Requirements on multimodal input will be co-ordinated by the
Multimodal Interaction Subgroup (cf. Section 1).</p>

<p>The markup language can specify the activation and deactivation of
multiple speech grammars. These can be user-defined, or builtin grammars
(digits, date, time, money, etc).</p>

<p>The markup language can specify parameters for speech grammar content,
including timeout parameters (maximum initial silence, maximum utterance
duration, maximum within-utterance pause), energy thresholds necessary for
bargein, etc.</p>

<p>The input device generates timestamped events including input timeout and
error events, progress events (utterance started, interference, etc), and
recognition result events (including content, interpretation/variable
bindings, confidence).</p>

<p>In addition to speech grammars, the markup language allows input content
and events to be specified for DTMF and keyboard devices.</p>

<h4><a id="CA_3_11" name="CA_3_11"></a>Requirement Coverage</h4>

<p>timeout, completetimeout, incompletetimeout, interdigittimeout,
termtimeout properties <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>application.lastresult$.interpretation, application.lastresult$.confidence
<a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>

<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>

<p>&lt;grammar&gt; and other elements <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>

<h3><a id="A_4_1" name="A_4_1"></a>A.4.1 Event Handling (must have) FULLY
COVERED</h3>

<p>One key difference among contemporary event models (e.g. DOM Level 2,
'try-catch' in object-oriented programming) is whether the same event can be
handled by more than one event handler within the hierarchy. The markup
language must state and motivate whether or not it supports this feature.</p>

<h3><a id="A_4_2" name="A_4_2"></a>A.4.2 Logging (nice to have) FULLY
COVERED</h3>

<p>For development and testing it is important that data and events can be
logged by the voice browser. At the most detailed level, this will include
logging of input and output audio data. A mechanism which allows logged data
to be retrieved from a voice browser, preferably via standard Internet
protocol (http, ftp, etc), is also required.</p>

<p>One approach is to require that the markup language can control logging
via, for example, an optional meta tag. Another approach is for logging to be
controlled by means other than the markup language, such as via proprietary
meta tags.</p>

<h4><a id="CA_4_2" name="CA_4_2"></a>Requirement Coverage</h4>

<p>&lt;log&gt; <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>

<p>&lt;log&gt; <a href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML
1.0</a></p>
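<p>A non-normative sketch of application-controlled logging with the VoiceXML
2.0 &lt;log&gt; element (the label and message are illustrative):</p>
<pre>
&lt;log label="recognition"&gt;
  heard: &lt;value expr="application.lastresult$.utterance"/&gt;
  (confidence &lt;value expr="application.lastresult$.confidence"/&gt;)
&lt;/log&gt;
</pre>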
</body>
</html>