<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="EN" lang="EN">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Voice Extensible Markup Language (VoiceXML) 3.0 Requirements</title>
<style type="text/css" xml:space="preserve">
.add { background-color: #FFFF99; }
.remove { background-color: #FF9999; text-decoration: line-through }
.issues { font-style: italic; font-weight: bold; color: green }
.tocline { list-style: none; }</style>
<link rel="stylesheet" type="text/css"
href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/"><img alt="W3C"
src="http://www.w3.org/Icons/w3c_home" height="48" width="72" /></a></p>
<h1 class="notoc" id="h1">Voice Extensible Markup Language (VoiceXML) 3.0
Requirements</h1>
<h2 class="notoc" id="date">W3C Working Draft <i>8 August 2008</i></h2>
<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/">http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/
</a></dd>
<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/vxml30reqs/">http://www.w3.org/TR/vxml30reqs/
</a></dd>
<dt>Previous version:</dt>
<dd>This is the first version. </dd>
<dt>Editors:</dt>
<dd>Jeff Hoepfinger, SandCherry</dd>
<dd>Emily Candell, Comverse</dd>
<dt>Authors:</dt>
<dd>Jim Barnett, Aspect</dd>
<dd>Mike Bodell, Microsoft</dd>
<dd>Dan Burnett, Voxeo</dd>
<dd>Jerry Carter, Nuance</dd>
<dd>Scott McGlashan, HP</dd>
<dd>Ken Rehor, Cisco</dd>
</dl>
<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© 2008 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>
<hr />
<h2 class="notoc"><a id="abstract" name="abstract">Abstract</a></h2>
<p>The W3C Voice Browser working group aims to develop specifications to
enable access to the Web using spoken interaction. This document is part of a
set of requirement studies for voice browsers, and provides details of the
requirements for marking up spoken dialogs.</p>
<h2><a id="status" name="status">Status of this document</a></h2>
<p><em>This section describes the status of this document at the time of its
publication. Other documents may supersede this document. A list of current
W3C publications and the latest revision of this technical report can be
found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>
at http://www.w3.org/TR/.</em></p>
<p>This is the 8 August 2008 W3C Working Draft of "Voice Extensible Markup
Language (VoiceXML) 3.0 Requirements".</p>
<p>This document describes the requirements for marking up dialogs for spoken
interaction required to fulfill the charter given in <a
href="http://www.w3.org/2006/12/voice-charter.html#scope">the Voice Browser
Working Group Charter</a>, and indicates how the W3C Voice Browser Working
Group has satisfied these requirements via the publication of working drafts
and recommendations. This is a First Public Working Draft. The group does not
expect this document to become a W3C Recommendation.</p>
<p>This document has been produced as part of the <a
href="http://www.w3.org/Voice/Activity.html" shape="rect">W3C Voice Browser
Activity</a>, following the procedures set out for the <a
href="http://www.w3.org/Consortium/Process/" shape="rect">W3C Process</a>.
The authors of this document are members of the <a
href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working Group</a>.
You are encouraged to subscribe to the public discussion list <<a
href="mailto:www-voice@w3.org" shape="rect">www-voice@w3.org</a>> and to
mail us your comments. To subscribe, send an email to <<a
href="mailto:www-voice-request@w3.org"
shape="rect">www-voice-request@w3.org</a>> with the word
<em>subscribe</em> in the subject line (include the word <em>unsubscribe</em>
if you want to unsubscribe). A <a
href="http://lists.w3.org/Archives/Public/www-voice/" shape="rect">public
archive</a> is available online.</p>
<p>This specification is a Working Draft of the Voice Browser working group
for review by W3C members and other interested parties. It is a draft
document and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference material or
to cite them as other than "work in progress".</p>
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does not expect this document to become a W3C Recommendation. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34665/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
<p>Publication as a Working Draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite this
document as other than work in progress.</p>
<h2><a id="toc" name="toc" shape="rect">Table of Contents</a></h2>
<ul class="toc">
<li class="tocline">0. <a href="#intro" shape="rect">Introduction</a></li>
<li class="tocline">1. <a href="#modality-reqs" shape="rect">Modality
Requirements</a></li>
<li class="tocline">1.1 <a href="#mod-csmo" shape="rect">Coordinated,
Simultaneous Multimodal Output</a></li>
<li class="tocline">1.2 <a href="#mod-usmo" shape="rect">Uncoordinated,
Simultaneous Multimodal Output</a></li>
<li class="tocline">2. <a href="#functional-reqs" shape="rect">Functional
Requirements</a></li>
<li class="tocline">2.1 <a href="#funct-vcr" shape="rect">VCR
Controls</a></li>
<li class="tocline">2.2 <a href="#funct-media" shape="rect">Media
Control</a></li>
<li class="tocline">2.3 <a href="#funct-siv" shape="rect">Speaker
Verification</a></li>
<li class="tocline">2.4 <a href="#funct-event" shape="rect">External Event
Handling while a dialog is in progress</a></li>
<li class="tocline">2.5 <a href="#funct-pls" shape="rect">Pronunciation
Lexicon Specification</a></li>
<li class="tocline">2.6 <a href="#funct-emma" shape="rect">EMMA</a></li>
<li class="tocline">2.7 <a href="#funct-upload" shape="rect">Synchronous
Upload of Recordings</a></li>
<li class="tocline">2.8 <a href="#funct-speed" shape="rect">Speed
Control</a></li>
<li class="tocline">2.9 <a href="#funct-volume" shape="rect">Volume
Control</a></li>
<li class="tocline">2.10 <a href="#funct-record" shape="rect">Media
Recording</a></li>
<li class="tocline">2.11 <a href="#funct-mediaformat" shape="rect">Media
Formats</a></li>
<li class="tocline">2.12 <a href="#funct-datamodel" shape="rect">Data
Model</a></li>
<li class="tocline">2.13 <a href="#funct-submitprocessing"
shape="rect">Submit Processing</a></li>
<li class="tocline">3. <a href="#format-reqs" shape="rect">Format
Requirements</a></li>
<li class="tocline">3.1 <a href="#format-flow" shape="rect">Flow
Language</a></li>
<li class="tocline">3.2 <a href="#format-semmod" shape="rect">Semantic
Model Definition</a></li>
<li class="tocline">4. <a href="#other-reqs" shape="rect">Other
Requirements</a></li>
<li class="tocline">4.1 <a href="#other-vxml" shape="rect">Consistent with
other Voice Browser Working Group Specs</a></li>
<li class="tocline">4.2 <a href="#other-other" shape="rect">Consistent with
other Specs</a></li>
<li class="tocline">4.3 <a href="#other-simplify" shape="rect">Simplify
existing VoiceXML Tasks</a></li>
<li class="tocline">4.4 <a href="#other-maintain" shape="rect">Maintain
Functionality from Previous VXML Versions</a></li>
<li class="tocline">4.5 <a href="#other-crs" shape="rect">Address Change
Requests from Previous VXML Versions</a></li>
<li class="tocline">5. <a href="#acknowledgments"
shape="rect">Acknowledgments</a></li>
<li class="tocline">Appendix A. <a href="#prev-reqs" shape="rect">Previous
Requirements</a></li>
</ul>
<h2><a id="intro" name="intro">0. Introduction</a></h2>
<p>The main goal of this activity is to establish the current status of the
Voice Browser Working Group activities relative to the requirements defined
in the <a href="http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223">Previous
Requirements Document</a> and to define additional requirements to drive
future Voice Browser Working Group activities based on Voice Community
experience with existing standards.</p>
<p>The process will consist of the following steps:</p>
<ol>
<li>Identify how the existing requirements have been satisfied by the
standards defined by the Voice Browser Working Group, other W3C Working
Groups or other standards bodies. Note that references to VoiceXML 2.0
imply that VoiceXML 2.1 also satisfies the requirement.</li>
<li>Identify the requirements that have not yet been satisfied and
determine if they are still valid requirements</li>
<li>Identify new requirements based on input from working group members and
submission to the W3C Voice Browser Public Mailing List <<a
href="mailto:www-voice@w3.org">www-voice@w3.org</a>> (<a
href="http://www.w3.org/Archives/Public/www-voice/">archive</a>)</li>
<li>Prioritize remaining requirements and identify road map by which the
Voice Browser Working Group plans to address these items</li>
</ol>
<h3><a id="S0_1" name="S0_1"></a>0.1 Scope</h3>
<p>The previous requirements definition activity focused on defining three
types of requirements on the voice markup language: modality, functional, and
format.</p>
<ul>
<li><b>Modality</b> requirements concern the types of modalities (media in
combination with an input/output mechanism) supported by the markup
language for user input and system output. (For the Voice Browser Working
Group, the modalities supported are speech, video and DTMF. Requirements
regarding other modalities will be handled by the <a
href="http://www.w3.org/2002/mmi/">Multimodal Interaction Working
Group.</a>)</li>
<li><b>Functional</b> requirements concern the behavior (or operational
semantics) which results from interpreting a voice markup language.</li>
<li><b>Format</b> requirements constrain the format (or syntax) of the
voice markup language itself.</li>
</ul>
<p>The environment and capabilities of the voice browser interpreting the
markup language affect these requirements. There may be differences in the
modality and functional requirements for desktop versus telephony-based
environments (and in the latter case, between fixed, mobile and Internet
telephony environments). The capabilities of the voice browser device also
affect the requirements. Requirements affected by the environment or
capabilities of the voice browser device will be explicitly marked as
such.</p>
<h3><a id="S0_2" name="S0_2"></a>0.2 Terminology</h3>
<p>Although defining a dialog is highly problematic, some basic definitions
must be provided to establish a common basis of understanding and avoid
confusion. The following terminology is based upon an event-driven model of
dialog interaction.<br />
<br />
</p>
<table summary="first column gives term, second gives description" border="1"
cellpadding="6" width="85%">
<tbody>
<tr>
<th>Voice Markup Language</th>
<td>a language in which voice dialog behavior is specified. The
language may include reference to style and scripting elements which
can also determine dialog behavior.</td>
</tr>
<tr>
<th>Voice Browser</th>
<td>a software device which interprets a voice markup language and
generates a dialog with voice output and/or input, and possibly other
modalities.</td>
</tr>
<tr>
<th>Dialog</th>
<td>a model of interactive behavior underlying the interpretation of
the markup language. The model consists of states, variables, events,
event handlers, inputs and outputs.</td>
</tr>
<tr>
<th>State</th>
<td>the basic interactional unit defined in the markup language; for
        example, an &lt;input&gt; element in HTML. A state can specify
        variables, event handlers, outputs and inputs. A state may describe
        output content to be presented to the user, input which the user can
        enter, and event handlers describing, for example, which variables to
        bind and which state to transition to when an event occurs.</td>
</tr>
<tr>
<th>Events</th>
<td>generated when a state is executed by the voice browser; for
example, when outputs or inputs in a state are rendered or
interpreted. Events are typed and may include information; for
example, an input event generated when an utterance is recognized may
include the string recognized, an interpretation, confidence score,
and so on.</td>
</tr>
<tr>
<th>Event Handlers</th>
<td>are specified in the voice markup language and describe how events
generated by the voice browser are to be handled. Interpretation of
events may bind variables, or map the current state into another
state (possibly itself).</td>
</tr>
<tr>
<th>Output</th>
<td>content specified in an element of the markup language for
presentation to the user. The content is rendered by the voice
browser; for example, audio files or text rendered by a TTS. Output
can also contain parameters for the output device; for example,
volume of audio file playback, language for TTS, etc. Events are
generated when, for example, the audio file has been played.</td>
</tr>
<tr>
<th>Input</th>
<td>content (and its interpretation) specified in an element of the
markup language which can be given as input by a user; for example, a
grammar for DTMF and speech input. Events are generated by the voice
browser when, for example, the user has spoken an utterance and
variables may be bound to information contained in the event. Input
can also specify parameters for the input device; for example,
timeout parameters, etc.</td>
</tr>
</tbody>
</table>
<p>The dialog requirements for the voice markup language are annotated with
the following priorities. If a feature is deferred from the initial
specification to a future release, consideration may be given to leaving open
a path for future incorporation of the feature.<br />
<br />
</p>
<table summary="first column gives priority name, second its description"
border="1" cellpadding="6" width="85%">
<tbody>
<tr>
<th>must have</th>
<td>The first official specification must define the feature.</td>
</tr>
<tr>
<th>should have</th>
<td>The first official specification should define the feature if
feasible but may defer it until a future release.</td>
</tr>
<tr>
<th>nice to have</th>
<td>The first official specification may define the feature if time
permits, however, its priority is low.</td>
</tr>
<tr>
<th>future revision</th>
<td>It is not intended that the first official specification include
the feature.</td>
</tr>
</tbody>
</table>
<h2><a id="modality-reqs" name="modality-reqs">1. Modality
Requirements</a></h2>
<!-- <p><span class="owner">Owner: Scott McGlashan</span><br /> -->
<!-- <span class="note">Note: These requirements will be coordinated with the -->
<!-- Multimodal Interaction Subgroup.</span></p> -->
<h3><a id="mod-csmo" name="mod-csmo">1.1 Coordinated, Simultaneous Multimodal
Output (nice to have)</a></h3>
<p>1.1.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is coordinated. For example, graphical output on a cellular
telephone display is coordinated with spoken output.</p>
<h3><a id="mod-usmo" name="mod-usmo">1.2 Uncoordinated, Simultaneous
Multimodal Output (nice to have)</a></h3>
<p>1.2.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is uncoordinated. For example, graphical output on a cellular
telephone display is uncoordinated with spoken output.</p>
<h2><a id="functional-reqs" name="functional-reqs">2. Functional
Requirements</a></h2>
<p>These requirements are intended to ensure that the markup language is
capable of specifying cooperative dialog behavior characteristic of
state-of-the-art spoken dialog systems. In general, the voice browser should
compensate for its own limitations in knowledge and performance compared with
equivalent human agents; for example, compensate for limitations in speech
recognition capability by confirming spoken user input when necessary.</p>
<h3><a id="funct-vcr" name="funct-vcr">2.1 VCR Controls (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete.</span></p> -->
<h4><a id="S2_1_1" name="S2_1_1"></a>2.1.1 VoiceXML 3.0 MUST provide a
mechanism giving an application developer a high level of control of audio
and video playback.</h4>
<h4><a id="S2_1_1_1" name="S2_1_1_1"></a>2.1.1.1 It MUST be possible to
invoke media controls by DTMF or speech input (other input mechanisms may be
supported).</h4>
<h4><a id="S2_1_1_2" name="S2_1_1_2"></a>2.1.1.2 Media controls MUST NOT
disable normal user input: i.e. media-control input and application input
MUST be possible simultaneously.</h4>
<h4><a id="S2_1_1_3" name="S2_1_1_3"></a>2.1.1.3 Input associated with media
controls MUST be treated in the same way as other inputs. Resolution of best
match follows standard VoiceXML 2.0 precedence and scoping rules.</h4>
<h4><a id="S2_1_1_4" name="S2_1_1_4"></a>2.1.1.4 It MUST be possible for user
input to be interpreted as seek controls -- fast forward and rewind -- during
media output playback.</h4>
<h4><a id="S2_1_1_5" name="S2_1_1_5"></a>2.1.1.5 The seek control MUST allow
fast forward and rewind to be specified in time - seconds, milliseconds -
relative to the current playback position.</h4>
<h4><a id="S2_1_1_6" name="S2_1_1_6"></a>2.1.1.6 The seek control MUST allow
fast forward and rewind to be specified relative to &lt;mark&gt; elements in
the output.</h4>
<h4><a id="S2_1_1_7" name="S2_1_1_7"></a>2.1.1.7 The seek control MUST NOT
affect the selection of alternative content: i.e. the same (alternative)
content MUST be used.</h4>
<h4><a id="S2_1_1_8" name="S2_1_1_8"></a>2.1.1.8 It MUST be possible for user
input to be interpreted as pause/resume during media output playback.</h4>
<h4><a id="S2_1_1_9" name="S2_1_1_9"></a>2.1.1.9 It MUST be possible for the
different inputs to control pause and resume.</h4>
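<p>As an informative illustration of requirements 2.1.1.1 through 2.1.1.4, the
sketch below uses existing VoiceXML 2.x constructs: document-scoped
&lt;link&gt; grammars remain active alongside normal field grammars, so
media-control input and application input can be given simultaneously, with
the best match resolved by the usual precedence and scoping rules. The
media-control event names are hypothetical; VoiceXML 3.0 does not yet define
them.</p>
<pre>
&lt;!-- Illustrative sketch only; the event names are hypothetical --&gt;
&lt;link dtmf="3" event="app.media.fastforward"&gt;
  &lt;grammar version="1.0" xml:lang="en-US" mode="voice" root="ff"
           xmlns="http://www.w3.org/2001/06/grammar"&gt;
    &lt;rule id="ff"&gt;fast forward&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
&lt;link dtmf="1" event="app.media.rewind"&gt;
  &lt;grammar version="1.0" xml:lang="en-US" mode="voice" root="rw"
           xmlns="http://www.w3.org/2001/06/grammar"&gt;
    &lt;rule id="rw"&gt;rewind&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
</pre>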
<h3><a id="funct-media" name="funct-media">2.2 Media Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These requirements were reversed engineered from the -->
<!-- VoiceXML 3.0 spec editor's draft.</span></p> -->
<h4><a id="S2_2_1_" name="S2_2_1_"></a>2.2.1. It MUST be possible to specify
a media clip begin value, specified in time, as an offset from the start of
the media clip to begin playback.</h4>
<h4><a id="S2_2_2_" name="S2_2_2_"></a>2.2.2. It MUST be possible to specify
a media clip end value, specified in time, as an offset from the start of the
media clip to end playback.</h4>
<h4><a id="S2_2_3_" name="S2_2_3_"></a>2.2.3. It MUST be possible to specify
a repeat duration, specified in time, as the amount of time the media file
will repeat playback.</h4>
<h4><a id="S2_2_4_" name="S2_2_4_"></a>2.2.4. It MUST be possible to specify
a repeat count, specified as a non-negative integer, as the number of times
the media file will repeat playback.</h4>
<h4><a id="S2_2_5_" name="S2_2_5_"></a>2.2.5. It MUST be possible to specify
a gain, specified as a percentage, as the percentage by which to adjust the
playback amplitude of the original waveform.</h4>
<h4><a id="S2_2_6_" name="S2_2_6_"></a>2.2.6. It MUST be possible to specify
a speed, specified as a percentage, as the percentage by which to adjust the
playback speed of the original waveform.</h4>
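<p>An informative sketch of how these parameters might appear on a media
element follows. The attribute names (clipBegin, clipEnd, repeatDur,
repeatCount, gain, speed) are hypothetical, loosely modelled on SMIL timing
attributes; the requirements above fix the capabilities, not the syntax.</p>
<pre>
&lt;!-- Hypothetical syntax: clipBegin/clipEnd (2.2.1, 2.2.2), repeatDur (2.2.3),
     repeatCount (2.2.4), gain (2.2.5), speed (2.2.6) --&gt;
&lt;audio src="promo.wav" clipBegin="2.5s" clipEnd="30s"
       repeatDur="55s" repeatCount="2" gain="80%" speed="120%"/&gt;
</pre>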
<h3><a id="funct-siv" name="funct-siv">2.3 Speaker Verification (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span><br /> -->
<!-- <span class="note">Note: Ken reviewed and thought these were -->
<!-- complete</span></p> -->
<h4><a id="S2_3_1" name="S2_3_1"></a>2.3.1 The markup language MUST provide
the ability to verify a speaker's identity through a dialog containing both
acoustic verification and knowledge verification.</h4>
<p>The acoustic verification may compare speech samples to an existing model
(kept in some, possibly external, repository) of that speaker's voice. A
verification result returns a value indicating whether the acoustic and
knowledge tests were accepted or rejected. Results for verification and
results for recognition may be returned simultaneously.</p>
<h4><a id="S2_3_1_1" name="S2_3_1_1"></a>2.3.1.1 VoiceXML 3.0 MUST support
Speaker Identification and Verification (SIV) for end-user dialogs.</h4>
<p>Note: The security administrator's interface is out-of-scope for
VoiceXML.</p>
<h4><a id="S2_3_1_2" name="S2_3_1_2"></a>2.3.1.2 SIV features MUST be
integrated with VoiceXML 3.0.</h4>
<p>SIV features such as enrollment and verification are voice dialogs. SIV
must be compatible and complementary with other VoiceXML 3.0 dialog
constructs such as speech recognition.</p>
<h4><a id="S2_3_1_3" name="S2_3_1_3"></a>2.3.1.3 VoiceXML 3.0 MUST be able to
be used without SIV.</h4>
<p>SIV features must be part of VoiceXML 3.0 but may not be needed in all
application scenarios or implementations. Not all voice dialogs need SIV.</p>
<h4><a id="S2_3_1_4" name="S2_3_1_4"></a>2.3.1.4 SIV MUST be able to be used
without other input modalities.</h4>
<p>Some SIV processing techniques operate without using any ASR.</p>
<h4><a id="S2_3_1_5" name="S2_3_1_5"></a>2.3.1.5 SIV features MUST be able to
operate in multi-factor environments.</h4>
<p>Some applications require the use of SIV along with other means of
authentication: biometric (e.g. fingerprint, hand, retina, DNA) or
non-biometric (e.g. caller ID, geolocation, personal knowledge, etc.).</p>
<h4><a id="S2_3_1_6" name="S2_3_1_6"></a>2.3.1.6 SIV-specific events MUST be
defined.</h4>
<p>SIV processing engines and network protocols (e.g. MRCP) generate events
related to their operation and use. These events must be made available in a
manner consistent with other VoiceXML events. Event naming structure must
allow for vendor-specific and application-specific events.</p>
<h4><a id="S2_3_1_7" name="S2_3_1_7"></a>2.3.1.7 SIV-specific properties MUST
be defined.</h4>
<p>These properties are provided to configure the operation of the SIV
processing engines (analogous to "Generic Speech Recognition Properties"
defined in <a href="http://www.w3.org/TR/voicexml20/#dml6.3.2">VoiceXML 2.0
Section 6.3.2</a>).</p>
<h4><a id="S2_3_1_8" name="S2_3_1_8"></a>2.3.1.8 The SIV result MUST be
available in the result structure used by the host environment (e.g. VoiceXML
3.0, MMI).</h4>
<p>Note that this does not require EMMA in all cases, such as non-VoiceXML
3.0 environments. This also does not specify the version of EMMA.</p>
<h4><a id="S2_3_1_8_1" name="S2_3_1_8_1"></a>2.3.1.8.1 VoiceXML 3.0 SIV
result MUST be representable in EMMA.</h4>
<p>VoiceXML 3.0 must specify the format of the result structure and version
of EMMA.</p>
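<p>A purely illustrative sketch of how an SIV decision and score might be
carried in an EMMA result follows. Only the emma:* container elements are
taken from the EMMA specification; the verification payload is hypothetical,
since VoiceXML 3.0 has not yet fixed the result structure.</p>
<pre>
&lt;emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"&gt;
  &lt;emma:interpretation id="siv1" emma:medium="acoustic" emma:mode="voice"
                       emma:confidence="0.87"&gt;
    &lt;!-- hypothetical application payload --&gt;
    &lt;verification-result&gt;
      &lt;decision&gt;accepted&lt;/decision&gt;
      &lt;score&gt;0.87&lt;/score&gt;
    &lt;/verification-result&gt;
  &lt;/emma:interpretation&gt;
&lt;/emma:emma&gt;
</pre>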
<h4><a id="S2_3_1_9" name="S2_3_1_9"></a>2.3.1.9 SIV syntax SHOULD adhere to
the W3C guidelines for security handling.</h4>
<p>This includes:</p>
<ul>
<li>XML encryption</li>
<li>XML signature processing,</li>
<li>possibly TLS or non-XML security, such as the NIST SP 800-63 guideline
for remote authentication.</li>
</ul>
<p>The following security aspects are out-of-charter for VoiceXML:<br />
</p>
<ul>
<li>The security administrator's interface</li>
<li>Whether security aspects may be modified by the security
administrators</li>
<li>Requirements for securing the SIV data</li>
</ul>
<h4><a id="S2_3_1_11" name="S2_3_1_11"></a>2.3.1.11 SIV features MUST support
enrollment.</h4>
<p>Enrollment is the process of collecting voice samples from a person and
the subsequent generation and storage of voice reference models associated
with that person.</p>
<h4><a id="S2_3_1_12" name="S2_3_1_12"></a>2.3.1.12 SIV features MUST support
verification.</h4>
<p>Verification is the process of comparing an utterance against a single
reference model based on a single claimed identity (e.g., user ID, account
number). A verification result includes both a score and a decision.</p>
<h4><a id="S2_3_1_13" name="S2_3_1_13"></a>2.3.1.13 SIV features MUST support
identification.</h4>
<p>Identification is verification with multiple identity claims. An
identification result includes both the verification results for all of the
individual identity claims, and the identifier of a single reference model
that matches the input utterance best.</p>
<h4><a id="S2_3_1_14" name="S2_3_1_14"></a>2.3.1.14 SIV features SHOULD
support supervised adaptation.</h4>
<p>The application should have control over whether a voice model is updated
or modified based on the results of a verification.<br />
</p>
<h4><a id="S2_3_1_15" name="S2_3_1_15"></a>2.3.1.15 SIV features MUST support
concurrent SIV processing.</h4>
<p>An application developer must be able to specify at the individual turn
level that one or more of the following types of processing need to be
performed concurrently:</p>
<ul>
<li>ASR</li>
<li>Audio recording</li>
<li>Buffering (SIV)</li>
<li>Authentication (SIV)</li>
<li>Enrollment (SIV)</li>
<li>Adaptation (SIV)</li>
</ul>
<p>Note: "Concurrent" means at the dialog specification level. A platform may
choose to implement these functions sequentially.</p>
<h4><a id="S2_3_1_15_1" name="S2_3_1_15_1"></a>2.3.1.15.1 SIV features SHOULD
support other concurrent audio processing.</h4>
<p>Concurrent processing of other forms of audio processing (e.g., channel
detection, gender detection) should also be permitted but remain optional.</p>
<h4><a id="S2_3_1_16" name="S2_3_1_16"></a>2.3.1.16 SIV features MUST be able
to accept text from the application for presentation to the user.</h4>
<p>Text-prompted SIV applications require prompts to match the expected
response. The application is responsible for the content of the dialog but
VoiceXML is responsible for the presentation.</p>
<h4><a id="S2_3_1_16_1" name="S2_3_1_16_1"></a>2.3.1.16.1 SIV SHOULD be
architecturally agnostic</h4>
<p>Many different SIV processing technologies exist. The VoiceXML 3.0 SIV
architecture should avoid dependencies upon specific engine technologies.</p>
<h3><a id="funct-event" name="funct-event">2.4 External Event Handling while
a dialog is in progress (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
<h4><a id="S2_4_1" name="S2_4_1"></a>2.4.1 It MUST be possible for external
entities to inject events into running dialogs. The dialog author MUST be
able to control when such events are processed and what actions are taken
when they are processed.</h4>
<h4><a id="S2_4_2" name="S2_4_2"></a>2.4.2 Among the possible results of
processing such events MUST be pausing, resuming, and terminating the dialog.
The VoiceXML 3.0 specification MAY define default handlers for certain such
external events.</h4>
<h4><a id="S2_4_3" name="S2_4_3"></a>2.4.3 It MUST be possible for running
dialogs to send events into the <a
href="http://www.w3.org/TR/mmi-arch/">Multimodal Interaction
Framework.</a></h4>
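<p>As an informative sketch, an externally injected event could be handled
with the existing VoiceXML catch mechanism; the event name below is
hypothetical, and per 2.4.1 the author decides when such events are processed
and what actions are taken.</p>
<pre>
&lt;!-- Hypothetical event name; catch, log and exit are existing VoiceXML 2.x
     elements --&gt;
&lt;catch event="external.supervisor.terminate"&gt;
  &lt;log&gt;External termination request received; ending the dialog.&lt;/log&gt;
  &lt;exit/&gt;
&lt;/catch&gt;
</pre>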
<h3><a id="funct-pls" name="funct-pls"></a>2.5 <a
href="http://www.w3.org/TR/pronunciation-lexicon/">Pronunciation Lexicon
Specification (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: There was some discussion in Orlando F2F on being -->
<!-- able to define lexicons using normal scoping rules, but there was no -->
<!-- agreement reached</span> </p> -->
<h4><a id="S2_5_1" name="S2_5_1"></a>2.5.1 The author MUST be able to define
lexicons that span an entire VoiceXML application.</h4>
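<p>For illustration, the pronunciations themselves would come from a
Pronunciation Lexicon Specification (PLS) 1.0 document such as the sketch
below; the mechanism for attaching such a lexicon at application scope is
exactly what this requirement leaves to the VoiceXML 3.0 specification.</p>
<pre>
&lt;!-- A small PLS 1.0 lexicon (IPA pronunciation) --&gt;
&lt;lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"&gt;
  &lt;lexeme&gt;
    &lt;grapheme&gt;tomato&lt;/grapheme&gt;
    &lt;phoneme&gt;təˈmeɪtoʊ&lt;/phoneme&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;
</pre>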
<h3><a id="funct-emma" name="funct-emma"></a>2.6 <a
href="http://www.w3.org/TR/emma/">EMMA Specification (must have)</a></h3>
<h4><a id="S2_6_1_" name="S2_6_1_"></a>2.6.1. The application author MUST be
able to specify the preferred format of the input result within VoiceXML. If
not specified, the default format is EMMA.</h4>
<h4><a id="S2_6_2" name="S2_6_2"></a>2.6.2 All available semantic information
(i.e. content that could have meaning) from the input MUST be accessible to
the application author. This result MUST be navigable by the application
author.</h4>
<p>The exact form of navigation will depend on the format and decisions
around the preferred data model made by the working group. If the result is a
string, string processing functions are expected to be available. If the
result is an XML document, DOM or E4X-like functions are expected to be
supported.</p>
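<p>For example, a recognition result delivered as an EMMA document might look
like the informative sketch below (the element content inside each
interpretation is application-specific); with such an XML result, the author
would navigate it using the DOM or E4X-like functions mentioned above.</p>
<pre>
&lt;emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"&gt;
  &lt;emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"&gt;
    &lt;emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver"&gt;
      &lt;origin&gt;Boston&lt;/origin&gt;
      &lt;destination&gt;Denver&lt;/destination&gt;
    &lt;/emma:interpretation&gt;
    &lt;emma:interpretation id="int2" emma:confidence="0.20"
                         emma:tokens="flights from austin to denver"&gt;
      &lt;origin&gt;Austin&lt;/origin&gt;
      &lt;destination&gt;Denver&lt;/destination&gt;
    &lt;/emma:interpretation&gt;
  &lt;/emma:one-of&gt;
&lt;/emma:emma&gt;
</pre>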
<h4><a id="S2_6_3_" name="S2_6_3_"></a>2.6.3. VoiceXML 3 (or profiles) MUST
describe how the default result format is mapped into the application's data
model.</h4>
<p>VoiceXML 3 will declare one or more mandatory result formats.</p>
<h4><a id="S2_6_4" name="S2_6_4"></a>2.6.4 The application author SHOULD be
able to specify specific result content not to be logged.</h4>
<p>This will allow the author to prevent logging of confidential or sensitive
information.</p>
<h3><a id="funct-upload" name="funct-upload">2.7 Synchronous Upload of
Recordings (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_7_1" name="S2_7_1"></a>2.7.1 VoiceXML 3.0 MUST enable
synchronous uploads of recordings while the recording is in progress</h4>
<h4><a id="S2_7_1_1" name="S2_7_1_1"></a>2.7.1.1 It MUST be possible to
specify the upload destination of the recording in the &lt;record&gt;
element</h4>
<h4><a id="S2_7_1_2" name="S2_7_1_2"></a>2.7.1.2 The upload destination MUST
be an HTTP URI</h4>
<h4><a id="S2_7_1_3" name="S2_7_1_3"></a>2.7.1.3 The application developer
MAY specify HTTP PUT or HTTP POST as the recording upload method</h4>
<h4><a id="S2_7_1_4" name="S2_7_1_4"></a>2.7.1.4 This feature MUST be
backward compatible with VoiceXML 2.0/2.1 record functionality</h4>
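<p>A possible shape for this feature is sketched below. The dest and method
attribute names are hypothetical (requirements 2.7.1.1 through 2.7.1.3 fix
the capability, not the syntax); the remaining attributes are existing
VoiceXML 2.x &lt;record&gt; syntax.</p>
<pre>
&lt;!-- "dest" and "method" are hypothetical attribute names --&gt;
&lt;record name="msg" beep="true" maxtime="60s" type="audio/x-wav"
        dest="http://example.com/uploads/message" method="PUT"/&gt;
</pre>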
<h3><a id="funct-speed" name="funct-speed">2.8 Speed Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_8_1" name="S2_8_1"></a>2.8.1 It MUST be possible for user input
to change the speed of media output playback.</h4>
<h4><a id="S2_8_2" name="S2_8_2"></a>2.8.2 It MUST be possible to map the
values for speed control to the rate attribute of prosody</h4>
<h4><a id="S2_8_3" name="S2_8_3"></a>2.8.3 Values for speed controls MAY be
specified as properties which follow the standard VoiceXML scoping model.
Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>
<h3><a id="funct-volume" name="funct-volume">2.9 Volume Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_9_1" name="S2_9_1"></a>2.9.1 It MUST be possible for user input
to change the volume of media output playback.</h4>
<h4><a id="S2_9_1_1" name="S2_9_1_1"></a>2.9.1.1 Values for volume controls
MAY be specified as properties which follow the standard VoiceXML scoping
model. Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>
<h4><a id="S2_9_1_2" name="S2_9_1_2"></a>2.9.1.2 It MUST be possible to map
the values for volume control to the volume attribute of prosody in SSML.</h4>
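<p>The mappings called for in 2.8.2 and 2.9.1.2 might, for example, take
user-selected values and apply them as SSML prosody attributes. In the
informative sketch below the property names are hypothetical, while the
prosody attributes are existing SSML 1.0 syntax.</p>
<pre>
&lt;!-- Hypothetical property names; the prosody mapping is what 2.8.2 and
     2.9.1.2 require --&gt;
&lt;property name="playbackspeed" value="fast"/&gt;
&lt;property name="playbackvolume" value="soft"/&gt;
&lt;prompt&gt;
  &lt;prosody rate="fast" volume="soft"&gt;Playing your messages.&lt;/prosody&gt;
&lt;/prompt&gt;
</pre>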
<h3><a id="funct-record" name="funct-record">2.10 Media Recording (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span></p> -->
<h4><a id="S2_10_1" name="S2_10_1"></a>2.10.1 Recording Modes</h4>
<p>Form item recording mode (Requirements sections 2.10.1.1 and 2.10.1.2)
captures media from the caller (only) during the collect phase of a dialog.
Partial- and Whole-Session recording captures media from the caller, system,
and/or called party (in the case of a transferred endpoint) in a
multichannel or single (mixed) channel recording. The duration of these
recordings depends on the type.</p>
<h4><a id="S2_10_1_1" name="S2_10_1_1"></a>2.10.1.1 Form Item equivalent
(e.g. VoiceXML 2.0 &lt;record&gt;)</h4>
<!-- <span class="note">Note: Audio endpointing controls are defined in Section
2.10.3.</span> -->
<h4><a id="S2_10_1_1_1" name="S2_10_1_1_1"></a>2.10.1.1.1 VoiceXML 3.0 MUST
be able to record input from a user.</h4>
<h4><a id="S2_10_1_2" name="S2_10_1_2"></a>2.10.1.2 Utterance Recording</h4>
<!-- <span class="note">Note: Should this be generalized to handle other media -->
<!-- like video?<br /> -->
<!-- Note: Should this be supported in the case of DTMF-only?</span> -->
<p>Utterance recording mode is recording that occurs during an ASR or SIV
form item. The audio may be endpointed, usually by the speech engine.</p>
<h4><a id="S2_10_1_2_1" name="S2_10_1_2_1"></a>2.10.1.2.1 VoiceXML 3.0 MUST
support recording of a user's utterance during a form item
[recordutterance]</h4>
<h4><a id="S2_10_1_2_2" name="S2_10_1_2_2"></a>2.10.1.2.2 VoiceXML 3.0 MUST
support the control of utterance recording via a &lt;property&gt;.</h4>
<h4><a id="S2_10_1_2_3" name="S2_10_1_2_3"></a>2.10.1.2.3 VoiceXML 3.0 MUST
support the control of utterance recording via an attribute on input
items.</h4>
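<p>VoiceXML 2.1 already provides this control through the recordutterance and
recordutterancetype properties, so an informative sketch of the likely shape
of the 3.0 facility is shown below (the grammar reference is illustrative
only).</p>
<pre>
&lt;!-- Existing VoiceXML 2.1 properties enabling utterance recording during a
     form item --&gt;
&lt;property name="recordutterance" value="true"/&gt;
&lt;property name="recordutterancetype" value="audio/x-wav"/&gt;
&lt;field name="account"&gt;
  &lt;prompt&gt;Please say your account number.&lt;/prompt&gt;
  &lt;grammar src="account.grxml" type="application/srgs+xml"/&gt;
&lt;/field&gt;
</pre>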
<h4><a id="S2_10_1_3" name="S2_10_1_3"></a>2.10.1.3 Session Recording</h4>
<p>Session recording begins with a start command. It continues until:</p>
<ul>
<li>a pause command; a resume command continues recording;</li>
<li>a stop command;</li>
<li>the end of the VoiceXML session;</li>
<li>an error occurs.</li>
</ul>
<p>Recording configuration and parameter requirements are defined in Section
2.10.2.</p>
<h4><a id="S2_10_1_3_1" name="S2_10_1_3_1"></a>2.10.1.3.1 VoiceXML 3.0 MUST
be able to record part of a VoiceXML session.</h4>
<h4><a id="S2_10_1_3_2" name="S2_10_1_3_2"></a>2.10.1.3.2 VoiceXML 3.0 MUST
be able to record an entire dialog.</h4>
<h4><a id="S2_10_1_4" name="S2_10_1_4"></a>2.10.1.4 Restricted Session
Recording</h4>
<p>Restricted session recording begins with a start command and continues
until:</p>
<ul>
<li>the end of the session;</li>
<li>an error occurs.</li>
</ul>
<p>See Table 1 for applicable controls.</p>
<h4><a id="S2_10_1_5" name="S2_10_1_5"></a>2.10.1.5 Multiple instances</h4>
<h4><a id="S2_10_1_5_1" name="S2_10_1_5_1"></a>2.10.1.5.1 VoiceXML 3.0 MUST
be able to support multiple simultaneous recordings of different types during
a call.</h4>
<h4><a id="S2_10_2_" name="S2_10_2_"></a>2.10.2. Recording Configuration and
Parameters</h4>
<p>This matrix specifies which features apply to which recording types.</p>
<table style="text-align: left; width: 722px; height: 166px;" border="1"
cellpadding="1" cellspacing="0">
<tbody>
<tr>
<td>Feature Requirement /<br />
Recording type</td>
<td>Dialog</td>
<td>Utterance</td>
<td>Session</td>
<td>Restricted<br />
Session</td>
</tr>
<tr>
<td>2.10.2.1 Recording starts when caller begins speaking</td>
<td>Y</td>
<td>Y</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.2 Initial silence interval cancels recording</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.3 Final silence ends recording</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.4 Maximum recording time</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.5 Terminate recording with DTMF input</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.6 Grammar control: modal operation</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.7 Media format</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.8 Recording indicator</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.9 Channel assignment</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.10 Channel groups</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.11 Buffer control</td>
<td>Y</td>
<td>Y</td>
<td>N</td>
<td>N</td>
</tr>
</tbody>
</table>
<p>Table 1: Recording Configuration and Parameter Application</p>
<p>(Attributes from VoiceXML 2.0 are indicated in brackets [].)</p>
<h4><a id="S2_10_2_1" name="S2_10_2_1"></a>2.10.2.1 Recording starts when
caller begins speaking</h4>
<p>VoiceXML 3.0 must support dynamic start-of-recording based on when a
caller starts to speak.</p>
<p>Voice Activity Detection is used to determine when to initiate
recording. This feature can be disabled.</p>
<h4><a id="S2_10_2_2" name="S2_10_2_2"></a>2.10.2.2 Initial silence interval
cancels recording</h4>
<p>VoiceXML 3.0 must support specification of an interval of silence at the
beginning of the recording cycle to terminate recording [timeout].</p>
<p>A noinput event will be thrown if no audio is collected.</p>
<h4><a id="S2_10_2_3" name="S2_10_2_3"></a>2.10.2.3 Final silence ends
recording</h4>
<p>VoiceXML 3.0 must support specification of an interval of silence that
indicates the end of speech and terminates recording [finalsilence].</p>
<p>Voice Activity Detection is used to determine when to stop recording. This
feature can be disabled.</p>
<p>The finalsilence interval may be used to specify the amount of silent audio
to be removed from the recording.</p>
<h4><a id="S2_10_2_4" name="S2_10_2_4"></a>2.10.2.4 Maximum recording
time</h4>
<p>VoiceXML 3.0 must support specification of the maximum allowable recording
time [maxtime].</p>
<h4><a id="S2_10_2_5" name="S2_10_2_5"></a>2.10.2.5 Terminate recording via
DTMF input</h4>
<p>VoiceXML 3.0 must provide a mechanism to control DTMF termination of an
active recording [dtmfterm].</p>
<h4><a id="S2_10_2_6" name="S2_10_2_6"></a>2.10.2.6 Grammar control: Modal
operation</h4>
<h4><a id="S2_10_2_6_1" name="S2_10_2_6_1"></a>2.10.2.6.1 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local DTMF grammars are active
during recording [modal]</h4>
<h4><a id="S2_10_2_6_2" name="S2_10_2_6_2"></a>2.10.2.6.2 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local speech recognition grammars
are active during recording [modal]</h4>
<h4><a id="S2_10_2_7" name="S2_10_2_7"></a>2.10.2.7 Media format</h4>
<p>VoiceXML 3.0 must enable specification of the media type of the recording
[type].</p>
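<p>The bracketed names in 2.10.2.1 through 2.10.2.7 correspond to existing
VoiceXML 2.0 &lt;record&gt; attributes and properties; an informative
form-item recording using them might look like the sketch below, with the
initial-silence interval set through the standard timeout property.</p>
<pre>
&lt;!-- Informative sketch using existing VoiceXML 2.0 syntax --&gt;
&lt;property name="timeout" value="5s"/&gt;
&lt;record name="greeting" beep="true" modal="true" finalsilence="2s"
        maxtime="30s" dtmfterm="true" type="audio/x-wav"&gt;
  &lt;prompt&gt;Record your greeting after the tone.&lt;/prompt&gt;
&lt;/record&gt;
</pre>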
<h4><a id="S2_10_2_8" name="S2_10_2_8"></a>2.10.2.8 Recording Indicator</h4>
<h4><a id="S2_10_2_8_1" name="S2_10_2_8_1"></a>2.10.2.8.1 VoiceXML 3.0 MUST
optionally support playing a beep tone to the user before recording begins.
[beep]</h4>
<h4><a id="S2_10_2_8_2" name="S2_10_2_8_2"></a>2.10.2.8.2 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user before
recording begins.</h4>
<h4><a id="S2_10_2_8_3" name="S2_10_2_8_3"></a>2.10.2.8.3 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user during
recording.</h4>
<p>Use cases:</p>
<ol>
<li>Display a countdown timer to indicate when recording will begin (could
be accomplished by playing a file immediately before the record
function)</li>
<li>Display an indicator while recording is active (e.g. full screen,
partial screen, icon, etc.)</li>
</ol>
<h4><a id="S2_10_2_9" name="S2_10_2_9"></a>2.10.2.9 Channel Assignment</h4>
<h4><a id="S2_10_2_9_1" name="S2_10_2_9_1"></a>2.10.2.9.1 VoiceXML 3.0 MUST
be able to record and store each media path independently.</h4>
<h4><a id="S2_10_2_9_2" name="S2_10_2_9_2"></a>2.10.2.9.2 VoiceXML 3.0 MUST
enable each media path to be recorded in the same multi-channel file.</h4>
<h4><a id="S2_10_2_9_3" name="S2_10_2_9_3"></a>2.10.2.9.3 VoiceXML 3.0 MUST
enable each media path to be recorded into separate files.</h4>
<h4><a id="S2_10_2_9_4" name="S2_10_2_9_4"></a>2.10.2.9.4 VoiceXML 3.0 MAY be
able to mix all voice paths into a single recording channel.</h4>
<h4><a id="S2_10_2_10" name="S2_10_2_10"></a>2.10.2.10 Channel Groups</h4>
<h4><a id="S2_10_2_10_1" name="S2_10_2_10_1"></a>2.10.2.10.1 One or more
channels within the same session MUST be controllable as a group.</h4>
<p>These groups can be used to apply other recording controls to more than
one media channel at once (e.g. mute two channels simultaneously). This
applies whether the channels are in the same file or in separate files (which
implies that a group of channels need not be part of the same file).</p>
<p>A command to "start recording" must specify the details for that recording
session:</p>
<ul>
<li>media type</li>
<li>number of channels and channel assignment (e.g. channel x, group y
represented as a variable of the format x.y)</li>
<li>channel assignment</li>
<li>(specific parameters to be determined)</li>
</ul>
<h4><a id="S2_10_2_11" name="S2_10_2_11"></a>2.10.2.11 Buffer Controls</h4>
<h4><a id="S2_10_2_11_1" name="S2_10_2_11_1"></a>2.10.2.11.1 VoiceXML 3.0
MUST provide a mechanism to enable additional recording time before the start
of speaking ("pre" buffer)</h4>
<h4><a id="S2_10_2_11_2" name="S2_10_2_11_2"></a>2.10.2.11.2 VoiceXML 3.0
MUST provide a mechanism to enable specification of additional recording time
after the end of speaking ("post" buffer).</h4>
<h4><a id="S2_10_2_11_3" name="S2_10_2_11_3"></a>2.10.2.11.3 VoiceXML 3.0 MAY
provide a mechanism to enable specification of the pre and post recording
duration.</h4>
<p>The duration provided by the platform is up to the amount of audio the
application requested. If that amount of audio is not available, the platform
is required to provide the amount of audio that is available.</p>
<!-- <span class="note">Note: Should this feature be under developer or platform -->
<!-- control?</span> -->
<h4><a id="S2_10_3_1" name="S2_10_3_1"></a>2.10.3.1 Audio Muting</h4>
<h4><a id="S2_10_3_1_1" name="S2_10_3_1_1"></a>2.10.3.1.1 VoiceXML 3.0 MUST
enable muting of an audio recording at any time for a specified length of
time or until otherwise indicated to un-mute.</h4>
<h4><a id="S2_10_3_1_2" name="S2_10_3_1_2"></a>2.10.3.1.2 Audio to insert
while muting can optionally be specified via a URI.</h4>
<!-- <span class="note">Note: Issues arise if inserted audio is shorter than mute -->
<!-- duration.</span> -->
<h4><a id="S2_10_3_1_3" name="S2_10_3_1_3"></a>2.10.3.1.3 Optionally record
the mute duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
<!-- <span class="note">Note: Is it a breach of security to keep track of the -->
<!-- mute/blank/pause duration?</span> -->
<h4><a id="S2_10_3_1_5" name="S2_10_3_1_5"></a>2.10.3.1.5 Mute MUST be
controllable for each channel independently.</h4>
<h4><a id="S2_10_3_1_6" name="S2_10_3_1_6"></a>2.10.3.1.6 Mute MUST be
controllable for all channels in a group.</h4>
<h4><a id="S2_10_3_2" name="S2_10_3_2"></a>2.10.3.2 Blanking</h4>
<h4><a id="S2_10_3_2_1" name="S2_10_3_2_1"></a>2.10.3.2.1 VoiceXML 3.0 MUST
enable blanking of a video recording at any time for a specified length of
time or until otherwise indicated to un-blank.</h4>
<h4><a id="S2_10_3_2_2" name="S2_10_3_2_2"></a>2.10.3.2.2 A video or still
image to replace video stream while blanking can be optionally specified via
a URI.</h4>
<h4><a id="S2_10_3_2_2_1" name="S2_10_3_2_2_1"></a>2.10.3.2.2.1 An error will
be thrown in the case of platforms that cannot handle the media type referred
to by the URI.</h4>
<h4><a id="S2_10_3_2_3" name="S2_10_3_2_3"></a>2.10.3.2.3 The media inserted
by default MUST be the same length as the blank duration.</h4>
<p>If video, repeat until un-blank.</p>
<h4><a id="S2_10_3_2_4" name="S2_10_3_2_4"></a>2.10.3.2.4 It MUST be possible
to specify that the inserted video spans a length less than the actual
blank/un-blank duration.</h4>
<h4><a id="S2_10_3_2_5" name="S2_10_3_2_5"></a>2.10.3.2.5 Blanking MUST be
controllable separately from other media channels.</h4>
<h4><a id="S2_10_3_3" name="S2_10_3_3"></a>2.10.3.3 Grouped Blanking and
Muting</h4>
<h4><a id="S2_10_3_3_1" name="S2_10_3_3_1"></a>2.10.3.3.1 It MUST be possible
to simultaneously blank video and mute audio that are in the same media
group.</h4>
<h4><a id="S2_10_3_4" name="S2_10_3_4"></a>2.10.3.4 Pause and Resume</h4>
<h4><a id="S2_10_3_4_1" name="S2_10_3_4_1"></a>2.10.3.4.1 VoiceXML 3.0 MUST
enable a recording to be paused until explicitly restarted.</h4>
<h4><a id="S2_10_3_4_2" name="S2_10_3_4_2"></a>2.10.3.4.2 VoiceXML 3.0 MUST
enable an indicator to be optionally specified in the file to denote that
recording was paused, then resumed.</h4>
<h4><a id="S2_10_3_4_3" name="S2_10_3_4_3"></a>2.10.3.4.3 VoiceXML 3.0 MAY
optionally enable the notation of the pause duration either in the recorded
data or in associated meta data (e.g. a mark (out of band) or via a log
channel or some other method)</h4>
<p>The mechanism is platform-specific.</p>
<h4><a id="S2_10_3_5" name="S2_10_3_5"></a>2.10.3.5 Arbitrary Start, Stop,
Restart/append</h4>
<h4><a id="S2_10_3_5_1" name="S2_10_3_5_1"></a>2.10.3.5.1 VoiceXML 3.0 MUST
be able to start a recording at any time.</h4>
<h4><a id="S2_10_3_5_2" name="S2_10_3_5_2"></a>2.10.3.5.2 VoiceXML 3.0 MUST
be able to stop an active recording at any time.</h4>
<h4><a id="S2_10_3_5_3" name="S2_10_3_5_3"></a>2.10.3.5.3 VoiceXML 3.0 MUST
be able to restart / append to a previously active recording at any time.
(during the session via reference to the recording)</h4>
<h4><a id="S2_10_3_5_4" name="S2_10_3_5_4"></a>2.10.3.5.4 Optionally record
the pause duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
<p>Recording is available for playback or upload once a recording is
'stopped'.</p>
<p>If a recording was stopped and uploaded, then later appended, the
application will need to keep track of when to upload the new version.</p>
<h4><a id="S2_10_4_" name="S2_10_4_"></a>2.10.4. Media types</h4>
<h4><a id="S2_10_4_1" name="S2_10_4_1"></a>2.10.4.1 Audio recording</h4>
<h4><a id="S2_10_4_1_1" name="S2_10_4_1_1"></a>2.10.4.1.1 VoiceXML 3.0 MUST
be able to record an incoming audio stream.</h4>
<h4><a id="S2_10_4_2" name="S2_10_4_2"></a>2.10.4.2 Video recording</h4>
<h4><a id="S2_10_4_2_1" name="S2_10_4_2_1"></a>2.10.4.2.1 VoiceXML 3.0 MUST
support recording of an incoming video stream.</h4>
<h4><a id="S2_10_4_2_2" name="S2_10_4_2_2"></a>2.10.4.2.2 VoiceXML 3.0 MUST
support recording of an incoming video stream with synchronized audio.</h4>
<h4><a id="S2_10_4_3" name="S2_10_4_3"></a>2.10.4.3 Media Type
specification</h4>
<h4><a id="S2_10_4_3_1" name="S2_10_4_3_1"></a>2.10.4.3.1 VoiceXML 3.0 MUST
be able to set the format of the media type of the recording according to
IETF RFC 4288 [RFC4288].</h4>
<h4><a id="S2_10_4_4" name="S2_10_4_4"></a>2.10.4.4 Media formats and
codecs</h4>
<h4><a id="S2_10_4_4_1" name="S2_10_4_4_1"></a>2.10.4.4.1 VoiceXML 3.0 MUST
support specification of the media format and corresponding codec.</h4>
<h4><a id="S2_10_4_5" name="S2_10_4_5"></a>2.10.4.5 Platform support of media
types</h4>
<h4><a id="S2_10_4_5_1" name="S2_10_4_5_1"></a>2.10.4.5.1 VoiceXML 3.0
platforms MUST support all media types that are indicated as required by the
VoiceXML 3.0 Recommendation (types to be determined).</h4>
<p>Note: This does not mean all possible media types are supported on all
platforms.</p>
<h4><a id="S2_10_5_" name="S2_10_5_"></a>2.10.5. Media Processing</h4>
<h4><a id="S2_10_5_1" name="S2_10_5_1"></a>2.10.5.1 Media processing MAY
occur either in real-time or as a post-processing function.</h4>
<p>DEFAULT: specific to each processing type</p>
<h4><a id="S2_10_5_2" name="S2_10_5_2"></a>2.10.5.2 Tone Clamping</h4>
<p>Use cases:</p>
<ol>
<li>Voicemail terminated with DTMF.</li>
<li>Whole-session recording where DTMF input must be removed for privacy or
other reasons.</li>
</ol>
<h4><a id="S2_10_5_2_1" name="S2_10_5_2_1"></a>2.10.5.2.1 VoiceXML 3.0 MAY
optionally provide a means to specify if DTMF tones are to be removed from
the recording.</h4>
<p>DEFAULT: Tones are not removed from the recording</p>
<p>DEFAULT: If tone clamping is enabled, it is performed after recording has
completed (not in real-time).</p>
<h4><a id="S2_10_5_3" name="S2_10_5_3"></a>2.10.5.3 Audio Processing Mode</h4>
<h4><a id="S2_10_5_3_1" name="S2_10_5_3_1"></a>2.10.5.3.1 VoiceXML 3.0 MUST
optionally provide a means to specify if automatic audio level controls (e.g.
Dynamic Range Compression, Limiting, Automatic Gain Control (AGC), etc.) are
to be applied to the recording or if the recording is to be raw.</h4>
<p>DEFAULT: raw</p>
<p>Editor's note: how to specify:</p>
<ul>
<li>raw or processed</li>
<li>type of processing</li>
<li>parameters specific to each processor or implementation</li>
<li>multiple processing operations (?)</li>
<li>real-time or post-processing</li>
</ul>
<h4><a id="S2_10_6_" name="S2_10_6_"></a>2.10.6. Recording data</h4>
<h4><a id="S2_10_6_1" name="S2_10_6_1"></a>2.10.6.1 The following information
MUST be reported after recording has completed.</h4>
<ul>
<li>Recording duration in milliseconds</li>
<li>Recording size in bytes</li>
<li>DTMF terminating string if recording was terminated via DTMFTERM, or
DTMF input available in application.lastresult</li>
<li>Indication if recording was terminated due to reaching maxtime</li>
<li>Format of the recording, as specified by RFC 4288</li>
</ul>
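<p>As a non-normative illustration of how this data surfaces today, the
VoiceXML 2.0 fragment below reads the <record> shadow variables once the
recording completes; the upload URI, form and field names are
placeholders.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">
    <record name="msg" beep="true" maxtime="60s" dtmfterm="true"
            type="audio/x-wav">
      <prompt>Please leave your message after the beep.</prompt>
      <filled>
        <!-- Post-recording data is reported via shadow variables -->
        <log>duration (ms): <value expr="msg$.duration"/></log>
        <log>size (bytes): <value expr="msg$.size"/></log>
        <log>terminating DTMF: <value expr="msg$.termchar"/></log>
        <log>stopped at maxtime: <value expr="msg$.maxtime"/></log>
        <!-- The recording format is whatever was requested in 'type' above -->
        <submit next="http://example.com/upload" namelist="msg"
                method="post" enctype="multipart/form-data"/>
      </filled>
    </record>
  </form>
</vxml>
</pre>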
<h4><a id="S2_10_7" name="S2_10_7"></a>2.10.7 Upload, Storage, Caching</h4>
<h4><a id="S2_10_7_1" name="S2_10_7_1"></a>2.10.7.1 Destination</h4>
<h4><a id="S2_10_7_1_1" name="S2_10_7_1_1"></a>2.10.7.1.1 VoiceXML 3.0 MUST
support specification of the destination of the recording buffer [dest].</h4>
<h4><a id="S2_10_7_3" name="S2_10_7_3"></a>2.10.7.3 A local cache of the
recording MUST be optionally available to the application (e.g. V2 semantics
of form item)</h4>
<h4><a id="S2_10_7_4" name="S2_10_7_4"></a>2.10.7.4 It MUST be possible to
specify the upload to be either a synchronous or asynchronous operation.</h4>
<h4><a id="S2_10_7_5" name="S2_10_7_5"></a>2.10.7.5 It MUST be possible to
select the upload to be available realtime, at the end of the call, or
indefinitely after the end of the call.</h4>
<h4><a id="S2_10_7_6" name="S2_10_7_6"></a>2.10.7.6 All modes other than
indefinite upload shall expose any errors in recording or upload to the
application.</h4>
<h4><a id="S2_10_8_" name="S2_10_8_"></a>2.10.8. Errors and Events</h4>
<p>Errors and events as a result of media recording must be presented to the
application.</p>
<p>Examples of types of errors possibly reported:</p>
<ul>
<li>error.unsupported.format (the requested media type is not
supported)</li>
<li>error.unavailable.format (the requested media type is currently not
available)</li>
<li>error during upload</li>
<li>disk full, other disk errors</li>
<li>permissions: error.noauthorization (or error.noresource if it should be
hidden from a potential attacker)</li>
</ul>
<h3><a id="funct-mediaformat" name="funct-mediaformat">2.11 Media
Formats</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These were recently added on the 6/24/2008 -->
<!-- call.</span></p> -->
<h4><a id="CS2_10_8_" name="CS2_10_8_"></a>VoiceXML 3 MUST support these
categories of media capabilities:</h4>
<ul>
<li>Audio Basic: audio only, with header or not (e.g. RIFF or AU
header)</li>
<li>Audio Rich: audio (one or more channels), plus meta data (e.g. header,
marks, transcription, etc.)</li>
<li>Multi-media: one or more media channels (e.g. audio, video, images,
etc.) plus meta data (e.g. header, marks, transcription, etc.)</li>
</ul>
<p>This does not imply platform support requirements. For example, a
particular platform may support Audio Basic but not Audio Rich. Another might
support Audio Rich but not all meta data elements.</p>
<h3><a id="funct-datamodel" name="funct-datamodel">2.12 Data Model (must
have)</a></h3>
<p>TBD.</p>
<h3><a id="funct-submitprocessing" name="funct-submitprocessing">2.11 Submit
Processing (must have)</a></h3>
<p>TBD.</p>
<h2><a id="format-reqs" name="format-reqs">3. Format Requirements</a></h2>
<h3><a id="format-flow" name="format-flow">3.1 Flow Language (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
<p>A flow control language will be developed in conjunction with VoiceXML 3.0
(i.e. <a href="http://www.w3.org/TR/scxml/">SCXML</a>).</p>
<h4><a id="S3_1_1" name="S3_1_1"></a>3.1.1 The flow control language will
allow the separation of business logic from media control and user
interaction.</h4>
<h4><a id="S3_1_2" name="S3_1_2"></a>3.1.2 The flow control language will be
able to invoke VoiceXML 3.0 scripts, passing data into them and receiving
results back when the scripts terminate.</h4>
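<p>The sketch below is a non-normative illustration of such an invocation in
SCXML. The invoke type 'vxml3', the document collect_account.vxml, the
maxDigits parameter and the returned 'account' property are assumptions made
for illustration; the <invoke>, <param> and <finalize> machinery is standard
SCXML.</p>
<pre>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
       initial="getAccount" datamodel="ecmascript">
  <datamodel>
    <data id="account"/>
  </datamodel>
  <state id="getAccount">
    <!-- 'vxml3' is a hypothetical invoke type; SCXML leaves the set of
         invokable processor types open -->
    <invoke type="vxml3" src="collect_account.vxml">
      <param name="maxDigits" expr="10"/>
      <finalize>
        <!-- Copy the result returned by the voice dialog -->
        <assign location="account" expr="_event.data.account"/>
      </finalize>
    </invoke>
    <transition event="done.invoke" target="routeCall"/>
  </state>
  <final id="routeCall"/>
</scxml>
</pre>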
<h4><a id="S3_1_3" name="S3_1_3"></a>3.1.3 The flow control language will be
suitable for use as an Interaction Manager in the Multimodal Architecture
Framework.</h4>
<h4><a id="S3_1_4" name="S3_1_4"></a>3.1.4 The flow control language will be
based on state-machine concepts.</h4>
<h4><a id="S3_1_5" name="S3_1_5"></a>3.1.5 The flow control language will be
able to receive asynchronous messages from external entities.</h4>
<h4><a id="S3_1_6" name="S3_1_6"></a>3.1.6 The flow control language will be
able to send messages to external entities.</h4>
<h4><a id="S3_1_7" name="S3_1_7"></a>3.1.7 The flow control language will not
contain any media-specific concepts such as ASR or TTS.</h4>
<h3><a id="format-semmod" name="format-semmod">3.2 Semantic Model Definition
(must have)</a></h3>
<!-- <p><span class="owner">Owner: Mike Bodell</span></p> -->
<h4><a id="S3_2_1" name="S3_2_1"></a>3.2.1 The precise semantics of all VXML
3.0 tags MUST be provided</h4>
<h4><a id="S3_2_2" name="S3_2_2"></a>3.2.2 The semantic model MUST be the
authoritative description of VXML 3.0 functionality</h4>
<h4><a id="S3_2_3" name="S3_2_3"></a>3.2.3 Different conformance profiles
MUST be possible, but they MUST be defined in terms of the semantic
model.</h4>
<h4><a id="S3_2_4" name="S3_2_4"></a>3.2.4 The semantic model descriptions of
VXML 3.0 MUST be able to express all of the functionality of VXML 2.1</h4>
<h4><a id="S3_2_5" name="S3_2_5"></a>3.2.5 Extensions to VXML 3.0 SHOULD be
able to build on the semantic model descriptions</h4>
<h2><a id="other-reqs" name="other-reqs">4. Other Requirements</a></h2>
<h3><a id="other-vxml" name="other-vxml">4.1 Consistent with other Voice
Browser Working Group specs (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett</span></p> -->
<h4><a id="S4_1_1" name="S4_1_1"></a>4.1.1 Wherever similar functionality to
that of another Voice Browser Working Group specification is available, this
language MUST use a syntax similar to that used in the relevant
specification.</h4>
<h4><a id="S4_1_2" name="S4_1_2"></a>4.1.2 For data that is likely to be
represented in another Voice Browser Working Group markup language (e.g., SRGS
or EMMA) or used by another Voice Browser Working Group language, there MUST
be a clear definition of the mapping between the two data
representations.</h4>
<h4><a id="S4_1_3" name="S4_1_3"></a>4.1.3 It MUST be possible to pass
Internet-related document and server information (caching parameters,
xml:base, etc.) from this language to other VBWG language processors for
embedded VBWG languages.</h4>
<h3><a id="other-other" name="other-other">4.2 Consistent with other specs
(XML, MMI, I18N, Accessibility, MRCP, Backplane Activities) (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
<h4><a id="S4_2_1" name="S4_2_1"></a>4.2.1 MRCP</h4>
<h4><a id="S4_2_1_1" name="S4_2_1_1"></a>4.2.1.1 This language MUST support a
profile that can be implemented using MRCPv2.</h4>
<h4><a id="S4_2_1_2" name="S4_2_1_2"></a>4.2.1.2 Where possible, this
language SHOULD remain compatible with MRCPv2 in terms of data formats (SRGS,
SSML).</h4>
<h4><a id="S4_2_2_" name="S4_2_2_"></a>4.2.2. <a
href="http://www.w3.org/TR/mmi-arch/">MMI</a></h4>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span></p> -->
<p>There must be at least one profile of VoiceXML 3.0 in which all of the
following requirements are supported.</p>
<h4><a id="S4_2_2_1" name="S4_2_2_1"></a>4.2.2.1 It MUST be possible for
VoiceXML 3.0 implementations to receive, process, and generate MMI life cycle
events. Some events may be handled automatically, while others may be under
author control.</h4>
<h4><a id="S4_2_2_2" name="S4_2_2_2"></a>4.2.2.2 VoiceXML 3.0 MUST provide a
way for the author to specify the exact functions required for the
application such that the platform can allocate the minimum necessary
resources.</h4>
<h4><a id="S4_2_2_3" name="S4_2_2_3"></a>4.2.2.3 VoiceXML 3.0 MUST be able to
provide EMMA-formatted information inside the data field of MMI life cycle
events.</h4>
<h4><a id="S4_2_2_4" name="S4_2_2_4"></a>4.2.2.4 VoiceXML 3.0 platforms MUST
specify one or more event I/O processors for interoperable exchange of life
cycle events. The Voice Browser Group requests public comment on what such
event processors should be or whether they should be part of the language at
all.</h4>
<h3><a id="other-simplify" name="other-simplify">4.3 Simplify Existing
VoiceXML Tasks (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
<h4><a id="S4_3_1" name="S4_3_1"></a>4.3.1 This language MUST provide a
mechanism for authors to develop dialog managers (state-based, task-based,
rule-based, etc.) that are easily used and configured by other authors.</h4>
<h4><a id="S4_3_2" name="S4_3_2"></a>4.3.2 This language MUST provide
mechanisms to simplify authoring of these common tasks: (we need to collect a
list of common tasks)</h4>
<h3><a id="other-maintain" name="other-maintain">4.4 Maintain Functionality
from Previous VXML Versions</a></h3>
<h4><a id="S4_4_1" name="S4_4_1"></a>4.4.1 New features added in VoiceXML 3.0
MUST be backward compatible with previous VoiceXML versions</h4>
<h4><a id="S4_4_1_1" name="S4_4_1_1"></a>4.4.1.1 Functionality available in
VoiceXML 2.0 and VoiceXML 2.1 MUST be available in VoiceXML 3.0.</h4>
<h4><a id="S4_4_1_2" name="S4_4_1_2"></a>4.4.1.2 Applications written in
VoiceXML 2.0/2.1 MUST be portable to VoiceXML 3.0 without losing application
capabilities.</h4>
<h3><a id="other-crs" name="other-crs">4.5 Address Change Requests from
previous VoiceXML Versions (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Reviewed all deferred and open change requests from VXML -->
<!-- 2.0/2.1</span></p> -->
<h4><a id="S4_5_1" name="S4_5_1"></a>4.5.1 Deferred change requests from VXML
2.0 and 2.1 reevaluated for VXML 3.0</h4>
<p>In particular, the following deferred CRs reevaluated: R51, R92, R104,
R113, R145, R155, R156, R186, R230, R233, R348, R394, R528, R541, and
R565.</p>
<h4><a id="S4_5_2" name="S4_5_2"></a>4.5.2 Unassigned change requests from
VXML 2.0 and 2.1 reevaluated for VXML 3.0</h4>
<p>In particular, the following unassigned CRs reevaluated: R600, R614, R619,
R620, R622, R623, R624, R625, R626, R627, R628, R629, R631, and R632.</p>
<h2><a id="acknowledgments" name="acknowledgments">5. Acknowledgments</a></h2>
<p>TBD</p>
<h2><a id="prev-reqs" name="prev-reqs">Appendix A. Previous
Requirements</a></h2>
<p>The following requirements have been satisfied by previous Voice Browser
Working Group Specifications</p>
<h3><a id="A_1_1" name="A_1_1"></a>A.1.1 Audio Modality Input and Output
(must have) FULLY COVERED</h3>
<p>The markup language can specify which spoken user input is interpreted by
the voice browser, as well as the content rendered as spoken output by the
voice browser.</p>
<h4><a id="CA_1_1" name="CA_1_1"></a>Requirement Coverage</h4>
<p>Audio output: <prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>Audio input: <grammar> <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_1_2" name="A_1_2"></a>A.1.2 Sequential multi-modal Input (must
have) FULLY COVERED</h3>
<p>The markup language specifies that user input from multiple modalities is
to be interpreted by the voice browser. There is no requirement that the
input modalities are simultaneously active. For example, a voice browser
interpreting the markup language in a telephony environment could accept DTMF
input in one dialog state, and spoken input in another.</p>
<h4><a id="CA_1_2" name="CA_1_2"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_1_3" name="A_1_3"></a>A.1.3 Unco-ordinated, Simultaneous,
Multi-modal Input (should have) FULLY COVERED</h3>
<p>The markup language specifies that user input from different modalities is
to be interpreted at the same time. There is no requirement that
interpretation of the input modalities are co-ordinated. For example, a voice
browser in a desktop environment could accept keyboard input or spoken input
in same dialog state.</p>
<h4><a id="CA_1_3" name="CA_1_3"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<p><field> defining multiple <grammar>s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_4" name="A_1_4"></a>A.1.4 Co-ordinated, Simultaneous
Multi-modal Input (nice to have) FULLY COVERED</h3>
<p>The markup language specifies that user input from multiple modalities is
interpreted at the same time and that interpretation of the inputs are
co-ordinated by the voice browser. For example, in a telephony environment,
the user can type <em>200</em> on the keypad and say <em>transfer to checking
account</em> and the interpretations are co-ordinated so that they are
understood as <em>transfer 200 to checking account</em>.</p>
<h4><a id="CA_1_4" name="CA_1_4"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<p><field> defining multiple <grammar>s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_5" name="A_1_5"></a>A.1.5 Sequential multi-modal Output (must
have) FULLY COVERED</h3>
<p>The markup language specifies that content is rendered in multiple
modalities by the voice browser. There is no requirement the output
modalities are rendered simultaneously. For example, a voice browser could
output speech in one dialog state, and graphics in another.</p>
<h4><a id="CA_1_5" name="CA_1_5"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_6" name="A_1_6"></a>A.1.6 Unco-ordinated, Simultaneous,
Multi-modal Output (nice to have) FULLY COVERED</h3>
<p>The markup language specifies that content is rendered in multiple
modalities at the same time. There is no requirement the rendering of output
modalities are co-ordinated. For example, a voice browser in a desktop
environment could display graphics and provide audio output at the same
time.</p>
<h4><a id="CA_1_6" name="CA_1_6"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_7" name="A_1_7"></a>A.1.7 Co-ordinated, Simultaneous
Multi-modal Output (nice to have) FULLYCOVERED</h3>
<p>The markup language specifies that content is to be simultaneously
rendered in multiple modalities and that output rendering is co-ordinated.
For example, graphical output on a cellular telephone display is co-ordinated
with spoken output.</p>
<h4><a id="CA_1_7" name="CA_1_7"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_1" name="A_2_1"></a>A.2.1 Mixed Initiative: Form Level (must
have) FULLY COVERED</h3>
<p>Mixed initiative refers to dialog where one participant takes the
initiative by, for example, asking a question and expects the other
participant to respond to this initiative by, for example, answering the
question. The other participant, however, responds instead with an initiative
by asking another question. Typically, the first participant then responds to
this initiative, before the second participant responds to the original
initiative. This behavior is illustrated below:<br />
<br />
<em>S-A1: When do you want to fly to Paris?<br />
U-B1: What did you say?<br />
S-B2: I said when do you want to fly to Paris?<br />
U-A2: Tuesday.</em></p>
<p>where A1 is responded to in A2 after a nested interaction, or sub-dialog,
in B1 and B2. Note that the B2 response itself could have been another
initiative leading to further nesting of the interaction.</p>
<p>The form-level mixed initiative requirement is that the markup language
can specify to the voice browser that it can take the initiative when the
user expects a response, and also allow the user to take the initiative when
the browser expects a response, where the content of these initiatives is
relevant to the task at hand, contains navigation instructions, or concerns
general meta-communication issues. This mixed initiative requirement is
particularly important when processing form input (hence the name) and is
further elaborated in requirements A.2.1.1, A.2.1.2, A.2.1.3 and A.2.1.4
below.</p>
<h4><a id="CA_2_1" name="CA_2_1"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><noinput>, <nomatch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_1" name="A_2_1_1"></a>A.2.1.1 Clarification Subdialog (must
have) FULLY COVERED</h4>
<p>The markup language can specify that a clarification sub-dialog should be
performed when the user provides incomplete, form-related information. For
example, in a flight enquiry service, the departure city and date may be
required but the user does not always provide all the information at once:<br
/>
<br />
<em>S1: How can I help you?<br />
U1: I want to fly to Paris.<br />
S2: When?<br />
U2: Monday</em></p>
<p>U1 is incomplete (or 'underinformative') with respect to the service (or
form) and the system then initiates a sub-dialog in S2 to collect the
required information. If additional parameters are required, further
sub-dialogs may be initiated.</p>
<h4><a id="CA_2_1_1" name="CA_2_1_1"></a>Requirement Coverage</h4>
<p><initial>, <field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_2" name="A_2_1_2"></a>A.2.1.2 Confirmation Subdialog (must
have) FULLY COVERED</h4>
<p>The markup language can specify that a confirmation sub-dialog is to be
performed when the confidence associated with the interpretation of the user
input is too low.<br />
<br />
<em>U1: I want to fly to Paris.<br />
S1: Did you say 'I want to fly to Paris'?<br />
U2: Yes.<br />
S2: When?<br />
U3: ...</em></p>
<p>Note confirmation sub-dialogs take precedence over clarification
sub-dialogs.</p>
<h4><a id="CA_2_1_2" name="CA_2_1_2"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><i>name$</i>.confidence shadow variable <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_3" name="A_2_1_3"></a>A.2.1.3 Over-informative Input:
corrective (must have) FULLY COVERED</h4>
<p>The markup language can specify that unsolicited user input in a
sub-dialog which corrects earlier input is to be interpreted appropriately.
For example, in a confirmation sub-dialog users may provide corrective
information relevant to the form:<br />
<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: No, from Perros.</em> (modification) <em><br />
U1': Yes, from Paris</em> (repetition)</p>
<h4><a id="CA_2_1_3" name="CA_2_1_3"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>$GARBAGE rule <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h4><a id="A_2_1_4" name="A_2_1_4"></a>A.2.1.4 Over-informative Input:
additional (nice to have) FULLY COVERED</h4>
<p>The markup language can specify that unsolicited user input in a
sub-dialog which is not corrective but additional, relevant information for
the current form is to be interpreted appropriately. For example, in a
confirmation sub-dialog users may provide additional information relevant to
the form:<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: Yes, I want to fly to Paris on Monday around 11.30</em></p>
<h4><a id="CA_2_1_4" name="CA_2_1_4"></a>Requirement Coverage</h4>
<p><initial>, <field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
<h3><a id="A_2_2" name="A_2_2"></a>A.2.2 Mixed Initiative: Task Level (must
have) FULLY COVERED</h3>
<p>The markup language needs to address mixed initiative in dialogs which
involve more than one task (or topic). For example, a portal service may
allow the user to interact with a number of specific services such as car
hire, hotel reservation, flight enquiries, etc, which may be located on the
different web sites or servers. This requirement is further elaborated in
requirements A.2.2.1, A.2.2.2, A.2.2.3, A.2.2.4 and A.2.2.5 below.</p>
<h4><a id="A_2_2_1" name="A_2_2_1"></a>A.2.2.1 Explicit Task Switching (must
have) FULLY COVERED</h4>
<p>The markup language can specify how users can explicitly switch from one
task to another. For example, by means of a set of global commands which are
active in all tasks and which take the user to a specific task; e.g. <em>Take
me to car hire</em>, <em>Go to hotel reservations</em>.</p>
<h4><a id="CA_2_2_1" name="CA_2_2_1"></a>Requirement Coverage</h4>
<p><link>, <goto>, <submit> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
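<p>A non-normative VoiceXML 2.0 sketch of an explicit task switch through a
document-scoped <link>; the command wording, grammar files and form names are
placeholders.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Global command, active in every dialog of this document -->
  <link next="#car_hire">
    <grammar mode="voice" version="1.0" root="cmd">
      <rule id="cmd">take me to car hire</rule>
    </grammar>
  </link>
  <form id="hotel_reservation">
    <field name="city">
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <prompt>Which city do you want a hotel in?</prompt>
    </field>
  </form>
  <form id="car_hire">
    <field name="car_class">
      <grammar src="cars.grxml" type="application/srgs+xml"/>
      <prompt>What class of car would you like?</prompt>
    </field>
  </form>
</vxml>
</pre>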
<h4><a id="A_2_2_2" name="A_2_2_2"></a>A.2.2.2 Implicit Task Switching
(should have) FULLY COVERED</h4>
<p>The markup language can specify how users can implicitly switch from one
task to another. For example, by means of simply uttering a phrase relevant
to another task; <em>I want to reserve a McLaren F1 in Monaco next
Wednesday</em>.</p>
<h4><a id="CA_2_2_2" name="CA_2_2_2"></a>Requirement Coverage</h4>
<p><link>, <goto>, <submit> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
<h4><a id="A_2_2_3" name="A_2_2_3"></a>A.2.2.3 Manual Return from Task Switch
(must have) FULLY COVERED</h4>
<p>The markup language can specify how users can explicitly return to a
previous task at any time. For example, by means of global task navigation
commands such as <em>previous task</em>.</p>
<h4><a id="CA_2_2_3" name="CA_2_2_3"></a>Requirement Coverage</h4>
<p><link>, <goto> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_2_4" name="A_2_2_4"></a>A.2.2.4 Automatic Return from Task
Switch (should have) FULLY COVERED</h4>
<p>The markup language can specify that users can automatically return to the
previous task upon completion or explicit cancellation of the current
task.</p>
<h4><a id="CA_2_2_4" name="CA_2_2_4"></a>Requirement Coverage</h4>
<p><link>, <goto> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_2_5" name="A_2_2_5"></a>A.2.2.5 Suspended Tasks (should have)
FULLY COVERED</h4>
<p>The markup language can specify that when task switching occurs the
previous task is suspended rather than canceled. Thus when the user returns
to the previous task, the interaction is resumed at the point it was
suspended.</p>
<h4><a id="CA_2_2_5" name="CA_2_2_5"></a>Requirement Coverage</h4>
<p><link> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_3" name="A_2_3"></a>A.2.3 Help Behavior (should have) FULLY
COVERED</h3>
<p>The markup language can specify help information when requested by the
user. Help information should be available in all dialog states.<br />
<em>S1: How can I help you?<br />
U1: What can you do?<br />
S2: I can give you flight information about flights between major cities
world-wide just like a travel agent. How can I help you?<br />
U1: I want a flight to Paris ...</em><br />
</p>
<p>Help information can be tapered so that it can be elaborated upon on
subsequent user requests.</p>
<h4><a id="CA_2_3" name="CA_2_3"></a>Requirement Coverage</h4>
<p><help> using count attribute for tapering <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_4" name="A_2_4"></a>A.2.4 Error Correction Behavior (must
have) FULLY COVERED</h3>
<p>The markup language can specify how error events generated by the voice
browser are to be handled. For example, by initiating a sub-dialog to
describe and correct the error:<br />
<em>S1: How can I help you?<br />
U1: <audio but no interpretation><br />
S2: Sorry, I didn't understand that. Where do you want to travel to?<br />
U2: Paris</em></p>
<p>The markup language can specify how specific types of errors encountered
in spoken dialog, e.g. no audio, too loud/soft, no interpretation,
internal error, etc, are to be handled as well as providing a general 'catch
all' method.</p>
<h4><a id="CA_2_4" name="CA_2_4"></a>Requirement Coverage</h4>
<p><error>, <nomatch>, <noinput>, <catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_5" name="A_2_5"></a>A.2.5 Timeout Behavior (must have) FULLY
COVERED</h3>
<p>The markup language can specify what to do when the voice browser times
out waiting for input; for example, a timeout event can be handled by
repeating the current dialog state:<br />
<em>S1: Did you say Monday?<br />
U1: <timeout><br />
S2: Did you say Monday?</em><br />
</p>
<p>Note that the strategy may be dependent upon the environment; in a desktop
environment, repetition for example may be irritating.</p>
<h4><a id="CA_2_5" name="CA_2_5"></a>Requirement Coverage</h4>
<p><noinput>, <catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_6" name="A_2_6"></a>A.2.6 Meta-Commands (should have) FULLY
COVERED</h3>
<p>The markup language specifies a set of meta-command functions which are
available in all dialog states; for example, repeat, cancel, quit, operator,
etc.</p>
<p>The precise set of meta-commands will be co-ordinated with the Telephony
Speech Standards Committee.</p>
<p>The markup language should specify how the scope of meta-commands like
'cancel' is resolved.</p>
<h4><a id="CA_2_6" name="CA_2_6"></a>Requirement Coverage</h4>
<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_7" name="A_2_7"></a>A.2.7 Barge-in Behavior (should have)
FULLY COVERED</h3>
<p>The markup language specifies when the user is able to barge in on the
system output, and when it is not allowed.</p>
<p>Note: The output device may generate timestamped events when barge-in
occurs (see 3.9).</p>
<h4><a id="CA_2_7" name="CA_2_7"></a>Requirement Coverage</h4>
<p>bargein property <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_8" name="A_2_8"></a>A.2.8 Call Transfer (should have) FULLY
COVERED</h3>
<p>The markup language specifies a mechanism to allow transfer of the caller
to another line in a telephony environment. For example, in cases of dialog
breakdown, the user can be transferred to an operator (cf. 'callto' in HTML).
The markup language also provides a mechanism to deal with transfer failures
such as when the called line is busy or engaged.</p>
<h4><a id="CA_2_8" name="CA_2_8"></a>Requirement Coverage</h4>
<p><transfer> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><createcall>, <redirect> <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
<h3><a id="A_2_9" name="A_2_9"></a>A.2.9 Quit Behavior (must have) FULLY
COVERED</h3>
<p>The markup language provides a mechanism to terminate the session (cf.
user-terminated sessions via a 'quit' meta-command in 2.6).</p>
<h4><a id="CA_2_9" name="CA_2_9"></a>Requirement Coverage</h4>
<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_10" name="A_2_10"></a>A.2.10 Interaction with External
Components (must have) FULLY COVERED</h3>
<p>The markup language must support a generic component interface to allow
for the use of external components on the client and/or server side. The
interface provides a mechanism for transferring data between the markup
language's variables and the component. Examples of such data are:
configuration parameters (such as timeouts), and events for data input and
error codes. Except for event handling, a call to an external component does
not directly change the dialog state, i.e. the dialog continues in the state
from which the external component was called.</p>
<p>Examples of external components are pre-built dialog components and server
scripts. Pre-built dialogs are further described in Section A.3.3. Server
scripts can be used to interact with remote services, devices or
databases.</p>
<h4><a id="CA_2_10" name="CA_2_10"></a>Requirement Coverage</h4>
<p><property> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><submit> namelist attribute, <submit>, <goto> query
string <a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>
<h3><a id="A_3_1" name="A_3_1"></a>A.3.1 Ease of Use (must have) FULLY
COVERED</h3>
<p>The markup language should be easy for designers to understand and author
without special tools or knowledge of vendor technology or protocols (dialog
design knowledge is still essential).</p>
<h4><a id="CA_3_1" name="CA_3_1"></a>Requirement Coverage</h4>
<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_2" name="A_3_2"></a>A.3.2 Simplicity and Power (must have)
FULLY COVERED</h3>
<p>The markup language allows designers to rapidly develop simple dialogs
without the need to worry about interactional details but also allows
designers to take more control over interaction to develop complex
dialogs.</p>
<h4><a id="CA_3_2" name="CA_3_2"></a>Requirement Coverage</h4>
<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_3" name="A_3_3"></a>A.3.3 Support for Modularity and Re-use
(should have) FULLY COVERED</h3>
<p>The markup language complies with the requirements of the Reusable Dialog
Components Subgroup.</p>
<p>The markup language can specify a number of pre-built dialog components.
This enables one to build a library of reusable 'dialogs'. This is useful for
handling both application-specific input types, such as telephone numbers,
credit card numbers, etc., as well as those that are more generic, such as
times, dates, numbers, etc.</p>
<h4><a id="CA_3_3" name="CA_3_3"></a>Requirement Coverage</h4>
<p><subdialog> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_4" name="A_3_4"></a>A.3.4 Naming (must have) FULLY COVERED</h3>
<p>Dialogs, states, inputs and outputs can be referenced by a URI in the
markup language.</p>
<h4><a id="CA_3_4" name="CA_3_4"></a>Requirement Coverage</h4>
<p><form> id attribute, form item name attribute <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_5" name="A_3_5"></a>A.3.5 Variables (must have) FULLY
COVERED</h3>
<p>Variables can be defined and assigned values.</p>
<p>Variables can be scoped within namespaces: for example, state-level,
dialog-level, document-level, application-level or session-level. The markup
language defines the precise scope of all variables.</p>
<p>The markup language must specify if variables are atomic or structured.</p>
<p>Variables can be assigned default values. Assignment may be optional; for
example, in a flight reservation form, a 'special meal' variable need not be
assigned a value by the user.</p>
<p>Variables may be referred to in the output content of the markup
language.</p>
<p>The precise requirements on variables may be affected by W3C work on
modularity and XML schema datatypes.</p>
<h4><a id="CA_3_5" name="CA_3_5"></a>Requirement Coverage</h4>
<p><var>, <assign>, <script> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_6" name="A_3_6"></a>A.3.6 Variable Binding (must have) FULLY
COVERED</h3>
<p>User input can bind one or more state variables. A single input may bind a
single variable or it may bind multiple variables in any order; for example,
the following utterances result in the same variable bindings:<br />
</p>
<ul>
<li>Transfer $200 from savings to checking</li>
<li>Transfer $200 to checking from savings</li>
<li>Transfer from savings $200 to checking</li>
</ul>
<h4><a id="CA_3_6" name="CA_3_6"></a>Requirement Coverage</h4>
<p>application.lastresult$.interpretation <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_7" name="A_3_7"></a>A.3.7 Event Handler (must have) FULLY
COVERED</h3>
<p>The markup language provides an explicit event handling mechanism for
specifying actions to be carried out when events are generated in a dialog
state.</p>
<p>Event handlers can be ordered so that if multiple event handlers match the
current event, only the handler with the highest ranking is executed. By
default, event handler ranking is based on proximity and specificity: i.e.
the handler closest in the event hierarchy with the most specific matching
conditions.</p>
<p>Actions can be conditional upon variable assignments, as well as the type
and content of events (e.g. input events specifying media, content,
confidence, and so on).</p>
<p>Actions include: the binding of variables with information, for example,
information contained in events; transition to another dialog state
(including the current state).</p>
<h4><a id="CA_3_7" name="CA_3_7"></a>Requirement Coverage</h4>
<p><catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><transition> <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
<h3><a id="A_3_8" name="A_3_8"></a>A.3.8 Builtin Event Handlers (should have)
FULLY COVERED</h3>
<p>The markup language can provide implicit event handlers which provide
default handling of, for example, timeout and error events as well as
handlers for situations, such as confirmation and clarification, where there
is a transition to an implicit dialog state. For example, there can be a
default handler for user input events such that if the recognition confidence
score is below a given threshold, then the input is confirmed in a
sub-dialog.</p>
<p>Properties of implicit event handlers (thresholds, counters, locale, etc)
can be explicitly customized in the markup language.</p>
<p>Implicit event handlers are always overridden by explicit handlers.</p>
<h4><a id="CA_3_8" name="CA_3_8"></a>Requirement Coverage</h4>
<p>Default event handlers (nomatch, noinput, error, etc...) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_9" name="A_3_9"></a>A.3.9 Output Content and Events (must
have) FULLY COVERED</h3>
<p>The markup language complies with the requirements developed by the Speech
Synthesis Markup Subgroup for output text content and parameter settings for
the output device. Requirements on multimodal output will be co-ordinated by
the Multimodal Interaction Subgroup (cf. Section 1).</p>
<p>In addition, the markup supports the following output features (if not
already defined in the Synthesis Markup):</p>
<ol>
<li>Pre-recorded audio file output</li>
<li>Streamed audio</li>
<li>Playing/synthesizing sounds such as tones and beeps</li>
<li>Variable level of detail control over structured text</li>
</ol>
<p>The output device generates timestamped events including error events and
progress events (output started/stopped, current position).</p>
<h4><a id="CA_3_9" name="CA_3_9"></a>Requirement Coverage</h4>
<p><audio>, <prompt> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><speak> and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
<h3><a id="A_3_10" name="A_3_10"></a>A.3.10 Richer Output (nice to have)
FULLY COVERED</h3>
<p>The markup language allows for richer output than variable substitution in
the output content. For example, natural language generation of output
content.</p>
<h4><a id="CA_3_10" name="CA_3_10"></a>Requirement Coverage</h4>
<p><prompt> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><speak> and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
<h3><a id="A_3_11" name="A_3_11"></a>A.3.11 Input Content and Events (must
have) FULLY COVERED</h3>
<p>The markup language complies with the requirements developed by the
Grammar Representation Subgroup for the representation of speech grammar
content. Requirements on multimodal input will be co-ordinated by the
Multimodal Interaction Subgroup (cf. Section 1).</p>
<p>The markup language can specify the activation and deactivation of
multiple speech grammars. These can be user-defined, or builtin grammars
(digits, date, time, money, etc).</p>
<p>The markup language can specify parameters for speech grammar content,
including timeout parameters (maximum initial silence, maximum utterance
duration, maximum within-utterance pause), energy thresholds necessary for
barge-in, etc.</p>
<p>The input device generates timestamped events including input timeout and
error events, progress events (utterance started, interference, etc), and
recognition result events (including content, interpretation/variable
bindings, confidence).</p>
<p>In addition to speech grammars, the markup language allows input content
and events to be specified for DTMF and keyboard devices.</p>
<h4><a id="CA_3_11" name="CA_3_11"></a>Requirement Coverage</h4>
<p>timeout, completetimeout, incompletetimeout, interdigittimeout,
termtimeout properties <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>application.lastresult$.interpretation, application.lastresult$.confidence
<a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>
<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
<p><grammar> and other elements <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_4_1" name="A_4_1"></a>A.4.1 Event Handling (must have) FULLY
COVERED</h3>
<p>One key difference among contemporary event models (e.g. DOM Level 2,
'try-catch' in object-oriented programming) is whether the same event can be
handled by more than one event handler within the hierarchy. The markup
language must state whether it supports this feature and motivate that
choice.</p>
<h3><a id="A_4_2" name="A_4_2"></a>A.4.2 Logging (nice to have) FULLY
COVERED</h3>
<p>For development and testing it is important that data and events can be
logged by the voice browser. At the most detailed level, this will include
logging of input and output audio data. A mechanism which allows logged data
to be retrieved from a voice browser, preferably via standard Internet
protocol (http, ftp, etc), is also required.</p>
<p>One approach is to require that the markup language can control logging
via, for example, an optional meta tag. Another approach is for logging to be
controlled by means other than the markup language, such as via proprietary
meta tags.</p>
<h4><a id="CA_4_2" name="CA_4_2"></a>Requirement Coverage</h4>
<p><log> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><log> <a href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML
1.0</a></p>
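<p>A non-normative VoiceXML 2.0 sketch of application-controlled logging with
<log>; the label and logged expression are placeholders, and how logged data
is retrieved remains platform-specific.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block>
      <!-- The label attribute lets the platform route this entry to a named log -->
      <log label="diagnostics" expr="'call from ' + session.connection.remote.uri"/>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
</pre>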
</body>
</html>