<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Common Sense Suggestions for Developing Multimodal User
  Interfaces</title>
  <style type="text/css">
  ul.toc { font-size: 200%; }
  ul.toc li { list-style: none; font-size: 80%; margin-top: 0.5em; }
  li { margin-top: 0.5em; }
  table { width: 100%; border: none; margin-top: 1em;
        padding: 0.1em; background-color: gray }
  td, th { border: none; background: white; padding: 0.3em; margin: 0.1em }
  .suggestion { font-weight: bold; font-style: italic; margin-left: 1em; }
  caption { font-style: italic; font-size: 80% }
  </style>
  <link href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"
  rel="stylesheet" type="text/css" />
</head>
<body xml:lang="en" lang="en">
<div class="head">
<a href="http://www.w3.org/"><img alt="W3C" height="48"
src="http://www.w3.org/Icons/w3c_home" width="72" /></a>

<h1>Common Sense Suggestions for Developing Multimodal
User Interfaces</h1>

<h2>W3C Working Group Note 11 September 2006</h2>

<dl>
  <dt>This version:</dt>
    <dd><a
      href="http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/">http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/</a></dd>
  <dt>Latest version:</dt>

    <dd><a
      href="http://www.w3.org/TR/mmi-suggestions/">http://www.w3.org/TR/mmi-suggestions/</a></dd>
  <dt>Previous version:</dt>
    <dd><em>This is the first publication.</em></dd>
  <dt>Editors:</dt>
    <dd>Jim Larson, Intel</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
&#169; 2006 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>&#174;</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,

<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>

<!-- end of head div -->

<hr title="Separator for header" />

<h2 id="abstract">Abstract</h2>

<p>This document is based on the accumulated experience of several
years of developing multimodal applications. It provides a
collection of common sense advice for developers of multimodal
user interfaces.</p>

<h2 id="status">Status of this Document</h2>

<p><em>This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the
<a href="http://www.w3.org/TR/">W3C technical reports
index</a> at http://www.w3.org/TR/.</em></p>

<p>This document is a W3C Working Group Note. It represents
the views of the W3C Multimodal Interaction Working Group at
the time of publication. The document may be updated as new
technologies emerge or mature. Publication as a Working
Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate
to cite this document as other than work in progress.</p>

<p>This document is one of a series produced by the
<a href="http://www.w3.org/2002/mmi/Group/">Multimodal
Interaction Working Group</a> <em>(<a
 href="http://cgi.w3.org/MemberAccess/AccessRequest">Member 
Only Link</a>)</em>, part of the <a
href="http://www.w3.org/2002/mmi/">W3C Multimodal
Interaction Activity</a>. The MMI activity statement can
be seen at
<a href="http://www.w3.org/2002/mmi/Activity">http://www.w3.org/2002/mmi/Activity</a>.</p>

<p>Comments on this document can be sent to <a
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a>,
the public forum for discussion of the W3C's work on
Multimodal Interaction. To subscribe, send an email to
<a href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>
with the word subscribe in the subject line (include the
word unsubscribe if you want to unsubscribe). The
<a href="http://lists.w3.org/Archives/Public/www-multimodal/">archive</a>
for the list is accessible online.</p>

<p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. This document is informative only. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34607/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>

<h2 id="contents">Table of Contents</h2>

<ul class="toc">
<li><a href="#Four_Major_Principles">Four Major Principles</a></li>
<li>1. <a href="#Satisfy_real-world_constraints">Satisfy
Real-world Constraints</a>

<ul>
<li><a href="#Task-oriented_Suggestions">Task-oriented
Suggestions</a>

<ul>
<li>1.1 <a href="#G11"> Suggestion: For each task, use the easiest
mode available on the device.</a></li>
</ul>
</li>
<li><a href="#Physical_Suggestions">Physical Suggestions</a>

<ul>
<li>1.2 <a href="#G12">Suggestion: If the user's hands are
unavailable for use, then make speech available.</a></li>
<li>1.3 <a href="#G13">Suggestion: If the user's eyes are busy or
not available, then make speech available.</a></li>
<li>1.4 <a href="#G14">Suggestion: If the user may be walking, then
make speech available.</a></li>
</ul>
</li>
<li><a href="#Environmental_Suggestions">Environmental
Suggestions</a>

<ul>
<li>1.5 <a href="#G15">Suggestion: If the user may be in a noisy
environment, then use a pen, keys, or mouse.</a></li>
<li>1.6 <a href="#G16">Suggestion: If the user's manual dexterity
may be impaired, then use speech.</a></li>
</ul>
</li>
</ul>
</li>
<li>2. <a href="#Communicate_Clearly_Concisely_and">Communicate
Clearly, Concisely, and Consistently with Users</a>

<ul>
<li><a href="#Consistency_Suggestions">Consistency Suggestions</a>

<ul>
<li>2.1 <a href="#G21">Suggestion: Phrase all prompts
consistently.</a></li>
<li>2.2 <a href="#G22">Suggestion: Enable the user to speak keyword
utterances rather than natural language sentences.</a></li>
<li>2.3 <a href="#G23">Suggestion: Switch presentation modes only
when the information is not easily presented in the current
mode.</a></li>
<li>2.4 <a href="#G24">Suggestion: Make commands
consistent.</a></li>
<li>2.5 <a href="#G25">Suggestion: Make the focus consistent across
modes.</a></li>
</ul>
</li>
<li><a href="#Organizational_Suggestions">Organizational
Suggestions</a>

<ul>
<li>2.6 <a href="#G26">Suggestion: Use audio to indicate the verbal
structure.</a></li>
<li>2.7 <a href="#G28">Suggestion: Use pauses to divide information
into natural "chunks."</a></li>
<li>2.8 <a href="#G29">Suggestion: Use animation and sound to show
transitions.</a></li>
<li>2.9 <a href="#G210">Use voice navigation to reduce the number
of screens.</a></li>
<li>2.10 <a href="#G211">Synchronize multiple modalities
appropriately.</a></li>
<li>2.11 <a href="#G212">Keep the user interface as simple as
possible</a>.</li>
</ul>
</li>
</ul></li>

<li>3. <a href="#Help_Users_Recover_Quickly_and">Help Users
Recover Quickly and Efficiently from Errors</a>

<ul>
<li><a href="#Conversational_Suggestions">Conversational
Suggestions</a>

<ul>
<li>3.1 <a href="#G31">Suggestion: Users tend to use the same mode
that was used to prompt them.</a></li>
<li>3.2 <a href="#G32">Suggestion: If privacy is not a concern,
use speech as output to provide commentary or help.</a></li>
<li>3.3 <a href="#G33">Suggestion: Use directed user interfaces
unless the user is always knowledgeable and experienced in the
domain</a>.</li>
<li>3.4 <a href="#G34">Suggestion: Always provide context-sensitive
help for every field and command.</a></li>
</ul>
</li>
<li><a href="#Reliability_Suggestions">Reliability Suggestions</a>

<ul>
<li>3.5 <a href="#G35">Suggestion: The user always should be able
to easily determine if the device is listening to the
user.</a></li>
<li>3.6 <a href="#G36">Suggestion: The user always should be able
to easily determine how much longer the device will be
operational.</a></li>
<li>3.7 <a href="#G37">Suggestion: Support at least two input modes
so one input mode can be used when the other cannot.</a></li>
<li>3.8 <a href="#G38">Suggestion: Present words recognized by the
speech recognition system on the display so the user can verify
they are correct.</a></li>
<li>3.9 <a href="#G39">Suggestion: Display the n-best list to
enable easy speech recognition error correction.</a></li>
<li>3.10 <a href="#G310">Try to keep response times less than 5
seconds. Inform the user of longer response times.</a></li>
</ul>
</li>
</ul>
</li>

<li>4. <a href="#Make_Users_Feel_Comfortable">Make Users
Comfortable</a>

<ul>
<li><a href="#SpeakingMode">Listening mode</a>

<ul>
<li>4.1 <a href="#G41">Suggestion: Speak after pressing a speak key
which automatically releases after the user finishes
speaking.</a></li>
</ul>
</li>
<li><a href="#System_Status">System Status</a>

<ul>
<li>4.2 <a href="#G42">Suggestion: Always present the current
system status to the user.</a></li>
</ul>
</li>
<li><a href="#Human_memory_Constraints">Human-memory
Constraints</a>

<ul>
<li>4.3 <a href="#G43">Suggestion: Use the screen to ease stress on
the user's short-term memory.</a></li>
</ul>
</li>
<li><a href="#Social_Suggestions">Social Suggestions</a>

<ul>
<li>4.4 <a href="#G44">Suggestion: If the user may need privacy,
use a display rather than speech output.</a></li>
<li>4.5 <a href="#G45">Suggestion: If the user may desire privacy,
use a pen or keys.</a></li>
<li>4.6 <a href="#G46">Suggestion: If the device may be used during
a business meeting, then use a pen or keys (with the keyboard
sounds turned off).</a></li>
</ul>
</li>
<li><a href="#Advertising_Suggestions">Advertising Suggestions</a>

<ul>
<li>4.7 <a href="#G47">Suggestion: Use animation and sound to
attract the user's attention.</a></li>
<li>4.8 <a href="#G48">Suggestion: Use landmarks to help the user
know where he or she is.</a></li>
</ul>
</li>
<li><a href="#Ambience_Suggestion">Ambience Suggestion</a>

<ul>
<li>4.9 <a href="#G49">Suggestion: Use audio and graphics design to
set the mood and convey emotion in games and entertainment
applications.</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#Summary">Summary</a></li>
</ul>

<hr title="Separator for introduction" />

<h2 id="introduction">Introduction</h2>

<p>When fonts were first introduced, many messages looked like ransom notes from 
  kidnappers. When color was introduced, many reports looked like they had barely 
  survived an explosion in a paint factory. To avoid these annoying user interfaces, 
  developers adopted suggestions and best practices for using fonts and colors.</p>

<p>With the introduction of multiple modes of input (voice, pen, and
keys), inexperienced developers may design loud, confusing, and
annoying user interfaces that result in low user performance and
high user discontent. This document attempts to enumerate a
collection of commonsense suggestions for developing high-performance,
high-preference multimodal user interfaces. We have gathered
techniques and principles from many diverse disciplines to generate
the following suggestions.</p>

<p>This set of suggestions originated in a brainstorming session with some of my 
students at the Oregon Graduate Institute of the Oregon Health &amp; Science 
University. I categorized the suggestions and showed them to several multimodal 
application developers, who contributed further suggestions. These have been reviewed 
and revised by the W3C Multimodal Interaction Working Group. The suggestions 
will be reviewed by other relevant W3C working groups, including Accessibility, 
Internationalization, and Mobile Web Initiative Best Practices.</p>

<p>Again, these are commonsense suggestions. You may think that no
one would ever develop user interfaces that violate these
suggestions, but developers have violated commonsense suggestions
before and will likely do so again. Use these suggestions as a
checklist when you design a multimodal interface. These suggestions
should help you to construct a multimodal user interface that
improves user performance and satisfaction, so that the intended
users can use your application easily and effectively.</p>

<p>These suggestions can be used as follows:</p>

<ol>
<li>
<p>Review the suggestions before designing a multimodal user
interface. The suggestions will assist you in making decisions as
you design your multimodal user interface.</p>
<p>Review the suggestions after designing a multimodal user interface. Use 
the suggestions as a checklist to assess your design after it is completed. 
Some designers rate their user interface against each suggestion, 
giving a high score if the user interface conforms to the suggestion and 
a low score if it does not.</p>
</li>
<li>
<p>The suggestions are only suggestions. Almost every guideline has
situations in which it should be overridden, and these suggestions
are no exception. If there are good reasons for not following a
suggestion, then ignore it.</p>
</li>
<li>
<p>Some users will want to configure their user interface to satisfy their personal 
preferences. We encourage the use of configuration dialogs to help the user 
achieve the configuration that is best for him or her. We also note that 
many users are afraid of configuration, and are happy to use the user interface 
"as is," without ever configuring the system.</p>
</li>
</ol>

<h2 id="Four_Major_Principles">Four Major Principles</h2>

<p>The suggestions are organized into four major principles of user
interface design. The following four principles determine how
quickly users are able to learn and how effectively they are able
to perform desired tasks with the user interface:</p>

<ol>
<li>Satisfy real-world constraints</li>

<li>Communicate clearly, concisely, and consistently with users</li>

<li>Help users recover quickly and efficiently from errors</li>

<li>Make users comfortable</li>
</ol>

<p>Multimodal user interface developers should follow the above four principles 
  and apply the following suggestions to avoid many of the potential usability 
  problems caused by using modes incorrectly.</p>

<h2 id="Satisfy_real-world_constraints">1. Satisfy
Real-world Constraints</h2>

<p>Real-world constraints limit what users may achieve with an
application. These limitations may be due to the nature of the task
the user intends to perform, other activities the user is
performing, physical limitations of the user, and conditions of the
environment in which the user will perform the task. The user
interface should be designed to compensate for these
limitations.</p>

<h3 id="Task-oriented_Suggestions"> Task-oriented
Suggestions</h3>

<p>The nature of the task influences the mode (or modes) users select to perform 
  the task. Tasks which are easy to perform in one mode may be difficult or impossible 
  to perform using another mode. Task-oriented suggestions indicate which modes 
  are best suited to entering data for each type of task.</p>

<p>New mobile devices will enable users to enter data by speaking
into a microphone, writing with a stylus, and pressing keys on a
small keypad. These input modes can be used to perform the
following four basic manipulation tasks:</p>

<ul>
<li>Select objects (e.g., menu options)</li>

<li>Enter text</li>

<li>Enter symbols (e.g., part of mathematical equations)</li>

<li>Enter sketches or illustrations</li>
</ul>

<p>There are other basic tasks, but the tasks mentioned above are
performed most frequently in common applications using handheld
computers.</p>

<p>Table 1 summarizes how users perform the four basic tasks using
the following popular input modes:</p>

<ul>
<li><em>Voice</em> - The user speaks into a microphone.</li>

<li><em>Pen</em> - The user manipulates a pen to write, draw, or
point.</li>

<li><em>Keys</em> - The user manipulates a keyboard or keypad by
pressing keys.</li>

<li><em>Mouse/joystick</em> - The user moves a mouse or joystick to
point, click, or drag.</li>
</ul>

<table summary="5 columns">
<caption>
  Table 1: Performing the four basic manipulation tasks using four popular input 
  modes, ranked from easiest (1) to most difficult (4) 
  </caption>
<tr>
<th>Content Manipulation Task</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keyboard/keypad</th>
<th>Mouse/Joystick</th>
</tr>
<tr>
<td>Select objects</td>
<td>(3) Speak the name of the object</td>
<td>(1) Point to or circle the object</td>
<td>(4) Press keys to position the cursor on the object and press
the <i>select key</i></td>
    <td class="c8" valign="top">(2) Point to and click on the object or drag to 
      select text</td>
</tr>
<tr>
<td>Enter text</td>
<td>(2) Speak the words in the text</td>
<td>(3) Write the text</td>
<td>(1) Press keys to spell the words in the text</td>
<td>(4) Spell the text by selecting letters from a soft
keyboard</td>
</tr>
<tr>
<td>Enter symbols</td>
<td>(3) Say the name of the symbol and where it should be
placed.</td>
<td>(1) Draw the symbol where it should be placed</td>
<td>(4) Enter one or more characters that together represent the symbol</td>
<td class="c8" valign="top">(2) Select the symbol from a menu and
indicate where it should be placed</td>
</tr>
<tr>
<td>Enter sketches or illustrations</td>
<td>(2) Verbally describe the sketch or illustration</td>
<td>(1) Draw the sketch or illustration</td>
<td>(4) Impossible</td>
<td>(3) Create the sketch by moving the mouse so it leaves a trail (similar 
      to an Etch-a-Sketch&trade;)</td>
</tr>
</table>

<p>Select objects. Object selection is easy with a pen: just point
to or circle the desired object. When using voice, just say the
name of the desired object, assuming the object has a name. With a
keyboard, press keys to position the cursor on the desired object
and press the <em>select</em> key.</p>

<p>Enter text. Each of the four modes can be used for text entry: the user speaks 
  words into a microphone, handwrites the words using a pen, presses keys on a 
  keypad to spell the words, or selects letters from a soft keyboard. Most users 
  can speak and write easily. However, some training and practice may be necessary 
  to use a keyboard or mouse efficiently.</p>

<p>Enter symbols. Entering mathematical equations, special
characters, and signatures is easy with a pen, awkward and
time-consuming with a mouse, and most difficult with speech.</p>

<p>Enter sketches or illustrations. Drawing simple illustrations
and maps is easy with a pen, awkward with a mouse, and nearly
impossible with speech. When speaking, users must verbally describe
the illustration or map.</p>

<p>Each input mode has its strengths and weaknesses. Voice is good
for describing attributes. The pen is good for pointing and
sketching. Keys are good for entering text, numbers, and symbols. A
useful and efficient multimodal system uses the appropriate mode
for each entry.</p>

<p class="suggestion" id="G11"> 1.1. Suggestion: For each task, use
the easiest modes available on the device.</p>

<p>Suggestion examples include:</p>

<ul>
<li>To select an icon, use a pen or stylus to point to the
icon. (To aid in object selection, highlight the object when
the cursor hovers above it. Highlight all selected objects.)</li>
<li>To enter text, use voice or a keypad.</li>
<li>To enter the symbols for a mathematical equation, use a pen
(or an onscreen keyboard with options for each symbol).</li>
<li>To draw a map, use a pen.</li>
</ul>
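<p>The rankings in Table 1 can also be applied mechanically when a device offers only some of the four modes. The following Python sketch is illustrative only and is not part of any W3C specification: the table name <code>TABLE_1_RANKS</code>, the function <code>easiest_mode</code>, and the task and mode labels are hypothetical. For a given task, it picks the available mode with the best (lowest) rank from Table 1.</p>

```python
# Hypothetical encoding of Table 1: 1 = easiest, 4 = most difficult.
TABLE_1_RANKS = {
    "select objects": {"pen": 1, "mouse": 2, "voice": 3, "keys": 4},
    "enter text": {"keys": 1, "voice": 2, "pen": 3, "mouse": 4},
    "enter symbols": {"pen": 1, "mouse": 2, "voice": 3, "keys": 4},
    "enter sketches": {"pen": 1, "voice": 2, "mouse": 3, "keys": 4},
}


def easiest_mode(task, available_modes):
    """Return the available mode with the best (lowest) Table 1 rank.

    Raises ValueError if the device offers none of the ranked modes.
    """
    ranks = TABLE_1_RANKS[task]
    candidates = [mode for mode in available_modes if mode in ranks]
    if not candidates:
        raise ValueError("no ranked input mode is available for " + task)
    return min(candidates, key=ranks.get)


# A phone with a microphone and a keypad, but no pen or mouse.
phone_modes = {"voice", "keys"}
print(easiest_mode("select objects", phone_modes))  # voice
print(easiest_mode("enter text", phone_modes))      # keys
```

<p>Keeping the rankings as data rather than as branching logic makes them easy to revise if user testing on a particular device disagrees with Table 1.</p>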

<h3 id="Physical_Suggestions">Physical Suggestions</h3>

<p>Different physical devices exhibit different usability
characteristics. The device's size, shape, and weight affect how it
may be used. Most important, the placement of a microphone and
speaker, the size of the display and writing surface, and the size
of keys in a keypad all affect the ease with which a user can enter
information by speaking, writing, or pressing keys. Table 2
summarizes the four modes of input with respect to physical
usability issues.</p>

<table summary="5 columns">
  <caption>
  Table 2: Physical usability issues for the four most popular modes of information 
  entry 
  </caption>
  <tr> 
    <th>Device Usability Issues</th>
    <th>Voice Mode</th>
    <th>Pen Mode</th>
    <th>Keystrokes Mode</th>
    <th>Mouse/joystick mode</th>
  </tr>
  <tr> 
    <td>Required number of user hands</td>
    <td>None (plus possibly one to hold the device)</td>
    <td>One (plus possibly one to hold the device)</td>
    <td>One or two</td>
    <td>One</td>
  </tr>
  <tr> 
    <td>Required use of eyes</td>
    <td>No</td>
    <td>Yes</td>
    <td>Frequently, but some users can operate familiar keyboards without looking 
      at them</td>
    <td>Yes</td>
  </tr>
  <tr> 
    <td>Portable</td>
    <td>Yes, especially when walking</td>
    <td>Yes, but difficult while walking</td>
    <td>Yes, but difficult while walking</td>
    <td>Yes, but difficult while walking</td>
  </tr>
</table>

<p>Required number of user hands. A user's hands may be required when operating 
  machinery, assembling parts into a device, or creating an object of art. No 
  hands are needed to speak and listen to a voice user interface. A pen requires 
  one hand to hold the pen. A mouse requires one hand to hold the mouse and in 
  most cases requires a surface for the mouse to rest on. By their nature, handheld 
  devices also may require a hand to hold the device. A 12-key keypad requires 
  one hand to enter data, while a QWERTY keypad requires two hands to enter data 
  efficiently. Some users become skilled at holding a small QWERTY keyboard with 
  both hands and using their thumbs to type.</p>

<p class="suggestion" id="G12"> 1.2. Suggestion: If the user's hands
are unavailable for use, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If the user is driving a car, use speech to ask for directions
to a restaurant.</li>
<li>If the user is repairing a machine, use speech to ask for the
next repair instruction.</li>
<li>If the user is preparing a meal, use speech to ask for the next
recipe instruction.</li>
</ul>

<p>Required use of eyes. A user's eyes should be focused primarily
on the road while driving a vehicle, on a physical device to be
constructed or repaired, or on subjects and their activities while
observing an experiment. Usually, users must look at what they are
writing with a pen or typing on a keypad. However, the user's eyes
may be free to observe his or her environment while speaking.</p>

<p class="suggestion" id="G13"> 1.3. Suggestion: If the user's eyes
are busy or not available, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If the user is driving a car, use speech to manipulate a
radio.</li>
<li>If a guard is watching a TV monitor, use speech or hand
controls to manipulate the camera.</li>
<li>If a scientist is looking into a microscope, use speech to
dictate his or her observations.</li>
</ul>

<p>Portable. Speech and pen devices are very portable. Users may
use them while sitting, standing, walking, and sometimes while
running. Traditionally, keyboard devices are used only while the
user is not moving. Keypads requiring only one hand, like those
frequently found on handheld devices and telephones, can be used
while sitting or standing.</p>

<p class="suggestion" id="G14"> 1.4. Suggestion: If the user may be
walking, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>While walking the streets of New York, use speech to ask
directions to the nearest subway station. (Both voice and a map may
be used to present directions to the user.)</li>
<li>While shopping in a department store, use speech to ask for the
location of a specific item.</li>
</ul>

<h3 id="Environmental_Suggestions">Environmental
Suggestions</h3>

<p>People work in environments that may not be ideal for some modes of user interfaces. 
  The environment might be noisy or quiet, hot or cold, light or dark, or moving 
  or stationary with a variety of distractions and possible dangers. Multimodal 
  user interfaces must be designed to work in the environments where they will 
  be used. Table 3 summarizes the environmental usability issues with respect 
  to four popular input modes.</p>

<table summary="5 columns">
  <caption>
  Table 3: Environmental usability issues for the four popular modes of information 
  entry 
  </caption>
  <tr> 
    <th>Device Usability Issues</th>
    <th>Voice Mode</th>
    <th>Pen Mode</th>
    <th>Keystroke Mode</th>
    <th>Mouse/joystick mode</th>
  </tr>
  <tr> 
    <td>Noisy environment</td>
    <td>Works poorly in a noisy environment</td>
    <td>Works well in a noisy environment</td>
    <td>Works well in a noisy environment</td>
    <td>Works well in a noisy environment</td>
  </tr>
  <tr> 
    <td>Other environmental concerns</td>
    <td>Works well whether or not the user wears gloves</td>
    <td>Does not work well when users must wear thick gloves</td>
    <td>Does not work well when users must wear thick gloves</td>
    <td>Does not work well when users must wear thick gloves</td>
  </tr>
</table>

<p>Noisy environment. Because speech recognition systems pick up
background sounds, they often make mistakes if the user speaks in a
noisy environment.</p>

<p class="suggestion" id="G15"> 1.5. Suggestion: If the user may be in a noisy environment, 
  then use a pen, keys, or mouse.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Use a pen or keys to enter a telephone number when in a noisy
airport.</li>
<li>Use a pen or keys to enter data when in a noisy shop.</li>
</ul>

<p>Other environmental concerns: Pen and keyboard devices are
difficult to use if the user must wear thick gloves, such as in a cold
environment or when protecting hands from rough objects.</p>

<p class="suggestion" id="G16"> 1.6. Suggestion: If the user's manual
dexterity may be impaired, then use speech.</p>

<p>A suggestion example is:</p>

<ul>
<li>If the user works in a cold meat locker, works on a construction
site and handles rough material, or works with dangerous chemicals
and must wear gloves, then use voice to enter data.</li>
</ul>
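<p>Table 3 can also be read as a lookup table from environmental conditions to the input modes that still work well. The following Python sketch encodes it that way. The condition names <code>noisy</code> and <code>thick_gloves</code> and the function <code>usable_modes</code> are hypothetical illustrations, not part of any standard API.</p>

```python
# Table 3 encoded as: input mode -> conditions under which it works poorly.
# Condition names are hypothetical labels chosen for this sketch.
WORKS_POORLY = {
    "voice": {"noisy"},
    "pen": {"thick_gloves"},
    "keys": {"thick_gloves"},
    "mouse": {"thick_gloves"},
}


def usable_modes(conditions):
    """Return the input modes that Table 3 rates as working well under all given conditions."""
    current = set(conditions)
    return [mode for mode, bad in WORKS_POORLY.items() if not (bad & current)]


if __name__ == "__main__":
    print(usable_modes({"noisy"}))         # pen, keys, mouse (Suggestion 1.5)
    print(usable_modes({"thick_gloves"}))  # voice only (Suggestion 1.6)
```

<p>When a noisy environment and thick gloves occur together, the function returns an empty list. Table 3 offers no mode that works well in that case, so it calls for user testing rather than a default choice.</p>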

<h2 id="Communicate_Clearly_Concisely_and">2.
Communicate Clearly, Concisely, and Consistently with Users</h2>

<p>Efficient communication is required if teams of people are to
achieve success in joint activities. Likewise, effective
communication between the user and the device is necessary for
achieving the user's goals. The multimodal user interface is the
conduit for all communication between the user and the device.
Communication should be clear and concise, avoiding ambiguities and
confusion. Communication styles should be consistent and systematic
so users know what to expect and can leverage the patterns and
rhythms in the dialog.</p>

<h3 id="Consistency_Suggestions">Consistency
Suggestions</h3>

<p>Consistency enables users to leverage conversational patterns to
accelerate their interaction. For example, users can follow a
consistent conversational rhythm without having to pause to adjust
to heterogeneous dialog styles.</p>

<p>Consistent prompts. If prompts are worded inconsistently, then
users must pause to decode each wording format. Users must spend
additional time and mental effort to respond to differently
structured questions. When prompts are consistently worded, users
can concentrate on the answers to questions rather than trying to
understand the questions.</p>

<p class="suggestion" id="G21">2.1. Suggestion: Phrase all prompts
consistently.</p>

<p>Suggestion examples include:</p>

<ul>
<li>To be consistent and encourage experienced users to barge-in,
consider using the following general voice prompt format:

<ol>
<li><em>Speak the name of the menu or form item.</em> The menu name
serves as a landmark. A <em>landmark</em> is a speech or non-speech
cue that marks a specific location within the dialog structure.
When a menu has a name, such as "main menu" or "thermostat," callers
can jump to the menu by speaking its name, or return to it when they
get confused or lost. Repeating the menu name also confirms that the
caller has reached the correct menu. However, if the name is
contained within the question and is not needed as a landmark, skip
speaking the name.</li>
<li><em>Ask a question.</em> Often, this can be achieved with two
or three words. This should be enough to remind experienced callers
to respond without listening to the enumerated options. Novice
callers will listen to the enumerated options before speaking their
selection.</li>
<li><em>Enumerate options.</em> If there are a small number of
valid responses, then list the options so novice callers can hear
and select their desired option. However, if the user is likely to
know the set of valid responses, then skip this operation.</li>
</ol>
</li>
<li style="list-style: none">

<p>Experienced callers can barge-in after they hear the question,
while novice callers will respond after they hear the entire menu
option list.</p>
</li>
<li>Use the same terms in all prompts, whether the terms are text,
voice, or multimedia prompts.</li>
</ul>
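<p>The three-part prompt format above (landmark, question, options) can be assembled mechanically. The Python sketch below is one way to do it, under stated assumptions. The <code>MenuPrompt</code> class, its flags, and the limit of four enumerated options are hypothetical choices for illustration. The prompt is returned as separate segments so that each step can be played or skipped on its own.</p>

```python
from dataclasses import dataclass, field


@dataclass
class MenuPrompt:
    """Hypothetical description of one menu or form item (illustrative only)."""
    name: str                       # landmark, e.g. "thermostat"
    question: str                   # two or three words, e.g. "Which setting?"
    options: list = field(default_factory=list)
    name_in_question: bool = False  # skip the landmark if the question already names the menu
    options_known: bool = False     # skip enumeration if users already know the valid responses


def spoken_list(items):
    """Join options the way they are usually spoken: "a, b, or c"."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + ", or " + items[-1]


def build_prompt(menu, max_enumerated=4):
    """Return the prompt as ordered segments: landmark, question, options."""
    segments = []
    if not menu.name_in_question:
        segments.append(menu.name.capitalize() + ".")
    segments.append(menu.question)
    # Enumerate only a small number of options, and only when users may not know them.
    if menu.options and not menu.options_known and len(menu.options) <= max_enumerated:
        segments.append("Say " + spoken_list(menu.options) + ".")
    return segments


if __name__ == "__main__":
    print(build_prompt(MenuPrompt("thermostat", "Which setting?", ["heat", "cool", "off"])))
```

<p>Experienced callers can barge in after the second segment. Novice callers hear the third. Longer option lists are left out of the spoken prompt in this sketch, on the assumption that they would be displayed instead.</p>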

<p><strong>Consistent command format.</strong> The current state of the art of 
  speech recognition and natural language technology does not always accurately 
  recognize and understand arbitrary complete sentences. Keyword recognition is 
  much faster and more accurate. Many tasks lend themselves to keyword commands better 
  than to natural language sentences.</p>

<p class="suggestion" id="G22">2.2. Suggestion: Enable the user to
speak keyword utterances rather than natural language
sentences.</p>

<p>Switching modes. Switching modes can be jarring and sometimes
surprising. For example, a user who has just answered three verbal
questions will be surprised if a textual question suddenly pops
up.</p>

<p class="suggestion" id="G23">2.3. Suggestion: Switch presentation
modes only when the information is not easily presented in the
current mode.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If the user repeatedly experiences errors when using voice or
handwriting recognition, consider switching to a text mode. Text
mode avoids the recognition errors that heavily accented speech or
poor handwriting can cause.</li>
<li>Switch from audio to text output if the result of a verbal
query is large and the user is likely to become anxious listening
to the result.</li>
<li>Switch from audio output to graphical output if the result can
be structured as a table, graphic, or other illustration.</li>
</ul>
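<p>The first example above depends on noticing repeated recognition errors. A minimal Python sketch of that bookkeeping follows. The <code>InputModeTracker</code> class and its threshold of two consecutive errors are hypothetical. The right threshold is a design decision best settled by user testing.</p>

```python
class InputModeTracker:
    """Hypothetical helper: fall back to text entry after repeated recognition errors."""

    RECOGNIZED_MODES = ("voice", "handwriting")

    def __init__(self, mode="voice", max_consecutive_errors=2):
        self.mode = mode
        self.max_consecutive_errors = max_consecutive_errors  # assumed value; tune with user testing
        self.consecutive_errors = 0

    def record(self, recognized):
        """Record one recognition attempt and return the mode to use next."""
        if recognized:
            self.consecutive_errors = 0  # a success resets the count
        else:
            self.consecutive_errors += 1
            if (self.mode in self.RECOGNIZED_MODES
                    and self.consecutive_errors >= self.max_consecutive_errors):
                self.mode = "text"
        return self.mode
```

<p>Only consecutive errors trigger the switch, so an occasional misrecognition does not move the user out of the current mode. An application might also offer the switch rather than force it.</p>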

<p><strong>Command consistency.</strong> Using different commands
for the same purpose confuses users, as does using the same command
for multiple functions.</p>

<p class="suggestion" id="G24">2.4. Suggestion: Make commands
consistent.</p>

<p>Users tend to use the wording that is visually presented, so include the names 
  shown on buttons and other navigational elements in the grammar for the voice 
  mode. All voice commands that achieve the same functionality should have the 
  same grammar. Users also tend to use commands they know from their daily use of 
  computers. Incorporate these commands into the grammar, even if they are not 
  visually presented in the GUI.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If a button is labeled "exit," then "exit" should be in the
grammar for the voice mode.</li>
<li>If a user may say "exit" from each of three visual pages,
then the grammar for this command should be the same for all
three pages.</li>
<li>If users often use "exit" in many other applications, then use
"exit" in this application so that the user can apply knowledge
from other applications to this application.</li>
</ul>
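<p>The examples above combine three rules: visible labels are speakable, a shared command keeps the same grammar on every page, and familiar commands are included even when they are not shown. The following Python sketch expresses these rules as a phrase-to-action table. The <code>SHARED_COMMANDS</code> table and <code>page_grammar</code> function are hypothetical. A real application would compile the result into whatever grammar format its speech recognizer accepts.</p>

```python
# Shared commands mean the same thing on every page. The phrases are assumptions;
# "what can i say" follows the context-sensitive help examples in Section 3.
SHARED_COMMANDS = {
    "exit": "exit",
    "help": "help",
    "what can i say": "help",
}


def page_grammar(button_labels):
    """Build a hypothetical page grammar mapping spoken phrases to actions.

    Every visible button label becomes speakable, and the shared commands are
    merged in unchanged, so "exit" has the same meaning on every page, even on
    pages that show no Exit button.
    """
    grammar = dict(SHARED_COMMANDS)
    for label in button_labels:
        phrase = label.strip().lower()
        grammar.setdefault(phrase, phrase)  # never redefine a shared command
    return grammar
```

<p>Because every page starts from the same shared table, the grammar for "exit" cannot drift between pages.</p>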

<p class="suggestion" id="G25">2.5. Suggestion: Make the focus
consistent across modes.</p>

<p>If the user is prompted to speak a value for a field, then
highlight that field in the GUI.</p>

<p>Suggestion examples:</p>

<ul>
<li>When filling out a form, highlight the field in the GUI when
the voice user interface prompts the user to speak a value for that
field.</li>
<li>Consistently highlight visual items in focus and consistently
highlight selected visual items.</li>
</ul>

<h3 id="Organizational_Suggestions">Organizational
Suggestions</h3>

<p>Grade school teachers teach that organizing your thoughts before writing 
  a composition dramatically improves its understandability. The same principle 
  applies to user interfaces. Organizing information and providing transitions between 
  topics improves users' comprehension of, and performance with, the multimodal 
  interface. Information should be structured and organized in ways that are familiar 
  to the user.</p>

<p>Content structure. Audio cues help users understand audio
information. For example, use a click to introduce each item of a
bulleted list, increase the volume to emphasize highlighted text,
or use a whisper to speak parenthetical text.</p>

<p class="suggestion" id="G26">2.6. Suggestion: Use audio and/or
visual icons to indicate the content structure.</p>

<p>There are generally accepted icons to represent content
structure. For example, a clock may indicate that an application is
busy, arrows may represent next and previous pages, etc.</p>

<p>Because there are no standard assignments of meanings for sounds, common sense 
  and user testing should guide the dialog designer. Here are suggestions for 
  items that lend themselves to non-speech sounds:</p>

<ul>
<li><em>Links</em> - Identify words that the user may say to jump to
another VoiceXML document by introducing them with a unique
sound.</li>
<li><em>Turn-taking tone</em> - A tone signals to the user that the
system has finished talking and that the user may speak.</li>
<li><em>Brand earcon</em> - Many businesses have audio icons, such
as the distinctive bong sound of AT&amp;T, the three tones of NBC,
and the four tones for "Intel Inside." These audio icons can be
presented to the user to announce that the user has arrived at the
company's site.</li>
  <li><em>Feedback</em> - The user needs to know whether the speech application is 
    processing data or waiting for input. A non-speech sound, such as a percolating 
    coffee pot, is ideal for informing the user that the speech application 
    is busy processing. It also reassures the user that the application is busy 
    and has not terminated abnormally. A bell tone is ideal for informing the 
    user that the system is ready for the user's input.</li>
<li><em>Barge-in temporarily disabled</em> - Designers may disable
barge-in when presenting advertisements or legal notices. So that
users do not try to barge in while it is disabled, present a
"barge-in disabled" audio icon when barge-in is turned off and a
"barge-in enabled" audio icon when it becomes available again.</li>
<li><em>Bulleted list</em> - A short sound snippet can be used at
the beginning of each item on a list.</li>
</ul>

<p>Chunks of information. Users comprehend audio information more
easily if it is presented as blocks, or chunks, of information. For
example, users may not recognize "six, one, seven, two, two, five,
four, three, seven, six" as a telephone number, but they will
recognize "six, one, seven (pause) two, two, five (pause) four,
three, seven, six" as either an American or Canadian telephone
number.</p>

<p class="suggestion" id="G28">2.7. Suggestion: Use pauses to divide
information into natural "chunks."</p>

<p>Suggestion examples include:</p>

<ul>
<li><em>Chunking numbers</em> - Phone numbers, identification
numbers, and other sequences of digits are frequently clustered
into groups of two to four digits when spoken. A short pause
between groups helps users comprehend and remember the number more
easily. For example, North American telephone numbers are
frequently spoken in three chunks: the three-digit area code, the
three-digit exchange number, and the four-digit subscriber
number.</li>
<li><em>Pause between instructions and options</em> - Placing a
pause between instructions and the options for prompts signals the
user when the instructions are complete. Experienced users may
barge-in after the instructions, but before hearing the list of
options.</li>
</ul>
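<p>Number chunking is mechanical enough to automate. The Python sketch below spells a digit string as words and inserts a pause marker between chunks. It defaults to the North American 3-3-4 grouping described above. The literal "(pause)" is a placeholder that follows the notation used earlier in this section. A real system would substitute the pause markup its speech synthesizer accepts, such as an SSML break element. The function name <code>chunk_digits</code> is hypothetical.</p>

```python
DIGIT_WORDS = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")


def chunk_digits(number, groups=(3, 3, 4)):
    """Spell a digit string as words, inserting a pause marker between chunks.

    The default (3, 3, 4) grouping matches North American telephone numbers:
    area code, exchange, subscriber number.
    """
    digits = [c for c in number if c in "0123456789"]  # drop dashes, spaces, etc.
    if len(digits) != sum(groups):
        raise ValueError(f"expected {sum(groups)} digits, got {len(digits)}")
    chunks, start = [], 0
    for size in groups:
        words = (DIGIT_WORDS[int(d)] for d in digits[start:start + size])
        chunks.append(", ".join(words))
        start += size
    return " (pause) ".join(chunks)


if __name__ == "__main__":
    print(chunk_digits("617-225-4376"))
```

<p>Passing a different <code>groups</code> tuple adapts the same function to identification numbers or other conventions.</p>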

<p>Transitions. A user may become disoriented if the information
content suddenly changes. Writers are well aware of the need for
transitions between topics. Similar transitions are needed for
visual and verbal information.</p>

<p class="suggestion" id="G29">2.8. Suggestion: Use animation and
sound to show transitions.</p>

<p>Suggestion examples:</p>

<ul>
<li>Display a turning page and play a sound to indicate
the transition between two pages.</li>
<li>Navigation: One study has shown that mobile users drop off at
the rate of 50% with each screen change. Voice navigation can be
used to reduce the number of screens.</li>
</ul>

<p class="suggestion" id="G210">2.9. Suggestion: Use voice navigation
to reduce the number of screens.</p>

<p><strong>Modality synchronization.</strong> Multiple modalities
should be appropriately synchronized. Here are some examples:</p>

<ol>
  <li>Stop talking/listening when the visual browser is minimized or exited.</li>
<li>The visual and verbal browsers should present the same
information at the same time.</li>
<li>In a multifield form, the focus field of the visual browser
should correspond to the field prompt currently presented by the
verbal browser.</li>
</ol>

<p class="suggestion" id="G211">2.10. Suggestion: Synchronize multiple
modalities appropriately.</p>

<p><strong>Simplicity.</strong> Complex user interfaces are
confusing to the user and lead to errors. While this rule applies
to all user interfaces, it is especially important for multimodal
user interfaces.</p>

<p class="suggestion" id="G212">2.11. Suggestion: Keep the user
interface as simple as possible.</p>

<h2 id="Help_Users_Recover_Quickly_and">3.
Help Users Recover Quickly and Efficiently from Errors</h2>

<p>The user interface must help users recover quickly and
efficiently from errors. All users, especially novice users, will
occasionally fail to respond to a prompt appropriately. The user
interface must be designed to detect such errors and assist users
to recover naturally. The multimodal interface also should help
users learn how to use the user interface to achieve the desired
results quickly and efficiently.</p>

<h3 id="Conversational_Suggestions">Conversational
Suggestions</h3>

<p>Principles of conversational discourse suggest that the
conventions governing the nature, content, and format of
information exchanged between two humans may also be applied to
information exchanged between a human and a computer.</p>

<p>Reflexive principle. The reflexive principle states that people
tend to respond in the same manner that they are prompted. For
example, if users are given long rambling prompts, they will likely
reply with long rambling responses.</p>

<p class="suggestion" id="G31">3.1. Suggestion: Enable users to use
the same mode that was used to prompt them.</p>

<p>Suggestion examples include:</p>

<ul>
<li>When spoken to, users will use their voices to respond.</li>
<li>When presented with a drawing, users will respond with another
drawing.</li>
<li>When presented with text, users will type their responses.</li>
</ul>

<p>Verbal help. Speech is more immediate and does not obscure
screen contents.</p>

<p class="suggestion" id="G32">3.2. Suggestion: If privacy is not a
concern, use speech as output to provide commentary or help.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Use speech to present short messages such as help
information.</li>
</ul>

<p>When privacy is not a concern, consider using speech for help
and error messages about the current contents of the display,
possibly augmenting the display by highlighting the area in which
the error occurs.</p>

<p><strong>Directed user interface</strong>. While user-directed
and mixed-initiative user interfaces may be useful for experienced
users, they are confusing and inhibiting for novice users. Directed
user interfaces always work for all classes of users. Directed
search provides users with the results they want quickly and
accurately.</p>

<p class="suggestion" id="G33">3.3. Suggestion: Use directed user
interfaces unless the user is always knowledgeable and experienced
in the domain.</p>

<p><strong>Context sensitive help</strong>. As an application becomes more complex, 
  offering the user more choices, help becomes mandatory. For a simple 
  application with few choices, the user may need help only the first time the 
  application is run. A novice user may not know the meaning of a field or command.</p>

<p class="suggestion" id="G34">3.4. Suggestion: Always provide context
sensitive help for every field and command.</p>

<p>Enable users to learn the purpose and function of every field,
and what values can be entered into the field.</p>

<p>Suggestion example:</p>

<ul>
  <li>It may not be clear to the user whether the year field of a date should be two 
    digits or four digits. Context sensitive help should provide instructions 
    and possibly an example to clarify this.</li>
<li>Enable the user to ask "what can I say" or "what can I say
here" as well as "help." Show a list of available commands and/or
options.</li>
</ul>

<p>One advantage of combining verbal and visual modalities is that help can be 
  offered through speech, the GUI, or both.</p>

<h3 id="Reliability_Suggestions">Reliability
Suggestions</h3>

<p>Few situations are more frustrating to users than to have a
device at hand but not be able to use it.</p>

<p><strong>Operational status</strong>. Users need to know when the
device is listening to them speak and when the device is not
listening.</p>

<p class="suggestion" id="G35">3.5. Suggestion: The user always
should be able to easily determine if the device is listening to
the user.</p>

<p>The device can present its operational status with a light or an
icon.</p>

<p>Power status. One especially frustrating situation is when the
device suddenly goes dead because the batteries are low.</p>

<p class="suggestion" id="G36">3.6. Suggestion: For devices with
batteries, the user always should be able to easily determine how
much longer the device will be operational.</p>

<p>A suggestion example is:</p>

<ul>
<li>Use icons, colors, or both to present the operational status of
a device. Use a green icon to indicate that the device is
operational and yellow to indicate that power is in short supply.
Better yet, display a meter or clock indicating how much longer the
battery will keep the device operational. (Note: because about 6
per cent of the male population has some degree of color blindness,
always use another feature in addition to color. For example, use
a green "walking person" icon to indicate that the device is
operational, and a nearly empty yellow battery icon to indicate
that power is in short supply.)</li>
</ul>
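<p>The example above pairs each color with a distinct icon shape and adds a time meter. The Python sketch below shows that choice. The icon names, the 30-minute low-power threshold, and the <code>power_indicator</code> function are all hypothetical. Estimating the remaining minutes is assumed to be handled by the device platform.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerIndicator:
    icon: str   # the shape carries the meaning on its own
    color: str  # redundant cue, never the only one (color blindness)
    label: str  # text meter showing remaining operating time


def power_indicator(minutes_left: int, low_threshold: int = 30) -> PowerIndicator:
    """Choose a battery-status indicator; icon names and threshold are hypothetical."""
    hours, minutes = divmod(max(minutes_left, 0), 60)
    label = f"{hours}h {minutes:02d}m left"
    if minutes_left > low_threshold:
        return PowerIndicator("walking-person", "green", label)
    return PowerIndicator("battery-nearly-empty", "yellow", label)


if __name__ == "__main__":
    print(power_indicator(95))
    print(power_indicator(20))
```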

<p>Backup mode. In Section 1, Table 1 summarized the various
strengths and weaknesses of using voice, pen, and keys as input
methods. Because user tasks, environmental situations, and user
distractions change, users should be able to switch modes when it
becomes inconvenient or impossible to use the primary mode of
input.</p>

<p class="suggestion" id="G37">3.7. Suggestion: Support at least two
input modes so one input mode can be used when the other
cannot.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Enable the user to switch to a keypad if the speech or
handwriting recognition engine fails.</li>
<li>Enable the user to speak or type if the user loses the pen or
input stylus.</li>
<li>Enable the user to speak if rain or snow could damage a
keypad.</li>
</ul>

<p><strong>Visual feedback</strong>. Sometimes speech recognition
systems misrecognize the words that a user speaks. It is useful to
present the words recognized by the speech recognition system to
the user, who can then verify their correctness. In speech-only
systems, the tiresome question "Did you say ...?" is the only
option. In multimodal systems, however, the recognized words can be
presented on a display.</p>

<p class="suggestion" id="G38">3.8. Suggestion: Present words
recognized by the speech recognition system on the display so the
user can verify they are correct.</p>

<p><strong>Correction mode</strong>. When speech recognition fails,
the user needs to correct the error by entering the correct word.
While the user could simply speak again, a better approach is to
display the n-best list (the speech recognizer's ranked list of the
most likely words, including the alternatives it did not select) so
the user can select from among these options rather than speak
again (and possibly experience the same error).</p>

<p class="suggestion" id="G39">3.9. Suggestion: Display the n-best
list to enable easy speech recognition error correction.</p>
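<p>To display an n-best list for correction, the application must separate the recognizer's top choice from its alternatives. The Python sketch below shows one way to do this. It assumes the n-best list arrives as <code>(text, confidence)</code> pairs. The exact shape depends on the recognizer in use, and the function name and the limit of four choices are hypothetical. Duplicate hypotheses are removed so that the user does not see the same word twice.</p>

```python
def correction_choices(n_best, max_choices=4):
    """Split an n-best list into the top result and distinct alternatives.

    n_best is assumed to be a list of (text, confidence) pairs.
    """
    if not n_best:
        raise ValueError("empty n-best list")
    ranked = sorted(n_best, key=lambda hyp: hyp[1], reverse=True)
    top = ranked[0][0]
    alternatives = []
    for text, _confidence in ranked[1:]:
        if text != top and text not in alternatives:
            alternatives.append(text)
    return top, alternatives[:max_choices]


if __name__ == "__main__":
    top, alts = correction_choices(
        [("Boston", 0.62), ("Austin", 0.30), ("Boston", 0.05), ("Houston", 0.20)])
    print(top, alts)
```

<p>The top result would be displayed as the recognized word, and the alternatives as selectable corrections next to it.</p>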

<p><strong>Response time.</strong> Response times greater than 5
seconds will significantly reduce usage. If a response time exceeds
this limit, inform the user that the computer is busy processing
the request.</p>

<p class="suggestion" id="G310">3.10. Suggestion: Try to keep response
times less than 5 seconds. Inform the user of longer response times.</p>
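<p>The busy notice in Suggestion 3.10 amounts to waiting on a request with a time limit. The Python sketch below uses the standard <code>concurrent.futures</code> module to do that. It runs a task and, if the task has not finished within the limit, calls a notification function once and keeps waiting. The names <code>run_with_busy_notice</code> and <code>slow_lookup</code> are hypothetical. The 5-second default comes from the text above.</p>

```python
import concurrent.futures
import time


def run_with_busy_notice(task, notify_busy, limit_s=5.0):
    """Run task(); if it has not finished within limit_s seconds,
    call notify_busy() once, then keep waiting for the result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task)
        try:
            return future.result(timeout=limit_s)
        except concurrent.futures.TimeoutError:
            notify_busy()  # e.g. say "Processing, please wait" or show a busy icon
            return future.result()


def slow_lookup():
    """Hypothetical stand-in for a slow back-end query."""
    time.sleep(0.2)
    return "3 trains found"


if __name__ == "__main__":
    print(run_with_busy_notice(slow_lookup, lambda: print("Processing, please wait"), limit_s=0.05))
```

<p>Fast responses produce no notice at all, so users hear or see the busy indicator only when it carries information.</p>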

<h2 id="Make_Users_Feel_Comfortable">4. Make Users
Feel Comfortable</h2>

<p>Users often judge a computer application by its user interface.
If users do not like the user interface, the application will not
be used. If the user interface is not easy to learn and easy to
use, the application cannot be used successfully.</p>

<h3 id="SpeakingMode">Listening Mode</h3>

<p>There are several possible listening modes, including:</p>

<ul>
<li><em>Always listening</em> - Generally this requires an
attention word that signals the system that the user is about to
speak. Without first speaking the attention word, the system
assumes that the user is speaking to another person and does not
listen.</li>
<li><em>Push to speak</em> - The user must remember to hold down
the speak key while speaking to the computer.</li>
<li><em>Speak after pressing a speak key and then press the speak
key again when finished</em> - The user must remember to press the
speak key a second time after finishing speaking to the
computer.</li>
<li><em>Push to activate</em> - The user only needs to press a
speak key before speaking to the computer; no further key action is
needed when the user finishes speaking.</li>
</ul>

<p>In theory, always listening would be the preferred listening mode. However, 
  this mode doesn't always work very well, and it makes heavy use of computer 
  resources. So the generally preferred mode is push to activate.</p>

<p class="suggestion" id="G41">4.1. Suggestion: Use the push-to-activate listening 
  mode when users speak to a mobile device.</p>

<p>It is easy for users to press a speak key before talking. This
is similar to asking for permission to speak by raising your hand.
However, while speaking, it is desirable to concentrate on what is
being said without worrying about holding down a key or pressing a
key when finished speaking.</p>

<h3 id="System_Status">System Status</h3>

<p>Users need feedback to determine whether the computer is
processing input data, is waiting for input, or is
malfunctioning.</p>

<p class="suggestion" id="G42">4.2. Suggestion: Always present the
current system status to the user.</p>

<p>Some suggestions for indicating if the computer is idle or busy
are shown in Table 4.</p>

<table summary="4 columns">
<caption>Table 4: Suggested indicators for the
current system status</caption>
<tr>
<th>Mode</th>
<th>Idle</th>
<th>Busy</th>
<th>Error</th>
</tr>
<tr>
<td>Text</td>
<td>"Ready for next input"</td>
<td>"Processing, please wait"</td>
<td>Explanation for the cause of the error and how to fix it</td>
</tr>
<tr>
<td>Icons</td>
<td>Green*</td>
<td>Red*</td>
<td>Blinking "danger" icon</td>
</tr>
<tr>
<td>Audio</td>
<td>Silence</td>
<td>Sounds of a clicking clock or a percolating coffee pot</td>
<td>Emergency vehicle siren</td>
</tr>
</table>

<p>* Note: because about 6 per cent of the male population has some
degree of color blindness, always use another feature in addition
to color. For example, use a green "standing person" icon to
indicate that the device is idle, and a red "walking person" icon
to indicate that the system is busy.</p>
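<p>Table 4 and the note above can be kept as a single table of cues, so that every part of an application reports status the same way. The Python sketch below does this. The <code>Status</code> enumeration, the icon and sound names, and the <code>status_cues</code> function are hypothetical placeholders. Each icon pairs a shape with a color or effect, so color is never the only cue.</p>

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"


# Cues from Table 4; icon = (shape, color or effect). Names are hypothetical placeholders.
STATUS_CUES = {
    Status.IDLE: {"text": "Ready for next input",
                  "icon": ("standing-person", "green"), "audio": None},  # silence
    Status.BUSY: {"text": "Processing, please wait",
                  "icon": ("walking-person", "red"), "audio": "percolating-coffee-pot"},
    Status.ERROR: {"text": None,
                   "icon": ("danger", "blinking"), "audio": "siren"},
}


def status_cues(status: Status, error_detail: Optional[str] = None) -> dict:
    """Return the text, icon, and audio cues for a status.

    For errors, Table 4 calls for an explanation of the cause and how to fix
    it, so the caller must supply one.
    """
    cues = dict(STATUS_CUES[status])  # copy so the shared table is never modified
    if status is Status.ERROR:
        if not error_detail:
            raise ValueError("an error status needs an explanation of its cause and fix")
        cues["text"] = error_detail
    return cues
```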

<h3 id="Human_memory_Constraints">Human-memory
Constraints</h3>

<p>Normally, human short-term memory holds only a limited number of items, so 
  it is necessary to keep verbal lists short. Instead of reading a list of options 
  to users, display the list so users will not forget the spoken information.</p>

<p class="suggestion" id="G43">4.3. Suggestion: Use the screen to
ease stress on the user's short-term memory.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If a list of options contains more than 3 to 4 items, display
the list of options on a screen.</li>
<li>If possible, display the results of a query as a table. For
example, display travel schedules as a table instead of presenting
verbal text.</li>
<li>If the text contains more than two sentences, present the text
to the user visually rather than verbally.</li>
</ul>
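<p>The thresholds in these examples can be checked automatically before content is rendered. The Python sketch below uses a limit of four spoken list items and two spoken sentences, both taken from the examples above. The function name is hypothetical, and its sentence count is deliberately naive.</p>

```python
import re
from typing import Optional, Sequence


def present_visually(items: Optional[Sequence[str]] = None, text: str = "",
                     max_spoken_items: int = 4, max_spoken_sentences: int = 2) -> bool:
    """Return True if the content should go on the screen rather than be spoken."""
    if items is not None and len(items) > max_spoken_items:
        return True
    # Naive sentence count: split on ., !, or ?; abbreviations such as "Dr." will miscount.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(sentences) > max_spoken_sentences
```

<p>A production system would likely use a proper sentence segmenter. The point of the sketch is that the decision can be made before rendering, not after the user has already forgotten a long spoken list.</p>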

<h3 id="Social_Suggestions">Social Suggestions</h3>

<p>Social customs among people suggest guidelines for user
interfaces between users and devices.</p>

<p>Privacy. Speech presented by the device is not private. Others
in close proximity can hear the computer's speech. The display
provides greater privacy.</p>

<p class="suggestion" id="G44">4.4. Suggestion: If the user may need
privacy and the user is not using a headset, use a display rather
than render speech.</p>

<p>Speech uttered by the user is not private. Others in close
proximity can hear the user. The keyboard, mouse, and pen provide
greater privacy.</p>

<p class="suggestion" id="G45">4.5. Suggestion: If the user may need
privacy while he/she enters data, use a pen or keys.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Use keys to enter personal identification numbers.</li>
<li>When using an automatic bank teller, always use a keypad to
enter the account number.</li>
<li>When using a weight management application, enable users to
enter their weight using a pen or keypad.</li>
</ul>

<p>A related suggestion is to present asterisks instead of displaying
private information (e.g., passwords) entered by the user.</p>

<p>Acceptance in meetings. Pen devices are accepted in meetings.
They replace a pen and pad of paper for taking notes. Keyboards and
keypads are becoming acceptable with the widespread use of laptops.
However, key sounds should be turned off. Usually, devices that
speak or are spoken to are not accepted in meetings without 
the use of earphones; and, in some cases, earphones may imply
that the user is not interested in the current discussion.</p>

<p class="suggestion" id="G46">4.6. Suggestion: If the device may be
used during a business meeting or in a public place, and no headset
is used, then use a pen or keys (with the keyboard sounds turned off).</p>

<h3 id="Advertising_Suggestions">Advertising
Suggestions</h3>

<p>Techniques from the field of advertising can be applied to user
interfaces to make them more appealing and interesting to the
user.</p>

<p>Important messages. Users must notice important messages.</p>

<p class="suggestion" id="G47">4.7. Suggestion: Use animation and
sound to attract the user's attention.</p>

<p>A suggestion example is:</p>

<ul>
<li>Animate the delivery of important events and messages so users
will notice them. Often this type of animation is accompanied with
sound, which also attracts the users' attention.</li>
</ul>

<p>Caution: Users tire of animation and sound quickly. Do not
overuse animation and sound.</p>

<p>Navigational aids. It is easy for a user to become "lost in space" when using 
  multimodal applications.</p>

<p class="suggestion" id="G48">4.8. Suggestion: Use landmarks to
help users know where they are.</p>

<p>Suggestion examples include:</p>

<ul>
<li>The "bong" heard at the beginning of long distance telephone
calls indicates the service is being offered by AT&amp;T.</li>
<li>The "Intel Inside" audio logo indicates that Intel supplied the
computer chip inside of a computing device.</li>
<li>Use the sound volume to indicate how close or far a user is
from a landmark.</li>
</ul>

<h3 id="Ambience_Suggestion">Ambience Suggestion</h3>

<p>Television and movie directors set the mood with set design,
lighting, and background music. Screen layout, colors, and
background music also create moods in multimodal user interfaces.
However, in some cases, moods and emotion may not be appropriate in
productivity applications.</p>

<p class="suggestion" id="G49">4.9. Suggestion: Use audio and graphics
design to set the mood and convey emotion in games and
entertainment applications.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Use background music to introduce new scenes with the
appropriate mood. For example, discordant music indicates trouble
lies ahead, cheerful music signals a scene filled with goodwill,
and a dirge indicates a depressing scene.</li>
<li>Use background music to "set the stage." For example, classical
music for an art museum, calliope music for a circus or fun fair,
or bagpipes for a lonely scene in a ghost story.</li>
</ul>

<h3 id="Accessibility_Suggestions">Accessibility Suggestions</h3>

<p>Some users have special needs that, when met, enable them to
gain all the benefits of computing generally available to users
without special needs. Users with limited or no sight, limited or
no hearing, or a cognitive impairment should be able to access the
computer.</p>

<p class="suggestion" id="G410">4.10. Suggestion: For each traditional
output technique, provide an alternative output technique.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Upon request, provide audio output for each visual output.
Reading values in different voices can highlight them and aid
comprehension. (Some audio should be presented as sound: a few
well-designed sounds, used consistently, convey meaning very
clearly and much more quickly than spoken words.)</li>
<li>Upon request, provide visual output for each audio output.
Provide "closed captioning" for speech and video output. For
verbal messages, use text equivalents or flashing icons.</li>
<li>Consider using tactile controls such as the 12-key touch pad
and the four-way navigation cross with a center button. These can
be powerful selection devices for blind users.</li>
</ul>

<p class="suggestion" id="G411">4.11. Suggestion: Enable users
to adjust the output presentation.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Enable users to adjust the lighting and contrast of their
display for improved readability.</li>
<li>Enable users to adjust the volume and speed of audio for
improved hearing.</li>
<li>Upon request (and when privacy is not a concern), echo the
character string typed by a user as audio.</li>
<li>Enable users to turn off background images to avoid
distraction.</li>
<li>Enable blind users to turn off the screen. This will increase
the user's privacy.</li>
</ul>

<p>Designing user interfaces to support accessibility generally
results in better usability for all users.</p>

<h2 id="Summary">Summary</h2>

<p>Use these suggestions as a checklist when you first construct a multimodal user 
  interface. However, the final decisions about the usefulness and friendliness 
  of the user interface rest on extensive iterative usability testing. If 
  users do not like or cannot use the user interface, it does not matter whether the 
  suggestions were followed. The user interface needs to be changed so users will 
  like it and be productive with it, even if that means not following some of these 
  suggestions. The users' needs should be the foremost concern for multimodal user 
  interface designers and developers.</p>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>The following members of the W3C Multimodal Interaction Working Group contributed 
  suggestions to this Note:</p>
<p>Deborah Dahl, W3C Invited Expert, contributed points that were raised during 
  a tutorial on Multimodal Interfaces presented at the Spring 2006 SpeechTEK/AVIOS 
  meeting.</p>
<p>Ingmar Kliche, T-Systems, contributed suggestions based on his work with developers 
  of multimodal applications at T-Systems.</p>
<p>Gerald McCobb, IBM, contributed suggestions based on his work with developers 
  of multimodal applications at IBM.</p>
</body>
</html>