<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="EN" lang="EN">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Voice Extensible Markup Language (VoiceXML) 3.0 Requirements</title>
<style type="text/css" xml:space="preserve">
.add { background-color: #FFFF99; }
.remove { background-color: #FF9999; text-decoration: line-through }
.issues { font-style: italic; font-weight: bold; color: green }
.tocline { list-style: none; }</style>
<link rel="stylesheet" type="text/css"
href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/"><img alt="W3C"
src="http://www.w3.org/Icons/w3c_home" height="48" width="72" /></a></p>
<h1 class="notoc" id="h1">Voice Extensible Markup Language (VoiceXML) 3.0
Requirements</h1>
<h2 class="notoc" id="date">W3C Working Draft <i>8 August 2008</i></h2>
<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/">http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/
</a></dd>
<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/vxml30reqs/">http://www.w3.org/TR/vxml30reqs/
</a></dd>
<dt>Previous version:</dt>
<dd>This is the first version. </dd>
<dt>Editors:</dt>
<dd>Jeff Hoepfinger, SandCherry</dd>
<dd>Emily Candell, Comverse</dd>
<dt>Authors:</dt>
<dd>Jim Barnett, Aspect</dd>
<dd>Mike Bodell, Microsoft</dd>
<dd>Dan Burnett, Voxeo</dd>
<dd>Jerry Carter, Nuance</dd>
<dd>Scott McGlashan, HP</dd>
<dd>Ken Rehor, Cisco</dd>
</dl>
<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© 2008 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>
<hr />
<h2 class="notoc"><a id="abstract" name="abstract">Abstract</a></h2>
<p>The W3C Voice Browser working group aims to develop specifications to
enable access to the Web using spoken interaction. This document is part of a
set of requirement studies for voice browsers, and provides details of the
requirements for marking up spoken dialogs.</p>
<h2><a id="status" name="status">Status of this document</a></h2>
<p><em>This section describes the status of this document at the time of its
publication. Other documents may supersede this document. A list of current
W3C publications and the latest revision of this technical report can be
found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>
at http://www.w3.org/TR/.</em></p>
<p>This is the 8 August 2008 W3C Working Draft of "Voice Extensible Markup
Language (VoiceXML) 3.0 Requirements".</p>
<p>This document describes the requirements for marking up dialogs for spoken
interaction required to fulfill the charter given in <a
href="http://www.w3.org/2006/12/voice-charter.html#scope">the Voice Browser
Working Group Charter</a>, and indicates how the W3C Voice Browser Working
Group has satisfied these requirements via the publication of working drafts
and recommendations. This is a First Public Working Draft. The group does not
expect this document to become a W3C Recommendation.</p>
<p>This document has been produced as part of the <a
href="http://www.w3.org/Voice/Activity.html" shape="rect">W3C Voice Browser
Activity</a>, following the procedures set out for the <a
href="http://www.w3.org/Consortium/Process/" shape="rect">W3C Process</a>.
The authors of this document are members of the <a
href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working Group</a>.
You are encouraged to subscribe to the public discussion list <<a
href="mailto:www-voice@w3.org" shape="rect">www-voice@w3.org</a>> and to
mail us your comments. To subscribe, send an email to <<a
href="mailto:www-voice-request@w3.org"
shape="rect">www-voice-request@w3.org</a>> with the word
<em>subscribe</em> in the subject line (include the word <em>unsubscribe</em>
if you want to unsubscribe). A <a
href="http://lists.w3.org/Archives/Public/www-voice/" shape="rect">public
archive</a> is available online.</p>
<p>This specification is a Working Draft of the Voice Browser working group
for review by W3C members and other interested parties. It is a draft
document and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference material or
to cite them as other than "work in progress".</p>
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does not expect this document to become a W3C Recommendation. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34665/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
<p>Publication as a Working Draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite this
document as other than work in progress.</p>
<h2><a id="toc" name="toc" shape="rect">Table of Contents</a></h2>
<ul class="toc">
<li class="tocline">0. <a href="#intro" shape="rect">Introduction</a></li>
<li class="tocline">1. <a href="#modality-reqs" shape="rect">Modality
Requirements</a></li>
<li class="tocline">1.1 <a href="#mod-csmo" shape="rect">Coordinated,
Simultaneous Multimodal Output</a></li>
<li class="tocline">1.2 <a href="#mod-usmo" shape="rect">Uncoordinated,
Simultaneous Multimodal Output</a></li>
<li class="tocline">2. <a href="#functional-reqs" shape="rect">Functional
Requirements</a></li>
<li class="tocline">2.1 <a href="#funct-vcr" shape="rect">VCR
Controls</a></li>
<li class="tocline">2.2 <a href="#funct-media" shape="rect">Media
Control</a></li>
<li class="tocline">2.3 <a href="#funct-siv" shape="rect">Speaker
Verification</a></li>
<li class="tocline">2.4 <a href="#funct-event" shape="rect">External Event
Handling while a dialog is in progress</a></li>
<li class="tocline">2.5 <a href="#funct-pls" shape="rect">Pronunciation
Lexicon Specification</a></li>
<li class="tocline">2.6 <a href="#funct-emma" shape="rect">EMMA</a></li>
<li class="tocline">2.7 <a href="#funct-upload" shape="rect">Synchronous
Upload of Recordings</a></li>
<li class="tocline">2.8 <a href="#funct-speed" shape="rect">Speed
Control</a></li>
<li class="tocline">2.9 <a href="#funct-volume" shape="rect">Volume
Control</a></li>
<li class="tocline">2.10 <a href="#funct-record" shape="rect">Media
Recording</a></li>
<li class="tocline">2.11 <a href="#funct-mediaformat" shape="rect">Media
Formats</a></li>
<li class="tocline">2.12 <a href="#funct-datamodel" shape="rect">Data
Model</a></li>
<li class="tocline">2.13 <a href="#funct-submitprocessing"
shape="rect">Submit Processing</a></li>
<li class="tocline">3. <a href="#format-reqs" shape="rect">Format
Requirements</a></li>
<li class="tocline">3.1 <a href="#format-flow" shape="rect">Flow
Language</a></li>
<li class="tocline">3.2 <a href="#format-semmod" shape="rect">Semantic
Model Definition</a></li>
<li class="tocline">4. <a href="#other-reqs" shape="rect">Other
Requirements</a></li>
<li class="tocline">4.1 <a href="#other-vxml" shape="rect">Consistent with
other Voice Browser Working Group Specs</a></li>
<li class="tocline">4.2 <a href="#other-other" shape="rect">Consistent with
other Specs</a></li>
<li class="tocline">4.3 <a href="#other-simplify" shape="rect">Simplify
existing VoiceXML Tasks</a></li>
<li class="tocline">4.4 <a href="#other-maintain" shape="rect">Maintain
Functionality from Previous VXML Versions</a></li>
<li class="tocline">4.5 <a href="#other-crs" shape="rect">Address Change
Requests from Previous VXML Versions</a></li>
<li class="tocline">5. <a href="#acknowledgments"
shape="rect">Acknowledgments</a></li>
<li class="tocline">Appendix A. <a href="#prev-reqs" shape="rect">Previous
Requirements</a></li>
</ul>
<h2><a id="intro" name="intro">0. Introduction</a></h2>
<p>The main goal of this activity is to establish the current status of the
Voice Browser Working Group activities relative to the requirements defined
in the <a href="http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223">Previous
Requirements Document</a> and to define additional requirements to drive
future Voice Browser Working Group activities based on Voice Community
experience with existing standards.</p>
<p>The process will consist of the following steps:</p>
<ol>
<li>Identify how the existing requirements have been satisfied by the
standards defined by the Voice Browser Working Group, other W3C Working
Groups or other standards bodies. Note that references to VoiceXML 2.0
imply that VoiceXML 2.1 also satisfies the requirement.</li>
<li>Identify the requirements that have not yet been satisfied and
determine if they are still valid requirements</li>
<li>Identify new requirements based on input from working group members and
submission to the W3C Voice Browser Public Mailing List <<a
href="mailto:www-voice@w3.org">www-voice@w3.org</a>> (<a
href="http://www.w3.org/Archives/Public/www-voice/">archive</a>)</li>
<li>Prioritize remaining requirements and identify road map by which the
Voice Browser Working Group plans to address these items</li>
</ol>
<h3><a id="S0_1" name="S0_1"></a>0.1 Scope</h3>
<p>The previous requirements definition activity focused on defining three
types of requirements on the voice markup language: modality, functional, and
format.</p>
<ul>
<li><b>Modality</b> requirements concern the types of modalities (media in
combination with an input/output mechanism) supported by the markup
language for user input and system output. (For the Voice Browser Working
Group, the modalities supported are speech, video and DTMF. Requirements
regarding other modalities will be handled by the <a
href="http://www.w3.org/2002/mmi/">Multimodal Interaction Working
Group.</a>)</li>
<li><b>Functional</b> requirements concern the behavior (or operational
semantics) which results from interpreting a voice markup language.</li>
<li><b>Format</b> requirements constrain the format (or syntax) of the
voice markup language itself.</li>
</ul>
<p>The environment and capabilities of the voice browser interpreting the
markup language affect these requirements. There may be differences in the
modality and functional requirements for desktop versus telephony-based
environments (and in the latter case, between fixed, mobile and Internet
telephony environments). The capabilities of the voice browser device also
affect the requirements. Requirements affected by the environment or
capabilities of the voice browser device will be explicitly marked as
such.</p>
<h3><a id="S0_2" name="S0_2"></a>0.2 Terminology</h3>
<p>Although defining a dialog is highly problematic, some basic definitions
must be provided to establish a common basis of understanding and avoid
confusion. The following terminology is based upon an event-driven model of
dialog interaction.<br />
<br />
</p>
<table summary="first column gives term, second gives description" border="1"
cellpadding="6" width="85%">
<tbody>
<tr>
<th>Voice Markup Language</th>
<td>a language in which voice dialog behavior is specified. The
language may include reference to style and scripting elements which
can also determine dialog behavior.</td>
</tr>
<tr>
<th>Voice Browser</th>
<td>a software device which interprets a voice markup language and
generates a dialog with voice output and/or input, and possibly other
modalities.</td>
</tr>
<tr>
<th>Dialog</th>
<td>a model of interactive behavior underlying the interpretation of
the markup language. The model consists of states, variables, events,
event handlers, inputs and outputs.</td>
</tr>
<tr>
<th>State</th>
<td>the basic interactional unit defined in the markup language; for
        example, an &lt;input&gt; element in HTML. A state can specify
        variables, event handlers, outputs and inputs. A state may describe
        output content to be presented to the user, input which the user can
        enter, and event handlers describing, for example, which variables to
        bind and which state to transition to when an event occurs.</td>
</tr>
<tr>
<th>Events</th>
<td>generated when a state is executed by the voice browser; for
example, when outputs or inputs in a state are rendered or
interpreted. Events are typed and may include information; for
example, an input event generated when an utterance is recognized may
include the string recognized, an interpretation, confidence score,
and so on.</td>
</tr>
<tr>
<th>Event Handlers</th>
<td>are specified in the voice markup language and describe how events
generated by the voice browser are to be handled. Interpretation of
events may bind variables, or map the current state into another
state (possibly itself).</td>
</tr>
<tr>
<th>Output</th>
<td>content specified in an element of the markup language for
presentation to the user. The content is rendered by the voice
browser; for example, audio files or text rendered by a TTS. Output
can also contain parameters for the output device; for example,
volume of audio file playback, language for TTS, etc. Events are
generated when, for example, the audio file has been played.</td>
</tr>
<tr>
<th>Input</th>
<td>content (and its interpretation) specified in an element of the
markup language which can be given as input by a user; for example, a
grammar for DTMF and speech input. Events are generated by the voice
browser when, for example, the user has spoken an utterance and
variables may be bound to information contained in the event. Input
can also specify parameters for the input device; for example,
timeout parameters, etc.</td>
</tr>
</tbody>
</table>
<p>The dialog requirements for the voice markup language are annotated with
the following priorities. If a feature is deferred from the initial
specification to a future release, consideration may be given to leaving open
a path for future incorporation of the feature.<br />
<br />
</p>
<table summary="first column gives priority name, second its description"
border="1" cellpadding="6" width="85%">
<tbody>
<tr>
<th>must have</th>
<td>The first official specification must define the feature.</td>
</tr>
<tr>
<th>should have</th>
<td>The first official specification should define the feature if
feasible but may defer it until a future release.</td>
</tr>
<tr>
<th>nice to have</th>
<td>The first official specification may define the feature if time
permits, however, its priority is low.</td>
</tr>
<tr>
<th>future revision</th>
<td>It is not intended that the first official specification include
the feature.</td>
</tr>
</tbody>
</table>
<h2><a id="modality-reqs" name="modality-reqs">1. Modality
Requirements</a></h2>
<!-- <p><span class="owner">Owner: Scott McGlashan</span><br /> -->
<!-- <span class="note">Note: These requirements will be coordinated with the -->
<!-- Multimodal Interaction Subgroup.</span></p> -->
<h3><a id="mod-csmo" name="mod-csmo">1.1 Coordinated, Simultaneous Multimodal
Output (nice to have)</a></h3>
<p>1.1.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is coordinated. For example, graphical output on a cellular
telephone display is coordinated with spoken output.</p>
<h3><a id="mod-usmo" name="mod-usmo">1.2 Uncoordinated, Simultaneous
Multimodal Output (nice to have)</a></h3>
<p>1.2.1 The markup language specifies that content is to be simultaneously
rendered in multiple modalities (e.g. audio and video) and that output
rendering is uncoordinated. For example, graphical output on a cellular
telephone display is uncoordinated with spoken output.</p>
<h2><a id="functional-reqs" name="functional-reqs">2. Functional
Requirements</a></h2>
<p>These requirements are intended to ensure that the markup language is
capable of specifying cooperative dialog behavior characteristic of
state-of-the-art spoken dialog systems. In general, the voice browser should
compensate for its own limitations in knowledge and performance compared with
equivalent human agents; for example, compensate for limitations in speech
recognition capability by confirming spoken user input when necessary.</p>
<h3><a id="funct-vcr" name="funct-vcr">2.1 VCR Controls (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete.</span></p> -->
<h4><a id="S2_1_1" name="S2_1_1"></a>2.1.1 VoiceXML 3.0 MUST provide a
mechanism giving an application developer a high level of control of audio
and video playback.</h4>
<h4><a id="S2_1_1_1" name="S2_1_1_1"></a>2.1.1.1 It MUST be possible to
invoke media controls by DTMF or speech input (other input mechanisms may be
supported).</h4>
<h4><a id="S2_1_1_2" name="S2_1_1_2"></a>2.1.1.2 Media controls MUST NOT
disable normal user input: i.e. media-control input and application input
MUST be possible simultaneously.</h4>
<h4><a id="S2_1_1_3" name="S2_1_1_3"></a>2.1.1.3 Input associated with media
controls MUST be treated in the same way as other inputs. Resolution of best
match follows standard VoiceXML 2.0 precedence and scoping rules.</h4>
<h4><a id="S2_1_1_4" name="S2_1_1_4"></a>2.1.1.4 It MUST be possible for user
input to be interpreted as seek controls -- fast forward and rewind -- during
media output playback.</h4>
<h4><a id="S2_1_1_5" name="S2_1_1_5"></a>2.1.1.5 The seek control MUST allow
fast forward and rewind to be specified in time - seconds, milliseconds -
relative to the current playback position.</h4>
<h4><a id="S2_1_1_6" name="S2_1_1_6"></a>2.1.1.6 The seek control MUST allow
fast forward and rewind to be specified relative to &lt;mark&gt; elements in
the output.</h4>
<h4><a id="S2_1_1_7" name="S2_1_1_7"></a>2.1.1.7 The seek control MUST NOT
affect the selection of alternative content: i.e. the same (alternative)
content MUST be used.</h4>
<h4><a id="S2_1_1_8" name="S2_1_1_8"></a>2.1.1.8 It MUST be possible for user
input to be interpreted as pause/resume during media output playback.</h4>
<h4><a id="S2_1_1_9" name="S2_1_1_9"></a>2.1.1.9 It MUST be possible for the
different inputs to control pause and resume.</h4>
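<p>As an informative illustration of requirements 2.1.1.1 through 2.1.1.4, the
sketch below uses existing VoiceXML 2.x constructs: document-scoped
&lt;link&gt; grammars remain active alongside normal field grammars, so
media-control input and application input can be given simultaneously, with
the best match resolved by the usual precedence and scoping rules. The
media-control event names are hypothetical; VoiceXML 3.0 does not yet define
them.</p>
<pre>
&lt;!-- Illustrative sketch only; the event names are hypothetical --&gt;
&lt;link dtmf="3" event="app.media.fastforward"&gt;
  &lt;grammar version="1.0" xml:lang="en-US" mode="voice" root="ff"
           xmlns="http://www.w3.org/2001/06/grammar"&gt;
    &lt;rule id="ff"&gt;fast forward&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
&lt;link dtmf="1" event="app.media.rewind"&gt;
  &lt;grammar version="1.0" xml:lang="en-US" mode="voice" root="rw"
           xmlns="http://www.w3.org/2001/06/grammar"&gt;
    &lt;rule id="rw"&gt;rewind&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
</pre>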
<h3><a id="funct-media" name="funct-media">2.2 Media Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These requirements were reversed engineered from the -->
<!-- VoiceXML 3.0 spec editor's draft.</span></p> -->
<h4><a id="S2_2_1_" name="S2_2_1_"></a>2.2.1. It MUST be possible to specify
a media clip begin value, specified in time, as an offset from the start of
the media clip to begin playback.</h4>
<h4><a id="S2_2_2_" name="S2_2_2_"></a>2.2.2. It MUST be possible to specify
a media clip end value, specified in time, as an offset from the start of the
media clip to end playback.</h4>
<h4><a id="S2_2_3_" name="S2_2_3_"></a>2.2.3. It MUST be possible to specify
a repeat duration, specified in time, as the amount of time the media file
will repeat playback.</h4>
<h4><a id="S2_2_4_" name="S2_2_4_"></a>2.2.4. It MUST be possible to specify
a repeat count, specified as a non-negative integer, as the number of times
the media file will repeat playback.</h4>
<h4><a id="S2_2_5_" name="S2_2_5_"></a>2.2.5. It MUST be possible to specify
a gain, specified as a percentage, as the percentage by which to adjust the
playback amplitude of the original waveform.</h4>
<h4><a id="S2_2_6_" name="S2_2_6_"></a>2.2.6. It MUST be possible to specify
a speed, specified as a percentage, as the percentage by which to adjust the
playback speed of the original waveform.</h4>
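<p>An informative sketch of how these parameters might appear on a media
element follows. The attribute names (clipBegin, clipEnd, repeatDur,
repeatCount, gain, speed) are hypothetical, loosely modelled on SMIL timing
attributes; the requirements above fix the capabilities, not the syntax.</p>
<pre>
&lt;!-- Hypothetical syntax: clipBegin/clipEnd (2.2.1, 2.2.2), repeatDur (2.2.3),
     repeatCount (2.2.4), gain (2.2.5), speed (2.2.6) --&gt;
&lt;audio src="promo.wav" clipBegin="2.5s" clipEnd="30s"
       repeatDur="55s" repeatCount="2" gain="80%" speed="120%"/&gt;
</pre>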
<h3><a id="funct-siv" name="funct-siv">2.3 Speaker Verification (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span><br /> -->
<!-- <span class="note">Note: Ken reviewed and thought these were -->
<!-- complete</span></p> -->
<h4><a id="S2_3_1" name="S2_3_1"></a>2.3.1 The markup language MUST provide
the ability to verify a speaker's identity through a dialog containing both
acoustic verification and knowledge verification.</h4>
<p>The acoustic verification may compare speech samples to an existing model
(kept in some, possibly external, repository) of that speaker's voice. A
verification result returns a value indicating whether the acoustic and
knowledge tests were accepted or rejected. Results for verification and
results for recognition may be returned simultaneously.</p>
<h4><a id="S2_3_1_1" name="S2_3_1_1"></a>2.3.1.1 VoiceXML 3.0 MUST support
Speaker Identification and Verification (SIV) for end-user dialogs.</h4>
<p>Note: The security administrator's interface is out-of-scope for
VoiceXML.</p>
<h4><a id="S2_3_1_2" name="S2_3_1_2"></a>2.3.1.2 SIV features MUST be
integrated with VoiceXML 3.0.</h4>
<p>SIV features such as enrollment and verification are voice dialogs. SIV
must be compatible and complementary with other VoiceXML 3.0 dialog
constructs such as speech recognition.</p>
<h4><a id="S2_3_1_3" name="S2_3_1_3"></a>2.3.1.3 VoiceXML 3.0 MUST be able to
be used without SIV.</h4>
<p>SIV features must be part of VoiceXML 3.0 but may not be needed in all
application scenarios or implementations. Not all voice dialogs need SIV.</p>
<h4><a id="S2_3_1_4" name="S2_3_1_4"></a>2.3.1.4 SIV MUST be able to be used
without other input modalities.</h4>
<p>Some SIV processing techniques operate without using any ASR.</p>
<h4><a id="S2_3_1_5" name="S2_3_1_5"></a>2.3.1.5 SIV features MUST be able to
operate in multi-factor environments.</h4>
<p>Some applications require the use of SIV along with other means of
authentication: biometric (e.g. fingerprint, hand, retina, DNA) or
non-biometric (e.g. caller ID, geolocation, personal knowledge, etc.).</p>
<h4><a id="S2_3_1_6" name="S2_3_1_6"></a>2.3.1.6 SIV-specific events MUST be
defined.</h4>
<p>SIV processing engines and network protocols (e.g. MRCP) generate events
related to their operation and use. These events must be made available in a
manner consistent with other VoiceXML events. Event naming structure must
allow for vendor-specific and application-specific events.</p>
<h4><a id="S2_3_1_7" name="S2_3_1_7"></a>2.3.1.7 SIV-specific properties MUST
be defined.</h4>
<p>These properties are provided to configure the operation of the SIV
processing engines (analogous to "Generic Speech Recognition Properties"
defined in <a href="http://www.w3.org/TR/voicexml20/#dml6.3.2">VoiceXML 2.0
Section 6.3.2</a>).</p>
<h4><a id="S2_3_1_8" name="S2_3_1_8"></a>2.3.1.8 The SIV result MUST be
available in the result structure used by the host environment (e.g. VoiceXML
3.0, MMI).</h4>
<p>Note that this does not require EMMA in all cases, such as non-VoiceXML
3.0 environments. This also does not specify the version of EMMA.</p>
<h4><a id="S2_3_1_8_1" name="S2_3_1_8_1"></a>2.3.1.8.1 VoiceXML 3.0 SIV
result MUST be representable in EMMA.</h4>
<p>VoiceXML 3.0 must specify the format of the result structure and version
of EMMA.</p>
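<p>A purely illustrative sketch of how an SIV decision and score might be
carried in an EMMA result follows. Only the emma:* container elements are
taken from the EMMA specification; the verification payload is hypothetical,
since VoiceXML 3.0 has not yet fixed the result structure.</p>
<pre>
&lt;emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"&gt;
  &lt;emma:interpretation id="siv1" emma:medium="acoustic" emma:mode="voice"
                       emma:confidence="0.87"&gt;
    &lt;!-- hypothetical application payload --&gt;
    &lt;verification-result&gt;
      &lt;decision&gt;accepted&lt;/decision&gt;
      &lt;score&gt;0.87&lt;/score&gt;
    &lt;/verification-result&gt;
  &lt;/emma:interpretation&gt;
&lt;/emma:emma&gt;
</pre>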
<h4><a id="S2_3_1_9" name="S2_3_1_9"></a>2.3.1.9 SIV syntax SHOULD adhere to
the W3C guidelines for security handling.</h4>
<p>This includes:</p>
<ul>
<li>XML encryption</li>
<li>XML signature processing,</li>
<li>possibly TLS or non-XML security, such as the NIST SP 800-63 guideline
for remote authentication.</li>
</ul>
<p>The following security aspects are out-of-charter for VoiceXML:<br />
</p>
<ul>
<li>The security administrator's interface</li>
<li>Whether security aspects may be modified by the security
administrators</li>
<li>Requirements for securing the SIV data</li>
</ul>
<h4><a id="S2_3_1_11" name="S2_3_1_11"></a>2.3.1.11 SIV features MUST support
enrollment.</h4>
<p>Enrollment is the process of collecting voice samples from a person and
the subsequent generation and storage of voice reference models associated
with that person.</p>
<h4><a id="S2_3_1_12" name="S2_3_1_12"></a>2.3.1.12 SIV features MUST support
verification.</h4>
<p>Verification is the process of comparing an utterance against a single
reference model based on a single claimed identity (e.g., user ID, account
number). A verification result includes both a score and a decision.</p>
<h4><a id="S2_3_1_13" name="S2_3_1_13"></a>2.3.1.13 SIV features MUST support
identification.</h4>
<p>Identification is verification with multiple identity claims. An
identification result includes both the verification results for all of the
individual identity claims, and the identifier of a single reference model
that matches the input utterance best.</p>
<h4><a id="S2_3_1_14" name="S2_3_1_14"></a>2.3.1.14 SIV features SHOULD
support supervised adaptation.</h4>
<p>The application should have control over whether a voice model is updated
or modified based on the results of a verification.<br />
</p>
<h4><a id="S2_3_1_15" name="S2_3_1_15"></a>2.3.1.15 SIV features MUST support
concurrent SIV processing.</h4>
<p>An application developer must be able to specify at the individual turn
level that one or more of the following types of processing need to be
performed concurrently:</p>
<ul>
<li>ASR</li>
<li>Audio recording</li>
<li>Buffering (SIV)</li>
<li>Authentication (SIV)</li>
<li>Enrollment (SIV)</li>
<li>Adaptation (SIV)</li>
</ul>
<p>Note: "Concurrent" means at the dialog specification level. A platform may
choose to implement these functions sequentially.</p>
<h4><a id="S2_3_1_15_1" name="S2_3_1_15_1"></a>2.3.1.15.1 SIV features SHOULD
support other concurrent audio processing.</h4>
<p>Concurrent processing of other forms of audio processing (e.g., channel
detection, gender detection) should also be permitted but remain optional.</p>
<h4><a id="S2_3_1_16" name="S2_3_1_16"></a>2.3.1.16 SIV features MUST be able
to accept text from the application for presentation to the user.</h4>
<p>Text-prompted SIV applications require prompts to match the expected
response. The application is responsible for the content of the dialog but
VoiceXML is responsible for the presentation.</p>
<h4><a id="S2_3_1_16_1" name="S2_3_1_16_1"></a>2.3.1.16.1 SIV SHOULD be
architecturally agnostic</h4>
<p>Many different SIV processing technologies exist. The VoiceXML 3.0 SIV
architecture should avoid dependencies upon specific engine technologies.</p>
<h3><a id="funct-event" name="funct-event">2.4 External Event Handling while
a dialog is in progress (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
<h4><a id="S2_4_1" name="S2_4_1"></a>2.4.1 It MUST be possible for external
entities to inject events into running dialogs. The dialog author MUST be
able to control when such events are processed and what actions are taken
when they are processed.</h4>
<h4><a id="S2_4_2" name="S2_4_2"></a>2.4.2 Among the possible results of
processing such events MUST be pausing, resuming, and terminating the dialog.
The VoiceXML 3.0 specification MAY define default handlers for certain such
external events.</h4>
<h4><a id="S2_4_3" name="S2_4_3"></a>2.4.3 It MUST be possible for running
dialogs to send events into the <a
href="http://www.w3.org/TR/mmi-arch/">Multimodal Interaction
Framework.</a></h4>
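<p>As an informative sketch, an externally injected event could be handled
with the existing VoiceXML catch mechanism; the event name below is
hypothetical, and per 2.4.1 the author decides when such events are processed
and what actions are taken.</p>
<pre>
&lt;!-- Hypothetical event name; catch, log and exit are existing VoiceXML 2.x
     elements --&gt;
&lt;catch event="external.supervisor.terminate"&gt;
  &lt;log&gt;External termination request received; ending the dialog.&lt;/log&gt;
  &lt;exit/&gt;
&lt;/catch&gt;
</pre>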
<h3><a id="funct-pls" name="funct-pls"></a>2.5 <a
href="http://www.w3.org/TR/pronunciation-lexicon/">Pronunciation Lexicon
Specification (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: There was some discussion in Orlando F2F on being -->
<!-- able to define lexicons using normal scoping rules, but there was no -->
<!-- agreement reached</span> </p> -->
<h4><a id="S2_5_1" name="S2_5_1"></a>2.5.1 The author MUST be able to define
lexicons that span an entire VoiceXML application.</h4>
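<p>For illustration, the pronunciations themselves would come from a
Pronunciation Lexicon Specification (PLS) 1.0 document such as the sketch
below; the mechanism for attaching such a lexicon at application scope is
exactly what this requirement leaves to the VoiceXML 3.0 specification.</p>
<pre>
&lt;!-- A small PLS 1.0 lexicon (IPA pronunciation) --&gt;
&lt;lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"&gt;
  &lt;lexeme&gt;
    &lt;grapheme&gt;tomato&lt;/grapheme&gt;
    &lt;phoneme&gt;təˈmeɪtoʊ&lt;/phoneme&gt;
  &lt;/lexeme&gt;
&lt;/lexicon&gt;
</pre>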
<h3><a id="funct-emma" name="funct-emma"></a>2.6 <a
href="http://www.w3.org/TR/emma/">EMMA Specification (must have)</a></h3>
<h4><a id="S2_6_1_" name="S2_6_1_"></a>2.6.1. The application author MUST be
able to specify the preferred format of the input result within VoiceXML. If
not specified, the default format is EMMA.</h4>
<h4><a id="S2_6_2" name="S2_6_2"></a>2.6.2 All available semantic information
(i.e. content that could have meaning) from the input MUST be accessible to
the application author. This result MUST be navigable by the application
author.</h4>
<p>The exact form of navigation will depend on the format and decisions
around the preferred data model made by the working group. If the result is a
string, string processing functions are expected to be available. If the
result is an XML document, DOM or E4X-like functions are expected to be
supported.</p>
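<p>For example, a recognition result delivered as an EMMA document might look
like the informative sketch below (the element content inside each
interpretation is application-specific); with such an XML result, the author
would navigate it using the DOM or E4X-like functions mentioned above.</p>
<pre>
&lt;emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"&gt;
  &lt;emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"&gt;
    &lt;emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights from boston to denver"&gt;
      &lt;origin&gt;Boston&lt;/origin&gt;
      &lt;destination&gt;Denver&lt;/destination&gt;
    &lt;/emma:interpretation&gt;
    &lt;emma:interpretation id="int2" emma:confidence="0.20"
                         emma:tokens="flights from austin to denver"&gt;
      &lt;origin&gt;Austin&lt;/origin&gt;
      &lt;destination&gt;Denver&lt;/destination&gt;
    &lt;/emma:interpretation&gt;
  &lt;/emma:one-of&gt;
&lt;/emma:emma&gt;
</pre>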
<h4><a id="S2_6_3_" name="S2_6_3_"></a>2.6.3. VoiceXML 3 (or profiles) MUST
describe how the default result format is mapped into the application's data
model.</h4>
<p>VoiceXML 3 will declare one or more mandatory result formats.</p>
<h4><a id="S2_6_4" name="S2_6_4"></a>2.6.4 The application author SHOULD be
able to specify specific result content not to be logged.</h4>
<p>This will allow the author to prevent logging of confidential or sensitive
information.</p>
<h3><a id="funct-upload" name="funct-upload">2.7 Synchronous Upload of
Recordings (must have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_7_1" name="S2_7_1"></a>2.7.1 VoiceXML 3.0 MUST enable
synchronous uploads of recordings while the recording is in progress</h4>
<h4><a id="S2_7_1_1" name="S2_7_1_1"></a>2.7.1.1 It MUST be possible to
specify the upload destination of the recording in the &lt;record&gt;
element</h4>
<h4><a id="S2_7_1_2" name="S2_7_1_2"></a>2.7.1.2 The upload destination MUST
be an HTTP URI</h4>
<h4><a id="S2_7_1_3" name="S2_7_1_3"></a>2.7.1.3 The application developer
MAY specify HTTP PUT or HTTP POST as the recording upload method</h4>
<h4><a id="S2_7_1_4" name="S2_7_1_4"></a>2.7.1.4 This feature MUST be
backward compatible with VoiceXML 2.0/2.1 record functionality</h4>
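<p>A possible shape for this feature is sketched below. The dest and method
attribute names are hypothetical (requirements 2.7.1.1 through 2.7.1.3 fix
the capability, not the syntax); the remaining attributes are existing
VoiceXML 2.x &lt;record&gt; syntax.</p>
<pre>
&lt;!-- "dest" and "method" are hypothetical attribute names --&gt;
&lt;record name="msg" beep="true" maxtime="60s" type="audio/x-wav"
        dest="http://example.com/uploads/message" method="PUT"/&gt;
</pre>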
<h3><a id="funct-speed" name="funct-speed">2.8 Speed Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_8_1" name="S2_8_1"></a>2.8.1 It MUST be possible for user input
to change the speed of media output playback.</h4>
<h4><a id="S2_8_2" name="S2_8_2"></a>2.8.2 It MUST be possible to map the
values for speed control to the rate attribute of prosody</h4>
<h4><a id="S2_8_3" name="S2_8_3"></a>2.8.3 Values for speed controls MAY be
specified as properties which follow the standard VoiceXML scoping model.
Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>
<h3><a id="funct-volume" name="funct-volume">2.9 Volume Control (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
<!-- <span class="note">Note: Emily reviewed and felt these were -->
<!-- complete</span></p> -->
<h4><a id="S2_9_1" name="S2_9_1"></a>2.9.1 It MUST be possible for user input
to change the volume of media output playback.</h4>
<h4><a id="S2_9_1_1" name="S2_9_1_1"></a>2.9.1.1 Values for volume controls
MAY be specified as properties which follow the standard VoiceXML scoping
model. Default values are specified at session scope. Values specified on the
control element take priority over inherited properties.</h4>
<h4><a id="S2_9_1_2" name="S2_9_1_2"></a>2.9.1.2 It MUST be possible to map
the values for volume control to the volume attribute of prosody in SSML.</h4>
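<p>The mappings called for in 2.8.2 and 2.9.1.2 might, for example, take
user-selected values and apply them as SSML prosody attributes. In the
informative sketch below the property names are hypothetical, while the
prosody attributes are existing SSML 1.0 syntax.</p>
<pre>
&lt;!-- Hypothetical property names; the prosody mapping is what 2.8.2 and
     2.9.1.2 require --&gt;
&lt;property name="playbackspeed" value="fast"/&gt;
&lt;property name="playbackvolume" value="soft"/&gt;
&lt;prompt&gt;
  &lt;prosody rate="fast" volume="soft"&gt;Playing your messages.&lt;/prosody&gt;
&lt;/prompt&gt;
</pre>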
<h3><a id="funct-record" name="funct-record">2.10 Media Recording (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Ken Rehor</span></p> -->
<h4><a id="S2_10_1" name="S2_10_1"></a>2.10.1 Recording Modes</h4>
<p>Form item recording mode (Requirements sections 2.10.1.1 and 2.10.1.2)
captures media from the caller (only) during the collect phase of a dialog.
Partial- and Whole-Session recording captures media from the caller, system,
and/or called party (in the case of a transferred endpoint) in a
multichannel or single (mixed) channel recording. The duration of these
recordings depends on the type.</p>
<h4><a id="S2_10_1_1" name="S2_10_1_1"></a>2.10.1.1 Form Item equivalent
(e.g. VoiceXML 2.0 &lt;record&gt;)</h4>
<!-- <span class="note">Note: Audio endpointing controls are defined in Section
2.10.3.</span> -->
<h4><a id="S2_10_1_1_1" name="S2_10_1_1_1"></a>2.10.1.1.1 VoiceXML 3.0 MUST
be able to record input from a user.</h4>
<h4><a id="S2_10_1_2" name="S2_10_1_2"></a>2.10.1.2 Utterance Recording</h4>
<!-- <span class="note">Note: Should this be generalized to handle other media -->
<!-- like video?<br /> -->
<!-- Note: Should this be supported in the case of DTMF-only?</span> -->
<p>Utterance recording mode is recording that occurs during an ASR or SIV
form item. The audio may be endpointed, usually by the speech engine.</p>
<h4><a id="S2_10_1_2_1" name="S2_10_1_2_1"></a>2.10.1.2.1 VoiceXML 3.0 MUST
support recording of a user's utterance during a form item
[recordutterance]</h4>
<h4><a id="S2_10_1_2_2" name="S2_10_1_2_2"></a>2.10.1.2.2 VoiceXML 3.0 MUST
support the control of utterance recording via a &lt;property&gt;.</h4>
<h4><a id="S2_10_1_2_3" name="S2_10_1_2_3"></a>2.10.1.2.3 VoiceXML 3.0 MUST
support the control of utterance recording via an attribute on input
items.</h4>
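<p>VoiceXML 2.1 already provides this control through the recordutterance and
recordutterancetype properties, so an informative sketch of the likely shape
of the 3.0 facility is shown below (the grammar reference is illustrative
only).</p>
<pre>
&lt;!-- Existing VoiceXML 2.1 properties enabling utterance recording during a
     form item --&gt;
&lt;property name="recordutterance" value="true"/&gt;
&lt;property name="recordutterancetype" value="audio/x-wav"/&gt;
&lt;field name="account"&gt;
  &lt;prompt&gt;Please say your account number.&lt;/prompt&gt;
  &lt;grammar src="account.grxml" type="application/srgs+xml"/&gt;
&lt;/field&gt;
</pre>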
<h4><a id="S2_10_1_3" name="S2_10_1_3"></a>2.10.1.3 Session Recording</h4>
<p>Session recording begins with a start command. It continues until:</p>
<ul>
<li>a pause command; a resume command continues recording;</li>
<li>a stop command;</li>
<li>the end of the VoiceXML session;</li>
<li>an error occurs.</li>
</ul>
<p>Recording configuration and parameter requirements are defined in Section
2.10.2.</p>
<h4><a id="S2_10_1_3_1" name="S2_10_1_3_1"></a>2.10.1.3.1 VoiceXML 3.0 MUST
be able to record part of a VoiceXML session.</h4>
<h4><a id="S2_10_1_3_2" name="S2_10_1_3_2"></a>2.10.1.3.2 VoiceXML 3.0 MUST
be able to record an entire dialog.</h4>
<h4><a id="S2_10_1_4" name="S2_10_1_4"></a>2.10.1.4 Restricted Session
Recording</h4>
<p>Restricted session recording begins with a start command and continues
until:</p>
<ul>
<li>the end of the session;</li>
<li>an error occurs.</li>
</ul>
<p>See Table 1 for applicable controls.</p>
<h4><a id="S2_10_1_5" name="S2_10_1_5"></a>2.10.1.5 Multiple instances</h4>
<h4><a id="S2_10_1_5_1" name="S2_10_1_5_1"></a>2.10.1.5.1 VoiceXML 3.0 MUST
be able to support multiple simultaneous recordings of different types during
a call.</h4>
<h4><a id="S2_10_2_" name="S2_10_2_"></a>2.10.2. Recording Configuration and
Parameters</h4>
<p>This matrix specifies which features apply to which recording types.</p>
<table style="text-align: left; width: 722px; height: 166px;" border="1"
cellpadding="1" cellspacing="0">
<tbody>
<tr>
<td>Feature Requirement /<br />
Recording type</td>
<td>Dialog</td>
<td>Utterance</td>
<td>Session</td>
<td>Restricted<br />
Session</td>
</tr>
<tr>
<td>2.10.2.1 Recording starts when caller begins speaking</td>
<td>Y</td>
<td>Y</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.2 Initial silence interval cancels recording</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.3 Final silence ends recording</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.4 Maximum recording time</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.5 Terminate recording with DTMF input</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.6 Grammar control: modal operation</td>
<td>Y</td>
<td>N</td>
<td>N</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.7 Media format</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.8 Recording indicator</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>N</td>
</tr>
<tr>
<td>2.10.2.9 Channel assignment</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.10 Channel groups</td>
<td>N</td>
<td>N</td>
<td>Y</td>
<td>Y</td>
</tr>
<tr>
<td>2.10.2.11 Buffer control</td>
<td>Y</td>
<td>Y</td>
<td>N</td>
<td>N</td>
</tr>
</tbody>
</table>
<p>Table 1: Recording Configuration and Parameter Application</p>
<p>(Attributes from VoiceXML 2.0 are indicated in brackets [].)</p>
<h4><a id="S2_10_2_1" name="S2_10_2_1"></a>2.10.2.1 Recording starts when
caller begins speaking</h4>
<p>VoiceXML 3.0 must support dynamic start-of-recording based on when a
caller starts to speak.</p>
<p>Voice Activity Detection is used to determine when to initiate
recording. This feature can be disabled.</p>
<h4><a id="S2_10_2_2" name="S2_10_2_2"></a>2.10.2.2 Initial silence interval
cancels recording</h4>
<p>VoiceXML 3.0 must support specification of an interval of silence at the
beginning of the recording cycle to terminate recording [timeout].</p>
<p>A noinput event will be thrown if no audio is collected.</p>
<h4><a id="S2_10_2_3" name="S2_10_2_3"></a>2.10.2.3 Final silence ends
recording</h4>
<p>VoiceXML 3.0 must support specification of an interval of silence that
indicates the end of speech and terminates recording [finalsilence].</p>
<p>Voice Activity Detection is used to determine when to stop recording. This
feature can be disabled.</p>
<p>The finalsilence interval may be used to specify the amount of silent audio
to be removed from the recording.</p>
<h4><a id="S2_10_2_4" name="S2_10_2_4"></a>2.10.2.4 Maximum recording
time</h4>
<p>VoiceXML 3.0 must support specification of the maximum allowable recording
time [maxtime].</p>
<h4><a id="S2_10_2_5" name="S2_10_2_5"></a>2.10.2.5 Terminate recording via
DTMF input</h4>
<p>VoiceXML 3.0 must provide a mechanism to control DTMF termination of an
active recording [dtmfterm].</p>
<h4><a id="S2_10_2_6" name="S2_10_2_6"></a>2.10.2.6 Grammar control: Modal
operation</h4>
<h4><a id="S2_10_2_6_1" name="S2_10_2_6_1"></a>2.10.2.6.1 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local DTMF grammars are active
during recording [modal]</h4>
<h4><a id="S2_10_2_6_2" name="S2_10_2_6_2"></a>2.10.2.6.2 VoiceXML 3.0 MUST
provide a mechanism to control whether non-local speech recognition grammars
are active during recording [modal]</h4>
<h4><a id="S2_10_2_7" name="S2_10_2_7"></a>2.10.2.7 Media format</h4>
<p>VoiceXML 3.0 must enable specification of the media type of the recording
[type].</p>
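<p>The bracketed names in 2.10.2.1 through 2.10.2.7 correspond to existing
VoiceXML 2.0 &lt;record&gt; attributes and properties; an informative
form-item recording using them might look like the sketch below, with the
initial-silence interval set through the standard timeout property.</p>
<pre>
&lt;!-- Informative sketch using existing VoiceXML 2.0 syntax --&gt;
&lt;property name="timeout" value="5s"/&gt;
&lt;record name="greeting" beep="true" modal="true" finalsilence="2s"
        maxtime="30s" dtmfterm="true" type="audio/x-wav"&gt;
  &lt;prompt&gt;Record your greeting after the tone.&lt;/prompt&gt;
&lt;/record&gt;
</pre>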
<h4><a id="S2_10_2_8" name="S2_10_2_8"></a>2.10.2.8 Recording Indicator</h4>
<h4><a id="S2_10_2_8_1" name="S2_10_2_8_1"></a>2.10.2.8.1 VoiceXML 3.0 MUST
optionally support playing a beep tone to the user before recording begins.
[beep]</h4>
<h4><a id="S2_10_2_8_2" name="S2_10_2_8_2"></a>2.10.2.8.2 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user before
recording begins.</h4>
<h4><a id="S2_10_2_8_3" name="S2_10_2_8_3"></a>2.10.2.8.3 VoiceXML 3.0 MUST
optionally support displaying a visual indication to the user during
recording.</h4>
<p>Use cases:</p>
<ol>
<li>Display a countdown timer to indicate when recording will begin (could
be accomplished by playing a file immediately before the record
function)</li>
<li>Display an indicator while recording is active (e.g. full screen,
partial screen, icon, etc.)</li>
</ol>
<h4><a id="S2_10_2_9" name="S2_10_2_9"></a>2.10.2.9 Channel Assignment</h4>
<h4><a id="S2_10_2_9_1" name="S2_10_2_9_1"></a>2.10.2.9.1 VoiceXML 3.0 MUST
be able to record and store each media path independently.</h4>
<h4><a id="S2_10_2_9_2" name="S2_10_2_9_2"></a>2.10.2.9.2 VoiceXML 3.0 MUST
enable each media path to be recorded in the same multi-channel file.</h4>
<h4><a id="S2_10_2_9_3" name="S2_10_2_9_3"></a>2.10.2.9.3 VoiceXML 3.0 MUST
enable each media path to be recorded into separate files.</h4>
<h4><a id="S2_10_2_9_4" name="S2_10_2_9_4"></a>2.10.2.9.4 VoiceXML 3.0 MAY be
able to mix all voice paths into a single recording channel.</h4>
<h4><a id="S2_10_2_10" name="S2_10_2_10"></a>2.10.2.10 Channel Groups</h4>
<h4><a id="S2_10_2_10_1" name="S2_10_2_10_1"></a>2.10.2.10.1 One or more
channels within the same session MUST be controllable as a group.</h4>
<p>These groups can be used to apply other recording controls to more than
one media channel at once (e.g. mute two channels simultaneously). This
applies whether the channels are in the same file or in separate files (which
implies that a group of channels need not be part of the same file).</p>
<p>A command to "start recording" must specify the details for that recording
session:</p>
<ul>
<li>media type</li>
<li>number of channels and channel assignment (e.g. channel x, group y
represented as a variable of the format x.y)</li>
<li>channel assignment</li>
<li>(specific parameters to be determined)</li>
</ul>
<h4><a id="S2_10_2_11" name="S2_10_2_11"></a>2.10.2.11 Buffer Controls</h4>
<h4><a id="S2_10_2_11_1" name="S2_10_2_11_1"></a>2.10.2.11.1 VoiceXML 3.0
MUST provide a mechanism to enable additional recording time before the start
of speaking ("pre" buffer)</h4>
<h4><a id="S2_10_2_11_2" name="S2_10_2_11_2"></a>2.10.2.11.2 VoiceXML 3.0
MUST provide a mechanism to enable specification of additional recording time
after the end of speaking ("post" buffer).</h4>
<h4><a id="S2_10_2_11_3" name="S2_10_2_11_3"></a>2.10.2.11.3 VoiceXML 3.0 MAY
provide a mechanism to enable specification of the pre and post recording
duration.</h4>
<p>The duration provided by the platform is up to the amount of audio the
application requested. If that amount of audio is not available, the platform
is required to provide the amount of audio that is available.</p>
<!-- <span class="note">Note: Should this feature be under developer or platform -->
<!-- control?</span> -->
<h4><a id="S2_10_3_1" name="S2_10_3_1"></a>2.10.3.1 Audio Muting</h4>
<h4><a id="S2_10_3_1_1" name="S2_10_3_1_1"></a>2.10.3.1.1 VoiceXML 3.0 MUST
enable muting of an audio recording at any time for a specified length of
time or until otherwise indicated to un-mute.</h4>
<h4><a id="S2_10_3_1_2" name="S2_10_3_1_2"></a>2.10.3.1.2 Audio to insert
while muting can optionally be specified via a URI.</h4>
<!-- <span class="note">Note: Issues arise if inserted audio is shorter than mute -->
<!-- duration.</span> -->
<h4><a id="S2_10_3_1_3" name="S2_10_3_1_3"></a>2.10.3.1.3 Optionally record
the mute duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
<!-- <span class="note">Note: Is it a breach of security to keep track of the -->
<!-- mute/blank/pause duration?</span> -->
<h4><a id="S2_10_3_1_5" name="S2_10_3_1_5"></a>2.10.3.1.5 Mute MUST be
controllable for each channel independently.</h4>
<h4><a id="S2_10_3_1_6" name="S2_10_3_1_6"></a>2.10.3.1.6 Mute MUST be
controllable for all channels in a group.</h4>
<h4><a id="S2_10_3_2" name="S2_10_3_2"></a>2.10.3.2 Blanking</h4>
<h4><a id="S2_10_3_2_1" name="S2_10_3_2_1"></a>2.10.3.2.1 VoiceXML 3.0 MUST
enable blanking of a video recording at any time for a specified length of
time or until otherwise indicated to un-blank.</h4>
<h4><a id="S2_10_3_2_2" name="S2_10_3_2_2"></a>2.10.3.2.2 A video or still
image to replace video stream while blanking can be optionally specified via
a URI.</h4>
<h4><a id="S2_10_3_2_2_1" name="S2_10_3_2_2_1"></a>2.10.3.2.2.1 An error will
be thrown in the case of platforms that cannot handle the media type referred
to by the URI.</h4>
<h4><a id="S2_10_3_2_3" name="S2_10_3_2_3"></a>2.10.3.2.3 The media inserted
by default MUST be the same length as the blank duration.</h4>
<p>If video, repeat until un-blank.</p>
<h4><a id="S2_10_3_2_4" name="S2_10_3_2_4"></a>2.10.3.2.4 It MUST be possible
to specify that the inserted video spans a length less than the actual
blank/un-blank duration.</h4>
<h4><a id="S2_10_3_2_5" name="S2_10_3_2_5"></a>2.10.3.2.5 Blanking MUST be
controllable separately from other media channels.</h4>
<h4><a id="S2_10_3_3" name="S2_10_3_3"></a>2.10.3.3 Grouped Blanking and
Muting</h4>
<h4><a id="S2_10_3_3_1" name="S2_10_3_3_1"></a>2.10.3.3.1 It MUST be possible
to simultaneously blank video and mute audio that are in the same media
group.</h4>
<h4><a id="S2_10_3_4" name="S2_10_3_4"></a>2.10.3.4 Pause and Resume</h4>
<h4><a id="S2_10_3_4_1" name="S2_10_3_4_1"></a>2.10.3.4.1 VoiceXML 3.0 MUST
enable a recording to be paused until explicitly restarted.</h4>
<h4><a id="S2_10_3_4_2" name="S2_10_3_4_2"></a>2.10.3.4.2 VoiceXML 3.0 MUST
enable an indicator to be optionally specified in the file to denote that
recording was paused, then resumed.</h4>
<h4><a id="S2_10_3_4_3" name="S2_10_3_4_3"></a>2.10.3.4.3 VoiceXML 3.0 MAY
optionally enable the notation of the pause duration either in the recorded
data or in associated meta data (e.g. a mark (out of band) or via a log
channel or some other method)</h4>
<p>The mechanism is platform-specific.</p>
<h4><a id="S2_10_3_5" name="S2_10_3_5"></a>2.10.3.5 Arbitrary Start, Stop,
Restart/append</h4>
<h4><a id="S2_10_3_5_1" name="S2_10_3_5_1"></a>2.10.3.5.1 VoiceXML 3.0 MUST
be able to start a recording at any time.</h4>
<h4><a id="S2_10_3_5_2" name="S2_10_3_5_2"></a>2.10.3.5.2 VoiceXML 3.0 MUST
be able to stop an active recording at any time.</h4>
<h4><a id="S2_10_3_5_3" name="S2_10_3_5_3"></a>2.10.3.5.3 VoiceXML 3.0 MUST
be able to restart / append to a previously active recording at any time.
(during the session via reference to the recording)</h4>
<h4><a id="S2_10_3_5_4" name="S2_10_3_5_4"></a>2.10.3.5.4 Optionally record
the pause duration either in the recorded data or in associated meta data
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
<p>Recording is available for playback or upload once a recording is
'stopped'.</p>
<p>If a recording was stopped and uploaded, then later appended, the
application will need to keep track of when to upload the new version.</p>
<h4><a id="S2_10_4_" name="S2_10_4_"></a>2.10.4. Media types</h4>
<h4><a id="S2_10_4_1" name="S2_10_4_1"></a>2.10.4.1 Audio recording</h4>
<h4><a id="S2_10_4_1_1" name="S2_10_4_1_1"></a>2.10.4.1.1 VoiceXML 3.0 MUST
be able to record an incoming audio stream.</h4>
<h4><a id="S2_10_4_2" name="S2_10_4_2"></a>2.10.4.2 Video recording</h4>
<h4><a id="S2_10_4_2_1" name="S2_10_4_2_1"></a>2.10.4.2.1 VoiceXML 3.0 MUST
support recording of an incoming video stream.</h4>
<h4><a id="S2_10_4_2_2" name="S2_10_4_2_2"></a>2.10.4.2.2 VoiceXML 3.0 MUST
support recording of an incoming video stream with synchronized audio.</h4>
<h4><a id="S2_10_4_3" name="S2_10_4_3"></a>2.10.4.3 Media Type
specification</h4>
<h4><a id="S2_10_4_3_1" name="S2_10_4_3_1"></a>2.10.4.3.1 VoiceXML 3.0 MUST
be able to set the format of the media type of the recording according to
IETF RFC 4288 [RFC4288].</h4>
<h4><a id="S2_10_4_4" name="S2_10_4_4"></a>2.10.4.4 Media formats and
codecs</h4>
<h4><a id="S2_10_4_4_1" name="S2_10_4_4_1"></a>2.10.4.4.1 VoiceXML 3.0 MUST
support specification of the media format and corresponding codec.</h4>
<h4><a id="S2_10_4_5" name="S2_10_4_5"></a>2.10.4.5 Platform support of media
types</h4>
<h4><a id="S2_10_4_5_1" name="S2_10_4_5_1"></a>2.10.4.5.1 VoiceXML 3.0
platforms MUST support all media types that are indicated as required by the
VoiceXML 3.0 Recommendation (types to be determined).</h4>
<p>Note: This does not mean all possible media types are supported on all
platforms.</p>
<h4><a id="S2_10_5_" name="S2_10_5_"></a>2.10.5. Media Processing</h4>
<h4><a id="S2_10_5_1" name="S2_10_5_1"></a>2.10.5.1 Media processing MAY
occur either in real-time or as a post-processing function.</h4>
<p>DEFAULT: specific to each processing type</p>
<h4><a id="S2_10_5_2" name="S2_10_5_2"></a>2.10.5.2 Tone Clamping</h4>
<p>Use cases:</p>
<ol>
<li>Voicemail terminated with DTMF.</li>
<li>Whole-session recording where DTMF input must be removed for privacy or
other reasons.</li>
</ol>
<h4><a id="S2_10_5_2_1" name="S2_10_5_2_1"></a>2.10.5.2.1 VoiceXML 3.0 MAY
optionally provide a means to specify if DTMF tones are to be removed from
the recording.</h4>
<p>DEFAULT: Tones are not removed from the recording</p>
<p>DEFAULT: If tone clamping is enabled, it is performed after recording has
completed (not in real-time).</p>
<h4><a id="S2_10_5_3" name="S2_10_5_3"></a>2.10.5.3 Audio Processing Mode</h4>
<h4><a id="S2_10_5_3_1" name="S2_10_5_3_1"></a>2.10.5.3.1 VoiceXML 3.0 MUST
optionally provide a means to specify if automatic audio level controls (e.g.
Dynamic Range Compression, Limiting, Automatic Gain Control (AGC), etc.) are
to be applied to the recording or if the recording is to be raw.</h4>
<p>DEFAULT: raw</p>
<p>Editor's note: how to specify:</p>
<ul>
<li>raw or processed</li>
<li>type of processing</li>
<li>parameters specific to each processor or implementation</li>
<li>multiple processing operations (?)</li>
<li>real-time or post-processing</li>
</ul>
<h4><a id="S2_10_6_" name="S2_10_6_"></a>2.10.6. Recording data</h4>
<h4><a id="S2_10_6_1" name="S2_10_6_1"></a>2.10.6.1 The following information
MUST be reported after recording has completed.</h4>
<ul>
<li>Recording duration in milliseconds</li>
<li>Recording size in bytes</li>
<li>DTMF terminating string if recording was terminated via DTMFTERM, or
DTMF input available in application.lastresult</li>
<li>Indication if recording was terminated due to reaching maxtime</li>
<li>Format of the recording, as specified by RFC 4288</li>
</ul>
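<p>As a non-normative illustration of how this data surfaces today, the
VoiceXML 2.0 fragment below reads the <record> shadow variables once the
recording completes; the upload URI, form and field names are
placeholders.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">
    <record name="msg" beep="true" maxtime="60s" dtmfterm="true"
            type="audio/x-wav">
      <prompt>Please leave your message after the beep.</prompt>
      <filled>
        <!-- Post-recording data is reported via shadow variables -->
        <log>duration (ms): <value expr="msg$.duration"/></log>
        <log>size (bytes): <value expr="msg$.size"/></log>
        <log>terminating DTMF: <value expr="msg$.termchar"/></log>
        <log>stopped at maxtime: <value expr="msg$.maxtime"/></log>
        <!-- The recording format is whatever was requested in 'type' above -->
        <submit next="http://example.com/upload" namelist="msg"
                method="post" enctype="multipart/form-data"/>
      </filled>
    </record>
  </form>
</vxml>
</pre>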
<h4><a id="S2_10_7" name="S2_10_7"></a>2.10.7 Upload, Storage, Caching</h4>
<h4><a id="S2_10_7_1" name="S2_10_7_1"></a>2.10.7.1 Destination</h4>
<h4><a id="S2_10_7_1_1" name="S2_10_7_1_1"></a>2.10.7.1.1 VoiceXML 3.0 MUST
support specification of the destination of the recording buffer [dest].</h4>
<h4><a id="S2_10_7_3" name="S2_10_7_3"></a>2.10.7.3 A local cache of the
recording MUST be optionally available to the application (e.g. V2 semantics
of form item)</h4>
<h4><a id="S2_10_7_4" name="S2_10_7_4"></a>2.10.7.4 It MUST be possible to
specify the upload to be either a synchronous or asynchronous operation.</h4>
<h4><a id="S2_10_7_5" name="S2_10_7_5"></a>2.10.7.5 It MUST be possible to
select the upload to be available realtime, at the end of the call, or
indefinitely after the end of the call.</h4>
<h4><a id="S2_10_7_6" name="S2_10_7_6"></a>2.10.7.6 All modes other than
indefinite upload shall expose any errors in recording or upload to the
application.</h4>
<h4><a id="S2_10_8_" name="S2_10_8_"></a>2.10.8. Errors and Events</h4>
<p>Errors and events as a result of media recording must be presented to the
application.</p>
<p>Examples of types of errors possibly reported:</p>
<ul>
<li>error.unsupported.format (the requested media type is not
supported)</li>
<li>error.unavailable.format (the requested media type is currently not
available)</li>
<li>error during upload</li>
<li>disk full, other disk errors</li>
<li>permissions: error.noauthorization (or error.noresource if it should be
hidden from a potential attacker)</li>
</ul>
<h3><a id="funct-mediaformat" name="funct-mediaformat">2.11 Media
Formats</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Note: These were recently added on the 6/24/2008 -->
<!-- call.</span></p> -->
<h4><a id="CS2_10_8_" name="CS2_10_8_"></a>VoiceXML 3 MUST support these
categories of media capabilities:</h4>
<ul>
<li>Audio Basic: audio only, with header or not (e.g. RIFF or AU
header)</li>
<li>Audio Rich: audio (one or more channels), plus meta data (e.g. header,
marks, transcription, etc.)</li>
<li>Multi-media: one or more media channels (e.g. audio, video, images,
etc.) plus meta data (e.g. header, marks, transcription, etc.)</li>
</ul>
<p>This does not imply platform support requirements. For example, a
particular platform may support Audio Basic but not Audio Rich. Another might
support Audio Rich but not all meta data elements.</p>
<h3><a id="funct-datamodel" name="funct-datamodel">2.12 Data Model (must
have)</a></h3>
<p>TBD.</p>
<h3><a id="funct-submitprocessing" name="funct-submitprocessing">2.11 Submit
Processing (must have)</a></h3>
<p>TBD.</p>
<h2><a id="format-reqs" name="format-reqs">3. Format Requirements</a></h2>
<h3><a id="format-flow" name="format-flow">3.1 Flow Language (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
<p>A flow control language will be developed in conjunction with VoiceXML 3.0
(i.e. <a href="http://www.w3.org/TR/scxml/">SCXML</a>).</p>
<h4><a id="S3_1_1" name="S3_1_1"></a>3.1.1 The flow control language will
allow the separation of business logic from media control and user
interaction.</h4>
<h4><a id="S3_1_2" name="S3_1_2"></a>3.1.2 The flow control language will be
able to invoke VoiceXML 3.0 scripts, passing data into them and receiving
results back when the scripts terminate.</h4>
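<p>The sketch below is a non-normative illustration of such an invocation in
SCXML. The invoke type 'vxml3', the document collect_account.vxml, the
maxDigits parameter and the returned 'account' property are assumptions made
for illustration; the <invoke>, <param> and <finalize> machinery is standard
SCXML.</p>
<pre>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0"
       initial="getAccount" datamodel="ecmascript">
  <datamodel>
    <data id="account"/>
  </datamodel>
  <state id="getAccount">
    <!-- 'vxml3' is a hypothetical invoke type; SCXML leaves the set of
         invokable processor types open -->
    <invoke type="vxml3" src="collect_account.vxml">
      <param name="maxDigits" expr="10"/>
      <finalize>
        <!-- Copy the result returned by the voice dialog -->
        <assign location="account" expr="_event.data.account"/>
      </finalize>
    </invoke>
    <transition event="done.invoke" target="routeCall"/>
  </state>
  <final id="routeCall"/>
</scxml>
</pre>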
<h4><a id="S3_1_3" name="S3_1_3"></a>3.1.3 The flow control language will be
suitable for use as an Interaction Manager in the Multimodal Architecture
Framework.</h4>
<h4><a id="S3_1_4" name="S3_1_4"></a>3.1.4 The flow control language will be
based on state-machine concepts.</h4>
<h4><a id="S3_1_5" name="S3_1_5"></a>3.1.5 The flow control language will be
able to receive asynchronous messages from external entities.</h4>
<h4><a id="S3_1_6" name="S3_1_6"></a>3.1.6 The flow control language will be
able to send messages to external entities.</h4>
<h4><a id="S3_1_7" name="S3_1_7"></a>3.1.7 The flow control language will not
contain any media-specific concepts such as ASR or TTS.</h4>
<h3><a id="format-semmod" name="format-semmod">3.2 Semantic Model Definition
(must have)</a></h3>
<!-- <p><span class="owner">Owner: Mike Bodell</span></p> -->
<h4><a id="S3_2_1" name="S3_2_1"></a>3.2.1 The precise semantics of all VXML
3.0 tags MUST be provided</h4>
<h4><a id="S3_2_2" name="S3_2_2"></a>3.2.2 The semantic model MUST be the
authoritative description of VXML 3.0 functionality</h4>
<h4><a id="S3_2_3" name="S3_2_3"></a>3.2.3 Different conformance profiles
MUST be possible, but they MUST be defined in terms of the semantic
model.</h4>
<h4><a id="S3_2_4" name="S3_2_4"></a>3.2.4 The semantic model descriptions of
VXML 3.0 MUST be able to express all of the functionality of VXML 2.1</h4>
<h4><a id="S3_2_5" name="S3_2_5"></a>3.2.5 Extensions to VXML 3.0 SHOULD be
able to build on the semantic model descriptions</h4>
<h2><a id="other-reqs" name="other-reqs">4. Other Requirements</a></h2>
<h3><a id="other-vxml" name="other-vxml">4.1 Consistent with other Voice
Browser Working Group specs (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett</span></p> -->
<h4><a id="S4_1_1" name="S4_1_1"></a>4.1.1 Wherever similar functionality to
that of another Voice Browser Working Group specification is available, this
language MUST use a syntax similar to that used in the relevant
specification.</h4>
<h4><a id="S4_1_2" name="S4_1_2"></a>4.1.2 For data that is likely to be
represented in another Voice Browser Working Group markup language (e.g., SRGS
or EMMA) or used by another Voice Browser Working Group language, there MUST
be a clear definition of the mapping between the two data
representations.</h4>
<h4><a id="S4_1_3" name="S4_1_3"></a>4.1.3 It MUST be possible to pass
Internet-related document and server information (caching parameters,
xml:base, etc.) from this language to other VBWG language processors for
embedded VBWG languages.</h4>
<h3><a id="other-other" name="other-other">4.2 Consistent with other specs
(XML, MMI, I18N, Accessibility, MRCP, Backplane Activities) (must
have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
<h4><a id="S4_2_1" name="S4_2_1"></a>4.2.1 MRCP</h4>
<h4><a id="S4_2_1_1" name="S4_2_1_1"></a>4.2.1.1 This language MUST support a
profile that can be implemented using MRCPv2.</h4>
<h4><a id="S4_2_1_2" name="S4_2_1_2"></a>4.2.1.2 Where possible, this
language SHOULD remain compatible with MRCPv2 in terms of data formats (SRGS,
SSML).</h4>
<h4><a id="S4_2_2_" name="S4_2_2_"></a>4.2.2. <a
href="http://www.w3.org/TR/mmi-arch/">MMI</a></h4>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span></p> -->
<p>There must be at least one profile of VoiceXML 3.0 in which all of the
following requirements are supported.</p>
<h4><a id="S4_2_2_1" name="S4_2_2_1"></a>4.2.2.1 It MUST be possible for
VoiceXML 3.0 implementations to receive, process, and generate MMI life cycle
events. Some events may be handled automatically, while others may be under
author control.</h4>
<h4><a id="S4_2_2_2" name="S4_2_2_2"></a>4.2.2.2 VoiceXML 3.0 MUST provide a
way for the author to specify the exact functions required for the
application such that the platform can allocate the minimum necessary
resources.</h4>
<h4><a id="S4_2_2_3" name="S4_2_2_3"></a>4.2.2.3 VoiceXML 3.0 MUST be able to
provide EMMA-formatted information inside the data field of MMI life cycle
events.</h4>
<h4><a id="S4_2_2_4" name="S4_2_2_4"></a>4.2.2.4 VoiceXML 3.0 platforms MUST
specify one or more event I/O processors for interoperable exchange of life
cycle events. The Voice Browser Group requests public comment on what such
event processors should be or whether they should be part of the language at
all.</h4>
<h3><a id="other-simplify" name="other-simplify">4.3 Simplify Existing
VoiceXML Tasks (must have)</a></h3>
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
<h4><a id="S4_3_1" name="S4_3_1"></a>4.3.1 This language MUST provide a
mechanism for authors to develop dialog managers (state-based, task-based,
rule-based, etc.) that are easily used and configured by other authors.</h4>
<h4><a id="S4_3_2" name="S4_3_2"></a>4.3.2 This language MUST provide
mechanisms to simplify authoring of these common tasks: (we need to collect a
list of common tasks)</h4>
<h3><a id="other-maintain" name="other-maintain">4.4 Maintain Functionality
from Previous VXML Versions</a></h3>
<h4><a id="S4_4_1" name="S4_4_1"></a>4.4.1 New features added in VoiceXML 3.0
MUST be backward compatible with previous VoiceXML versions</h4>
<h4><a id="S4_4_1_1" name="S4_4_1_1"></a>4.4.1.1 Functionality available in
VoiceXML 2.0 and VoiceXML 2.1 MUST be available in VoiceXML 3.0.</h4>
<h4><a id="S4_4_1_2" name="S4_4_1_2"></a>4.4.1.2 Applications written in
VoiceXML 2.0/2.1 MUST be portable to VoiceXML 3.0 without losing application
capabilities.</h4>
<h3><a id="other-crs" name="other-crs">4.5 Address Change Requests from
previous VoiceXML Versions (must have)</a></h3>
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
<!-- <span class="note">Reviewed all deferred and open change requests from VXML -->
<!-- 2.0/2.1</span></p> -->
<h4><a id="S4_5_1" name="S4_5_1"></a>4.5.1 Deferred change requests from VXML
2.0 and 2.1 reevaluated for VXML 3.0</h4>
<p>In particular, the following deferred CRs reevaluated: R51, R92, R104,
R113, R145, R155, R156, R186, R230, R233, R348, R394, R528, R541, and
R565.</p>
<h4><a id="S4_5_2" name="S4_5_2"></a>4.5.2 Unassigned change requests from
VXML 2.0 and 2.1 reevaluated for VXML 3.0</h4>
<p>In particular, the following unassigned CRs reevaluated: R600, R614, R619,
R620, R622, R623, R624, R625, R626, R627, R628, R629, R631, and R632.</p>
<h2><a id="acknowledgments" name="acknowledgments">5. Acknowledgments</a></h2>
<p>TBD</p>
<h2><a id="prev-reqs" name="prev-reqs">Appendix A. Previous
Requirements</a></h2>
<p>The following requirements have been satisfied by previous Voice Browser
Working Group Specifications</p>
<h3><a id="A_1_1" name="A_1_1"></a>A.1.1 Audio Modality Input and Output
(must have) FULLY COVERED</h3>
<p>The markup language can specify which spoken user input is interpreted by
the voice browser, as well as the content rendered as spoken output by the
voice browser.</p>
<h4><a id="CA_1_1" name="CA_1_1"></a>Requirement Coverage</h4>
<p>Audio output: <prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>Audio input: <grammar> <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_1_2" name="A_1_2"></a>A.1.2 Sequential multi-modal Input (must
have) FULLY COVERED</h3>
<p>The markup language specifies that user input from multiple modalities is
to be interpreted by the voice browser. There is no requirement that the
input modalities are simultaneously active. For example, a voice browser
interpreting the markup language in a telephony environment could accept DTMF
input in one dialog state, and spoken input in another.</p>
<h4><a id="CA_1_2" name="CA_1_2"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_1_3" name="A_1_3"></a>A.1.3 Unco-ordinated, Simultaneous,
Multi-modal Input (should have) FULLY COVERED</h3>
<p>The markup language specifies that user input from different modalities is
to be interpreted at the same time. There is no requirement that
interpretation of the input modalities are co-ordinated. For example, a voice
browser in a desktop environment could accept keyboard input or spoken input
in same dialog state.</p>
<h4><a id="CA_1_3" name="CA_1_3"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<p><field> defining multiple <grammar>s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_4" name="A_1_4"></a>A.1.4 Co-ordinated, Simultaneous
Multi-modal Input (nice to have) FULLY COVERED</h3>
<p>The markup language specifies that user input from multiple modalities is
interpreted at the same time and that interpretation of the inputs are
co-ordinated by the voice browser. For example, in a telephony environment,
the user can type <em>200</em> on the keypad and say <em>transfer to checking
account</em> and the interpretations are co-ordinated so that they are
understood as <em>transfer 200 to checking account</em>.</p>
<h4><a id="CA_1_4" name="CA_1_4"></a>Requirement Coverage</h4>
<p><grammar> mode attribute: dtmf,voice <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<p><field> defining multiple <grammar>s with different mode
attribute values <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_5" name="A_1_5"></a>A.1.5 Sequential multi-modal Output (must
have) FULLY COVERED</h3>
<p>The markup language specifies that content is rendered in multiple
modalities by the voice browser. There is no requirement the output
modalities are rendered simultaneously. For example, a voice browser could
output speech in one dialog state, and graphics in another.</p>
<h4><a id="CA_1_5" name="CA_1_5"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_6" name="A_1_6"></a>A.1.6 Unco-ordinated, Simultaneous,
Multi-modal Output (nice to have) FULLY COVERED</h3>
<p>The markup language specifies that content is rendered in multiple
modalities at the same time. There is no requirement the rendering of output
modalities are co-ordinated. For example, a voice browser in a desktop
environment could display graphics and provide audio output at the same
time.</p>
<h4><a id="CA_1_6" name="CA_1_6"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_1_7" name="A_1_7"></a>A.1.7 Co-ordinated, Simultaneous
Multi-modal Output (nice to have) FULLYCOVERED</h3>
<p>The markup language specifies that content is to be simultaneously
rendered in multiple modalities and that output rendering is co-ordinated.
For example, graphical output on a cellular telephone display is co-ordinated
with spoken output.</p>
<h4><a id="CA_1_7" name="CA_1_7"></a>Requirement Coverage</h4>
<p><prompt>, <audio> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_1" name="A_2_1"></a>A.2.1 Mixed Initiative: Form Level (must
have) FULLY COVERED</h3>
<p>Mixed initiative refers to dialog where one participant takes the
initiative by, for example, asking a question and expects the other
participant to respond to this initiative by, for example, answering the
question. The other participant, however, responds instead with an initiative
by asking another question. Typically, the first participant then responds to
this initiative, before the second participant responds to the original
initiative. This behavior is illustrated below:<br />
<br />
<em>S-A1: When do you want to fly to Paris?<br />
U-B1: What did you say?<br />
S-B2: I said when do you want to fly to Paris?<br />
U-A2: Tuesday.</em></p>
<p>where A1 is responded to in A2 after a nested interaction, or sub-dialog,
in B1 and B2. Note that the B2 response itself could have been another
initiative leading to further nesting of the interaction.</p>
<p>The form-level mixed initiative requirement is that the markup language
can specify to the voice browser that it can take the initiative when the
user expects a response, and also allow the user to take the initiative when
the browser expects a response, where the content of these initiatives is
relevant to the task at hand, contains navigation instructions, or concerns
general meta-communication issues. This mixed initiative requirement is
particularly important when processing form input (hence the name) and is
further elaborated in requirements A.2.1.1, A.2.1.2, A.2.1.3 and A.2.1.4
below.</p>
<h4><a id="CA_2_1" name="CA_2_1"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><noinput>, <nomatch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_1" name="A_2_1_1"></a>A.2.1.1 Clarification Subdialog (must
have) FULLY COVERED</h4>
<p>The markup language can specify that a clarification sub-dialog should be
performed when the user provides incomplete, form-related information. For
example, in a flight enquiry service, the departure city and date may be
required but the user does not always provide all the information at once:<br
/>
<br />
<em>S1: How can I help you?<br />
U1: I want to fly to Paris.<br />
S2: When?<br />
U2: Monday</em></p>
<p>U1 is incomplete (or 'underinformative') with respect to the service (or
form) and the system then initiates a sub-dialog in S2 to collect the
required information. If additional parameters are required, further
sub-dialogs may be initiated.</p>
<h4><a id="CA_2_1_1" name="CA_2_1_1"></a>Requirement Coverage</h4>
<p><initial>, <field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_2" name="A_2_1_2"></a>A.2.1.2 Confirmation Subdialog (must
have) FULLY COVERED</h4>
<p>The markup language can specify that a confirmation sub-dialog is to be
performed when the confidence associated with the interpretation of the user
input is too low.<br />
<br />
<em>U1: I want to fly to Paris.<br />
S1: Did you say 'I want to fly to Paris'?<br />
U2: Yes.<br />
S2: When?<br />
U3: ...</em></p>
<p>Note confirmation sub-dialogs take precedence over clarification
sub-dialogs.</p>
<h4><a id="CA_2_1_2" name="CA_2_1_2"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><i>name$</i>.confidence shadow variable <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_1_3" name="A_2_1_3"></a>A.2.1.3 Over-informative Input:
corrective (must have) FULLY COVERED</h4>
<p>The markup language can specify that unsolicited user input in a
sub-dialog which corrects earlier input is to be interpreted appropriately.
For example, in a confirmation sub-dialog users may provide corrective
information relevant to the form:<br />
<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: No, from Perros.</em> (modification) <em><br />
U1': Yes, from Paris</em> (repetition)</p>
<h4><a id="CA_2_1_3" name="CA_2_1_3"></a>Requirement Coverage</h4>
<p><field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>$GARBAGE rule <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h4><a id="A_2_1_4" name="A_2_1_4"></a>A.2.1.4 Over-informative Input:
additional (nice to have) FULLY COVERED</h4>
<p>The markup language can specify that unsolicited user input in a
sub-dialog which is not corrective but additional, relevant information for
the current form is to be interpreted appropriately. For example, in a
confirmation sub-dialog users may provide additional information relevant to
the form:<br />
<em>S1: Did you say you wanted to travel from Paris?<br />
U1: Yes, I want to fly to Paris on Monday around 11.30</em></p>
<h4><a id="CA_2_1_4" name="CA_2_1_4"></a>Requirement Coverage</h4>
<p><initial>, <field> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
<h3><a id="A_2_2" name="A_2_2"></a>A.2.2 Mixed Initiative: Task Level (must
have) FULLY COVERED</h3>
<p>The markup language needs to address mixed initiative in dialogs which
involve more than one task (or topic). For example, a portal service may
allow the user to interact with a number of specific services such as car
hire, hotel reservation, flight enquiries, etc, which may be located on the
different web sites or servers. This requirement is further elaborated in
requirements A.2.2.1, A.2.2.2, A.2.2.3, A.2.2.4 and A.2.2.5 below.</p>
<h4><a id="A_2_2_1" name="A_2_2_1"></a>A.2.2.1 Explicit Task Switching (must
have) FULLY COVERED</h4>
<p>The markup language can specify how users can explicitly switch from one
task to another. For example, by means of a set of global commands which are
active in all tasks and which take the user to a specific task; e.g. <em>Take
me to car hire</em>, <em>Go to hotel reservations</em>.</p>
<h4><a id="CA_2_2_1" name="CA_2_2_1"></a>Requirement Coverage</h4>
<p><link>, <goto>, <submit> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
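<p>A non-normative VoiceXML 2.0 sketch of an explicit task switch through a
document-scoped <link>; the command wording, grammar files and form names are
placeholders.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Global command, active in every dialog of this document -->
  <link next="#car_hire">
    <grammar mode="voice" version="1.0" root="cmd">
      <rule id="cmd">take me to car hire</rule>
    </grammar>
  </link>
  <form id="hotel_reservation">
    <field name="city">
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <prompt>Which city do you want a hotel in?</prompt>
    </field>
  </form>
  <form id="car_hire">
    <field name="car_class">
      <grammar src="cars.grxml" type="application/srgs+xml"/>
      <prompt>What class of car would you like?</prompt>
    </field>
  </form>
</vxml>
</pre>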
<h4><a id="A_2_2_2" name="A_2_2_2"></a>A.2.2.2 Implicit Task Switching
(should have) FULLY COVERED</h4>
<p>The markup language can specify how users can implicitly switch from one
task to another. For example, by means of simply uttering a phrase relevant
to another task; <em>I want to reserve a McLaren F1 in Monaco next
Wednesday</em>.</p>
<h4><a id="CA_2_2_2" name="CA_2_2_2"></a>Requirement Coverage</h4>
<p><link>, <goto>, <submit> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>form level <grammar>s <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
1.0</a></p>
<h4><a id="A_2_2_3" name="A_2_2_3"></a>A.2.2.3 Manual Return from Task Switch
(must have) FULLY COVERED</h4>
<p>The markup language can specify how users can explicitly return to a
previous task at any time. For example, by means of global task navigation
commands such as <em>previous task</em>.</p>
<h4><a id="CA_2_2_3" name="CA_2_2_3"></a>Requirement Coverage</h4>
<p><link>, <goto> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_2_4" name="A_2_2_4"></a>A.2.2.4 Automatic Return from Task
Switch (should have) FULLY COVERED</h4>
<p>The markup language can specify that users can automatically return to the
previous task upon completion or explicit cancellation of the current
task.</p>
<h4><a id="CA_2_2_4" name="CA_2_2_4"></a>Requirement Coverage</h4>
<p><link>, <goto> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h4><a id="A_2_2_5" name="A_2_2_5"></a>A.2.2.5 Suspended Tasks (should have)
FULLY COVERED</h4>
<p>The markup language can specify that when task switching occurs the
previous task is suspended rather than canceled. Thus when the user returns
to the previous task, the interaction is resumed at the point it was
suspended.</p>
<h4><a id="CA_2_2_5" name="CA_2_2_5"></a>Requirement Coverage</h4>
<p><link> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_3" name="A_2_3"></a>A.2.3 Help Behavior (should have) FULLY
COVERED</h3>
<p>The markup language can specify help information when requested by the
user. Help information should be available in all dialog states.<br />
<em>S1: How can I help you?<br />
U1: What can you do?<br />
S2: I can give you flight information about flights between major cities
world-wide just like a travel agent. How can I help you?<br />
U1: I want a flight to Paris ...</em><br />
</p>
<p>Help information can be tapered so that it can be elaborated upon on
subsequent user requests.</p>
<h4><a id="CA_2_3" name="CA_2_3"></a>Requirement Coverage</h4>
<p><help> using count attribute for tapering <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_4" name="A_2_4"></a>A.2.4 Error Correction Behavior (must
have) FULLY COVERED</h3>
<p>The markup language can specify how error events generated by the voice
browser are to be handled. For example, by initiating a sub-dialog to
describe and correct the error:<br />
<em>S1: How can I help you?<br />
U1: <audio but no interpretation><br />
S2: Sorry, I didn't understand that. Where do you want to travel to?<br />
U2: Paris</em></p>
<p>The markup language can specify how specific types of errors encountered
in spoken dialog, e.g. no audio, too loud/soft, no interpretation,
internal error, etc, are to be handled as well as providing a general 'catch
all' method.</p>
<h4><a id="CA_2_4" name="CA_2_4"></a>Requirement Coverage</h4>
<p><error>, <nomatch>, <noinput>, <catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_5" name="A_2_5"></a>A.2.5 Timeout Behavior (must have) FULLY
COVERED</h3>
<p>The markup language can specify what to do when the voice browser times
out waiting for input; for example, a timeout event can be handled by
repeating the current dialog state:<br />
<em>S1: Did you say Monday?<br />
U1: <timeout><br />
S2: Did you say Monday?</em><br />
</p>
<p>Note that the strategy may be dependent upon the environment; in a desktop
environment, repetition for example may be irritating.</p>
<h4><a id="CA_2_5" name="CA_2_5"></a>Requirement Coverage</h4>
<p><noinput>, <catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_6" name="A_2_6"></a>A.2.6 Meta-Commands (should have) FULLY
COVERED</h3>
<p>The markup language specifies a set of meta-command functions which are
available in all dialog states; for example, repeat, cancel, quit, operator,
etc.</p>
<p>The precise set of meta-commands will be co-ordinated with the Telephony
Speech Standards Committee.</p>
<p>The markup language should specify how the scope of meta-commands like
'cancel' is resolved.</p>
<h4><a id="CA_2_6" name="CA_2_6"></a>Requirement Coverage</h4>
<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_7" name="A_2_7"></a>A.2.7 Barge-in Behavior (should have)
FULLY COVERED</h3>
<p>The markup language specifies when the user is able to barge in on the
system output, and when it is not allowed.</p>
<p>Note: The output device may generate timestamped events when barge-in
occurs (see 3.9).</p>
<h4><a id="CA_2_7" name="CA_2_7"></a>Requirement Coverage</h4>
<p>bargein property <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_8" name="A_2_8"></a>A.2.8 Call Transfer (should have) FULLY
COVERED</h3>
<p>The markup language specifies a mechanism to allow transfer of the caller
to another line in a telephony environment. For example, in cases of dialog
breakdown, the user can be transferred to an operator (cf. 'callto' in HTML).
The markup language also provides a mechanism to deal with transfer failures
such as when the called line is busy or engaged.</p>
<h4><a id="CA_2_8" name="CA_2_8"></a>Requirement Coverage</h4>
<p><transfer> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><createcall>, <redirect> <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
<h3><a id="A_2_9" name="A_2_9"></a>A.2.9 Quit Behavior (must have) FULLY
COVERED</h3>
<p>The markup language provides a mechanism to terminate the session (cf.
user-terminated sessions via a 'quit' meta-command in 2.6).</p>
<h4><a id="CA_2_9" name="CA_2_9"></a>Requirement Coverage</h4>
<p>Universal Grammars <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_2_10" name="A_2_10"></a>A.2.10 Interaction with External
Components (must have) FULLY COVERED</h3>
<p>The markup language must support a generic component interface to allow
for the use of external components on the client and/or server side. The
interface provides a mechanism for transferring data between the markup
language's variables and the component. Examples of such data are:
configuration parameters (such as timeouts), and events for data input and
error codes. Except for event handling, a call to an external component does
not directly change the dialog state, i.e. the dialog continues in the state
from which the external component was called.</p>
<p>Examples of external components are pre-built dialog components and server
scripts. Pre-built dialogs are further described in Section A.3.3. Server
scripts can be used to interact with remote services, devices or
databases.</p>
<h4><a id="CA_2_10" name="CA_2_10"></a>Requirement Coverage</h4>
<p><property> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><submit> namelist attribute, <submit>, <goto> query
string <a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>
<h3><a id="A_3_1" name="A_3_1"></a>A.3.1 Ease of Use (must have) FULLY
COVERED</h3>
<p>The markup language should be easy for designers to understand and author
without special tools or knowledge of vendor technology or protocols (dialog
design knowledge is still essential).</p>
<h4><a id="CA_3_1" name="CA_3_1"></a>Requirement Coverage</h4>
<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_2" name="A_3_2"></a>A.3.2 Simplicity and Power (must have)
FULLY COVERED</h3>
<p>The markup language allows designers to rapidly develop simple dialogs
without the need to worry about interactional details but also allows
designers to take more control over interaction to develop complex
dialogs.</p>
<h4><a id="CA_3_2" name="CA_3_2"></a>Requirement Coverage</h4>
<p>Form Interpretation Algorithm (FIA) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_3" name="A_3_3"></a>A.3.3 Support for Modularity and Re-use
(should have) FULLY COVERED</h3>
<p>The markup language complies with the requirements of the Reusable Dialog
Components Subgroup.</p>
<p>The markup language can specify a number of pre-built dialog components.
This enables one to build a library of reusable 'dialogs'. This is useful for
handling both application-specific input types, such as telephone numbers,
credit card numbers, etc., as well as those that are more generic, such as
times, dates, numbers, etc.</p>
<h4><a id="CA_3_3" name="CA_3_3"></a>Requirement Coverage</h4>
<p><subdialog> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_4" name="A_3_4"></a>A.3.4 Naming (must have) FULLY COVERED</h3>
<p>Dialogs, states, inputs and outputs can be referenced by a URI in the
markup language.</p>
<h4><a id="CA_3_4" name="CA_3_4"></a>Requirement Coverage</h4>
<p><form> id attribute, form item name attribute <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_5" name="A_3_5"></a>A.3.5 Variables (must have) FULLY
COVERED</h3>
<p>Variables can be defined and assigned values.</p>
<p>Variables can be scoped within namespaces: for example, state-level,
dialog-level, document-level, application-level or session-level. The markup
language defines the precise scope of all variables.</p>
<p>The markup language must specify if variables are atomic or structured.</p>
<p>Variables can be assigned default values. Assignment may be optional; for
example, in a flight reservation form, a 'special meal' variable need not be
assigned a value by the user.</p>
<p>Variables may be referred to in the output content of the markup
language.</p>
<p>The precise requirements on variables may be affected by W3C work on
modularity and XML schema datatypes.</p>
<h4><a id="CA_3_5" name="CA_3_5"></a>Requirement Coverage</h4>
<p><var>, <assign>, <script> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_6" name="A_3_6"></a>A.3.6 Variable Binding (must have) FULLY
COVERED</h3>
<p>User input can bind one or more state variables. A single input may bind a
single variable or it may bind multiple variables in any order; for example,
the following utterances result in the same variable bindings:<br />
</p>
<ul>
<li>Transfer $200 from savings to checking</li>
<li>Transfer $200 to checking from savings</li>
<li>Transfer from savings $200 to checking</li>
</ul>
<h4><a id="CA_3_6" name="CA_3_6"></a>Requirement Coverage</h4>
<p>application.lastresult$.interpretation <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_7" name="A_3_7"></a>A.3.7 Event Handler (must have) FULLY
COVERED</h3>
<p>The markup language provides an explicit event handling mechanism for
specifying actions to be carried out when events are generated in a dialog
state.</p>
<p>Event handlers can be ordered so that if multiple event handlers match the
current event, only the handler with the highest ranking is executed. By
default, event handler ranking is based on proximity and specificity: i.e.
the handler closest in the event hierarchy with the most specific matching
conditions.</p>
<p>Actions can be conditional upon variable assignments, as well as the type
and content of events (e.g. input events specifying media, content,
confidence, and so on).</p>
<p>Actions include: the binding of variables with information, for example,
information contained in events; transition to another dialog state
(including the current state).</p>
<h4><a id="CA_3_7" name="CA_3_7"></a>Requirement Coverage</h4>
<p><catch> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><transition> <a
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
<h3><a id="A_3_8" name="A_3_8"></a>A.3.8 Builtin Event Handlers (should have)
FULLY COVERED</h3>
<p>The markup language can provide implicit event handlers which provide
default handling of, for example, timeout and error events as well as
handlers for situations, such as confirmation and clarification, where there
is a transition to an implicit dialog state. For example, there can be a
default handler for user input events such that if the recognition confidence
score is below a given threshold, then the input is confirmed in a
sub-dialog.</p>
<p>Properties of implicit event handlers (thresholds, counters, locale, etc)
can be explicitly customized in the markup language.</p>
<p>Implicit event handlers are always overridden by explicit handlers.</p>
<h4><a id="CA_3_8" name="CA_3_8"></a>Requirement Coverage</h4>
<p>Default event handlers (nomatch, noinput, error, etc...) <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<h3><a id="A_3_9" name="A_3_9"></a>A.3.9 Output Content and Events (must
have) FULLY COVERED</h3>
<p>The markup language complies with the requirements developed by the Speech
Synthesis Markup Subgroup for output text content and parameter settings for
the output device. Requirements on multimodal output will be co-ordinated by
the Multimodal Interaction Subgroup (cf. Section 1).</p>
<p>In addition, the markup supports the following output features (if not
already defined in the Synthesis Markup):</p>
<ol>
<li>Pre-recorded audio file output</li>
<li>Streamed audio</li>
<li>Playing/synthesizing sounds such as tones and beeps</li>
<li>Variable level of detail control over structured text</li>
</ol>
<p>The output device generates timestamped events including error events and
progress events (output started/stopped, current position).</p>
<h4><a id="CA_3_9" name="CA_3_9"></a>Requirement Coverage</h4>
<p><audio>, <prompt> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><speak> and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
<h3><a id="A_3_10" name="A_3_10"></a>A.3.10 Richer Output (nice to have)
FULLY COVERED</h3>
<p>The markup language allows for richer output than variable substitution in
the output content. For example, natural language generation of output
content.</p>
<h4><a id="CA_3_10" name="CA_3_10"></a>Requirement Coverage</h4>
<p><prompt> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><speak> and other SSML elements <a
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
<h3><a id="A_3_11" name="A_3_11"></a>A.3.11 Input Content and Events (must
have) FULLY COVERED</h3>
<p>The markup language complies with the requirements developed by the
Grammar Representation Subgroup for the representation of speech grammar
content. Requirements on multimodal input will be co-ordinated by the
Multimodal Interaction Subgroup (cf. Section 1).</p>
<p>The markup language can specify the activation and deactivation of
multiple speech grammars. These can be user-defined, or builtin grammars
(digits, date, time, money, etc).</p>
<p>The markup language can specify parameters for speech grammar content,
including timeout parameters (maximum initial silence, maximum utterance
duration, maximum within-utterance pause), energy thresholds necessary for
barge-in, etc.</p>
<p>The input device generates timestamped events including input timeout and
error events, progress events (utterance started, interference, etc), and
recognition result events (including content, interpretation/variable
bindings, confidence).</p>
<p>In addition to speech grammars, the markup language allows input content
and events to be specified for DTMF and keyboard devices.</p>
<h4><a id="CA_3_11" name="CA_3_11"></a>Requirement Coverage</h4>
<p>timeout, completetimeout, incompletetimeout, interdigittimeout,
termtimeout properties <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p>application.lastresult$.interpretation, application.lastresult$.confidence
<a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
2.0</a></p>
<p>application.lastresult$.markname, application.lastresult$.marktime <a
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
<p><grammar> and other elements <a
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
<h3><a id="A_4_1" name="A_4_1"></a>A.4.1 Event Handling (must have) FULLY
COVERED</h3>
<p>One key difference among contemporary event models (e.g. DOM Level 2,
'try-catch' in object-oriented programming) is whether the same event can be
handled by more than one event handler within the hierarchy. The markup
language must state whether it supports this feature and motivate that
choice.</p>
<h3><a id="A_4_2" name="A_4_2"></a>A.4.2 Logging (nice to have) FULLY
COVERED</h3>
<p>For development and testing it is important that data and events can be
logged by the voice browser. At the most detailed level, this will include
logging of input and output audio data. A mechanism which allows logged data
to be retrieved from a voice browser, preferably via standard Internet
protocol (http, ftp, etc), is also required.</p>
<p>One approach is to require that the markup language can control logging
via, for example, an optional meta tag. Another approach is for logging to be
controlled by means other than the markup language, such as via proprietary
meta tags.</p>
<h4><a id="CA_4_2" name="CA_4_2"></a>Requirement Coverage</h4>
<p><log> <a
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
<p><log> <a href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML
1.0</a></p>
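<p>A non-normative VoiceXML 2.0 sketch of application-controlled logging with
<log>; the label and logged expression are placeholders, and how logged data
is retrieved remains platform-specific.</p>
<pre>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block>
      <!-- The label attribute lets the platform route this entry to a named log -->
      <log label="diagnostics" expr="'call from ' + session.connection.remote.uri"/>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
</pre>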
</body>
</html>