<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Requirements for EMMA</title>
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-NOTE"/>
</head>
<body>
<div class="head">
<a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48" width="72" /></a>
<h1>Requirements for EMMA</h1>
<h2>W3C Note 13 January 2003</h2>
<dl>
<dt>This version:</dt>
<dd><a href="http://www.w3.org/TR/2003/NOTE-EMMAreqs-20030113">http://www.w3.org/TR/2003/NOTE-EMMAreqs-20030113</a></dd>
<dt>Latest version:</dt>
<dd><a href="http://www.w3.org/TR/EMMAreqs">http://www.w3.org/TR/EMMAreqs</a></dd>
<dt>Previous versions:</dt>
<dd>This is the first public version</dd>
<dt>Editors:</dt>
<dd>Stéphane H. Maes, Oracle Corporation <a href="mailto:stephane.maes@oracle.com">&lt;stephane.maes@oracle.com&gt;</a></dd>
<dd>Stephen Potter, Microsoft <a href="mailto:spotter@microsoft.com">&lt;spotter@microsoft.com&gt;</a></dd>
</dl>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright"> Copyright</a> © 2003 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.lcs.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</a> rules apply.</p> <!-- end copyright -->
<hr />
</div> <!-- end of head -->
<h2 class="notoc"><a id="abstract"
name="abstract">Abstract</a></h2>
<p>This document describes requirements for the Extensible
MultiModal Annotation language (EMMA) specification under
development in the <a href="/2002/mmi/">W3C Multimodal Interaction
Activity</a>. EMMA is intended as a data format for the interface
between input processors and interaction management systems. It will
define the means for recognizers to annotate application-specific
data with information such as confidence scores, time stamps, input
mode (e.g. key strokes, speech or pen), alternative recognition
hypotheses, and partial recognition results. EMMA is a target
data format for the semantic interpretation specification being
developed in the <a href="/Voice/">Voice Browser Activity</a>,
which describes annotations to speech grammars for extracting
application-specific data as a result of speech recognition. EMMA
supersedes earlier work on the natural language semantics markup
language in the Voice Browser Activity.</p>
<h2 id="Status">Status of this Document</h2>
<p><em>This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. The latest status of this document series is maintained
at the
<abbr title="the World Wide Web Consortium">W3C</abbr>.</em></p>
<p>W3C's <a href="http://www.w3.org/2002/mmi/">Multimodal
Interaction Activity</a> is developing specifications for extending
the Web to support multiple modes of interaction. This document
provides the basis for guiding and evaluating subsequent work on a
specification for a data format (EMMA) that acts as an exchange
mechanism between input processors and interaction management
components in a multimodal application. These components are
introduced in the <a href="/TR/mmi-framework/">W3C Multimodal
Interaction Framework</a>.</p>
<p>This document is a NOTE made available by the W3C for archival
purposes, and is not expected to undergo frequent changes. Publication
of this Note by W3C indicates no endorsement by W3C or the W3C Team,
or any W3C Members. A list of current W3C technical reports and
publications, including Recommendations, Working Drafts, and Notes
can be found at <a
href="http://www.w3.org/TR/">http://www.w3.org/TR/</a>.</p>
<p>This document has been produced as part of the <a
href="http://www.w3.org/2002/mmi/">W3C Multimodal Interaction
Activity</a>,
following the procedures set out for the <a
href="http://www.w3.org/Consortium/Process/">W3C Process</a>. The
authors of this document are members of the <a
href="http://www.w3.org/2002/mmi/Group/">Multimodal Interaction
Working Group</a> (<a
href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C Members
only</a>). This is a Royalty Free Working Group, as described in
W3C's <a href="/TR/2002/NOTE-patent-practice-20020124">Current
Patent Practice</a> NOTE. Working Group participants are required
to provide <a href="http://www.w3.org/2002/01/mmi-ipr.html">patent
disclosures</a>.</p>
<p>Please send comments about this document to the public mailing
list: <a
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a> (<a
href="http://lists.w3.org/Archives/Public/www-multimodal/">public
archives</a>). To subscribe, send an email to &lt;<a
href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>&gt;
with the word <em>subscribe</em> in the subject line (use the
word <em>unsubscribe</em> instead if you want to unsubscribe).</p>
<h2 id="toc">Table of Contents</h2>
<dl>
<dd>
<a href="#0">Introduction</a>
</dd>
<dd>
<a href="#1">1. Scope of EMMA </a>
</dd>
<dd>
<a href="#2">2. Data model requirements </a>
</dd>
<dd>
<a href="#3">3. Annotation requirements </a>
</dd>
<dd>
<a href="#4">4. Integration with other work </a>
</dd>
</dl>
<hr />
<p>
<i><b><a name="0">Introduction</a></b></i>
</p>
<p>
Extensible MultiModal Annotation language (EMMA) is the markup language
used to represent human input to a multimodal application.
As such, it may be seen in terms of the <a href="http://www.w3.org/TR/mmi-framework/">W3C Multimodal Interaction Framework</a>
as the exchange mechanism between
user input devices and the <span>interaction</span> management capabilities of an application.
</p>
<h4 id="principles">General Principles</h4>
<p>
An EMMA document can be considered to hold three types of data:
</p>
<ul>
<li>
<b>instance data</b><br />
<span>
The slots and values corresponding to input information
which is meaningful to the consumer of an EMMA document.
Instances are
application-specific and
built by input processors at runtime.
Given that utterances may be ambiguous with respect to input values,
an EMMA document may hold more than one instance.
</span>
</li>
<li>
<b>data model</b><br />
<span>
The constraints on structure and content of an instance.
The data model is typically pre-established by an application, and
may be implicit, that is, unspecified.
</span>
</li>
<li>
<b>metadata</b><br />
<span>
Annotations associated with the data contained in the instance.
Annotation values are added by input processors at runtime.
</span>
</li>
</ul>
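<p>
As a rough illustration of these three types of data, the following sketch shows one
shape such a document might take. The <code>ex</code> namespace, its element and
attribute names, the schema URI and the flight instance are all assumptions made for
this example; this document does not define any EMMA syntax.
</p>
<pre>
&lt;!-- Hypothetical sketch: every "ex" name below is illustrative only. --&gt;
&lt;ex:result xmlns:ex="http://example.org/annotation"
    ex:model="http://example.org/models/flight.xsd"&gt;  &lt;!-- data model, by reference --&gt;
  &lt;ex:interpretation ex:confidence="0.8" ex:mode="speech"&gt;  &lt;!-- metadata --&gt;
    &lt;flight&gt;  &lt;!-- instance data, in an application-specific format --&gt;
      &lt;origin&gt;Boston&lt;/origin&gt;
      &lt;destination&gt;Denver&lt;/destination&gt;
    &lt;/flight&gt;
  &lt;/ex:interpretation&gt;
&lt;/ex:result&gt;
</pre>
<p>
In this sketch the instance carries no annotation vocabulary of its own, so the same
wrapper could be reused around any application's XML.
</p>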
<p>
<span>
Given the assumptions above about the nature of data represented
in an EMMA document, the following general principles apply to the design of EMMA:</span></p>
<ul>
<li>
The
<span>main prescriptive content</span>
of the EMMA specification will consist of metadata: EMMA will provide a means
to express the metadata annotations which require standardization.
<span>(Notice, however, that such annotations may express
the relationship among all the types of data within an EMMA document.)</span></li>
<li>
The instance and its data model are assumed to be specified in XML, but EMMA
will remain agnostic to the XML format used to express these. (The
instance XML is assumed to be sufficiently structured to enable the association
of annotative data.)</li>
</ul>
<p>The following sections apply these principles in terms of the scope of EMMA,
the requirements on the contents and syntax of data model and annotations, and
EMMA integration with other work.
</p>
<hr />
<h3><a name="1"> 1. Scope
<span>and General Requirements</span></a>
</h3>
<ul>
<li>
<strong>EMMA must be able to represent the following kinds of input:</strong>
<ul>
<li>
<i>1.1</i> input in any human language</li>
<li>
<i>1.2</i> input from the modalities and
devices specified in the next section</li>
<li>
input reflecting the results of the following processes:
<ul>
<li>
<i>1.3</i> token interpretation from signal
(e.g. speech+<a href="http://www.w3.org/TR/speech-grammar/">SRGS</a>)</li>
<li>
<i>1.4</i> semantic interpretation from
token/signal (e.g. text+<abbr title="Natural Language">NL</abbr> parsing/speech+<a href="http://www.w3.org/TR/speech-grammar/">SRGS</a>+<a href="http://www.w3.org/TR/semantic-interpretation/">SI</a>)</li>
</ul>
</li>
<li>
input gained in any of the following ways:
<ul>
<li>
<i>1.5</i> single modality input</li>
<li>
<i>1.6</i> sequential modality input,
<span>that is:
single-modality inputs presented in sequence </span></li>
<li>
<i>1.7</i> simultaneous modality input (as
defined in the main <a href="http://www.w3.org/TR/mmi-reqs/">MMI requirements doc</a>).</li>
<li>
<i>1.8</i> composite modality input (as
defined in the main <a href="http://www.w3.org/TR/mmi-reqs/">MMI requirements doc</a>).</li>
</ul>
</li>
</ul>
</li>
</ul>
<p> </p>
<ul>
<li>
<strong>EMMA must be able to represent input from the following modalities, devices and
architectures:</strong>
<ul>
<li>
human language input modalities
<ul>
<li>
<i>1.9</i> text</li>
<li>
<i>1.10</i> speech</li>
<li>
<i>1.11</i> handwriting</li>
<li>
<i>1.12</i> other modalities identified by
the <a href="http://www.w3.org/TR/mmi-reqs/">MMI Requirements document</a> as required
</li>
<li>
<i>1.13</i> combinations of the above
modalities</li>
</ul>
</li>
<li>
devices
<ul>
<li>
<i>1.14</i> telephones (i.e. no device
processing, proxy agent)</li>
<li>
<i>1.15</i> thin clients (i.e. limited
device processing)</li>
<li>
<i>1.16</i> rich clients (i.e. powerful
device processing)</li>
<li>
<i>1.17</i> everything in this range</li>
</ul>
</li>
<li>
known and foreseeable network configurations
<ul>
<li>
<i>1.18</i> architectures</li>
<li>
<i>1.19</i> protocols</li>
</ul>
</li>
<li>
<i>1.20</i> extensibility to further
devices and modalities
</li>
</ul>
<p> </p>
</li>
<li>
<strong>Representation of output and other uses
</strong>
<p>
EMMA
<span> is considered primarily</span>
as a representation of user input, and it is in this context that the rest of
this document defines the requirements on EMMA. Given that the focus of EMMA is on
metadata, there is not yet sufficient need to define standard
annotations for system output
or for general message content between system components.
However, the following requirement is included
to ensure that EMMA may still be used in these cases where necessary.
</p>
<ul>
<!-- <li><i>1.21</i> EMMA as
a representation from which system output markup may be
generated or for general purpose component communication
must not be precluded.</li> -->
<li><i>1.21</i> The
following uses of EMMA must not be precluded:
<ul>
<li>a representation from which system output markup may
be generated;</li>
<li>a language for general purpose communication among
system components.</li>
</ul>
</li>
</ul>
<p> </p>
</li>
<li>
<strong>Ease of use and portability</strong>
<ul>
<li>
<i>1.22</i> EMMA content must be accessible
via standard means (e.g. XPath).
</li>
<li>
<i>1.23</i> Queries on EMMA content must be
easy to author.</li>
<li>
<i>1.24</i> The EMMA specification must
enable portability of EMMA documents across applications.
</li>
</ul>
</li>
</ul>
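<p>
Requirements 1.22 and 1.23 can be pictured with a small sketch of how a consumer might
address annotated content using XPath. The <code>ex</code> namespace, the element and
attribute names and the instance content are hypothetical, and the XPath expressions
assume that the <code>ex</code> prefix is bound to the same namespace in the
evaluation context.
</p>
<pre>
&lt;!-- Hypothetical sketch: every "ex" name below is illustrative only. --&gt;
&lt;ex:result xmlns:ex="http://example.org/annotation"&gt;
  &lt;ex:interpretation ex:confidence="0.8"&gt;
    &lt;flight&gt;&lt;destination&gt;Denver&lt;/destination&gt;&lt;/flight&gt;
  &lt;/ex:interpretation&gt;
  &lt;ex:interpretation ex:confidence="0.3"&gt;
    &lt;flight&gt;&lt;destination&gt;Denton&lt;/destination&gt;&lt;/flight&gt;
  &lt;/ex:interpretation&gt;
&lt;/ex:result&gt;

&lt;!-- XPath 1.0 queries a consumer might write against the document above:

     All destinations, whatever their confidence:
       /ex:result/ex:interpretation/flight/destination

     Destinations from interpretations scoring above 0.5:
       /ex:result/ex:interpretation[@ex:confidence &gt; 0.5]/flight/destination
--&gt;
</pre>
<p>
Keeping annotations as attributes on a wrapper element, as assumed here, leaves the path
into the instance unchanged whether or not annotations are present.
</p>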
<hr />
<h3><a name="2">2. Data model requirements</a>
</h3>
<ul>
<li>
<strong>Data model content</strong>
<p>The following requirements apply to the use of data models in EMMA
documents:</p>
<ul>
<li>
<i>2.1</i> use of a data model and
constraints must be possible, for the purposes of validation and
interoperability
</li>
<li>
<i>2.2</i> use of a data model will not be
required<ul>
<li>
in other words, it must be possible to rely on an implicit data model.</li>
</ul>
</li>
<li>
<i>2.3</i>
<span>it must be possible in a single EMMA document
to associate different data models with different instances</span></li>
</ul>
<p>
<span>
It is assumed that the combination and decomposition of data models
will be supported by data model description formats (e.g. XML Schema),
and that the comparison of data models is enabled by standard
XML comparison mechanisms (e.g. use of XSLT, XPath). Therefore this functionality
is not considered a requirement on EMMA data modelling.
</span>
</p>
</li>
<li>
<strong>Data model description formats</strong>
<p>The following requirements apply to the description format of data
models used in EMMA documents:</p>
<ul>
<li>
<i>2.4</i> it must be possible to use
existing standard formats, for example:
<ul>
<li>
arbitrary XML</li>
<li>
XML Schema</li>
<li>
XForms</li>
</ul>
</li>
<li>
<i>2.5</i> no single description format is
required<br />
<span> The use of a data model in EMMA is for the purpose of
validating an EMMA instance against the constraints of a data model.
Since Web applications today use different formats to specify data models (e.g.
XML Schema, XForms, RELAX NG), the principle that EMMA does not require
a single format enables EMMA to be used in a variety of application contexts.
The concern that this may lead to problems of interoperability has been discussed,
and will be reviewed during production of the specification.
</span>
</li>
<li>
<i>2.6</i> it must be possible to specify
data model declarations
inline or by reference
</li>
</ul>
</li>
</ul>
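<p>
The sketch below suggests how requirements 2.3 and 2.6 might look in practice: one data
model is referenced by URI, another is declared inline, and each instance names the
model that constrains it. The <code>ex</code> names, the <code>id</code>,
<code>ref</code> and <code>ex:model-ref</code> attributes and the URIs are assumptions
for illustration only; XML Schema is used here simply as one of the permitted
description formats.
</p>
<pre>
&lt;!-- Hypothetical sketch: every "ex" name below is illustrative only. --&gt;
&lt;ex:result xmlns:ex="http://example.org/annotation"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"&gt;

  &lt;!-- Data model declared by reference --&gt;
  &lt;ex:model id="m1" ref="http://example.org/models/flight.xsd"/&gt;

  &lt;!-- Data model declared inline, written in XML Schema --&gt;
  &lt;ex:model id="m2"&gt;
    &lt;xs:schema&gt;
      &lt;xs:element name="city" type="xs:string"/&gt;
    &lt;/xs:schema&gt;
  &lt;/ex:model&gt;

  &lt;!-- Different instances associated with different data models --&gt;
  &lt;ex:interpretation ex:model-ref="m1"&gt;
    &lt;flight&gt;&lt;destination&gt;Denver&lt;/destination&gt;&lt;/flight&gt;
  &lt;/ex:interpretation&gt;
  &lt;ex:interpretation ex:model-ref="m2"&gt;
    &lt;city&gt;Denver&lt;/city&gt;
  &lt;/ex:interpretation&gt;
&lt;/ex:result&gt;
</pre>
<p>
An interpretation with no model reference at all would correspond to the implicit data
model allowed by requirement 2.2.
</p>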
<hr />
<h3><a name="3">3. Annotation requirements</a>
</h3>
<ul>
<li>
<strong>Annotation content</strong>
<p>
<span>EMMA must enable the specification of the following features. </span>
For each annotation feature, "local" annotation is assumed: that is, that the
association of the annotation may be at any level within the instance
structure, and not only at the highest level.
</p>
<ul>
<li>
<b>General metadata</b>
<ul>
<li>
<i>3.1</i> lack of input
</li>
<li>
<i>3.2</i> uninterpretable input
</li>
<li>
<i>3.3</i> identification of input source
</li>
<li>
<i>3.4</i> time stamps
</li>
<li>
<i>3.5</i> relative positioning of input
events
<br />
(NB: This requirement is covered explicitly by time stamps, but reflects use of
EMMA in environments in which time stamping may not be possible.)
</li>
<li>
<i>3.6</i> temporal grouping of input
events
</li>
<li>
<i>3.7</i> human language of input
</li>
<li>
<i>3.8</i> identification of input modality
</li>
</ul>
</li>
<li>
<b>Annotational structure </b>
<ul>
<li>
<i>3.9</i> association to corresponding
instance element annotated
</li>
<li>
<i>3.10</i> reference to data model
definition
</li>
<li>
<i>3.11</i> composite multimodal input:
representation of input from multiple modalities.
</li>
</ul>
</li>
</ul>
</li>
</ul>
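<p>
To make the general metadata features more concrete, the following sketch annotates a
spoken command and a pen gesture that arrive together, followed by a case of missing
input and a case of uninterpretable input. All <code>ex</code> element and attribute
names, the source identifiers and the time stamp format are assumptions for this
example and are not prescribed by this document.
</p>
<pre>
&lt;!-- Hypothetical sketch: every "ex" name below is illustrative only. --&gt;
&lt;ex:result xmlns:ex="http://example.org/annotation"&gt;

  &lt;!-- 3.6: temporal grouping of two input events --&gt;
  &lt;ex:group&gt;
    &lt;!-- 3.3 source, 3.4 time stamps, 3.7 language, 3.8 modality --&gt;
    &lt;ex:interpretation ex:source="microphone-1" ex:mode="speech" ex:lang="en"
        ex:start="2003-01-13T10:15:02.0Z" ex:end="2003-01-13T10:15:03.5Z"&gt;
      &lt;command&gt;zoom&lt;/command&gt;
    &lt;/ex:interpretation&gt;
    &lt;ex:interpretation ex:source="pen-1" ex:mode="pen"
        ex:start="2003-01-13T10:15:02.4Z" ex:end="2003-01-13T10:15:02.9Z"&gt;
      &lt;location&gt;&lt;x&gt;120&lt;/x&gt;&lt;y&gt;45&lt;/y&gt;&lt;/location&gt;
    &lt;/ex:interpretation&gt;
  &lt;/ex:group&gt;

  &lt;!-- 3.1: lack of input, for example after a timeout --&gt;
  &lt;ex:interpretation ex:mode="speech" ex:no-input="true"/&gt;

  &lt;!-- 3.2: input was received but could not be interpreted --&gt;
  &lt;ex:interpretation ex:mode="speech" ex:uninterpreted="true"/&gt;
&lt;/ex:result&gt;
</pre>
<p>
Where time stamping is unavailable, the document order of the two grouped
interpretations could stand in for their relative positioning (requirement 3.5).
</p>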
<ul>
<li>
<b>Recognition (signal --> tokens processing)</b>
<ul>
<li>
<i>3.12</i> reference to signal
</li>
<li>
<i>3.13</i> reference to processing used
(e.g. <a href="http://www.w3.org/TR/speech-grammar/">SRGS</a> grammar)
</li>
<li>
<i>3.14</i> tokens of utterance
</li>
<li>
<i>3.15</i> ambiguity
<br />
This enables a tree-based representation of local ambiguity. That is,
alternatives are expressible for given nodes in the structure.
</li>
<li>
<i>3.16</i> confidence scores of
recognition
</li>
</ul>
</li>
</ul>
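<p>
The tree-based view of local ambiguity in requirement 3.15 might be pictured as follows:
the alternatives attach to a single node of the instance rather than to the whole
result, and each alternative carries its own tokens and confidence score. The
<code>ex</code> names and the signal and grammar URIs are hypothetical and serve only
to illustrate the requirements listed above.
</p>
<pre>
&lt;!-- Hypothetical sketch: every "ex" name below is illustrative only. --&gt;
&lt;ex:interpretation xmlns:ex="http://example.org/annotation"
    ex:signal="http://example.org/audio/utterance42.wav"
    ex:grammar="http://example.org/grammars/flights.grxml"&gt;
  &lt;flight&gt;
    &lt;origin ex:tokens="from denver" ex:confidence="0.9"&gt;Denver&lt;/origin&gt;
    &lt;!-- Only the destination node is ambiguous --&gt;
    &lt;ex:one-of&gt;
      &lt;destination ex:tokens="to boston" ex:confidence="0.6"&gt;Boston&lt;/destination&gt;
      &lt;destination ex:tokens="to austin" ex:confidence="0.4"&gt;Austin&lt;/destination&gt;
    &lt;/ex:one-of&gt;
  &lt;/flight&gt;
&lt;/ex:interpretation&gt;
</pre>
<p>
Compared with listing two complete alternative results, this form avoids repeating the
unambiguous parts of the instance.
</p>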
<ul>
<li>
<b>Interpretation (tokens --> semantic processing)</b>
<ul>
<li>
<i>3.17</i> tokens of utterance
</li>
<li>
<i>3.18</i> reference to processing used
(e.g. <a href="http://www.w3.org/TR/speech-grammar/">SRGS</a>)
</li>
<li>
<i>3.19</i> ambiguity
</li>
<li>
<i>3.20</i> confidence scores of
interpretation
</li>
</ul>
</li>
</ul>
<ul>
<li>
<b>Recognition and Interpretation (signal --> semantic processing)</b>
<ul>
<li>
<i>3.21</i> <i>union of
Recognition/Interpretation features,
<span>(e.g. <a href="http://www.w3.org/TR/speech-grammar/">SRGS</a> + <a href="http://www.w3.org/TR/semantic-interpretation/">SI</a>)</span></i>
</li>
</ul>
<br />
</li>
<li>
<b>Modality-dependent annotations</b>
<ul>
<li>
<i>3.22</i> EMMA must be extensible to
annotations which are specific to particular modalities, e.g. those of:
<ul>
<li>
speech
</li>
<li>
handwriting
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>
</p>
<ul>
<li>
<strong>Annotation syntax</strong>
<p>The following requirements apply to the syntax that will be used in
EMMA to express annotative data:</p>
<ul>
<li>
<i>3.23</i> must enable association of
annotations with instance data</li>
<li>
<i>3.24</i> must be compatible with the RDF
conceptual framework</li>
<li>
<i>3.25</i> must enable extensibility
(optional/proprietary annotations)</li>
<li>
<i>3.26</i> (nice to have) may enable the
specification of word graphs in addition to local ambiguity.<br />
NB - this is not currently seen as a necessary feature, and is unlikely to be of
sufficiently high priority to be addressed in the specification.
</li>
</ul>
</li>
</ul>
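<p>
One possible reading of requirement 3.24 is that each annotation can be viewed as an RDF
statement whose subject is a node of the instance, whose property is the annotation and
whose object is the annotation value. The RDF/XML sketch below expresses two such
statements; the <code>ex</code> property names and the resource URI are assumptions for
this example, and nothing here implies that EMMA annotations would be serialized as
RDF/XML.
</p>
<pre>
&lt;!-- Hypothetical sketch: the "ex" properties and the resource URI are illustrative only. --&gt;
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/annotation#"&gt;
  &lt;!-- Subject: an instance node. Properties: its annotations. --&gt;
  &lt;rdf:Description rdf:about="http://example.org/result42#destination"&gt;
    &lt;ex:confidence&gt;0.6&lt;/ex:confidence&gt;
    &lt;ex:mode&gt;speech&lt;/ex:mode&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
</pre>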
<hr />
<h3><a name="4">4. Integration with other work</a></h3>
<p><i>4.1</i> Where such alignment
is appropriate, EMMA must enable the use and integration of widely adopted
standard specifications and features. The following activities are considered
most relevant in this respect:
</p>
<ol>
<li>
W3C activities
<ul>
<li>
MMI activities
<ul>
<li>
MMI general requirements</li>
<li>
Events subgroup requirements</li>
<li>
Integration subgroup requirements</li>
<li>
Ink subgroup requirements</li>
</ul>
</li>
<li>
Voice Browser activities
<ul>
<li>
<a href="http://www.w3.org/TR/speech-grammar/">SRGS</a>: EMMA must be able to represent the results of speech recognition using <a href="http://www.w3.org/TR/speech-grammar/">SRGS</a></li>
<li>
<a href="http://www.w3.org/TR/semantic-interpretation/">SI</a>: EMMA must be able to represent the results of speech recognition using <a href="http://www.w3.org/TR/speech-grammar/">SRGS</a> with <a href="http://www.w3.org/TR/semantic-interpretation/">SI</a> output</li>
</ul>
</li>
<li>
Other W3C activities
<ul>
<li>
Relevant XML-related activities
</li>
<li>
RDF working group
</li>
</ul>
</li>
</ul>
</li>
<li>
Other organizations and standards
<ul>
<li>
SpeechSC (IETF)
</li>
</ul>
</li>
</ol>
</body>
</html>