<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en"><head><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>XML Binary Characterization Measurement Methodologies</title><style type="text/css">
code           { font-family: monospace; }

div.constraint,
div.issue,
div.note,
div.notice     { margin-left: 2em; }

ol.enumar      { list-style-type: decimal; }
ol.enumla      { list-style-type: lower-alpha; }
ol.enumlr      { list-style-type: lower-roman; }
ol.enumua      { list-style-type: upper-alpha; }
ol.enumur      { list-style-type: upper-roman; }


div.exampleInner pre { margin-left: 1em;
                       margin-top: 0em; margin-bottom: 0em}
div.exampleOuter {border: 4px double gray;
                  margin: 0em; padding: 0em}
div.exampleInner { background-color: #d5dee3;
                   border-top-width: 4px;
                   border-top-style: double;
                   border-top-color: #d3d3d3;
                   border-bottom-width: 4px;
                   border-bottom-style: double;
                   border-bottom-color: #d3d3d3;
                   padding: 4px; margin: 0em }
div.exampleWrapper { margin: 4px }
div.exampleHeader { font-weight: bold;
                    margin: 4px}
</style><link type="text/css" rel="stylesheet" href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"></head><body><div class="head"><p><a href="http://www.w3.org/"><img width="72" height="48" alt="W3C" src="http://www.w3.org/Icons/w3c_home"></a></p>
<h1><a id="title" name="title"></a>XML Binary Characterization Measurement Methodologies</h1>
<h2><a id="w3c-doctype" name="w3c-doctype"></a>W3C Working Group Note 31 March 2005</h2><dl><dt>This version:</dt><dd>
			<a href="http://www.w3.org/TR/2005/NOTE-xbc-measurement-20050331/">http://www.w3.org/TR/2005/NOTE-xbc-measurement-20050331/</a>
		</dd><dt>Latest version:</dt><dd>
			<a href="http://www.w3.org/TR/xbc-measurement">
 	         http://www.w3.org/TR/xbc-measurement</a>
		</dd><dt>Previous version:</dt><dd>
                       <a href="http://www.w3.org/TR/2005/WD-xbc-measurement-20050224">http://www.w3.org/TR/2005/WD-xbc-measurement-20050224</a>
		                            </dd><dt>Editors:</dt><dd>Stephen D. Williams, Invited Expert</dd><dd>Peter Haggar, IBM Corporation</dd></dl><p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>&nbsp;&copy;&nbsp;2005&nbsp;<a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>&reg;</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p></div><hr><div>
<h2><a id="abstract" name="abstract"></a>Abstract</h2><p>This document describes measurement aspects, methods, caveats, test data, and test scenarios
    for evaluating the potential benefits of a candidate binary XML format.


    </p></div><div>
<h2><a id="status" name="status"></a>Status of this Document</h2><p><em>This section describes the status of this document at
      the time of its publication. Other documents may supersede this
      document. A list of current W3C publications and the latest
      revision of this technical report can be found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>
      at http://www.w3.org/TR/.</em></p><p>This is a <a href="http://www.w3.org/2004/02/Process-20040205/tr.html#WGNote">Working Group Note</a>, produced by the <a href="http://www.w3.org/XML/Binary/">XML Binary Characterization Working Group</a> as part of the <a href="http://www.w3.org/XML/">XML Activity</a>.</p><p>This document is part of a set of documents
produced according to the Working Group's <a href="http://www.w3.org/2003/09/xmlap/xml-binary-wg-charter.html">charter</a>, in which the Working Group has been determining Use Cases, characterizing the Properties that are
required by those Use Cases, and establishing objective, shared Measurements
to help judge whether XML 1.x and alternate binary encodings provide the
required properties.</p><p>
The XML Binary Characterization Working Group has ended its work.
This document is not expected to become a Recommendation later. It will be
maintained as a WG Note.
</p><p>
Discussion of this document takes place on the public
<a href="mailto:public-xml-binary@w3.org">public-xml-binary@w3.org</a> mailing list (<a href="http://lists.w3.org/Archives/Public/public-xml-binary/">public archives</a>).
</p><p>
Publication as a Working Group Note does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate to cite
this document as other than work in progress.
</p></div><div class="toc">
<h2><a id="contents" name="contents"></a>Table of Contents</h2><p class="toc">1 <a href="#intro">Introduction</a><br>
2 <a href="#relationship-to-use-cases">Relationship to Use Case and Characterization Documents</a><br>
3 <a href="#abstract-scenarios">Considerations for Test Suite Development</a><br>
4 <a href="#N100D1">Test Data</a><br>
5 <a href="#property-measurement-methodology">Property Measurement Methodology</a><br>
6 <a href="#detailed-measurements">Property Measurement - Detailed Analysis</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.1 <a href="#compactness-ID">Compactness</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1.1 <a href="#c-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1.2 <a href="#c-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1.3 <a href="#c-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1.4 <a href="#c-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.1.5 <a href="#c-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.2 <a href="#processing-efficiency-ID">Processing Efficiency</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1 <a href="#pe-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1.1 <a href="#processing-phase-definitions">Processing phase definitions</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1.2 <a href="#standard-apis">Standard APIs vs. abstract operations</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1.3 <a href="#incremental-overhead">Incremental Overhead</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1.4 <a href="#N10438">Complexity</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.1.5 <a href="#meas-considerations">Measurement Considerations</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.2 <a href="#pe-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.3 <a href="#pe-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.4 <a href="#pe-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.2.5 <a href="#pe-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.3 <a href="#accel-sequential-access">Accelerated Sequential Access</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.3.1 <a href="#asa-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.3.2 <a href="#asa-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.3.3 <a href="#asa-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.3.4 <a href="#asa-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.3.5 <a href="#asa-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.4 <a href="#content-type-management-ID">Content Type Management</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.4.1 <a href="#ctm-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.4.2 <a href="#ctm-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.4.3 <a href="#ctm-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.4.4 <a href="#ctm-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.4.5 <a href="#ctm-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.5 <a href="#deltas">Deltas</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.5.1 <a href="#delta-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.5.2 <a href="#delta-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.5.3 <a href="#delta-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.5.4 <a href="#delta-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.5.5 <a href="#delta-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.6 <a href="#efficient-update">Efficient Update</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.6.1 <a href="#eu-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.6.2 <a href="#eu-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.6.3 <a href="#eu-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.6.4 <a href="#eu-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.6.5 <a href="#eu-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.7 <a href="#embedding-support-ID">Embedding Support</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.7.1 <a href="#es-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.7.2 <a href="#es-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.7.3 <a href="#es-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.7.4 <a href="#es-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.7.5 <a href="#es-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.8 <a href="#generality-ID">Generality</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.8.1 <a href="#g-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.8.2 <a href="#g-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.8.3 <a href="#g-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.8.4 <a href="#g-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.8.5 <a href="#g-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.9 <a href="#human-readable-ID">Human Readable and Editable</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.9.1 <a href="#hr-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.9.2 <a href="#hr-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.9.3 <a href="#hr-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.9.4 <a href="#hr-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.9.5 <a href="#hr-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.10 <a href="#integratable-xml-ID">Integratable into the XML Stack</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.10.1 <a href="#ix-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.10.2 <a href="#ix-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.10.3 <a href="#ix-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.10.4 <a href="#ix-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.10.5 <a href="#ix-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.11 <a href="#no-arbitrary-limits">No Arbitrary Limits</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.11.1 <a href="#nal-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.11.2 <a href="#nal-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.11.3 <a href="#nal-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.11.4 <a href="#nal-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.11.5 <a href="#nal-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.12 <a href="#platform-neutrality-ID">Platform Neutrality</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.12.1 <a href="#pn-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.12.2 <a href="#pn-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.12.3 <a href="#pn-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.12.4 <a href="#pn-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.12.5 <a href="#pn-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.13 <a href="#random-access">Random Access</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.13.1 <a href="#ra-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.13.2 <a href="#ra-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.13.3 <a href="#ra-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.13.4 <a href="#ra-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.13.5 <a href="#ra-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.14 <a href="#round-trippable-ID">Round Trip Support</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.14.1 <a href="#rt-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.14.2 <a href="#rt-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.14.3 <a href="#rt-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.14.4 <a href="#rt-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.14.5 <a href="#rt-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.15 <a href="#signable-ID">Signable</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.15.1 <a href="#si-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.15.2 <a href="#si-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.15.3 <a href="#si-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.15.4 <a href="#si-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.15.5 <a href="#si-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.16 <a href="#small-footprint-ID">Small Footprint</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.16.1 <a href="#sfoot-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.16.2 <a href="#sfoot-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.16.3 <a href="#sfoot-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.16.4 <a href="#sfoot-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.16.5 <a href="#sfoot-tradeoffs">Known Tradeoffs</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;6.17 <a href="#space-efficiency-ID">Space Efficiency</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.17.1 <a href="#se-desc">Description</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.17.2 <a href="#se-type">Type &amp; range</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.17.3 <a href="#se-method">Methodology</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.17.4 <a href="#se-dep">Dependencies</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6.17.5 <a href="#se-tradeoffs">Known Tradeoffs</a><br>
7 <a href="#N10909">References</a><br>
</p>
<h3><a id="appendices" name="appendices"></a>Appendix</h3><p class="toc">A <a href="#N109AE">Acknowledgements</a><br>
</p></div><hr><div class="body"><div class="div1">
<h2><a id="intro" name="intro"></a>1 Introduction</h2><p>This document describes measurement aspects, methods, caveats, test data, and test scenarios
    for evaluating the potential benefits of a candidate binary XML format.  This document
    relies on the XML Binary Characterization Working Group (XBC WG) documents for Use Cases and
    Properties.  The focus of this document is to provide a basis for later comparison rather than
    to report measurements of actual implementations.  The examined and potential use
    cases represent existing uses that might benefit from the use of an XML-like format, if it had
    certain additional properties.  This potential expansion of the XML community depends on the
    existence, identification, and evolution of solutions that cover the broadest problem footprint
    in the best fashion.  The XBC WG Characterization document represents the working group's
    consensus of required and useful properties.  This document discusses how fulfillment of those
    properties can be precisely evaluated and how combinations of properties are best compared.</p><p>A particular format in a particular application situation may need to incorporate design
    tradeoffs that lower support for a particular property.  Unless otherwise noted, the properties
    are written as positive requirements that are at least desirable.</p></div><div class="div1">
<h2><a id="relationship-to-use-cases" name="relationship-to-use-cases"></a>2 Relationship to Use Case and Characterization Documents</h2><p>Measurement of properties relies directly on Use Case needs.  These needs are expressed in
    application-specific terms and context.  The definition of properties in the Properties
    document, unified by common needs among use cases, provides the identification of measurement
    points, but additional information remains to be captured from the use cases.  The primary
    additional information areas are the operational scenarios, representative test data, and the
    thresholds at which an aggregate solution might be worth significant adoption.  Representation
    of these areas must be initially approximated and abstracted.  The Use Case document details and
    summarizes the relationship between properties and use cases.  The Characterization document
    represents decisions about thresholds of acceptability and ranking of properties.</p></div><div class="div1">
<h2><a id="abstract-scenarios" name="abstract-scenarios"></a>3 Considerations for Test Suite Development</h2><p>In evaluating efficient formats with regard to the properties defined in <a href="#XBC-Properties">XBC Properties</a> and further addressed in this document, it is necessary
to consider how properties may be related.  Property relationships may be affected by the nature of
the properties themselves, the environment, or the capabilities of the efficient format instances being
evaluated.</p><p>Some combinations of properties may be contradictory, especially with respect to certain design
strategies.  Some solutions may not support certain properties or simultaneous combinations of
properties.  Certain properties or combinations are comparable, sometimes only in one direction,
to other properties.  For instance, a lossless encoder can be compared to lossy encoders in an
evaluation of efficiency with the option of lossiness, but not vice versa.  In addition, a
non-schema solution can be compared to schema-based solutions in all modes, but schema-based
methods might not be comparable in property combinations that contraindicate schema-based encoding. </p><p>These property combinations and application scenario details must be considered when planning
test scenarios and when performing valid and useful format comparisons.  The discussion below
is intended to outline the goals and potential pitfalls of developing test scenarios for conducting
detailed format comparisons.  Wherever possible, test scenarios should be abstract
(i.e., not tied to any one particular use case).</p><p>The purpose of abstract test scenarios is to catalog and unify the variability that is needed in
realistic test suites to evaluate efficient formats.  A test suite that is representative of the
use cases must exercise appropriate combinations of this variability.  However, testing every
combination of property presence and weighting is neither feasible with limited resources nor
very useful.  Developing abstract test scenarios can help to decrease the test suite size while
maintaining its relevance.</p><p>This approach also makes clear any simultaneous need for certain sets of properties.  This
correlation can be used to create a small set of property profiles that cluster around certain
types of problems.   A property profile defines a set of properties that are essential or desirable
for a particular abstract test scenario.  The goal of defining property profiles is to reduce the
number of test cases to be developed and applied.  Ideally, abstract test scenarios would also
include descriptions of the situations users face in each use case and address data manipulation patterns.  These
manipulation patterns should include when and how data is created, read, modified, transferred,
and disposed of.</p><p>It is also useful to employ abstract variability descriptions of application environments. The
ranges listed below describe particular aspects of an application environment as it affects
processing of data that could be externalized in an efficient format.  This is an initial list,
and should not be considered complete or minimized.  At least one use case addresses each range,
although not all possible combinations of values are indicated.</p><ol class="enumar"><li>Processing symmetry: broadcast (powerful server, weak clients, one-way), mobile (powerful
server, weak clients, two-way), extra/intranet (powerful client/servers), peer-to-peer (powerful
clients, weak or no servers/routers), web (variable servers and clients)</li><li>Lifecycle: one time message (create/send/consume/forget), message stream, reused immutable
message, touring modifications*, routed / header based, fragment reuse</li><li>Communication/creation model: full copy, carousel w/errors, replication/template
(delta/differences), query/fragment, streaming, many processing updates, version update</li><li>Primary concern: size (wire, memory, footprint), speed (parse/serialize, access, modify),
size/speed, other</li><li>Data/structure mix: mostly structure tags, mostly text, mostly binary, mostly arrays of
scalars (floats, etc.), mixed ratios</li><li>Data access pattern: parse to DOM, parse event (SAX et al.) to data binding, direct read or
read/write access</li><li>Redundancy: little, moderate, large, random or unknown redundancy of data and/or
structure</li><li>Schema: (Schema informed vs. schema based): closed, open, fixed, evolving, variable,
self contained</li><li>Document size: tiny (hundreds of bytes or less), small, medium, large, huge (gigabytes)</li><li>Security: canonicalized or raw data, signed document, signed subset, encrypted document,
encrypted subset</li><li>Aggregation: include binary data, include XML documents, none</li></ol><p>*Refers to a document that is created and transmitted, then modified and transmitted again one or more
times.  An example would be a form that is routed from person to person, with each person filling in
data and signing their portion. This pattern is particularly important when speed and security are
needed simultaneously.</p></div><div class="div1">
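<p>The growth in combinations described above, and the way a few abstract scenarios cut it down, can be
illustrated with a minimal sketch.  It uses only three of the eleven ranges, with value names abbreviated
from the list; the scenario names <code>mobile_messaging</code> and <code>bulk_archive</code> and the helper
functions are hypothetical and are not defined by the working group.</p>
<div class="exampleOuter"><div class="exampleInner"><pre>
```python
import itertools

# Three of the eleven ranges listed above; value names are abbreviated.
RANGES = {
    "document_size": ["tiny", "small", "medium", "large", "huge"],
    "data_access_pattern": ["parse to DOM", "parse event to data binding", "direct access"],
    "redundancy": ["little", "moderate", "large", "random or unknown"],
}


def all_combinations(ranges):
    """Enumerate every combination that picks one value from each range."""
    names = list(ranges)
    return [dict(zip(names, values)) for values in itertools.product(*ranges.values())]


# Hypothetical abstract scenarios: each one pins a single value per range.
SCENARIOS = {
    "mobile_messaging": {
        "document_size": "tiny",
        "data_access_pattern": "parse event to data binding",
        "redundancy": "large",
    },
    "bulk_archive": {
        "document_size": "huge",
        "data_access_pattern": "direct access",
        "redundancy": "random or unknown",
    },
}


def unknown_values(scenarios, ranges):
    """List (scenario, range, value) triples that fall outside the declared ranges."""
    return [
        (name, key, value)
        for name, picks in scenarios.items()
        for key, value in picks.items()
        if value not in ranges.get(key, [])
    ]


if __name__ == "__main__":
    print("exhaustive combinations:", len(all_combinations(RANGES)))  # 5 * 3 * 4 = 60
    print("abstract scenarios:", len(SCENARIOS))
    print("values outside the ranges:", unknown_values(SCENARIOS, RANGES))
```
</pre></div></div>
<p>Even with three ranges the exhaustive set has 60 entries, so a small number of representative scenarios
is the only practical way to cover all eleven.</p>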
<h2><a id="N100D1" name="N100D1"></a>4 Test Data</h2><p>Appropriate test data is crucial to understanding performance for all considered uses and
    circumstances.  Data can be structure heavy, with many large tags, or data heavy.  Data can be
    more uniform or more random.  Data may benefit from generalized or application-specific
    compression or coding.  Good test data simulates a variety of applications and enables broad testing of
    solutions.</p><p>Most format candidates and implementations will have some tunable
    parameters that affect which options are enabled and to what degree.  It is impractical to test
    every combination of every parameter in such complex systems.  To solve this assessment
    challenge, suitable edge and midpoint values must be chosen and various combinations iterated.
    Reports based on testing should highlight average, typical, and worst case performance with
    explanations as needed.</p><p>Test data available to the working group is published on the working group public web site at
    <a href="http://www.w3.org/XML/Binary/2005/03/test-data/">http://www.w3.org/XML/Binary/2005/03/test-data/</a></p></div><div class="div1">
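<p>The edge-and-midpoint sampling of tunable parameters described above can be sketched as follows.
This is an illustration under stated assumptions, not a working group test harness: <code>zlib</code>
stands in for a candidate format, its compression level and window size stand in for tunable parameters,
the two sample documents are synthetic, and "typical" is taken to mean the median.</p>
<div class="exampleOuter"><div class="exampleInner"><pre>
```python
import itertools
import statistics
import xml.etree.ElementTree as ET
import zlib


def build_sample(items, text):
    """Build a small XML document as bytes; items and text set the structure/data mix."""
    root = ET.Element("order")
    for i in range(items):
        child = ET.SubElement(root, "item", id=str(i))
        child.text = text
    return ET.tostring(root)


# Synthetic test data (an assumption): one structure-heavy and one data-heavy document.
SAMPLES = {
    "structure_heavy": build_sample(200, ""),
    "data_heavy": build_sample(5, "lorem ipsum dolor sit amet " * 40),
}

# Edge and midpoint values for two tunable parameters of the stand-in codec.
LEVELS = (1, 5, 9)          # zlib compression level: fastest, middle, smallest output
WINDOW_BITS = (9, 12, 15)   # zlib window size exponent: smallest, middle, largest


def compressed_ratio(data, level, wbits):
    """Return compressed size divided by original size for one configuration."""
    codec = zlib.compressobj(level, zlib.DEFLATED, wbits)
    out = codec.compress(data) + codec.flush()
    return len(out) / len(data)


def sweep(samples):
    """Measure every combination of the chosen values and summarize per sample."""
    report = {}
    for name, data in samples.items():
        ratios = [
            compressed_ratio(data, level, wbits)
            for level, wbits in itertools.product(LEVELS, WINDOW_BITS)
        ]
        report[name] = {
            "average": statistics.mean(ratios),
            "typical": statistics.median(ratios),  # assumption: typical = median
            "worst": max(ratios),                  # largest ratio = least compact
            "runs": len(ratios),
        }
    return report


if __name__ == "__main__":
    for name, summary in sweep(SAMPLES).items():
        print(name, summary)
```
</pre></div></div>
<p>Three values for each of two parameters give nine runs per document; adding parameters multiplies the
count, which is why only edge and midpoint values are sampled.</p>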
<h2><a id="property-measurement-methodology" name="property-measurement-methodology"></a>5 Property Measurement Methodology</h2><p>The methodology used includes two levels of property support measurement.  The first, basic
    level provides a succinct screening of formats by thresholding properties.  The threshold type
    is either boolean or trinary.  A boolean measurement indicates whether the property is supported
    and is expected to perform better than, or in some cases the same as, XML 1.x.  A trinary
    measurement records whether the format supports the property, does not prevent (DNP) the
    property, or prevents the property from being implemented.  These thresholds are combined with
    property rankings in the <a href="#XBC-Characterization">Characterization</a>
    document to determine the relative importance of properties, which in turn supports decisions about candidate
    formats.</p><p>The second, detailed measurement level for some properties is useful in detailed comparisons
    of candidate formats to each other.  Valid and useful comparison of formats is difficult for
    binary XML candidates.  The difficulty arises because a large array of properties constrains
    solutions that must simultaneously operate well on a broad range of data.  Detailed measurements
    of properties naturally fall into different types and ranges of values.  Some properties have
    one or more boolean membership values; others have categorical levels of compliance or relative
    or absolute values.  Success in fulfilling a property may depend on the data and usage
    scenario.  Certain measurements, such as expected or actual performance of implementations and
    size of instances, require careful analysis.  In most cases design or configuration tradeoffs
    for one property will affect many others.  In some cases, that influence will be strongly
    correlated.  Additionally, a format may be tunable in hinted or automatic ways to favor
    different property goals.  An example of this would be optimizing for speed vs. compactness with
    various possible ratios of speed and compactness.  It is important to note that both compactness
    and processing efficiency are affected by the method of support for most other properties.  Many
    other properties are only beneficial when they are supported in ways that allow good compactness
    and processing efficiency.</p><p>This detailing of selected properties is in the
    <a href="#detailed-measurements">Property Measurement - Detailed Analysis</a> section.</p></div><div class="div1">
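<p>The basic screening level described above, with its boolean and trinary thresholds, can be sketched as
a small data structure.  The property names come from this document, but the threshold assigned to each
property, the two candidates, and the <code>screen</code> helper are hypothetical; the actual thresholds
are those recorded in the Characterization document.</p>
<div class="exampleOuter"><div class="exampleInner"><pre>
```python
from enum import Enum


class Support(Enum):
    """Trinary threshold: how a format relates to a property."""
    PREVENTS = 0
    DOES_NOT_PREVENT = 1   # abbreviated DNP in the text
    SUPPORTS = 2


# Hypothetical screening requirements, for illustration only.
REQUIRED = {
    "Platform Neutrality": Support.SUPPORTS,     # trinary threshold
    "Random Access": Support.DOES_NOT_PREVENT,   # trinary threshold
    "Compactness": True,                         # boolean threshold
}


def screen(candidate, required=REQUIRED):
    """Return the properties on which a candidate misses its threshold."""
    failures = []
    for prop, needed in required.items():
        observed = candidate.get(prop)
        if isinstance(needed, Support):
            # Trinary: the observed level must be at least the required level.
            ok = isinstance(observed, Support) and observed.value >= needed.value
        else:
            # Boolean: supported and no worse than XML 1.x, or not required at all.
            ok = bool(observed) or not needed
        if not ok:
            failures.append(prop)
    return failures


# Two made-up candidates that differ only in Random Access.
CANDIDATE_A = {
    "Platform Neutrality": Support.SUPPORTS,
    "Random Access": Support.DOES_NOT_PREVENT,
    "Compactness": True,
}
CANDIDATE_B = {
    "Platform Neutrality": Support.SUPPORTS,
    "Random Access": Support.PREVENTS,
    "Compactness": True,
}

if __name__ == "__main__":
    print("A misses:", screen(CANDIDATE_A))
    print("B misses:", screen(CANDIDATE_B))
```
</pre></div></div>
<p>Ordering the trinary values lets a single comparison express "at least does not prevent", which matches
the way the basic level screens formats before any detailed measurement is attempted.</p>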
<h2><a id="detailed-measurements" name="detailed-measurements"></a>6 Property Measurement - Detailed Analysis</h2><p>The detailed property measurements identified by or
        submitted to the working group are documented below.  Detailed
        property descriptions may have the following descriptive
        sections:</p><ul><li><b>Description</b>:
    Provides a name and overview of the measurement.  May also
    identify the properties, usage scenarios, and use cases that apply
    and to what degree.  Each use case has one or a small number of
    pain points that are the most prominent issues to be improved,
    plus other issues that range from important to
    nice-to-have.</li><li><b>Type and range</b>: This aspect of a measurement
    indicates how a proposal rating is recorded and ranked.  Examples are
    Boolean membership, degrees, absolutes, and relatives.</li><li><b>Methodology</b>: Different properties may be
    measured in different ways.  This can include logical inspection,
    formal or informal proofs, code inspection, or testing.</li><li><b>Dependencies</b>:
    Description of positively or negatively correlated dependencies on
    other properties and design tradeoffs.  Strong and weak
    dependencies are noted.</li><li><b>Known tradeoffs</b>: It can be important to be aware of certain key
    decisions and their overall effect to avoid focusing too narrowly.
    This section indicates the main design tradeoffs related to this
    property, both those directly involving the implementation of
    solutions to this property and those involving other properties.</li></ul><p>A number of properties can be measured independently of other properties.  The key properties that
are at the root of the need for a successful binary XML format are <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a> and
<a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing
Efficiency</a>.  These two properties directly depend on nearly every other property in the sense
that most of the other properties are interesting mainly when they are supported while also having
good compactness and processing efficiency.  For example, it is not useful to have a method of
random access if it makes instances bigger and slower than just parsing an XML 1.x document.</p><div class="div2">
<h3><a id="compactness-ID" name="compactness-ID"></a>6.1 Compactness</h3><div class="div3">
<h4><a id="c-desc" name="c-desc"></a>6.1.1 Description</h4><p>The <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>
			property measurement represents the amount of compression a particular
			format achieves when encoding data model items.  The degree of compactness
			achieved with a particular format is highly dependent on the input data model items,
			strategies enabled, and application characteristics.  In test scenarios,
			these characteristics should vary considerably to emulate all important use
			cases in order to properly measure the compactness property of each
			competing format.  To objectively compare formats for their ability to
			represent data model items in a compact manner, competing measurements of various
			formats must be taken using the same scenario.</p><p>A possible disadvantage of any compact encoding might be the additional
                        computation required to generate or interpret and use the encoding.  There
                        is a tendency, exhibited by many size minimization strategies, for
                        compactness to be inversely related to <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing
                        Efficiency</a>.  If compactness is absolutely maximized, processing
                        efficiency will decrease in most cases.  Note that for many test scenarios,
                        it is possible to improve both compactness and processing efficiency
                        relative to the use of XML 1.x.  It is desirable for a format to support the
                        ability to control its compactness based on the need for processing
                        efficiency, available memory, or other properties.  For example, if the
                        format is processed on a high-bandwidth server, the algorithm should be
                        tunable to obtain maximum processing efficiency by sacrificing memory
                        efficiency.  On the other hand, if the format is processed on low-bandwidth
                        mobile handsets, the algorithm should be able to obtain maximum compactness
                        by sacrificing processing efficiency.  A key need is the ability to balance
                        compactness with processing efficiency in a tunable way.  Certain
                        strategies, principally frequency-based dynamic analysis such as gzip
                        compression, are more appropriate when size is the overriding concern.
                        Given the constraints of simultaneously minimal size and processing
                        overhead, methods such as tokenization with dictionary tables might be more
                        successful.</p><p>Size efficiency, or compactness, concerns the optimization of the storage
                        or transmission resources needed to represent data model items.  Several
                        categories of methods are known to be useful.  This section reviews the major
                        categories of methods and related topics, which provide background for
                        format analysis.</p><p>A data object, which is the representation of data model items, consists of
			three logical components that usually have a physical representation.  These
			are the data, the structural information, and metadata (including typing).
			For XML 1.x, the structure and metadata are represented by tag syntax and
			naming while data is mostly present in attribute values and element text.
			Some strategies for data representation remove some or all structural and
			metadata representation and place it in external metadata or embedded in
			code.</p><p>There are three categories of methods to reduce the size of a data object
			or data model items: compression, decimation, and externalization.  Competitive
			formats may make use of one or more methods from each category.  Compression
			is the transformation of data into corresponding data that takes less
			storage through the removal or reuse of redundant information and more
			efficient coding of data.  Compression is often paired with decimation, the
			process of eliminating some details that are not used or of less importance
			than more important components of the original data.  This is called "lossy
			compression" as opposed to "lossless compression" or just "compression".</p><p>Externalization is the process of representing original data model items as an
			external representation with varying degrees of reuse and a data object that
			relies on that external instance as a source of redundancy.  This external
			information can be considered shared information between a sender and
			receiver.  In some cases, this information is relatively stable, long term
			shared information, while in others, the potentially sharable information is
			ephemeral.  Besides the trivial replacement of an object with a reference,
			there are two main externalization methods, schema-based (usually long-term
			shared information) and delta-based (usually short-term shared information).
			A schema-based method relies on a specification for certain aggregate data
			types, structure, and/or values.  Trivially, this could mean sending and
			receiving code that simply writes and reads values in a certain order with
			no explicit structure.  In this case, the structure and data type metadata
			is implicitly present in the code.  More sophisticated methods rely on
			interface definition languages (IDL) or the reuse of validation schema such
			as XML Schema for externalization purposes.  The use of these structural and
			metadata specifications may result in code generation and/or the production
			of metadata for use by an interpretive engine.  Schema-based externalization
			usually has long-term schema reuse characteristics; that is, it relies
			on long-term redundancy.  This is compatible with
			some programming and lifecycle models, but can conflict with some
			application needs.</p><p>When the externalization method relies in a generalized way on
			representing differences from a template, parent object, or earlier message,
			it is called a delta.  Delta mechanisms can be implemented at a high,
			logical-operations level or as a low-level byte or slot difference
			representation.  <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#deltas">Deltas</a>
			can be produced by a computational differencing operation or by recording
			the location of changes as they happen.  <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#deltas">Deltas</a>
			take advantage of both long term and short term redundancy.</p><p>In many cases, compression benefits from processing as much data as
                        possible at the same time rather than considering individual fragments in
                        isolation.  This leads to processing models where a bulk compression or
                        decompression step is performed.  Generally, this leads to the data being
                        inaccessible to application logic until all of the data, or at least all of
                        the data up to a certain point, is decompressed.</p><p>There are numerous methods of compression which rely on different methods
			of detecting redundancy and representing data.  These methods sometimes have
			data access pattern needs and are generally good at compressing some data
			while having limited use on other data.  Some popular methods include:</p><ul><li>Stream compression</li><li>Block sorting compression</li><li>Run length coding</li><li>Linear quantization</li><li>Dictionary coding</li><li>Key compression</li><li>Huffman coding</li><li>Token tables</li><li>Arithmetic coding</li><li>Lempel-Ziv variants</li><li>Quadtree and similar subdivision methods</li><li>Frequency domain coding</li><li>Wavelet coding</li><li>Fractal coding</li></ul><p>See the <a href="#Usenet-Compression-FAQ">Compression FAQ</a> for more information on these methods.</p></div><div class="div3">
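<p>As an illustration of the token table method listed above, the following minimal Python sketch replaces repeated element names with small integer tokens.  It is not an encoding defined by this Note; the function names <code>tokenize</code> and <code>detokenize</code> and the sample element names are assumptions made for the example.</p>

```python
def tokenize(names):
    """Replace each name with a small integer token.

    Returns (table, tokens): table lists each distinct name once, in
    first-seen order; tokens holds one table index per input name.
    """
    table = []   # token number -> name
    index = {}   # name -> token number
    tokens = []
    for name in names:
        if name not in index:
            index[name] = len(table)
            table.append(name)
        tokens.append(index[name])
    return table, tokens


def detokenize(table, tokens):
    """Invert tokenize() by looking each token up in the table."""
    return [table[token] for token in tokens]


# Element names in document order for a small, hypothetical order document.
element_names = ["order", "item", "sku", "qty",
                 "item", "sku", "qty",
                 "item", "sku", "qty"]

table, tokens = tokenize(element_names)
print(table)   # each distinct name appears once
print(tokens)  # one small integer per element occurrence
```

<p>Each repeated name costs one small integer instead of a full string, and the table is the only place the names are spelled out.  A real format would also have to decide how the table itself is serialized or shared.</p>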
<h4><a id="c-type" name="c-type"></a>6.1.2 Type &amp; range</h4><p>For a given input document, this property is measured as a set of values
			[Tokenization, Schema, Compression (Data Analysis), Compression+Schema,
			Delta, Lossy].  Each of these values is either the percentage by which the
			encoded version of the document is smaller than the original, N/S (not supported) if
			that method is not supported, or N/A (not applicable) if the method doesn't
			apply.  (Lossy and Lossless are defined by the <a href="#round-trippable-ID">Round Trip Support</a> Measurement.)
			Tokenization is the use of the format without the benefit of compression,
			schema-based encoding, deltas, or lossy compression.  Schema represents
			schema-based encoding methods.  In some cases, compression and schema-based
			encoding will be used together.  Data analysis for compression, use of
			schemas, deltas, and lossy compression should be noted as optional when
			appropriate.  Lossy compression quality must be normalized to some quality
			value, preferably based on an objective measure.  An array of values may be
			necessary to represent important points on a lossiness spectrum.</p><p>XML 1.x would measure as follows: [0%, N/S, N/S, N/S, N/S, N/S].</p></div><div class="div3">
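<p>The value set described above can be sketched in Python as follows.  The function names, the dictionary-based input, and the byte counts in the usage lines are assumptions for illustration; only the ordering of the six methods and the N/S and N/A markers come from the text.</p>

```python
# Order of the six values, as given in the text above.
METHODS = ["Tokenization", "Schema", "Compression",
           "Compression+Schema", "Delta", "Lossy"]


def percent_smaller(original_size, encoded_size):
    """Percentage by which the encoding is smaller than the original."""
    return 100.0 * (original_size - encoded_size) / original_size


def compactness_vector(original_size, encoded_sizes):
    """Build the six-value measurement for one input document.

    encoded_sizes maps a method name to an encoded size in bytes, or to
    the string "N/S" or "N/A".  Methods that are missing from the mapping
    are reported as "N/S" (an assumption of this sketch).
    """
    vector = []
    for method in METHODS:
        value = encoded_sizes.get(method, "N/S")
        if isinstance(value, str):
            vector.append(value)
        else:
            vector.append("%.0f%%" % percent_smaller(original_size, value))
    return vector


# XML 1.x measured against itself: same size, no other method supported.
xml_vector = compactness_vector(1000, {"Tokenization": 1000})

# A hypothetical candidate format with made-up sizes.
candidate_vector = compactness_vector(
    1000, {"Tokenization": 600, "Compression": 250, "Lossy": "N/A"})
```

<p>With these inputs the first call reproduces the XML 1.x measurement given above, and the second yields 40% for tokenization and 75% for compression.</p>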
<h4><a id="c-method" name="c-method"></a>6.1.3 Methodology</h4><p>The amount of compactness an XML format can achieve for a given XML document is a function of the
document's size, structure, schema, and regularity.  Because XML documents exist with a wide variety
of sizes, structures, schemas, and regularity, it is not possible to define a single size threshold
or percentage compactness that an XML format must achieve to be considered sufficiently compact for
a general purpose W3C standard.  The amount of compactness achieved by an XML format will vary from
one XML document to the next.</p><p>The amount of compactness achieved by an XML format will also vary from application to
application.  The amount of compactness practical for a given application depends on the
optimizations the application can effectively employ.  In general, there are two common categories
of optimizations XML formats use to improve compactness: schema optimizations (externalization) and
document optimizations (compression). </p><p>Schema optimizations leverage shared knowledge of a class of XML documents to improve compactness
(e.g., by omitting information known by all participants).  This shared knowledge may be codified in
some form of schema, but may also be embodied in other forms, such as source code.  Schema
optimizations are particularly effective for applications in which the XML format must be
competitive in size with existing or alternate hand designed binary formats.  However, schema
optimizations cannot be used in applications where it is not practical or possible to codify shared
knowledge about the subject XML documents or assume each participant has access to this
knowledge.</p><p>Document optimizations analyze XML documents to identify patterns and derive smaller
representations for the patterns that occur most frequently.  Most well known data compression
algorithms, such as Deflate, Lempel-Ziv coding, and Huffman coding, fall into this category.
Document optimizations are particularly effective in applications involving larger XML documents
with repetitive structures; however, they are not very effective on very small XML documents.  In
addition, document optimizations cannot be used in applications where it is not practical or
possible to allocate the time, memory, or processing resources required to analyze each document.</p><p>These two categories of optimizations partition the set of XML applications into the four classes
below.  Each class defines a metric for determining whether a format is sufficiently compact for that
class.  To maintain independence from variations in document size, structure, schema and regularity,
each metric defines sufficient compactness relative to well known and freely available encoding
specifications.</p><ul><li><em>Both Schema and Document Optimizations</em>: This class includes applications for which
both schema and document optimizations are practical and achieve sufficient
compactness.  A format is sufficiently compact for this class of application
if it is sufficiently compact for both the schema-only and the document-only
classes of applications below.</li><li><em>Schema Optimizations only</em>: This class includes applications for which schema
optimizations are practical and achieve sufficient compactness, but document
optimizations cannot be used.  A format is sufficiently compact for this
class of applications if it can consistently produce encodings no larger
than the equivalent ASN.1 PER encoding defined by the associated
schema [<a href="#compact-ref">1</a>], plus 20%.</li><li><em>Document Optimizations only</em>: This class includes applications for which document
optimizations are practical and achieve sufficient compactness, but schema
optimizations cannot be used.  A format is sufficiently compact for this
class of applications if it can consistently produce encodings the same size
as or smaller than those produced by the deflate algorithm [<a href="#compact-ref">2</a>] used by gzip.</li><li><em>Neither Schema nor Document Optimizations</em>: This class includes applications for
which neither schema nor document optimizations are practical.  A format is
sufficiently compact for this class of applications if it is smaller than
XML 1.x [<a href="#compact-ref">3</a>].  It is expected that a format would
achieve significant compactness over XML 1.x in most cases.</li></ul><p>The following table classifies each XBC use case according to this
classification scheme.  Most use cases include applications that fall into
more than one class.</p><table rules="all" border="1"><tbody>
<tr><th align="left">Use case</th><th>Both</th><th>Schema</th><th>Document</th><th>Neither</th></tr>
<tr><td>Metadata in Broadcast Systems</td><td align="center">X</td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Floating Point Arrays in the Energy Industry</td><td></td><td align="center">X</td><td></td><td></td></tr>
<tr><td>X3D Graphics Model Compression, Serialization, and Transmission</td><td align="center">X</td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Web Services for Small Devices</td><td align="center">X</td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Web Services within the Enterprise</td><td></td><td align="center">X</td><td></td><td align="center">X</td></tr>
<tr><td>Electronic Documents</td><td align="center">X</td><td align="center">X</td><td align="center">X</td><td align="center">X</td></tr>
<tr><td>FIXML in the Securities Industry</td><td></td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Multimedia XML Documents for Mobile Handsets</td><td align="center">X</td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Intra/Inter Business Communication</td><td></td><td align="center">X</td><td></td><td align="center">X</td></tr>
<tr><td>XMPP Instant Messaging Compression</td><td align="center">X</td><td align="center">X</td><td></td><td align="center">X</td></tr>
<tr><td>XML Documents in Persistent Store</td><td></td><td align="center">X</td><td></td><td></td></tr>
<tr><td>Business and Knowledge Processing</td><td></td><td align="center">X</td><td></td><td align="center">X</td></tr>
<tr><td>XML Content-based Routing and Publish Subscribe</td><td></td><td></td><td></td><td align="center">X</td></tr>
<tr><td>Web Services Routing</td><td></td><td align="center">X</td><td></td><td align="center">X</td></tr>
<tr><td>Military Information Interoperability</td><td align="center">X</td><td align="center">X</td><td align="center">X</td><td align="center">X</td></tr>
<tr><td>Sensor Processing and Communication</td><td align="center">X</td><td align="center">X</td><td></td><td></td></tr>
<tr><td>SyncML for Data Synchronization</td><td align="center">X</td><td></td><td></td><td align="center">X</td></tr>
<tr><td>Supercomputing and Grid Processing</td><td align="center">X</td><td align="center">X</td><td align="center">X</td><td align="center">X</td></tr>
</tbody></table><p>An XML format is sufficiently compact to be a general purpose W3C standard if it is sufficiently
compact for each of these four application classes.  Formats that do not meet these criteria do not
achieve sufficient compactness to satisfy the majority of the binary XML use cases that state
compactness as a requirement.</p><p id="compact-ref"></p><ol class="enumar"><li>Mapping from XML Schemas to ASN.1 modules.
<a href="http://asn1.elibel.tm.fr/xml/mapping.htm">http://asn1.elibel.tm.fr/xml/mapping.htm</a></li><li>Deflate Compressed Data Format Specification.
<a href="http://www.faqs.org/rfcs/rfc1951.html">http://www.faqs.org/rfcs/rfc1951.html</a></li><li>Extensible Markup Language (XML) 1.0 (Third Edition)
<a href="http://www.w3.org/TR/REC-xml/">http://www.w3.org/TR/REC-xml/</a></li></ol><p>When measuring the Compactness property, the encoder is not permitted to
                        use prior knowledge about the semantics of information items used in the
                        input document.  For example, the encoder is not permitted to use
                        specialized codecs to encode the contents of a specific element or attribute
                        in the given instance document based on the name or location of that element
                        or attribute.</p><p>Candidate formats will likely have multiple optional methods for
                        achieving compactness depending on the circumstance.  Measurement of
                        compactness consists of encoding the same data using the major combinations
                        of methods which are categorized by: tokenization, document optimization
                        (compression), schema-based encoding, the combination of document and
                        schema-based optimization, deltas, and lossy compression.  Not all of these
                        will be available or appropriate for every candidate format and scenario
                        combination.  In such cases, a "lower" level method score can be used for
                        comparison purposes.  For instance, if a format does not support deltas, a
                        schema or compression+schema score could be used.  When scoring for
                        scenarios that cannot use certain methods, such as schemas or compression,
                        these values may indicate N/A (not applicable).  Care must be taken to
                        consider what methods could be used to great benefit beyond traditional
                        models for existing data formats.</p><p>After scoring each combination of methods for each appropriate
			combination of test data and scenario, the best numbers for each test are
			tabulated to determine an overall score.  Weighting of this comparison may
			be needed to appropriately reflect market impact and the effects of
			overlapping scenarios.</p></div><div class="div3">
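<p>The "Document Optimizations only" criterion described in this section compares a candidate encoding against deflate.  A minimal Python sketch of that comparison, using the standard <code>zlib</code> module to produce a raw deflate stream, might look like the following.  The function names and the choice of compression level 9 are assumptions; this Note does not specify a level or a reference implementation.</p>

```python
import zlib


def deflate_size(data, level=9):
    """Size in bytes of a raw deflate (RFC 1951) stream for data.

    A negative wbits value asks zlib for a raw stream, without the zlib
    or gzip header and trailer.  The level is an assumption of this sketch.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())


def meets_document_only_criterion(xml_bytes, candidate_bytes):
    """True if the candidate encoding is no larger than deflate's output."""
    return len(candidate_bytes) <= deflate_size(xml_bytes)


# A small, repetitive, made-up document.
sample = b"<list>" + b"<entry>value</entry>" * 200 + b"</list>"
baseline = deflate_size(sample)
print(len(sample), baseline)
```

<p>Because the criterion asks for consistent results, a real evaluation would apply this check across the full set of test documents rather than to a single sample.</p>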
<h4><a id="c-dep" name="c-dep"></a>6.1.4 Dependencies</h4><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a> tends to have an inverse dependency relationship with
			<a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing Efficiency</a>, <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#small-footprint">Small Footprint</a>, and <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#space-efficiency">Space Efficiency</a>.</p></div><div class="div3">
<h4><a id="c-tradeoffs" name="c-tradeoffs"></a>6.1.5 Known Tradeoffs</h4><p>High scores for this property may be at odds with higher scores in the
			following properties (An example is given why each is listed):</p><ul><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#accelerated-sequential-access">Accelerated Sequential Access</a>: An index is present to allow skipping over content.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#fragmentable">Fragmentable</a>: Additional context is required for decoder.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#human-readable-editable">Human Readable and Editable</a>: Additional information required in the format.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#random-access">Random Access</a>: An Index table is required in the format.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#robustness">Robustness</a>: Dedicated redundancy added to the format.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#roundtrip-support">Roundtrip Support</a>: Lossless equivalence can mean a larger representation.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#self-contained">Self Contained</a>: Relevant information retained in the encoding.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#specialized-codecs">Specialized Codecs</a>: The format may include references to predefined extensions.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#support-for-error-correction">Support for Error Correction</a>: Requires redundancy be contained in the representation.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing Efficiency</a>: Additional information may be provided in a format to elicit faster processing speeds.</li></ul></div></div><div class="div2">
<h3><a id="processing-efficiency-ID" name="processing-efficiency-ID"></a>6.2 Processing Efficiency</h3><div class="div3">
<h4><a id="pe-desc" name="pe-desc"></a>6.2.1 Description</h4><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing Efficiency</a> is a measure of the efficiency, and effectively the speed, of processing an
    instance of a format.  Determining the relative speed of different formats in a complete and
    valid way is difficult.  This is because there are many variables that affect actual speed,
    including processing library implementation details that are not fundamentally required by the
    format.  Ideally, different formats could be compared based on determination of their best
    reachable performance levels in all needed situations.  In practice, this cannot be done with
    absolute accuracy.  As a result, comparative evaluation must be accomplished by a combination of
    complexity analysis, processing characterization estimation, format characteristic analysis,
    fitness for all needed test scenarios, and actual empirical testing.  It is important to
    stress that while empirical testing provides proof of obtaining at least a certain level of
    performance, by itself it proves little about whether better performance can be obtained for a
    particular format and test scenario.  Complexity analysis tends to be able to provide better
    proof of the theoretical limits of performance, although this is not infallible in the face of
    unexpected algorithms.  Additionally, in some cases complexity for multiple candidates may be,
    for example, linear and relative performance differences may be dominated by format or method
    details that affect overhead such as an extra level of indirection.  As an example of a subtle
    but possibly dominant detail, one format may tend to allow better locality of reference in
    processing than another.  With cache memory in modern systems running 25 or more times faster
    than main memory, a large subset of processing scenarios could perform better for the former
    format.</p><div class="div4">
<h5><a id="processing-phase-definitions" name="processing-phase-definitions"></a>6.2.1.1 Processing phase definitions</h5><p>Applications use data formats to communicate information or to store data for later use.  XML
1.x, and presumably any binary XML candidate, provides an external data representation that is rich and
flexible along with other benefits.  The use of XML tends to be better, overall, than more
simplistic approaches for many applications.  While solving all efficiency problems during the
creation of XML was not feasible, continuing experience and research on the problem have provided
new insight.  A format that solves these problems while retaining the benefits of XML 1.x and
possibly adding new benefits aids existing applications and offers to greatly expand the range of
applications which can justify the use of XML technology.</p><p>A key observation about the information technology industry is that often the macroscopic
separation of concerns at the operating system, programming language, protocol, service, application
framework, or application level constrains problem solving and optimization.  In the past it was rare, for
instance, for an application developer outside of an operating system vendor to cause changes in an
operating system to solve performance problems.  (A notable exception to this is the addition of
facilities for direct access to SCSI command queuing in operating systems for the largest database
vendors.)  With respect to formats and performance, it has usually been the case that programming
languages have been optimized for in-memory operations on native variables while data formats have
been designed without prime consideration of processing complexity.  Because of the pervasive need
for modularization and network distribution of application components, any overhead crossing the
boundary between external format and memory representation is amplified.  An application exists to
accomplish actual work of some kind.  Any operations outside of that work are overhead.  While much
of the overhead in existing systems exists for a logical reason for a particular environment, when
considering candidate formats for binary XML, those reasons are a temporary artifact and immaterial.
This means that it is important to analyze the effect of candidate format design decisions on
existing and best possible processing complexity.  The first step in this analysis is to define the
processing phases involved in typical applications and determine variability points.</p><p>An application logic step is an operation that finds, traverses, reads, or modifies actual
payload data in an instance. Processing that is overhead may include decompression, parsing, implied
or required memory allocation or reference attachment, data binding, index maintenance, and schema
retrieval and processing.  Some candidate methods may involve other operations related to the use of
schemas.  Parsing is the conversion of a serialized form of data into a more readily usable form or
events with arguments (SAX et al).  Data binding can imply several levels. The simplest usable
level, "structure without conversion", converts parse events into a data structure that captures all
usable data and the usable structure of that data with no conversions.  SAX and other parse/event
engines are pure parsing engines. A DOM library implementation, when reading an XML 1.x object,
parses and produces an application generic, XML-specific DOM data structure.  The use of DOM is, in
a semantic sense, equivalent in most cases to an application using SAX or similar for parsing and
from parse events building an application-specific data structure.  An application specific data
structure may be interpretive "structure with conversion" or it may include representation of data
values directly in native, 3GL (third generation language) constructs such as objects or structs,
"native structure binding".  In the case of non-native structures, format details may create
overhead in application processing such as insertion and deletion which might be a tradeoff for
other advantages.  It might be that candidate formats have no substantial differences in how they
present to application phases in which case this analysis would be moot.  A survey of possible
candidates indicates some methods that may be beneficial.</p></div><div class="div4">
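<p>The distinction drawn above between pure parsing followed by application-specific binding and a generic DOM structure can be illustrated with Python's standard <code>xml.sax</code> and <code>xml.dom.minidom</code> modules.  This is only a sketch: the sample document, the <code>ItemHandler</code> class, and the two binding functions are assumptions made for the example.</p>

```python
import xml.sax
from xml.dom import minidom

# A small, made-up XML 1.x document.
DOC = b"<order><item sku='A1'>2</item><item sku='B7'>5</item></order>"


class ItemHandler(xml.sax.ContentHandler):
    """Builds an application-specific dict directly from parse events."""

    def __init__(self):
        super().__init__()
        self.items = {}
        self._sku = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._sku = attrs["sku"]
            self._text = []

    def characters(self, content):
        # Character data may arrive in several chunks, so collect it.
        if self._sku is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            # Native structure binding: the quantity becomes an int.
            self.items[self._sku] = int("".join(self._text))
            self._sku = None


def bind_with_sax(data):
    """Parse events are converted straight into the application structure."""
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.items


def bind_with_dom(data):
    """A generic DOM tree is built first, then walked to extract values."""
    dom = minidom.parseString(data)
    return {element.getAttribute("sku"): int(element.firstChild.data)
            for element in dom.getElementsByTagName("item")}
```

<p>Both functions produce the same application data, but the DOM route builds a complete intermediate tree before any value is extracted, which is the kind of overhead this section asks evaluators to account for.</p>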
<h5><a id="standard-apis" name="standard-apis"></a>6.2.1.2 Standard APIs vs. abstract operations</h5><p>Numerous official, unofficial, and experimental application programming interfaces exist to
process XML data.  These APIs have provided valuable experience and have been an asset to
application development environments.  It is expected that any new format would be able to support
existing APIs.</p><p>It has become apparent that there are certain design flaws in existing APIs in addition to a
desire for features that simplify and streamline development.  One example of a fundamental flaw
that potentially affects performance is the "create object, fill object, link into tree" paradigm of
the DOM API.  Even if a format exists that supports minimal copying and coherent data, this API
forces multiple copies, fragmented representation, and/or data reordering.  Additionally, the new
industries, data, and application types made possible by a successful binary XML format may require
processing that is beyond traditional XML operations.  This indicates that new APIs will be
experimentally proposed and that valid evaluation of candidate formats must involve an abstract
representation of scenario operations that can be translated to the best available API.</p></div><div class="div4">
<h5><a id="incremental-overhead" name="incremental-overhead"></a>6.2.1.3 Incremental Overhead</h5><p>One aspect of a format is whether it allows and supports the ability to operate efficiently so
that processing is linear in the number of application logic steps rather than in the size or data complexity of
the instance.  It is often desirable for processing complexity to be related to work needed rather
than the size or complexity of data.  Size refers to the number of bytes taken by the instance.
Data complexity refers to the granularity of XML-visible objects such as elements and attributes.  A
format that supports incremental overhead is fast for a single operation on an instance of any
combination of size or complexity.  While many applications desire this characteristic, it is not an
independent property of the format because it is a meta-property of other properties such as random
access and efficient update.  If a format supports incremental overhead in a partial or complete way
then certain properties operate incrementally.</p><p>While not measured as an independent property, this section provides some guidance when examining
the presence of incremental overhead.  The degrees of support for Incremental Overhead are expressed
in terms of cost of use vs. size/complexity of an instance. The overhead of moving raw data as an
efficient block copy is assumed. After parsing and data binding, data is accessed in an application
through three main methods:</p><ul><li>Data is in native data structures (object member variables, strings, scalars) that are
accessed directly.</li><li>Data is navigated to through a standard structure (like DOM) by an intermediating library but
final value access is through native data structures.</li><li>Data and structure are maintained in an application opaque manner by a library that fully
intermediates access and modification.</li></ul><p>The differences in these approaches can be large and are affected by specific choices in format,
implementation constraint, and API.  All of these choices can affect efficiency.  Minor differences
are frequently not useful, but algorithmic complexity measures and performance validation can be
very indicative.  One important tradeoff is native access plus linear or worse overhead relative to
size/complexity at one extreme vs. fully intermediated access and little or no overhead relative to
size/complexity.  Fully valid comparisons of this spectrum of approaches must include algorithmic
complexity, logical analysis, characteristic analysis such as modeling locality of reference, and
end-to-end and end-to-middle/middle-to-end measurements of available implementations.</p><p>The measurement of Incremental Overhead includes a category classification and an indication of
algorithmic complexity (in O(n) or relative to linear P^2 notation). An example might be: "linear/no
cost, O(P)*4, O(S/1000)".</p><p>Incremental Overhead degree categories:</p><ul><li>"linear factor to use, no cost for data size/complexity"</li><li>"linear factor to use, limited cost linear to size/complexity"</li><li>"linear to use, linear to size/complexity"</li></ul></div><div class="div4">
<h5><a id="N10438" name="N10438"></a>6.2.1.4 Complexity</h5><p>Algorithmic complexity relates to the fundamental theoretical performance characteristics of
    an algorithm.  Although particular measurements of different algorithms on the same data may be
    useful, without understanding the algorithmic complexity of the algorithms involved, the
    comparison is not known to be valid in all cases.  Each algorithm has scaling characteristics
    that are related to various kinds of overhead, startup, and input/output data related
    operations.  The relationship of the size and complexity of input/output data vs. the
    performance of the algorithm is represented as a formula that consists of linear and nonlinear
    factors plus constant factors.  Typically, algorithmic complexity is expressed as operations on
    'n' which represents the input size, count, or complexity.  The following illustrates the value
    of considering algorithmic complexity with the example of random access support in a format.</p><p> Let's assume one wants to access a random element out of an XML document with one million
elements.  On average, the code will have to examine (parse, read, etc.) 500,000 elements.  More
generally, the time it takes to access any element out of an n-element document is proportional to
n. It might be n/2, a slow implementation might be 2*n, and a fast one n/4, but fundamentally the
complexity cost is tied to n. </p><p>A format which implements random access, however - in the sense that an index table is included in
the format itself - can provide access to the nth element in time proportional to - depending on how
the index works - the log of n or even in constant time.  Again, there are various constant factors
which may vary between implementations. </p><p>As n grows, it eventually exceeds both log(n) and 1 multiplied by any constant
factor. Thus one can reason about the relative performance of the format for certain operations
without ever having to measure any implementations.  On the other hand, if one is interested in
improving only the constant factor, then one must measure implementations, with all the difficulties
that topic involves. </p><p>For example, DOM defines an API which supports random access in the sense that nodes do not need to be
accessed in order. However, because of how XML is defined, random access via a DOM API still takes
time proportional to n - the size of the document. DOM over XML does not support random access in the
sense which is used here, namely, better-than-linear access time. </p><p>That said, the DOM API most likely could be implemented over a different file format to provide
true random access; such an implementation would make use of the index included in the file. This
continues down the stack: if the file is stored on tape, which does not support random access, then
the benefits of the file format will still not be achieved.</p></div><div class="div4">
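<p>The random access argument above can be sketched by counting elements examined rather than measuring time. This is a minimal illustration only: the function names and the assumption that the index is searched by binary search are not taken from any particular format.</p>
<pre>
```python
import math

def sequential_steps(target):
    """Elements examined to reach element `target` (0-based) when scanning from the start."""
    return target + 1

def indexed_steps(n):
    """Upper bound on probes for a binary search over an n-entry index (assumed index design)."""
    return max(1, math.ceil(math.log2(n + 1)))

n = 1_000_000
# For a linear cost, the mean over all targets equals the mean of the best and worst case.
average_scan = (sequential_steps(0) + sequential_steps(n - 1)) / 2
print(average_scan)      # 500000.5
print(indexed_steps(n))  # 20
```
</pre>
<p>Whatever constant factors an implementation adds, the gap between roughly n/2 and roughly log(n) widens as n grows.</p>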
<h5><a id="meas-considerations" name="meas-considerations"></a>6.2.1.5 Measurement Considerations</h5></div><p>The amount of increase in processing speed is dependent on the input documents
		used for testing. Therefore, to compare formats objectively for their inherent
		processing speed, competing measurements should be taken under the same
		conditions.  The documents used as test data should vary in size and
		complexity to generate a set of results.  In addition, normal performance profiling
		steps need to be followed.  These include, but are not limited to, constructing a
		proper test environment with stable machines and software, utilizing a private
		network, and providing proper "warm-up" for adaptively compiled systems like Java.
		This requires the use of an appropriate set of test Scenarios, Property
		Profiles, and Test Data.</p><p>Any algorithm used for this measurement should have a theoretical runtime of no
                more than O(n).  However, this measurement alone cannot be used to effectively
                determine the speed of the algorithm.  It is possible that two algorithms with O(n)
                runtimes could have vastly different performance characteristics if, for example,
                one algorithm used 100 cycles per byte processed, while the other used 500 cycles
                per byte processed.  Both algorithms would be O(n), but result in vastly different
                performance measurements.</p></div><div class="div3">
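<p>The point about constant factors can be made concrete with a small calculation. The clock rate and document size below are assumed values chosen for round numbers; only the 100 and 500 cycles-per-byte figures come from the text above.</p>
<pre>
```python
def processing_seconds(size_bytes, cycles_per_byte, clock_hz):
    """Time for a strictly linear algorithm: total cycles divided by clock rate."""
    return size_bytes * cycles_per_byte / clock_hz

CLOCK_HZ = 1_000_000_000   # assumed 1 GHz processor
SIZE = 10_000_000          # assumed 10 MB document

fast = processing_seconds(SIZE, 100, CLOCK_HZ)
slow = processing_seconds(SIZE, 500, CLOCK_HZ)
print(fast, slow, slow / fast)   # 1.0 5.0 5.0
```
</pre>
<p>Both algorithms scale linearly with input size, yet one is five times slower at every size, which is why measurement is still needed alongside complexity analysis.</p>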
<h4><a id="pe-type" name="pe-type"></a>6.2.2 Type &amp; range</h4><p>For a given test scenario, property profile, and test data,
			this property is measured in several different ways:</p><ol class="enumar"><li>Parsing into a DOM - The time it takes to parse into a DOM memory structure.</li><li>Parsing to SAX - The time it takes to parse to SAX events (push or pull).</li><li>Parsing to a new proposed interface (optional) - The time it takes to
			parse into a DOM-like memory structure proposed for binary XML as an
			improvement of DOM.</li><li>Query processing - The time it takes to process standard queries.</li><li>Update (creation, insertion, deletion) - The time it takes to modify
			an instance in a predetermined pattern of operations.</li><li>Retrieval - The time it takes to retrieve information from an instance.</li><li>XPath streaming - The time it takes to find a series of xpaths and associated data in a stream of data.</li><li>Serialization - The time it takes to generate the alternate format
			from a memory structure including DOM, SAX-related, and an optional proposed
			interface.</li><li>Lifecycle - Using the best available method, create an instance with
			data, interpret instance to get partial data, and modify or create new
			instance with some changes.  Memory of the instance at each write/read point
			must not be reused at the next step.</li></ol><p>Each measurement is recorded as a percentage faster than a standard
			text-based alternative for each type of operation.  </p></div><div class="div3">
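<p>The text does not give a formula for "percentage faster". One plausible reading, assumed here, is the reduction in elapsed time relative to the text-based baseline. The timings in this sketch are hypothetical.</p>
<pre>
```python
def percent_faster(baseline_seconds, candidate_seconds):
    """Reduction in elapsed time relative to the baseline, as a percentage.

    Negative values mean the candidate is slower than the baseline.
    """
    if not baseline_seconds > 0:
        raise ValueError("baseline time must be positive")
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100.0

# Hypothetical timings in seconds: (text XML, alternate format)
timings = {"parse to DOM": (2.0, 1.5), "serialization": (1.0, 1.25)}
scores = {op: percent_faster(base, cand) for op, (base, cand) in timings.items()}
print(scores)   # {'parse to DOM': 25.0, 'serialization': -25.0}
```
</pre>
<p>Whichever definition is chosen, it should be stated with the results, since a speedup ratio and a time reduction give different percentages for the same pair of timings.</p>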
<h4><a id="pe-method" name="pe-method"></a>6.2.3 Methodology</h4><p>Measurements must be taken as follows:</p><ol class="enumar"><li>For each test scenario, use a specific set of input documents and a
			text-based XML implementation to generate a baseline for each operation type.</li><li>Using the same test scenario and input documents, perform each operation type with the alternate format.</li><li>Compare results, investigating and detailing any irregularities.</li><li>Perform alternate mode measurement.  In some cases, there may be a
			choice of more than one combination of operating modes for a particular test
			scenario.  For example, all combinations of compactness methods that are
			valid for a test scenario should be tested for performance, including
			schema-based encoding, compression, and deltas.</li></ol><p>This will allow a fair comparison between various alternate formats to
			determine their processing efficiency differences. </p></div><div class="div3">
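<p>The steps above can be sketched as a small timing harness. The warm-up count, the use of the median, and the stand-in operations (<code>sorted</code> and <code>len</code>) are assumptions for illustration; a real run would substitute a text XML parser, the alternate-format parser, and the agreed test data.</p>
<pre>
```python
import statistics
import time

def median_seconds(operation, document, warmup=3, runs=7):
    """Median wall-clock time of operation(document), after discarding warm-up runs."""
    for _ in range(warmup):
        operation(document)          # let caches and adaptive compilers settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation(document)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare(baseline_op, candidate_op, documents):
    """Return {name: (baseline_time, candidate_time)} over the same input documents."""
    return {name: (median_seconds(baseline_op, doc), median_seconds(candidate_op, doc))
            for name, doc in documents.items()}

# Stand-in operations and inputs, not real parsers or XML test data.
documents = {"small": list(range(1_000)), "large": list(range(100_000))}
results = compare(sorted, len, documents)
for name, (baseline, candidate) in results.items():
    print(name, baseline, candidate)
```
</pre>
<p>Running both operations over the same documents in the same process keeps the conditions identical, as the methodology requires.</p>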
<h4><a id="pe-dep" name="pe-dep"></a>6.2.4 Dependencies</h4><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#processing-efficiency">Processing Efficiency</a> has a correlated relationship with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#small-footprint">Small Footprint</a>
			and <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#space-efficiency">Space Efficiency</a> and an inverse relationship with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>.
			Additionally, this property can be considered a measurement of the
			processing efficiency for most other properties.</p></div><div class="div3">
<h4><a id="pe-tradeoffs" name="pe-tradeoffs"></a>6.2.5 Known Tradeoffs</h4><p>High scores for this property may be at odds with
			higher scores in the following properties:</p><ul><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>:
			Compact encodings can cost extra cycles to
			interpret and expand compared to simpler
			formats.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#human-readable-editable">Human Readable and Editable</a>: Precise structural information needed for
			efficiency, even if represented in text, tends to make a format less human
			readable and editable.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#support-for-error-correction">Support for Error Correction</a>: Requires processor to potentially detect
			and correct errors, therefore reducing processing speed.</li></ul></div></div><div class="div2">
<h3><a id="accel-sequential-access" name="accel-sequential-access"></a>6.3 Accelerated Sequential Access</h3><div class="div3">
<h4><a id="asa-desc" name="asa-desc"></a>6.3.1 Description</h4><p>The objective of <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#accelerated-sequential-access">Accelerated
Sequential Access</a> is to reduce the amount of time required to access XML data model items in a
document. The fundamental measurement is therefore the average time needed to access an XML data model
item.  This time can be compared to a baseline measurement of the average time needed to access an
XML data model item using an unaccelerated sequential access method like that used to implement
SAX.</p><ul><li>T(ix) - time to create a sequential index, if used (fixed)</li><li>T(sk) - time to seek a data model item (average)</li><li>T(am) - total time for all accesses over the document.
        This time amortizes the cost of T(ix) over the average number
        of total seeks (ns).</li></ul><p>Not all accelerated sequential access methods use a sequential index and incur T(ix). In this
case it is only necessary to compare T(sk) average for the unaccelerated case against the
accelerated one.</p><p>If accelerated sequential access supports update of the sequential index
we should also take this cost into account.</p><p>T(up) - time to update the sequential index.</p><p>T(up) should also be added to T(am) for the average number of total
updates (nu).</p><p>T(am) = T(ix) + ns ( T(sk) ) + nu ( T(up) )</p><p>For the baseline, unaccelerated sequential access case we consider only
T(sk) for the average total number of seeks (ns).</p><p>T(am) = ns ( T(sk) )</p><pre>
Example:

For an implementation of accelerated sequential access to XML:

  T(ix)  5.00ms
  T(sk)  3.50ms
  T(up)  3.00ms
  ns  1000
  nu    50

  T(am) = 5 + 1000 ( 3.5 ) + 50 ( 3.0 ) = 3655ms</pre><pre>
For unaccelerated sequential access:

  T(sk)  4.00ms

  T(am) = 1000( 4 ) = 4000ms</pre><p>Accelerated sequential access may have resource costs which can impact system performance. A more
comprehensive model would be needed to take these into account in a full assessment of the
comparative benefit of an accelerated sequential access implementation. As an approximation, an
implementation which produces lower numbers for the following resource costs will be better in
performance than an implementation with the same T(am) but with higher resource costs: </p><ol class="enumar"><li>Memory consumption for sequential index structure</li><li>Cost in bandwidth utilization for I/O and transport of the sequential index if
persisted</li><li>Cost of persistent store, if the sequential index structure is persisted</li></ol></div><div class="div3">
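<p>The T(am) formula and the worked figures from this section can be checked with a few lines of Python. The function name and the break-even calculation are additions for illustration; the input values are the ones used in the example above.</p>
<pre>
```python
def amortized_time(t_sk, ns, t_ix=0.0, t_up=0.0, nu=0):
    """T(am) = T(ix) + ns * T(sk) + nu * T(up), with all times in milliseconds."""
    return t_ix + ns * t_sk + nu * t_up

accelerated = amortized_time(t_sk=3.5, ns=1000, t_ix=5.0, t_up=3.0, nu=50)
baseline = amortized_time(t_sk=4.0, ns=1000)
print(accelerated, baseline)   # 3655.0 4000.0

# Number of seeks at which the two totals are equal, holding nu at 50:
# T(ix) + nu * T(up) = ns * (T(sk) baseline - T(sk) accelerated)
break_even_seeks = (5.0 + 50 * 3.0) / (4.0 - 3.5)
print(break_even_seeks)        # 310.0
```
</pre>
<p>With these figures the accelerated method only pays off once a document is sought more than about 310 times, which shows why the expected number of seeks belongs in the comparison.</p>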
<h4><a id="asa-type" name="asa-type"></a>6.3.2 Type &amp; range</h4><p>A method is described in the preceding section to measure the effect of
			Accelerated Sequential Access both in absolute terms (index creation, seek
			time, update time) and relative to an access method implemented on a format
			which does not support this property.</p></div><div class="div3">
<h4><a id="asa-method" name="asa-method"></a>6.3.3 Methodology</h4><p>This property may be directly measured and compared by running seek and
			update (if supported) operations over a set of input documents for the
			Accelerated Sequential Access-capable format and text XML.</p></div><div class="div3">
<h4><a id="asa-dep" name="asa-dep"></a>6.3.4 Dependencies</h4><p>Implementation of the Random Access property will, in most cases,
			eliminate the need for Accelerated Sequential Access in that it subsumes its
			behavior and performance characteristics. Some Use Cases may specify both
			properties but only in the sense that Accelerated Sequential Access is seen
			as essential if and only if Random Access is not supported. </p></div><div class="div3">
<h4><a id="asa-tradeoffs" name="asa-tradeoffs"></a>6.3.5 Known Tradeoffs</h4><p>As a format supporting Accelerated Sequential Access will typically
			require the addition of information (an index) in the document, this
			property may be a tradeoff against Compactness.  Additional cost and
			complexity is introduced if update is supported, possibly limiting the
			ability to support the Efficient Update property. </p></div></div><div class="div2">
<h3><a id="content-type-management-ID" name="content-type-management-ID"></a>6.4 Content Type Management</h3><div class="div3">
<h4><a id="ctm-desc" name="ctm-desc"></a>6.4.1 Description</h4><p>Measures the degree to which a format specifies usable <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#content-type-management">Content
			Type Management</a> information.</p></div><div class="div3">
<h4><a id="ctm-type" name="ctm-type"></a>6.4.2 Type &amp; range</h4><p>This measurement uses a simple range of options from worst to best
			integration. </p></div><div class="div3">
<h4><a id="ctm-method" name="ctm-method"></a>6.4.3 Methodology</h4><p>Degrees of support:</p><ol class="enumar"><li>provides no media type or encoding specification</li><li>provides a media type but not a content coding</li><li>provides a media type suffix akin to "+xml"</li><li>provides a content coding</li></ol></div><div class="div3">
<h4><a id="ctm-dep" name="ctm-dep"></a>6.4.4 Dependencies</h4><p>None.</p></div><div class="div3">
<h4><a id="ctm-tradeoffs" name="ctm-tradeoffs"></a>6.4.5 Known Tradeoffs</h4><p>Note that there currently is dissent as to
			whether a binary XML format should be
			considered to be a content coding (like gzip)
			or not. Here are the options:</p><ul><li>It's just a content coding. In this case it <em>may</em> have a media type (like
application/gzip) but the proper way of using it is to keep the original media type of the XML
content and simply change the content coding. The upside is that the current dispatch system is
untouched, that the media type information is far more useful that way, and that the content coding
infrastructure is put to good use. The downside is that there is philotechnical dissent that binary
XML is an encoding in the way that gzip is, and that there can be friction with the charset
parameter to XML media types. With this option, content negotiation is fully possible. The behavior of
fragment identifiers does not need to be re-specified.</li><li>It's not a content coding but a media type, two sub-options:
<ul><li>There's just the media type. Any content sent using the format must have the media type of the
format. The upside is that it's simple. The downside is that you lose all media type information so
that you must then move to another system to provide that information (some Web systems -
e.g. browsers - don't work without it), or define new media types for all content
(application/binxhtml, image/binsvg, etc.). With this option, content negotiation is entirely impossible (or
rather, totally useless) unless new media types are defined for all things XML. The behavior of
fragment identifiers becomes impossible to specify, or has to be re-specified for all the new media
types.</li><li>A new suffix, in the "+xml" style, is defined (say "+bix"). The upside is that it's simple and
that the diversity of media types is maintained. The downside is that it requires more intrusive
modifications to systems that rely on existing media types. The latter may be fine if there is one
and only one binary XML encoding out there (or at least a set list so that the intrusive
modifications are performed only once), but given an open-ended set of binary XML formats it becomes
quite impractical. With this option, content negotiation is possible, but with less power. The behavior of
fragment identifiers has to be re-specified to map back to the one in +xml
types.</li></ul></li></ul></div></div><div class="div2">
<h3><a id="deltas" name="deltas"></a>6.5 Deltas</h3><div class="div3">
<h4><a id="delta-desc" name="delta-desc"></a>6.5.1 Description</h4><p>The <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#deltas">Deltas</a>
property is a representation of arbitrary changes from a particular instance of a base, parent
document which, along with that parent document, can be used to represent the new state of the
parent. The parent is identified in a globally or locally unique manner. A delta is distinct from a
fragment or a computed difference, although the latter could be represented as a delta.  This
property is somewhat related to support for <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#efficient-update">Efficient
Update</a>.</p><p>Measurement of this property consists of determining whether the format supports a high-level or
low-level delta mechanism and then determining the granularity, compactness, and processing
efficiency of the mechanism.</p><p>A high-level delta mechanism, represented as the Deltas property, consists of some indication of
operation, such as replace ID or delete element, and some representation of the content of the
change.  This kind of delta can be represented by XML data.  The creation and use of this delta
requires serialization of an XML representation and high-level interpretation of the operation and
data.  Both the creation and the use of a high-level delta may require complex processing and
will in many cases result in a delta instance that is larger than absolutely necessary.
As an example, if a particular high-level delta mechanism can only replace whole nodes of some kind,
changes of a few characters might require a delta that includes all surrounding text.</p><p>A low-level delta feature of a format could support a fine grained, very low complexity, and
efficient method of representing changes to a parent document.  This property could be implemented
at or below the level of representation of the structure of an XML-like format.  As an example, a
mechanism could track or represent which characters were inserted, replaced, or deleted relative to
a parent along with just the data that changed.  This type of mechanism is low complexity because it
is implemented using some method that allows efficient traversal of the data for logical or actual
construction of the parent plus delta.  This access or construction should have complexity on the
order of access to ranges of bytes and efficiency similar to the size of "new" data with very small
overhead.</p></div><div class="div3">
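<p>A low-level delta of the kind described above might be sketched as a list of character-range edits applied to the parent. The tuple layout <code>(offset, delete_count, inserted_text)</code> and the function name are hypothetical choices for this illustration, not a proposed encoding.</p>
<pre>
```python
def apply_delta(parent, delta):
    """Apply (offset, delete_count, inserted_text) edits to `parent`.

    Offsets refer to the parent and must be ascending and non-overlapping.
    Unchanged ranges are copied straight through; only new data lives in the delta.
    """
    pieces = []
    cursor = 0
    for offset, delete_count, inserted in delta:
        pieces.append(parent[cursor:offset])   # untouched range from the parent
        pieces.append(inserted)                # data carried by the delta
        cursor = offset + delete_count         # skip replaced or deleted characters
    pieces.append(parent[cursor:])
    return "".join(pieces)

parent = "The quick brown fox"
delta = [(4, 5, "slow"), (19, 0, " sleeps")]
print(apply_delta(parent, delta))   # The slow brown fox sleeps
```
</pre>
<p>The delta carries eleven characters of new data plus a few integers, and applying it is a single pass over ranges of the parent, which matches the low overhead and low complexity the text asks for.</p>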
<h4><a id="delta-type" name="delta-type"></a>6.5.2 Type &amp; range</h4><p>Evaluations of candidate formats that implement this property will produce a
		delta type categorization, a granularity measure, and compactness and processing
		efficiency performance characteristics.</p><p>Delta type categorization:</p><ol class="enumar"><li>Low Level Delta</li><li>High Level Delta</li></ol><p>Granularity Measure:</p><ol class="enumar"><li>Whole elements</li><li>Whole attributes and elements</li><li>Whole document components (elements, attributes, PI, comments, etc.)</li><li>Partial support for partial document component differences (at least element content)</li><li>Pervasively fine granularity (inserted/deleted characters or data at any location)</li></ol><p>Compactness:</p><ol class="enumar"><li>Large overhead for each document component change (full component plus
		  path-style location identification and logical operation).</li><li>Moderate overhead for each document component change (partial data plus
		  path-style location identification and logical operation).</li><li>Low overhead for each document component change (just inserted data and
		  efficient encoding of location and operation).</li></ol><p>Processing efficiency:</p><ol class="enumar"><li>Large overhead for each document component change (requires xpath-like search,
		  logical replacement operation).</li><li>Moderate overhead for each document component change (fast resolution of change
		  location and effect with logical replacement operation).</li><li>Low overhead for each document component change (fast resolution of change
		  location and ability to operate in efficient parent/delta data range threading).</li></ol></div><div class="div3">
<h4><a id="delta-method" name="delta-method"></a>6.5.3 Methodology</h4><p>This property is measured by inspection of the format
			specification, logical analysis, and empirical testing based on test
			scenarios that could benefit from Deltas, at least considering
			the categories listed in the Deltas property description.</p></div><div class="div3">
<h4><a id="delta-dep" name="delta-dep"></a>6.5.4 Dependencies</h4><p>This property doesn't depend on other properties.  It does have a weak
		    relationship with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#efficient-update">Efficient
		    Update</a> based on solving similar problems.</p></div><div class="div3">
<h4><a id="delta-tradeoffs" name="delta-tradeoffs"></a>6.5.5 Known Tradeoffs</h4><p>A low-level delta can be created through two types of methods.  The most
		    efficient method would be to capture changes to a parent in a kind of
		    copy-on-write operation.  This could have very low complexity.  The other main
		    method is a differencing operation that compares a before and after version of a
		    document and represents the difference as a delta.  While the resulting delta
		    might be similar, the computational complexity of the latter might be
		    arbitrarily difficult while the former is minimal and linear.</p></div></div><div class="div2">
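<p>The two ways of producing a low-level delta can be contrasted in a short sketch. It uses <code>difflib.SequenceMatcher</code> from the Python standard library as a stand-in differencing algorithm; the <code>RecordingBuffer</code> class and the edit tuple layout are hypothetical.</p>
<pre>
```python
import difflib

def diff_to_delta(before, after):
    """Derive (offset, delete_count, inserted_text) edits by comparing two versions."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return [(i1, i2 - i1, after[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

class RecordingBuffer:
    """Captures edits as they happen, so no comparison pass is needed."""
    def __init__(self, text):
        self.text = text
        self.delta = []

    def replace(self, offset, delete_count, inserted):
        # Offsets are relative to the current text; a sketch, not a full rebasing scheme.
        self.delta.append((offset, delete_count, inserted))
        self.text = self.text[:offset] + inserted + self.text[offset + delete_count:]

computed = diff_to_delta("abcdef", "abXYef")
buffer = RecordingBuffer("abcdef")
buffer.replace(2, 2, "XY")
print(computed, buffer.delta)   # [(2, 2, 'XY')] [(2, 2, 'XY')]
```
</pre>
<p>Both routes yield the same delta here, but the recording route does constant work per edit, while the comparison must examine both versions in full and can take more than linear time.</p>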
<h3><a id="efficient-update" name="efficient-update"></a>6.6 Efficient Update</h3><div class="div3">
<h4><a id="eu-desc" name="eu-desc"></a>6.6.1 Description</h4><p>The <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#efficient-update">Efficient
Update</a> property is concerned with whether a format instance can be modified efficiently without
being completely rebuilt.  When a format is designed with efficient update as a constraint, it will
tend to be apparent that this is possible.  When this was not planned for, it is still possible that
a processor could implement an efficient update capability.  In the latter case, an evaluation of
the format must determine if there are features that prevent or assist such implementation.  As the
property description notes, this property is somewhat related to support for <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#deltas">Deltas</a>.</p><p>There are three aspects under which this property should be evaluated:</p><ol class="enumar"><li>Efficiency of update: This is the time and complexity required to apply the changes, starting
from the original instance of a format up until the updated instance is produced.</li><li>Efficiency of retrieval: This is the time required to retrieve a (possibly) modified
value.</li><li><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>: This is the additional space required for the application of an update or the
typical overhead of supporting different kinds of changes to a format instance.  In the existence
proof example, inserting a new element might be efficient because it might just result in an append
to the file while inserting characters in a large text might cause a new chunk to be allocated at
the end of the file and the old chunk to become an unused block.  While the block could be reused
just like with malloc, mitigating the cost, it is still a potential inefficiency.</li></ol></div><div class="div3">
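<p>The space cost described in the third aspect can be illustrated with a toy container in which every update appends a new chunk and leaves the old one unused. The <code>ChunkFile</code> class and its methods are hypothetical and do not model any specific format.</p>
<pre>
```python
class ChunkFile:
    """Toy append-only container: updates append a new chunk and orphan the old one."""
    def __init__(self):
        self.chunks = []      # chunk payloads in file order
        self.live = {}        # key -> index of the current chunk

    def put(self, key, payload):
        self.chunks.append(payload)          # always an append, never an in-place rewrite
        self.live[key] = len(self.chunks) - 1

    def get(self, key):
        return self.chunks[self.live[key]]   # retrieval stays a single lookup

    def overhead_percent(self):
        """Unused space as a percentage of the space a fresh linear write would need."""
        total = sum(len(c) for c in self.chunks)
        used = sum(len(self.chunks[i]) for i in self.live.values())
        return (total - used) / used * 100.0

store = ChunkFile()
store.put("a", "a" * 60)
store.put("b", "b" * 20)
store.put("b", "b" * 20 + "c" * 20)   # grow the text: the old 20-character chunk becomes unused
print(store.overhead_percent())        # 20.0
```
</pre>
<p>The update itself is cheap, but the file now holds 120 characters for 100 characters of live data, which is the overhead the compactness aspect is meant to capture.</p>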
<h4><a id="eu-type" name="eu-type"></a>6.6.2 Type &amp; range</h4><p>Evaluations of candidate formats that implement this property will produce three
		percentage values and a standard deviation.  For update and retrieval, these are
		positive or negative percentages of improvement relative to a comparison XML 1.x
		solution.  For compactness, this percentage is overhead over a linear creation of an
		instance with the same data in the candidate format, along with an estimated (for
		analytical) or actual (for empirical) standard deviation.</p></div><div class="div3">
<h4><a id="eu-method" name="eu-method"></a>6.6.3 Methodology</h4><p>This property is measured by inspection of the format
			specification, logical analysis, and empirical testing based on
			test scenarios that call for <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#efficient-update">Efficient
			Update</a>.</p></div><div class="div3">
<h4><a id="eu-dep" name="eu-dep"></a>6.6.4 Dependencies</h4><p>This property doesn't depend on other properties.  It does have a weak
		    relationship with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#deltas">Deltas</a>
		    based on solving similar problems.</p></div><div class="div3">
<h4><a id="eu-tradeoffs" name="eu-tradeoffs"></a>6.6.5 Known Tradeoffs</h4><p>The ability to support efficient updates in the direct, complete sense tends to
		imply compactness measures that are not monolithic and a mechanism for growing or
		shrinking data without requiring repositioning for all data following the change.
		Solutions for these tradeoffs will likely focus on differing granularity and may be
		tunable.</p></div></div><div class="div2">
<h3><a id="embedding-support-ID" name="embedding-support-ID"></a>6.7 Embedding Support</h3><div class="div3">
<h4><a id="es-desc" name="es-desc"></a>6.7.1 Description</h4><p>Measures the degree to which a format supports embedding of files of arbitrary type within serialized content.

 </p></div><div class="div3">
<h4><a id="es-type" name="es-type"></a>6.7.2 Type &amp; range</h4><p>This property is measured along an integer scale from [0,6], where zero indicates no embedding support and six indicates the greatest possible degree of embedding support.

 </p></div><div class="div3">
<h4><a id="es-method" name="es-method"></a>6.7.3 Methodology</h4><p>This property is measured by considering which of the following statements is true, based on that format's specification:</p><ol class="enumar"><li>Provides structures or elements in which data of arbitrary type and reasonable size can be stored by virtue of the flexibility of the format.</li><li>Provides well-known points at which data of arbitrary type can be embedded.</li><li>Provides for the existence and management of metadata about embedded files.</li><li>Provides the ability to include or exclude embedded files from signatures over the file.</li><li>Provides the ability to include or exclude embedded files when (partially) encrypting the file.</li><li>Provides the ability to compress the contents of the embedded file.</li></ol><p>The measurement levels resulting from this analysis are:</p><ol class="enumar"><li>Doesn't support</li><li>Supports to some extent</li><li>Supports well</li></ol></div><div class="div3">
<h4><a id="es-dep" name="es-dep"></a>6.7.4 Dependencies</h4><p>Support for (4) signing and (5) encryption is dependent on an underlying
			format which supports the <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#signable">Signable</a>
			and <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#encryptable">Encryptable</a>
			properties.</p></div><div class="div3">
<h4><a id="es-tradeoffs" name="es-tradeoffs"></a>6.7.5 Known Tradeoffs</h4><p>A format which supports embedding must make weaker guarantees regarding the Human Readable and Editable property, since it forgoes control over the contents of the embedded files.

  </p></div></div><div class="div2">
<h3><a id="generality-ID" name="generality-ID"></a>6.8 Generality</h3><div class="div3">
<h4><a id="g-desc" name="g-desc"></a>6.8.1 Description</h4><p>Measures the degree to which a format is competitive with alternatives across a diverse range of
data, applications and use cases.</p><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#generality">Generality</a>
is, in part, a function of the format's ability to optimize for application-specific criteria and use
cases. For example, some applications need to maximize compactness and are willing to give up some
speed and processing resources to achieve it, while others need to maximize speed and are willing to
give up some compactness to achieve it. Similarly, some applications require all the information
contained in a document and are willing to give up some compactness to preserve it. Other
applications are willing to discard certain information items in a document to achieve higher
compactness.</p><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#generality">Generality</a>
is also a function of the optimizations the format includes for efficiently representing documents
of varying size and structure. For small, highly structured documents, a format informed by schema
analysis will generally produce more compact encodings than a format informed solely by document
analysis (e.g. generic compression software).  For larger, more loosely structured documents, a
format informed by document analysis techniques will generally produce more compact encodings than a
format solely informed by schema analysis. A format informed by both schema analysis and document
analysis will generally produce more compact encodings across a broader range of documents than a
format that only includes one of these techniques.</p></div><div class="div3">
<h4><a id="g-type" name="g-type"></a>6.8.2 Type &amp; range</h4><p>This property is measured along an integer scale in the range [0, 20], where a zero indicates a
very specialized format that applies narrowly to a small set of data, applications, and use cases and
20 indicates a very general format that applies to a wide range of data, applications, and use
cases.</p></div><div class="div3">
<h4><a id="g-method" name="g-method"></a>6.8.3 Methodology</h4><p>This property is measured by counting the number of statements below that are true of the format,
based on inspection of the format specification and objective analysis of compactness results over a
wide range of XML documents with varying size and structure. Statements designated as [optional]
will broaden the applicability of a binary XML file format, but are not required for that format to
be considered sufficiently general. The statements are organized into sections for readability.</p><p>Flexible schema analysis optimizations</p><ul><li>Can represent documents without a schema</li><li>Can represent documents that include elements and attributes not defined in the
                        associated schema (i.e., open content)</li><li>Can represent any schema-invalid document</li><li>Can leverage available schema information to improve compactness, processing
                        speed, and resource utilization</li><li>Can leverage available schema information to improve compactness, processing
                        speed, and resource utilization even when documents contain elements and attributes
                        not defined in the schema</li><li>Can leverage available schema information to improve compactness, processing
                        speed, and resource utilization for any schema-invalid document.</li></ul><p>Flexible document analysis optimizations</p><ul><li>Can leverage document analysis to improve compactness</li><li>Can suppress document analysis to increase speed and reduce resource
                        utilization</li><li>[optional] Can adjust document analysis to meet application performance and resource
                        utilization criteria</li><li>Can structure the binary XML stream to increase net compactness when
                        off-the-shelf compression software is built in to the communications
                        infrastructure</li></ul><p>Flexible fidelity optimizations</p><ul><li>[optional] Supports high fidelity XML representations that preserve an
                        exact copy of the original XML document, including all whitespace and
                        formatting</li><li>Supports reduced fidelity XML representations that preserve all data model items,
                        but discard whitespace and formatting to improve compactness</li><li>Supports reduced fidelity XML representations that preserve all information
                        needed by a particular application, but discard specified information items that are
                        not needed (e.g., comments and processing instructions) to improve compactness</li><li>Supports reduced fidelity XML representations that preserve the logical
                        structures and values of an XML document, but discard lexical and syntactic
                        constructs to improve compactness</li></ul><p>Competes with frequency based compression</p><ul><li>Can consistently produce XML representations that are close to the same size or
                        smaller than XML documents compressed using gzip</li><li>Can consistently produce more compact XML representations than XML documents
                        compressed using gzip</li><li>Can consistently produce more compact XML representations than binary XML
                        documents created with document analysis suppressed, then compressed using gzip</li></ul><p>Competes with schema based encodings and hand optimized formats</p><ul><li>Can consistently produce XML representations that are close to the same size
                        or smaller than the equivalent ASN.1 PER encoding plus 20%</li><li>Can consistently produce XML representations that are more compact than the
                        equivalent ASN.1 PER encoding plus 20%</li><li>[optional] Can consistently produce XML representations that are more
                        compact than the equivalent ASN.1 PER encoding plus 20% compressed using gzip</li></ul></div><div class="div3">
<h4><a id="g-dep" name="g-dep"></a>6.8.4 Dependencies</h4><p>This property indirectly measures the presence of these properties:
			Compactness, Embedding Support, No Arbitrary Limits, Platform Neutrality,
			Robustness, Roundtrip Support, Schema Extensions and Deviations, Schema
			Instance Change Resilience, and Specialized Codecs.</p></div><div class="div3">
<h4><a id="g-tradeoffs" name="g-tradeoffs"></a>6.8.5 Known Tradeoffs</h4><p>High scores for this property may be at odds with high scores for the Small Footprint
property. Some implementation approaches for supporting a broad range of data, applications, and use
cases may require larger amounts of code.</p></div></div><div class="div2">
<h3><a id="human-readable-ID" name="human-readable-ID"></a>6.9 Human Readable and Editable</h3><div class="div3">
<h4><a id="hr-desc" name="hr-desc"></a>6.9.1 Description</h4><p>Measures the degree to which a format is or must be humanly readable and editable.</p></div><div class="div3">
<h4><a id="hr-type" name="hr-type"></a>6.9.2 Type &amp; range</h4><p>This measurement is a pair of integers &lt;m,n&gt;, each on the scale
			[0,5]. The first number indicates the degree to which a file in a format may
			be humanly readable and editable; the second number indicates the degree to
			which a file in a format must be so. Thus, the greater the difference
			between the two numbers the greater the degrees of freedom given to the
			file's creator with respect to this property.

 </p></div><div class="div3">
<h4><a id="hr-method" name="hr-method"></a>6.9.3 Methodology</h4><p>Each item in the following list of statements is evaluated to determine
			if it is never true, may be true, or is always true of a file created
			according to the format's specification.</p><ol class="enumar"><li>If the statement is never true of this format, no points are assigned;</li><li>If the statement may be true, then one point is added to the first number of the score;</li><li>If the statement is always true, then one point is added to both numbers of the score.</li></ol><p>(Note: the first number in the score is therefore always greater than or equal to the second
number.)</p><ol class="enumar"><li>Uses a regular and explicit structure.</li><li>Uses only text, avoiding the use of compression or magic numbers.</li><li>For any given type of information (e.g., specifying a character encoding) uses a unique
encoding mechanism.</li><li>Is self-contained.</li><li>Maintains the locality of items per their relative positions in the data model.</li></ol></div><div class="div3">
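<p>The scoring rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the names <code>Truth</code> and <code>readability_score</code>, and the sample evaluation, are assumptions made for this example and are not defined by this document.</p>
<pre>
```python
from enum import Enum


class Truth(Enum):
    NEVER = 0   # the statement is never true of the format
    MAY = 1     # the statement may be true of the format
    ALWAYS = 2  # the statement is always true of the format


def readability_score(evaluations):
    """Return the pair (m, n) for one Truth value per statement."""
    # "may be true" and "always true" both add a point to the first number.
    may_be = sum(1 for e in evaluations if e is not Truth.NEVER)
    # Only "always true" adds a point to the second number.
    must_be = sum(1 for e in evaluations if e is Truth.ALWAYS)
    return may_be, must_be


# Hypothetical evaluation of the five statements for an imaginary format.
example = [Truth.ALWAYS, Truth.MAY, Truth.NEVER, Truth.ALWAYS, Truth.MAY]
print(readability_score(example))  # prints (4, 2)
```
</pre>
<p>Because a statement that is always true contributes to both numbers, the first number of the pair can never be smaller than the second.</p>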
<h4><a id="hr-dep" name="hr-dep"></a>6.9.4 Dependencies</h4><p>Support for this property depends in part on how self-contained the format is.</p></div><div class="div3">
<h4><a id="hr-tradeoffs" name="hr-tradeoffs"></a>6.9.5 Known Tradeoffs</h4><p>High scores for this property may be at odds with higher scores in
			Compactness, Processing Efficiency, Efficient Update, Random Access,
			Accelerated Sequential Access, and Specialized Codecs, all of which
			typically use techniques at odds with the requirements of this property.</p></div></div><div class="div2">
<h3><a id="integratable-xml-ID" name="integratable-xml-ID"></a>6.10 Integratable into the XML Stack</h3><div class="div3">
<h4><a id="ix-desc" name="ix-desc"></a>6.10.1 Description</h4><p>Measures the ease with which a given format integrates with the rest of
			the XML Stack of recommendations, based on its orthogonality in
			specification and the way in which it supports the core assumptions common
			to XML specifications. Many relevant considerations are presented in the <a href="#Architecture-of-the-WWW">Architecture of the WWW</a>.</p></div><div class="div3">
<h4><a id="ix-type" name="ix-type"></a>6.10.2 Type &amp; range</h4><p>This property is measured using a scale derived from the notion that the
			XML Stack is fundamentally syntax-based and defines several different data
			models.</p></div><div class="div3">
<h4><a id="ix-method" name="ix-method"></a>6.10.3 Methodology</h4><p>The following scale (from lowest to highest support) is used:</p><ol class="enumar"><li>optimized for a data model from outside the core XML Stack</li><li>based on and supporting multiple data models in the XML Stack</li><li>uses the XML 1.x syntax</li></ol></div><div class="div3">
<h4><a id="ix-dep" name="ix-dep"></a>6.10.4 Dependencies</h4><p>None.</p></div><div class="div3">
<h4><a id="ix-tradeoffs" name="ix-tradeoffs"></a>6.10.5 Known Tradeoffs</h4><p>The simplest way of integrating well into the XML Stack is obviously to
			be fully compatible with the XML syntax. This, however, does not mean that the
			given format must be XML 1.x itself; for instance, it could be a subset
			allowing only certain tokens or requiring a certain form and encoding (for
			instance a canonical version of the SOAP subset of XML). While this would
			enable optimizations in XML parsers that are normally impossible, it would likely move
			the complexity over to XML generators, and if it subsets XML it will create
			problems for applications using the features excluded from the subset.</p><p>It must also be noted that some core XML technologies such as signatures
			and encryption rely directly on the XML syntax. There is therefore a
			tradeoff in which a format could integrate perfectly well with the XML
			Family minus these two members.</p></div></div><div class="div2">
<h3><a id="no-arbitrary-limits" name="no-arbitrary-limits"></a>6.11 No Arbitrary Limits</h3><div class="div3">
<h4><a id="nal-desc" name="nal-desc"></a>6.11.1 Description</h4><p>The degree to which the format avoids inherent limits is characterized
			as: no inherent limits, few limits (e.g., on unreasonably large names), and many
			limits (e.g., fixed lengths, small tables).</p><p>Experience has shown that arbitrary limits in the design of reusable
			systems must be carefully scrutinized for the probability of future
			conflicts.  As computing limitations have repeatedly been surpassed in short
			order and technology has been put to innovative uses, decisions that turned
			out to be short-sighted have led to painful migration.  This property
			provides a rough measure with which to compare different approaches.</p></div><div class="div3">
<h4><a id="nal-type" name="nal-type"></a>6.11.2 Type &amp; range</h4><p>The range of this measurement is membership in a category.  These
			categories are: "no inherent limits", "few limits", and "many limits".</p></div><div class="div3">
<h4><a id="nal-method" name="nal-method"></a>6.11.3 Methodology</h4><p>The measurement for this property is by inspection of format
			specification and logical analysis.</p></div><div class="div3">
<h4><a id="nal-dep" name="nal-dep"></a>6.11.4 Dependencies</h4><p>This property does not depend on other properties.</p></div><div class="div3">
<h4><a id="nal-tradeoffs" name="nal-tradeoffs"></a>6.11.5 Known Tradeoffs</h4><p>Each type of flexibility in a data format can be a tradeoff between
			efficiency in the expected typical case and the ability to handle cases that
			are not expected to be encountered.  In many cases in the past, seemingly
			sensible choices have not aged well with increases in computing capacity and
			new uses of technology.</p></div></div><div class="div2">
<h3><a id="platform-neutrality-ID" name="platform-neutrality-ID"></a>6.12 Platform Neutrality</h3><div class="div3">
<h4><a id="pn-desc" name="pn-desc"></a>6.12.1 Description</h4><p>Measures the degree to which a format is platform neutral as opposed to
			being optimized for a given platform.</p></div><div class="div3">
<h4><a id="pn-type" name="pn-type"></a>6.12.2 Type &amp; range</h4><p>This property measurement is represented as a selection from a range of:
			"not platform neutral", "platform neutral, single choices", "platform
			neutral, multiple choices".  More than one aspect may need to be measured.  For
			instance, character encodings may have a wide variety of options while
			scalars may not.  It is expected that at least character encoding and scalar
			encoding are included.</p></div><div class="div3">
<h4><a id="pn-method" name="pn-method"></a>6.12.3 Methodology</h4><p>This property is measured along an axis of
			values that rate its platform neutrality from
			none to optimal:</p><ol class="enumar"><li>not platform neutral at all (for instance, may be the native serialization of a given
programming platform)</li><li>defined in a platform-neutral manner, but with fixed values for certain parameters that may
advantage a platform over another (for instance, only a single Unicode encoding is supported)</li><li>defined in a platform-neutral manner, and multiple options (word-length, float format, etc.)
can be set so that users may choose locally optimal encodings when the platforms involved in a given
interchange are known.</li></ol></div><div class="div3">
<h4><a id="pn-dep" name="pn-dep"></a>6.12.4 Dependencies</h4><p>This property has a weak link to Implementation Complexity in that if it
			is supported at its optimal level it will require multiple encoding
			options that could be costly in implementation terms.</p></div><div class="div3">
<h4><a id="pn-tradeoffs" name="pn-tradeoffs"></a>6.12.5 Known Tradeoffs</h4><p>Allowing a format to support a large range of options can enable
			optimal processing between similar platforms, but the added complexity may in
			fact have a generally negative impact as it complicates the format.  While an
			assessment of this tradeoff can only be made on a format-by-format basis, it
			must be noted that allowing too many hooks for optimization may in fact prove to
			be a pessimisation.</p></div></div><div class="div2">
<h3><a id="random-access" name="random-access"></a>6.13 Random Access</h3><div class="div3">
<h4><a id="ra-desc" name="ra-desc"></a>6.13.1 Description</h4><p>The objective of <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#random-access">Random
                        Access</a> is to reduce the amount of time required to access XML data model
                        items in a document. The fundamental measurement is therefore the average
                        time needed to access an XML data model item. This time can be compared to a
                        baseline measurement of the average time needed to access an XML data model
                        item using a sequential access method like that used to implement SAX.</p><p>This performance metric does not take into account what may be accessed
                        with a random access method and what operations may be supported on what is
                        looked up (for example, can the looked-up item be treated as a sub-document
                        or fragment).</p><ul><li>T(ra) - time to create an access table (fixed)</li><li>T(lu) - time to look up a data model item (fixed)</li><li>T(sk) - time to seek a data model item (average)</li><li>T(am) - total time for all accesses over the life of the document</li></ul><p>Total time (T(am)) amortizes the cost of T(ra) over the average number of
                        total seeks (ns).</p><pre>T(am) = T(ra) + ns ( T(lu) + T(sk) )</pre><p>If random update of the access table is supported we should also take
                        into account this cost.</p><p>T(up) - time to update an access table (fixed)</p><p>T(up) should also be added to T(am) for the average number of total
                        updates (nu).</p><pre>T(am) = T(ra) + ns ( T(lu) + T(sk) ) + nu ( T(up) )</pre><p>For the baseline, sequential access case we consider only T(sk) for the
                        average total number of seeks (ns).</p><pre>T(am) = ns ( T(sk) )</pre><p>Example:</p><p>For an implementation of random access to XML:</p><pre>
T(ra) 10.00ms
T(lu)   .05ms
T(sk)  1.00ms
T(up)  1.00ms
ns  1000
nu    50
                        </pre><pre>T(am) = 10 + 1000( .05 + 1.00 ) + 50 ( 1.00 ) = 1110</pre><p>For sequential access:</p><pre>T(sk)  4.00ms</pre><pre>T(am) = 1000 ( 4 ) = 4000</pre><p>In this example, random access is advantageous if the average total
                        number of seeks is over 3. For ns = 3, nu = 0 the random access T(am) is
                        13.15 and the sequential T(am) is 12 while at ns = 4, nu = 0 it is 14.20
                        versus 16.</p></div><div class="div3">
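<p>The cost model from the Random Access description can be checked with a short Python sketch. The function names below are illustrative assumptions, not part of any format or API; all times are taken to be in milliseconds, as in the worked example.</p>
<pre>
```python
def random_access_total(t_ra, t_lu, t_sk, ns, t_up=0.0, nu=0):
    """T(am) = T(ra) + ns ( T(lu) + T(sk) ) + nu ( T(up) )."""
    return t_ra + ns * (t_lu + t_sk) + nu * t_up


def sequential_total(t_sk, ns):
    """Baseline case: T(am) = ns ( T(sk) )."""
    return ns * t_sk


def break_even_seeks(t_ra, t_lu, t_sk_random, t_sk_sequential):
    """Smallest whole number of seeks (with nu = 0) at which random
    access is strictly cheaper than sequential access, or None."""
    saving_per_seek = t_sk_sequential - (t_lu + t_sk_random)
    if not saving_per_seek > 0:
        return None  # the table creation cost is never recovered
    return int(t_ra / saving_per_seek) + 1


# Figures from the example: T(ra) = 10, T(lu) = .05, T(sk) = 1 versus 4.
print(break_even_seeks(10.0, 0.05, 1.0, 4.0))  # prints 4
```
</pre>
<p>With the example figures, each seek saves 2.95 ms, so the 10 ms spent creating the access table is recovered on the fourth seek, which agrees with the comparison at ns = 3 and ns = 4.</p>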
<h4><a id="ra-type" name="ra-type"></a>6.13.2 Type &amp; range</h4><p>Values in time for each of the following:</p><ul><li>T(ra) - time to create an access table (fixed)</li><li>T(lu) - time to look up a data model item (fixed)</li><li>T(sk) - time to seek a data model item (average)</li><li>T(am) - total time for all accesses over the life of the document</li></ul><p>Random access has resource costs which can impact system performance. A
                        more comprehensive model would be needed to take these into account in a
                        full assessment of the comparative benefit of a random access
                        implementation. As an approximation, an implementation which produces
                        lower numbers for the following resource costs will be better in
                        performance than an implementation with the same T(am) but with higher
                        resource costs. </p><ol class="enumar"><li>Memory consumption for access table</li><li>Cost in bandwidth utilization for I/O and transport of the access
                        table if persisted</li><li>Cost of persistent store, if the access table is persisted</li></ol><p>The random access implementation can be categorized by the embedding or
                        non-embedding of the access table, indicated by one of:</p><ol class="enumar"><li>No access table and not indexable - no random access</li><li>No access table but indexable</li><li>Access table defined part of format, but separate from XML document</li><li>Access table optionally embedded in the document</li><li>Access table always embedded in the document</li></ol><p>Another simplification made in comparing T(am) for random access and
                        sequential access is the assumption made that the random access
                        implementation is able to provide access to the data model items the user
                        wants. If this is not the case, either the implementation of random access
                        will not be useful to that user, its performance notwithstanding, or
                        alternate methods of access would have to be provided and accounted for in
                        the T(am). The access coverage to the data model provided by the random access
                        implementation can be categorized as being in one of the following
                        categories:</p><ol class="enumar"><li>Complete:   addressing information for all data model items</li><li>Selective:  for certain data model items</li><li>On-Demand:  for data model items which have been requested</li><li>Heuristic:  for data model items which have been predicted to be needed</li></ol><p>It should also be specified whether the implementation does or does not provide
                        alternative access methods to obtain all data model items.</p><p>The random access implementation can also be categorized by its support
                        for fragmentation into one of the following categories:</p><ol class="enumar"><li>Full Context (the random access implementation can provide full
                        context information for the accessed data model item's subtree)</li><li>Complete Subdocument (namespaces are propagated so that the accessed
                        data model item's subtree can be handled as a complete document)</li><li>No support for fragmentation</li></ol><p>Complete support for the semantics implied by random access includes the
			ability to do random update and having support for stable virtual pointers,
			as described in the property description for Random Access.</p><p>Random Update:</p><ol class="enumar"><li>Full support for random update of an instance, with characterization
			    of processing efficiency and fragmentation mitigation.</li><li>Partial support for random update of an instance.</li><li>No support for random update of an instance, with characterization of
			    whether changes to support this seem small or large.</li></ol><p>Stable Virtual Pointer support:</p><ol class="enumar"><li>Full: Stable virtual pointers are supported by an efficient
			    representation that is accessible by internal and external references and
			    maintained during all updates to a format instance.  This implies a
			    lightweight representation, reference, de-reference, and maintenance
			    ability.</li><li>Partial: Stable virtual pointers are supported for read only
			    operations but must be rebuilt for any modification.</li><li>No support: There is no mechanism that works significantly better
 			    than storing an XPath or XPointer.</li></ol></div><div class="div3">
<h4><a id="ra-method" name="ra-method"></a>6.13.3 Methodology</h4><p>Random access is tested by starting with a test scenario and appropriate
			test data and constructing a realistic pattern of random access and update
			workload.  This workload is performed repeatedly with detailed measurement
			of each phase of computation along with overall characteristics of random
			access support and performance as detailed above.</p><p>It is important to understand how different random access strategies will
			perform in general.  Care must be taken to account for the effects of cached
			memory, detailed measurement mechanisms, and other things that affect
			performance.  Memory and architecturally related limits and boundaries
			should be exercised to determine inefficiency pitfalls.</p></div><div class="div3">
<h4><a id="ra-dep" name="ra-dep"></a>6.13.4 Dependencies</h4><p>The presence of this property overrides the need for Accelerated Sequential Access.</p></div><div class="div3">
<h4><a id="ra-tradeoffs" name="ra-tradeoffs"></a>6.13.5 Known Tradeoffs</h4><p>Carrying and maintaining random access tables as part of a format
			instance negatively affects compactness.  This can be minimized if only
			certain information indicated is indexed.  Additionally, if random access
			indexing information is not supported by all processors of the format
			instance, it may need to be rebuilt for certain transitions.</p></div></div><div class="div2">
<h3><a id="round-trippable-ID" name="round-trippable-ID"></a>6.14 Round Trip Support</h3><div class="div3">
<h4><a id="rt-desc" name="rt-desc"></a>6.14.1 Description</h4><p>Measures the degree to which a format supports round-tripping and
			round-tripping via XML. </p></div><div class="div3">
<h4><a id="rt-type" name="rt-type"></a>6.14.2 Type &amp; range</h4><p>These two properties are measured along the same enumerated scale
			consisting of the following values:</p><ol class="enumar"><li>"Exact equivalence": If round-tripping produces a byte-for-byte duplicate of the original</li><li>"Lossless equivalence": If exact equivalence is not achieved but round-tripping produces a lossless equivalent to the original input</li><li>"Does not round trip": If round-tripping is not supported.</li></ol></div><div class="div3">
<h4><a id="rt-method" name="rt-method"></a>6.14.3 Methodology</h4><p>This property is measured by comparing the set of data models which can
			be represented in XML with those that can be represented in the alternative
			format.</p><p>With regards to <em>Roundtrip Support</em> (XML to binary to XML):</p><ol class="enumar"><li>If the set of models supported by XML is a proper superset of
those supported by the format, the measurement is - "Does Not
Roundtrip."</li><li>If the transformations to and from the other format are byte
preserving, the measurement is - "Exact Equivalence."</li><li>Otherwise, the measurement is - "Lossless Equivalence."</li></ol><p>With regards to <em>Roundtripping via XML</em> (binary to XML to binary):</p><ol class="enumar"><li>If the set of models supported by XML is a proper subset of
those supported by the format, the measurement is - "Does Not
Roundtrip."</li><li>If the transformations to and from the other format are byte
preserving, the measurement is - "Exact Equivalence."</li><li>Otherwise, the measurement is - "Lossless Equivalence."</li></ol></div><div class="div3">
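<p>The two decision procedures above can be sketched in Python by treating the supported data models as sets. This is a minimal sketch under stated assumptions: the function names and the model names in the usage lines are hypothetical placeholders, and whether a transformation is byte preserving is supplied by the evaluator rather than computed.</p>
<pre>
```python
def is_proper_superset(a, b):
    """True if set a contains every member of b and at least one more."""
    return a.issuperset(b) and a != b


def roundtrip_support(xml_models, format_models, byte_preserving):
    """Classify XML to binary to XML."""
    if is_proper_superset(xml_models, format_models):
        return "Does Not Roundtrip"
    if byte_preserving:
        return "Exact Equivalence"
    return "Lossless Equivalence"


def roundtrip_via_xml(xml_models, format_models, byte_preserving):
    """Classify binary to XML to binary."""
    # "XML is a proper subset of the format" is the same test reversed.
    if is_proper_superset(format_models, xml_models):
        return "Does Not Roundtrip"
    if byte_preserving:
        return "Exact Equivalence"
    return "Lossless Equivalence"


# Hypothetical model names, used only to exercise the functions.
xml_models = {"model-a", "model-b"}
format_models = {"model-a"}
print(roundtrip_support(xml_models, format_models, False))
print(roundtrip_via_xml(xml_models, format_models, False))
```
</pre>
<p>In this usage the format supports fewer models than XML, so the first call reports "Does Not Roundtrip" while the second reports "Lossless Equivalence".</p>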
<h4><a id="rt-dep" name="rt-dep"></a>6.14.4 Dependencies</h4><p>There are no known dependencies of this property on other properties.</p></div><div class="div3">
<h4><a id="rt-tradeoffs" name="rt-tradeoffs"></a>6.14.5 Known Tradeoffs</h4><p>Formats supporting both roundtrip and roundtrip via XML will tend to have
			the same data model versatility measurement as XML, as that is a measure of
			the set of data models which they support. Formats with greater data model
			versatility are more likely to support round-tripping but less likely to
			support round-tripping via XML, and vice versa.</p></div></div><div class="div2">
<h3><a id="signable-ID" name="signable-ID"></a>6.15 Signable</h3><div class="div3">
<h4><a id="si-desc" name="si-desc"></a>6.15.1 Description</h4><p>Measures the degree to which a format supports the creation and inclusion
			of digital signatures.</p></div><div class="div3">
<h4><a id="si-type" name="si-type"></a>6.15.2 Type &amp; range</h4><p>This property is measured along an integer scale from [0,6], where zero
			indicates no support for digital signatures and six indicates the greatest
			possible degree of support. (Note that a format with a score of zero is
			still signable, in that a file consists of a sequence of bytes and any
			sequence of bytes can be signed.)</p></div><div class="div3">
<h4><a id="si-method" name="si-method"></a>6.15.3 Methodology</h4><p>This property is measured by assigning the indicated number of points for
			each of the following statements which is true of the format, based on that
			format's specification:</p><ol class="enumar"><li>Defines unique format instances for each possible data model
instance (avoids canonicalization): 2 points</li><li>Permits multiple format instances for each data model instance,
but defines one instance as canonical: 1 point</li><li>Always serializes subtrees in a contiguous manner: 2
points</li><li>Permits, but does not require, the serialization of subtrees in
a contiguous manner: 1 point</li><li>Defines a syntax for signature (i.e., recording certificates,
signed ranges, etc.): 2 points</li></ol></div><div class="div3">
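<p>The point assignment above can be sketched as a small Python function. The function and parameter names are assumptions made for illustration. The sketch also assumes that the first two statements are mutually exclusive, as are the third and fourth, so that only the higher-valued statement of each pair is counted; this reading is consistent with six being the greatest possible score.</p>
<pre>
```python
def signable_score(unique_instances, canonical_defined,
                   always_contiguous, may_be_contiguous,
                   signature_syntax):
    """Sum the points for the statements that are true of a format."""
    score = 0
    # Unique instances (2 points) or a defined canonical instance (1 point).
    if unique_instances:
        score += 2
    elif canonical_defined:
        score += 1
    # Subtrees always contiguous (2 points) or optionally so (1 point).
    if always_contiguous:
        score += 2
    elif may_be_contiguous:
        score += 1
    # A defined syntax for signatures (2 points).
    if signature_syntax:
        score += 2
    return score


# Hypothetical format: canonical form defined, contiguous subtrees
# permitted but not required, no signature syntax.
print(signable_score(False, True, False, True, False))  # prints 2
```
</pre>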
<h4><a id="si-dep" name="si-dep"></a>6.15.4 Dependencies</h4><p>There are no known dependencies of this property on other properties.</p></div><div class="div3">
<h4><a id="si-tradeoffs" name="si-tradeoffs"></a>6.15.5 Known Tradeoffs</h4><p>Implementation of this property may be at odds with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>,
			<a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#random-access">Random
			Access</a>, and <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#efficient-update">Efficient
			Update</a>, as support for these properties may be at odds with maintaining
			contiguous subtrees.</p></div></div><div class="div2">
<h3><a id="small-footprint-ID" name="small-footprint-ID"></a>6.16 Small Footprint</h3><div class="div3">
<h4><a id="sfoot-desc" name="sfoot-desc"></a>6.16.1 Description</h4><p>A candidate format should be able to be processed by diverse platforms.
                        Many of these platforms have very limited resources for program storage.  A
                        format that requires little actual code and few data tables (aka initialized or
                        BCC data) is more widely attractive.  Inspection of specifications can be a
                        useful form of analysis.  Analysis of actual implementations can also be
                        enlightening when those implementations are optimized by skilled
                        developers.</p></div><div class="div3">
<h4><a id="sfoot-type" name="sfoot-type"></a>6.16.2 Type &amp; range</h4><p>The detailed measurement for this property will consist of code and
			initialized data measured, estimated, or projected to a series of platforms
			that relate to key architectures including 64K StrongARM, Java bytecode, and
			Intel/AMD Pentium/64bit.</p></div><div class="div3">
<h4><a id="sfoot-method" name="sfoot-method"></a>6.16.3 Methodology</h4><p>The measurement for this property is by inspection of format
			specification, logical analysis, survey of implementations and
			implementers, and projections from one architecture to the others.</p></div><div class="div3">
<h4><a id="sfoot-dep" name="sfoot-dep"></a>6.16.4 Dependencies</h4><p>This property depends on design choices made in a format that may require
			large amounts of code or initialized data.</p></div><div class="div3">
<h4><a id="sfoot-tradeoffs" name="sfoot-tradeoffs"></a>6.16.5 Known Tradeoffs</h4><p>There is likely to be a tradeoff with <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#generality">Generality</a>,
			<a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>,
			and general support of many features.</p></div></div><div class="div2">
<h3><a id="space-efficiency-ID" name="space-efficiency-ID"></a>6.17 Space Efficiency</h3><div class="div3">
<h4><a id="se-desc" name="se-desc"></a>6.17.1 Description</h4><p><a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#space-efficiency">Space
                        Efficiency</a> is the measurement of dynamic memory needed to decode,
                        process, and encode a candidate format.  In this case, processing doesn't
                        include any application processing or needs, but may include any
                        format-induced processing or bookkeeping that must be done to adhere to the
                        format.  Special consideration must be given to separate and discount
                        overhead that a format requires that is accomplishing something that an
                        application would likely need to perform anyway.</p></div><div class="div3">
<h4><a id="se-type" name="se-type"></a>6.17.2 Type &amp; range</h4><p>This is a percentage measurement relative to the expected dynamic memory
			costs of popular and theoretical XML 1.x processing systems.  This may
			include both DOM and parser event (SAX et al) style processing.  Due to the
			nature of applications in memory-constrained environments, it is the
			DOM-style measurement that is ranked for this property.</p></div><div class="div3">
<h4><a id="se-method" name="se-method"></a>6.17.3 Methodology</h4><p>The measurement for this property is by inspection of format
			specification, logical analysis, and empirical testing on test scenarios.</p></div><div class="div3">
<h4><a id="se-dep" name="se-dep"></a>6.17.4 Dependencies</h4><p>This property is related to <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>
			and Processing Efficiency and may be affected by <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#generality">Generality</a>.</p></div><div class="div3">
<h4><a id="se-tradeoffs" name="se-tradeoffs"></a>6.17.5 Known Tradeoffs</h4><p>Some <a href="http://www.w3.org/XML/Binary/Properties/xbc-properties.html#compactness">Compactness</a>
			methods tend to increase memory usage, sometimes dramatically.  Reducing
			Processing Efficiency may also affect dynamic memory needed.</p></div></div></div><div class="div1">
<h2><a id="N10909" name="N10909"></a>7 References</h2><dl><dt class="label"><a id="XBC-UseCases" name="XBC-UseCases"></a>XBC Use Cases</dt><dd>
			<a href="http://www.w3.org/TR/xbc-use-cases"><cite>XML Binary Characterization Use Cases</cite></a>
		  (See http://www.w3.org/TR/xbc-use-cases.)</dd><dt class="label"><a id="XBC-Properties" name="XBC-Properties"></a>XBC Properties</dt><dd>
			<a href="http://www.w3.org/TR/xbc-properties"><cite>XML Binary Characterization Properties</cite></a>
		  (See http://www.w3.org/TR/xbc-properties.)</dd><dt class="label"><a id="XBC-Characterization" name="XBC-Characterization"></a>XBC Characterization</dt><dd>
			<a href="http://www.w3.org/TR/xbc-characterization"><cite>XML Binary Characterization</cite></a>
		  (See http://www.w3.org/TR/xbc-characterization.)</dd><dt class="label"><a id="Usenet-Compression-FAQ" name="Usenet-Compression-FAQ"></a>Compression FAQ</dt><dd>
			<a href="http://www.faqs.org/faqs/compression-faq/"><cite>Usenet Compression FAQ</cite></a>
		  (See http://www.faqs.org/faqs/compression-faq/.)</dd><dt class="label"><a id="Architecture-of-the-WWW" name="Architecture-of-the-WWW"></a>Architecture of the World Wide Web</dt><dd>
			<a href="http://www.w3.org/TR/webarch/"><cite>Architecture of the World Wide Web</cite></a>
		  (See http://www.w3.org/TR/webarch/.)</dd><dt class="label"><a id="XML-1.0" name="XML-1.0"></a>XML 1.0</dt><dd>
			<a href="http://www.w3.org/TR/REC-xml/"><cite>Extensible Markup Language (XML) 1.0</cite></a>
		  (See http://www.w3.org/TR/REC-xml/.)</dd><dt class="label"><a id="XML-1.1" name="XML-1.1"></a>XML 1.1</dt><dd>
			<a href="http://www.w3.org/TR/xml11/"><cite>Extensible Markup Language (XML) 1.1</cite></a>
		  (See http://www.w3.org/TR/xml11/.)</dd><dt class="label"><a id="QA-Specification-Guidelines" name="QA-Specification-Guidelines"></a>QA Specification Guidelines</dt><dd>
			<a href="http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/"><cite>QA Framework: Specification Guidelines</cite></a>
		  (See http://www.w3.org/TR/2004/WD-qaframe-spec-20040830/.)</dd><dt class="label"><a id="QA-Handbook" name="QA-Handbook"></a>QA Handbook</dt><dd>
			<a href="http://www.w3.org/TR/2004/WD-qa-handbook-20040830/"><cite>The QA Handbook</cite></a>
		  (See http://www.w3.org/TR/2004/WD-qa-handbook-20040830/.)</dd><dt class="label"><a id="test-data-url" name="test-data-url"></a>Test Data</dt><dd>
			<a href="http://www.w3.org/XML/Binary/2005/03/test-data/"><cite>Test Data submitted to the working group</cite></a>
		  (See http://www.w3.org/XML/Binary/2005/03/test-data/.)</dd></dl></div></div><div class="back"><div class="div1">
<h2><a id="N109AE" name="N109AE"></a>A Acknowledgements</h2><p>The measurement methodologies are the result of the work of the XBC Working Group contributors:
   Robin Berjon (Expway), Carine Bournez (W3C), Don Brutzman (Web3D), Mike Cokus (MITRE), Roger Cutler (ChevronTexaco), Ed Day (Objective Systems), Fabrice Desr&eacute; (France Telecom), Seamus Donohue (Cape Clear), Olivier Dubuisson (France Telecom), Oliver Goldman (Adobe), Peter Haggar (IBM), Takanari Hayama (KDDI), J&ouml;rg Heuer (Siemens), Misko Hevery (Adobe), Alan Hudson (Web3D), Takuki Kamiya (Fujitsu), Jaakko Kangasharju (University of Helsinki), Arei Kobayashi (KDDI), Eugene Kuznetsov (DataPower), Terence Lammers (Boeing), Kelvin Lawrence (IBM), Eric Lemoine (Tarari), Dmitry Lenkov (Oracle), Michael Leventhal (Tarari), Don McGregor (Web3D), Ravi Murthy (Oracle), Mark Nottingham (BEA), Santiago Pericas-Geertsen (Sun), Liam Quin (W3C), Kimmo Raatikainen (Nokia), Rich Salz (DataPower), Paul Sandoz (Sun), John Schneider (AgileDelta), Claude Seyrat (Expway), Paul Thorpe (OSS Nokalva), Alessandro Triglia (OSS Nokalva), Stephen D. Williams (Invited Expert).</p></div></div></body></html>