<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
	<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
	<title>Making your website valid: a step-by-step guide - W3C QA</title>
	<meta name="Keywords" content="qa, quality assurance, conformance, validity, test suite, validator, html, css" />
	<meta name="Description" content="W3C QA - An article to help web authors and webmasters improve the quality of an existing website" />

	<link rel="schema.DC" href="http://purl.org/dc" />
	<meta name="DC.Subject" lang="en" content="validator, html, css" />

	<meta name="DC.Title" lang="en" content="Making your Web site valid: a step by step guide" />
	<meta name="DC.Description.Abstract" lang="en" content="An article to help web authors and webmasters improve the quality of an existing website" />
	<meta name="DC.Date.Created" content="2002-06-24" />
	<meta name="DC.Language" scheme="RFC1766" content ="en" />
	<meta name="DC.Creator" content="Olivier Thereaux" />
	<meta name="DC.Publisher" content="W3C - World Wide Web Consortium - http://www.w3.org" />
	<meta name="DC.Rights" content="http://www.w3.org/Consortium/Legal/copyright-documents-19990405" />

	<link rel="Stylesheet" href="/QA/2002/12/qa4.css" />
</head>
<body>

<!-- Header -->
<div id="Logo">
<a href="http://www.w3.org/"><img alt="W3C" src="/Icons/WWW/w3c_home" /></a>
<a href="http://www.w3.org/QA/"><img alt="QA" src="/QA/images/qa" width="161" height="48" /></a>

<!-- <div id="Header">Be strict to be cool</div> -->
 <div><map name="introLinks" id="introLinks" title="Introductory Links">
<div class="banner"> <a
class="bannerLink" title="W3C Activities" accesskey="A"
href="/Consortium/Activities">Activities</a> | <a class="bannerLink"
title="Technical Reports and Recommendations" accesskey="T"
href="/TR/">Technical Reports</a> | <a class="bannerLink"
title="Alphabetical Site Index" accesskey="S"
href="/Help/siteindex">Site Index</a> | <a class="bannerLink"
title="Help for new visitors" accesskey="N"
href="/2002/03/new-to-w3c">New Visitors</a> | <a
class="bannerLink" title="About W3C" accesskey="B"
href="/Consortium/">About W3C</a> | <a class="bannerLink"
title="Join W3C" accesskey="J"
href="/Consortium/Prospectus/Joining">Join W3C</a></div>
</map></div>
</div>


<!-- menuRight -->
<div id="Menu">
<p><a href="#abstract">Abstract</a><span class="dot">&middot;</span>
<a href="#problem">Difficult&nbsp;Decision</a><span class="dot">&middot;</span>
<a href="#wrongway">Bad&nbsp;Approach</a><span class="dot">&middot;</span>
<a href="#hardway">Hard&nbsp;Approach</a><span class="dot">&middot;</span>
<a href="#suggestedway">Suggested&nbsp;Approach</a><span class="dot">&middot;</span>
<a href="#logvaltut">Practical&nbsp;Case</a><span class="dot">&middot;</span>
</p>
<hr />
<p class="navhead">Nearby:</p>
<p><a href="/QA/Tools/Logvalidator">LogValidator Home</a><span class="dot">&middot;</span>
<a href="/QA/"><abbr title="Quality Assurance">QA</abbr>&nbsp;Homepage</a><span class="dot">&middot;</span>
<a href="/QA/#resources">QA&nbsp;Resources</a><span class="dot">&middot;</span>
<a href="/QA/IG/">QA&nbsp;<abbr title="Interest Group">IG</abbr></a><span class="dot">&middot;</span>
</p></div>

<!-- content -->
<div id="Content">
<h1>Making your website valid: a step-by-step guide</h1>

<h2 id="abstract">Abstract</h2>

<p>This article considers a webmaster who wishes to make a whole website
compliant with web standards (valid (X)HTML, valid CSS, etc.).
It describes the usual ways to approach this problem and
suggests a painless approach using a new tool developed by W3C's
QA activity.</p>

<h2 id="status">Status</h2>

<p>This article has been produced as part of the <acronym
title="World Wide Web Consortium">W3C</acronym> <a href="../../IG/">Quality
Assurance Interest Group</a> work. Please send any public feedback on it to
the <a href="http://lists.w3.org/Archives/Public/public-evangelist/"><strong>publicly
archived</strong></a> mailing list <a
href="mailto:public-evangelist@w3.org">public-evangelist@w3.org</a>, or send private feedback to <a
href="mailto:ot@w3.org">ot@w3.org</a>.</p>

<p>This document has been <a href="http://www.w3.org/2003/03/Translations/byTechnology?technology=Step-by-step">translated into other languages</a>.</p>


<h2 id="problem">Improving an existing site: a difficult decision</h2>
<p>
Creating a Web site that complies with standards such as HTML,
CSS, or the Web Accessibility Guidelines is the right thing to
do, and is also a profitable choice.
</p>

<p>
Guidelines and tools are readily available to help you create a 
Web site that conforms to Web standards, ensuring a broad audience, 
cost-effective development, and easier maintenance.</p>

<p>But deciding how to convert an existing site to a standards-compliant format
is difficult. Your site may have legacy, unmaintained documents
in multiple formats, or may serve a large number of documents, making it difficult
to update. Your site may be backed by good design and flexible technologies,
which will simplify the task, yet in any case updating the site will
require a resource commitment.</p>

<p>However, the method you choose for the update determines how many resources
you will need to dedicate, and how you will dedicate them.</p>

<p>There are two typical ways to make an existing Web site standards
compliant: start completely over (the wrong way), or manually validate
each page (the hard way). For IT managers, neither is very attractive,
which makes the decision to switch to a standards-compliant site
difficult: it simply does not seem worthwhile given the amount of work needed.</p>

<p>After looking in detail at these two approaches and at why they fall short,
we will see a third, better one: systematically updating one section at a time.</p>

<h2 id="wrongway">The wrong way: Re-starting from scratch</h2>

<p>The wrong way to improve the quality of an existing
site is to delete everything and restart the site
from scratch.</p>



<p>This approach may be tempting for the freedom it allows and the opportunity
to start from a clean framework. However, in addition to the cost
of fully redesigning, rewriting and debugging the site,
trying to fix things by starting over may create more problems,
beginning with <a href="http://www.w3.org/Provider/Style/URI.html">broken links</a>.
</p>


<h2 id="hardway">The Hard Way: The whole works</h2>

<p>The usual way is also the hard way: the site administrator lists
all available resources (provided the technologies used make this feasible)
and runs them, one by one or in batches, through "validating" technologies
such as HTML validation, CSS validation and spell checking, or through corrective
filters (such as HTML Tidy).</p>

<p>This approach has many advantages and, unlike the previous method, carries
no specific risk. However, especially for sites with thousands of documents,
it requires a huge amount of work and cannot be achieved, if at all, without
excellent organization. Just figuring out where to start is itself
a tricky question when it comes to checking a full site.</p>


<h2 id="suggestedway">A suggested alternative</h2>
<p>There might be no perfect way to fix a whole site, but some ways are better,
or easier, than others.
Using the tools introduced below, we will explain a relatively easy method
that we believe works well.
Unfortunately, this method has its limits: it is best used
with static content, or with dynamic/generated content if you have control
over the templates. <br />
If you do not have control over the templates and they produce invalid markup,
then we encourage you to send a bug report to the software vendors,
or to the service provider managing your content.</p>

<h3>Step by Step approach</h3>

<p>"The Hard Way" would certainly be the best method to
fix an existing site for someone with unlimited resources dedicated to this task.
In the "real world", unless the site is very small, this approach is only realistic
if you make the process gradual and ordered.
</p>

<p>
With careful planning and an extended timeline, you can eventually clean up the site.
However, this process requires careful management, so that a given number of files
is cleaned up at regular intervals and all new resources are valid.</p>

<h4>Do the math</h4>
<p>The number of resources you will clean up during each period depends upon 
the volume of content (and the ratio of invalid documents). 
Ask yourself the following questions when allocating resources:</p>

<ul>
	<li>How much time can you dedicate to cleaning up invalid content?</li>
	<li>How long does it take for you (or the people assigned to this
	task) to fix one invalid document?</li>
</ul>
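
<p>As a rough illustration of this arithmetic, here is a minimal Python sketch.
The function name and the sample figures (4 hours per week, 30 minutes per document)
are assumptions made for the example; replace them with your own estimates.</p>

```python
def documents_per_period(hours_available, minutes_per_document):
    """Return how many invalid documents can be fixed in one period.

    Both arguments are estimates that you supply yourself.
    """
    if minutes_per_document == 0:
        raise ValueError("minutes_per_document must be non-zero")
    # Whole documents only: a half-fixed document is still invalid.
    return int(hours_available * 60 // minutes_per_document)


# Hypothetical figures: 4 hours a week, 30 minutes per document.
batch_size = documents_per_period(4, 30)
print(batch_size)  # prints 8
```

<p>The result is the number of documents to aim for in each round of cleaning up.</p>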

<h4>No deadline?</h4>
<p>
We have not mentioned any deadlines for this cleaning-up work.
In most cases you probably have no idea what your initial ratio of invalid
content is, and you may not even know how many documents you have.
Without this information, how can you estimate how long it will take?</p>

<p>Of course, like every project, this cleaning project needs limits and
deadlines. One limit you can set before starting the project is:
"what is the acceptable invalid ratio for my site?"
If you have a small or moderate-sized site, "zero" may be your answer.
However, we suggest you choose a more modest figure if you have a big site,
10% for example.</p>

<p>Once you have set the limit and the resources dedicated to the cleaning
project, the first few rounds of the "step by step method" will give you an
idea of how long it will take to reach the limit. You will then be able to
reconsider the amount of time dedicated, or your targeted
"quality ratio", if necessary.</p>

<h3>Traffic-based approach</h3>

<p>Here is a simple example to explain the traffic-based approach.</p>

<p>Imagine you have 4 documents on a site
(we'll call them 1, 2, 3 and 4), accounting for 40%, 30%,
20% and 10% of the traffic for this site, respectively.</p>

<p>Now imagine that documents 1 and 4 are invalid. That's 50% of the
documents, and 50% of the traffic, and that's bad. If you have time to
fix both documents, fine, but what if you have time to fix only one?</p>

<p>
The usual approach would be to fix either one, so that only 25% of the documents are invalid.
The traffic approach tells you to choose document 1: fixing it
brings the share of valid traffic up to 90%.</p>

<p>This is a cost-efficient approach to the problem:
given a limited amount of resources, you want to focus on the
improvements that will have the best results.</p>
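
<p>The four-document example above can be sketched in a few lines of Python.
The function names are made up for this illustration; the figures are the ones from the example.</p>

```python
# Traffic share (in percent) of the four documents from the example.
traffic = {1: 40, 2: 30, 3: 20, 4: 10}
invalid = {1, 4}


def valid_traffic(traffic, invalid):
    """Return the percentage of traffic going to valid documents."""
    return sum(share for doc, share in traffic.items() if doc not in invalid)


def best_document_to_fix(traffic, invalid):
    """Pick the invalid document that receives the most traffic."""
    return max(invalid, key=lambda doc: traffic[doc])


choice = best_document_to_fix(traffic, invalid)
print(choice)                                      # prints 1
print(valid_traffic(traffic, invalid))             # prints 50
print(valid_traffic(traffic, invalid - {choice}))  # prints 90
```

<p>Fixing document 4 instead would only raise the valid share from 50% to 60%.</p>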

<h4>Estimating the Quality of a site using the Traffic approach</h4>
<p>The traffic-based approach is also a more accurate way to estimate the
quality of a website. As we will see in the following section, given a site
(with an unknown number of documents served, but known logs for a given period
of time), the LogValidator sorts the documents served during this time by
popularity (traffic), then tries to find X invalid documents among the most
popular ones.</p>
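
<p>To make the idea of sorting by popularity concrete, here is a minimal Python sketch
that counts requests per document in a server log. It assumes log lines in Common Log Format,
and the sample lines are invented. It only illustrates the principle and is not the
LogValidator's own implementation.</p>

```python
from collections import Counter

# Hypothetical access-log lines in Common Log Format.
SAMPLE_LOG = [
    '192.0.2.1 - - [24/Jun/2002:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [24/Jun/2002:10:00:05 +0000] "GET /about.html HTTP/1.0" 200 1120',
    '192.0.2.3 - - [24/Jun/2002:10:00:09 +0000] "GET /index.html HTTP/1.0" 200 2326',
]


def rank_by_popularity(log_lines):
    """Return (path, hits) pairs, most requested first."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) == 1:
            continue  # no quoted request on this line, skip it
        request = parts[1].split()
        # A well-formed request has a method, a path and a protocol.
        if len(request) == 3:
            hits[request[1]] += 1
    return hits.most_common()


print(rank_by_popularity(SAMPLE_LOG))
# prints [('/index.html', 2), ('/about.html', 1)]
```

<p>The documents at the top of such a list are the ones worth checking first.</p>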

<p>Now, let's imagine a case where 100 documents have been served. The tool
needs to go through 20 documents to find 2 (we are setting X=2 for the example)
that are invalid HTML documents. These 20 documents account for 45% of the
traffic. The table below gives estimates of the quality of the site (with
regard to HTML validity), with a "file approach" and with a "traffic approach".
</p>

<table style="border-style:solid;border-color:black;border-width:1px">
<tr>
<td></td><th colspan="2">Using the file approach</th> <th colspan="2">Using the traffic approach</th>
</tr>
<tr>
<td></td><th>Lower estimate</th><th>Upper estimate</th><th>Lower estimate</th><th>Upper estimate</th>
</tr>
<tr>
<th>Before validating the 2 documents</th>
<td>18%</td><td>98%</td>
<td>40.5%<br />(45*18/20)%</td><td>95.5%<br />((45*18/20)% +55%)</td>
</tr>
<tr>
<th>After validating the 2 documents</th>
<td>20%</td><td>100%</td>
<td>45%</td><td>100%</td>
</tr>
</table>

<p>The "file-based" estimates are loose (an 80-point range), whereas the traffic-based
estimates are tighter (a 55-point range). Once you have fixed the 2 documents and restart the process,
the traffic-based estimates get more accurate (and higher, since more and more of the traffic is valid!).
</p>
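
<p>The figures in the table can be reproduced with a short Python sketch.
The function and parameter names are assumptions made for this illustration. Like the table,
it assumes that the checked documents share their traffic evenly.</p>

```python
def quality_estimates(total_docs, checked_docs, invalid_found, checked_traffic):
    """Return lower and upper estimates of the valid share, in percent.

    checked_traffic is the percentage of traffic going to the checked documents.
    """
    valid_checked = checked_docs - invalid_found
    unchecked_docs = total_docs - checked_docs
    # File approach: count documents, known valid versus possibly valid.
    file_lower = 100 * valid_checked / total_docs
    file_upper = 100 * (valid_checked + unchecked_docs) / total_docs
    # Traffic approach: weight the checked documents by their traffic share.
    traffic_lower = checked_traffic * valid_checked / checked_docs
    traffic_upper = traffic_lower + (100 - checked_traffic)
    return {
        "file": (file_lower, file_upper),
        "traffic": (traffic_lower, traffic_upper),
    }


before = quality_estimates(100, 20, 2, 45)
after = quality_estimates(100, 20, 0, 45)
print(before)  # prints {'file': (18.0, 98.0), 'traffic': (40.5, 95.5)}
print(after)   # prints {'file': (20.0, 100.0), 'traffic': (45.0, 100.0)}
```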



<h2 id="logvaltut">Practical Case: Using the LogValidator and other tools
to clean up your site's markup</h2>
<p>Here we will describe a practical example of this "cleanup strategy"
using a limited set of (free) tools to validate a Web site's HTML.
HTML is just an example; you can use
the techniques described (and some of the tools) for many other cases.</p>

<h3>Get the tools</h3>
<p>The <a href="http://www.w3.org/QA/Tools/LogValidator/">LogValidator</a>
will be the primary (if not the only) tool you will need. You can
<a href="http://www.w3.org/QA/Tools/LogValidator/#download">download it</a>
for free, and install it on any system running Perl (your Web server
almost certainly does).</p>

<p>You will also need a few <a href="http://www.w3.org/QA/Tools/LogValidator/Manual#id01">
other components</a> that the LogValidator depends on to run smoothly. They
can all be downloaded and installed free of charge.</p>

<p>If you are not an HTML expert, and cleaning up code is not your hobby, you
can use <a href="http://tidy.sourceforge.net/">tidy</a> to do it for you. It's 
a (semi-)automatic markup cleanup tool, and is available for many platforms.
</p>

<p>The LogValidator will check your documents through the online
<a href="http://validator.w3.org">Markup Validator</a> at W3C. If you have a big site,
or want to save bandwidth, you can install the validator locally, too.</p>

<h3>Running the LogValidator</h3>
<p>We will assume you have at least installed the LogValidator and read the
<a href="http://www.w3.org/QA/Tools/LogValidator/Manual">Manual</a> carefully.</p>

<p>You first need to set up a configuration file to match your server configuration.
To do so you (mainly) need access to a log file for your Web server (this will be used
to compute traffic statistics). You can easily create the configuration file by copying the
sample configuration file distributed with the tool and editing it as explained in the
<a href="http://www.w3.org/QA/Tools/LogValidator/Manual">Manual</a>.</p>

<p>Once this is done, you can run the LogValidator. Don't set the number of results
too high; 10 should be enough to begin with.</p>

<p>You should get back a list of your 10 "most popular" invalid documents. Take some 
time to analyze them. You can run them through the 
<a href="http://validator.w3.org">Markup Validator</a> to check where the bad HTML is.
If you are using templates, does it seem like there is something wrong with them? 
Can you check the template with the validator?</p>

<p>Next, fix the first documents on the list. Remember, those are the
most popular documents on your site that are not valid, so it is an important step!
This first step may be difficult, especially if "big" documents are in the list.
<a href="http://tidy.sourceforge.net/">tidy</a> can help clean up your code.
You can also search the Web for guidelines for fixing Web pages and find people to assist you.</p>

<p>For example, if you don't understand the output of the validator, 
check out its <a href="http://validator.w3.org/docs/">documentation</a> 
or contact the public list 
<a href="http://lists.w3.org/Archives/Public/www-validator/">www-validator@w3.org</a>.
</p>

<p>Done? Congratulations! You can now set up the LogValidator to run every day,
week, or month (see the <a href="http://www.w3.org/QA/Tools/LogValidator/Manual#tips">tip</a>
on how to do this), and start again with other documents...</p>

<p>Keep up the good work. If you have a really big site made of static documents,
chances are you won't reach 100% valid pages, but that's OK. After some
time, the invalid pages that are left will account for only a tiny portion of your site's traffic.</p>

<h2>Credits</h2>
<p>Thanks a lot to Kim Nylander for a thorough review of this document
and many invaluable suggestions.<br />
Thanks to Karl Dubost and Dominique Hazael-Massieux, W3C, for their 
comments and suggestions.</p>

<h2>Contact</h2>
<address>
Olivier Thereaux, <a href="http://www.w3.org">W3C</a> : 
&lt;<a href="http://www.w3.org/People/olivier/">ot@w3.org</a>&gt;
</address>

</div>
<!-- Footer -->

<hr />

<div class="disclaimer">
<a href="http://validator.w3.org/check/referer"><img
src="http://validator.w3.org/images/vxhtml10" alt="Valid XHTML 1.0!"
height="31" width="88" /></a> 

<p class="author">
Created Date: 2002-06-24 by <a href="mailto:ot@w3.org">Olivier Thereaux</a><br />
Last modified $Date: 2011/12/16 02:57:04 $ by $Author: gerald $</p>

<p class="policyfooter"><a rel="Copyright"
href="/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &#xa9; 2000-2003
<a href="/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>&#xae;</sup> (<a
href="http://www.lcs.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a
href="/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a
rel="Copyright" href="/Consortium/Legal/copyright-documents">document use</a>
and <a rel="Copyright" href="/Consortium/Legal/copyright-software">software
licensing</a> rules apply. Your interactions with this site are in accordance
with our <a href="/Consortium/Legal/privacy-statement#Public">public</a> and
<a href="/Consortium/Legal/privacy-statement#Members">Member</a> privacy
statements.</p>

</div>
</body>
</html>