<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css" media="all">
    @import "/QA/2006/01/blogstyle.css";
    </style>
    <meta name="keywords" content='html, html5, python, validation, validator' />
    <meta name="description" content="Following Brian Wilson's lead and his validity survey, I tested against html 5. Fewer than 1% of the top 500 Alexa Web sites seem to pass html 5 conformance checking." />
    <meta name="revision" content="$Id: top-500-html5-validity.html,v 1.48 2011/12/16 03:03:02 gerald Exp $" />    
   <link rel="alternate" type="application/atom+xml" title="Atom" href="http://www.w3.org/QA/atom.xml" />
   <link rel="alternate" type="application/rss+xml" title="RSS 1.0" href="http://www.w3.org/QA/news.rss" />   
   <title>Alexa Global Top 500 against HTML 5 validation - W3C Blog</title>

   <link rel="start" href="http://www.w3.org/QA/" title="Home" />
   <link rel="prev" href="http://www.w3.org/QA/2008/09/parisweb-2008.html" title="ParisWeb 2008 - registration is open" />
   <link rel="next" href="http://www.w3.org/QA/2008/09/slideshow-must-go-on.html" title="The Slideshow Must Go On" />

   <!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
    rdf:about="http://www.w3.org/QA/2008/09/top-500-html5-validity.html"
    trackback:ping="http://www.w3.org/QA/sununga/mt-tb.cgi/221"
    dc:title="Alexa Global Top 500 against HTML 5 validation"
    dc:identifier="http://www.w3.org/QA/2008/09/top-500-html5-validity.html"
    dc:subject="HTML"
    dc:description="Following Brian Wilson lead and his validity survey, I tested against html 5. Less than 1% of top 500 Alexa Web sites seems to pass html 5 conformance checking. "
    dc:creator="Karl Dubost"
    dc:date="2008-09-19T06:57:34+00:00" />
</rdf:RDF>
-->

    <!-- <script type="text/javascript" src="http://www.w3.org/QA/mt.js"></script>-->

</head>
<body class="layout-one-column">
      <div id="banner">
      <h1 id="title">
	<a href="http://www.w3.org/"><img height="48" alt="W3C" id="logo" src="http://www.w3.org/Icons/WWW/w3c_home_nb" /></a>
W3C Blog
</h1>
    </div>
    
    <ul class="navbar" id="menu">
        <li><strong><a href="/QA/" title="W3C Blog Home">[ W3C Blog ]</a></strong></li>
        <li><a href="/QA/Library/" title="Documents and Publications on Web and Quality">Documents</a></li>
        <li><a href="/QA/Tools/" accesskey="3" title="Validators and other Tools">Tools</a></li>
        <li><a href="/2007/12/qa-blog-help/index#feedback">Feedback</a></li>
    </ul>
<div id="searchbox">
<form method="get" action="http://www.google.com/custom" enctype="application/x-www-form-urlencoded">
<p id="formbox"><input type="text" size="15" class="textfield" name="q" accesskey="E" maxlength="255" /> <input type="submit" class="submitfield" value="Search" id="goButton" name="sa" accesskey="G" /> <input type="hidden" name="cof" value="T:black;LW:72;ALC:#ff3300;L:http://www.w3.org/Icons/w3c_home;LC:#000099;LH:48;BGC:white;AH:left;VLC:#660066;GL:0;AWFID:0b9847e42caf283e;" /><input type="hidden" id="searchW3C" name="sitesearch" checked="checked" value="www.w3.org/QA" /><input type="hidden" name="domains" value="www.w3.org/QA" /></p>
</form>
</div>


    <div id="main"><!-- This DIV encapsulates everything in this page - necessary for the positioning -->

                     <p class="content-nav">
                        <a href="http://www.w3.org/QA/2008/09/parisweb-2008.html">&laquo; ParisWeb 2008 - registration is open</a> |
                        <a href="http://www.w3.org/QA/">Main</a>
                        | <a href="http://www.w3.org/QA/2008/09/slideshow-must-go-on.html">The Slideshow Must Go On &raquo;</a>
                     </p>

                        <h2 class="entry-header">Alexa Global Top 500 against HTML 5 validation</h2>
                           <div class="entry-body">
                              <p>Last month, <a href="http://my.opera.com/blooberry/info/">Brian Wilson</a> published a <a href="http://my.opera.com/operaqa/blog/2008/08/04/alexa-global-top-500-validation-research">survey</a> on validation. He took the URIs of the <a href="http://www.alexa.com/site/ds/top_sites?ts_mode=global&amp;lang=none">top 500 sites listed by Alexa</a> and sent them to the W3C Markup Validator. He concluded that <q>32 of the 487 URLs passed validation (6.57%)</q>. More recently, W3C created a <a href="http://www.w3.org/QA/2008/08/html5-validator-beta">beta instance of an html 5 conformance checker</a>.</p>

<p>So today I decided to take the <a href="http://files.myopera.com/blooberry/alexa/alexaglobaltop500list.htm">January 2008 list of web sites</a> and send it to the <strong>beta</strong> instance of the html 5 conformance checker. I wrote a very simple Python script (as usual, if my code horrifies you, kind suggestions for improving it are welcome). Note that you will need to install <a href="http://code.google.com/p/httplib2/" title="httplib2 - Google Code">httplib2</a>. The file <code>alexa.txt</code> contains the list of URIs, one per line. To be sure the pages were checked as html 5, I forced the html 5 doctype with the <code>doctype=HTML5</code> parameter.</p>

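<pre><code>import time
from urllib.parse import quote

import httplib2

# Beta checker instance; doctype=HTML5 forces the html 5 doctype
CHECKER = "http://qa-dev.w3.org/wmvs/HEAD/check?doctype=HTML5&amp;uri="

h = httplib2.Http(".cache")

# alexa.txt holds one URI per line; blank lines are skipped
with open("alexa.txt", "r") as f:
    urllist = [line.strip() for line in f if line.strip()]

for url in urllist:
    # wait 10 seconds before the next request - be nice with the validator
    time.sleep(10)
    # percent-encode the site URI so its own query string survives
    urlrequest = CHECKER + quote(url, safe="")
    try:
        resp, content = h.request(urlrequest, "HEAD")
    except Exception as err:
        # report network failures instead of silently dropping the site
        print(url, "ERROR", err)
        continue
    status = resp.get("x-w3c-validator-status", "Unknown")
    if status == "Abort":
        print(url, "FAIL")
    else:
        print(url, status,
              resp.get("x-w3c-validator-errors"),
              resp.get("x-w3c-validator-warnings"))
</code></pre>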
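
<p>The script prints one line per site. As a minimal sketch, and only under the assumption that those lines look like <code>url STATUS errors warnings</code> or <code>url FAIL</code>, the statuses could be tallied as below. The function name <code>tally</code> and the sample lines are hypothetical illustrations, not data from the survey.</p>

```python
from collections import Counter


def tally(lines):
    """Count sites per status from lines shaped like 'url STATUS [errors warnings]'."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # the second field is FAIL, or the checker's status header value
            counts[fields[1]] += 1
    return counts


# Hypothetical sample output, not real survey data
sample = [
    "http://example.com/ Valid 0 0",
    "http://example.org/ Invalid 12 3",
    "http://example.net/ FAIL",
]
counts = tally(sample)
total = sum(counts.values())
for status, n in sorted(counts.items()):
    print(status, n, "%.1f%%" % (100.0 * n / total))
```

<p>Lines with fewer than two fields are ignored, so stray blank lines in a saved log would not skew the counts.</p>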

<p>Before I give the results, repeat after me 10 times: the html 5 conformance checker is in beta, which means <strong>not stable</strong> and still being tested. The html 5 specification is a Working Draft, which means <strong>highly likely to change</strong>. The test covers only the home page of each site.</p>

<p>The January 2008 file contains 485 web sites. 23 (4.7%) could not be validated, most often because the site was too slow to respond. Only 4 sites (&lt; 1%) were declared valid html 5 by the conformance checker. If Henri Sivonen could run the same test with his instance of the html 5 conformance checker, it would help to know whether my results are silly or in the right ballpark.</p>
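
<p>For readers who want to check the arithmetic, here is a small sketch that recomputes the percentages quoted in this post from the raw counts. The helper name <code>percent</code> is only an illustration.</p>

```python
def percent(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


print(percent(23, 485))  # sites that could not be validated: 4.74
print(percent(4, 485))   # sites declared valid html 5: 0.82
print(percent(32, 487))  # Brian Wilson's survey: 6.57
```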

                           </div>
                           <div id="more" class="entry-more">
                              

                           </div>
                       <p class="postinfo">Filed by <a href="http://www.w3.org/People/karl/">Karl Dubost</a> on September 19, 2008  6:57 AM in <a href="http://www.w3.org/QA/archive/technology/html/">HTML</a>, <a href="http://www.w3.org/QA/archive/web_spotting/opinions_editorial/">Opinions &amp; Editorial</a>, <a href="http://www.w3.org/QA/archive/w3cqa_news/tools/">Tools</a><br />
<span class="separator">|</span> <a class="permalink" href="http://www.w3.org/QA/2008/09/top-500-html5-validity.html">Permalink</a>
                                 | <a href="http://www.w3.org/QA/2008/09/top-500-html5-validity.html#comments">Comments (11)</a>
                                 | <a href="http://www.w3.org/QA/2008/09/top-500-html5-validity.html#trackback">TrackBacks (0)</a>
</p>



<h3 class="comments-header" id="comments">Comments</h3>
<div class="comment" id="comment-165928">
<p class="comment-meta" id="c165928">
<span class="comment-meta-author"><strong>hdh </strong></span>
<span class="comment-meta-date"><a href="#c165928">#</a> 2008-09-19</span>
</p>
<div class="comment-bulk">
<p>I am surprised that there are sites which use the HTML5 doctype. All other sites fail HTML5 validation right from the doctype.</p>

</div>
</div>


<div class="comment" id="comment-165929">
<p class="comment-meta" id="c165929">
<span class="comment-meta-author"><strong>hdh </strong></span>
<span class="comment-meta-date"><a href="#c165929">#</a> 2008-09-19</span>
</p>
<div class="comment-bulk">
<p>Please ignore my previous comment. I run a local v.nu instance, which does not override the document's doctype as yours does.</p>

<p>I set Preset to HTML5 (experimental) and Parser to HTML5, but the original doctype is still used.</p>

</div>
</div>


<div class="comment" id="comment-165938">
<p class="comment-meta" id="c165938">
<span class="comment-meta-author"><strong>olivier Thereaux, W3C </strong></span>
<span class="comment-meta-date"><a href="#c165938">#</a> 2008-09-19</span>
</p>
<div class="comment-bulk">
<p>Interesting... I thought one of the design principles of HTML5 was paving the cowpaths and being backward-compatible, which, perhaps via simplistic logic, meant that anything conforming to an older version of HTML would still conform to HTML5. In other words, the number of sites passing the html5 check should be higher, not lower, than with the DTD-based validation. At least that's my assumption.</p>

<p>It would be interesting to look in more detail at which errors made the html5 checking fail. That would clarify whether html5 has drifted from earlier versions of html, whether the checker has bugs, etc. Concatenate the XML output(s) of validation and look into that?</p>

</div>
</div>


<div class="comment" id="comment-166027">
<p class="comment-meta" id="c166027">
<span class="comment-meta-author"><strong>Henri Sivonen </strong></span>
<span class="comment-meta-date"><a href="#c166027">#</a> 2008-09-23</span>
</p>
<div class="comment-bulk">
<p>HTML5 makes non-conforming a lot of attributes that were conforming and popular in HTML 4.01 Transitional and even Strict.</p>

<p>I think <a href="http://lists.w3.org/Archives/Public/public-html/2008Aug/0958.html" rel="nofollow">HTML5 needs adjustment</a>.</p>

</div>
</div>


<div class="comment" id="comment-166053">
<p class="comment-meta" id="c166053">
<span class="comment-meta-author"><strong>Dave </strong></span>
<span class="comment-meta-date"><a href="#c166053">#</a> 2008-09-25</span>
</p>
<div class="comment-bulk">
<p>More than 99% (minus 4.7%, sorta) of sites without HTML 5 doctypes don't validate as HTML 5? I think more tests may be needed here. Do pages with XHTML Strict tend to validate or not validate as HTML 4.01? Time permitting, maybe pages with HTML 5 doctypes could be checked for validation against HTML 5. ;)</p>

<p>In all seriousness, this is interesting information that's difficult for me to apply to a question. Is this roughly the same as searching current pages for any use of a tag/attribute that is planned to change in HTML 5? Is it fair to read between the lines that this test indicates that over 99% of pages would need to be changed in some way to conform to HTML 5 as it stands?</p>

</div>
</div>


<div class="comment" id="comment-173993">
<p class="comment-meta" id="c173993">
<span class="comment-meta-author"><strong>Levi Aho </strong></span>
<span class="comment-meta-date"><a href="#c173993">#</a> 2009-03-05</span>
</p>
<div class="comment-bulk">
<p>Am I the only one who thinks this is <em>utterly</em> pointless? What exactly is the point of validating pages against a doctype they don't even claim to implement? That'd be as stupid as calling valid HTML pages invalid because they aren't well-formed XML, even when they make no claim to be.</p>

<p>Sure, most of these pages aren't valid <em>anything</em>, but I'm sure most of them aren't even aiming for HTML5. They're much more likely to be some sort of HTML4 or XHTML or at least aiming for those two standards. (If, of course, the developers care at all. Sadly, many don't.)</p>

<p>I support validation, and standards, and HTML5, but this is a waste of time and brains. I'm of the opinion that any page that triggers quirks mode should just be considered invalid anyway and ignored, which is probably 486 (or so) of the 500 sites on the list.</p>

</div>
</div>


<div class="comment" id="comment-175518">
<p class="comment-meta" id="c175518">
<span class="comment-meta-author"><strong>karl dubost </strong></span>
<span class="comment-meta-date"><a href="#c175518">#</a> 2009-03-10</span>
</p>
<div class="comment-bulk">
<p>@Levi,</p>

<p>HTML 5 is <em>supposed</em> to be designed by using the content as it is deployed on the Web. The aim is to understand the practices of users and tools, and to avoid creating too much discrepancy.</p>

<p>The goal of this validation was to show that, in fact, switching to html 5 will require a lot of effort from users. Do not forget that html 5 is a moving target, so this study, which didn't take a long time (just the time to develop the script ;) ), could be run again at any time depending on the status of html5.</p>

</div>
</div>


<div class="comment" id="comment-180951">
<p class="comment-meta" id="c180951">
<span class="comment-meta-author"><strong>eliecer </strong></span>
<span class="comment-meta-date"><a href="#c180951">#</a> 2009-04-27</span>
</p>
<div class="comment-bulk">
<p>Very interesting article. A good Alexa ranking is always a good calling card, and although it is not 100% representative or reliable, it indicates that we are doing things right with our web page.</p>

</div>
</div>


<div class="comment" id="comment-187269">
<p class="comment-meta" id="c187269">
<span class="comment-meta-author"><strong>Jim Hobson </strong></span>
<span class="comment-meta-date"><a href="#c187269">#</a> 2010-02-22</span>
</p>
<div class="comment-bulk">
<p>I support HTML5 and, of all industries, ours is certainly one where everything is constantly being improved and upgraded. While some may argue that HTML5 is being pushed too far ahead of its development maturity, I'm glad to see that things are being driven forward!</p>

<p>We have created sites that meet the current HTML5 standards and it did not take an overwhelming effort. </p>

</div>
</div>


<div class="comment" id="comment-187366">
<p class="comment-meta" id="c187366">
<span class="comment-meta-author"><strong>Sean Fraser </strong></span>
<span class="comment-meta-date"><a href="#c187366">#</a> 2010-02-26</span>
</p>
<div class="comment-bulk">
<p>@Karl,</p>

<p>The Conformance Checker has improved greatly since your comment. Either version. Perhaps Brian Wilson would run a new report with the current Alexa 500. How interesting would the results be compared with the previous report?</p>

</div>
</div>


<div class="comment" id="comment-200762">
<p class="comment-meta" id="c200762">
<span class="comment-meta-author"><strong>Jason Devens </strong></span>
<span class="comment-meta-date"><a href="#c200762">#</a> 2010-09-15</span>
</p>
<div class="comment-bulk">
<p>Just curious as to how many errors the top 100 online businesses, ranked by revenue, receive when run through W3C validation?</p>

<p>What's the point of this again?</p>

</div>
</div>



  <div class="comments-open" id="comments-open">
<h3 class="comments-open-header">Leave a comment</h3>

<div class="comments-open-moderated">
   <p>
   Note: this blog is intended to foster <strong>polite
   on-topic discussions</strong>. Comments failing these
   requirements and spam will not get published. Please,
   enter your real name and email address. Every
   individual comment is reviewed by the W3C staff.
   This may take some time, thank you for your patience.
   </p>
   <p>
   You can use the following HTML markup (a href, b, i, 
   br/, p, strong, em, ul, ol, li, blockquote, pre) 
   and/or <a href="http://daringfireball.net/projects/markdown/syntax">Markdown syntax</a>.</p>
</div>

<div id="comments-open-data">
<form method="post" action="http://www.w3.org/QA/sununga/beach.pl" id="comments-form">
<h4>Your comment</h4>
<div id="comments-open-text">
  <textarea id="comment-text" name="text" rows="20" cols="100"></textarea><br />
<label for="comment-text">Write your comment text here. Remember, keep the discussion on topic and courteous.</label>
</div>

<h4>About you</h4>
<div id="comment-form-name">
  <input type="hidden" name="static" value="1" />
<input type="hidden" name="entry_id" value="231" />
<input type="hidden" name="__lang" value="en" /> 
<label for="comment-author">Your Name</label>
<input id="comment-author" name="author" size="30" value="" />
</div>
<div id="comment-form-email">
<label for="comment-email">Your Email Address</label>
<input id="comment-email" name="email" size="30" value="" />
</div>

<div id="comments-open-footer">
<input type="submit" accesskey="s" name="post" id="comment-submit" value="Submit" />

</div>
</form>
</div>
</div>



<p id="gentime">This page was last generated on $Date: 2011/12/16 03:03:02 $</p> 

      </div><!-- End of "main" DIV. -->

<address>

This blog is written by W3C staff and working group participants,<br />
&nbsp;and maintained by <a href="/People/CMercier/">Coralie Mercier</a>.<br />
Authorized parties may <a href="/QA/new">log in</a> to create a new entry.<br/>
<span id="poweredby">Powered by Movable Type, magpierss and a lot of Web Technology</span>
    </address>


    
    <p class="copyright">
      <a rel="Copyright" href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &copy; 1994-2011
      <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a>&reg;
      (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>,
      <a href="http://www.ercim.eu/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
      <a href="http://www.keio.ac.jp/">Keio</a>),
      All Rights Reserved.
      W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
      <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>,
      <a rel="Copyright" href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a>
      and <a rel="Copyright" href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</a>
      rules apply. Your interactions with this site are in accordance
      with our <a href="http://www.w3.org/Consortium/Legal/privacy-statement#Public">public</a> and
      <a href="http://www.w3.org/Consortium/Legal/privacy-statement#Members">Member</a> privacy
      statements.
    </p>

  </body>
</html>