<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
	<meta http-equiv="content-type" content="text/html; charset=utf-8" />
	<title>Ampersands, PHP Sessions and Valid HTML - QA @ W3C</title>
	<meta name="Keywords" content="qa, quality assurance, conformance, validity, test suite" />
	<meta name="Description" content="W3C QA - Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it." />

	<link rel="schema.DC" href="http://purl.org/dc" />
	<meta name="DC.Subject" lang="en" content="@@Meta_Keywords@@" />
	<meta name="DC.Title" lang="en" content="Ampersands, PHP Sessions and Valid HTML" />
	<meta name="DC.Description.Abstract" lang="en" content="Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it." />
	<meta name="DC.Date.Created" content="2005-04-15" />
	<meta name="DC.Language" scheme="RFC1766" content ="en" />
	<meta name="DC.Creator" content="David Dorward" />
	<meta name="DC.Publisher" content="W3C - World Wide Web Consortium - http://www.w3.org" />
	<meta name="DC.Rights" content="http://www.w3.org/Consortium/Legal/copyright-documents-19990405" />

	<link rel="Stylesheet" href="/QA/2002/12/qa4.css" /></head>
<body>

<!-- Header -->
<div id="Logo">
<a href="http://www.w3.org/"><img alt="W3C" src="/Icons/WWW/w3c_home" /></a>
<a href="http://www.w3.org/QA/"><img alt="QA" src="/QA/images/qa" width="161" height="48" /></a>

<!-- <div id="Header">Be strict to be cool</div> -->
 <div><map name="introLinks" id="introLinks" title="Introductory Links">
<div class="banner"> <a
class="bannerLink" title="W3C Activities" accesskey="A"
href="/Consortium/Activities">Activities</a> | <a class="bannerLink"
title="Technical Reports and Recommendations" accesskey="T"
href="/TR/">Technical Reports</a> | <a class="bannerLink"
title="Alphabetical Site Index" accesskey="S"
href="/Help/siteindex">Site Index</a> | <a class="bannerLink"
title="Help for new visitors" accesskey="N"
href="/2002/03/new-to-w3c">New Visitors</a> | <a
class="bannerLink" title="About W3C" accesskey="B"
href="/Consortium/">About W3C</a> | <a class="bannerLink"
title="Join W3C" accesskey="J"
href="/Consortium/Prospectus/Joining">Join W3C</a></div>
</map></div>
</div>


<!-- menuRight -->
<div id="Menu">
<p><a href="#status">Status</a><span class="dot">&middot;</span>
<a href="#background">Background</a><span class="dot">&middot;</span>
<a href="#problem">Problem</a><span class="dot">&middot;</span>
<a href="#solutions">Solutions</a><span class="dot">&middot;</span>
</p>
<hr />
<p class="navhead">Nearby:</p>
<p><a href="/QA/"><abbr title="Quality Assurance">QA</abbr>&nbsp;Homepage</a><span class="dot">&middot;</span>
<a href="/QA/#latest">Latest News</a><span class="dot">&middot;</span>
<a href="/QA/#resources">QA&nbsp;Resources</a><span class="dot">&middot;</span>
<a href="/QA/IG/">QA&nbsp;<abbr title="Interest Group">IG</abbr></a><span class="dot">&middot;</span>
<a href="/QA/WG/">QA&nbsp;<abbr title="Working Group">WG</abbr></a><span class="dot">&middot;</span>
<a href="/QA/Agenda/">QA&nbsp;Calendar</a><span class="dot">&middot;</span>
</p></div>

<!-- content -->
<div id="Content">

<!-- Your content is starting after this -->

<h1>Ampersands, PHP Sessions and Valid HTML</h1>

<p>Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it.</p>

<h2 id="status">Status of this document</h2>
<p>This document is an article contributed to the <a href="/QA/IG/">QA 
Interest Group</a>. Feedback, suggestions and corrections are welcome, 
and should be sent to the publicly archived mailing-list 
<a href="http://lists.w3.org/Archives/Public/www-qa/">www-qa</a>.</p>

<h3>Credits</h3>
<dl>
<dt>Author(s)</dt>
<dd><a href="http://dorward.me.uk/">David Dorward</a></dd>
</dl>

<h2 id="toc">Table of Contents</h2>

<ul>
<li><a href="#background">Background</a></li>
<li><a href="#problem">Problem</a></li>
<li><a href="#solutions">Solutions</a>
<ul>
<li><a href="#reference">Outputting a character reference</a></li>
<li><a href="#separator">Using a different argument separator</a></li>
<li><a href="#disable">Disable sessions for non-cookie users</a></li>
</ul></li>
</ul>

<h2 id="background">Background</h2>

<p>In HTML (and XHTML, along with other SGML and XML applications)
certain characters have special meaning, a prime example being &lt;,
which indicates the beginning of a tag. Such characters cannot be
simply typed into a document if you wish them to display - otherwise
how could the user agent tell the difference between
<code>b&lt;a</code> (meaning <em>b is less than a</em>) and
<code>b&lt;a</code> (meaning <em>b followed by the start of an
anchor</em>)?</p>

<p>In order to display reserved characters HTML and XHTML provide a
mechanism called <a
href="http://www.w3.org/TR/html4/charset.html#h-5.3">character
references</a>. The syntax of these is:</p>

<ol>
  <li>an ampersand</li>
  <li>a "code" for the referenced character</li>
  <li>a semicolon</li>
</ol>

<p>For example, the "less than" character is represented as
<code>&amp;lt;</code>.</p>

<p>Giving the ampersand special meaning makes it, like &lt;, a
reserved character, so it too must be written as a character
reference when it is used in a document: <code>&amp;amp;</code>.</p>
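
<p>As a minimal sketch of producing these references from a script,
PHP's <code>htmlspecialchars</code> function converts the reserved
characters for you. The sample string and variable names are made up
for illustration.</p>

<pre class="code"><code>&lt;?php
// Text that contains reserved characters.
$text = 'b&lt;a &amp; c';

// htmlspecialchars() replaces &amp;, &lt;, &gt; and quotes with character references.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $safe, "\n"; // prints: b&amp;lt;a &amp;amp; c</code></pre>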

<p>Now for a small confession - there are exceptions to these rules,
although they are not relevant when dealing with the issues caused by
PHP sessions.</p>

<p>HTML and XHTML include blocks of what is called CDATA, where HTML
special characters no longer have special meaning. Inside such blocks
character references are no longer processed, so an ampersand must be
typed as an ampersand, and not as its character reference. In HTML,
the content of <code>&lt;script&gt;</code> and
<code>&lt;style&gt;</code> elements is CDATA, while in XHTML <a
href="http://www.w3.org/TR/2002/REC-xhtml1-20020801/#h-4.8">they are
marked explicitly</a>. You can avoid the problem by placing scripts
and style sheets in separate files and using <code>&lt;link&gt;</code>
and <code>&lt;script src="&#8230;"&gt;</code>.</p>

<p>The other exceptions are that sometimes the semicolon is optional,
and sometimes ampersands can be written without being encoded as
character references. In these situations it is never wrong to
represent the character as a character reference terminated by a
semicolon, so I won't go into more detail.</p>

<h2 id="problem">Problem</h2>

<p>PHP has session handling code built in. This enables data to be
stored on the server while being associated with a specific user (for,
roughly, a single visit to the site).</p>

<p>To link the data with a user, the website has to hand the user
agent a token which identifies it. This token is normally stored in a <a
href="http://www.cookiecentral.com/faq/#1.1">cookie</a>, but not all
user agents support cookies, and most of those which do allow them to
be turned off.</p>

<p>PHP provides a fallback mechanism. If it discovers that cookies are
not accepted by the client, it rewrites every link on the page to
include that token in a query string. I believe this used to be
enabled by default, but testing shows that, at least for the Fedora
package of PHP 4.3.11 (Fedora release 2.4 of that package), it
isn't. It can be turned on by setting the <a
href="http://www.php.net/manual/en/ref.session.php#ini.session.use-trans-sid"><em>session.use_trans_sid</em></a>
directive.</p>

<p>This is, in theory, a pretty elegant solution to the problem
(discounting the issues of the token hanging around for third parties
to hoover up from public computers, bookmarks, shared links and so
on), but the implementation is flawed.</p>

<p>For links with no query string, there isn't a problem. PHP appends
<code>?PHPSESSID=</code> followed by a random hexadecimal number. For
links that do have a query string PHP appends
<code>&amp;PHPSESSID=</code>.</p>

<p>Ampersand characters used as argument separators pose no problem in
plain old URLs. In URLs embedded in HTML, however, they still mean
<em>start of character reference</em> (subject to the aforementioned
exceptions, for which the above example does not qualify).</p>
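
<p>The rewriting described above can be imitated in a few lines. This
is only a sketch: <code>appendSessionId</code> and the token value are
hypothetical and are not PHP's actual implementation. It shows why the
result is fine as a bare URL but needs escaping before it is written
into an attribute.</p>

<pre class="code"><code>&lt;?php
// Hypothetical helper that imitates the rewriting described above:
// "?" starts a query string, "&amp;" extends an existing one.
function appendSessionId(string $url, string $name, string $id): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&amp;';
    return $url . $separator . $name . '=' . $id;
}

$id = 'abc123'; // made-up token for illustration
$plain = appendSessionId('page.php', 'PHPSESSID', $id);
$withQuery = appendSessionId('page.php?item=7', 'PHPSESSID', $id);

// Fine as a bare URL, but the raw ampersand is an error in an HTML attribute.
echo $withQuery, "\n";
// prints: page.php?item=7&amp;PHPSESSID=abc123

// Escaping on output yields markup that validates.
echo '&lt;a href="', htmlspecialchars($withQuery), '"&gt;next&lt;/a&gt;', "\n";
// prints: &lt;a href="page.php?item=7&amp;amp;PHPSESSID=abc123"&gt;next&lt;/a&gt;</code></pre>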

<p>Most users won't notice a problem, as the majority of user agents
are rather good at working around mistakes by authors. However, that
does not mean authors should ignore the problem.</p>

<ul>

<li>You cannot know that every user agent to visit the page will be
able to cope with the error.</li>

<li>If a <a href="http://validator.w3.org/">markup validator</a> flags
an error on every link, it is going to be rather more difficult to find
errors that could cause you serious problems.</li>

<li>If you ever plan on writing XHTML and <a
href="http://www.w3.org/TR/xhtml-media-types/" title="XHTML media
types">serving your markup as such</a> then rogue ampersands will
cause the XML parser to give up attempting to handle the code (this is
a requirement of the XML specification).</li>

</ul>

<h2 id="solutions">Solutions</h2>

<h3 id="reference">Outputting a character reference</h3>

<p>The character that PHP uses to separate arguments is configurable
with the <em>arg_separator.output</em> directive. This can be set in a
number of ways and is the solution suggested in the PHP manual.</p>

<h4>Editing php.ini</h4>

<p>The php.ini file contains the central configuration data for an
install of PHP on a computer. You can specify a character reference to
use there.</p>

<pre class="code"><code>arg_separator.output = "&amp;amp;"</code></pre>

<h4>Apache directives</h4>

<p>When PHP runs under the <a href="http://httpd.apache.org/">Apache</a>
web server, PHP directives can be set in all the usual places in the
Apache configuration. This allows different directives to be set on a
per site or per directory basis (in, for example, a
&lt;Location&gt; block or .htaccess file).</p>

<pre class="code"><code>php_value arg_separator.output &amp;amp;</code></pre>

<h4>Per script basis</h4>

<p>PHP configuration directives can be set on a per script basis with
<a href="http://php.net/ini_set">the ini_set function</a>. Put the
code to set the directives at the top of your script.</p>

<pre class="code"><code>&lt;?php ini_set('arg_separator.output','&amp;amp;'); ?&gt;</code></pre>
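
<p>The same directive also affects URLs you build yourself with
<code>http_build_query</code>, which falls back to
<em>arg_separator.output</em> when no separator argument is passed.
The following is a minimal sketch; the parameter names and
<code>list.php</code> are made up for illustration.</p>

<pre class="code"><code>&lt;?php
// Emit "&amp;amp;" between arguments in URLs that PHP generates.
ini_set('arg_separator.output', '&amp;amp;');

// No separator argument is passed, so arg_separator.output is used.
$query = http_build_query(['item' =&gt; 7, 'page' =&gt; 2]);

echo '&lt;a href="list.php?', $query, '"&gt;next&lt;/a&gt;', "\n";
// prints: &lt;a href="list.php?item=7&amp;amp;page=2"&gt;next&lt;/a&gt;</code></pre>

<p>A string built this way is already escaped, so passing it through
<code>htmlspecialchars</code> afterwards would escape the ampersand a
second time.</p>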

<h3 id="separator">Using a different argument separator</h3>

<p>Since the ampersand character has special meaning in HTML, the
specification suggests that query string parsers allow the <a
href="http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2">use of a
semicolon as an argument separator</a>. PHP can accept this on input:
its <em>arg_separator.input</em> directive lists the characters
treated as separators, so check that it includes a semicolon. You can
then alter the output code to use a semicolon instead of an ampersand
using the same techniques.</p>

<h4>Editing php.ini</h4>

<pre class="code"><code>arg_separator.output = ";"</code></pre>

<h4>Apache directives</h4>

<pre class="code"><code>php_value arg_separator.output ;</code></pre>

<h4>Per script basis</h4>

<pre class="code"><code>&lt;?php ini_set('arg_separator.output',';'); ?&gt;</code></pre>
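
<p>As a rough sketch of both directions, <code>http_build_query</code>
accepts the separator as its third argument, and a small hand-written
parser can accept either character. <code>parseQuery</code> is a
hypothetical helper, not part of PHP; it is used here because whether
PHP's own parser accepts a semicolon depends on its input
configuration.</p>

<pre class="code"><code>&lt;?php
// Build a query string with ";" as the separator (third argument).
$query = http_build_query(['item' =&gt; 7, 'q' =&gt; 'a b'], '', ';');
// $query is now "item=7;q=a+b"

// Hypothetical parser that accepts ";" or "&amp;" between arguments.
function parseQuery(string $query): array
{
    $result = [];
    foreach (preg_split('/[;&amp;]/', $query, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // Split each pair on the first "=" only; a missing value becomes "".
        $parts = explode('=', $pair, 2);
        $result[urldecode($parts[0])] = urldecode($parts[1] ?? '');
    }
    return $result;
}

$parsed = parseQuery($query);
// $parsed is ['item' =&gt; '7', 'q' =&gt; 'a b']</code></pre>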

<h3 id="disable">Disable sessions for non-cookie users</h3>

<p>This option has a number of advantages from a security point of
view as it reduces the chance of the session token leaking to third
parties. As a side effect it will render your session code useless for
visitors who disable, block or otherwise do not support cookies (this
has accessibility implications).</p>

<h4>Editing php.ini</h4>

<pre class="code"><code>session.use_trans_sid = 0</code></pre>

<h4>Apache directives</h4>

<pre class="code"><code>php_value session.use_trans_sid 0</code></pre>

<h4>Per script basis</h4>

<p>Whether this directive can be set on a per script basis depends on
which version of PHP you are using. Where it can be set, the syntax is
as follows:</p>

<pre class="code"><code>&lt;?php ini_set('session.use_trans_sid','0'); ?&gt;</code></pre>
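
<p>Because the call may not take effect on every version, a script can
check the result rather than assume it. This is a sketch only: it
relies on <code>ini_set</code> returning <code>false</code> on failure,
and the variable names and log message are made up for
illustration.</p>

<pre class="code"><code>&lt;?php
// Run before session_start(). ini_set() returns false when the
// directive cannot be changed at run time.
$previous = ini_set('session.use_trans_sid', '0');
if ($previous === false) {
    error_log('session.use_trans_sid must be set in php.ini or the server configuration');
}

// Confirm the effective value, whichever way it was set.
$transSidOff = !filter_var(ini_get('session.use_trans_sid'), FILTER_VALIDATE_BOOLEAN);</code></pre>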




<!-- Your content is finishing before this -->
</div>
<!-- Footer -->

<hr />

<div class="disclaimer">
<a href="http://validator.w3.org/check/referer"><img
src="http://validator.w3.org/images/vxhtml10" alt="Valid XHTML 1.0!"
height="31" width="88" /></a> 

<address class="author">
Created Date: 2005-04-15 <br />
Last modified $Date: 2011/12/16 02:59:19 $ by $Author: gerald $</address>
<p class="policyfooter"><a rel="Copyright"
href="/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &#xa9; 2000-2003
<a href="/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>&#xae;</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a
href="/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a
rel="Copyright" href="/Consortium/Legal/copyright-documents">document use</a>
and <a rel="Copyright" href="/Consortium/Legal/copyright-software">software
licensing</a> rules apply. Your interactions with this site are in accordance
with our <a href="/Consortium/Legal/privacy-statement#Public">public</a> and
<a href="/Consortium/Legal/privacy-statement#Members">Member</a> privacy
statements.</p>

</div>
</body>
</html>