<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Authoring HTML: Handling Right-to-left Scripts</title><style type="text/css" xml:space="preserve">
.warning { background-color: #FF9; border: 1px solid red; padding: 5px; }
#unicodecontrols {
margin-left: 5%;
margin-right: 5%;
border: 1px solid teal;
}
</style><link rel="stylesheet" href="local.css" type="text/css" /><link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE"/></head>
<body>
<div style="text-align:center;"><p>[ <a href="#contents" shape="rect">contents</a> ]</p></div>
<div class="head">
<a href="http://www.w3.org/" shape="rect"><img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home" /></a>
<h1><a name="title" id="title" shape="rect">Authoring HTML: Handling Right-to-left Scripts</a></h1>
<h2>W3C Working Group Note 08 September 2009</h2><dl><dt>This version:</dt>
<dd>
<a href="http://www.w3.org/TR/2009/NOTE-i18n-html-tech-bidi-20090908/" shape="rect">http://www.w3.org/TR/2009/NOTE-i18n-html-tech-bidi-20090908/</a></dd><dt>Latest version:</dt><dd>
<a href="http://www.w3.org/TR/i18n-html-tech-bidi/" shape="rect">http://www.w3.org/TR/i18n-html-tech-bidi/</a>
</dd><dt>Previous version:</dt><dd><a href="http://www.w3.org/TR/2009/WD-i18n-html-tech-bidi-20090714/" shape="rect">http://www.w3.org/TR/2009/WD-i18n-html-tech-bidi-20090714/</a></dd><dt>Editor:</dt><dd>Richard Ishida, W3C</dd></dl>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright" shape="rect">Copyright</a> © 2007-2009 <a href="http://www.w3.org/" shape="rect"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/" shape="rect"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/" shape="rect"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/" shape="rect">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer" shape="rect">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks" shape="rect">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents" shape="rect">document use</a> rules apply.</p></div><hr /><div>
<h2><a name="abstract" id="abstract" shape="rect"> Abstract</a></h2>
<p>This document provides advice for the use of HTML markup and CSS style sheets to create pages for languages that use
right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana and Urdu. It explains how to create content in right-to-left scripts that builds on, but goes beyond, the Unicode bidirectional algorithm, as well as how to prepare content for localization into
right-to-left scripts.</p></div><div>
<h2><a name="status" id="status" shape="rect"> Status of this Document</a></h2><p><em>This section describes the status of this document at the time of its publication. Other documents may
supersede this document. A list of current W3C publications and the latest revision of this technical report can be
found in the
<a href="http://www.w3.org/TR/" shape="rect">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>
<p>This document provides advice on practical techniques related to the creation of content in languages that use right-to-left scripts, such as Arabic and Hebrew, or content in other languages that includes fragments of text in these scripts.</p>
<p>This is a W3C Working Group Note produced by the
<a href="http://www.w3.org/International/core/" shape="rect">Internationalization Core Working Group</a>, part of the
<a href="http://www.w3.org/International/Activity" shape="rect">W3C Internationalization Activity</a>.</p>
<p>Please send comments on this document to <a href="mailto:www-international@w3.org" shape="rect">www-international@w3.org</a> (<a href="http://lists.w3.org/Archives/Public/www-international/" shape="rect">publicly archived</a>). </p>
<p>Publication as a Working Group Note does not imply endorsement by the W3C Membership. This document may be updated, replaced or obsoleted by other documents at any time. Therefore, quotes or references to specific information in the document should include the publication date of this version, 08 September 2009. It is inappropriate to cite this document as other than a Working Group Note, which is not an endorsed W3C Recommendation.</p>
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/32113/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p></div><div class="toc">
<h2><a name="contents" id="contents" shape="rect"> </a>Table of Contents</h2>
<div class="toc1">1 <a href="#ri20030912.142608197">Introduction</a><br /><div class="toc2">1.1 <a href="#audience">Who should use this document</a><br /></div><div class="toc2">1.2 <a href="#howtouse">How to use this document</a><br /></div><div class="toc2">1.3 <a href="#technologies">Technologies addressed</a><br /></div><div class="toc2">1.4 <a href="#terminology">Terminology</a><br /></div></div><div class="toc1">2 <a href="#d2e254">Important concepts</a><br /><div class="toc2">2.1 <a href="#ri20040808.101452727">Bidirectional (or bidi) text</a><br /></div><div class="toc2">2.2 <a href="#ri20050208.093646470">Relationships between language and directionality</a><br /></div></div><div class="toc1">3 <a href="#bidisource">Problems with bidirectional source text</a><br /><div class="toc2">3.1 <a href="#ri20060623.095434796">Working with markup</a><br /></div><div class="toc2">3.2 <a href="#ri20060623.095429759">Adding escapes to the content</a><br /></div><div class="toc2">3.3 <a href="#ri20060625.110946746">Example source text in this document</a><br /></div></div><div class="toc1">4 <a href="#ri20030728.093644822">Authoring with localization in mind</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#ri20030728.090413145">Whenever possible, avoid HTML attributes with values of <code class="kw">right</code> and <code class="kw">left</code>. 
Use CSS in a linked style sheet instead.</a></li><li class="toc-technique"><a href="#tech-textalign">Only use <code class="kw">text-align</code> where you specifically want to override the current default alignment.</a></li></ul></div><div class="toc1">5 <a href="#ri20030218.135303232">Setting up a right-to-left page</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#ri20030112.213806280">Add <code>dir="rtl"</code> to the <code class="kw">html</code> tag any time the overall page direction is right-to-left.</a></li><li class="toc-technique"><a href="#tech-scrollbar">If you need to avoid the scroll bar moving on some browsers, put <code class="kw">dir</code> on the <code class="kw">head</code> element and a <code class="kw">div</code> just inside the <code class="kw">body</code> element.</a></li><li class="toc-technique"><a href="#ri20030112.21380914">Use logical order, not visual ordering for Hebrew, and choose an appropriate encoding.</a></li><li class="toc-technique"><a href="#tech-iso-encoding">If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.</a></li><li class="toc-technique"><a href="#ri20030728.092130948">Do not use CSS styling to control directionality in HTML. Use markup.</a></li></ul></div><div class="toc1">6 <a href="#ri20030728.09474480">Changing direction on block elements</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#ri20030726.114013437">Add the <code class="kw">dir</code> attribute to a block element to change
base direction. Don't use CSS or Unicode control characters.</a></li><li class="toc-technique"><a href="#ri20030726.132037950">Only use bidi markup to set the base direction for the document as a whole, or where you need to <em>change</em> the base direction.</a></li></ul></div><div class="toc1">7 <a href="#ri20030218.135304584">Mixing text direction inline</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#ri20030112.214820604">When you have bidirectional text nested in inline text of a different direction, and markup can be used, use the <code class="kw">dir</code> attribute to make the text display correctly. Otherwise, use RLE/LRE and PDF control characters to create an embedded base direction.</a></li><li class="toc-technique"><a href="#ri20030726.140315918">When weak or neutral characters or objects appear at the wrong side of a directional run, fix it using <code class="kw">dir</code> if there is markup already in place, or use an RLM/LRM.</a></li><li class="toc-technique"><a href="#ri20030728.072236229">When adjacent but separate directional runs with the same directionality are rendered in the wrong order, use RLM/LRM.</a></li><li class="toc-technique"><a href="#ri20030728.092841697">Use stateful Unicode control characters for
bidirectional control only for attribute text or element text that allows no internal markup.</a></li><li class="toc-technique"><a href="#tech-tooltips-etc">Consider using Unicode control characters to set the base direction around
bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.</a></li><li class="toc-technique"><a href="#ri20030728.09323441">Do not leave white space at the end of inline elements that mark a directional
boundary.</a></li></ul></div><div class="toc1">8 <a href="#ri20030510.102858118">Handling parentheses &amp; other mirrored characters</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#ri20030728.093416889">Treat mirrored characters as if any word <code class="kw">left</code> in the name meant '<span class="qterm">opening</span>', and
<code class="kw">right</code> meant '<span class="qterm">closing</span>'.</a></li></ul></div><div class="toc1">9 <a href="#ri20030218.135307338">Overriding the Unicode bidirectional algorithm</a><br /><ul class="tocShortTitle"><li class="toc-technique"><a href="#tech-bdo">Use the <code class="kw">bdo</code> element to force the directionality of a sequence of inline
characters.</a></li></ul></div><div class="toc1">A <a href="#d2e1978">Acknowledgments</a><br /></div>
</div>
<hr />
<div class="body">
<div class="div1">
<h2><a name="ri20030912.142608197" id="ri20030912.142608197" shape="rect">1 Introduction</a></h2>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="audience" id="audience" shape="rect">1.1 Who should use this document</a>?</h3>
<p>All content authors using HTML and CSS who work with text in a language that uses a right-to-left script, or whose content will be localized into such a language. The term '<span class="qterm">author</span>' is used in the sense of a person who creates content, either directly or via a script or program that generates HTML documents.</p><p>This document provides guidance for developers of HTML that enables support for international deployment.
Enabling international deployment is the responsibility of all content authors, not just localization groups or
vendors, and is relevant from the very start of development. Ignoring the advice in this document, or relegating it to
a later phase in the development process, will only add unnecessary costs and resource issues at a later date.</p><p>It is assumed that readers of this document are proficient in developing HTML and XHTML pages; this
document limits itself to providing advice specifically related to internationalization.</p></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="howtouse" id="howtouse" shape="rect">1.2 How to use this document</a></h3>
<div class="note"><span class="note-head">Note: </span>This document assumes prior familiarity with the concepts introduced in the tutorial <a href="/International/tutorials/bidi-xhtml/" shape="rect">Creating HTML Pages in Arabic &amp; Hebrew</a>. That tutorial provides an overview of how to create pages in right-to-left scripts.</div>
<p>This document lists a number of <span class="new-term">techniques</span> related to handling pages in right-to-left scripts, with explanations. Each technique is summarized in text on a light blue background, like this:</p>
<p style="background-color: #def; color: #005a9c;">The key recommendation for each technique is summarized tersely on a light blue background.</p>
<p>The text that follows the summary gives concise advice on how to implement the technique, and additional explanations and discussion follow that where appropriate. In some cases, the applicability of the recommendation may vary, depending on your aims and context. Where there are pros and cons for a given recommendation, we try to clearly indicate those.</p>
<p>The document is primarily designed for use as a reference source, where readers look up techniques one by one. For this reason, you may find a significant amount of duplication if you read the whole document in one go.</p>
<div class="div3">
<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="intro-examples" id="intro-examples" shape="rect">1.2.1 Examples</a></h4>
<p>We show examples of text in native scripts using images, to ensure that the example looks correct regardless of the fonts and rendering capabilities of the reader's platform. Clicking on the <img src="images/codelink.gif" alt="View code." /> graphic will typically display the actual text version in a separate window. You can then examine the source text too. </p>
<p>To make it easier to understand the flow of characters in an example if you do not read Arabic or Hebrew text, we also provide an ASCII-only version of many examples. This version uses uppercase translations to represent the Arabic or Hebrew characters, while all Latin text is lowercased. The order of characters represents the way you would see the native text arranged on screen (so you usually read the translations from right to left).</p>
<p>See also the note in the section <a class="section-ref" href="#ri20060625.110946746" shape="rect">Example source text in this document</a> about the difficulties of representing code examples, and the approach taken in the light of numerous possibilities.</p>
</div>
<div class="div3">
<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="d2e143" id="d2e143" shape="rect">1.2.2 Browser-specific notes</a></h4>
<p>Where support for a technique varies between browsers, or is missing altogether, details can be found by following the 'Get more information' link at the bottom of each technique. This information is kept in a separate document so that it can be updated as time passes. You should check the notes document regularly for changes.</p>
<p>Any browser-specific notes generally relate to the latest versions of a selection of browsers that were widely deployed at the time this document was published. However, they may also include information about later versions, as they are released, and may also eventually include information about other browsers.</p>
<p>In the absence of browser-specific notes, you can assume that the technique works interoperably on the following browser versions (which are those tested prior to the release of the document):</p>
<ul><li><img src="images/ie.png" alt="Internet Explorer icon" /> Internet Explorer v6-8 </li><li><img src="images/firefox.gif" alt="Firefox icon" /> Firefox v3.5.2</li><li>
<p><img src="images/opera.gif" alt="Opera icon" /> Opera v9.64 </p>
</li><li>
<p><img src="images/chrome.png" alt="Chrome icon" /> Google Chrome v2.0.172.33 </p>
</li><li>
<p><img src="images/safari.gif" alt="Safari icon" /> Safari v4.0 </p>
</li></ul>
<p>Three versions of Internet Explorer are listed, since they still account for a large proportion of the user base. </p>
</div>
<div class="div3">
<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="howto-furtherinfo" id="howto-furtherinfo" shape="rect">1.2.3 Further information</a></h4>
<p>When you follow the 'Get more information' link at the bottom of each technique you will also find links to one or more locations in a techniques index. These links lead to further how-to information, useful links, tests and results pages, etc. </p>
<p>The information in the techniques index is updated as new resources are discovered or developed.</p>
</div>
</div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="technologies" id="technologies" shape="rect">1.3 Technologies addressed</a></h3>
<p>This document provides techniques for developing pages using HTML 4.01 and XHTML 1.0 served as <code class="kw">text/html</code> with CSS.</p>
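<p>As a minimal sketch of the kind of page these techniques assume, the skeleton below shows an XHTML 1.0 document that carries a DOCTYPE and is intended to be served as <code class="kw">text/html</code>. The choice of the Strict DOCTYPE, the title, the paragraph text and the style sheet name <code class="kw">local.css</code> are placeholders chosen for illustration, not requirements of this document.</p>
<div class="exampleOuter"><div class="exampleHeader">Illustrative sketch: a page skeleton with a DOCTYPE</div><pre>
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"&gt;
&lt;head&gt;
  &lt;!-- Declares the character encoding when the page is served as text/html --&gt;
  &lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /&gt;
  &lt;title&gt;Placeholder title&lt;/title&gt;
  &lt;!-- Presentation lives in a linked style sheet --&gt;
  &lt;link rel="stylesheet" type="text/css" href="local.css" /&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;p&gt;Placeholder content.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre></div>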
<div class="note"><span class="note-head">Note:</span> XHTML 1.0 can be served as XML (using MIME types application/xhtml+xml, application/xml or text/xml) or HTML (using the MIME type text/html). It is very common for XHTML 1.0 to be served as HTML, ideally following the compatibility guidelines in Appendix C of the XHTML 1.0 specification. This allows authors to produce valid XML code, which has benefits for processing with scripts or XSLT, but is also well supported for display by all mainstream browsers (unlike XHTML served as application/xhtml+xml, which is not well supported by some browsers at the moment). This document does not concern itself with XHTML served as XML.</div><p>Where a browser operates in both
<a href="http://www.w3.org/International/articles/serving-xhtml/#quirks" shape="rect">standards- and quirks-mode</a>,
standards-mode is assumed (ie. you should use a DOCTYPE statement).</p></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="terminology" id="terminology" shape="rect">1.4 Terminology</a></h3>
<dl><dt class="term"><a name="term-basedirection" id="term-basedirection" shape="rect">base direction</a></dt><dd class="definition">The overall directional context for a document or block of text. In HTML, the default is left-to-right, but this can be changed using the <code class="kw">dir</code> attribute.</dd><dt class="term"><a name="term-bidialgorithm" id="term-bidialgorithm" shape="rect">bidirectional (bidi) algorithm</a></dt><dd class="definition">An algorithm, described in the Unicode Standard, for producing the correct visual ordering of right-to-left and bidirectional text. </dd><dt class="term"><a name="term-blockelement" id="term-blockelement" shape="rect">block element</a></dt><dd class="definition">Block elements are elements such as p, div, ol, ul, blockquote, body, etc. The opposite of a block element is an inline element, such as span, em, strong, a, etc.</dd><dt class="term"><a name="term-directionalrun" id="term-directionalrun" shape="rect">directional run</a></dt><dd class="definition">A sequence of adjacent characters in bidirectional text that all have the same directionality. In bidirectional text there will always be a minimum of two directional runs, one RTL and the other LTR.</dd><dt class="term"><a name="term-inlineelement" id="term-inlineelement" shape="rect">inline element</a></dt><dd class="definition">Inline elements are elements such as span, em, strong, a, etc. The opposite of an inline element is a block element, such as p, div, ol, ul, blockquote, body, etc.</dd><dt class="term"><a name="term-inlinetext" id="term-inlinetext" shape="rect">inline text</a></dt><dd class="definition">Text that lies wholly within a single block element, ie. text within a paragraph. Inline text may include inline markup.</dd><dt class="term"><a name="term-logicalorder" id="term-logicalorder" shape="rect">logical order</a></dt><dd class="definition">Characters arranged in memory in, for the most part, the order in which they are pronounced. 
Compare to visual order.</dd><dt class="term"><a name="term-lre" id="term-lre" shape="rect">LRE</a></dt><dd class="definition">A short name for the Unicode character U+202A <span class="uname">LEFT-TO-RIGHT EMBEDDING</span>. This invisible control character is used to begin a range of text with an embedded base direction of left-to-right.</dd><dt class="term"><a name="term-lro" id="term-lro" shape="rect">LRO</a></dt><dd class="definition">A short name for the Unicode character U+202D <span class="uname">LEFT-TO-RIGHT OVERRIDE</span>. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from left to right.</dd><dt class="term"><a name="term-pdf" id="term-pdf" shape="rect">PDF</a></dt><dd class="definition">A short name for the Unicode character U+202C <span class="uname">POP DIRECTIONAL FORMATTING</span>. This invisible control character is used to signal the end of a range of text that was started with one of the RLE, LRE, RLO or LRO characters.</dd><dt class="term"><a name="term-rle" id="term-rle" shape="rect">RLE</a></dt><dd class="definition">A short name for the Unicode character U+202B <span class="uname">RIGHT-TO-LEFT EMBEDDING</span>. This invisible control character is used to begin a range of text with an embedded base direction of right-to-left.</dd><dt class="term"><a name="term-rlo" id="term-rlo" shape="rect">RLO</a></dt><dd class="definition">A short name for the Unicode character U+202E <span class="uname">RIGHT-TO-LEFT OVERRIDE</span>. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from right to left.</dd><dt class="term"><a name="term-visualorder" id="term-visualorder" shape="rect">visual order</a></dt><dd class="definition">Characters arranged in memory in the order in which they are read on-screen. Compare to logical order.</dd></dl>
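<p>To make the pairing of these control characters concrete, the sketch below writes them as numeric character references in a <code class="kw">title</code> attribute, a place where no markup can be used. The bracketed text stands in for real content and is purely illustrative. Each range opened by LRE, RLE, LRO or RLO is closed by a PDF.</p>
<div class="exampleOuter"><div class="exampleHeader">Illustrative sketch: control characters written as numeric character references</div><pre>
&lt;!-- RLE (U+202B) opens an embedded right-to-left range; PDF (U+202C) closes it --&gt;
&lt;p title="&amp;#x202B;[right-to-left text]&amp;#x202C;"&gt;[paragraph content]&lt;/p&gt;

&lt;!-- LRE (U+202A) opens an embedded left-to-right range; PDF (U+202C) closes it --&gt;
&lt;p title="&amp;#x202A;[left-to-right text]&amp;#x202C;"&gt;[paragraph content]&lt;/p&gt;

&lt;!-- LRO (U+202D) and RLO (U+202E) open override ranges, closed by PDF in the same way --&gt;
&lt;p title="&amp;#x202E;[text forced right to left]&amp;#x202C;"&gt;[paragraph content]&lt;/p&gt;
</pre></div>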
</div></div><div class="div1">
<h2><a name="d2e254" id="d2e254" shape="rect">2 Important concepts</a></h2><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="ri20040808.101452727" id="ri20040808.101452727" shape="rect">2.1 Bidirectional (or bidi) text</a></h3>
<p>'<span class="qterm">Bidirectional</span>', or '<span class="qterm">bidi</span>', text typically refers to text written using a mixture
of right-to-left and left-to-right scripts. For example, in Arabic and Hebrew text the content flows predominantly from right to left, but
embedded numbers or text in other scripts (such as Latin script) still runs left to right. Text in other languages,
such as English, can also be bidirectional if it includes excerpts from languages such as Arabic and Hebrew.</p><p>Scripts such as Arabic and Hebrew, which are predominantly right-to-left in orientation, may be referred
to as '<span class="qterm">RTL</span>' (right-to-left) scripts.</p>
<p>This document will use the Arabic and Hebrew languages for most of its examples. Many languages use the Arabic script, and several other scripts run predominantly right-to-left: these include Thaana, N'ko, and Syriac, as well as other scripts no longer in common use, such as Cypriot, Phoenician and Kharoshthi.</p></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="ri20050208.093646470" id="ri20050208.093646470" shape="rect">2.2 Relationships between language and directionality</a>
</h3>
<p>Direction is a property of scripts, not language.</p>
<p>Some people think that information about directionality can be inferred from information about the language of the text, but this is
not true. There must be a one-to-one mapping between directionality and language for this to work, and there isn't. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code <code class="kw">az</code> is relevant for either. </p>
<p>In addition, when using directional markup inline, the markup and the values of that markup do not necessarily coincide with language declarations. </p><p>Also, markup used to indicate directionality has values that indicate that the normal directionality should be overridden; that cannot be expressed using language-related values.</p><p>In the same way, attributes indicating text direction in HTML and XHTML do not, and should not, provide information about the language of
text.</p><p>Separate mechanisms already exist for declaring language and directionality in HTML and XHTML, and
these ideas should not be confused.</p>
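<p>The sketch below illustrates how the two declarations are kept apart. It assumes two paragraphs of Azerbaijani, one in a right-to-left script and one in a left-to-right script. The bracketed text is a placeholder rather than real content. The language attributes are identical in both cases, and only the <code class="kw">dir</code> attribute conveys direction.</p>
<div class="exampleOuter"><div class="exampleHeader">Illustrative sketch: same language declaration, different base direction</div><pre>
&lt;!-- Language is declared with xml:lang and lang; direction is declared with dir --&gt;
&lt;p xml:lang="az" lang="az" dir="rtl"&gt;[Azerbaijani text in a right-to-left script]&lt;/p&gt;
&lt;p xml:lang="az" lang="az" dir="ltr"&gt;[Azerbaijani text in a left-to-right script]&lt;/p&gt;
</pre></div>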
<p>Other
W3C <a href="/International/technique-index" shape="rect">techniques information</a> describes how to declare character encoding and language.</p></div></div><div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="bidisource" id="bidisource" shape="rect">3 Problems with bidirectional source text</a></h2><p>There is currently a lack of good editing environments for creating HTML pages using right-to-left scripts. Because HTML markup and escapes contain punctuation and strongly typed letters, you are always working with bidirectional source text. However, if the editor is not aware that the markup is not ordinary text (which is usually the case), it can produce some odd effects and make coding difficult. </p><p>This section simply mentions some of those problems, so that you are forewarned. It doesn't propose a full solution, but it does offer some advice which may help with problematic editing environments.</p><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="ri20060623.095434796" id="ri20060623.095434796" shape="rect">3.1 Working with markup</a>
</h3><p>Unless your editor recognizes markup in source text as not being normal text, the strongly typed letters and punctuation in the markup will appear in places you wouldn't expect, and sometimes interfere with the order of the content itself.</p><p>If you are creating a large amount of right-to-left text, it makes sense to set the base direction of the editing window in your editor to right-to-left. This helps ensure that the content is correctly ordered. Unfortunately, this tends to increase the likelihood that your markup looks strange in the source text.</p><p><a href="#ri20060625.100504543">Example 1</a> shows some simple markup in a left-to-right context. </p><div class="exampleOuter"><div class="exampleHeader"><a name="ri20060625.100504543" id="ri20060625.100504543">Example 1: Markup being rearranged in LTR source code</a></div><p>&lt;p class="myclass" title="العربي"&gt;مشس هخصث خهس تخت تخهثز.&lt;/p&gt;</p></div><p>The source contains a <code class="kw">p</code> tag followed by a <code class="kw">class</code> attribute, followed by a <code class="kw">title</code> attribute with some Arabic text as its value. The content of the paragraph itself starts with Arabic text. The resulting order in a left-to-right environment (where Arabic text is indicated by text in square brackets) is </p><p><code>&lt;p class="myclass" title="[paragraph_content]&lt;"[title_value].&lt;/p&gt;</code>.</p><p>As <a href="#ri20060625.102000852">Example 2</a> shows, things are hardly better if the overall context for the source code is right-to-left. In this case, the resulting order for the same source text is </p>
<p><code>&lt;p/&gt;[paragraph_content]&lt;"[title_value]"=p class="myclass" title&gt;</code>.</p><div class="exampleOuter"><div class="exampleHeader"><a name="ri20060625.102000852" id="ri20060625.102000852">Example 2: Markup being rearranged in RTL source code</a></div><p dir="rtl">&lt;p class="myclass" title="العربي"&gt;مشس هخصث خهس تخت تخهثز.&lt;/p&gt;
</p></div><p>Note, however, that this source will display correctly in a user agent. This is just a problem for reading and maintaining the source text.</p><p>The <code class="kw">title</code> attribute with Arabic text makes the situation much worse than normal in the above examples. The problem arises because there is only 'punctuation' between two runs of strongly-typed right-to-left text, so the Unicode bidirectional algorithm considers this to be a single run of text. It helps a little, if you can do it, to ensure that an attribute with a left-to-right value (i.e. here the <code class="kw">class</code> attribute) appears last. This would make the text in a left-to-right context look as expected, and in a right-to-left context it would prevent the interaction of markup with content (see <a href="#ri20060625.103644305">Example 3</a>).</p><div class="exampleOuter"><div class="exampleHeader"><a name="ri20060625.103644305" id="ri20060625.103644305">Example 3: Markup being rearranged in RTL source code</a></div><p dir="rtl">&lt;p title="العربي" class="myclass"&gt;مشس هخصث خهس تخت تخهثز.&lt;/p&gt;</p></div><p>If you are dealing with content that is predominantly in a right-to-left script, then, you need to look for a source editor that recognizes markup as a special construct, and produces a sensible order.</p>
<p>It can also help to start the content on a new line (see <a href="#ri20060625.105702196">Example 4</a>), although this doesn't always help with inline markup. Also, you should try to avoid including white space before the closing markup, as this can lead to other problems (see <a href="#ri20030728.09323441">7.6 Watch out for white space</a>).</p><div class="exampleOuter"><div class="exampleHeader"><a name="ri20060625.105702196" id="ri20060625.105702196">Example 4: Starting content after a new line can separate attributes and content</a></div><p dir="rtl">&lt;p class="myclass" title="العربي"&gt;</p><p dir="rtl">مشس هخصث خهس تخت تخهثز.&lt;/p&gt;</p></div><p>In addition, if your markup includes a <code class="kw">dir</code> attribute to change the directional context of the content, your editor should recognize this and produce a corresponding change in the order of the source code.</p></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="ri20060623.095429759" id="ri20060623.095429759" shape="rect">3.2 Adding escapes to the content</a></h3><div class="note"><span class="note-head">Note: </span>See <a href="#ri20030726.140315918">7.2 Weak/neutral characters at the edge of a directional run</a> and <a href="#ri20030728.072236229">7.3 Adjacent, same-direction directional runs</a> for details about how escapes can be used to correctly order bidirectional inline text.</div>
<p>Unicode control characters such as the <span class="uname">RIGHT-TO-LEFT MARK</span> (RLM) or <span class="uname">ZERO WIDTH NON-JOINER</span> (ZWNJ) are invisible, so if you insert the characters themselves you will not usually be able to see them in the source text. For this reason you may think that a useful way to represent these characters is with the pre-defined HTML character entities,
<code class="kw">&amp;rlm;</code> and <code class="kw">&amp;zwnj;</code>, or their numeric equivalents, <code>&amp;#x200F;</code> and <code>&amp;#x200C;</code>. </p><p>Unfortunately, such an approach typically has its problems, too. As described in the previous section related to markup in source text, the strongly-typed left-to-right characters and 'punctuation' characters in the escapes will normally cause the Unicode bidirectional algorithm to display very odd looking source text.</p>
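<p>As a rough sketch of what such escapes look like in use, the fragment below writes the same RLM in both forms. The sentence is invented for this illustration, and the Hebrew word is itself written with numeric character references so that the logical order of the source stays readable in any editor.</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: the RLM character written as a named entity and as a numeric character reference</div>
<pre xml:space="preserve">&lt;!-- Named entity: the RLM keeps the exclamation mark with the Hebrew word --&gt;
&lt;p&gt;The title is "&amp;#x5E9;&amp;#x5DC;&amp;#x5D5;&amp;#x5DD;!&amp;rlm;" in Hebrew.&lt;/p&gt;

&lt;!-- Numeric character reference: equivalent to the line above --&gt;
&lt;p&gt;The title is "&amp;#x5E9;&amp;#x5DC;&amp;#x5D5;&amp;#x5DD;!&amp;#x200F;" in Hebrew.&lt;/p&gt;</pre></div>
<p>Both forms produce the same result in a user agent; they differ only in how easy they are to spot and read in the source text.</p>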
<p>Very few editors currently recognize, for example, the sequence of characters <code>&amp;#x200F;</code> as a single unit representing a character with a strong right-to-left direction. They treat this as simply text containing punctuation, numbers and two strongly-typed left-to-right characters (x and F), and apply the Unicode bidirectional algorithm to that as they would to any normal text.</p>
<p><a href="#ri20060623.091251616">Example 5</a> shows a typical view of source text after adding an escape to bidirectional text in right-to-left ordered source text. The sequence <code>&amp;#x200F;</code> embedded in right-to-left text is displayed as <code>;x200F#&amp;</code>. At the beginning or end of embedded English text the escape is broken into fragments, and appears as <code>x200F;text in english#&amp;</code> or <code>;text in english&amp;#x200F</code>, respectively.</p><p>Note that the source will still display correctly in a user agent. This is just a problem for reading and maintaining the source text.</p><div class="exampleOuter">
<div class="exampleHeader"><a name="ri20060623.091251616" id="ri20060623.091251616">Example 5: Escape sequences being rearranged in RTL source code</a></div>
<p dir="rtl">مشس&amp;#x200F; هخصث خهس text in english تخت تخهثز.</p>
<p dir="rtl">مشس هخصث خهس &amp;#x200F;text in english تخت تخهثز.</p>
<p dir="rtl">مشس هخصث خهس text in english&amp;#x200F; تخت تخهثز.</p></div><p>Various approaches are possible, if you want to avoid using invisible characters:</p><ul><li><p>use an editor that recognizes an escape as a single unit representing an RLM/LRM character and produces the expected effect on the surrounding source text</p></li><li><p>use an editor that provides a symbolic visual representation of the RLM/LRM character, so that you don't lose sight of it</p></li><li><p>break the source code line around the escape (this works in some cases)</p></li><li><p>learn to live with the undesirable reordering effects for escapes.</p></li></ul></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="ri20060625.110946746" id="ri20060625.110946746" shape="rect">3.3 Example source text in this document</a></h3><p>Given the discussion above, representing examples of source text in this document can be quite difficult. Should we show source text in right-to-left order, or left-to-right? Should we assume that the editor recognizes and handles markup and escapes as separate entities from the content, and create source fragments that look like that - or should we show source as it really looks for many people who don't have such clever editors? And particularly, should we assume that the bidirectional algorithm is properly applied in the source editor, picking up cues from the markup, or not?</p><p>We will avoid source code examples unless they are very useful. We will try to describe how to apply the markup rather than show it.</p><p>We will typically represent examples in a left-to-right context, and use invisible markup to make content and markup look as you might expect it to be displayed by an intelligent editor, since this will provide maximum clarity about the point being made, even if it doesn't reflect how the markup will look for many people.</p></div></div>
<div class="div1">
<h2><a name="ri20030728.093644822" id="ri20030728.093644822" shape="rect">4 Authoring with localization in mind</a></h2>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.090413145" name="ri20030728.090413145" href="#ri20030728.090413145" shape="rect">4.1 <span class="bptitle">Avoid attributes with values of 'right' and 'left'</span></a></div>
<div class="rule">Whenever possible, avoid HTML attributes with values of <code class="kw">right</code> and <code class="kw">left</code>. Use CSS in a linked style sheet instead.</div>
<div class="description">
<div class="infotype">How to</div>
<p>Attributes in HTML 4.01 that have values of <code class="kw">right</code> and <code class="kw">left</code> are <code class="kw">align</code> and
<code class="kw">clear</code>. </p>
<p>The <code class="kw">align</code> attribute is used with the elements <code class="kw">hr</code>, <code class="kw">div</code>, <code class="kw">h1-h6</code>, <code class="kw">p</code>,
<code class="kw">table</code>, <code class="kw">caption</code>, <code class="kw">col</code>, <code class="kw">colgroup</code>, <code class="kw">tbody</code>, <code class="kw">td</code>, <code class="kw">tfoot</code>, <code class="kw">th</code>, <code class="kw">thead</code> and
<code class="kw">tr</code>. The use of this attribute is deprecated in HTML 4.01 for all but the table-related elements.</p>
<p><code class="kw">clear</code> is used with <code class="kw">br</code>, but is also deprecated in HTML 4.01.</p>
<p>For example, to right-align an image, such as the icons in this document that return you to the table of contents, you could use the CSS rule in <a href="#ri20060622.111500152">Example 6</a>:</p><div class="exampleOuter"><div class="exampleHeader"><a name="ri20060622.111500152" id="ri20060622.111500152">Example 6: Using CSS to replace the align attribute.</a></div>
<p><code>img.toclink { float: right; }</code></p></div>
<p>You can achieve the same effect as the <code class="kw">clear</code> attribute using the CSS shown in <a href="#egClear">Example 7</a>. This rule ensures that the <code>h2</code> element has no floated content to its left. </p><div class="exampleOuter"><div class="exampleHeader"><a name="egClear" id="egClear">Example 7: Using CSS to replace the clear attribute.</a></div>
<p><code>h2 { clear: left; }</code></p></div>
<p>These style rules would, of course, need to be changed in the style sheet for a version of the page that was localized into a right-to-left script, but it should be much easier to do that than to go through all the HTML content.</p>
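<p>As a minimal sketch of that change, the fragment below reuses the <code>h2</code> rule from Example 7 and adds a hypothetical <code>div.sidebar</code> rule. The file names <code>style-ltr.css</code> and <code>style-rtl.css</code> are assumptions made for this illustration, not names used by this document.</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: swapping the linked style sheet for a right-to-left version of a page</div>
<pre xml:space="preserve">&lt;!-- Original page: the head links to the left-to-right style sheet.
     style-ltr.css (hypothetical) contains:
         h2          { clear: left; }
         div.sidebar { float: left; }
--&gt;
&lt;link rel="stylesheet" type="text/css" href="style-ltr.css"&gt;

&lt;!-- Localized right-to-left page: only the linked style sheet differs.
     style-rtl.css (hypothetical) contains:
         h2          { clear: right; }
         div.sidebar { float: right; }
--&gt;
&lt;link rel="stylesheet" type="text/css" href="style-rtl.css"&gt;</pre></div>
<p>The HTML content itself stays free of <code class="kw">right</code> and <code class="kw">left</code> values, so it needs no directional edits during localization.</p>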
<p>(Note that this technique does not refer to the values <code class="kw">rtl</code> and <code class="kw">ltr</code> that are used with the
<code class="kw">dir</code> attribute.)</p>
<div class="infotype">Discussion</div>
<p> Values of <code class="kw">right</code> and <code class="kw">left</code> in attributes need to be reversed when translating the document
into a language using a right-to-left script. </p>
<p>Whether you are authoring an LTR document or an RTL document, using CSS style sheets to
achieve the same effect can save a lot of time and reduce risk, since one small change in a CSS file can save the trouble of editing code in many HTML documents. (One should expect the style sheet to need conversion anyway as part of the translation process.)</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.090413145" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="tech-textalign" name="tech-textalign" href="#tech-textalign" shape="rect">4.2 <span class="bptitle">Use text-align carefully</span></a></div>
<div class="rule">Only use <code class="kw">text-align</code> where you specifically want to override the current default alignment.</div>
<div class="description">
<div class="infotype">Discussion</div>
<p> Values of <code class="kw">right</code> and <code class="kw">left</code> need to be reversed when translating the document
into a language using a script with a different direction. To reduce the effort and complexity of adapting the styles it is better to only use <code class="kw">text-align</code> where it is actually needed to override the current default alignment. </p>
<p>Often people apply it by default when it is not actually required. This overrides the default alignment derived from the base direction, and leads to more work when localizing a document - particularly if the document contains blocks of text in more than one direction. By default the <code class="kw">dir</code> attribute setting should produce the correct alignment. </p>
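<p>The following sketch illustrates the point under simple assumptions: the page has no <code class="kw">text-align</code> rules at all, and the Hebrew phrase (written as numeric character references) is invented for the illustration.</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: default alignment following the base direction, with no text-align rules</div>
<pre xml:space="preserve">&lt;body&gt;
&lt;!-- Base direction is ltr by default: this paragraph starts at the left edge --&gt;
&lt;p&gt;An English paragraph.&lt;/p&gt;

&lt;!-- dir="rtl" alone makes this paragraph start at the right edge --&gt;
&lt;p dir="rtl" lang="he"&gt;&amp;#x5E9;&amp;#x5DC;&amp;#x5D5;&amp;#x5DD; &amp;#x5E2;&amp;#x5D5;&amp;#x5DC;&amp;#x5DD;&lt;/p&gt;
&lt;/body&gt;</pre></div>
<p>Adding <code>text-align: left</code> to the first paragraph or <code>text-align: right</code> to the second would change nothing visible, but would have to be found and reversed if the page were later localized into a script with the opposite direction.</p>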
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#tech-textalign" class="uanotesref">Get more information >></a></div>
</div>
</div>
</div>
<div class="div1">
<h2><a name="ri20030218.135303232" id="ri20030218.135303232" shape="rect">5 Setting up a right-to-left page</a></h2>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030112.213806280" name="ri20030112.213806280" href="#ri20030112.213806280" shape="rect">5.1 <span class="bptitle">Use dir on the html tag for RTL pages</span></a></div>
<div class="rule">Add <code>dir="rtl"</code> to the <code class="kw">html</code> tag any time the overall page direction is right-to-left.</div>
<!--div class="applicability"><span class="applic-title">See notes for: </span><img src="images/ie6.gif" alt="ie6" /> <img src="images/ie7.gif" alt="ie7" /></div-->
<div class="section1">
<div class="infotype">How to</div>
<p>Add <code>dir="rtl"</code> to the <code class="kw">html</code> tag any time the overall <span class="rule">page</span> direction is right-to-left.</p>
<div class="exampleOuter"><div class="exampleHeader"><a name="ri20060622.160051134" id="ri20060622.160051134">Example 8: Using the dir attribute to set the overall direction of a page.</a></div><p><code>&lt;html dir="rtl" lang="ar"&gt;</code><br clear="none" />
...<br clear="none" />
<code>&lt;/html&gt;</code></p>
</div>
<p>This will cause <a class="termref" href="#term-blockelement" shape="rect">block elements</a> and table columns to start on the right and flow from right to left. All block elements in the document will inherit this setting unless it is explicitly overridden.</p>
<p>No <code class="kw">dir</code> attribute is needed for <span class="rule">page</span>s that have a <a class="termref" href="#term-basedirection" shape="rect">base direction</a> of <code class="kw">ltr</code>, since this is the default.</p>
<div class="infotype">Discussion</div>
<p>Setting the <code class="kw">dir</code> attribute on the <code class="kw">html</code> element sets the default direction for all elements in the <span class="rule">page</span>, including the <code class="kw">head</code> element. Note, however, the effect of this on the user interface of some browsers (the 'browser chrome') as described in the browser-specific notes.</p>
<p> Having established the base direction at the level of the <code class="kw">html</code> tag, you should not use the <code class="kw">dir</code> attribute on other elements unless you want to <em>change</em> the base direction for that element. Unnecessary use of the dir attribute impacts bandwidth and creates unnecessary additional work for page maintenance (see <a href="#ri20030726.132037950">6.2 Use bidi markup only when necessary</a>).</p>
<p>There is not usually any good reason for using <code class="kw">dir</code> on the <code class="kw">body</code> element. Placing the attribute on the <code class="kw">html</code> element has the same effect, and also covers the text in the <code class="kw">head</code> element.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030112.213806280" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="tech-scrollbar" name="tech-scrollbar" href="#tech-scrollbar" shape="rect">5.2 <span class="bptitle">Avoiding user interface side-effects</span></a></div>
<div class="rule">If you need to avoid the scroll bar moving on some browsers, put <code class="kw">dir</code> on the <code class="kw">head</code> element and a <code class="kw">div</code> just inside the <code class="kw">body</code> element.</div>
<!--div class="applicability"><span class="applic-title">See notes for: </span><img src="images/ie6.gif" alt="ie6" /> <img src="images/ie7.gif" alt="ie7" /></div-->
<div class="section1">
<div class="infotype">How to</div>
<div class="note"><span class="note-head">Note: </span>This technique is relevant only if you consider it to be a problem that putting <code class="kw">dir</code> in the <code class="kw">html</code> tag may affect the user interface of some browsers. See the discussion below before implementing this technique.</div>
<p>To avoid this behavior without tagging every block element in the document, you could add, immediately inside the <code class="keyword">body</code> element, a <code class="keyword">div</code> element that surrounds all the other content in the document, and apply the <code class="keyword">dir</code> attribute to that. The directionality will then be inherited by all other block elements in the body of the document, but will not set off the changes to the browser. </p>
<p>If you do this, you must ensure that you add a <code class="keyword">dir</code> attribute to the <code class="keyword">head</code> element also, to cover its <code class="keyword">title</code> element, attribute values, etc.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ri20060622." id="ri20060622.">Example 9: Using the dir attribute to set the overall direction of a page without using the html or body tags.</a></div>
<pre xml:space="preserve">&lt;html lang="he"&gt;
&lt;head dir="rtl"&gt;
...
&lt;/head&gt;
&lt;body&gt;
&lt;div dir="rtl"&gt;
...
&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></div>
<div class="infotype">Discussion</div>
<p>In some browsers, applying a right-to-left direction in the <code class="keyword">html</code> or <code class="keyword">body</code> tag will affect the user interface, too. If the page has a scroll bar, it will appear on the left side of the window. JavaScript alert boxes may also be mirror imaged.</p>
<div class="note"><span class="note-head">Note: </span>At the time of writing, the scroll bar effect can be seen in Internet Explorer and Opera, but the JavaScript dialog boxes are only different for Internet Explorer and only in certain circumstances. You can find more detailed and up-to-date information in the <a href="/International/docs/bp-html-bidi/uanotes#tech-scrollbar" shape="rect">notes page</a> associated with this document.</div>
<p>Some speakers of languages that use right-to-left scripts prefer the directionality of the user interface to be associated with the desktop environment, not with the content of a particular document. Because of this, they may prefer to not declare the document directionality on the <code class="keyword">html</code> or <code class="keyword">body</code> tag. </p>
<p>The approach outlined in the How To section above shows the simplest way to avoid this behavior, and yet still ensure that the default base direction for the whole document is right-to-left.</p>
<p>If you want to know more about this, read the Microsoft article <a href="http://www.microsoft.com/globaldev/handson/dev/Mideast.mspx" shape="rect">Authoring HTML for Middle Eastern
Content</a>. According to this article, the following behaviors can only be expected in Internet Explorer 5+ if the <code class="keyword">dir</code> attribute is on the <code class="keyword">html</code>
element, rather than the <code class="keyword">body</code> element:</p>
<ul><li>
<p>The OLE/COM ambient property of the document is set to <code class="keyword">AMBIENT_RIGHTTOLEFT</code></p>
</li><li>
<p>The document direction can be toggled through the document object model (DOM)
(<code>document.direction="ltr"</code> or <code>document.direction="rtl"</code>)</p>
</li><li>
<p>An HTML dialog will get the correct extended window styles set, so it displays as an RTL dialog on a bidi-enabled
system.</p>
</li><li>
<p>If the document has vertical scrollbars, they will appear on the left side if <code>dir="rtl"</code>.</p>
</li></ul>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#tech-scrollbar" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030112.21380914" name="ri20030112.21380914" href="#ri20030112.21380914" shape="rect">5.3 <span class="bptitle">Don't use visual ordering</span></a></div>
<div class="rule">Use logical order, not visual ordering, for Hebrew, and choose an appropriate encoding.</div><div class="description">
<div class="infotype">How to</div>
<p>Create and store your Hebrew content in <a class="termref" href="#term-logicalorder" shape="rect">logical order</a> (i.e. usually as you would pronounce it), not the order you expect to see it displayed. </p>
<p>You will need to use an appropriate character encoding. It is usually best to use a Unicode encoding, such as UTF-8. If, for some reason,
you choose to serve your Hebrew page in an ISO encoding instead, then specify ISO-8859-8-i, not ISO-8859-8.</p>
<div class="infotype">Discussion</div>
<p><span class="qterm"><span class="new-term">Visual ordering.</span></span> Visual ordering of text was common for old user agents that didn't support the Unicode
bidirectional algorithm. Text was stored in the source code in the same order you would expect to see it displayed.
This also involved such things as disabling any line wrapping, explicit right-alignment of text in paragraphs and table
cells, and reverse-ordering of table columns when translating from English to a language using a bidi script. Such text is laborious to edit: for example, if you want to add a few words in the middle of
a paragraph, you have to move text to and from every line that follows it in the paragraph (see the tutorial <a href="http://www.w3.org/International/tutorials/bidi-xhtml/#Slide0090" shape="rect">Creating HTML Pages in Arabic &amp; Hebrew</a> for an example).</p><p>Note, too, that if you have inline markup, such as emphasis or link text, that spans more than one line, you will need to mark up the text runs on both lines separately. Again, adding text before such markup in a paragraph would mean that you have to carefully change this markup to reflect the new position of the text.</p>
<p>The
result is very fragile code that is difficult to maintain. In addition, all the extra tags needed to manage the text would bloat your code and impact not only authoring time, but also bandwidth. Visually ordered bidirectional HTML does not conform to the HTML specification unless <span class="kw">bdo</span> markup is used.</p>
<p><span class="new-term">Logical ordering.</span> Using logical ordering, on the other hand, makes it almost trivial to create long paragraphs of flowing text that automatically wraps to the width of the block element. It also makes it much easier to address accessibility, using such things as screen readers.</p>
<p><span class="qterm">Logically ordered</span> text is stored in memory in the order in which it would normally be
typed (and usually pronounced). The Unicode <a class="termref" href="#term-bidialgorithm" shape="rect">bidirectional algorithm</a> is then applied by the browser to render the
correct visual display.</p>
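<p>To make the difference concrete, here is a small sketch that stores the same two-word Hebrew phrase both ways. The phrase is invented for this illustration, and the letters are written as numeric character references so that the stored order is unambiguous whatever editor is used to view the source.</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: the same phrase stored in logical order and in visual order</div>
<pre xml:space="preserve">&lt;!-- Logical order: letters stored as typed (shin, lamed, vav, final mem, ...).
     The browser applies the bidirectional algorithm to display them. --&gt;
&lt;p dir="rtl" lang="he"&gt;&amp;#x5E9;&amp;#x5DC;&amp;#x5D5;&amp;#x5DD; &amp;#x5E2;&amp;#x5D5;&amp;#x5DC;&amp;#x5DD;&lt;/p&gt;

&lt;!-- Visual order: letters stored left to right as they appear on screen,
     so the whole phrase is reversed and needs bdo to switch reordering off. --&gt;
&lt;p dir="rtl" lang="he"&gt;&lt;bdo dir="ltr"&gt;&amp;#x5DD;&amp;#x5DC;&amp;#x5D5;&amp;#x5E2; &amp;#x5DD;&amp;#x5D5;&amp;#x5DC;&amp;#x5E9;&lt;/bdo&gt;&lt;/p&gt;</pre></div>
<p>Both paragraphs should look the same on screen while the text fits on one line. Only the logically ordered one can be searched, read aloud, re-wrapped or edited in its natural order.</p>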
<div class="note"><span class="note-head">Note: </span>Visual ordering isn't really seen much for Arabic. Since the Arabic letters are all joined up there was a stronger motivation on the part of Arabic implementers to enable the logical ordering approach.</div>
<p><strong>Character encoding considerations.</strong> Certain character encodings are associated with visual vs. logical ordering of text. Text in a Unicode encoding, such as UTF-8, is always logical. Unicode is generally the best choice for a character encoding, but if you wish to use an ISO encoding, you should read <a href="#tech-iso-encoding">5.4 Use the right ISO encoding</a>.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030112.21380914" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="tech-iso-encoding" name="tech-iso-encoding" href="#tech-iso-encoding" shape="rect">5.4 <span class="bptitle">Use the right ISO encoding</span></a></div>
<div class="rule">If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.</div>
<div class="description">
<div class="infotype">How to</div>
<p> It is usually best to use a Unicode encoding, such as UTF-8. If, for some reason,
you choose to serve your Hebrew page in an ISO encoding instead, then declare the encoding to be ISO-8859-8-i, not ISO-8859-8, and create and store your Hebrew content in <a class="termref" href="#term-logicalorder" shape="rect">logical order</a>, not the order you expect to see it displayed.</p>
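<p>As a sketch of what such a declaration can look like inside the page, assuming the encoding is declared with a <code>meta</code> element in the <code>head</code> (it can also be declared in the HTTP <code>Content-Type</code> header):</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: declaring the character encoding of a Hebrew page</div>
<pre xml:space="preserve">&lt;!-- Usually the best choice: a Unicode encoding --&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;

&lt;!-- Only if an ISO encoding is unavoidable: note the -i suffix --&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-8-i"&gt;</pre></div>
<p>A page uses one declaration or the other, not both, and the declaration must match the encoding in which the file is actually saved.</p>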
<div class="infotype">Discussion</div>
<p> Certain character encodings are associated with visual vs. logical ordering of text. Text in a Unicode encoding, such as UTF-8, is always logical.</p>
<p> According to RFC 1555 and RFC 1556, there are special conventions for the use of charset parameter values to
indicate bidirectional treatment in MIME mail, in particular to distinguish between visual, implicit, and explicit
directionality. 'Visual' refers to the practice of storing Hebrew characters in presentation order, so that there is no reliance on reordering performed by the operating system or the display subsystem. '<span class="qterm">Implicit</span>' is also called logical ordering, and refers to an approach where all characters are stored in memory in the order in which they would normally be typed. Correct ordering for display is then done by a special
algorithm (this is the preferred approach). '<span class="qterm">Explicit</span>' refers to the use of explicit markers in the text
to indicate directional changes.</p>
<p> The charset parameter value <code class="kw">ISO-8859-8</code> for Hebrew denotes visual ordering, <code class="kw">ISO-8859-8-i</code> denotes implicit bidirectionality, and <code class="kw">ISO-8859-8-e</code> denotes explicit directionality. (The latter is not supported by any common browser.)</p>
<p>HTML assumes by default that bidi data is stored in logical order, and that rendering agents will have to use the Unicode Bidirectional Algorithm to present the text in correct visual order. For a logically ordered page that uses the ISO 8859-8 character set, the charset declaration must therefore be ISO-8859-8-i. </p>
<p> Explicit directional control is also possible with HTML, but cannot be expressed with
ISO 8859-8, so "ISO-8859-8-e" should not be used.</p>
<p>Note, also, that the ISO 8859-8 encodings don't include the Hebrew vowel points and other diacritics. If you need these, use a logical encoding that does include them, such as a Unicode encoding or Windows-1255.</p>
<p>By the way, contrary to what is said in RFC 1555 and RFC 1556, ISO-8859-6 (Arabic) does not imply visual ordering.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#tech-iso-encoding" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.092130948" name="ri20030728.092130948" href="#ri20030728.092130948" shape="rect">5.5 <span class="bptitle">Don't use CSS styling for direction in HTML</span></a></div>
<div class="rule">Do not use CSS styling to control directionality in HTML. Use markup.</div><div class="description">
<div class="infotype">How to</div>
<p> Just use the <code class="kw">dir</code> attribute when you need to indicate direction, and don't use CSS properties.</p>
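<p>A short sketch of the contrast, with placeholder content:</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: setting direction with markup rather than with CSS</div>
<pre xml:space="preserve">&lt;!-- Recommended: the direction is part of the markup --&gt;
&lt;p dir="rtl" lang="ar"&gt;...&lt;/p&gt;

&lt;!-- Not recommended in HTML: the direction is carried only by CSS,
     which a conforming HTML user agent may ignore --&gt;
&lt;p style="direction: rtl;" lang="ar"&gt;...&lt;/p&gt;</pre></div>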
<div class="infotype">Discussion</div>
<p> It is possible to express direction for a range of text using the CSS <code class="kw">direction</code> and <code class="kw">unicode-bidi</code> properties. Even when CSS is used, however, because directionality is an integral part of the document structure and needs to be persistent, you should always use dedicated markup to set
the <a class="termref" href="#term-basedirection" shape="rect">base direction</a> for a document or chunk of information, or to indicate places in the text where the Unicode <a class="termref" href="#term-bidialgorithm" shape="rect">bidi algorithm</a> is insufficient to achieve desired inline directionality.</p>
<p>The way in which a browser is expected to handle the <code class="kw">dir</code> attribute and its values is clearly defined in the HTML specification, so CSS is not needed. The CSS2 specification also recommends the use of markup for bidi text in HTML. In fact it goes as far as to say
that conforming HTML user agents may ignore CSS bidi properties, since the HTML specification clearly defines
the expected behavior of user agents with respect to the bidi markup.</p>
<p>Although XHTML uses XML syntax, it is usually served to browsers using the text/html MIME type, i.e. the browser recognizes it and treats it as HTML. Therefore the same principle applies: use the markup, don't use CSS for direction.</p>
<p>See the article
<a href="http://www.w3.org/International/questions/qa-bidi-css-markup" shape="rect">CSS vs. markup for bidi
support</a> for a fuller explanation.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.092130948" class="uanotesref">Get more information >></a></div>
</div>
</div>
</div>
<div class="div1">
<h2><a name="ri20030728.09474480" id="ri20030728.09474480" shape="rect">6 Changing direction on block elements</a></h2>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030726.114013437" name="ri20030726.114013437" href="#ri20030726.114013437" shape="rect">6.1 <span class="bptitle">Use the dir attribute on block elements.</span></a></div>
<div class="rule">Add the <code class="kw">dir</code> attribute to a block element to change
base direction. Don't use CSS or Unicode control characters.</div><div class="description">
<div class="infotype">How to</div>
<p> Add the <code class="kw">dir</code> attribute to a <a class="termref" href="#term-blockelement" shape="rect">block element</a> where you want to change the <a class="termref" href="#term-basedirection" shape="rect">base direction</a>. <a href="#ri20060622.175155357">Example 10</a> shows how you might mark up a <code class="kw">blockquote</code> element to render a left-aligned English quotation in a right-to-left page.</p>
<div class="exampleOuter"><div class="exampleHeader"><a name="ri20060622.175155357" id="ri20060622.175155357">Example 10: Switching a block to left-to-right on a right-to-left page.</a></div>
<p><code>&lt;blockquote dir="ltr" lang="en" <br clear="none" />
cite="http://www.example.org/romeoandjuliet#2.2.2"&gt;<br clear="none" />
&lt;p&gt;But, soft! What light through yonder window breaks?&lt;br&gt;<br clear="none" />
It is the east, and Juliet is the sun.&lt;/p&gt;<br clear="none" />
&lt;/blockquote&gt;</code></p></div>
<p>Do not try to achieve the same effect using CSS or Unicode control characters.</p>
<p>Note also that you should only use the <code class="kw">dir</code> attribute on block elements when you need to change the base direction from the current default (see <a href="#ri20030726.132037950">6.2 Use bidi markup only when necessary</a>).</p>
<p>Tables are slightly different from other block elements. Using a <code class="kw">dir</code> attribute directly on a <code class="kw">table</code> tag will reorder the columns and contents as expected, but will not cause the table to move to the other side of the displayed page. If you want that to happen, you should put the table in a block element, such as <code class="kw">div</code>, and add the <code class="kw">dir</code> attribute to that, rather than put the attribute on the <code class="kw">table</code> element.</p>
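<p>The two placements can be sketched as follows; the table content is a placeholder invented for this illustration.</p>
<div class="exampleOuter"><div class="exampleHeader">Sketch: dir on the table compared with dir on a surrounding div</div>
<pre xml:space="preserve">&lt;!-- dir on the table: columns and cell contents are reordered,
     but the table stays on the same side of the page --&gt;
&lt;table dir="rtl"&gt;
  &lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;!-- dir on a surrounding div: the table inherits the direction
     and also moves to the other side of the page --&gt;
&lt;div dir="rtl"&gt;
  &lt;table&gt;
    &lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;
  &lt;/table&gt;
&lt;/div&gt;</pre></div>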
<div class="infotype">Discussion</div>
<p>Unicode control characters are difficult to manage because they are invisible. They are also a poor fit for managing base direction across block elements, because of questions of scoping and inheritance.</p>
<p>CSS is not needed for bidi support in HTML, and it is best to rely on the dedicated markup that HTML provides, with all needed behavior built in (see <a href="#ri20030728.092130948">5.5 Don't use CSS styling for direction in HTML</a>).</p>
<p> For more information see the tutorial <a href="http://www.w3.org/International/tutorials/bidi-xhtml/#Slide0150" shape="rect">Creating HTML Pages in Arabic & Hebrew</a>.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030726.114013437" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030726.132037950" name="ri20030726.132037950" href="#ri20030726.132037950" shape="rect">6.2 <span class="bptitle">Use bidi markup only when necessary</span></a></div>
<div class="rule">Only use bidi markup to set the base direction for the document as a whole, or where you need to <em>change</em> the base direction.</div><div class="description">
<div class="infotype">How to</div>
<p> Once you have established the appropriate <a class="termref" href="#term-basedirection" shape="rect">base direction</a> for the <code class="kw">html</code> element you will only need to
apply bidi markup to another element if you want that element's base direction to be <em>different</em> from that currently in force.</p>
<p>The same
principle applies for inline markup. Do not use inline bidi markup unless the Unicode <a class="termref" href="#term-bidialgorithm" shape="rect">bidi algorithm</a> is insufficient on its
own to produce the expected results.</p>
<div class="infotype">Discussion</div>
<p>The following Arabic example shows bad usage. None of the <span class="kw"><code class="kw">dir</code></span> attributes are needed if <code>dir="rtl"</code> is added to the <code class="kw">html</code> element. Removing them will significantly simplify the document
and reduce bandwidth requirements.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="d2e1250" id="d2e1250">Example 11: <strong>[Bad practice. Do not copy!]</strong></a> Directional markup used far too often in a document.</div>
<blockquote>
<p class="code">&lt;h2 dir="rtl"&gt;</p>
<p class="code" dir="rtl">القاموس</p>
<p class="code">&lt;/h2&gt;</p>
<p class="code">&lt;dl&gt;</p>
<p class="code">&lt;dt dir="rtl"&gt;</p>
<p class="code" dir="rtl">المنالية</p>
<p class="code">&lt;/dt&gt;</p>
<p class="code">&lt;dd dir="rtl"&gt;</p>
<p class="code" dir="rtl">سهولة منال للويب من قبل الجميع بصرف النّظر عن إعاقةهم.</p>
<p class="code">&lt;/dd&gt;</p>
<p class="code">&lt;dt dir="rtl"&gt;</p>
<p class="code" dir="rtl">برنامج التصديق</p>
<p class="code">&lt;/dt&gt;</p>
<p class="code">&lt;dd dir="rtl"&gt;</p>
<p class="code" dir="rtl">أو "الفاليديتور" أداة للتّحقّق من صلاحيّة صفحة ويب. على سبيل المثال، للتّحقّق من صلاحيّة </p>
<p class="code" dir="rtl">&lt;span dir="ltr"&gt;HTML&lt;/span&gt;، يمكن أن تستخدم بزنامج تصديق</p>
<p class="code" dir="rtl">&lt;span dir="ltr"&gt;W3C&lt;/span&gt;</p>
<p class="code">&lt;/dd&gt;</p>
<p class="code">&lt;dt dir="rtl"&gt;</p>
<p class="code" dir="rtl">التّدويل</p>
<p class="code">&lt;/dt&gt;</p>
<p class="code">&lt;dd dir="rtl"&gt;</p>
<p class="code" dir="rtl">تدويل الويب يسمح و يجعله سهل لاستخدام موقعك باللّغات و السّيناريوهات و الثّقافات
المختلفة.</p>
<p class="code">&lt;/dd&gt;</p>
<p class="code">&lt;/dl&gt;</p>
</blockquote>
</div>
<p>The <a class="termref" href="#term-blockelement" shape="rect">block elements</a> inherit their direction from that set on the <code class="kw">html</code> element, or from the nearest ancestor element where a change was made. The inline 'HTML' and 'W3C' words in <a href="#d2e1250">Example 11</a> do not need markup because the bidi algorithm can produce the right result automatically.</p>
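<p>For comparison, here is a sketch of how the glossary in <a href="#d2e1250">Example 11</a> might look with the redundant markup removed. It assumes that <code>dir="rtl"</code> is already set on the <code class="kw">html</code> element, and the Arabic text is elided.</p>
<blockquote><code>&lt;h2&gt;...&lt;/h2&gt;<br clear="none" />
&lt;dl&gt;<br clear="none" />
&lt;dt&gt;...&lt;/dt&gt;<br clear="none" />
&lt;dd&gt;...&lt;/dd&gt;<br clear="none" />
&lt;dt&gt;...&lt;/dt&gt;<br clear="none" />
&lt;dd&gt;... HTML ... W3C&lt;/dd&gt;<br clear="none" />
&lt;dt&gt;...&lt;/dt&gt;<br clear="none" />
&lt;dd&gt;...&lt;/dd&gt;<br clear="none" />
&lt;/dl&gt;</code></blockquote>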
<p>Occasionally the Unicode bidirectional algorithm is not sufficient to correctly order certain inline sequences of bidirectional
text. Alternatively, you may want to override the effects of the bidirectional algorithm for a part of the page. In
these cases you can apply additional markup to produce the ordering you want. These scenarios are discussed in other techniques in this document.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030726.132037950" class="uanotesref">Get more information >></a></div>
</div>
</div>
</div>
<div class="div1">
<h2><a name="ri20030218.135304584" id="ri20030218.135304584" shape="rect">7 Mixing text direction inline</a></h2>
<p>There are three main scenarios that cause problems when dealing with bidirectional inline text. These are:</p>
<ul><li>embedded bidirectional runs of text that need to be ordered differently from the current base direction</li><li>neutral or weak characters or objects that are on the edge of a directional run with different directionality to the current base direction</li><li>adjacent, same-direction directional runs with directionality different from the current base direction.</li></ul>
<p>We address these scenarios here with proposals for solutions.</p>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030112.214820604" name="ri20030112.214820604" href="#ri20030112.214820604" shape="rect">7.1 <span class="bptitle">Use the dir attribute for inline nested segments</span></a></div>
<div class="rule">When you have bidirectional text nested in inline text of a different direction, and markup can be used, use the <code class="kw">dir</code> attribute to make the text display correctly. Otherwise, use RLE/LRE and PDF control characters to create an embedded base direction.</div>
<div class="description">
<div class="infotype">How to</div>
<p> Add the <code class="kw">dir</code> attribute to an element surrounding the embedded text. If there is no element surrounding the text, use a <code class="kw">span</code> element. Set the value of the <code class="kw">dir</code> attribute to either <code class="kw">ltr</code> or <code class="kw">rtl</code>, depending on the <a class="termref" href="#term-basedirection" shape="rect">base direction</a> of the embedded text, as shown in the examples below.</p>
<p>If it is not possible to use markup, e.g. in the <code class="kw">title</code> element or in attribute values, you will need to use Unicode control characters. </p>
<p>For examples and more detailed explanations see the discussion that follows.</p>
<div class="infotype">Discussion</div>
<p>This technique is useful where nested, inline text, such as a quotation, is bidirectional. At a simple level the Unicode <a class="termref" href="#term-bidialgorithm" shape="rect">bidirectional algorithm</a> takes care of the reordering of <a class="termref" href="#term-inlinetext" shape="rect">inline text</a>, but where
nested text is bidirectional you need to set up an embedding level, i.e. indicate a range of text for which a different base direction will be applied.</p>
<p> This can be done using markup around the relevant text, or by adding Unicode control characters to the text. It is recommended that markup be used in preference to control characters because the latter are difficult to manage well, given that they are invisible.</p>
<p> You need to be familiar with the concepts in the article <a href="/International/articles/inline-bidi-markup/" shape="rect">What you need to know about the bidi algorithm and
inline markup</a> to understand this technique.</p>
<p><strong class="leadin">Using markup.</strong> <a href="#nestedmarkup">Example 12</a> shows a sentence that, because we rely solely on the bidirectional algorithm, is incorrectly ordered, but that can be fixed with markup.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="nestedmarkup" id="nestedmarkup">Example 12: Applying markup to solve incorrect rendering of nested text.</a></div>
<p>If we rely solely on the bidirectional algorithm, the text 'W3C' in the sentence below will appear in the wrong place. It is part of the quotation and should appear after, i.e. to the left of, the Hebrew text, and the comma should be just to its right.</p>
<p><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EThe%20title%20is%20%22%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%22%20in%20Hebrew.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/nesting-wrong.png" alt="Incorrectly ordered text, because no embedding." /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the title is "YTIVITCA NOITAZILANOITANRETNI, w3c" in hebrew.</blockquote>
<p>This is how the sentence should look.</p>
<p><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EThe%20title%20is%20%22%3Cspan%20dir%3D%22rtl%22%3E%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%3C%2Fspan%3E%22%20in%20Hebrew.%20%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/nesting-right.png" alt="Correctly ordered text via embedding." /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the title is "w3c ,YTIVITCA NOITAZILANOITANRETNI" in hebrew.</blockquote>
<p>Here is the markup that would produce it.</p>
<blockquote><code>&lt;p&gt;The title is "&lt;span dir="rtl" lang="he"&gt;<span dir="rtl">...</span>&lt;/span&gt;" in Hebrew&lt;/p&gt;</code></blockquote>
<p>The markup sets up a new base direction for the embedded text. This RTL base direction causes the directional runs in the embedded text to proceed from right to left, and makes the comma between the two different directional runs become RTL-typed.</p>
</div>
<p>It is possible that the embedded text is not surrounded by markup, and you may therefore need to add it, but note that the quotation here was already surrounded by markup. A <code class="kw">span</code> was used in order to label the language of the quotation. This is likely to be a common occurrence. In addition to marking up for language, quotations may be marked up with such things as a <code class="kw">span</code> or <code class="kw">q</code> element for styling or semantic properties. Given that the boundary of the quotation is already clearly marked, adding the <code class="kw">dir</code> attribute is simple and quick.</p>
<p>Note also, by the way, that we placed the <code class="kw">span</code> element <em>inside</em> the quotation marks, since these are a part of the English
text.</p>
<p><strong class="leadin">Using control characters.</strong> Where markup is not available, such as in a <code class="kw">title</code> attribute value or an <code class="kw">option</code> element, you will have to use Unicode control characters to demarcate the required range of text and assign a base direction to it. </p>
<p>To mark the beginning of the embedded section you use one of U+202B <span class="uname">RIGHT-TO-LEFT EMBEDDING (RLE)</span> or U+202A <span class="uname">LEFT-TO-RIGHT EMBEDDING (LRE)</span> to set the base direction. This corresponds to the markup <code><span dir="rtl"></code> or <code><span dir="ltr"></code>, respectively. At the other end of the embedded section is U+202C <span class="uname">POP DIRECTIONAL FORMATTING (PDF)</span>. This corresponds to <code></span></code> in markup terms.</p>
<p>These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.) </p>
<p><a href="#nestedcontrols">Example 13</a> shows how.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="nestedcontrols" id="nestedcontrols">Example 13: Applying Unicode control characters to solve incorrect rendering of nested text.</a></div>
<p>Here is the text we plan to use in a page description, rendered incorrectly because we are relying only on the bidi algorithm heuristics.</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EReport%20on%20the%20%D9%84%D8%BA%D8%A9%20XML%20%D9%84%D9%87%D8%A7%20%D8%B9%D8%B4%D8%B1%D8%A9%20%D8%B3%D9%86%D9%88%D8%A7%D8%AA%21%20event.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/xmlat10-wrong.png" alt="Hebrew for 'Report on the XML at 10! event.'" /></a></p>
<p>This is how the sentence should look.</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EReport%20on%20the%20%3Cspan%20lang%3D%22ar%22%20dir%3D%22rtl%22%3E%D9%84%D8%BA%D8%A9%20XML%20%D9%84%D9%87%D8%A7%20%D8%B9%D8%B4%D8%B1%D8%A9%20%D8%B3%D9%86%D9%88%D8%A7%D8%AA%21%3C%2Fspan%3E%20event.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/xmlat10-right.png" alt="Hebrew for 'Report on the XML at 10! event.'" /></a></p>
<p>Here is the markup that would produce it.</p>
<blockquote><code>&lt;meta name="description" content="Report on the &amp;#x202B;...&amp;#x202C; event." /&gt;</code></blockquote>
<p>The RLE character sets up a new RTL base direction for the embedded text. This RTL base direction causes the directional runs to all proceed from right to left. The limit of the embedded text is indicated using a PDF character. We used numeric character references for the source text so that you can see what we did.</p>
</div>
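<p>The same pattern applies in other places where markup is unavailable. As a sketch, assuming the sentence from <a href="#nestedmarkup">Example 12</a> were used as a page title, the RLE and PDF characters would take the place of the <code class="kw">span</code> element (the Hebrew text is elided):</p>
<blockquote><code>&lt;title&gt;The title is "&amp;#x202B;...&amp;#x202C;" in Hebrew&lt;/title&gt;</code></blockquote>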
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030112.214820604" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030726.140315918" name="ri20030726.140315918" href="#ri20030726.140315918" shape="rect">7.2 <span class="bptitle">Weak/neutral characters at the edge of a directional run</span></a></div>
<div class="rule">When weak or neutral characters or objects appear at the wrong side of a directional run, fix it using <code class="kw">dir</code> if there is markup already in place, or use an RLM/LRM.</div>
<div class="description">
<div class="infotype">How to</div>
<p>If the <a class="termref" href="#term-directionalrun" shape="rect">directional run</a> is surrounded by markup, you can simply add the <code class="kw">dir</code> attribute to the element surrounding it. </p>
<p>If not, rather than use the <a class="termref" href="#term-rle" shape="rect">RLE</a>/<a class="termref" href="#term-lre" shape="rect">LRE</a> plus <a class="termref" href="#term-pdf" shape="rect">PDF</a> controls to create embedded text, place U+200F <span class="uname">RIGHT-TO-LEFT
MARK</span> (RLM) or U+200E <span class="uname">LEFT-TO-RIGHT
MARK</span> (LRM) alongside misplaced characters to produce the desired result.</p>
<p>The RLM/LRM characters can be added as either characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.)</p>
<div class="note"><span class="note-head">Note: </span>Although we talk in terms of characters in this technique, the same principles apply to objects such as checkboxes, images, radio buttons, etc., since they are treated in the same way as neutral characters.</div>
<p>For examples and more detailed explanations see the discussion that follows.</p>
<div class="infotype">Discussion</div>
<p> You need to be familiar with the concepts in the article <a href="/International/articles/inline-bidi-markup/" shape="rect">What you need to know about the bidi algorithm and
inline markup</a> to understand this technique.</p>
<p>Weakly-typed or neutral characters between different directional runs take on the directionality of the <a class="termref" href="#term-basedirection" shape="rect">base direction</a>. This can be an issue if the character in question is part of, but on the edge of, a directional run which has a different direction from the current base direction.</p>
<p>You can deal with misplaced characters by either providing a different base direction, or by making sure the problematic character is followed by an appropriate strongly-typed character. <a href="#exedgemarkup">Example 14</a> illustrates the problem and both of these solutions. </p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="exedgemarkup" id="exedgemarkup">Example 14: Fixing misplaced neutral/weak characters at the edge of a directional run using markup or RLM/LRM.</a></div>
<p>In the example text immediately below, the exclamation mark is part of the Arabic phrase and should have appeared to its left. It appears to the right because it falls, in memory, between an Arabic and Latin character and the overall paragraph direction is LTR. It is therefore treated as part of the English text (as is the adjacent quotation mark).</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EThe%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8%21%22.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/keytowebstandards-wrong.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the title is "SDRADNATS BEW OT YEK EHT!".</blockquote>
<p>This is what we should have seen.</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EThe%20title%20is%20%22%3Cspan%20lang%3D%22ar%22%20dir%3D%22rtl%22%3E%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8%21%3C%2Fspan%3E%22%20.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/keytowebstandards-right.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the title is "!SDRADNATS BEW OT YEK EHT".</blockquote>
<p>An easy way to fix this is to insert the Unicode character U+200F, called the <span class="uname">RIGHT-TO-LEFT
MARK</span> (RLM), after the exclamation mark. Now, with a strong RTL character on either side, the exclamation mark too will be treated as part of the
RTL directional run and we will get the correct result. Here's the markup that would produce it. We use a numeric character reference so that you can see the character.</p>
<blockquote><code>&lt;p&gt;The title is "<span dir="rtl">...</span>&amp;#x200F;".&lt;/p&gt;</code></blockquote>
<p>Note, however, that in a case such as this you are likely to have markup in place around the Arabic text to identify its semantics, assign a language tag or apply appropriate styling. If that is the case, it is equally simple to just add a <code class="kw">dir</code> attribute to the existing markup, as shown here.</p>
<blockquote><code>&lt;p&gt;The title is "&lt;cite lang="ar" dir="rtl"&gt;<span dir="rtl">...</span>&lt;/cite&gt;".&lt;/p&gt;</code></blockquote>
</div>
<div class="note"><span class="note-head">Note: </span>The use of an RLM/LRM character only works in simple cases like <a href="#exedgemarkup">Example 14</a>, where the embedded text in the sentence is a single directional run. If it contains bidirectional elements, you will need to apply the approach outlined in <a href="#ri20030112.214820604">7.1 Use the dir attribute for inline nested segments</a>.</div>
<p>Although our base text for <a href="#exedgemarkup">Example 14</a> was in Latin script, you are more likely to encounter this kind of problem in a right-to-left paragraph, such as Arabic or Hebrew, that includes English text followed by punctuation. In that case you would use U+200E <span class="uname">LEFT-TO-RIGHT
MARK</span> (LRM) to address the problem. Here is an example.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ex-fixingneutrals" id="ex-fixingneutrals">Example 15: Fixing misplaced neutral/weak characters using markup or RLM/LRM when the base direction is RTL.</a></div>
<p>In the example text immediately below, the parenthesis to the left is part of the English phrase and should have appeared to its right. It appears to the left because it falls, in memory, between a Latin and a Hebrew character and the overall paragraph direction is RTL. It is therefore treated as part of the Hebrew text. Note: the shape of the parenthesis is irrelevant, since it is a mirrored character.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=100%25&background=fff8dd&markup=%3Cp%3E%D7%A8%D7%90%D7%95%20Web%20Content%20Accessibility%20Guidelines%20%28WCAG%29%20%D7%9C%D7%94%D7%A7%D7%93%D7%9E%D7%94%20%D7%95%D7%A7%D7%99%D7%A9%D7%95%D7%A8%D7%99%D7%9D%20%D7%9C%D7%97%D7%95%D7%9E%D7%A8%20%D7%98%D7%9B%D7%A0%D7%99%20%D7%95%D7%9C%D7%97%D7%95%D7%9E%D7%A8%D7%99%20%D7%94%D7%93%D7%A8%D7%9B%D7%94%20%D7%A9%D7%9C%20WCAG.%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/wcag-wrong.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote style="text-align:right;"><span class="visualascii">Visual ASCII version:</span> DNA NOITCUDORTNI NA ROF (web content accessibility guidelines (wcag EES<br clear="none" />
.wcag ROF LAIRETAM LANOITACUDE DNA LACINHCET OT SKNIL</blockquote>
<p>This is what we should have seen. </p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=100%25&background=fff8dd&markup=%3Cp%3E%D7%A8%D7%90%D7%95%20Web%20Content%20Accessibility%20Guidelines%20%28WCAG%29%E2%80%8E%20%D7%9C%D7%94%D7%A7%D7%93%D7%9E%D7%94%20%D7%95%D7%A7%D7%99%D7%A9%D7%95%D7%A8%D7%99%D7%9D%20%D7%9C%D7%97%D7%95%D7%9E%D7%A8%20%D7%98%D7%9B%D7%A0%D7%99%20%D7%95%D7%9C%D7%97%D7%95%D7%9E%D7%A8%D7%99%20%D7%94%D7%93%D7%A8%D7%9B%D7%94%20%D7%A9%D7%9C%20WCAG.%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/wcag-right.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote style="text-align:right;"><span class="visualascii">Visual ASCII version:</span> DNA NOITCUDORTNI NA ROF web content accessibility guidelines (wcag) EES<br clear="none" />
.wcag ROF LAIRETAM LANOITACUDE DNA LACINHCET OT SKNIL</blockquote>
<p>The easy way to fix this is to insert the Unicode character U+200E, called the <span class="uname">LEFT-TO-RIGHT
MARK</span> (LRM), after the parenthesis that is in the wrong place. Now, with a strong LTR character on either side, the parenthesis will be treated as part of the
LTR directional run and we will get the correct result. </p>
<p>If, however, the text "Web Content Accessibility Guidelines (WCAG)" is surrounded by markup, it is equally simple and effective to just add a <code class="kw">dir</code> attribute to the existing markup, rather than insert the character.</p>
</div>
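<p><a href="#ex-fixingneutrals">Example 15</a> does not show its source, so here is a sketch of both fixes with the Hebrew text elided. It assumes the base direction of the paragraph is already right-to-left. The <code class="kw">span</code> in the second version is a hypothetical wrapper; in practice you would add <code class="kw">dir</code> to whatever element already surrounds the phrase.</p>
<blockquote><code>&lt;p&gt;... Web Content Accessibility Guidelines (WCAG)&amp;#x200E; ...&lt;/p&gt;</code></blockquote>
<blockquote><code>&lt;p&gt;... &lt;span dir="ltr" lang="en"&gt;Web Content Accessibility Guidelines (WCAG)&lt;/span&gt; ...&lt;/p&gt;</code></blockquote>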
<p>Here is a slightly different-looking example that turns out to be the same problem. It is more serious, because it is not obvious that a mistake has been made. </p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ex-macaddress" id="ex-macaddress">Example 16: Fixing a MAC address.</a>
</div><p>The first part of this MAC address has been moved to the right. This is because the characters between the Hebrew text and the 'aa' are all neutral or weak, and so they take on the base direction, and are associated with the Hebrew directional run.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=130%25&background=fff8dd&markup=%3Cp%3E%D7%9B%D7%AA%D7%95%D7%91%D7%AA%2001%3A02%3Aaa%3A04%3Abb%3A06%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/mac-wrong.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote style="text-align:right;"><span class="visualascii">Visual ASCII version:</span> aa:04:bb:06:01:02 REBMUN</blockquote>
<p>This is what we should have seen.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=130%25&background=fff8dd&markup=%3Cp%3E%D7%9B%D7%AA%D7%95%D7%91%D7%AA%20%E2%80%8E01%3A02%3Aaa%3A04%3Abb%3A06%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/mac-right.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote style="text-align:right;"><span class="visualascii">Visual ASCII version:</span> 01:02:aa:04:bb:06 REBMUN</blockquote>
<p>Again, we can produce the right ordering just by putting an LRM character immediately before the start of the number. This puts the initial digits and colons between two strong LTR characters, which associates them with the rest of the number.</p>
</div>
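<p>A sketch of the source for the corrected line in <a href="#ex-macaddress">Example 16</a>, with the Hebrew word elided and the LRM written as a numeric character reference so that it is visible. A right-to-left base direction for the page is assumed.</p>
<blockquote><code>&lt;p&gt;... &amp;#x200E;01:02:aa:04:bb:06&lt;/p&gt;</code></blockquote>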
<p>The same problem, and the same fix, can apply to telephone numbers that use certain separators.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030726.140315918" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.072236229" name="ri20030728.072236229" href="#ri20030728.072236229" shape="rect">7.3 <span class="bptitle">Adjacent, same-direction directional runs</span></a></div>
<div class="rule">When adjacent but separate directional runs with the same directionality are rendered in the wrong order, use RLM/LRM.</div><div class="description">
<div class="infotype">How to</div>
<p> If the <a class="termref" href="#term-basedirection" shape="rect">base direction</a> is right-to-left, place an RLM character (U+200F <span class="uname">RIGHT-TO-LEFT
MARK</span>) between the <a class="termref" href="#term-directionalrun" shape="rect">directional runs</a> to produce the desired result. Otherwise use an LRM character (U+200E <span class="uname">LEFT-TO-RIGHT MARK</span>).</p>
<p>These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.)</p><p>Note that the <code class="kw">dir</code> attribute is not appropriate to resolve this
case.</p>
<div class="infotype">Discussion</div>
<p> This technique is relevant when you have a list or sequence of items in text that includes two or more adjacent items with the same directionality, but a directionality that is different from the current base direction.</p>
<p>It will be easiest to describe this using some examples.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ri20060711.121727643" id="ri20060711.121727643">Example 17: A sequence of items that is incorrectly ordered (1).</a></div>
<p>In the sentence that follows, the first and second Arabic words in the list of states are in the wrong order, and the comma is misplaced.</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?fontsize=100%25&background=fff8dd&markup=The%20names%20of%20these%20states%20in%20Arabic%20are%20%D9%85%D8%B5%D8%B1%2C%20%D8%A7%D9%84%D8%A8%D8%AD%D8%B1%D9%8A%D9%86%20and%20%D8%A7%D9%84%D9%83%D9%88%D9%8A%D8%AA%20respectively." target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist1.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the names of these states in arabic are NIARHAB ,TPYGE and TIAWUK respectively.</blockquote>
<p>This is because the first two Arabic words have the same direction and are separated only by neutral and weak characters, which adopt the directionality of the surrounding characters. The two words and the characters between them therefore constitute a single right-to-left directional run. The text should look like this.</p>
<p xml:lang="he" lang="he"><a href="samples/html.php?fontsize=100%25&background=fff8dd&markup=The%20names%20of%20these%20states%20in%20Arabic%20are%20%D9%85%D8%B5%D8%B1%2C%E2%80%8E%20%D8%A7%D9%84%D8%A8%D8%AD%D8%B1%D9%8A%D9%86%20and%20%D8%A7%D9%84%D9%83%D9%88%D9%8A%D8%AA%20respectively." target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist2.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> the names of these states in arabic are TPYGE, NIARHAB and TIAWUK respectively.</blockquote>
<p>To achieve the desired effect, we need to break the directional run by adding a strongly-typed left-to-right character between the two words. This has the additional effect of changing the directionality of the comma to LTR, since it is now between two characters of differing directionality. The character we added is an invisible LRM. Here is the markup.</p>
<blockquote><code>&lt;p&gt;The names of these states in Arabic are ...,&amp;#x200E; <span dir="rtl">...</span> and ... respectively.&lt;/p&gt;</code></blockquote>
</div>
<p><a href="#ri20060711.121727643">Example 17</a> shows a case that occurs only rarely in English. Because of the likelihood of Latin text showing up in languages written with the Arabic or Hebrew scripts, this situation is more common when writing in those languages. <a href="#exAdjacentSameDir">Example 18</a> shows a typical case.</p><div class="exampleOuter">
<div class="exampleHeader"><a name="exAdjacentSameDir" id="exAdjacentSameDir">Example 18: A sequence of items that is incorrectly ordered (2).</a></div><p>In the next, right-to-left, sentence the acronym and the following number are incorrectly ordered, and the neutral parenthesis adds to the confusion.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?fontsize=120%25&background=fff8dd&markup=%3Cp%20dir%3D%22rtl%22%3E%D7%94%D7%A0%D7%97%D7%99%D7%95%D7%AA%20%D7%9C%D7%94%D7%A0%D7%92%D7%A9%D7%AA%20%D7%AA%D7%9B%D7%A0%D7%99%20%D7%90%D7%AA%D7%A8%D7%99%20%D7%90%D7%99%D7%A0%D7%98%D7%A8%D7%A0%D7%98%20%28WCAG%29%202.0%3Cp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist3.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> wcag) 2.0) SENILEDIUG YTILIBISSECCA TNETNOC BEW</blockquote>
<p>This is what we expected to see.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?fontsize=120%25&background=fff8dd&markup=%3Cp%20dir%3D%22rtl%22%3E%D7%94%D7%A0%D7%97%D7%99%D7%95%D7%AA%20%D7%9C%D7%94%D7%A0%D7%92%D7%A9%D7%AA%20%D7%AA%D7%9B%D7%A0%D7%99%20%D7%90%D7%AA%D7%A8%D7%99%20%D7%90%D7%99%D7%A0%D7%98%D7%A8%D7%A0%D7%98%20%28WCAG%29%E2%80%8F%202.0%3Cp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist4.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> 2.0 (wcag) SENILEDIUG YTILIBISSECCA TNETNOC BEW</blockquote>
<p>The problem arises because the bidi algorithm treats WCAG and 2.0 as part of the same directional run, whereas in reality the 2.0 belongs with the Hebrew title rather than with the acronym (which is the only thing that should be in parentheses). To solve the problem we add an RLM character between the two, which breaks the sequence into two directional runs that are then ordered right to left.</p>
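<p>Here is a sketch of markup that would produce the expected result. It assumes that the Hebrew title is abbreviated to an ellipsis, as in the other markup samples in this document, and that the RLM is written as the escape <code>&amp;rlm;</code>; the character itself could be used instead.</p>
<blockquote><code>&lt;p dir="rtl"&gt;... (WCAG)&amp;rlm; 2.0&lt;/p&gt;</code></blockquote>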
</div>
<p>The same problem can occur when, say, a Persian sentence ends in an English word followed by a period, and then the next sentence starts with an English word. To avoid the two words being swapped around you need to put an RLM after the end of the first sentence.</p>
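<p>A minimal sketch of that situation follows. The words <code>HTML</code> and <code>CSS</code> are hypothetical stand-ins for the two English words, the ellipses stand for the Persian text, and the RLM is shown as the escape <code>&amp;rlm;</code> placed directly after the period that ends the first sentence.</p>
<blockquote><code>&lt;p dir="rtl"&gt;... HTML.&amp;rlm; CSS ...&lt;/p&gt;</code></blockquote>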
<p>This same issue also applies to sequences of items such as checkbox or radio button labels or other lists of items on a page. See <a href="#ri20060711.125210188">Example 19</a> for an illustration.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ri20060711.125210188" id="ri20060711.125210188">Example 19: A sequence of items that is incorrectly ordered (3).</a></div>
<p>In this checklist of languages the author had intended English to be nearer the beginning of the sentence than French (i.e. further to the right), but the checkbox is treated as a directionally neutral item in the text, and so the English and French items are treated as a single directional run. As a result, they are displayed the wrong way round. To add to the confusion, it looks as if Arabic and English have been selected by the user, whereas in fact the user has selected Arabic and French!</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=120%25&background=fff8dd&markup=%3Cp%3E%D9%84%D8%BA%D8%AA%D9%8A%3A%20%3Cinput%20type%3D%22checkbox%22%20checked%3D%22checked%22%20%2F%3E%20%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%09English%20%3Cinput%20type%3D%22checkbox%22%20checked%3D%22checked%22%20%2F%3E%20Fran%C3%A7ais%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%20%D9%81%D8%A7%D8%B1%D8%B3%DB%8C%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%20%D8%A7%D8%B1%D8%AF%D9%88%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist5.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> UDRU NAISREP english français CIBARA :SEGAUGNAL</blockquote>
<p>The solution here is the same as before. Simply add an RLM character after the English label, and you get the following.</p>
<p xml:lang="he" lang="he" style="text-align:right;"><a href="samples/html.php?htmldir=rtl&fontsize=120%25&background=fff8dd&markup=%3Cp%3E%D9%84%D8%BA%D8%AA%D9%8A%3A%20%3Cinput%20type%3D%22checkbox%22%20checked%3D%22checked%22%20%2F%3E%20%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%09English%E2%80%8F%20%3Cinput%20type%3D%22checkbox%22%20checked%3D%22checked%22%20%2F%3E%20Fran%C3%A7ais%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%20%D9%81%D8%A7%D8%B1%D8%B3%DB%8C%20%3Cinput%20type%3D%22checkbox%22%20%2F%3E%20%D8%A7%D8%B1%D8%AF%D9%88%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/misorderedlist6.png" alt="Hebrew for 'Leading the Web to its full potential...'" /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> UDRU NAISREP français english CIBARA :SEGAUGNAL</blockquote></div>
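<p>The fix can be sketched in markup as follows, for a page whose base direction is right-to-left. The Arabic, Persian and Urdu labels are abbreviated to ellipses and the RLM is shown as the escape <code>&amp;rlm;</code>; both are presentational assumptions. The only change from the problem markup is the RLM after <code>English</code>.</p>
<blockquote><pre><code>&lt;p&gt;...: &lt;input type="checkbox" checked="checked" /&gt; ...
&lt;input type="checkbox" /&gt; English&amp;rlm;
&lt;input type="checkbox" checked="checked" /&gt; Français
&lt;input type="checkbox" /&gt; ...
&lt;input type="checkbox" /&gt; ...&lt;/p&gt;</code></pre></blockquote>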
<p>There are a number of scenarios where you need to look out for this issue. For example, it is common to create navigation bars from items listed in a <code class="kw">ul</code> element that are then rendered as inline items using CSS <code>display: inline</code>. For the items to be ordered correctly, you will need to end the content of each relevant <code class="kw">li</code> with an RLM or LRM (depending on the base direction of the context in which the items will be displayed).</p>
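<p>As an illustration, a navigation bar on a right-to-left page might be marked up along the following lines. The link targets and labels are hypothetical, and the inline <code class="kw">style</code> attribute merely stands in for a rule that would normally live in a style sheet. On a left-to-right page containing right-to-left labels, <code>&amp;lrm;</code> would be used instead.</p>
<blockquote><pre><code>&lt;ul dir="rtl"&gt;
  &lt;li style="display: inline"&gt;&lt;a href="/en/"&gt;English&lt;/a&gt;&amp;rlm;&lt;/li&gt;
  &lt;li style="display: inline"&gt;&lt;a href="/fr/"&gt;Français&lt;/a&gt;&amp;rlm;&lt;/li&gt;
  &lt;li style="display: inline"&gt;&lt;a href="/de/"&gt;Deutsch&lt;/a&gt;&amp;rlm;&lt;/li&gt;
&lt;/ul&gt;</code></pre></blockquote>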
<p>The same applies to lists that are created at run time using scripting, such as that shown at the top left of the page in <a href="#ri20070601.191404500">Example 20</a>, where the language links are automatically generated based on information about which language versions are supported for that page. In this case, the same mechanism and labels are used as for left-to-right pages, so the script detects the language of this page as one that is written right-to-left, and then adds an RLM to all of the labels, whatever the language.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ri20070601.191404500" id="ri20070601.191404500">Example 20: A list on a right-to-left page containing separate LTR items that must be ordered from right to left.</a></div>
<p><img src="images/rtl-list.png" alt="Picture of the top of an article written in Hebrew, showing a list of translations, using the name of the language in the native script, in Arabic, English and French." /></p></div>
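<p>The run-time approach described before this example could be sketched as follows. The <code>lang-links</code> id, the link targets and the use of <code>document.documentElement.dir</code> to detect the page direction are all assumptions made for illustration; they are not taken from the page shown in the picture.</p>
<blockquote><pre><code>&lt;ul id="lang-links"&gt;
  &lt;li&gt;&lt;a href="index.en.html"&gt;English&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="index.fr.html"&gt;Français&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;script type="text/javascript"&gt;
// Append an RLM (U+200F) to every label when the page runs right to left.
if (document.documentElement.dir === 'rtl') {
  var links = document.getElementById('lang-links').getElementsByTagName('a');
  for (var i = links.length - 1; i &gt;= 0; i--) {
    links[i].appendChild(document.createTextNode('\u200F'));
  }
}
&lt;/script&gt;</code></pre></blockquote>
<p>The RLM is added to every label, whatever its language, so the script does not need to know which labels are written in a left-to-right script.</p>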
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.072236229" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.092841697" name="ri20030728.092841697" href="#ri20030728.092841697" shape="rect">7.4 <span class="bptitle">Use Unicode control characters for PCDATA only</span></a></div>
<div class="rule">Use stateful Unicode control characters for
bidirectional control only for attribute text or element text that allows no internal markup.</div><div class="description">
<div class="infotype">How to</div>
<p> In HTML do not use the Unicode characters <a class="termref" href="#term-rle" shape="rect">RLE</a>, <a class="termref" href="#term-lre" shape="rect">LRE</a>, <a class="termref" href="#term-rlo" shape="rect">RLO</a>, <a class="termref" href="#term-lro" shape="rect">LRO</a> and <a class="termref" href="#term-pdf" shape="rect">PDF</a> where markup is available. To show the limits of embedded text with a different base direction, use the <code class="kw">dir</code> attribute, and to override the <a class="termref" href="#term-bidialgorithm" shape="rect">bidirectional algorithm</a> use the <code class="kw">bdo</code> element.</p>
<div class="note"><span class="note-head">Note: </span>Two non-embedding directional control characters provided by Unicode have no corresponding markup, so the characters themselves should be used. These are U+200F <span class="uname">RIGHT-TO-LEFT MARK</span> (RLM) and U+200E <span class="uname">LEFT-TO-RIGHT MARK</span> (LRM).</div>
<p>On the other hand, attribute text or element text that allows no internal markup, such as the <code class="kw">title</code>, <code class="kw">textarea</code> and <code class="kw">option</code> elements, cannot support use of <code class="kw">dir</code> on a <code class="kw">span</code> or other element to label part of its content.</p>
<p>In these cases you need to use Unicode characters to do the job. The following table shows correspondences between markup and Unicode control codes:</p>
<table id="unicodecontrols"><tbody>
<tr><th rowspan="1" colspan="1">Markup</th><th rowspan="1" colspan="1">Code</th><th rowspan="1" colspan="1">Codepoint</th><th rowspan="1" colspan="1">Description</th></tr>
<tr><td rowspan="1" colspan="1">dir="rtl"</td><td rowspan="1" colspan="1">RLE</td><td rowspan="1" colspan="1">U+202B</td><td rowspan="1" colspan="1">Same effect as the start tag of a block or inline element with the attribute <code class="kw">dir</code> set to <code class="kw">rtl</code>.</td></tr>
<tr><td rowspan="1" colspan="1">dir="ltr"</td><td rowspan="1" colspan="1">LRE</td><td rowspan="1" colspan="1">U+202A</td><td rowspan="1" colspan="1">Same effect as the start tag of a block or inline element with the attribute <code class="kw">dir</code> set to <code class="kw">ltr</code>.</td></tr>
<tr><td rowspan="1" colspan="1">&lt;bdo dir="rtl"&gt;</td><td rowspan="1" colspan="1">RLO</td><td rowspan="1" colspan="1">U+202E</td><td rowspan="1" colspan="1">Same effect as the start tag of a <code class="kw">bdo</code> element with the attribute <code class="kw">dir</code> set to <code class="kw">rtl</code>.</td></tr>
<tr><td rowspan="1" colspan="1">&lt;bdo dir="ltr"&gt;</td><td rowspan="1" colspan="1">LRO</td><td rowspan="1" colspan="1">U+202D</td><td rowspan="1" colspan="1">Same effect as the start tag of a <code class="kw">bdo</code> element with the attribute <code class="kw">dir</code> set to <code class="kw">ltr</code>.</td></tr>
<tr><td rowspan="1" colspan="1">end tag</td><td rowspan="1" colspan="1">PDF</td><td rowspan="1" colspan="1">U+202C</td><td rowspan="1" colspan="1">When used to terminate RLE or LRE it is equivalent to the end tag of the element carrying the <code class="kw">dir</code> attribute. When used to terminate RLO or LRO it is equivalent to the &lt;/bdo&gt; tag.</td></tr>
</tbody></table>
<p>These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.)</p>
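<p>As a rough illustration, the fragment below embeds right-to-left text in an <code class="kw">option</code> element and in an <code class="kw">alt</code> attribute, using RLE and PDF written as numeric escapes. The ellipses stand for Hebrew text and the image file name is hypothetical.</p>
<blockquote><pre><code>&lt;option&gt;&amp;#x202B;... W3C&amp;#x202C;&lt;/option&gt;
&lt;img src="logo.png" alt="&amp;#x202B;... W3C&amp;#x202C;" /&gt;</code></pre></blockquote>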
<div class="infotype">Discussion</div>
<p> The HTML 4 specification specifically <a href="http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2.3" shape="rect">warns against</a> mixing the two approaches because of the increased likelihood of improper nesting. It also recommends the use of markup because it "offers a better guarantee of document structural integrity and alleviates some problems when editing bidirectional HTML text with a simple text editor". It does not proscribe the use of Unicode bidi formatting codes.</p><p>The joint Unicode Technical Report #20 and W3C Note, <a href="http://www.w3.org/TR/unicode-xml/#Bidi" shape="rect">Unicode in XML and other Markup Languages</a> goes further. It explicitly recommends that only the markup be used. It also recommends that the Unicode bidi formatting codes should be ignored if detected in a browser context, and replaced by appropriate markup when received in an editing context.</p><p>Of course, in attribute values or for the three elements listed above markup cannot be used, so the Unicode control characters are the only option available.</p>
<p>For further discussion, see the article <a href="http://www.w3.org/International/questions/qa-bidi-controls" shape="rect">Bidi formatting codes vs. markup in (X)HTML</a>.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.092841697" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="tech-tooltips-etc" name="tech-tooltips-etc" href="#tech-tooltips-etc" shape="rect">7.5 <span class="bptitle">Tooltips, page titles and JavaScript dialog boxes</span></a></div>
<div class="rule">Consider using Unicode control characters to set the base direction around
bidirectional text that will be displayed as tooltips, page titles, or on JavaScript dialog boxes.</div>
<div class="description">
<div class="infotype">How to</div>
<div class="note"><span class="note-head">Note: </span>This technique is described as a way to work around the fact that some browsers don't always do what you would expect. Note, also, that this is only a problem for <em>bidirectional</em> text. Monodirectional text should look fine, because there is no need to correctly order a sequence of different <a class="termref" href="#term-directionalrun" shape="rect">directional runs</a>.</div>
<p> Put the Unicode characters <a class="termref" href="#term-rle" shape="rect">RLE</a> (U+202B) or <a class="termref" href="#term-lre" shape="rect">LRE</a> (U+202A) at the beginning, and <a class="termref" href="#term-pdf" shape="rect">PDF</a> (U+202C) at the end of bidirectional text that you expect to be displayed in one of the following situations:</p>
<ul><li>as a tooltip</li><li>in the page title</li><li>on a JavaScript dialog box</li></ul>
<p>These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.)</p>
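<p>The following sketch shows one way this might look in each of the three situations, for text whose base direction should be right-to-left. The ellipses stand for Hebrew or Arabic text (and, in the <code class="kw">href</code>, for a link target); in markup the control characters are written as numeric escapes, and in the script as JavaScript string escapes.</p>
<blockquote><pre><code>&lt;title&gt;&amp;#x202B;... W3C&amp;#x202C;&lt;/title&gt;
&lt;a href="..." title="&amp;#x202B;... W3C&amp;#x202C;"&gt;...&lt;/a&gt;
&lt;script type="text/javascript"&gt;
// RLE (U+202B) at the start, PDF (U+202C) at the end.
alert('\u202B... W3C\u202C');
&lt;/script&gt;</code></pre></blockquote>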
<div class="infotype">Discussion</div>
<p>Bidirectional text in page titles or JavaScript dialog boxes is typically displayed with a left-to-right base direction when the browser has a left-to-right locale. This means that directional runs in these contexts are also ordered left to right. At the time of writing, Internet Explorer and Firefox respect the base direction of the content when displaying tooltips for <code class="kw">title</code> attributes, but other browsers do not.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ex-tooltips" id="ex-tooltips">Example 21: Tooltips displaying bidirectional text.</a></div>
<p>Here is a screenshot of a tooltip in a right-to-left page that looks correct.</p>
<p><img src="images/tooltip-correct.png" alt="Picture of bidirectional text in a tooltip, with directional runs in the correct order." /></p>
<p>On a different browser, the text in the same tooltip has the two Arabic words on the wrong sides of the 'W3C' text, because a base direction of LTR has been applied.</p>
<p><img src="images/tooltip-incorrect.png" alt="Picture of bidirectional text in a tooltip, with directional runs in the incorrect order." /></p>
</div>
<p>Since markup is not effective in any of these situations, Unicode control characters can be used to establish the base direction as right-to-left. This produces the desired effect on most browsers.</p>
<p>For more information about handling of these situations in browsers and examples, follow the link to more information, below, and look at the test and results pages.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#tech-tooltips-etc" class="uanotesref">Get more information >></a></div>
</div>
</div>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.09323441" name="ri20030728.09323441" href="#ri20030728.09323441" shape="rect">7.6 <span class="bptitle">Watch out for white space</span></a></div><div class="rule">Do not leave white space at the end of inline elements that mark a directional
boundary.</div><div class="description">
<div class="infotype">How to</div>
<p>Remove all white space from before the end tag of an <a class="termref" href="#term-inlineelement" shape="rect">inline element</a> that changes the <a class="termref" href="#term-basedirection" shape="rect">base direction</a>.</p>
<div class="infotype">Discussion</div>
<p>Spaces between <a class="termref" href="#term-directionalrun" shape="rect">directional runs</a> may appear to collapse at the boundary of an embedding if there is a space just before the end tag of the inline element that surrounds the embedded text. Here is an example.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ex-missingspace" id="ex-missingspace">Example 22: An illustration of the missing space phenomenon.</a></div>
<p>The following picture shows the problem. (Look between the rightmost Hebrew word and the word 'in'.)</p>
<blockquote><a href="samples/html.php?htmldir=ltr&background=fff8dd&markup=%3Cp%3EThe%20title%20says%20%3Cspan%20dir%3D%22rtl%22%3E%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%20%3C%2Fspan%3E%20in%20Hebrew.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/missingspace-yes.png" alt="An example of text that is apparently missing a space." /></a></blockquote>
<p>Here is the source text that produced that result.</p>
<blockquote><code>&lt;p&gt;The title says &lt;span dir="rtl" lang="he"&gt;... W3C &lt;/span&gt; in Hebrew.&lt;/p&gt;</code></blockquote>
<p>Note carefully the space between the C of W3C and the &lt; of the following &lt;/span&gt;. This is what causes the effect. If you remove that space, you get the expected result, shown next.</p>
<blockquote><a href="samples/html.php?htmldir=ltr&background=fff8dd&markup=%3Cp%3EThe%20title%20says%20%3Cspan%20dir%3D%22rtl%22%3E%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%3C%2Fspan%3E%20in%20Hebrew.%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/missingspace-no.png" alt="The same text with the space displayed as expected." /></a></blockquote>
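<p>Here is a sketch of the corrected source. It is the same as the source shown earlier in this example, with the Hebrew text again abbreviated to an ellipsis, except that the space before the end tag has been removed.</p>
<blockquote><code>&lt;p&gt;The title says &lt;span dir="rtl" lang="he"&gt;... W3C&lt;/span&gt; in Hebrew.&lt;/p&gt;</code></blockquote>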
</div>
<p>Although it seems paradoxical that an extra space can cause a missing space, this is not a bug. For a detailed explanation of why this happens see the article <a href="/International/questions/qa-bidi-space" shape="rect">Bidi space loss</a>.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.09323441" class="uanotesref">Get more information >></a></div>
</div>
</div>
</div>
<div class="div1">
<h2><a name="ri20030510.102858118" id="ri20030510.102858118" shape="rect">8 Handling parentheses & other mirrored characters</a></h2>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="ri20030728.093416889" name="ri20030728.093416889" href="#ri20030728.093416889" shape="rect">8.1 <span class="bptitle">Treatment of mirrored characters</span></a></div>
<div class="rule">Treat mirrored characters as if the word <code class="kw">left</code> in the character name meant '<span class="qterm">opening</span>', and
<code class="kw">right</code> meant '<span class="qterm">closing</span>'.</div><div class="description">
<div class="infotype">How to</div>
<p>Whatever the <a class="termref" href="#term-basedirection" shape="rect">base direction</a> of the text you are authoring, always use U+0028 <span class="uname">LEFT PARENTHESIS</span> and U+0029 <span class="uname">RIGHT PARENTHESIS</span> (or their equivalents in non-Unicode but <a class="termref" href="#term-logicalorder" shape="rect">logical</a> encodings) as the opening parenthesis and closing parenthesis, respectively. Ignore the actual names of these characters. Allow the rendering algorithms to choose the appropriate shape for you.</p>
<p>The same applies to the other mirrored characters.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="ex-rtlparens" id="ex-rtlparens">Example 23: Use of parentheses in right-to-left text.</a></div>
<p>The following text runs right to left. The first parenthesis in memory is U+0028 <span class="uname">LEFT PARENTHESIS</span> and the second is U+0029 <span class="uname">RIGHT PARENTHESIS</span>. The rendering automatically produces the correct shape for the displayed glyphs.</p>
<blockquote style="text-align:right;"><a href="samples/html.php?htmldir=rtl&background=fff8dd&markup=%3Cp%3E%D7%90%D7%91%D7%A8%D7%94%D7%9D%20%28%D7%94%D7%A9%D7%9B%D7%9F%29%20%D7%91%D7%99%D7%A7%D7%A9%20%D7%90%D7%AA%20%D7%A2%D7%96%D7%A8%D7%AA%D7%99.%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/mirroring.png" alt="Example of parentheses changing shape." /></a></blockquote>
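<p>To make the stored characters explicit, the sketch below writes the two parentheses as numeric escapes; in practice you would simply type the characters. The ellipses stand for the Hebrew text, and the order shown is the order in memory.</p>
<blockquote><code>&lt;p dir="rtl"&gt;... &amp;#x28;...&amp;#x29; ...&lt;/p&gt;</code></blockquote>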
</div>
<div class="infotype">Discussion</div>
<p>Mirrored characters are used according to their Unicode semantics, rather than their actual displayed shape. There are a number of paired punctuation characters, but also some single characters. The shape of a mirrored character when displayed will automatically change according to the directional context.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="exChangingParens" id="exChangingParens">Example 24: An illustration of how parentheses change shape automatically as their directionality is changed.</a></div>
<p>The following picture shows some text before and after an RLM has been inserted alongside the first parenthesis (red). Before the RLM was added, the bidi algorithm assumed that the parenthesis was part of the LTR directional run, i.e. the Latin text. After the RLM was added, the parenthesis was between two characters with different directionality, and therefore took on the base direction, i.e. RTL. Note how the shape is automatically changed to reflect this. No change was made to the text other than inserting the RLM.</p>
<blockquote style="text-align:right;"><a href="samples/html.php?htmldir=rtl&background=fff8dd&markup=%3Cp%20dir%3D%22rtl%22%3EW3C%20%28World%20Wide%20Web%20Consortium%29%20%D7%9E%D7%A2%D7%91%D7%99%D7%A8%3Cbr%20%2F%3E%20%D7%90%D7%AA%20%D7%A9%D7%99%D7%A8%D7%95%D7%AA%D7%99%20%D7%94%D7%90%D7%A8%D7%97%D7%94%20%D7%91%D7%90%D7%99%D7%A8%D7%95%D7%A4%D7%94%20%D7%9C%20-%20ERCIM.%3C%2Fp%3E%3Cp%20dir%3D%22rtl%22%3EW3C%E2%80%8F%20%28World%20Wide%20Web%20Consortium%29%20%D7%9E%D7%A2%D7%91%D7%99%D7%A8%3Cbr%20%2F%3E%20%D7%90%D7%AA%20%D7%A9%D7%99%D7%A8%D7%95%D7%AA%D7%99%20%D7%94%D7%90%D7%A8%D7%97%D7%94%20%D7%91%D7%90%D7%99%D7%A8%D7%95%D7%A4%D7%94%20%D7%9C%20-%20ERCIM.%3C%2Fp%3E" target="text" shape="rect"><img class="codelinkl" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/mirroredparens.png" alt="Example of parentheses changing shape." /></a></blockquote>
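<p>In markup terms, the two versions can be sketched as follows. The trailing Hebrew text is abbreviated to an ellipsis and the RLM is shown as the escape <code>&amp;rlm;</code>; the second paragraph differs from the first only by that one character.</p>
<blockquote><pre><code>&lt;p dir="rtl"&gt;W3C (World Wide Web Consortium) ...&lt;/p&gt;
&lt;p dir="rtl"&gt;W3C&amp;rlm; (World Wide Web Consortium) ...&lt;/p&gt;</code></pre></blockquote>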
</div>
<p>It is unfortunate that Unicode character names cannot be changed. Otherwise, the parentheses you see in <a href="#exChangingParens">Example 24</a> would have been named <span class="uname">OPENING PARENTHESIS</span> and <span class="uname">CLOSING PARENTHESIS</span> instead, to make their use clearer.</p>
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#ri20030728.093416889" class="uanotesref">Get more information >></a></div>
</div>
</div>
</div>
<div class="div1">
<h2><a name="ri20030218.135307338" id="ri20030218.135307338" shape="rect">9 Overriding the Unicode bidirectional algorithm</a></h2>
<div class="bp">
<div class="short-name"><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a id="tech-bdo" name="tech-bdo" href="#tech-bdo" shape="rect">9.1 <span class="bptitle">Use the bdo element.</span></a></div>
<div class="rule">Use the <code class="kw">bdo</code> element to force the directionality of a sequence of inline
characters.</div>
<div class="applicability"><span class="applic-title">No UA specific notes.</span></div>
<div class="description">
<div class="infotype">How to</div>
<p> Surround the text with a <code class="kw">bdo</code> element. Set the value of the <code class="kw">dir</code> attribute on the <code class="kw">bdo</code> tag to either <code class="kw">ltr</code> or <code class="kw">rtl</code>, depending on the direction in which the characters should be displayed, as shown in the examples below.</p>
<p>If it is not possible to use markup, e.g. in the <code class="kw">title</code> element or attribute values, you will need to use Unicode control characters. </p>
<p>For examples and more detailed explanations see the discussion that follows.</p>
<div class="infotype">Discussion</div>
<p><code class="kw">bdo</code> stands for '<span class="qterm">bidirectional override</span>'. This inline element can be used to override
the Unicode <a class="termref" href="#term-bidialgorithm" shape="rect">bidirectional algorithm</a> and simply display all the characters, in one direction, in the order in which they are stored in memory.</p>
<p>This is not often required, but it can be very useful to correctly display part numbers or to show how text is stored in memory.</p>
<p> This can be done using markup around the relevant text, or by adding Unicode control characters to the text. It is recommended that markup is used in preference to control characters because control characters create states with invisible boundaries, and are difficult to manage.</p>
<p><strong class="leadin">Using markup.</strong> <a href="#exBdo">Example 25</a> shows how to use <code class="kw">bdo</code>.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="exBdo" id="exBdo">Example 25: Applying markup to override the bidirectional algorithm.</a></div>
<p>The following picture shows the same text as the bidirectional algorithm would display it, and how it would look if you use <code class="kw">bdo</code> markup to remove the effects of the bidi algorithm.</p>
<p><a href="samples/html.php?background=fff8dd&markup=%3Cp%3EIn%20the%20phrase%2C%20%22%3Cspan%20dir%3D%22rtl%22%3E%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%3C%2Fspan%3E%22%2C%20the%20order%20of%20characters%20in%20memory%20is%3A%3C%2Fp%3E%3Cp%3E%3Cbdo%20dir%3D%22ltr%22%3E%D7%A4%D7%A2%D7%99%D7%9C%D7%95%D7%AA%20%D7%94%D7%91%D7%99%D7%A0%D7%90%D7%95%D7%9D%2C%20W3C%3C%2Fbdo%3E%3C%2Fp%3E" target="text" shape="rect"><img class="codelink" src="images/codelink.gif" alt="View code." title="View code for this example." /><img src="samples/bdo.png" alt="Incorrectly ordered text, because no embedding." /></a></p>
<blockquote><span class="visualascii">Visual ASCII version:</span> <br clear="none" />
in the phrase "w3c ,YTIVITCA NOITAZILANOITANRETNI",<br clear="none" />
the order of characters in memory is:<br clear="none" />
<br clear="none" />
INTERNATIONALIZATION ACTIVITY, w3c
</blockquote>
<p>Here is the markup that would produce it.</p>
<blockquote>
<p><code>&lt;p&gt;In the phrase "&lt;span dir="rtl" lang="he"&gt;<span dir="rtl">...</span>&lt;/span&gt;" the order of characters in memory is:&lt;/p&gt;</code></p>
<p><code>&lt;p&gt;&lt;bdo dir="ltr"&gt; ... &lt;/bdo&gt;&lt;/p&gt;</code></p>
</blockquote>
</div>
<p>It is possible that the embedded text is not surrounded by markup, and you may need to add it, but note that the quotation here was already surrounded by markup. A <code class="kw">span</code> was used in order to label the language of the quotation. This is likely to be a common occurrence. In addition to marking up for language, quotations may be marked up with such things as a <code class="kw">span</code> or <code class="kw">q</code> element for styling or semantic properties. Given that the boundary of the quotation is already clearly marked, adding the <code class="kw">dir</code> attribute is simple and quick.</p>
<p>Note also, by the way, that we placed the span element <em>inside</em> the quotation marks, since these are a part of the English
text.</p>
<p><strong class="leadin">Using control characters.</strong> Where markup is not available, such as in a <code class="kw">title</code> attribute value or an <code class="kw">option</code> element, you will have to use Unicode control characters to demarcate the required range of text and force its direction. </p>
<p>To mark the beginning of the overridden text, use either U+202E <span class="uname">RIGHT-TO-LEFT OVERRIDE (RLO)</span> or U+202D <span class="uname">LEFT-TO-RIGHT OVERRIDE (LRO)</span> to force the direction. These correspond to the markup <code>&lt;bdo dir="rtl"&gt;</code> and <code>&lt;bdo dir="ltr"&gt;</code>, respectively. At the end of the overridden text, add U+202C <span class="uname">POP DIRECTIONAL FORMATTING (PDF)</span>, which corresponds to <code>&lt;/bdo&gt;</code> in markup terms.</p>
<p>These characters can be added as characters or as escapes. (But see the issues associated with escapes in the section <a class="section-ref" href="#ri20060623.095429759" shape="rect">Adding escapes to the content</a>.) </p>
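<p>For instance, the rough equivalent of wrapping the content of an <code class="kw">option</code> element in <code>&lt;bdo dir="ltr"&gt;</code> could be sketched like this, with LRO and PDF written as numeric escapes and an ellipsis standing for the text to be overridden.</p>
<blockquote><code>&lt;option&gt;&amp;#x202D;...&amp;#x202C;&lt;/option&gt;</code></blockquote>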
<div class="resourcelink"><a href="/International/docs/bp-html-bidi/uanotes#tech-bdo" class="uanotesref">Get more information >></a></div>
</div>
</div></div></div><div class="back"><div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="d2e1978" id="d2e1978" shape="rect">A Acknowledgments</a></h2>
<p>Members of the Internationalization Working Group and former GEO Working Group have contributed their time and
valuable comments to shaping these guidelines.</p></div></div></body></html>