<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Additional Requirements for Bidi in HTML</title><style type="text/css" xml:space="preserve">
.warning { background-color: #FF9; border: 1px solid red; padding: 5px; }
#unicodecontrols {
margin-left: 5%;
margin-right: 5%;
border: 1px solid teal;
}
.list-examples {
border-color: rgb(136, 136, 136);
border-width: 1px;
table-layout: fixed;
border-collapse: collapse;
font-family: monospace;
white-space: nowrap;
margin-top: .6em;
margin-bottom: 1em;
}
.list-examples td {
width: 250px;
border: 1px solid rgb(136, 136, 136);
}
</style><link rel="stylesheet" href="local.css" type="text/css" /><!-- EDIT VERSION<link rel="stylesheet" type="text/css" href="base.css" /> --><!-- DRAFT VERSION--><link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WD" /><!-- EDIT VERSION<script language="JavaScript" src="redlining.js" type="text/javascript"></script> --></head><!-- EDIT VERSION<body ondblclick="toggleRows();"> --><!-- DRAFT VERSION --><body>
<!-- EDIT VERSION
<div style="float:right; width:30%; font-size:80%;">
<div><p>This document is an editor's copy. It supports markup to identify changes from a previous version. Two kinds of changes are highlighted: <ins>new, added text</ins>, and <del>deleted text</del>.</p></div>
<div id="revisions"></div>
<div>$Id: Overview.html,v 1.6 2010/03/05 07:28:54 mike Exp $<script type="text/javascript">showButton()</script></div>
</div>-->
<div style="text-align:center;"><p>[ <a href="#contents" shape="rect">contents</a> ]</p></div>
<div class="head">
<a href="http://www.w3.org/" shape="rect"><img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home" /></a>
<h1><a name="title" id="title" shape="rect">Additional Requirements for Bidi in HTML</a></h1>
<h2><a name="w3c-doctype" id="w3c-doctype" shape="rect"> W3C Working Draft </a> 4 March 2010</h2><dl><dt>This version:</dt><dd>
<a href="http://www.w3.org/TR/2010/WD-html-bidi-20100304/" shape="rect">http://www.w3.org/TR/2010/WD-html-bidi-20100304/</a></dd><dt>Latest version:</dt><dd>
<a href="http://www.w3.org/TR/html-bidi/" shape="rect">http://www.w3.org/TR/html-bidi/</a></dd><dt>Editor:</dt><dd>Aharon Lanin, Google</dd><dt>Additional Contributors:</dt><dd>Adil Allawi, Technical Director, Diwan Software</dd><dd>Matitiahu Allouche, Bidi Architect, IBM</dd><dd>Uri Bernstein, Google</dd><dd>Douglas Davidson, Apple</dd><dd>Mark Davis, Senior I18n Architect, Google; President of the Unicode Consortium</dd><dd>Martin J. Dürst, W3C I18n Interest Group Chair</dd><dd>Asmus Freytag, President, ASMUS, Inc.</dd><dd>Richard Ishida, I18n Lead, W3C</dd><dd>Shanjian Li, Google</dd><dd>Mohamed Mohie, IBM</dd><dd>Jeremy Moskovich, Google</dd><dd>Shachar Shemesh, Lingnu Open Source Consulting</dd><dd>Gaal Yahas, Google</dd></dl>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright" shape="rect">Copyright</a> © 2007-2010 <a href="http://www.w3.org/" shape="rect"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/" shape="rect"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.eu/" shape="rect"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/" shape="rect">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer" shape="rect">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks" shape="rect">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents" shape="rect">document use</a> rules apply.</p></div><hr /><div>
<h2><a name="abstract" id="abstract" shape="rect"> Abstract</a></h2>
<p>Authoring a web app that needs to support both right-to-left and left-to-right interfaces, or to take as input and display both left-to-right and right-to-left data, usually presents a number of challenges that make it an especially laborious and bug-prone task. Some of these are due to browser bugs, but some can be traced to a gap in the specification of the bidirectional aspects of a given HTML feature. And some of these challenges could be greatly simplified by adding a few strategically placed new HTML features. This document proposes fixes for some of the most repetitive pain points.</p></div><div>
<h2><a name="status" id="status" shape="rect"> Status of this Document</a></h2>
<!--<p><strong>This document is an editors' copy that has no official standing.</strong></p>-->
<p><em>This section describes the status of this document at the time of its publication. Other documents may
supersede this document. A list of current W3C publications and the latest revision of this technical report can be
found in the
<a href="http://www.w3.org/TR/" shape="rect">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>
<p>This document contains proposals for features to be added to HTML to support bidirectional text in languages such as Arabic, Hebrew, Persian, Thaana, and Urdu.</p>
<p>This is a W3C First Public Working Draft produced by the
<a href="http://www.w3.org/International/core/" shape="rect">Internationalization Core Working Group</a>, part of the
<a href="http://www.w3.org/International/Activity" shape="rect">W3C Internationalization Activity</a>. The Working Group expects this Working Draft to become a Working Group Note.</p>
<p>Please send comments on this document to <a href="mailto:public-i18n-bidi@w3.org?subject=[html-bidi]" shape="rect">public-i18n-bidi@w3.org</a> (<a href="http://lists.w3.org/Archives/Public/public-i18n-bidi/" shape="rect">publicly archived</a>). See also the <a href="http://www.w3.org/International/docs/html-bidi-requirements/" shape="rect">latest editor's draft</a>.</p>
<p>Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/" shape="rect">5 February 2004 W3C Patent Policy</a>. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/32113/status" shape="rect">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential" shape="rect">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure" shape="rect">section 6 of the W3C Patent Policy</a>. </p>
</div>
<div class="toc">
<h2><a name="contents" id="contents" shape="rect"> </a>Table of Contents</h2>
<div class="toc1">1 <a href="#ri20030912.142608197">Introduction</a><br /><div class="toc2">1.1 <a href="#notation">Notation</a><br /></div><div class="toc2">1.2 <a href="#basedirection">Base direction: a recurrent theme</a><br /></div><div class="toc2">1.3 <a href="#terminology">Terminology</a><br /></div></div><div class="toc1">2 <a href="#newhtmlfeatures">New HTML features</a><br /><div class="toc2">2.1 <a href="#bidi-isolation">Support bidi isolation of inline element content</a><br /></div><div class="toc2">2.2 <a href="#auto-direction">Support auto-direction</a><br /></div><div class="toc2">2.3 <a href="#reporting-direction">Support reporting the chosen direction of &lt;input&gt; and &lt;textarea&gt; in form submissions</a><br /></div><div class="toc2">2.4 <a href="#image-flip">Support option for image elements to be flipped horizontally in RTL</a><br /></div></div><div class="toc1">3 <a href="#existing-features">Standardizing Bidi Aspects of Existing HTML Features</a><br /><div class="toc2">3.1 <a href="#br-as-separator">&lt;br&gt; should serve as a bidi separator</a><br /></div><div class="toc2">3.2 <a href="#newline-as-separator">Newline characters should serve as bidi separators inside &lt;pre&gt;, &lt;textarea&gt;, and script dialog text</a><br /></div><div class="toc2">3.3 <a href="#blocks-as-separators">Embedded block elements should serve as bidi separators </a><br /></div><div class="toc2">3.4 <a href="#script-dialog">Script dialog text should be displayed in the page's direction </a><br /></div><div class="toc2">3.5 <a href="#title-and-dir">&lt;title&gt; should support the dir attribute </a><br /></div><div class="toc2">3.6 <a href="#title-and-alt">title and alt attribute text should be displayed in the element's direction </a><br /></div><div class="toc2">3.7 <a href="#option">&lt;option&gt; should support the dir attribute and be displayed accordingly both in the dropdown and after being chosen</a><br /></div><div class="toc2">3.8 <a href="#set-direction">&lt;input type="text"&gt; and
&lt;textarea&gt; should support compatible "set direction" functionality</a><br /></div><div class="toc2">3.9 <a href="#remember-input-dir">When an input value is remembered, its direction should be remembered too</a><br /></div><div class="toc2">3.10 <a href="#lists">The rendering of numbering or bullets in a list should be independent of the direction of individual &lt;li&gt; elements</a><br /></div><div class="toc2">3.11 <a href="#vertical-scrollbar">A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction</a><br /></div><div class="toc2">3.12 <a href="#vertical-scrollbar-body">The vertical scrollbar of an element below &lt;body&gt; should be on the "end" side relative to the element's direction</a><br /></div></div><div class="toc1">4 <a href="#appendix-a"> Appendix A: The Word-Count Direction Estimation Algorithm</a><br /></div>
</div>
<hr />
<div class="body">
<div class="div1">
<h2><a name="ri20030912.142608197" id="ri20030912.142608197" shape="rect">1 Introduction</a></h2>
<p>Authoring a web app that needs to support both right-to-left and left-to-right interfaces, or to take as input and display both left-to-right and right-to-left data, usually presents a number of challenges that make it an especially laborious and bug-prone task. Some of these are due to browser bugs, but some can be traced to a lapse in the specification of the bidirectional aspects of a given HTML feature. And some of these challenges could be greatly simplified by adding a few strategically placed new HTML features. </p>
<p>This document proposes fixes for some of the most repetitive pain points.</p>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="notation" id="notation" shape="rect">1.1 Notation</a></h3>
<p>All examples in this document are in "fake bidi", i.e. use uppercase English to represent RTL characters and lowercase English for LTR characters. They will usually first give the characters in the order in which they are stored in memory, and then in the visual order in which they appear when displayed. For example, we would say: When displayed, "RTL SENTENCE" comes out as "ECNETNES LTR".</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="basedirection" id="basedirection" shape="rect">1.2 Base direction: a recurrent theme</a></h3>
<p>Much of this proposal deals with determining and declaring the base direction of text. This is because text displayed in the wrong direction is often garbled. </p>
<p>For example, "10 main st." is displayed in RTL as</p>
<div class="exampleOuter">
<p>.main st 10</p>
</div>
<p>and "MAKE html WORK FOR YOU" is displayed in LTR as </p>
<div class="exampleOuter">
<p>EKAM html UOY ROF KROW</p>
</div>
<p>instead of the intended </p>
<div class="exampleOuter">
<p>UOY ROF KROW html EKAM</p>
</div>
<p>and is quite unreadable. </p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="terminology" id="terminology" shape="rect">1.3 Terminology</a></h3>
<dl><dt class="term"><a name="term-basedirection" id="term-basedirection" shape="rect">base direction</a></dt><dd class="definition">The overall directional context, LTR or RTL, in which text is displayed, and which often affects the way it is displayed when using the Unicode Bidirectional Algorithm.</dd><dt class="term"><a name="term-computeddirection" id="term-computeddirection" shape="rect">computed direction</a></dt><dd class="definition">In HTML, the base direction of an element can be specified using the dir attribute, but can also be inherited from an ancestor element, as well as set using CSS. We call the "bottom line" base direction (LTR or RTL) applied to an element after considering all these factors the computed direction.</dd><dt class="term"><a name="term-inlineelement" id="term-inlineelement" shape="rect">inline element</a></dt><dd class="definition">Inline elements are elements such as &lt;span&gt;, &lt;em&gt;, &lt;strong&gt;, &lt;a&gt;, etc. The opposite of an inline element is a block element, such as &lt;p&gt;, &lt;div&gt;, &lt;ol&gt;, &lt;ul&gt;, &lt;blockquote&gt;, &lt;body&gt;, etc.</dd><dt class="term"><a name="term-inlinetext" id="term-inlinetext" shape="rect">inline text</a></dt><dd class="definition">Text that lies wholly within a single block element, i.e. text within a paragraph. Inline text may include inline markup.</dd><dt class="term"><a name="term-lre" id="term-lre" shape="rect">LRE</a></dt><dd class="definition">A short name for the Unicode character U+202A <span class="uname">LEFT-TO-RIGHT EMBEDDING</span>. This invisible control character is used to begin a range of text with an embedded base direction of left-to-right.</dd><dt class="term"><a name="term-lro" id="term-lro" shape="rect">LRO</a></dt><dd class="definition">A short name for the Unicode character U+202D <span class="uname">LEFT-TO-RIGHT OVERRIDE</span>.
This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from left to right.</dd><dt class="term"><a name="term-ltr" id="term-ltr" shape="rect">LTR</a></dt><dd class="definition">Left-to-right.</dd><dt class="term"><a name="term-pdf" id="term-pdf" shape="rect">PDF</a></dt><dd class="definition">A short name for the Unicode character U+202C <span class="uname">POP DIRECTIONAL FORMATTING</span>. This invisible control character is used to signal the end of a range of text that was started with one of the RLE, LRE, RLO or LRO characters.</dd><dt class="term"><a name="term-rle" id="term-rle" shape="rect">RLE</a></dt><dd class="definition">A short name for the Unicode character U+202B <span class="uname">RIGHT-TO-LEFT EMBEDDING</span>. This invisible control character is used to begin a range of text with an embedded base direction of right-to-left.</dd><dt class="term"><a name="term-rlo" id="term-rlo" shape="rect">RLO</a></dt><dd class="definition">A short name for the Unicode character U+202E <span class="uname">RIGHT-TO-LEFT OVERRIDE</span>. This invisible control character is used to begin a range of text that ignores the Unicode bidirectional algorithm and arranges characters from right to left.</dd><dt class="term"><a name="term-rtl" id="term-rtl" shape="rect">RTL</a></dt><dd class="definition">Right-to-left.</dd><dt class="term"><a name="term-uba" id="term-uba" shape="rect">UBA</a></dt><dd class="definition">The <a href="http://unicode.org/reports/tr9/" class="external text" title="http://unicode.org/reports/tr9/" shape="rect">Unicode Bidi Algorithm</a>, which determines the visual order in which bidi text is to be displayed, given a <i>base direction</i> that is either set explicitly or "guessed" from the text itself using a standard algorithm. In HTML, the UBA is always passed a specific base direction, and never asked to guess it.</dd></dl>
</div>
</div><div class="div1">
<h2><a name="newhtmlfeatures" id="newhtmlfeatures" shape="rect">2 New HTML features</a></h2>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="bidi-isolation" id="bidi-isolation" shape="rect">2.1 Support bidi isolation of inline element content</a></h3>
<h4 id="bidi-isolation-background">Background</h4>
<p>The UBA's rendering of a piece of text depends not only on the explicitly declared direction in which it appears (e.g. the dir attribute value on the parent element) and the characters it contains, but also on the implicit directional properties of the characters preceding and following it. For example, in an RTL context, "john: " is displayed as "john: " when followed by "susan" (i.e. "john: susan"), but as " :john" when followed by "SUSAN" (i.e. "NASUS :john") - note the change in colon positioning.</p>
<p>The bidi formatting characters LRO, RLO, LRE, RLE, and PDF have particularly strong influence on what surrounds them. For example, RLO makes all text up to the next PDF behave as RTL characters, making "hello" display as "olleh".</p>
<h4 id="bidi-isolation-problem">The Problem</h4>
<p>Most documents contain a large number of self-contained entities whose content must not influence the directional rendering of what precedes or follows them. Furthermore, the document author naively expects such an entity to be displayed visually between what precedes it and what follows it, laid out in the current direction: preceding - entity - following.</p>
<p>Examples of such entities are legion: the title of an article, the name of an author, a description, etc.</p>
<p>As long as the entire document and all the entities it contains are of uniform direction, there is no problem. Arbitrary-direction entities also don't cause a problem when they are displayed as a separate block element (which is treated as a separate "paragraph" in UBA terms). However, when an inline entity is allowed to contain text of arbitrary direction, bad things start happening, and existing HTML mark-up is powerless to stop it.</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="purplepizza1" id="purplepizza1">Example 1: </a></div>
<p><code>PURPLE PIZZA - 3 reviews</code></p>
<p>The entity here is the RTL name of a restaurant, being displayed in an LTR context. The intent is to have it appear as</p>
<p><code>AZZIP ELPRUP - 3 reviews </code></p>
<p>However, it is actually displayed as </p>
<p><code>3 - AZZIP ELPRUP reviews</code></p>
<p>and is effectively unreadable. This happens because according to the UBA, a number "sticks" to the strong-directional run preceding it. </p>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="purplepizza2" id="purplepizza2">Example 2: </a></div>
<p><code>&lt;span dir="rtl"&gt;PURPLE PIZZA&lt;/span&gt; - 3 reviews</code></p>
<p>This is a common first attempt at fixing <a href="#purplepizza1">Example 1</a>. In fact, wrapping opposite-direction text in mark-up indicating its direction is generally a good idea, and is in many cases essential. Here, however, it makes no difference at all - the result is exactly the same as in <a href="#purplepizza1">Example 1</a>.</p>
<p> That the "fix" does not work is, in fact, to be expected: the &lt;span dir="rtl"&gt; only explicitly states the direction of the text inside it, and does not say anything at all about what surrounds it.
</p>
<p> In fact, the currently recommended way to fix our purple pizza is not to use mark-up at all, but to insert an LRM character (U+200E, &amp;lrm;) after the PURPLE PIZZA. This prevents the RTL text from "sticking" to the number that happens to follow it. If the context had been RTL and the entity LTR, the same magic would be worked by the RLM character (U+200F, &amp;rlm;). The same technique is supposed to be applied to <a href="#use-css">Example 3</a> and <a href="#first-novel">Example 4</a> below. Unfortunately, using LRM/RLM marks like this is less than ideal, for reasons we will discuss below.</p>
</div>
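<p>As a minimal sketch of the LRM workaround just described, and assuming the surrounding context is LTR as in <a href="#purplepizza1">Example 1</a>, the mark-up might look as follows. Keeping the <code>&lt;span dir="rtl"&gt;</code> wrapper and the enclosing <code>&lt;p dir="ltr"&gt;</code> is an illustrative choice of this sketch; it is the <code>&amp;lrm;</code> after the restaurant name that keeps the number from "sticking" to it, so the line should display as <code>AZZIP ELPRUP - 3 reviews</code>.</p>
<pre xml:space="preserve">&lt;!-- LTR context; the entity is the RTL restaurant name --&gt;
&lt;p dir="ltr"&gt;&lt;span dir="rtl"&gt;PURPLE PIZZA&lt;/span&gt;&amp;lrm; - 3 reviews&lt;/p&gt;</pre>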
<div class="exampleOuter">
<div class="exampleHeader"><a name="use-css" id="use-css">Example 3: </a></div>
<p><code>USE css (&lt;span dir="ltr"&gt;position:relative&lt;/span&gt;).</code></p>
<p>The entity here is a code snippet ("position:relative"), marked with a span and its LTR direction, to be displayed in an RTL context. Despite the RTL context, it is preceded by the LTR word "css" because technical terms and brand names often appear in their original Latin script in RTL text. The intent is to have it appear as </p>
<pre style="text-align: right;" xml:space="preserve">.(position:relative) css ESU</pre>
<p>However, it is actually displayed as </p>
<pre style="text-align: right;" xml:space="preserve">.(css (position:relative ESU</pre>
<p>This happens because the LTR word "css" before the entity "sticks" to the LTR entity according to normal UBA rules.</p>
</div>
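<p>The RLM counterpart of the workaround, applied to <a href="#use-css">Example 3</a>, might look like the sketch below. It assumes an RTL context and places an <code>&amp;rlm;</code> on each side of the LTR entity; that placement, and the enclosing <code>&lt;p dir="rtl"&gt;</code> that makes the assumed context explicit, are choices of this sketch. The RLM before the entity is what should stop the LTR word "css" from "sticking" to it.</p>
<pre xml:space="preserve">&lt;!-- RTL context; the entity is the LTR code snippet --&gt;
&lt;p dir="rtl"&gt;USE css (&amp;rlm;&lt;span dir="ltr"&gt;position:relative&lt;/span&gt;&amp;rlm;).&lt;/p&gt;</pre>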
<div class="exampleOuter">
<div class="exampleHeader"><a name="first-novel" id="first-novel">Example 4: </a></div>
<p><code>documents &gt; &lt;span dir="rtl"&gt;MY FIRST NOVEL&lt;/span&gt; &gt; &lt;span dir="rtl"&gt;CHAPTER 1&lt;/span&gt;</code></p>
<p>The entities here are folder names displayed in "breadcrumbs" in an LTR context, where two of the folder names happen to be RTL. The intent is to have it displayed as</p>
<pre xml:space="preserve">documents &gt; LEVON TSRIF YM &gt; 1 RETPAHC</pre>
<p>However, it is actually displayed as</p>
<pre xml:space="preserve">documents &gt; 1 RETPAHC &lt; LEVON TSRIF YM</pre>
<p>i.e. with the RTL folder names visually in the wrong order (and the arrow between them reversed). This happens because according to the UBA, the two RTL entities "stick" together, whether or not they are wrapped in &lt;span&gt;s as shown here.</p>
</div>
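<p>For the breadcrumbs of <a href="#first-novel">Example 4</a>, a sketch of the LRM workaround in an assumed LTR context follows. Putting an <code>&amp;lrm;</code> after each RTL folder name, and the enclosing <code>&lt;p dir="ltr"&gt;</code>, are choices made for this illustration; the LRM between the two names is the one that should keep them from "sticking" together.</p>
<pre xml:space="preserve">&lt;!-- LTR context; two of the folder names are RTL entities --&gt;
&lt;p dir="ltr"&gt;documents &amp;gt; &lt;span dir="rtl"&gt;MY FIRST NOVEL&lt;/span&gt;&amp;lrm; &amp;gt; &lt;span dir="rtl"&gt;CHAPTER 1&lt;/span&gt;&amp;lrm;&lt;/p&gt;</pre>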
<div class="exampleOuter">
<div class="exampleHeader"><a name="joe-hacker" id="joe-hacker">Example 5: </a></div>
<p><code>joe hackerRLO: overdrawn</code></p>
<p>The entity here is the name of a user, as chosen by a malicious user to include the invisible RLO character (U+202E), followed by a status string. Obviously, the user's name is "<a href="http://en.wikipedia.org/wiki/HTML#Character_and_entity_references" class="external text" title="http://en.wikipedia.org/wiki/HTML#Character_and_entity_references" shape="rect">HTML-escaped</a>" when displayed, but this does not do anything to the RLO character. The outcome is that this is displayed as </p>
<pre xml:space="preserve">joe hacker: nwardrevo</pre>
<p>where the entity influenced the display of what follows it, reversing its characters. This has security implications and has <a href="http://simonwillison.net/2009/Mar/15/bidi/#comments" class="external text" title="http://simonwillison.net/2009/Mar/15/bidi/#comments" shape="rect">surfaced on blogs</a>. On the other hand, it does not even have to be due to malicious use, only to the inadvertently bad trimming of an overly-long string.
</p>
</div>
<p> Currently, there is no reliable way to deal with <a href="#purplepizza1">Example 1</a> to <a href="#first-novel">Example 4</a> using mark-up, except by redundantly marking an entity's surroundings with the base direction, which is counterintuitive and painful to implement. The usual way to deal with <a href="#purplepizza1">Example 1</a> to <a href="#first-novel">Example 4</a> is to surround an entity in either LRM or RLM characters - LRM in an LTR context, and RLM in an RTL context. This prevents the entity from "sticking" to what precedes or follows it.
</p><p> However, using the LRM/RLM technique has several disadvantages, particularly in a web application:
</p>
<ul><li>
<p> The LRM or RLM is being used to address a layout issue that reflects the structure of the document, i.e. to indicate the boundary of an entity. There should be a way to express it in mark-up, not magic Unicode characters. In fact, the entity is typically already surrounded by an element that either gives it style or indicates its direction; why can't the element itself be used to indicate an entity?
</p>
</li><li>
<p> In a web application, having to add logic to choose between an LRM and an RLM is a pain, especially when the existing code layer does not happen to have easy access to the context's direction.
</p>
</li><li>
<p> Not all search engines (e.g. the browser's own CTRL-F) are smart enough to ignore invisible Unicode characters such as LRM and RLM. This makes a document using such characters less searchable: the user searches for "A B", but does not find "A B" because there is an invisible character between them. Or, conversely, the user copy-pastes text - accidentally including the LRM or RLM character - from the page into some search box, and does not get hits in any other documents because they do not contain the LRM/RLM. In a manually-authored HTML document using a few judiciously placed LRMs/RLMs, such problems do not amount to much. In a web application, however, the simplest way to use this technique is to do it wholesale, around every inserted entity. This results in very real searchability problems. Avoiding them requires implementing quite complicated logic to decide whether the LRM or RLM is really necessary.
</p>
</li></ul>
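<p> For reference, the LRM/RLM technique discussed above can be sketched as a small page-script or server-side helper. This is only an illustration: the function name wrapWithMarks and its contextDir parameter are hypothetical, and the caller is assumed to already know the direction of the surrounding context, which is exactly the knowledge the second point above says is often hard to come by.
</p>

```javascript
// Unicode directional marks, written as escapes so they stay visible in source.
const LRM = '\u200E'; // LEFT-TO-RIGHT MARK
const RLM = '\u200F'; // RIGHT-TO-LEFT MARK

// Hypothetical helper: surround an inserted string with the mark that matches
// the direction of the context it is inserted into ('ltr' or 'rtl').
function wrapWithMarks(entity, contextDir) {
  if (contextDir !== 'ltr' && contextDir !== 'rtl') {
    throw new Error('contextDir must be "ltr" or "rtl"');
  }
  const mark = contextDir === 'rtl' ? RLM : LRM;
  return mark + entity + mark;
}
```

<p> Note that the returned string is two characters longer than the entity itself, which is the source of the searchability problem described in the third point.
</p>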
<p> Furthermore, LRMs and RLMs do not help in <a href="#joe-hacker">Example 5</a>, nor is there any mark-up to solve it. The only current way to deal with it is for the application either to remove any LRE, RLE, LRO, RLO, or PDF characters from the entity, or to remove any unmatched PDFs and then append any missing ones at its end. This is painful and rarely implemented. </p>
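<p> A minimal sketch of the two clean-up options just described, assuming the entity is available as a plain string before it is HTML-escaped. The function names are hypothetical; the code points are those of LRE (U+202A), RLE (U+202B), PDF (U+202C), LRO (U+202D), and RLO (U+202E).
</p>

```javascript
const OPENERS = new Set(['\u202A', '\u202B', '\u202D', '\u202E']); // LRE, RLE, LRO, RLO
const PDF = '\u202C';

// Option 1: remove every explicit embedding, override, and PDF character.
function stripBidiControls(text) {
  return text.replace(/[\u202A-\u202E]/g, '');
}

// Option 2: drop PDFs that have no matching opener, then append the PDFs
// that are missing at the end, so nothing leaks out of the string.
function balanceBidiControls(text) {
  let depth = 0; // number of openers not yet closed
  let out = '';
  for (const ch of text) {
    if (OPENERS.has(ch)) {
      depth += 1;
      out += ch;
    } else if (ch === PDF) {
      if (depth > 0) {
        depth -= 1;
        out += ch;
      }
      // An unmatched PDF is silently dropped.
    } else {
      out += ch;
    }
  }
  return out + PDF.repeat(depth);
}
```

<p> Applied to the user name in <a href="#joe-hacker">Example 5</a>, the second function appends a single PDF after the RLO, so the status string that follows the name is no longer reversed.
</p>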
<h4 id="bidi-isolation-solution">Proposed solution</h4>
<p>Add an attribute to HTML that makes an inline element directionally isolated from its surroundings. A tentative name for the new attribute might be bdi, for "bidirectional isolate", as in <span dir="rtl" bdi="yes">. The attribute would take three values:
</p>
<ul><li>
<p> no, specifying no special isolation. This is the default, except in special cases indicated in the sections below.
</p>
</li><li>
<p> yes, specifying isolation.
</p>
</li><li>
<p> bdi, a synonym for yes. Allows specifying the attribute without a value for conciseness, e.g. <span dir="rtl" bdi>.
</p>
</li></ul>
<p> Applications generating HTML would use bdi routinely on elements that wrap an inserted data string (usually in conjunction with indicating its direction using the dir attribute).
</p><p> The exact definition of the effects of bdi="yes" on an element:
</p>
<ul><li><p>The element, even when empty, is to be displayed as if it were surrounded with strong-directional characters of the last explicit embedding level within which it appears. In most cases, the last explicit embedding level is simply the parent element's computed direction. (The exception is when the element is between LRE/RLE and PDF characters, which is discouraged by the W3C.) For example, take:</p>
<pre xml:space="preserve"><span dir="rtl" bdi="yes">PURPLE PIZZA</span> - 3 reviews</pre>
<p>In a dir="ltr" element, it should be displayed the same as</p>
<pre xml:space="preserve">&lrm;<span dir="rtl">PURPLE PIZZA</span>&lrm; - 3 reviews</pre>
<p>i.e. as</p>
<pre xml:space="preserve">AZZIP ELPRUP - 3 reviews</pre>
</li><li>
<p> The "imaginary" LRM/RLM characters must not actually appear in the output, e.g. for the purposes of a "copy to clipboard" operation.
</p>
</li><li>
<p> The effects of LRE, RLE, LRO, RLO, and PDF characters appearing in the element will never extend beyond the element: unbalanced PDFs will be ignored, and missing PDFs will be assumed at the close of the element.
</p>
</li></ul>
<p> The use of "imaginary" characters by higher level protocols such as HTML is explicitly allowed by the UBA's section 4.3, HL5 ("Provide artificial context").
</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="auto-direction" id="auto-direction" shape="rect">2.2 Support auto-direction</a></h3>
<h4 id="auto-direction-problem">The problem</h4>
<p>Many web applications with an RTL-language interface or an RTL-language data source need to display and accept as input both LTR and RTL data. Furthermore, the application often does not know and cannot control the direction of the data.</p>
<p>
For example, an online book store that carries books in many languages needs to display the original book titles regardless of the language of the user interface. Thus, a Hebrew or Arabic book title may appear in an English interface, and vice-versa. The direction of the title may be available as a separate attribute, but more likely it isn't, and needs to be guessed. The safest guess is on the basis of the characters making up the title.
</p><p> If this site also allows user comments or reviews, it is unreasonable to limit these to one language. For example, for an English book listed in an Arabic or Hebrew interface, it is perfectly reasonable to get comments both in English and in the book's language. The application does not know what the user will type until the user types it.
</p><p> Unless opposite-direction data is explicitly declared as such, it is often displayed garbled as shown above. Perhaps even worse, the user experience of typing opposite-direction data is quite awkward due to the cursor and punctuation jumping around during data entry and difficulty in selecting text.
</p><p> Currently, avoiding such problems requires that the application implement logic to estimate the data's direction - and use it in the many places where it is needed. Such logic is not easy to implement, since it requires using long tables of strong-RTL and/or strong-LTR characters, and becomes non-obvious when a string contains both. For an input element, where the direction must be automatically set as the user types the text, there is no choice but to implement the estimation logic in page scripts, thus requiring even more advanced programming skills. As a result, few applications wind up doing direction estimation, and a poor user experience is quite common for web pages mixing LTR data in an RTL interface or vice-versa.
</p>
<h4 id="auto-direction-not-problem">Not the problem</h4>
<p>The issue at hand is with text data that is basically compatible with the UBA. That is, given the correct base direction, applying the UBA will display the text intelligibly. The only problem is that we don't know the correct base direction.
</p><p> This is distinct from a different, harder issue: text mixing LTR and RTL without using the formatting characters necessary to display it intelligibly using standard UBA rules. Whichever base direction is applied, the text will not be displayed as intended. Examples of such data are not as rare as one might think:
</p>
<ul><li> Path or URL that includes consecutive RTL folder or file names (one would expect the path components to proceed in a uniform direction)
</li><li> "Tweets" that include both an RTL phrase and LTR parts like @name and a URL
</li><li> An RTL sentence that attempts to give a phone number with spaces in it
</li><li> Sentence containing an opposite-direction quotation that starts with a number or ends with punctuation
</li><li> Multi-paragraph text containing both LTR and RTL paragraphs, e.g. an RTL restaurant review followed by restaurant address in Latin script.
</li></ul>
<p> Such text does not include the Unicode formatting characters that could fix its display either because it must conform to a syntax that would misinterpret such characters, or simply because it was created by a human user who does not know such characters exist, much less how to enter or use them. Given the text's syntax, or at least a set of patterns for the problematic parts, the text could, in theory, be parsed into its constituent parts, and formatting characters added to make the text display correctly.
</p><p> Although this is a painful real-world problem, it is unrelated to HTML per se and currently lacks a mature solution. We are not proposing one here.
</p>
<h4 id="auto-direction-algorithms">Estimation algorithms (skip if not interested)</h4>
<p>A data string's direction is obvious when it contains either LTR or RTL characters, but not both. The following heuristic algorithms have been used when the data does contain both LTR and RTL characters:
</p>
<ul><li> First character with strong direction. This is the algorithm specifically mandated by the UBA for choosing a paragraph's base direction (unless overridden by a higher-level protocol, which is what currently always happens in HTML). This has the advantage of being easy to understand (and even surmise) for the user, and text is usually more readable when starting with a word in its overall direction. Nevertheless, it is not uncommon for an RTL phrase to start with an LTR word like a brand name or a technical term, in which case this algorithm fails.
</li><li> Does the string contain any RTL characters? This fails for LTR text that includes some RTL, which is quite uncommon, but not unheard of.
</li><li> Word count: does the percentage of RTL words exceed some threshold value? Works very well, but also proves unintuitive to the user in some circumstances.
</li></ul>
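<p> The three heuristics above can be sketched roughly as follows. This is an illustration under loud assumptions, not a reference implementation: the two character classes are a small approximation of the strong-RTL and strong-LTR sets (the real Unicode tables are much longer), the function names are hypothetical, and the 0.4 threshold in the word-count function is an arbitrary placeholder rather than a value taken from this proposal.
</p>

```javascript
// Approximate character classes (assumption): Hebrew, Arabic and neighbouring
// RTL blocks versus basic and extended Latin letters.
const RTL_CHAR = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
const LTR_CHAR = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8]/;

function strongDirection(ch) {
  if (RTL_CHAR.test(ch)) return 'rtl';
  if (LTR_CHAR.test(ch)) return 'ltr';
  return null; // neutral or weak character
}

// First character with strong direction.
function firstStrong(text) {
  for (const ch of text) {
    const dir = strongDirection(ch);
    if (dir) return dir;
  }
  return null;
}

// Does the string contain any RTL characters?
function anyRtl(text) {
  if (RTL_CHAR.test(text)) return 'rtl';
  return LTR_CHAR.test(text) ? 'ltr' : null;
}

// Word count: does the share of RTL words exceed a threshold?
// Each word is classified by its first strong character.
function wordCount(text, rtlThreshold = 0.4) {
  let rtlWords = 0;
  let ltrWords = 0;
  for (const word of text.split(/\s+/)) {
    const dir = firstStrong(word);
    if (dir === 'rtl') rtlWords += 1;
    else if (dir === 'ltr') ltrWords += 1;
  }
  const total = rtlWords + ltrWords;
  if (total === 0) return null; // no strong-directional words at all
  return rtlWords / total > rtlThreshold ? 'rtl' : 'ltr';
}
```

<p> For a Hebrew phrase that opens with a Latin-script term such as "HTML", this sketch's firstStrong returns "ltr" while anyRtl and wordCount return "rtl", which is the first-strong failure case noted in the first item.
</p>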
<p> Different approaches have been preferred in different contexts: first-strong for search boxes, any-RTL for advertisements, and word-count for longer texts like e-mails. Nevertheless, it is worth pointing out that the choice of the precise algorithm is an optimization. For most real-world data strings, all these estimation algorithms will give the same correct result.
</p><p> In addition to the basic algorithm choice, there are also several side issues:
</p>
<ul><li> Sometimes, there may be good reason to want to bias the estimation to a particular direction unless the actual value clearly indicates otherwise. For example, the value may be one title from a feed whose contents are, generally speaking, in a particular known language.
</li><li> When the value contains no strong-directional characters, it usually seems best to display it in the same direction as its surroundings, i.e. inherit the direction.
</li><li> How should those parts of the string bracketed in LRE / RLE / LRO / RLO and PDF characters be treated? It is possible to simply ignore such bracketing characters, and this is actually specified by the UBA for its first-strong algorithm. Another possibility is to ignore them together with the substrings they bracket. The rationale for the latter approach is that the direction we estimate for the whole string will not be applied to the bracketed substrings anyway. In fact, if part of a string is explicitly declared LTR, it is usually because the overall string is RTL, and vice-versa. On the other hand, if the string contains no strong-directional characters outside the declared substrings, and all the declared substrings give the same direction, then it might be best to estimate the string overall to be of the same direction as the declared substrings.
</li></ul>
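<p> Two of the side issues above, inheriting the surrounding direction for neutral content and ignoring explicitly bracketed substrings, can be illustrated with a first-strong variant. This is only a sketch under assumptions: the function names are hypothetical, the character classes are a rough approximation of the Unicode strong-directional sets, and the final case in the last item (estimating from the declared substrings when nothing else is available) is deliberately not handled.
</p>

```javascript
const EMBED_OPENERS = new Set(['\u202A', '\u202B', '\u202D', '\u202E']); // LRE, RLE, LRO, RLO
const PDF = '\u202C';
// Approximate strong-directional classes (assumption, not the full tables).
const RTL_CHAR = /[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]/;
const LTR_CHAR = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8]/;

// Remove the bracketing characters together with the substrings they bracket.
function stripEmbeddedRuns(text) {
  let depth = 0;
  let out = '';
  for (const ch of text) {
    if (EMBED_OPENERS.has(ch)) {
      depth += 1;
    } else if (ch === PDF) {
      if (depth > 0) depth -= 1;
    } else if (depth === 0) {
      out += ch; // keep only text outside any explicit embedding
    }
  }
  return out;
}

// First-strong estimation over the unbracketed text, falling back to the
// direction of the surroundings when no strong character is found.
function estimateDirection(text, inheritedDir) {
  for (const ch of stripEmbeddedRuns(text)) {
    if (RTL_CHAR.test(ch)) return 'rtl';
    if (LTR_CHAR.test(ch)) return 'ltr';
  }
  return inheritedDir;
}
```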
<h4 id="auto-direction-solution">Proposed solution</h4>
<p>Make simple direction estimation functionality available in the browser by allowing the dir attribute to take on new values indicating that the user agent is responsible for estimating the direction of the element's contents.
</p><p> One such dir attribute value would specify using the word-count algorithm, defined and discussed in <a href="#appendix-a" shape="rect">Appendix A</a>. Another would specify the first-strong algorithm, as defined by the UBA.
</p><p> Specifying one of these direction-estimation values for the dir attribute would direct the user agent to examine the element's text content and estimate whether it is LTR or RTL using well-defined heuristics based on the inherent direction of the characters (as defined by the Unicode standard). The result it will return for text mixing LTR and RTL characters, although well-defined, may or may not be correct as judged by a human user.
</p><p> Although specifying direction estimation would be allowed on any element, it is primarily intended for elements wrapping a "single-origin" piece of text, e.g. a text input. The more complex the element's structure, the higher the chances that it mixes LTR and RTL content, and the lower the chances that an estimation algorithm will succeed in displaying the contents intelligibly. It is meaningless to use an estimation algorithm on content mixed to the extent that it is unintelligible in both LTR and RTL (when displayed by standard UBA rules).
</p><p> The new dir values should also be added to the CSS direction property's repertoire for completeness. However, since <a href="http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030728.092130948" class="external text" title="http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030728.092130948" shape="rect">W3C guidelines</a> recommend that direction be declared using the dir attribute, not CSS, this is first and foremost an HTML issue.
</p><p> Further details:
</p>
<ul><li> The content to be examined by a direction estimation algorithm is all descendant text nodes, visited in "in-fix" order, except for those under a descendant element with a unicode-bidi style other than "normal", e.g. a <bdo> element or an element with a dir attribute (whatever the value).
</li><li> No attempt will be made to exclude "hidden" content, whether using display:none style or any other invisibility technique.
</li><li> When none of the examined content contains strong-directional characters, the parent element's computed direction will be used.
</li><li> Elements with dir values specifying estimation will be considered to have bdi="yes" by default.
</li><li> For backward compatibility, dir values specifying estimation will not be the default for any currently defined elements except as defined below.
</li><li> Ideally, the current computed direction of an auto-directional element should be exposed to scripts as an element property, e.g. computedDirection.
</li></ul>
<h4 id="auto-direction-issues">Open issues</h4>
<ul><li> What dir attribute values should be used to specify the word-count and first-strong estimation algorithms? One possibility would be simply "word-count" and "first-strong". Or should they both start with the word "auto", i.e. "auto-word-count" and "auto-first-strong"?
</li><li> Is it truly essential to support both the word-count and the first-strong algorithm? Using just one algorithm would reduce confusion; there would be just one new dir attribute value, the easy-to-understand "auto".
</li><li> Should direction estimation be the default for <input type="text"> and <textarea> elements? Although it would be very useful in most cases, it is not backward compatible. The best way to come to a decision might be to discuss specific examples where estimation might prove harmful, and to judge their importance.
</li></ul>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="reporting-direction" id="reporting-direction" shape="rect">2.3 Support reporting the chosen direction of <input> and <textarea> in form submissions</a> </h3>
<h4 id="reporting-direction-background">Background</h4>
<p>In many applications, it is necessary to allow the user to enter text of either direction into a given <input type="text"> or <textarea> element, regardless of the page's direction.</p>
<p>Although algorithms for estimating the direction of a string exist (and hopefully will be exposed by the browser as described in <a href="#auto-direction">Section 2.2 </a> ), they remain heuristic for mixed-script strings.</p>
<p>As a result, all major browsers provide some way for the user to explicitly set the direction of an <input type="text"> or <textarea> element, e.g. via keyboard shortcuts, so the text being entered by the user is displayed correctly.</p>
<h4 id="reporting-direction-problem">The problem</h4>
<p>Once the text entered by the user has been submitted to the server, the direction in which it was displayed in the page is lost, unless explicitly added to the form as an invisible input by page scripts. However, scripts are not available in all environments, e.g. e-mail forms. As a result, in such an environment, the application is forced to guess at the direction of a string submitted by the user, will sometimes get it wrong, and as a result display it incorrectly in subsequent pages. </p>
<h4 id="reporting-direction-solution">Proposed solution</h4>
<p>Support a new attribute in <input> and <textarea> that would specify that the value of the element's computed direction at submission time will be included in the submission as an additional field. The additional field's name will be the value of the element's name attribute suffixed with "_dir". (Reminder: the computed direction is the bottom-line "ltr" or "rtl" being used to display the element; it never takes on any other value.)
</p><p>The new attribute could be called submit_dir, and would take three values: "yes", "no", and "submit_dir". The last would be a synonym for "yes", and would allow using the attribute without an explicit value, for short. The default would be "no". For example, let's assume that a dir attribute value to indicate direction estimation is "auto", and an RTL page contains the following form:
</p>
<div class="exampleOuter">
<pre xml:space="preserve"><form action="foo" method="get">
<input type="text" name="mytest" submit_dir="yes" dir="auto" />
</form>
</pre>
</div>
<p>Then, if the user typed in the LTR value "hello", the submission URL would be "foo?mytest=hello&mytest_dir=ltr".
</p>
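<p> The naming rule can be illustrated with a small sketch of how a user agent might assemble such a GET submission. Everything here is hypothetical scaffolding: the function name and the shape of the field objects (name, value, submitDir, computedDir) are assumptions made for the example, not part of the proposal.
</p>

```javascript
// Build the submission URL for a GET form. Each field is described by a plain
// object: { name, value, submitDir, computedDir }, where computedDir is the
// element's computed direction ('ltr' or 'rtl') at submission time.
function buildSubmissionUrl(action, fields) {
  const params = new URLSearchParams();
  for (const field of fields) {
    params.append(field.name, field.value);
    if (field.submitDir) {
      // The extra field is named after the element, suffixed with "_dir".
      params.append(field.name + '_dir', field.computedDir);
    }
  }
  return action + '?' + params.toString();
}

const url = buildSubmissionUrl('foo', [
  { name: 'mytest', value: 'hello', submitDir: true, computedDir: 'ltr' },
]);
console.log(url); // foo?mytest=hello&mytest_dir=ltr
```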
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="image-flip" id="image-flip" shape="rect">2.4 Support option for image elements to be flipped horizontally in RTL</a></h3>
<h4 id="image-flip-background">Background</h4>
<p>Although most images, e.g. photos, are equally applicable to LTR and RTL pages, some images are inherently and primarily "handed" or "directional", and need to appear in a mirror image in an RTL page. Common examples include various arrow and "connector" images. A less obvious example might be star rating images: the "full" half of a half-star needs to be on the left in LTR and on the right in RTL. </p>
<h4 id="image-flip-problem">The problem</h4>
<p>Currently, the author of a page to be localized into both LTR and RTL languages is forced to create two separate versions of each "handed" image, stored in two separate files, and use one or the other depending on the page language by changing the src attribute of the <img>. This process is monotonous and error-prone. </p>
<h4 id="image-flip-solution">Proposed solution</h4>
<p>Being able to tell the browser to do such flips automatically would make it that much easier for web applications to support both LTR and RTL interfaces. Only one image would be provided by the page, and the <img> element's attributes would not even have to differ between LTR and RTL pages. The single image file provided by the page would come with instructions in the <img> element to be flipped by the browser when its parent element has RTL computed direction. (The <img> element's own dir does not count, since it is used primarily to indicate the direction of its tooltip text as specified by alt and title attributes, which need not match the surroundings.)
</p><p> One possibility for such a specification would be with a new HTML attribute: hflip="no|yes|ltr|rtl".
</p><p> The default "no" would mean that the image should be displayed as is, and "yes" would mean that it must be displayed horizontally mirrored. To indicate that the flip should only occur when the element's parent has RTL computed direction, the value should be "rtl", and of course "ltr" would specify the inverse.
</p><p> A similar capability would be very important in CSS. For example, its "background" property is very popular, especially with "sprite" images. However, this is beyond the scope of this HTML proposal.
</p></div>
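<p> As an illustration of the image-flip proposal in Section 2.4, the four proposed hflip values reduce to a simple decision on the parent element's computed direction. The function name below is hypothetical, and an absent or unrecognized value is assumed to behave like the default "no".
</p>

```javascript
// Decide whether an image should be mirrored horizontally, given its hflip
// value and the computed direction ('ltr' or 'rtl') of its parent element.
function shouldFlipImage(hflip, parentDir) {
  switch (hflip) {
    case 'yes':
      return true; // always mirrored
    case 'ltr':
      return parentDir === 'ltr'; // mirrored only in an LTR parent
    case 'rtl':
      return parentDir === 'rtl'; // mirrored only in an RTL parent
    default:
      return false; // 'no', absent, or unrecognized: display as is
  }
}
```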
</div>
<div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="existing-features" id="existing-features" shape="rect">3 Standardizing Bidi Aspects of Existing HTML Features</a></h2><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="br-as-separator" id="br-as-separator" shape="rect">3.1 <br> should serve as a bidi separator</a></h3>
<h4 id="br-as-separator-background">Background</h4>
<p>The UBA's rendering of a piece of text depends not only on the explicitly declared direction in which it appears (e.g. the dir attribute value on the parent element) and the characters it contains, but also on the implicit directional properties of the characters preceding and following it. For example, in an RTL context, "john: " is displayed as "john: " when followed by "susan" (i.e. "john: susan"), but as " :john" when followed by "SUSAN" (i.e. "NASUS :john") - note the change in colon positioning.
</p><p> The bidi formatting characters LRO, RLO, LRE, RLE, and PDF have particularly strong influence on what surrounds them. For example, RLO makes all text up to the next PDF behave as RTL characters, making "hello" display as "olleh".
</p><p> In the UBA, whitespace provides almost no separation against either kind of bidi influence.
</p><p> On the other hand, the UBA's sections 3.3.1 and 3.3.2 require that the bidi state be completely reset at a "paragraph break". This means that strong-directional text (e.g. letters) and explicit bidi formatting characters (e.g. RLE and RLO) in one paragraph have no effect on the formatting of the text in the next paragraph and vice-versa. This is a very high level of bidi separation.
</p><p> In plain text, "newline" characters like the line feed (U+000A) and carriage return (U+000D) are commonly used both to end paragraphs and simply to wrap logical lines. The former usage needs a UBA paragraph break, while the latter usage wants no more bidi separation than other kinds of whitespace. The UBA resolves this ambiguity in favor of the paragraph break because of its importance. All common UBA implementations for plain text treat newline characters as a UBA paragraph break, in accordance with the UBA specification.
</p><p> The UBA leaves the definition of a "paragraph" in higher-level protocols like HTML up to the protocol.
</p><p>
It is well-accepted that HTML block elements like <div> and <p> form UBA paragraphs, and this is implemented by all major browsers. Thus, whatever happens inside a block element has no effect on the bidirectional rendering of the text before it or after it.
</p>
<h4 id="br-as-separator-problem">The problem</h4>
<p>The HTML 4 standard <a href="http://www.w3.org/TR/html4/struct/text.html#edef-BR" class="external text" title="http://www.w3.org/TR/html4/struct/text.html#edef-BR" shape="rect">explicitly specifies</a> that <br> is to be treated for bidi purposes as whitespace, and not as a UBA paragraph break. The arguments for this decision seem to be that:
</p>
<ul><li>
<p> <br> is defined as an inline element.
</p>
</li><li>
<p> The preferred way to demarcate a paragraph in HTML is as a <p> or some other block element.
</p>
</li></ul>
<p> Firefox and Opera follow this specification and treat <br> as whitespace for UBA purposes.
</p><p> In actual usage, however, <br> is a very popular element and is used to form paragraphs at least as often as <p>, just like newlines in plain text. In fact, unlike newlines in plain text, it is almost always used for that purpose, as opposed to just wrapping a line to fit in a limited amount of space, simply because HTML normally takes care of line wrapping by itself.
</p><p> As a result, Firefox's implementation of <br> as UBA whitespace, despite being in accordance with the current HTML specification, is regularly reported as a bug. It results in innocent-looking HTML like</p>
<div class="exampleOuter"><pre xml:space="preserve">1. his name is JOHN.<br>
2. SUSAN is a friend of his.</pre></div>
<p>being rendered as</p>
<div class="exampleOuter"><pre xml:space="preserve">1. his name is .NHOJ
NASUS .2 is a friend of his.</pre></div>
<p>
Because the "<code>JOHN.<br>2. SUSAN</code>" forms a single RTL run despite the <br>, the "2" goes to the right of SUSAN. (Please note that wrapping the "JOHN" and "SUSAN" in separate dir="rtl" spans, i.e. "<code><span dir="rtl">JOHN</span>.<br>2. <span dir="rtl">SUSAN</span></code>", does not make any difference.)
</p><p> Although this LTR example is somewhat contrived, the RTL equivalent is quite realistic because it is common for LTR brand names, acronyms, etc. to be used in RTL text:
</p>
<div class="exampleOuter">
<pre xml:space="preserve">1. IT IS IMPORTANT TO LEARN html.<br>
2. css IS IMPORTANT TOO.</pre></div>
<p>which is rendered in Firefox and Opera as</p>
<div class="exampleOuter"><pre style="text-align: right;" xml:space="preserve">html. NRAEL OT TNATROPMI SI TI .1
.OOT TNATROPMI SI 2. css</pre></div>
<p> IE and WebKit avoid such results by treating <br> as a UBA paragraph break. Although this is not in conformance with the HTML 4 spec, the bidi separation it provides does seem to follow most users' expectations.
</p><p> If IE and WebKit were to change their <br> behavior to conform to the current standard, many existing RTL HTML documents would be broken, especially given that they tend to be authored mostly with IE in mind.
</p><p> While the bidi separation provided by treating <br> as a UBA paragraph separator is useful, the very strong nature of this separation (closing all open embedding levels) also creates problems. Being an inline element, <br> can be nested within an arbitrary number of other inline elements. If these inline ancestors have explicit dir attribute values of their own, should the <br> terminate their effects as UBA's definition of a paragraph separator says it should? That is what a newline in plain text does when it comes between an LRE or RLE and its matching PDF. So, should the second line in <div dir="rtl"><span dir="ltr">1. hello!<br>2. goodbye!</span></div> be displayed as RTL? That would conform to the definition of a UBA paragraph break, but would go against the spirit of HTML. This is, in fact, what WebKit currently does (although it is now being treated as a bug).
</p><p> To avoid this problem, IE apparently re-opens the directional embedding levels specified on ancestor elements via mark-up (dir attribute, <bdo> element) or CSS up to the closest ancestor block element after closing them at a <br> paragraph break. On the other hand, it does not reopen the directional embedding levels stemming from surrounding LRE/RLE/LRO/RLO and PDF characters.
</p><p> Should the HTML specification of the bidi behavior of <br> be changed to this rather complicated definition in the hope that all browsers will be able to standardize around it?
</p><p> And what about those rare uses of <br> when it is simply being used to wrap a line?
</p>
<h4 id="br-as-separator-solution">Proposed solution</h4>
<p>The <br> situation can be resolved by the simple expedient of defining <br> as having bdi="yes" by default. (The bdi attribute is proposed in <a href="#bidi-isolation">Section 2.1 </a>.) Thus, it would not form a UBA paragraph break, but would be treated, by default, as if it had either LRM or RLM characters around it, providing the required bidi separation to fix the examples above. Not being a paragraph break, it would not close any explicit embedding levels surrounding it. The only visible difference compared to IE's current behavior should be when the <br> falls between an LRE, RLE, LRO, or RLO and its matching PDF. Since the use of these characters around mark-up is both very rare and discouraged by the W3C, this exception is quite minor.
</p><p> When the author wants to use <br> just to wrap a line without adding bidi separation, <br bdi="no"> will do the trick.
</p><p><a href="http://unicode.org/reports/tr20/#Line" class="external text" title="http://unicode.org/reports/tr20/#Line" shape="rect">UTR #20</a> and <a href="http://unicode.org/reports/tr13/tr13-9.html#Background" class="external text" title="http://unicode.org/reports/tr13/tr13-9.html#Background" shape="rect">UAX #13</a> will need to be updated to reflect this change. In the former, 'In HTML, use <xhtml:br /> instead of U+2028' should be replaced with 'In HTML, use <xhtml:br bdi="no" /> instead of U+2028'. In the latter, 'line separators basically correspond to HTML <BR>' should be replaced with 'line separators basically correspond to HTML <BR BDI="no">'.
</p></div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="newline-as-separator" id="newline-as-separator" shape="rect">3.2 Newline characters should serve as bidi separators inside <pre>, <textarea>, and script dialog text</a></h3>
<h4 id="newline-as-separator-background">Background</h4>
<p>As in <a href="#br-as-separator">Section 3.1 </a>. </p>
<h4 id="newline-as-separator-problem">The problem</h4>
<p>IE and WebKit treat newline characters as a UBA paragraph break in <pre>, <textarea>, and the text displayed in dialogs by the page's scripts using functions such as JavaScript's alert() and confirm(). Given that in these contexts newlines are expected to behave as they do in plain text, this would seem to be in accordance with the UBA. Firefox, however, treats newlines in all these contexts as UBA whitespace, while Opera treats them as UBA paragraph separators in <textarea> and dialog text, but as whitespace in <pre>. See <a href="#br-as-separator">Section 3.1 </a> for examples where this makes a difference. </p>
<h4 id="newline-as-separator-solution">Proposed solution</h4>
<p>The HTML specification should state that newline characters should be treated as UBA paragraph breaks in the following plain-text environments:
</p>
<ul><li> In <textarea> (except immediately after the start tag or immediately preceding the end tag).
</li><li> Text passed by page scripts for display outside the page, via services such as JavaScript's alert() and confirm() functions, or the proposed <a href="http://dev.chromium.org/developers/design-documents/desktop-notifications/api-specification" class="external text" title="http://dev.chromium.org/developers/design-documents/desktop-notifications/api-specification" shape="rect">desktop notifications API</a>
</li></ul>
<p> In <pre>, however, newlines can appear inside inline elements and thus have the same issues with being treated as UBA paragraph breaks as exist for <br>. Thus, a newline inside <pre> (except immediately after a start tag or immediately preceding an end tag, where the newline is ignored) should be treated as if it were a <br bdi="yes"> (see <a href="#br-as-separator">Section 3.1 </a>). </p><p> Where lines are wrapped automatically in any of these contexts, the wrapping should be treated as UBA whitespace.
</p></div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="blocks-as-separators" id="blocks-as-separators" shape="rect">3.3 Embedded block elements should serve as bidi separators </a></h3>
<h4 id="blocks-as-separators-background">Background</h4>
<p>As in <a href="#br-as-separator">Section 3.1 </a>. </p>
<h4 id="blocks-as-separators-problem">The problem</h4>
<p>There is no standard definition of whether a block element serves as a UBA break between the text preceding and following it, i.e. whether the text preceding a <div></div> or an <hr> (<a href="http://www.w3.org/TR/html4/sgml/dtd.html#block" class="external text" title="http://www.w3.org/TR/html4/sgml/dtd.html#block" shape="rect">defined</a> to be a block element) should behave as if it were in the same UBA paragraph as the text following it. For short, we will call block elements with text on both sides "embedded".
</p><p> Different browsers treat embedded block elements differently. Just as with <br>, in Firefox and Opera, an embedded block element provides no bidi separation between the text preceding and following it, while IE and WebKit treat it as a UBA paragraph break. See <a href="#br-as-separator">Section 3.1 </a> for examples where this discrepancy makes a difference; just replace <br> with <hr>.
</p><p> It is difficult to justify Firefox and Opera's treatment of embedded block elements. Besides the fact that a block element breaks a line:
</p>
<ul><li> Block elements include the paragraph element, <p>. It seems reasonable to expect the insertion of a paragraph to break the text before it and the text after it into two UBA paragraphs.
</li><li> The text before and after a block element is said to form "<a href="http://www.w3.org/TR/CSS21/visuren.html#anonymous-block-level" class="external text" title="http://www.w3.org/TR/CSS21/visuren.html#anonymous-block-level" shape="rect">anonymous blocks</a>", and it is well accepted that blocks should constitute UBA paragraphs.
</li><li> Unlike <br>, a block element can and usually does contain text of its own, which certainly forms a separate UBA paragraph. Having the text before an embedded block element affect the display of the text after it or vice-versa, skipping right over the text inside the embedded block element, is very strange, a quite unexpected "action at a distance".
</li></ul>
<p>
Thus, it seems reasonable to resolve the discrepancy in favor of treating embedded block elements as UBA paragraph breaks. Since inline elements are not allowed to contain block elements, we expect all ancestors of a block element to be block elements themselves, and thus do not expect the issues that we encountered with inline ancestors in the case of <br>.
</p><p> Unfortunately, life is not simple, and the CSS "display" property allows making an inline element behave like a block element; this creature can then be legally placed inside a normal inline element. Making the inline-turned-to-block element act as a paragraph separator indeed creates a problem for its inline ancestors that create embedding levels (e.g. have dir attributes).
</p><p> Furthermore, the CSS "display" property allows many more possibilities than good old "block" and "inline". Should these be treated as UBA paragraph breaks?
</p>
<h4 id="blocks-as-separators-solution">Proposed solution</h4>
<p>Elements with block display should be specified as introducing a UBA paragraph break between the text preceding and following them.
</p><p> If the block-display element has ancestors with inline display that have bidi properties (e.g. the dir attribute or the <bdo> element), these bidi properties should be applied to the anonymous block boxes created for these inline elements, in accordance with <a href="http://www.w3.org/TR/2009/CR-CSS2-20090908/visuren.html#anonymous-block-level" class="external text" title="http://www.w3.org/TR/2009/CR-CSS2-20090908/visuren.html#anonymous-block-level" shape="rect">CSS specs for anonymous block boxes</a>.
</p>
<p> The text inside an element with inline-block display should constitute a UBA paragraph, but probably should not introduce a UBA paragraph break. Instead, it can default to bdi="yes". This and the other types of display should be given more thorough investigation.
</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="script-dialog" id="script-dialog" shape="rect">3.4 Script dialog text should be displayed in the page's direction </a></h3>
<h4 id="script-dialog-background">Background</h4>
<p>The W3C <a href="http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030112.214820604" class="external text" title="http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030112.214820604" shape="rect">recommends</a> that in HTML, the direction of text be declared using the dir attribute, avoiding the use of Unicode formatting characters LRE, RLE, and PDF except where the dir attribute is inapplicable.
</p>
<h4 id="script-dialog-problem">The problem</h4>
<p>One would expect that the page's direction set using <html dir=...> or <body dir=...> would apply to the text displayed in dialogs by the page's scripts using functions such as JavaScript's alert() and confirm(). This is in fact the case in IE. However, it is not the case in any other major browser. The directional context that the major non-IE browsers use for dialog text is either the OS or the browser chrome's default direction, which neither the server nor page scripts can even determine, let alone control.
</p><p> Since a value displayed in the wrong direction can come out garbled, pages wind up having to wrap their RTL dialog text in RLE + PDF characters for correct display on LTR systems. On the other hand, pages dare not wrap their LTR dialog text in LRE + PDF characters for correct display on RTL systems, since most computers in the world are running an LTR OS without RTL script support turned on, and thus display LRE and PDF as rectangles. (This is not a concern in the case of RTL dialog text, since a system that does not have RTL script support will not display RTL text correctly anyway.) Furthermore, these formatting characters are little-known, lack named entities, and are generally undesirable in HTML documents.
</p>
<h4 id="script-dialog-solution">Proposed solution</h4>
<p>The HTML specification should state that text passed by page scripts for display outside the page, via services such as JavaScript's alert() and confirm() functions, or the proposed <a href="http://dev.chromium.org/developers/design-documents/desktop-notifications/api-specification" class="external text" title="http://dev.chromium.org/developers/design-documents/desktop-notifications/api-specification" shape="rect">desktop notifications API</a>, will be displayed in the <body> element's direction. Backward compatibility is not an issue here because this aspect of bidi behavior was never defined by the HTML standard, and browser behavior has not been consistent.
</p><p> The proposed solution agrees with the <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=2" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=2" shape="rect">dir on html, javascript alert box</a> and <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=10" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=10" shape="rect">dir on body, javascript alert box</a> tests in the <a href="http://www.w3.org/International/tests/list-html-css#chromebidi" class="external text" title="http://www.w3.org/International/tests/list-html-css#chromebidi" shape="rect">i18n test suite being developed by the W3C</a>.
</p><p> The <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=15" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=15" shape="rect">local dir, javascript alert box</a> test and similar ones for <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=16" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=16" shape="rect">confirm()</a> and <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=17" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=17" shape="rect">prompt()</a>, which instead assert that dialog text should be displayed in the triggering element's direction, should be modified to conform with the proposed solution. Making the dialog text direction dependent on the triggering element makes things too difficult for the page developer, since the same function when called for events triggered by different elements - or even the same element after its computed direction changes - will result in different dialog displays.
</p><p> It is easy enough for a browser to implement the proposed solution, since it knows the default directional context in which the text will be displayed by the underlying platform. If and only if this differs from the page's direction, the browser needs to wrap (each paragraph of) the dialog text in RLE + PDF in an RTL page and LRE + PDF in an LTR page.
</p>
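<p>As a rough illustration of the wrapping step described above, the following JavaScript sketch wraps each newline-separated paragraph of the dialog text only when the page's direction differs from the platform's. The function name <code>wrapDialogText</code> and its <code>pageDir</code> and <code>platformDir</code> parameters are hypothetical, introduced here for illustration; a real browser would do this internally and would obtain the platform direction from the OS or its own chrome.</p>

```javascript
// Unicode directional formatting characters referred to in the text.
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Hypothetical helper: pageDir and platformDir are assumed to be "ltr" or "rtl".
function wrapDialogText(text, pageDir, platformDir) {
  // No wrapping is needed when the platform already uses the page's direction.
  if (pageDir === platformDir) {
    return text;
  }
  const open = pageDir === "rtl" ? RLE : LRE;
  // Assumption: paragraphs are separated by "\n", so each one is wrapped separately.
  return text
    .split("\n")
    .map((paragraph) => open + paragraph + PDF)
    .join("\n");
}
```

<p>Wrapping per paragraph matters because an embedding opened by LRE or RLE does not carry across a UBA paragraph break.</p>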
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="title-and-dir" id="title-and-dir" shape="rect">3.5 <title> should support the dir attribute </a></h3>
<h4 id="title-and-dir-background">Background</h4>
<p>As in <a href="#script-dialog">Section 3.4 </a>. </p>
<h4 id="title-and-dir-problem">The problem</h4>
<p>One would expect that the page's direction set using <html dir=...> would apply to the page's <title>. Unfortunately, however, this is not the case in any major browser. The directional context all major browsers use for <title> is either the OS or the browser chrome's default direction, which neither the server nor page scripts can even determine, let alone control.
</p><p> Nor does setting the dir attribute directly on the <title> element have any effect in any major browser.</p><p> Since a value displayed using the wrong direction can come out garbled, pages wind up having to wrap their RTL <title> in RLE + PDF characters. This has the same problems as with script dialog text, see <a href="#script-dialog">Section 3.4 </a>.</p>
<h4 id="title-and-dir-solution">Proposed solution</h4>
<p>The HTML specification should explicitly state that the <title>'s text will be displayed in the <title>'s computed direction.
</p><p> It is easy enough for a browser to implement this, since it knows the default directional context in which the text will be displayed. If and only if this differs from the desired direction, the browser needs to wrap the title text in RLE + PDF when RTL is desired and LRE + PDF when LTR is desired.
</p><p>
In principle, this could break existing RTL documents that count on their title being displayed in LTR, as is usually the case today. The change should be made despite this, because:
</p>
<ul><li>
<p> Such documents can't really count on the current behavior anyway: on an RTL OS / browser the title is already displayed RTL.
</p>
</li><li>
<p> In many cases, RTL documents work around the problem by having a title that looks the same whether displayed in LTR or RTL.
</p>
</li><li>
<p>This will fix more documents than it will break.</p>
</li><li>
<p> Forcing backward compatibility will perpetuate an ugly exception.
</p>
</li></ul>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="title-and-alt" id="title-and-alt" shape="rect">3.6 title and alt attribute text should be displayed in the element's direction </a></h3>
<h4 id="title-and-alt-background">Background</h4>
<p>As in <a href="#script-dialog">Section 3.4 </a>. </p>
<h4 id="title-and-alt-problem">The problem</h4>
<p>Currently all major browsers (IE, FF, Chrome, Safari, Opera) display the tooltips specified by a title or alt element attribute in the direction of the element to which it belongs, but this does not appear to be formally specified anywhere. Furthermore, this consensus seems fragile because in principle, the direction of an element and the text of its tooltip do not have to coincide. Here is a reasonable counterexample: an RTL web page displays an LTR address (e.g. for a location in Europe), with a tooltip on the address element saying "ADDRESS" in the page's language. The tooltip thus needs to be RTL while the element needs to be LTR.
</p><p> Until recently, Chrome displayed tooltips in the OS / browser's default direction. When fixing this bug, the initial inclination was to apply only the page's direction, not the element's, due to the "in principle" consideration above.
</p><p> Nevertheless, although counterexamples like the one above can be found, tooltip text usually has the same direction as the element's text, in the not very frequent cases where the element has text at all. For such counterexamples, there is a simple workaround: put the tooltip on an extra element wrapping the original one.
</p><p> Apparently not trusting browser behavior, the W3C <a href="http://www.w3.org/TR/i18n-html-tech-bidi/#tech-tooltips-etc" class="external text" title="http://www.w3.org/TR/i18n-html-tech-bidi/#tech-tooltips-etc" shape="rect">suggests</a> that tooltip direction may have to be set using LRE | RLE + PDF. This is actually quite difficult to do properly, since wrapping an LTR tooltip in LRE + PDF just in case the browser winds up displaying it in an RTL context will result in the LRE and PDF displaying as rectangles on LTR OS's without RTL support enabled, i.e. the vast majority of computers.</p>
<h4 id="title-and-alt-solution">Proposed solution</h4>
<p>The HTML specification should explicitly state that title and alt attribute text will be displayed in the element's computed direction.</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="option" id="option" shape="rect">3.7 <option> should support the dir attribute and be displayed accordingly both in the dropdown and after being chosen</a></h3>
<h4 id="option-background">Background</h4>
<p>As in <a href="#script-dialog">Section 3.4 </a>. </p>
<h4 id="option-problem">The problem</h4>
<p>In a single <select>, the values of different options may have different directions. Currently, however, out of all major browsers, only FF supports the dir attribute on <option>, and does so poorly: once the value is chosen, it is displayed in the <select>'s direction.
</p><p> IE and Opera display all options in the <select>'s direction.</p>
<p> Safari automatically estimates the direction of each option and displays it as such both in the dropdown and after it has been chosen regardless of the <select>'s direction (which is only used to place the down-arrow button and to align the values). This is all very nice, but direction estimation algorithms do make mistakes, so it would be good to be able to specify the actual dir value for a given <option> - and Safari does not support that.
</p><p> Chrome does not support the dir attribute on <option> and is on its way to doing what Safari does.
</p><p> As a result, the only practical way to specify <option> value direction is using LRE | RLE + PDF, which is cumbersome.
</p>
<h4 id="option-solution">Proposed solution</h4>
<p>The HTML specification should state that an <option> element's computed direction will take its dir attribute into account, and will be used to display the option's text in both the dropdown and after being chosen.</p><p> The HTML specification should also state that setting an <option> element's alignment via CSS or the align attribute will affect its display accordingly in both the dropdown and after being chosen.
</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="set-direction" id="set-direction" shape="rect">3.8 <input type="text"> and <textarea> should support compatible "set direction" functionality</a></h3>
<h4 id="set-direction-background">Background</h4>
<p>Garbling by incorrect direction also applies to text being entered by the user in an input control. In fact, entering text of direction opposite to the input's declared direction is an unpleasant experience even if the full text does not wind up being garbled, due to the cursor and punctuation jumping around during data entry and difficulty in selecting text. All major browsers thus provide some way for the user to set the direction of each <input type="text"> and <textarea> element. </p>
<h4 id="set-direction-problem">The problem</h4>
<p>Unfortunately, the way "set direction" functionality interacts with page scripts varies significantly between browsers, which makes it difficult to write scripts that are informed of the user's choice.
</p><p> IE: Direction is set using keyboard shortcuts - CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL. (These key combinations are also adopted for this purpose by most Microsoft products, e.g. Windows dialogs, notepad and Word.) They set the value of the element's dir attribute, which is then available to scripts. They trigger the onpropertychange event, at which time the dir value is already changed. They also trigger onkeyup, but before the dir value has been changed, so setTimeout(0) has to be used to get the updated dir value. They do not trigger onkeypress.
</p><p>
FF: Direction is set using the CTRL + SHIFT + X keyboard shortcut, which cycles through LTR and RTL. It does not set the value of the element's dir attribute, and is thus invisible to scripts.
</p><p> Opera: same keyboard shortcuts as IE. They do not set the value of the element's dir attribute, and are thus invisible to scripts.
</p><p> Chrome: same keyboard shortcuts as IE. They set the value of the element's dir attribute, which is then available to scripts. They trigger the onkeyup event, at which time the dir value is already changed. They do not trigger onkeypress or oninput. They also do not trigger onpropertychange, since this event exists only in IE.
</p><p> Safari: Right-click on the <input> or <textarea> provides a "Set paragraph direction" submenu. Using "Set paragraph direction" sets the value of the element's dir attribute, which is then available to scripts. However, it does not trigger onkeyup, onkeypress, or oninput. It also doesn't trigger onpropertychange, since this event exists only in IE.
</p>
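<p>To make the scripting difficulty concrete, here is a minimal sketch of how a page script might observe a direction change under the IE behavior described above, where onkeyup fires before the dir value is updated. The helper name <code>watchDirection</code> and its <code>defer</code> parameter are hypothetical; <code>defer</code> defaults to <code>setTimeout(fn, 0)</code> and is injectable only to make the sketch easy to exercise.</p>

```javascript
// Hypothetical helper: calls onChange(newDir) when the element's dir
// has changed by the time the deferred check runs after a keyup.
function watchDirection(element, onChange, defer = (fn) => setTimeout(fn, 0)) {
  let lastDir = element.dir;
  element.addEventListener("keyup", () => {
    // Defer the read: in IE the dir value is updated only after onkeyup fires.
    defer(() => {
      if (element.dir !== lastDir) {
        lastDir = element.dir;
        onChange(lastDir);
      }
    });
  });
}
```

<p>Such a script still cannot see the change in browsers that do not set the dir attribute, and it misses Safari's context-menu command, which does not trigger onkeyup at all.</p>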
<h4 id="set-direction-solution">Proposed solution</h4>
<p>The HTML specification should state that some way to set the direction of <input type="text"> and <textarea> elements should be exposed to the user, and using it will:
</p>
<ul><li> Set the element's dir attribute value accordingly.
</li><li> Trigger oninput after the dir attribute has been set; even though no actual input took place, the user did change the recommended interpretation of the input already collected.
</li></ul>
<p> Furthermore, it should be recommended that on an OS that has a widespread convention for setting direction (such as CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL on Windows), the user agent will support that convention (although it may provide other methods too).
</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="remember-input-dir" id="remember-input-dir" shape="rect">3.9 When an input value is remembered, its direction should be remembered too</a></h3>
<h4 id="remember-input-dir-background">Background</h4>
<p>Some browsers implement auto-completion, a feature whereby values previously entered into an element like <input type="text"> are remembered and under certain conditions presented to the user in a dropdown. When the user selects one of the items in the dropdown, this value is assigned to the element. At different times, the user may enter values of different direction for the same input. The direction of a value is set either directly by the user through a "set direction" command exposed by the browser (e.g. via keyboard shortcuts, see <a href="#set-direction">Section 3.8 </a>) or letting page scripts automatically set the input's dir attribute after estimating the direction of the value on the fly. </p>
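<p>For illustration only, one simple heuristic a page script might use to estimate direction is to look at the first strong (letter) character of the value, in the spirit of how the UBA determines a paragraph's base direction. The function name <code>estimateDirection</code> is hypothetical, and the sketch deliberately treats only Hebrew and Arabic script letters as RTL, which is a simplifying assumption.</p>

```javascript
// Hypothetical first-strong-character heuristic.
// Returns "rtl", "ltr", or null when the text contains no letters.
function estimateDirection(text) {
  for (const ch of text) {
    // Skip digits, punctuation and whitespace: only letters count as strong here.
    if (!/\p{L}/u.test(ch)) {
      continue;
    }
    // Assumption: only Hebrew and Arabic script letters are treated as RTL.
    return /[\p{Script=Hebrew}\p{Script=Arabic}]/u.test(ch) ? "rtl" : "ltr";
  }
  return null;
}
```

<p>Heuristics of this kind are exactly the estimation algorithms that can make mistakes, for example on an RTL value that happens to begin with a Latin-script brand name.</p>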
<h4 id="remember-input-dir-problem">The problem</h4>
<p>Browsers do not remember the direction of previously-entered values. Some display them in the dropdown in the OS or browser default direction. Some display them in the input's current direction. Finally, some display each value in its own estimated direction. Each of these will result in some values being displayed incorrectly; even the last approach will sometimes fail because estimation algorithms do make mistakes, and this may not have been the direction originally set by the user or page scripts. </p>
<p>After the user chooses a value from the dropdown, the value is usually displayed in the input's current direction, which may or may not be correct for it. </p>
<h4 id="remember-input-dir-solution">Proposed solution</h4>
<p>The HTML specification should state that whenever a user agent stores a user-provided <input type="text"> or <textarea> value for later use (such as auto-completion), it should also store the nominal direction value the element had when displaying this value. This may be the original direction of the element, or may have been set by the user for that value via keyboard shortcuts, or may have been set for that value by page scripts. If the user agent later displays the value in an auto-completion dropdown, it should be displayed in its stored direction. If the value is assigned to an element, the element's dir value should be set to its stored direction. </p>
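<p>The proposed behavior can be sketched as a small store that keeps each remembered value together with its direction. This is only an illustrative model: the class name <code>InputHistory</code> and its methods are hypothetical, and a user agent would implement the equivalent inside its own auto-completion machinery.</p>

```javascript
// Hypothetical model of an auto-completion store that remembers direction.
class InputHistory {
  constructor() {
    this.entries = new Map(); // value -> "ltr" | "rtl"
  }

  // Store a value together with the direction the element had when showing it.
  remember(value, dir) {
    this.entries.set(value, dir);
  }

  // Each suggestion carries its own stored direction for display in the dropdown.
  suggestions(prefix) {
    const matches = [];
    for (const [value, dir] of this.entries) {
      if (value.startsWith(prefix)) {
        matches.push({ value, dir });
      }
    }
    return matches;
  }

  // Assigning a remembered value also restores its stored direction.
  applyTo(element, value) {
    const dir = this.entries.get(value);
    if (dir === undefined) {
      return false;
    }
    element.value = value;
    element.dir = dir;
    return true;
  }
}
```

<p>Keeping the direction next to the value means neither the dropdown nor the element has to fall back on the input's current direction or on estimation.</p>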
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="lists" id="lists" shape="rect">3.10 The rendering of numbering or bullets in a list should be independent of the direction of individual <li> elements</a></h3>
<h4 id="lists-background">Background</h4>
<p>The HTML specification gives no indication of how the bullet or number of an <li> element should be displayed when its computed direction is the opposite of its parent's (usually an <ol> or <ul>). </p>
<h4 id="lists-problem">The problem</h4>
<p>In practice, for <li> elements whose "list-style-position" CSS property has the default "outside" value, different browsers do different things. Furthermore, the effects vary depending on the list's alignment, and whether it is ordered or unordered:</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table1" id="table1">Example 6: </a></div>
<p><code><ul dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><th rowspan="1" colspan="1">IE</th><th rowspan="1" colspan="1">Firefox, Opera and WebKit</th></tr><tr><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div style="text-align:right;">.C METI LTR * </div>
</td><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div style="text-align:right;">.C METI LTR </div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table2" id="table2">Example 7: </a></div>
<p><code><ol dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ol></code></p>
<table class="list-examples"><tbody><tr><th rowspan="1" colspan="1">IE</th><th rowspan="1" colspan="1">Firefox, Opera and WebKit</th></tr><tr><td rowspan="1" colspan="1">
<div> 1. item a.</div>
<div> 2. longer item b.</div>
<div style="text-align: right;"> .C METI LTR . </div>
</td><td rowspan="1" colspan="1">
<div> 1. item a.</div>
<div> 2. longer item b.</div>
<div style="text-align: right;"> .C METI LTR </div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table3" id="table3">Example 8: </a></div>
<p><code><ul dir="ltr" style="text-align:left"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><th rowspan="1" colspan="1">IE</th><th rowspan="1" colspan="1">Firefox and Opera</th><th rowspan="1" colspan="1">WebKit</th></tr><tr><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div> .C METI LTR *</div>
</td><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div> .C METI LTR *</div>
</td><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div> .C METI LTR</div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table4" id="table4">Example 9: </a></div>
<p><code><ul dir="ltr" style="text-align:right"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><th rowspan="1" colspan="1">IE</th><th rowspan="1" colspan="1">Firefox and Opera</th><th rowspan="1" colspan="1">WebKit</th></tr><tr><td rowspan="1" colspan="1">
<div style="text-align: right;">* item a. </div>
<div style="text-align: right;">* longer item b. </div>
<div style="text-align: right;">.C METI LTR * </div>
</td><td rowspan="1" colspan="1">
<div style="text-align: right;">* item a. </div>
<div style="text-align: right;">* longer item b. </div>
<div style="text-align: right;">.C METI LTR </div>
</td><td rowspan="1" colspan="1">
<div style="float: left;"> *</div>
<div style="float: right;">item a. </div>
<div style="clear: both;"></div>
<div style="float: left;"> *</div>
<div style="float: right;">longer item b. </div>
<div style="clear: both;"></div>
<div style="float: left;"></div>
<div style="float: right;">.C METI LTR </div>
</td></tr></tbody></table>
</div>
<p> In our opinion, not only is browser behavior unacceptably incompatible and inconsistent, but none of the above provides a usable display of opposite-direction list items.
</p>
<h4 id="lists-solution">Proposed solution</h4>
<p>In our opinion, the bullets/numbers of all "list-style-position:outside" items, regardless of their direction, should always:
</p>
<ul><li>
<p> Be present.
</p>
</li><li>
<p> Line up.
</p>
</li></ul>
<p> This can be easily achieved by specifying that the rendering of "list-style-position:outside" list item bullets should depend on the direction of the list element (<ul>, <ol>, etc.), and not depend on the direction of the individual list items or the alignment of either the list or the list item. The outcome should look like this:
</p>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table5" id="table5">Example 10: </a></div>
<p><code><ul dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div>
<div style="float: left;"> *</div>
<div style="float: right;">.C METI LTR </div>
</div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table6" id="table6">Example 11: </a></div>
<p><code><ol dir="ltr"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ol></code></p>
<table class="list-examples"><tbody><tr><td rowspan="1" colspan="1">
<div> 1. item a.</div>
<div> 2. longer item b.</div>
<div>
<div style="float: left;"> 3.</div>
<div style="float: right;">.C METI LTR </div>
</div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table7" id="table7">Example 12: </a></div>
<p><code><ul dir="ltr" style="text-align:left"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><td rowspan="1" colspan="1">
<div> * item a.</div>
<div> * longer item b.</div>
<div> * .C METI LTR</div>
</td></tr></tbody></table>
</div>
<div class="exampleOuter">
<div class="exampleHeader"><a name="table8" id="table8">Example 13: </a></div>
<p><code><ul dir="ltr" style="text-align:right"><li>item a.</li><li> longer item b.</li><li dir="rtl">RTL ITEM C.</li></ul></code></p>
<table class="list-examples"><tbody><tr><td rowspan="1" colspan="1">
<div style="float: left;"> *</div>
<div style="float: right;">item a. </div>
<div style="clear: both;"></div>
<div style="float: left;"> *</div>
<div style="float: right;">longer item b. </div>
<div style="clear: both;"></div>
<div style="float: left;"> *</div>
<div style="float: right;">.C METI LTR </div>
</td></tr></tbody></table>
</div>
<p> The HTML specification should also state that setting an <li> element's alignment via CSS or the align attribute will affect its display accordingly.
</p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="vertical-scrollbar" id="vertical-scrollbar" shape="rect">3.11 A page's overall vertical scrollbar should be on the "end" side relative to the user agent chrome direction</a></h3>
<h4 id="vertical-scrollbar-background">Background</h4>
<p>The vertical scrollbar in an LTR UI is normally placed on the right side of a window or widget, and on the left side in an RTL UI. </p>
<h4 id="vertical-scrollbar-problem">The problem</h4>
<p>In a browser open on a given page, the UI is made up of two parts: the chrome of the browser itself (e.g. its menus and toolbars), and the page being displayed in the browser. The two parts can be, and often are, in two different languages and thus directions.
</p><p> It is unclear which of the two is the principal part of the UI. Certainly the page takes up most of the window and is presumably the user's focus of attention. As a result, it seems natural that the vertical scrollbar should be on the "end" edge relative to the page's (i.e. the <body> element's) overall direction - and not the browser's chrome direction.
</p><p> However, this usually results in a usability issue when surfing: the scrollbar moves from side to side when going from an LTR page to an RTL page or vice-versa, confusing the user and making the scrollbar surprisingly difficult to find visually and click on physically. It is also arguable that the overall scrollbar is a part of the browser chrome, not the page, so it has no business being dependent on the page direction.
</p><p> As a result, Firefox, Chrome and Safari place the scrollbar on the "end" edge relative to the browser's chrome direction. Furthermore, this is the behavior required by the <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=1" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=1" shape="rect">dir on html, vertical scrollbar alignment</a> and <a href="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=9" class="external text" title="http://www.w3.org/International/tests/tests-html-css/tests-bidi-chrome/generatehtml?test=9" shape="rect">dir on body, vertical scrollbar alignment test</a> tests in the <a href="http://www.w3.org/International/tests/list-html-css#chromebidi" class="external text" title="http://www.w3.org/International/tests/list-html-css#chromebidi" shape="rect">i18n test suite being developed by the W3C</a>.
</p><p> However, IE and Opera continue to position the scrollbar according to the page direction.
</p>
<h4 id="vertical-scrollbar-solution">Proposed solution</h4>
<p>The HTML specification should state that the user agent window's overall vertical scrollbar should be located independently of the direction of any page element, despite being otherwise controlled by the style of the <body> element. (Thus, it should be located on the "end" side relative to the user agent chrome direction.) </p>
</div>
<div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="vertical-scrollbar-body" id="vertical-scrollbar-body" shape="rect">3.12 The vertical scrollbar of an element below <body> should be on the "end" side relative to the element's direction</a></h3>
<h4 id="vertical-scrollbar-body-background">Background</h4>
<p>As in <a href="#vertical-scrollbar">Section 3.11 </a>. </p>
<h4 id="vertical-scrollbar-body-problem">The problem</h4>
<p>Users expect the vertical scrollbar of a "widget" inside the page to be on an LTR widget's right side, and on an RTL widget's left side. The reasons given in <a href="#vertical-scrollbar">Section 3.11 </a> for making the browser chrome direction determine the location of the <body> element's vertical scrollbar apply only to the <body> element:
</p>
<ul><li>
<p> Only the <body>'s scrollbars could conceivably be in the same window location across all pages.
</p>
</li><li>
<p> Only the <body>'s scrollbars can be conceived of as being part of the browser chrome.
</p>
</li></ul>
<p> However, due to the usability problem with the page's overall vertical scrollbar described in <a href="#vertical-scrollbar">Section 3.11 </a>, Firefox, Chrome and Safari place <i>every</i> element's vertical scrollbar on the "end" edge relative to the browser's chrome direction, regardless of the element's direction. While this is indeed desirable for the <body> element as indicated in <a href="#vertical-scrollbar">Section 3.11 </a>, it is not desirable for the elements below it. </p>
<h4 id="vertical-scrollbar-body-solution">Proposed solution</h4>
<p>The HTML specification should state that the vertical scrollbar of an element below <body> should be on the "end" side relative to the element's direction. </p>
</div>
</div>
<div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" class="toclink" title="Go to the table of contents." alt="Go to the table of contents." /></a><a name="appendix-a" id="appendix-a" shape="rect">4 Appendix A: The Word-Count Direction Estimation Algorithm</a></h2>
<p>The word-count algorithm is better suited than the first-strong algorithm for longer texts, e.g. ones where whole sentences and paragraphs are expected, since it is fairly common for RTL text to contain LTR words such as brand names and e-mail addresses, even as its first word. The first-strong approach in the UBA was really a stopgap. Nevertheless, first-strong is probably better suited for short, editable text, e.g. search boxes, where the user probably prefers a predictable result.
</p><p> The word-count algorithm can be defined as follows:
</p>
<ol><li>Extract the element's text content, as described in the Proposed Solution of <a href="#auto-direction">Section 2.2 </a> for any estimation algorithm, i.e.:
<ol><li>The content to be examined is all descendant text nodes, concatenated in "in-fix" order, except for those under a descendant element with a unicode-bidi style other than "normal", e.g. a <bdo> element or an element with a dir attribute (whatever the value).</li><li>No attempt will be made to exclude "hidden" content, whether using display:none style or any other invisibility technique.</li></ol>
</li><li>Divide the text into "words" on line break opportunities, using whatever algorithm the user agent implements for line wrapping.</li><li>Categorize each word as one of LTR, RTL, weak LTR, or neutral, using the following logic:
<ol><li>If the word contains no strong LTR (Unicode bidi class L) or strong RTL (Unicode bidi class R or AL) characters, but does contain numerals (Unicode bidi class EN or AN), it is weak LTR.</li><li>Otherwise, if the word begins with one of the common URL protocol identifiers, i.e. "http:", "https:", "ftp:", "file:", and "mailto:", or looks like an e-mail address, domain name, or file path, it is weak LTR. Regular expressions will have to be specified for each of these.</li><li>Otherwise, if the first strong character in the word is LTR, the word is LTR.</li><li>Otherwise, if the first strong character in the word is RTL, the word is RTL.</li><li>Otherwise, the word is neutral.</li><li>For the purposes of the tests above, it is best to ignore (i.e. not count as strongly LTR or RTL) not only LRE, RLE, LRO, RLO, and PDF, but also LRM and RLM.</li></ol>
</li><li>Compute the direction to be used for the element based on the relative word counts, using the following logic:
<ol><li>If the number of RTL words is 40% or more of the sum of the number of LTR and RTL words, the element's effective direction is RTL.</li><li>Otherwise, if the number of LTR words or the number of weak LTR words exceeds 0, the element's effective direction is LTR.</li><li>Otherwise, the element's computed direction is the same as its parent's computed direction; for the <html> element, it is LTR.</li></ol>
</li></ol>
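<p>Steps 2 to 4 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the input is taken to be the text content already extracted in step 1; whitespace splitting stands in for the user agent's line-break opportunities; the pattern <code>URL_LIKE</code> is a hypothetical placeholder for the regular expressions that still have to be specified (it covers the listed protocol prefixes and a loose e-mail shape, but not domain names or file paths); and the function names <code>classify_word</code> and <code>estimate_direction</code> are invented for this example. The sketch also assumes that the 40% rule applies only when at least one strong word is present.</p>

```python
import re
import unicodedata

# Step 3.6: LRE, RLE, PDF, LRO, RLO, LRM and RLM are not counted as strong.
IGNORED = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e", "\u200e", "\u200f"}

# Hypothetical stand-in for the regular expressions the text leaves unspecified:
# the listed protocol prefixes plus a very loose e-mail shape.
URL_LIKE = re.compile(
    r"^(?:https?:|ftp:|file:|mailto:)|^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$",
    re.IGNORECASE,
)


def classify_word(word):
    """Return 'ltr', 'rtl', 'weak_ltr' or 'neutral' for one word (step 3)."""
    chars = [ch for ch in word if ch not in IGNORED]
    classes = [unicodedata.bidirectional(ch) for ch in chars]
    strong = [c for c in classes if c in ("L", "R", "AL")]
    # 3.1: no strong characters, but some numerals.
    if not strong and any(c in ("EN", "AN") for c in classes):
        return "weak_ltr"
    # 3.2: URL-like or e-mail-like word.
    if URL_LIKE.search("".join(chars)):
        return "weak_ltr"
    # 3.3 and 3.4: the first strong character decides.
    if strong:
        return "ltr" if strong[0] == "L" else "rtl"
    # 3.5: nothing strong, no numerals.
    return "neutral"


def estimate_direction(text, parent_direction="ltr"):
    """Return 'ltr' or 'rtl' from the relative word counts (step 4)."""
    counts = {"ltr": 0, "rtl": 0, "weak_ltr": 0, "neutral": 0}
    # Assumption: whitespace splitting approximates line-break opportunities.
    for word in text.split():
        counts[classify_word(word)] += 1
    strong_total = counts["ltr"] + counts["rtl"]
    # 4.1: RTL words are 40% or more of the strong words (integer arithmetic).
    if strong_total and counts["rtl"] * 5 >= 2 * strong_total:
        return "rtl"
    # 4.2: any LTR or weak LTR word.
    if counts["ltr"] or counts["weak_ltr"]:
        return "ltr"
    # 4.3: fall back to the parent's direction.
    return parent_direction


# One LTR brand name followed by two Hebrew words: 2 of 3 strong words are RTL.
print(estimate_direction("Google \u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"))  # rtl
```

<p>In the last line, a first-strong estimate would have chosen LTR because of the leading brand name, whereas the word count chooses RTL.</p>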
<p>
There is an almost-compliant
<a href="http://code.google.com/p/closure-templates/source/browse/trunk/java/src/com/google/template/soy/internal/i18n/BidiUtils.java" class="external text" title="http://code.google.com/p/closure-templates/source/browse/trunk/java/src/com/google/template/soy/internal/i18n/BidiUtils.java" shape="rect">Java implementation</a>;
see estimateDirection(String str). A true reference implementation still needs to be provided.
</p><p> Some rationale for the algorithm's design:
</p>
<ul><li> The 40% threshold is somewhat arbitrary, but does reflect the fact that it is common to find substantial amounts of LTR in overall RTL text, but not vice-versa.
</li><li> Character-counting algorithms have been found to be less reliable.
</li><li> Algorithms for recognizing line-break opportunities are a good-enough approximation for true word breaking, and all user agents need to break lines of text in order to display text, so all of them already have a working line-breaking algorithm. Any advances in the direction of true word breaking that they happen to implement are likely to be used for line breaking.
</li><li> Classifying numbers as "weak LTR" is meant to make sure that an element containing a "formatted number", e.g. a phone number containing spaces, is categorized as LTR, which is essential for its correct display, without letting the presence of numbers in RTL text bias it towards LTR.
</li><li> The same applies to URLs, e-mail addresses, etc., which are commonly LTR and are best displayed in LTR, but whose presence in RTL text should not bias it towards LTR.
</li></ul>
<p> Some tweaks that should be considered:
</p>
<ul><li> Modify the 40% threshold, optimally on the basis of real data where the LTR component is not limited to English.
</li><li> Limit the length of text to be examined, using a sufficiently large value to deal well with most ordinary text paragraphs. Once again, this would optimally be based on real data.
</li><li> If the text starts with a leading LRM or RLM, skip counting words and return the direction indicated by LRM or RLM.
</li><li> Alternatively, recognize the case where the whole string is wrapped with a leading LRE or RLE and a trailing balanced PDF, and return the direction indicated by the LRE or RLE.
</li></ul>
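<p>The last two tweaks could be checked before any word counting takes place. The following Python sketch shows one possible reading of them; the function names are hypothetical, and "balanced" is assumed to mean that the PDF matching the opening LRE or RLE is the final character of the string.</p>

```python
LRM, RLM = "\u200e", "\u200f"
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"
OPENERS = (LRE, RLE, LRO, RLO)


def direction_from_leading_mark(text):
    """Return 'ltr' or 'rtl' if the text starts with LRM or RLM, else None."""
    if text.startswith(LRM):
        return "ltr"
    if text.startswith(RLM):
        return "rtl"
    return None


def direction_from_embedding_wrapper(text):
    """Return the direction of an LRE/RLE that wraps the whole text, else None."""
    if not text or text[0] not in (LRE, RLE) or text[-1] != PDF:
        return None
    depth = 0
    last = len(text) - 1
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            depth -= 1
            # The opening embedding was closed before the end of the string.
            if depth == 0 and i != last:
                return None
    if depth != 0:
        return None
    return "ltr" if text[0] == LRE else "rtl"
```

<p>A caller would try these checks first and fall back to counting words only when both return no direction.</p>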
<p> Such tweaks, if not settled at standardization time, as well as differences in the line-breaking algorithms employed, and further tweaks that could be invented by browser manufacturers in the future, could result in divergence in direction estimation results between browsers. However, it does not seem all that terrible if some edge cases were displayed differently in different browsers, when these are explicitly declared to be, in effect, of unknown direction that the browser is supposed to guess.
</p>
</div>
</div><div class="back"></div></body></html>