<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en-US-x-Hixie" ><head><title>8.2 Parsing HTML documents — HTML5 </title><style type="text/css">
pre { margin-left: 2em; white-space: pre-wrap; }
h2 { margin: 3em 0 1em 0; }
h3 { margin: 2.5em 0 1em 0; }
h4 { margin: 2.5em 0 0.75em 0; }
h5, h6 { margin: 2.5em 0 1em; }
h1 + h2, h1 + h2 + h2 { margin: 0.75em 0 0.75em; }
h2 + h3, h3 + h4, h4 + h5, h5 + h6 { margin-top: 0.5em; }
p { margin: 1em 0; }
hr:not(.top) { display: block; background: none; border: none; padding: 0; margin: 2em 0; height: auto; }
dl, dd { margin-top: 0; margin-bottom: 0; }
dt { margin-top: 0.75em; margin-bottom: 0.25em; clear: left; }
dt + dt { margin-top: 0; }
dd dt { margin-top: 0.25em; margin-bottom: 0; }
dd p { margin-top: 0; }
dd dl + p { margin-top: 1em; }
dd table + p { margin-top: 1em; }
p + * > li, dd li { margin: 1em 0; }
dt, dfn { font-weight: bold; font-style: normal; }
dt dfn { font-style: italic; }
pre, code { font-size: inherit; font-family: monospace; font-variant: normal; }
pre strong { color: black; font: inherit; font-weight: bold; background: yellow; }
pre em { font-weight: bolder; font-style: normal; }
@media screen { code { color: orangered; } code :link, code :visited { color: inherit; } }
var sub { vertical-align: bottom; font-size: smaller; position: relative; top: 0.1em; }
table { border-collapse: collapse; border-style: hidden hidden none hidden; }
table thead, table tbody { border-bottom: solid; }
table tbody th:first-child { border-left: solid; }
table tbody th { text-align: left; }
table td, table th { border-left: solid; border-right: solid; border-bottom: solid thin; vertical-align: top; padding: 0.2em; }
blockquote { margin: 0 0 0 2em; border: 0; padding: 0; font-style: italic; }
.bad, .bad *:not(.XXX) { color: gray; border-color: gray; background: transparent; }
.matrix, .matrix td { border: none; text-align: right; }
.matrix { margin-left: 2em; }
.dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
.dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
.dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }
.toc dfn, h1 dfn, h2 dfn, h3 dfn, h4 dfn, h5 dfn, h6 dfn { font: inherit; }
img.extra { float: right; }
pre.idl { border: solid thin; background: #EEEEEE; color: black; padding: 0.5em 1em; }
pre.idl :link, pre.idl :visited { color: inherit; background: transparent; }
pre.css { border: solid thin; background: #FFFFEE; color: black; padding: 0.5em 1em; }
pre.css:first-line { color: #AAAA50; }
dl.domintro { color: green; margin: 2em 0 2em 2em; padding: 0.5em 1em; border: none; background: #DDFFDD; }
hr + dl.domintro, div.impl + dl.domintro { margin-top: 2.5em; margin-bottom: 1.5em; }
dl.domintro dt, dl.domintro dt * { color: black; text-decoration: none; }
dl.domintro dd { margin: 0.5em 0 1em 2em; padding: 0; }
dl.domintro dd p { margin: 0.5em 0; }
dl.switch { padding-left: 2em; }
dl.switch > dt { text-indent: -1.5em; }
dl.switch > dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
dl.triple { padding: 0 0 0 1em; }
dl.triple dt, dl.triple dd { margin: 0; display: inline }
dl.triple dt:after { content: ':'; }
dl.triple dd:after { content: '\A'; white-space: pre; }
.diff-old { text-decoration: line-through; color: silver; background: transparent; }
.diff-chg, .diff-new { text-decoration: underline; color: green; background: transparent; }
a .diff-new { border-bottom: 1px blue solid; }
h2 { page-break-before: always; }
h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
h1 + h2, hr + h2.no-toc { page-break-before: auto; }
p > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]),
li > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]), { border-bottom: solid #9999CC; }
div.head { margin: 0 0 1em; padding: 1em 0 0 0; }
div.head p { margin: 0; }
div.head h1 { margin: 0; }
div.head .logo { float: right; margin: 0 1em; }
div.head .logo img { border: none } /* remove border from top image */
div.head dl { margin: 1em 0; }
div.head p.copyright, div.head p.alt { font-size: x-small; font-style: oblique; margin: 0; }
body > .toc > li { margin-top: 1em; margin-bottom: 1em; }
body > .toc.brief > li { margin-top: 0.35em; margin-bottom: 0.35em; }
body > .toc > li > * { margin-bottom: 0.5em; }
body > .toc > li > * > li > * { margin-bottom: 0.25em; }
.toc, .toc li { list-style: none; }
.brief { margin-top: 1em; margin-bottom: 1em; line-height: 1.1; }
.brief li { margin: 0; padding: 0; }
.brief li p { margin: 0; padding: 0; }
.category-list { margin-top: -0.75em; margin-bottom: 1em; line-height: 1.5; }
.category-list::before { content: '\21D2\A0'; font-size: 1.2em; font-weight: 900; }
.category-list li { display: inline; }
.category-list li:not(:last-child)::after { content: ', '; }
.category-list li > span, .category-list li > a { text-transform: lowercase; }
.category-list li * { text-transform: none; } /* don't affect <code> nested in <a> */
.XXX { color: #E50000; background: white; border: solid red; padding: 0.5em; margin: 1em 0; }
.XXX > :first-child { margin-top: 0; }
p .XXX { line-height: 3em; }
.annotation { border: solid thin black; background: #0C479D; color: white; position: relative; margin: 8px 0 20px 0; }
.annotation:before { position: absolute; left: 0; top: 0; width: 100%; height: 100%; margin: 6px -6px -6px 6px; background: #333333; z-index: -1; content: ''; }
.annotation :link, .annotation :visited { color: inherit; }
.annotation :link:hover, .annotation :visited:hover { background: transparent; }
.annotation span { border: none ! important; }
.note { color: green; background: transparent; font-family: sans-serif; }
.warning { color: red; background: transparent; }
.note, .warning { font-weight: bolder; font-style: italic; }
p.note, div.note { padding: 0.5em 2em; }
span.note { padding: 0 2em; }
.note p:first-child, .warning p:first-child { margin-top: 0; }
.note p:last-child, .warning p:last-child { margin-bottom: 0; }
.warning:before { font-style: normal; }
p.note:before { content: 'Note: '; }
p.warning:before { content: '\26A0 Warning! '; }
.bookkeeping:before { display: block; content: 'Bookkeeping details'; font-weight: bolder; font-style: italic; }
.bookkeeping { font-size: 0.8em; margin: 2em 0; }
.bookkeeping p { margin: 0.5em 2em; display: list-item; list-style: square; }
.bookkeeping dt { margin: 0.5em 2em 0; }
.bookkeeping dd { margin: 0 3em 0.5em; }
h4 { position: relative; z-index: 3; }
h4 + .element, h4 + div + .element { margin-top: -2.5em; padding-top: 2em; }
.element {
background: #EEEEFF;
color: black;
margin: 0 0 1em 0.15em;
padding: 0 1em 0.25em 0.75em;
border-left: solid #9999FF 0.25em;
position: relative;
z-index: 1;
}
.element:before {
position: absolute;
z-index: 2;
top: 0;
left: -1.15em;
height: 2em;
width: 0.9em;
background: #EEEEFF;
content: ' ';
border-style: none none solid solid;
border-color: #9999FF;
border-width: 0.25em;
}
.example { display: block; color: #222222; background: #FCFCFC; border-left: double; margin-left: 2em; padding-left: 1em; }
td > .example:only-child { margin: 0 0 0 0.1em; }
ul.domTree, ul.domTree ul { padding: 0 0 0 1em; margin: 0; }
ul.domTree li { padding: 0; margin: 0; list-style: none; position: relative; }
ul.domTree li li { list-style: none; }
ul.domTree li:first-child::before { position: absolute; top: 0; height: 0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
ul.domTree li:not(:last-child)::after { position: absolute; top: 0; bottom: -0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
ul.domTree span { font-style: italic; font-family: serif; }
ul.domTree .t1 code { color: purple; font-weight: bold; }
ul.domTree .t2 { font-style: normal; font-family: monospace; }
ul.domTree .t2 .name { color: black; font-weight: bold; }
ul.domTree .t2 .value { color: blue; font-weight: normal; }
ul.domTree .t3 code, .domTree .t4 code, .domTree .t5 code { color: gray; }
ul.domTree .t7 code, .domTree .t8 code { color: green; }
ul.domTree .t10 code { color: teal; }
body.dfnEnabled dfn { cursor: pointer; }
.dfnPanel {
display: inline;
position: absolute;
z-index: 10;
height: auto;
width: auto;
padding: 0.5em 0.75em;
font: small sans-serif, Droid Sans Fallback;
background: #DDDDDD;
color: black;
border: outset 0.2em;
}
.dfnPanel * { margin: 0; padding: 0; font: inherit; text-indent: 0; }
.dfnPanel :link, .dfnPanel :visited { color: black; }
.dfnPanel p { font-weight: bolder; }
.dfnPanel * + p { margin-top: 0.25em; }
.dfnPanel li { list-style-position: inside; }
#configUI { position: absolute; z-index: 20; top: 10em; right: 1em; width: 11em; font-size: small; }
#configUI p { margin: 0.5em 0; padding: 0.3em; background: #EEEEEE; color: black; border: inset thin; }
#configUI p label { display: block; }
#configUI #updateUI, #configUI .loginUI { text-align: center; }
#configUI input[type=button] { display: block; margin: auto; }
fieldset { margin: 1em; padding: 0.5em 1em; }
fieldset > legend + * { margin-top: 0; }
fieldset > :last-child { margin-bottom: 0; }
fieldset p { margin: 0.5em 0; }
.stability {
position: fixed;
bottom: 0;
left: 0; right: 0;
margin: 0 auto 0 auto !important;
z-index: 1000;
width: 50%;
background: maroon; color: yellow;
-webkit-border-radius: 1em 1em 0 0;
-moz-border-radius: 1em 1em 0 0;
border-radius: 1em 1em 0 0;
-moz-box-shadow: 0 0 1em #500;
-webkit-box-shadow: 0 0 1em #500;
box-shadow: 0 0 1em red;
padding: 0.5em 1em;
text-align: center;
}
.stability strong {
display: block;
}
.stability input {
appearance: none; margin: 0; border: 0; padding: 0.25em 0.5em; background: transparent; color: black;
position: absolute; top: -0.5em; right: 0; font: 1.25em sans-serif; text-align: center;
}
.stability input:hover {
color: white;
text-shadow: 0 0 2px black;
}
.stability input:active {
padding: 0.3em 0.45em 0.2em 0.55em;
}
.stability :link, .stability :visited,
.stability :link:hover, .stability :visited:hover {
background: transparent;
color: white;
}
</style><link href="data:text/css,.impl%20%7B%20display:%20none;%20%7D%0Ahtml%20%7B%20border:%20solid%20yellow;%20%7D%20.domintro:before%20%7B%20display:%20none;%20%7D" id="author" rel="alternate stylesheet" title="Author documentation only"><link href="data:text/css,.impl%20%7B%20background:%20%23FFEEEE;%20%7D%20.domintro:before%20%7B%20background:%20%23FFEEEE;%20%7D" id="highlight" rel="alternate stylesheet" title="Highlight implementation
requirements"><link href="http://www.w3.org/StyleSheets/TR/W3C-WD" rel="stylesheet" type="text/css"><style type="text/css">
.applies thead th > * { display: block; }
.applies thead code { display: block; }
.applies tbody th { white-space: nowrap; }
.applies td { text-align: center; }
.applies .yes { background: yellow; }
.matrix, .matrix td { border: hidden; text-align: right; }
td.eg { border-width: thin; text-align: center; }
#table-example-1 { border: solid thin; border-collapse: collapse; margin-left: 3em; }
#table-example-1 * { font-family: "Essays1743", serif; line-height: 1.01em; }
#table-example-1 caption { padding-bottom: 0.5em; }
#table-example-1 thead, #table-example-1 tbody { border: none; }
#table-example-1 th, #table-example-1 td { border: solid thin; }
#table-example-1 th { font-weight: normal; }
#table-example-1 td { border-style: none solid; vertical-align: top; }
#table-example-1 th { padding: 0.5em; vertical-align: middle; text-align: center; }
#table-example-1 tbody tr:first-child td { padding-top: 0.5em; }
#table-example-1 tbody tr:last-child td { padding-bottom: 1.5em; }
#table-example-1 tbody td:first-child { padding-left: 2.5em; padding-right: 0; width: 9em; }
#table-example-1 tbody td:first-child::after { content: leader(". "); }
#table-example-1 tbody td { padding-left: 2em; padding-right: 2em; }
#table-example-1 tbody td:first-child + td { width: 10em; }
#table-example-1 tbody td:first-child + td ~ td { width: 2.5em; }
#table-example-1 tbody td:first-child + td + td + td ~ td { width: 1.25em; }
.apple-table-examples { border: none; border-collapse: separate; border-spacing: 1.5em 0em; width: 40em; margin-left: 3em; }
.apple-table-examples * { font-family: "Times", serif; }
.apple-table-examples td, .apple-table-examples th { border: none; white-space: nowrap; padding-top: 0; padding-bottom: 0; }
.apple-table-examples tbody th:first-child { border-left: none; width: 100%; }
.apple-table-examples thead th:first-child ~ th { font-size: smaller; font-weight: bolder; border-bottom: solid 2px; text-align: center; }
.apple-table-examples tbody th::after, .apple-table-examples tfoot th::after { content: leader(". ") }
.apple-table-examples tbody th, .apple-table-examples tfoot th { font: inherit; text-align: left; }
.apple-table-examples td { text-align: right; vertical-align: top; }
.apple-table-examples.e1 tbody tr:last-child td { border-bottom: solid 1px; }
.apple-table-examples.e1 tbody + tbody tr:last-child td { border-bottom: double 3px; }
.apple-table-examples.e2 th[scope=row] { padding-left: 1em; }
.apple-table-examples sup { line-height: 0; }
.details-example img { vertical-align: top; }
#base64-table {
white-space: nowrap;
font-size: 0.6em;
column-width: 6em;
column-count: 5;
column-gap: 1em;
-moz-column-width: 6em;
-moz-column-count: 5;
-moz-column-gap: 1em;
-webkit-column-width: 6em;
-webkit-column-count: 5;
-webkit-column-gap: 1em;
}
#base64-table thead { display: none; }
#base64-table * { border: none; }
#base64-table tbody td:first-child:after { content: ':'; }
#base64-table tbody td:last-child { text-align: right; }
#named-character-references-table {
white-space: nowrap;
font-size: 0.6em;
column-width: 30em;
column-gap: 1em;
-moz-column-width: 30em;
-moz-column-gap: 1em;
-webkit-column-width: 30em;
-webkit-column-gap: 1em;
}
#named-character-references-table > table > tbody > tr > td:first-child + td,
#named-character-references-table > table > tbody > tr > td:last-child { text-align: center; }
#named-character-references-table > table > tbody > tr > td:last-child:hover > span { position: absolute; top: auto; left: auto; margin-left: 0.5em; line-height: 1.2; font-size: 5em; border: outset; padding: 0.25em 0.5em; background: white; width: 1.25em; height: auto; text-align: center; }
#named-character-references-table > table > tbody > tr#entity-CounterClockwiseContourIntegral > td:first-child { font-size: 0.5em; }
.glyph.control { color: red; }
@font-face {
font-family: 'Essays1743';
src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743.ttf');
}
@font-face {
font-family: 'Essays1743';
font-weight: bold;
src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Bold.ttf');
}
@font-face {
font-family: 'Essays1743';
font-style: italic;
src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Italic.ttf');
}
@font-face {
font-family: 'Essays1743';
font-style: italic;
font-weight: bold;
src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-BoldItalic.ttf');
}
</style><style type="text/css">
.domintro:before { display: table; margin: -1em -0.5em -0.5em auto; width: auto; content: 'This box is non-normative. Implementation requirements are given below this box.'; color: black; font-style: italic; border: solid 2px; background: white; padding: 0 0.25em; }
</style><script type="text/javascript">
   // Return the value of "name" from the query string or, failing that,
   // from document.cookie; return null when it is set in neither place.
   function getCookie(name) {
     var index, data;
     var params = location.search.substring(1).split("&");
     for (index = 0; index < params.length; index++) {
       // A bare "?name" with no value counts as "1".
       if (params[index] == name)
         return "1";
       data = params[index].split("=");
       if (data[0] == name)
         return unescape(data[1]);
     }
     var cookies = document.cookie.split("; ");
     for (index = 0; index < cookies.length; index++) {
       data = cookies[index].split("=");
       if (data[0] == name)
         return unescape(data[1]);
     }
     return null;
   }
</script>
<script src="link-fixup.js" type="text/javascript"></script>
<link href="style.css" rel="stylesheet"><link href="syntax.html" title="8 The HTML syntax" rel="prev">
<link href="spec.html#contents" title="Table of contents" rel="index">
<link href="tokenization.html" title="8.2.4 Tokenization" rel="next">
</head><body><div class="head" id="head">
<div id="multipage-common">
<p class="stability" id="wip"><strong>This is a work in
progress!</strong> For the latest updates from the HTML WG, possibly
including important bug fixes, please look at the <a href="http://dev.w3.org/html5/spec/Overview.html">editor's draft</a> instead.
There may also be a more
<a href="http://www.w3.org/TR/html5">up-to-date Working Draft</a>
with changes based on resolution of Last Call issues.
<input onclick="closeWarning(this.parentNode)" type="button" value="╳⃝"></p>
<script type="text/javascript">
   // Remove the banner and remember the dismissal for four days.
   function closeWarning(element) {
     element.parentNode.removeChild(element);
     var date = new Date();
     date.setDate(date.getDate() + 4);
     document.cookie = 'hide-obsolescence-warning=1; expires=' + date.toUTCString();
   }
   // If the banner was dismissed earlier, hide it again shortly after load.
   if (getCookie('hide-obsolescence-warning') == '1')
     setTimeout(function () {
       var banner = document.getElementById('wip');
       banner.parentNode.removeChild(banner);
     }, 2000);
</script></div>
<p><a href="http://www.w3.org/"><img alt="W3C" height="48" src="http://www.w3.org/Icons/w3c_home" width="72"></a></p>
<h1>HTML5</h1>
</div><div>
<ol class="toc"><li><ol><li><a href="parsing.html#parsing"><span class="secno">8.2 </span>Parsing HTML documents</a>
<ol><li><a href="parsing.html#overview-of-the-parsing-model"><span class="secno">8.2.1 </span>Overview of the parsing model</a></li><li><a href="parsing.html#the-input-stream"><span class="secno">8.2.2 </span>The input stream</a>
<ol><li><a href="parsing.html#determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</a></li><li><a href="parsing.html#character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</a></li><li><a href="parsing.html#preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</a></li><li><a href="parsing.html#changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</a></li></ol></li><li><a href="parsing.html#parse-state"><span class="secno">8.2.3 </span>Parse state</a>
<ol><li><a href="parsing.html#the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</a></li><li><a href="parsing.html#the-stack-of-open-elements"><span class="secno">8.2.3.2 </span>The stack of open elements</a></li><li><a href="parsing.html#the-list-of-active-formatting-elements"><span class="secno">8.2.3.3 </span>The list of active formatting elements</a></li><li><a href="parsing.html#the-element-pointers"><span class="secno">8.2.3.4 </span>The element pointers</a></li><li><a href="parsing.html#other-parsing-state-flags"><span class="secno">8.2.3.5 </span>Other parsing state flags</a></li></ol></li></ol></li></ol></li></ol></div>
<div class="impl">
<h3 id="parsing"><span class="secno">8.2 </span>Parsing HTML documents</h3>
<p><i>This section only applies to user agents, data mining tools,
and conformance checkers.</i></p>
<p class="note">The rules for parsing XML documents into DOM trees
are covered by the next section, entitled "<a href="the-xhtml-syntax.html#the-xhtml-syntax">The XHTML
syntax</a>".</p>
<p>For <a href="dom.html#html-documents">HTML documents</a>, user agents must use the parsing
rules described in this section to generate the DOM trees. Together,
these rules define what is referred to as the <dfn id="html-parser">HTML
parser</dfn>.</p>
<div class="note">
<p>While the HTML syntax described in this specification bears a
close resemblance to SGML and XML, it is a separate language with
its own parsing rules.</p>
<p>Some earlier versions of HTML (in particular from HTML2 to
HTML4) were based on SGML and used SGML parsing rules. However, few
(if any) web browsers ever implemented true SGML parsing for HTML
documents; the only user agents to strictly handle HTML as an SGML
application have historically been validators. The resulting
confusion — with validators claiming documents to have one
representation while widely deployed Web browsers interoperably
implemented a different representation — has wasted decades
of productivity. This version of HTML thus returns to a non-SGML
basis.</p>
<p>Authors interested in using SGML tools in their authoring
pipeline are encouraged to use XML tools and the XML serialization
of HTML.</p>
</div>
<p>This specification defines the parsing rules for HTML documents,
whether they are syntactically correct or not. Certain points in the
parsing algorithm are said to be <dfn id="parse-error" title="parse error">parse
errors</dfn>. The error handling for parse errors is well-defined:
user agents must either act as described below when encountering
such problems, or must abort processing at the first error that they
encounter for which they do not wish to apply the rules described
below.</p>
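<div class="example">
<p>As a rough illustration of the two permitted reactions, the
sketch below walks a list of simulated parser events and either
applies recovery or stops at the first error it declines to handle.
The <code>runParser</code> function, the event objects, and the
<code>willRecover</code> callback are hypothetical names invented for
this example; they are not part of the parsing algorithm.</p>
<pre>
```javascript
// Hypothetical sketch of the two permitted reactions to a parse error.
// "events" stands in for the points the parsing algorithm walks through.
function runParser(events, willRecover) {
  const handled = [];
  for (const event of events) {
    if (event.parseError) {
      if (!willRecover(event)) {
        // Second option: abort at the first error we decline to recover from.
        return { completed: false, abortedAt: event.name, handled };
      }
      // First option: act as the algorithm describes and carry on.
      handled.push(event.name + " (recovered)");
    } else {
      handled.push(event.name);
    }
  }
  return { completed: true, abortedAt: null, handled };
}

const events = [
  { name: "start tag", parseError: false },
  { name: "stray end tag", parseError: true },
  { name: "text", parseError: false },
];

const lenient = runParser(events, () => true);
const strict = runParser(events, () => false);
console.log(lenient.completed, lenient.handled.length); // true 3
console.log(strict.completed, strict.abortedAt);        // false stray end tag
```
</pre>
<p>Because <code>willRecover</code> is consulted per error, the same
sketch also covers a user agent that recovers from some errors and
aborts on others.</p>
</div>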
<p>Conformance checkers must report at least one parse error
condition to the user if one or more parse error conditions exist in
the document and must not report parse error conditions if none
exist in the document. Conformance checkers may report more than one
parse error condition if more than one parse error condition exists
in the document. Conformance checkers are not required to recover
from parse errors.</p>
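<div class="example">
<p>The reporting requirement can be read as a small predicate over
two counts. The following is only a sketch;
<code>reportIsAcceptable</code> is a hypothetical helper, and the upper
bound it enforces (never reporting more conditions than exist) is an
assumption drawn from the wording above rather than an explicit
requirement.</p>
<pre>
```javascript
// Hypothetical check of a conformance checker's report against the rule above.
function reportIsAcceptable(conditionsInDocument, conditionsReported) {
  if (conditionsInDocument === 0) {
    // Nothing is wrong, so nothing may be reported.
    return conditionsReported === 0;
  }
  if (conditionsReported === 0) {
    // Errors exist, so staying silent is not allowed.
    return false;
  }
  // One report is enough; more are permitted. Assumption: a checker never
  // reports more conditions than the document actually contains.
  return conditionsInDocument >= conditionsReported;
}

console.log(reportIsAcceptable(0, 0)); // true
console.log(reportIsAcceptable(3, 1)); // true
console.log(reportIsAcceptable(3, 0)); // false
console.log(reportIsAcceptable(0, 1)); // false
```
</pre>
</div>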
<p class="note">Parse errors are only errors with the
<em>syntax</em> of HTML. In addition to checking for parse errors,
conformance checkers will also verify that the document obeys all
the other conformance requirements described in this
specification.</p>
<p>For the purposes of conformance checkers, if a resource is
determined to be in <a href="syntax.html#syntax">the HTML syntax</a>, then it is an
<a href="dom.html#html-documents" title="HTML documents">HTML document</a>.</p>
</div><div class="impl">
<h4 id="overview-of-the-parsing-model"><span class="secno">8.2.1 </span>Overview of the parsing model</h4>
<p>The input to the HTML parsing process consists of a stream of
Unicode characters, which is passed through a
<a href="tokenization.html#tokenization">tokenization</a> stage followed by a <a href="tree-construction.html#tree-construction">tree
construction</a> stage. The output is a <code><a href="infrastructure.html#document">Document</a></code>
object.</p>
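<div class="example">
<p>The shape of this pipeline can be sketched with a toy model. The
code below is not the tokenizer or tree construction algorithm defined
by this specification: it uses a made-up bracket syntax
(<code>[p]text[/p]</code>) in place of real markup, and the names
<code>tokenize</code> and <code>constructTree</code> are hypothetical.
It is meant only to show characters flowing through a tokenization
stage into a tree construction stage that produces a document-like
object.</p>
<pre>
```javascript
// Toy pipeline: characters -> tokenization -> tree construction -> document.
// The bracket syntax "[p]text[/p]" is invented for this sketch; it is NOT HTML.

// Tokenization stage: turn characters into start tag, end tag and text tokens.
// A "[" that does not begin a well-formed tag is silently skipped.
function* tokenize(characters) {
  const pattern = /\[(\/?)([a-z][a-z0-9]*)\]|([^\[]+)/g;
  for (const match of characters.matchAll(pattern)) {
    if (match[3] !== undefined) {
      yield { type: "text", data: match[3] };
    } else if (match[1] === "/") {
      yield { type: "endTag", name: match[2] };
    } else {
      yield { type: "startTag", name: match[2] };
    }
  }
}

// Tree construction stage: consume tokens and grow a tree, tracking the
// currently open elements on a stack.
function constructTree(tokens) {
  const root = { name: "#document", children: [] };
  const openElements = [root];
  for (const token of tokens) {
    const current = openElements[openElements.length - 1];
    if (token.type === "startTag") {
      const element = { name: token.name, children: [] };
      current.children.push(element);
      openElements.push(element);
    } else if (token.type === "endTag") {
      // Toy rule: close the current element only on an exact name match.
      if (current.name === token.name) openElements.pop();
    } else {
      current.children.push({ name: "#text", data: token.data });
    }
  }
  return root;
}

const doc = constructTree(tokenize("[p]Hello [b]world[/b][/p]"));
console.log(JSON.stringify(doc, null, 2));
```
</pre>
<p>Writing the tokenizer as a generator keeps the two stages separate
while letting tree construction pull tokens one at a time, which
mirrors the stream-oriented description above.</p>
</div>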
<p class="note">Implementations that <a href="infrastructure.html#non-scripted">do not
support scripting</a> do not have to actually create a DOM
<code><a href="infrastructure.html#document">Document</a></code> object, but the DOM tree in such cases is
still used as the model for the rest of the specification.</p>
<p>In the common case, the data handled by the tokenization stage
comes from the network, but <a href="apis-in-html-documents.html#dynamic-markup-insertion" title="dynamic markup
insertion">it can also come from script</a> running in the user
agent, e.g. using the <code title="dom-document-write"><a href="apis-in-html-documents.html#dom-document-write">document.write()</a></code> API.</p>
<p><img alt="" height="554" src="parsing-model-overview.png" width="427"></p>
<p id="nestedParsing">There is only one set of states for the
tokenizer stage and the tree construction stage, but the tree
construction stage is reentrant, meaning that while the tree
construction stage is handling one token, the tokenizer might be
resumed, causing further tokens to be emitted and processed before
the first token's processing is complete.</p>
<div class="example">
<p>In the following example, the tree construction stage will be
called upon to handle a "p" start tag token while handling the
"script" end tag token:</p>
<pre>...
&lt;script&gt;
 document.write('&lt;p&gt;');
&lt;/script&gt;
...</pre>
</div>
<p>To handle these cases, parsers have a <dfn id="script-nesting-level">script nesting
level</dfn>, which must be initially set to zero, and a <dfn id="parser-pause-flag">parser
pause flag</dfn>, which must be initially set to false.</p>
</div><div class="impl">
<h4 id="the-input-stream"><span class="secno">8.2.2 </span>The <dfn>input stream</dfn></h4>
<p>The stream of Unicode characters that comprises the input to the
tokenization stage will be initially seen by the user agent as a
stream of bytes (typically coming over the network or from the local
file system). The bytes encode the actual characters according to a
particular <em>character encoding</em>, which the user agent must
use to decode the bytes into characters.</p>
<p class="note">For XML documents, the algorithm user agents must
use to determine the character encoding is given by the XML
specification. This section does not apply to XML documents. <a href="references.html#refsXML">[XML]</a></p>
<h5 id="determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</h5>
<p>In some cases, it might be impractical to unambiguously determine
the encoding before parsing the document. Because of this, this
specification provides for a two-pass mechanism with an optional
pre-scan. Implementations are allowed, as described below, to apply
a simplified parsing algorithm to whatever bytes they have available
before beginning to parse the document. Then, the real parser is
started, using a tentative encoding derived from this pre-parse and
other out-of-band metadata. If, while the document is being loaded,
the user agent discovers an encoding declaration that conflicts with
this information, then the parser can be reinvoked to reparse
the document with the real encoding.</p>
<p id="documentEncoding">User agents must use the following
algorithm (the <dfn id="encoding-sniffing-algorithm">encoding sniffing algorithm</dfn>) to determine
the character encoding to use when decoding a document in the first
pass. This algorithm takes as input any out-of-band metadata
available to the user agent (e.g. the <a href="fetching-resources.html#content-type" title="Content-Type">Content-Type metadata</a> of the document)
and all the bytes available so far, and returns an encoding and a
<dfn id="concept-encoding-confidence" title="concept-encoding-confidence">confidence</dfn>. The
confidence is either <i>tentative</i>, <i>certain</i>, or
<i>irrelevant</i>. The encoding used, and whether the confidence in
that encoding is <i>tentative</i> or <i>certain</i>, is <a href="tree-construction.html#meta-charset-during-parse">used during the parsing</a> to
determine whether to <a href="#change-the-encoding">change the encoding</a>. If no
encoding is necessary, e.g. because the parser is operating on a
stream of Unicode characters and doesn't have to use an encoding at
all, then the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is
<i>irrelevant</i>.</p>
<ol><li><p>If the user has explicitly instructed the user agent to
override the document's character encoding with a specific
encoding, optionally return that encoding with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>certain</i> and abort these steps.</p></li>
<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>certain</i>, and abort these steps.</p></li>
<li>
<p>The user agent may wait for more bytes of the resource to be
available, either in this step or at any later step in this
algorithm. For instance, a user agent might wait 500ms or 1024
bytes, whichever comes first. In general, preparsing the source to
find the encoding improves performance, as it reduces the need to
throw away the data structures used when parsing upon finding the
encoding information. However, if the user agent delays too long
to obtain data to determine the encoding, then the cost of the
delay could outweigh any performance improvements from the
preparse.</p>
<p class="note">The authoring conformance requirements for
character encoding declarations limit them to only appearing <a href="semantics.html#charset1024">in the first 1024 bytes</a>. User agents are
therefore encouraged to use the preparse algorithm below (part of
these steps) on the first 1024 bytes, but not to stall beyond
that.</p>
</li>
<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>certain</i>, and abort these steps:</p>
<table><thead><tr><th>Bytes in Hexadecimal
</th><th>Encoding
</th></tr></thead><tbody><tr><td>FE FF
</td><td>Big-endian UTF-16
</td></tr><tr><td>FF FE
</td><td>Little-endian UTF-16
</td></tr><tr><td>EF BB BF
</td><td>UTF-8
</td></tr></tbody></table><p class="note">This step looks for Unicode Byte Order Marks
(BOMs).</p></li>
<li><p>Otherwise, the user agent will have to search for explicit
character encoding information in the file itself. This should
proceed as follows:
</p><p>Let <var title="">position</var> be a pointer to a byte in the
input stream, initially pointing at the first byte. If at any
point during these substeps the user agent either runs out of
bytes or decides that scanning further bytes would not be
efficient, then skip to the next step of the overall character
encoding detection algorithm. User agents may decide that scanning
<em>any</em> bytes is not efficient, in which case these substeps
are entirely skipped.</p>
<p>Now, repeat the following "two" steps until the algorithm
aborts (either because the user agent aborts, as described above, or
because a character encoding is found):</p>
<ol><li><p>If <var title="">position</var> points to:</p>
<dl class="switch"><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '&lt;!--')</dt>
<dd>
<p>Advance the <var title="">position</var> pointer so that it
points at the first 0x3E byte which is preceded by two 0x2D
bytes (i.e. at the end of an ASCII '--&gt;' sequence) and comes
after the 0x3C byte that was found. (The two 0x2D bytes can be
the same as those in the '&lt;!--' sequence.)</p>
</dd>
<dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and finally one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '&lt;meta' followed by a space or slash)</dt>
<dd>
<ol><li><p>Advance the <var title="">position</var> pointer so
that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
0x2F byte (the one in the sequence of characters matched
above).</p></li>
<li><p>Let <var title="">attribute list</var> be an empty
list of strings.</p></li>
<li><p>Let <var title="">got pragma</var> be false.</p></li>
<li><p>Let <var title="">need pragma</var> be null.</p></li>
<li><p>Let <var title="">charset</var> be the null value
(which, for the purposes of this algorithm, is distinct from
an unrecognised encoding or the empty string).</p></li>
<li><p><i>Attributes</i>: <a href="#concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">Get an
attribute</a> and its value. If no attribute was sniffed,
then jump to the <i>processing</i> step below.</p></li>
<li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
labeled <i>attributes</i>.</p>
</li><li><p>Add the attribute's name to <var title="">attribute
list</var>.</p>
</li><li>
<p>Run the appropriate step from the following list, if one
applies:</p>
<dl class="switch"><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
<dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
pragma</var> to true.</p></dd>
<dt>If the attribute's name is "<code title="">content</code>"</dt>
<dd><p>Apply the <a href="fetching-resources.html#algorithm-for-extracting-an-encoding-from-a-meta-element">algorithm for extracting an encoding
from a <code>meta</code> element</a>, giving the
attribute's value as the string to parse. If an encoding is
returned, and if <var title="">charset</var> is still set
to null, let <var title="">charset</var> be the encoding
returned, and set <var title="">need pragma</var> to
true.</p></dd>
<dt>If the attribute's name is "<code title="">charset</code>"</dt>
<dd><p>Let <var title="">charset</var> be the encoding
corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</p></dd>
</dl></li>
<li><p>Return to the step labeled <i>attributes</i>.</p></li>
<li><p><i>Processing</i>: If <var title="">need pragma</var>
is null, then jump to the second step of the overall "two
step" algorithm.</p></li>
<li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
step of the overall "two step" algorithm.</p></li>
<li><p>If <var title="">charset</var> is a UTF-16 encoding,
change the value of <var title="">charset</var> to
UTF-8.</p></li>
<li><p>If <var title="">charset</var> is not a supported
character encoding, then jump to the second step of the
overall "two step" algorithm.</p></li>
<li><p>Return the encoding given by <var title="">charset</var>, with <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>tentative</i>, and abort all these steps.</p></li>
</ol></dd>
<dt>A sequence of bytes starting with a 0x3C byte (ASCII &lt;), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
<dd>
<ol><li><p>Advance the <var title="">position</var> pointer so
that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
(ASCII >) byte.</p></li>
<li><p>Repeatedly <a href="#concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">get an
attribute</a> until no further attributes can be found,
then jump to the second step in the overall "two step"
algorithm.</p></li>
</ol></dd>
<dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '&lt;!')</dt>
<dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '&lt;/')</dt>
<dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '&lt;?')</dt>
<dd>
<p>Advance the <var title="">position</var> pointer so that it
points at the first 0x3E byte (ASCII >) that comes after the
0x3C byte that was found.</p>
</dd>
<dt>Any other byte</dt>
<dd>
<p>Do nothing with that byte.</p>
</dd>
</dl></li>
<li>Move <var title="">position</var> so it points at the next
byte in the input stream, and return to the first step of this
"two step" algorithm.</li>
</ol><p>When the above "two step" algorithm says to <dfn id="concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">get an
attribute</dfn>, it means doing this:</p>
<ol><li><p>If the byte at <var title="">position</var> is one of 0x09
(ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
substep.</p></li>
<li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
>), then abort the "get an attribute" algorithm. There isn't
one.</p></li>
<li><p>Otherwise, the byte at <var title="">position</var> is the
start of the attribute name. Let <var title="">attribute
name</var> and <var title="">attribute value</var> be the empty
string.</p></li>
<li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>
<dl class="switch"><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
name</var> is longer than the empty string</dt>
<dd>Advance <var title="">position</var> to the next byte and
jump to the step below labeled <i>value</i>.</dd>
<dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>
<dd>Jump to the step below labeled <i>spaces</i>.</dd>
<dt>If it is 0x2F (ASCII /) or 0x3E (ASCII >)</dt>
<dd>Abort the "get an attribute" algorithm. The attribute's
name is the value of <var title="">attribute name</var>, its
value is the empty string.</dd>
<dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
Z)</dt>
<dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
name</var> (where <var title="">b</var> is the value of the
byte at <var title="">position</var>).</dd>
<dt>Anything else</dt>
<dd>Append the Unicode character with the same code point as the
value of the byte at <var title="">position</var> to <var title="">attribute name</var>. (It doesn't actually matter how
bytes outside the ASCII range are handled here, since only
ASCII characters can contribute to the detection of a character
encoding.)</dd>
</dl></li>
<li><p>Advance <var title="">position</var> to the next byte and
return to the previous step.</p></li>
<li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then
advance <var title="">position</var> to the next byte and
repeat this step.</p></li>
<li><p>If the byte at <var title="">position</var> is
<em>not</em> 0x3D (ASCII =), abort the "get an attribute"
algorithm. The attribute's name is the value of <var title="">attribute name</var>, its value is the empty
string.</p></li>
<li><p>Advance <var title="">position</var> past the 0x3D (ASCII
=) byte.</p></li>
<li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then
advance <var title="">position</var> to the next byte and
repeat this step.</p></li>
<li><p>Process the byte at <var title="">position</var> as
follows:</p>
<dl class="switch"><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>
<dd>
<ol><li>Let <var title="">b</var> be the value of the byte at
<var title="">position</var>.</li>
<li>Advance <var title="">position</var> to the next
byte.</li>
<li>If the value of the byte at <var title="">position</var>
is the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get
an attribute" algorithm. The attribute's name is the value of
<var title="">attribute name</var>, and its value is the
value of <var title="">attribute value</var>.</li>
<li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to
0x5A (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
than the value of the byte at <var title="">position</var>.</li>
<li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
the value of the byte at <var title="">position</var>.</li>
<li>Return to the second step in these substeps.</li>
</ol></dd>
<dt>If it is 0x3E (ASCII >)</dt>
<dd>Abort the "get an attribute" algorithm. The attribute's
name is the value of <var title="">attribute name</var>, its
value is the empty string.</dd>
<dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
Z)</dt>
<dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
value</var> (where <var title="">b</var> is the value of the
byte at <var title="">position</var>). Advance <var title="">position</var> to the next byte.</dd>
<dt>Anything else</dt>
<dd>Append the Unicode character with the same code point as the
value of the byte at <var title="">position</var> to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>
</dl></li>
<li><p>Process the byte at <var title="">position</var> as
follows:</p>
<dl class="switch"><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
>)</dt>
<dd>Abort the "get an attribute" algorithm. The attribute's
name is the value of <var title="">attribute name</var> and its
value is the value of <var title="">attribute value</var>.</dd>
<dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
Z)</dt>
<dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
value</var> (where <var title="">b</var> is the value of the
byte at <var title="">position</var>).</dd>
<dt>Anything else</dt>
<dd>Append the Unicode character with the same code point as the
value of the byte at <var title="">position</var> to <var title="">attribute value</var>.</dd>
</dl></li>
<li><p>Advance <var title="">position</var> to the next byte and
return to the previous step.</p></li>
</ol><p>For the sake of interoperability, user agents should not use a
pre-scan algorithm that returns different results than the one
described above. (But, if you do, please at least let us know, so
that we can improve this algorithm and benefit everyone...)</p>
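<p>As an illustration only, the "get an attribute" steps above might be
sketched in Python as follows. The names <code title="">get_attribute</code>,
<code title="">OutOfBytes</code>, and the <code title="">(name, value, position)</code>
return shape are assumptions made for this sketch and are not defined by this
specification. Running out of bytes is modeled by raising an exception, which
a caller would treat as "skip to the next step of the overall algorithm".</p>

```python
WHITESPACE = b"\t\n\x0c\r "  # 0x09, 0x0A, 0x0C, 0x0D, 0x20


class OutOfBytes(Exception):
    """Hypothetical signal: the scan ran past the bytes available so far."""


def _lower(byte):
    # Bytes 0x41-0x5A (ASCII A-Z) map to code point b+0x20; others map as-is.
    return chr(byte + 0x20) if 0x41 <= byte <= 0x5A else chr(byte)


def get_attribute(data, pos):
    """Return (name, value, new_pos), or (None, None, pos) if there is none."""
    def at(i):
        if i >= len(data):
            raise OutOfBytes
        return data[i]

    # Skip whitespace and slashes before the attribute name.
    while at(pos) in WHITESPACE or at(pos) == 0x2F:
        pos += 1
    if at(pos) == 0x3E:  # '>' means there is no attribute
        return None, None, pos
    name, value = "", ""
    # "Attribute name" step.
    while True:
        b = at(pos)
        if b == 0x3D and name:  # '=' after a non-empty name
            pos += 1
            break
        if b in WHITESPACE:
            # "Spaces" step, followed by the check for '='.
            while at(pos) in WHITESPACE:
                pos += 1
            if at(pos) != 0x3D:
                return name, "", pos
            pos += 1
            break
        if b in (0x2F, 0x3E):  # '/' or '>' ends a valueless attribute
            return name, "", pos
        name += _lower(b)
        pos += 1
    # "Value" step: skip whitespace before the value.
    while at(pos) in WHITESPACE:
        pos += 1
    b = at(pos)
    if b in (0x22, 0x27):  # quoted value
        quote = b
        while True:
            pos += 1
            if at(pos) == quote:
                return name, value, pos + 1
            value += _lower(at(pos))
    if b == 0x3E:
        return name, "", pos
    # Unquoted value: runs until whitespace or '>'.
    value += _lower(b)
    pos += 1
    while at(pos) not in WHITESPACE and at(pos) != 0x3E:
        value += _lower(at(pos))
        pos += 1
    return name, value, pos
```

<p>For example, under these assumptions,
<code title="">get_attribute(b' charset="UTF-8"&gt;', 0)</code> yields the name
<code title="">charset</code> and the value <code title="">utf-8</code>, with
the returned position pointing at the closing 0x3E byte.</p>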
</li>
<li><p>If the user agent has information on the likely encoding for
this page, e.g. based on the encoding of the page when it was last
visited, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>tentative</i>, and abort these steps.</p></li>
<li>
<p>The user agent may attempt to autodetect the character encoding
by applying frequency analysis or other algorithms to the data
stream. Such algorithms may use information about the resource
other than the resource's contents, including the address of the
resource. If autodetection succeeds in determining a character
encoding, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>tentative</i>, and abort these steps. <a href="references.html#refsUNIVCHARDET">[UNIVCHARDET]</a></p>
<p class="note">The UTF-8 encoding has a highly detectable bit
pattern. Documents that contain bytes with values greater than
0x7F which match the UTF-8 pattern are very likely to be UTF-8,
while documents with byte sequences that do not match it are very
likely not. User agents are therefore encouraged to search for
this common encoding. <a href="references.html#refsPPUTF8">[PPUTF8]</a> <a href="references.html#refsUTF8DET">[UTF8DET]</a></p>
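<p>One possible way to check for the UTF-8 bit pattern is sketched below in
Python. It is a heuristic under stated assumptions, not a required algorithm:
the function name <code title="">looks_like_utf8</code> is invented for this
example, and strict decoding is used as a stand-in for pattern matching.</p>

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: non-ASCII bytes are present and all of them form valid UTF-8."""
    if all(b <= 0x7F for b in data):
        # Pure ASCII gives no evidence for UTF-8 over a legacy encoding.
        return False
    try:
        data.decode("utf-8")  # strict mode rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True
```

<p>A buffer that ends in the middle of a multi-byte sequence would be rejected
by this sketch, so a real detector would need to tolerate a truncated final
sequence when only part of the resource is available.</p>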
</li>
<li>
<p>Otherwise, return an implementation-defined or user-specified
default character encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
<i>tentative</i>.</p>
<p>In controlled environments or in environments where the
encoding of documents can be prescribed (for example, for user
agents intended for dedicated use in new networks), the
comprehensive <code title="">UTF-8</code> encoding is
suggested.</p>
<p>In other environments, the default encoding is typically
dependent on the user's locale (an approximation of the languages,
and thus often encodings, of the pages that the user is likely to
frequent). The following table gives suggested defaults based on
the user's locale, for compatibility with legacy content. Locales
are identified by BCP 47 language tags. <a href="references.html#refsBCP47">[BCP47]</a></p>
<table><thead><tr><th>Locale language
</th><th>Suggested default encoding
</th></tr></thead><tbody><tr><td>ar
</td><td>UTF-8
</td></tr><tr><td>be
</td><td>ISO-8859-5
</td></tr><tr><td>bg
</td><td>windows-1251
</td></tr><tr><td>cs<!-- -CZ -->
</td><td>ISO-8859-2
</td></tr><tr><td>cy
</td><td>UTF-8
</td></tr><tr><td>fa<!-- -IR -->
</td><td>UTF-8
</td></tr><tr><td>he<!-- -IL -->
</td><td>windows-1255
</td></tr><tr><td>hr
</td><td>UTF-8
</td></tr><tr><td>hu<!-- -HU -->
</td><td>ISO-8859-2
</td></tr><tr><td>ja
</td><td>Windows-31J
</td></tr><tr><td>kk
</td><td>UTF-8
</td></tr><tr><td>ko<!-- -KR -->
</td><td>windows-949 <!-- EUC-KR -->
</td></tr><tr><td>ku
</td><td>windows-1254 <!-- ISO-8859-9 -->
</td></tr><tr><td>lt
</td><td>windows-1257
</td></tr><tr><td>lv<!-- -LV -->
</td><td>ISO-8859-13
</td></tr><tr><td>mk<!-- -MK -->
</td><td>UTF-8
</td></tr><tr><td>or
</td><td>UTF-8
</td></tr><tr><td>pl<!-- -PL -->
</td><td>ISO-8859-2
</td></tr><tr><td>ro
</td><td>UTF-8
</td></tr><tr><td>ru
</td><td>windows-1251
</td></tr><tr><td>sk
</td><td>windows-1250
</td></tr><tr><td>sl
</td><td>ISO-8859-2
</td></tr><tr><td>sr
</td><td>UTF-8
</td></tr><tr><td>th
</td><td>windows-874 <!-- TIS-620 -->
</td></tr><tr><td>tr<!-- -TR -->
</td><td>windows-1254 <!-- ISO-8859-9 -->
</td></tr><tr><td>uk
</td><td>windows-1251
</td></tr><tr><td>vi
</td><td>UTF-8
</td></tr><tr><td>zh-CN
</td><td>GB18030
</td></tr><tr><td>zh-TW
</td><td>Big5
</td></tr><tr><td>All other locales
</td><td>windows-1252
</td></tr></tbody></table></li>
</ol><p>The <a href="dom.html#document-s-character-encoding">document's character encoding</a> must immediately
be set to the value returned from this algorithm, at the same time
as the user agent uses the returned value to select the decoder to
use for the input stream.</p>
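<p>The following Python sketch illustrates a subset of the encoding sniffing
algorithm: the transport-layer check (step 2), the BOM table (step 4), and the
locale-based default (step 8). The user override, the pre-scan, remembered
encodings, and autodetection are deliberately left out. The names
<code title="">sniff_encoding</code>, <code title="">is_supported</code>, and
<code title="">LOCALE_DEFAULTS</code> are assumptions of this example, and the
locale table is only an excerpt of the one above.</p>

```python
# Step 4: byte order marks, checked in table order.
BOM_TABLE = [
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]

# Step 8: excerpt of the suggested locale defaults (not the full table).
LOCALE_DEFAULTS = {
    "ru": "windows-1251",
    "tr": "windows-1254",
    "zh-CN": "GB18030",
    "zh-TW": "Big5",
}
ALL_OTHER_LOCALES = "windows-1252"


def sniff_encoding(data, transport_encoding=None, locale="en",
                   is_supported=lambda name: True):
    """Return (encoding, confidence) for the bytes available so far."""
    # Step 2: a supported transport-layer encoding wins with certainty.
    if transport_encoding is not None and is_supported(transport_encoding):
        return transport_encoding, "certain"
    # Step 4: startswith() only matches when enough bytes are available.
    for bom, encoding in BOM_TABLE:
        if data.startswith(bom):
            return encoding, "certain"
    # Steps 5 to 7 (pre-scan, remembered encoding, autodetection) omitted.
    # Step 8: fall back to a locale-dependent default.
    return LOCALE_DEFAULTS.get(locale, ALL_OTHER_LOCALES), "tentative"
```

<p>The exact-match lookup on the locale is a simplification; a user agent
would more likely match on the language subtags of the BCP 47 tag.</p>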
<p class="note">This algorithm is a <a href="introduction.html#willful-violation">willful violation</a>
of the HTTP specification, which requires that the encoding be
assumed to be ISO-8859-1 in the absence of a <a href="semantics.html#character-encoding-declaration">character
encoding declaration</a> to the contrary, and of RFC 2046,
which requires that the encoding be assumed to be US-ASCII in the
absence of a <a href="semantics.html#character-encoding-declaration">character encoding declaration</a> to the
contrary. This specification's third approach is motivated by a
desire to be maximally compatible with legacy content. <a href="references.html#refsHTTP">[HTTP]</a> <a href="references.html#refsRFC2046">[RFC2046]</a></p>
<h5 id="character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</h5>
<p>User agents must at a minimum support the UTF-8 and Windows-1252
encodings, but may support more. <a href="references.html#refsRFC3629">[RFC3629]</a> <a href="references.html#refsWIN1252">[WIN1252]</a></p>
<p class="note">It is not unusual for Web browsers to support dozens
if not upwards of a hundred distinct character encodings.</p>
<p>User agents must support the <a href="infrastructure.html#preferred-mime-name">preferred MIME name</a> of
every character encoding they support, and should support all the
IANA-registered names and aliases of every character encoding they
support. <a href="references.html#refsIANACHARSET">[IANACHARSET]</a></p>
<p>When comparing a string specifying a character encoding with the
name or alias of a character encoding to determine if they are
equal, user agents must remove any leading or trailing <a href="common-microsyntaxes.html#space-character" title="space character">space characters</a> in both names, and
then perform the comparison in an <a href="infrastructure.html#ascii-case-insensitive">ASCII
case-insensitive</a> manner.</p>
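<p>A minimal Python sketch of this comparison follows. The function names are
invented for the example. It avoids <code title="">str.lower()</code>, which
is Unicode-aware and would, for instance, fold U+212A KELVIN SIGN to "k"; only
the ASCII letters A to Z are folded here.</p>

```python
SPACE_CHARACTERS = " \t\n\x0c\r"  # U+0020, U+0009, U+000A, U+000C, U+000D


def ascii_lower(s: str) -> str:
    # Fold only A-Z; every other character is left untouched.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def encoding_names_equal(a: str, b: str) -> bool:
    """Compare two encoding names after trimming space characters."""
    return (ascii_lower(a.strip(SPACE_CHARACTERS))
            == ascii_lower(b.strip(SPACE_CHARACTERS)))
```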
<hr><p>When a user agent would otherwise use an encoding given in the
first column of the following table to either convert content to
Unicode characters or convert Unicode characters to bytes, it must
instead use the encoding given in the cell in the second column of
the same row. When a byte or sequence of bytes is treated
differently due to this encoding aliasing, it is said to have been
<dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p>
<table id="table-encoding-overrides"><caption>Character encoding overrides</caption>
<thead><tr><th> Input encoding </th><th> Replacement encoding </th><th> References
</th></tr></thead><tbody><tr><td> EUC-KR </td><td> windows-949 </td><td>
<a href="references.html#refsEUCKR">[EUCKR]</a>
<a href="references.html#refsWIN949">[WIN949]</a>
</td></tr><tr><td> EUC-JP </td><td> CP51932 </td><td>
<a href="references.html#refsEUCJP">[EUCJP]</a>
<a href="references.html#refsCP51932">[CP51932]</a>
</td></tr><tr><td> GB2312 </td><td> GBK </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsGBK">[GBK]</a>
</td></tr><tr><td> GB_2312-80 </td><td> GBK </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsGBK">[GBK]</a>
</td></tr><tr><td> ISO-8859-1 </td><td> windows-1252 </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsWIN1252">[WIN1252]</a>
</td></tr><tr><td> ISO-8859-9 </td><td> windows-1254 </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsWIN1254">[WIN1254]</a>
</td></tr><tr><td> ISO-8859-11 </td><td> windows-874 </td><td>
<a href="references.html#refsISO885911">[ISO885911]</a>
<a href="references.html#refsWIN874">[WIN874]</a>
</td></tr><tr><td> KS_C_5601-1987 </td><td> windows-949 </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsWIN949">[WIN949]</a>
</td></tr><tr><td> Shift_JIS </td><td> Windows-31J </td><td>
<a href="references.html#refsSHIFTJIS">[SHIFTJIS]</a>
<a href="references.html#refsWIN31J">[WIN31J]</a>
</td></tr><tr><td> TIS-620 </td><td> windows-874 </td><td>
<a href="references.html#refsTIS620">[TIS620]</a>
<a href="references.html#refsWIN874">[WIN874]</a>
</td></tr><tr><td> US-ASCII </td><td> windows-1252 </td><td>
<a href="references.html#refsRFC1345">[RFC1345]</a>
<a href="references.html#refsWIN1252">[WIN1252]</a>
</td></tr></tbody></table><p class="note">The requirement to treat certain encodings as other
encodings according to the table above is a <a href="introduction.html#willful-violation">willful
violation</a> of the W3C Character Model specification, motivated
by a desire for compatibility with legacy content. <a href="references.html#refsCHARMOD">[CHARMOD]</a></p>
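<p>The override table lends itself to a simple lookup. The sketch below, in
Python, uses a hypothetical <code title="">effective_encoding</code> helper
and matches only the names listed in the first column; a fuller implementation
would also resolve each encoding's registered aliases before consulting the
table.</p>

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Keys are the input encodings from the table, folded to ASCII lowercase.
ENCODING_OVERRIDES = {
    "euc-kr": "windows-949",
    "euc-jp": "CP51932",
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "Windows-31J",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}


def effective_encoding(name: str) -> str:
    """Return the replacement encoding, or the name itself if none applies."""
    key = name.strip(" \t\n\x0c\r").translate(_ASCII_LOWER)
    return ENCODING_OVERRIDES.get(key, name)
```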
<p>When a user agent is to use the UTF-16 encoding but no BOM has
been found, it must default to UTF-16LE.</p>
<p class="note">The requirement to default UTF-16 to LE rather than
BE is a <a href="introduction.html#willful-violation">willful violation</a> of RFC 2781, motivated by a
desire for compatibility with legacy content. <a href="references.html#refsRFC2781">[RFC2781]</a></p>
<hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
encodings. <a href="references.html#refsCESU8">[CESU8]</a> <a href="references.html#refsUTF7">[UTF7]</a> <a href="references.html#refsBOCU1">[BOCU1]</a> <a href="references.html#refsSCSU">[SCSU]</a></p>
<p>Support for encodings based on EBCDIC is not recommended. These
encodings are rarely used for publicly-facing Web content.</p>
<p>Support for UTF-32 is not recommended. This encoding is rarely
used, and frequently implemented incorrectly.</p>
<p class="note">This specification does not make any attempt to
support EBCDIC-based encodings and UTF-32 in its algorithms; support
and use of these encodings can thus lead to unexpected behavior in
implementations of this specification.</p>
<h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</h5>
<p>Given an encoding, the bytes in the input stream must be
converted to Unicode characters for the tokenizer, as described by
the rules for that encoding, except that the leading U+FEFF BYTE
ORDER MARK character, if any, must not be stripped by the encoding
layer (it is stripped by the rule below).</p>
<p>Bytes or sequences of bytes in the original byte stream that
could not be converted to Unicode code points must be converted to
U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is
UTF-8, the bytes must be <a href="infrastructure.html#decoded-as-utf-8-with-error-handling" title="decoded as UTF-8, with error
handling">decoded with the error handling</a> defined in this
specification.</p>
<p class="note">Bytes or sequences of bytes in the original byte
stream that did not conform to the encoding specification
(e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are
errors that conformance checkers are expected to report.</p>
<p>Any byte or sequence of bytes in the original byte stream that is
<a href="#misinterpreted-for-compatibility">misinterpreted for compatibility</a> is a <a href="#parse-error">parse
error</a>.</p>
<p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
any are present.</p>
<p class="note">The requirement to strip a U+FEFF BYTE ORDER MARK
character regardless of whether that character was used to determine
the byte order is a <a href="introduction.html#willful-violation">willful violation</a> of Unicode,
motivated by a desire to increase the resilience of user agents in
the face of naïve transcoders.</p>
<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
U+000E to U+001F, U+007F
to U+009F, U+FDD0
to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE,
U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF,
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href="#parse-error" title="parse error">parse
errors</a>. These are all control characters or permanently
undefined Unicode characters (noncharacters).</p>
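<p>The list above can be tested compactly, since the noncharacters at the end
of each plane all have U+FFFE or U+FFFF as their low 16 bits. The following
Python predicate is a sketch; its name is an assumption of this example.</p>

```python
def is_preprocessing_parse_error(cp: int) -> bool:
    """True if code point cp is one of the characters listed above."""
    # Control character ranges.
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F:
        return True
    if 0x007F <= cp <= 0x009F:
        return True
    # Noncharacter block U+FDD0 to U+FDEF, and U+000B.
    if 0xFDD0 <= cp <= 0xFDEF or cp == 0x000B:
        return True
    # U+FFFE/U+FFFF and their counterparts in planes 1 through 16.
    return cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)
```

<p>Note that U+0009, U+000A, U+000C, and U+000D are not flagged, matching the
gaps in the ranges given above.</p>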
<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
characters are treated specially. Any CR characters that are
followed by LF characters must be removed, and any CR characters not
followed by LF characters must be converted to LF characters. Thus,
newlines in HTML DOMs are represented by LF characters, and there
are never any CR characters in the input to the
<a href="tokenization.html#tokenization">tokenization</a> stage.</p>
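<p>As a rough illustration, the BOM and newline rules above could be applied
to an already-decoded string as in the Python sketch below. The function name
<code title="">preprocess</code> is invented here, and decoding, error
replacement, and parse error reporting are outside the scope of the
sketch.</p>

```python
def preprocess(text: str) -> str:
    """Strip one leading U+FEFF, then normalize CR and CRLF to LF."""
    if text.startswith("\ufeff"):
        text = text[1:]  # only a single leading BOM is ignored
    # Remove each CR that is followed by LF, then convert lone CRs to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

<p>Replacing CRLF pairs first matters: doing the lone-CR conversion first
would turn every CRLF into two LF characters.</p>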
<p>The <dfn id="next-input-character">next input character</dfn> is the first character in the
input stream that has not yet been <dfn id="consumed">consumed</dfn>. Initially,
the <i><a href="#next-input-character">next input character</a></i> is the first character in the
input. The <dfn id="current-input-character">current input character</dfn> is the last character
to have been <i><a href="#consumed">consumed</a></i>.</p>
<p>The <dfn id="insertion-point">insertion point</dfn> is the position (just before a
character or just before the end of the input stream) where content
inserted using <code title="dom-document-write"><a href="apis-in-html-documents.html#dom-document-write">document.write()</a></code> is actually
inserted. The insertion point is relative to the position of the
character immediately after it; it is not an absolute offset into
the input stream. Initially, the insertion point is
undefined.</p>
<p>The "EOF" character in the tables below is a conceptual character
representing the end of the <a href="#the-input-stream">input stream</a>. If the parser
is a <a href="apis-in-html-documents.html#script-created-parser">script-created parser</a>, then the end of the
<a href="#the-input-stream">input stream</a> is reached when an <dfn id="explicit-eof-character">explicit "EOF"
character</dfn> (inserted by the <code title="dom-document-close"><a href="apis-in-html-documents.html#dom-document-close">document.close()</a></code> method) is
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>
<h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</h5>
<p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
encoding</dfn>, it must run the following steps. This might happen
if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
failed to find an encoding, or if it found an encoding that was not
the actual encoding of the file.</p>
<ol><li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i> and abort these steps. This happens when the
encoding information found in the file matches what the
<a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
encoding. It also happens on the second pass through the parser,
when the first pass found that the encoding sniffing algorithm
had failed to find the right encoding.</li>
<li>If the encoding that is already being used to interpret the
input stream is a UTF-16 encoding, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it were anything other than the same encoding, it would clearly
be incorrect.</li>
<li>If the new encoding is a UTF-16 encoding, change it to
UTF-8.</li>
<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
to the new converter for the encoding on the fly. Set the
<a href="dom.html#document-s-character-encoding">document's character encoding</a> and the encoding used to
convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i>, and abort these steps.</li>
<li>Otherwise, <a href="history.html#navigate">navigate</a> to the
document again, with <a href="history.html#replacement-enabled">replacement enabled</a>, and using
the same <a href="history.html#source-browsing-context">source browsing context</a>, but this time skip
the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i>. Whenever possible, this should be done without
actually contacting the network layer (the bytes should be
re-parsed from memory), even if, e.g., the document is marked as
not being cacheable. If this is not possible and contacting the
network layer would involve repeating a request that uses a method
other than HTTP GET (<a href="fetching-resources.html#concept-http-equivalent-get" title="concept-http-equivalent-get">or
equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
<i>certain</i> and ignore the new encoding. The resource will be
misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>
</ol></div><div class="impl">
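<p class="note">The five steps for changing the encoding while parsing, given just above, can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the names <code>DecoderState</code> and <code>change_the_encoding</code>, the three keyword flags, and the returned strings are all hypothetical, and Python's codec registry stands in for the notion of "identical or equivalent" encodings.</p>

```python
import codecs
from dataclasses import dataclass


@dataclass
class DecoderState:
    encoding: str                  # encoding currently used for the input stream
    confidence: str = "tentative"  # "tentative" or "certain"


def _canonical(name):
    # Assumption: codec aliases model "identical or equivalent" encodings.
    return codecs.lookup(name).name


def _is_utf16(name):
    return _canonical(name).startswith("utf-16")


def _same_interpretation(data, enc_a, enc_b):
    """True if the bytes decode to the same characters in both encodings."""
    try:
        return data.decode(enc_a) == data.decode(enc_b)
    except UnicodeDecodeError:
        return False


def change_the_encoding(state, new_encoding, converted_bytes, *,
                        can_switch_on_the_fly=True,
                        bytes_in_memory=True,
                        request_is_get=True):
    # Step 1: same encoding, nothing to change.
    if _canonical(new_encoding) == _canonical(state.encoding):
        state.confidence = "certain"
        return "unchanged"
    # Step 2: a UTF-16 stream cannot sensibly declare another encoding.
    if _is_utf16(state.encoding):
        state.confidence = "certain"
        return "ignored"
    # Step 3: a declared UTF-16 encoding is treated as UTF-8.
    if _is_utf16(new_encoding):
        new_encoding = "utf-8"
    # Step 4: switch converters on the fly if the bytes seen so far agree.
    if can_switch_on_the_fly and _same_interpretation(
            converted_bytes, state.encoding, new_encoding):
        state.encoding = new_encoding
        state.confidence = "certain"
        return "switched"
    # Step 5: re-parse with the new encoding, unless that would require
    # repeating a non-GET request over the network.
    state.confidence = "certain"
    if bytes_in_memory or request_is_get:
        state.encoding = new_encoding
        return "renavigate"
    return "ignored"
```

<p class="note">The returned strings only name the outcome; a real user agent would perform the navigation or converter switch itself.</p>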
<h4 id="parse-state"><span class="secno">8.2.3 </span>Parse state</h4>
<h5 id="the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</h5>
<p>The <dfn id="insertion-mode">insertion mode</dfn> is a state variable that controls
the primary operation of the tree construction stage.</p>
<p>Initially, the <a href="#insertion-mode">insertion mode</a> is "<a href="tree-construction.html#the-initial-insertion-mode" title="insertion mode: initial">initial</a>". It can change to
"<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before html">before html</a>",
"<a href="tree-construction.html#the-before-head-insertion-mode" title="insertion mode: before head">before head</a>",
"<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in head">in head</a>", "<a href="tree-construction.html#parsing-main-inheadnoscript" title="insertion mode: in head noscript">in head noscript</a>",
"<a href="tree-construction.html#the-after-head-insertion-mode" title="insertion mode: after head">after head</a>", "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>", "<a href="tree-construction.html#parsing-main-incdata" title="insertion mode: text">text</a>", "<a href="tree-construction.html#parsing-main-intable" title="insertion
mode: in table">in table</a>", "<a href="tree-construction.html#parsing-main-intabletext" title="insertion mode: in
table text">in table text</a>", "<a href="tree-construction.html#parsing-main-incaption" title="insertion mode: in
caption">in caption</a>", "<a href="tree-construction.html#parsing-main-incolgroup" title="insertion mode: in column
group">in column group</a>", "<a href="tree-construction.html#parsing-main-intbody" title="insertion mode: in
table body">in table body</a>", "<a href="tree-construction.html#parsing-main-intr" title="insertion mode: in
row">in row</a>", "<a href="tree-construction.html#parsing-main-intd" title="insertion mode: in cell">in
cell</a>", "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in
select</a>", "<a href="tree-construction.html#parsing-main-inselectintable" title="insertion mode: in select in table">in
select in table</a>", "<a href="tree-construction.html#parsing-main-afterbody" title="insertion mode: after
body">after body</a>", "<a href="tree-construction.html#parsing-main-inframeset" title="insertion mode: in
frameset">in frameset</a>", "<a href="tree-construction.html#parsing-main-afterframeset" title="insertion mode: after
frameset">after frameset</a>", "<a href="tree-construction.html#the-after-after-body-insertion-mode" title="insertion mode:
after after body">after after body</a>", and "<a href="tree-construction.html#the-after-after-frameset-insertion-mode" title="insertion mode: after after frameset">after after
frameset</a>" during the course of parsing, as described in
the <a href="tree-construction.html#tree-construction">tree construction</a> stage. The insertion mode affects
how tokens are processed and whether CDATA sections are
supported.</p>
<p>Several of these modes, namely "<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in
head">in head</a>", "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in
body</a>", "<a href="tree-construction.html#parsing-main-intable" title="insertion mode: in table">in
table</a>", and "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in
select</a>", are special, in that the other modes defer to them
at various times. When the algorithm below says that the user agent
is to do something "<dfn id="using-the-rules-for">using the rules for</dfn> the <var title="">m</var> insertion mode", where <var title="">m</var> is one
of these modes, the user agent must use the rules described under
the <var title="">m</var> <a href="#insertion-mode">insertion mode</a>'s section, but
must leave the <a href="#insertion-mode">insertion mode</a> unchanged unless the
rules in <var title="">m</var> themselves switch the <a href="#insertion-mode">insertion
mode</a> to a new value.</p>
<p>When the insertion mode is switched to "<a href="tree-construction.html#parsing-main-incdata" title="insertion
mode: text">text</a>" or "<a href="tree-construction.html#parsing-main-intabletext" title="insertion mode: in table
text">in table text</a>", the <dfn id="original-insertion-mode">original insertion mode</dfn>
is also set. This is the insertion mode to which the tree
construction stage will return.</p>
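<p class="note">The interaction between "using the rules for" and the original insertion mode can be sketched with a toy tree builder. Everything here is hypothetical and heavily simplified: the class <code>TreeBuilder</code>, its string tokens, and the three handlers are placeholders rather than the real rules. The sketch shows that borrowing the rules of another mode leaves the insertion mode unchanged unless those rules switch it, and that a switch to "text" records the mode to return to.</p>

```python
class TreeBuilder:
    """Toy tree builder; the handlers are heavily simplified placeholders."""

    def __init__(self, mode="in body"):
        self.insertion_mode = mode
        self.original_insertion_mode = None
        self.trace = []  # (rules used, token) pairs, for inspection only
        self.rules = {
            "in head": self.in_head,
            "in body": self.in_body,
            "text": self.text,
        }

    def process(self, token):
        """Process a token under the current insertion mode."""
        self.rules[self.insertion_mode](token)

    def using_the_rules_for(self, mode, token):
        """Apply the rules of `mode` without assigning the insertion mode."""
        self.rules[mode](token)

    def switch_to_text(self):
        # Switching to "text" also sets the original insertion mode.
        self.original_insertion_mode = self.insertion_mode
        self.insertion_mode = "text"

    def in_head(self, token):
        self.trace.append(("in head", token))
        if token == "<script>":
            self.switch_to_text()

    def in_body(self, token):
        if token in ("<meta>", "<script>"):
            # Defer to one of the special modes.
            self.using_the_rules_for("in head", token)
        else:
            self.trace.append(("in body", token))

    def text(self, token):
        self.trace.append(("text", token))
        if token == "</script>":
            # Return to the mode that was in effect before "text".
            self.insertion_mode = self.original_insertion_mode
```

<p class="note">In this sketch, a <code>&lt;meta&gt;</code> token seen in "in body" is handled by the "in head" rules and the mode stays "in body", whereas a <code>&lt;script&gt;</code> token causes the borrowed rules themselves to switch to "text".</p>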
<hr><p>When the steps below require the UA to <dfn id="reset-the-insertion-mode-appropriately">reset the insertion
mode appropriately</dfn>, it means the UA must follow these
steps:</p>
<ol><li>Let <var title="">last</var> be false.</li>
<li>Let <var title="">node</var> be the last node in the
<a href="#stack-of-open-elements">stack of open elements</a>.</li>
<li><i>Loop</i>: If <var title="">node</var> is the first node in
the stack of open elements, then set <var title="">last</var> to
true and set <var title="">node</var> to the <var title="concept-frag-parse-context"><a href="the-end.html#concept-frag-parse-context">context</a></var> element.
(<a href="the-end.html#fragment-case">fragment case</a>)</li>
<li>If <var title="">node</var> is a <code><a href="the-button-element.html#the-select-element">select</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in select</a>" and abort these
steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-td-element">td</a></code> or
<code><a href="tabular-data.html#the-th-element">th</a></code> element and <var title="">last</var> is false, then
switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intd" title="insertion
mode: in cell">in cell</a>" and abort these steps.</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-tr-element">tr</a></code> element, then
switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intr" title="insertion
mode: in row">in row</a>" and abort these steps.</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-tbody-element">tbody</a></code>,
<code><a href="tabular-data.html#the-thead-element">thead</a></code>, or <code><a href="tabular-data.html#the-tfoot-element">tfoot</a></code> element, then switch the
<a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intbody" title="insertion mode: in
table body">in table body</a>" and abort these steps.</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-caption-element">caption</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-incaption" title="insertion mode: in caption">in caption</a>" and abort
these steps.</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-colgroup-element">colgroup</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-incolgroup" title="insertion mode: in column group">in column group</a>" and
abort these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>
<li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-table-element">table</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intable" title="insertion mode: in table">in table</a>" and abort these
steps.</li>
<li>If <var title="">node</var> is a <code><a href="semantics.html#the-head-element">head</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>" ("<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>"! <em> not "<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in head">in head</a>"</em>!) and abort
these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>
<li>If <var title="">node</var> is a <code><a href="sections.html#the-body-element">body</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>" and abort these
steps.</li>
<li>If <var title="">node</var> is a <code><a href="obsolete.html#frameset">frameset</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inframeset" title="insertion mode: in frameset">in frameset</a>" and abort
these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>
<li>If <var title="">node</var> is an <code><a href="semantics.html#the-html-element">html</a></code> element,
then switch the <a href="#insertion-mode">insertion mode</a>
to "<a href="tree-construction.html#the-before-head-insertion-mode" title="insertion mode: before head">before
head</a>" and abort these steps. (<a href="the-end.html#fragment-case">fragment
case</a>)</li>
<li>If <var title="">last</var> is true, then switch the
<a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in
body">in body</a>" and abort these steps. (<a href="the-end.html#fragment-case">fragment
case</a>)</li>
<li>Let <var title="">node</var> now be the node before <var title="">node</var> in the <a href="#stack-of-open-elements">stack of open
elements</a>.</li>
<li>Return to the step labeled <i>loop</i>.</li>
</ol><h5 id="the-stack-of-open-elements"><span class="secno">8.2.3.2 </span>The stack of open elements</h5>
<p>Initially, the <dfn id="stack-of-open-elements">stack of open elements</dfn> is empty. The
stack grows downwards; the topmost node on the stack is the first
one added to the stack, and the bottommost node of the stack is the
most recently added node in the stack (except when the
stack is manipulated in a random-access fashion as part of <a href="tree-construction.html#adoptionAgency">the handling for misnested tags</a>).</p>
<p>The "<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before html">before
html</a>" <a href="#insertion-mode">insertion mode</a> creates the
<code><a href="semantics.html#the-html-element">html</a></code> root element node, which is then added to the
stack.</p>
<p>In the <a href="the-end.html#fragment-case">fragment case</a>, the <a href="#stack-of-open-elements">stack of open
elements</a> is initialized to contain an <code><a href="semantics.html#the-html-element">html</a></code>
element that is created as part of <a href="the-end.html#html-fragment-parsing-algorithm" title="html fragment
parsing algorithm">that algorithm</a>. (The <a href="the-end.html#fragment-case">fragment
case</a> skips the "<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before
html">before html</a>" <a href="#insertion-mode">insertion mode</a>.)</p>
<p>The <code><a href="semantics.html#the-html-element">html</a></code> node, however it is created, is the topmost
node of the stack. It only gets popped off the stack when the parser
<a href="the-end.html#stop-parsing" title="stop parsing">finishes</a>.</p>
<p>The <dfn id="current-node">current node</dfn> is the bottommost node in this
stack.</p>
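<p class="note">With the stack oriented this way, the steps to reset the insertion mode appropriately start at the current node and walk towards the topmost node. A minimal sketch follows, assuming the stack is a Python list of tag names whose last entry is the current node. The function name and the optional <code>context</code> parameter (the tag name of the context element in the fragment case) are hypothetical, and leaving <var title="">node</var> unchanged when no context element exists is an assumption of this sketch.</p>

```python
def reset_the_insertion_mode(stack, context=None):
    """Return the insertion mode chosen for the given stack of tag names.

    stack[0] is the topmost node (the first one added); stack[-1] is the
    current node. `context` is the context element's tag name, if any.
    """
    last = False
    index = len(stack) - 1
    while True:
        node = stack[index]
        if index == 0:
            last = True
            if context is not None:  # fragment case
                node = context
        if node == "select":
            return "in select"
        if node in ("td", "th") and not last:
            return "in cell"
        if node == "tr":
            return "in row"
        if node in ("tbody", "thead", "tfoot"):
            return "in table body"
        if node == "caption":
            return "in caption"
        if node == "colgroup":
            return "in column group"
        if node == "table":
            return "in table"
        if node == "head":
            return "in body"  # deliberately not "in head"
        if node == "body":
            return "in body"
        if node == "frameset":
            return "in frameset"
        if node == "html":
            return "before head"
        if last:
            return "in body"
        index -= 1  # move to the node before `node` in the stack
```

<p class="note">The loop always ends, because once the first node in the stack is reached <code>last</code> is true and the final check returns.</p>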
<p>The <dfn id="current-table">current table</dfn> is the last <code><a href="tabular-data.html#the-table-element">table</a></code>
element in the <a href="#stack-of-open-elements">stack of open elements</a>, if there is
one. If there is no <code><a href="tabular-data.html#the-table-element">table</a></code> element in the <a href="#stack-of-open-elements">stack of
open elements</a> (<a href="the-end.html#fragment-case">fragment case</a>), then the
<a href="#current-table">current table</a> is the first element in the <a href="#stack-of-open-elements">stack
of open elements</a> (the <code><a href="semantics.html#the-html-element">html</a></code> element).</p>
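<p class="note">The definitions of the stack, the current node, and the current table can be sketched as a small data structure. The names <code>Element</code> and <code>StackOfOpenElements</code> are hypothetical; elements are compared by identity, as distinct DOM nodes would be.</p>

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare elements by identity, as DOM nodes would be
class Element:
    tag: str


class StackOfOpenElements:
    def __init__(self):
        # Index 0 is the topmost node (first added); the end of the list
        # is the bottommost, most recently added node.
        self.nodes = []

    def push(self, element):
        self.nodes.append(element)

    def pop(self):
        return self.nodes.pop()

    @property
    def current_node(self):
        """The bottommost node in the stack."""
        return self.nodes[-1]

    @property
    def current_table(self):
        """The last table element in the stack, else the first element."""
        for element in reversed(self.nodes):
            if element.tag == "table":
                return element
        return self.nodes[0]  # fragment case: the html element
```

<p class="note">A Python list that grows at its end matches the convention that the stack "grows downwards": the most recently added node is the bottommost one.</p>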
<p>Elements in the stack fall into the following categories:</p>
<dl><dt><dfn id="special">Special</dfn></dt>
<dd><p>The following elements have varying levels of special
parsing rules: HTML's <code><a href="sections.html#the-address-element">address</a></code>, <code><a href="obsolete.html#the-applet-element">applet</a></code>,
<code><a href="the-map-element.html#the-area-element">area</a></code>, <code><a href="sections.html#the-article-element">article</a></code>, <code><a href="sections.html#the-aside-element">aside</a></code>,
<code><a href="semantics.html#the-base-element">base</a></code>, <code><a href="obsolete.html#basefont">basefont</a></code>, <code><a href="obsolete.html#bgsound">bgsound</a></code>,
<code><a href="grouping-content.html#the-blockquote-element">blockquote</a></code>, <code><a href="sections.html#the-body-element">body</a></code>, <code><a href="text-level-semantics.html#the-br-element">br</a></code>,
<code><a href="the-button-element.html#the-button-element">button</a></code>, <code><a href="tabular-data.html#the-caption-element">caption</a></code>, <code><a href="obsolete.html#center">center</a></code>,
<code><a href="tabular-data.html#the-col-element">col</a></code>, <code><a href="tabular-data.html#the-colgroup-element">colgroup</a></code>, <code><a href="interactive-elements.html#the-command-element">command</a></code>,
<code><a href="grouping-content.html#the-dd-element">dd</a></code>, <code><a href="interactive-elements.html#the-details-element">details</a></code>, <code><a href="obsolete.html#dir">dir</a></code>,
<code><a href="grouping-content.html#the-div-element">div</a></code>, <code><a href="grouping-content.html#the-dl-element">dl</a></code>, <code><a href="grouping-content.html#the-dt-element">dt</a></code>,
<code><a href="the-iframe-element.html#the-embed-element">embed</a></code>, <code><a href="forms.html#the-fieldset-element">fieldset</a></code>, <code><a href="grouping-content.html#the-figcaption-element">figcaption</a></code>,
<code><a href="grouping-content.html#the-figure-element">figure</a></code>, <code><a href="sections.html#the-footer-element">footer</a></code>, <code><a href="forms.html#the-form-element">form</a></code>,
<code><a href="obsolete.html#frame">frame</a></code>, <code><a href="obsolete.html#frameset">frameset</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h1</a></code>,
<code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h2</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h3</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h4</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h5</a></code>,
<code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h6</a></code>, <code><a href="semantics.html#the-head-element">head</a></code>, <code><a href="sections.html#the-header-element">header</a></code>,
<code><a href="sections.html#the-hgroup-element">hgroup</a></code>, <code><a href="grouping-content.html#the-hr-element">hr</a></code>, <code><a href="semantics.html#the-html-element">html</a></code>,
<code><a href="the-iframe-element.html#the-iframe-element">iframe</a></code>, <code><a href="embedded-content-1.html#the-img-element">img</a></code>, <code><a href="the-input-element.html#the-input-element">input</a></code>,
<code><a href="obsolete.html#isindex-0">isindex</a></code>, <code><a href="grouping-content.html#the-li-element">li</a></code>, <code><a href="semantics.html#the-link-element">link</a></code>,
<code><a href="obsolete.html#listing">listing</a></code>, <code><a href="obsolete.html#the-marquee-element">marquee</a></code>, <code><a href="interactive-elements.html#the-menu-element">menu</a></code>,
<code><a href="semantics.html#the-meta-element">meta</a></code>, <code><a href="sections.html#the-nav-element">nav</a></code>, <code><a href="obsolete.html#noembed">noembed</a></code>,
<code><a href="obsolete.html#noframes">noframes</a></code>, <code><a href="scripting-1.html#the-noscript-element">noscript</a></code>, <code><a href="the-iframe-element.html#the-object-element">object</a></code>,
<code><a href="grouping-content.html#the-ol-element">ol</a></code>, <code><a href="grouping-content.html#the-p-element">p</a></code>, <code><a href="the-iframe-element.html#the-param-element">param</a></code>,
<code><a href="obsolete.html#plaintext">plaintext</a></code>, <code><a href="grouping-content.html#the-pre-element">pre</a></code>, <code><a href="scripting-1.html#the-script-element">script</a></code>,
<code><a href="sections.html#the-section-element">section</a></code>, <code><a href="the-button-element.html#the-select-element">select</a></code>, <code><a href="semantics.html#the-style-element">style</a></code>,
<code><a href="interactive-elements.html#the-summary-element">summary</a></code>, <code><a href="tabular-data.html#the-table-element">table</a></code>, <code><a href="tabular-data.html#the-tbody-element">tbody</a></code>,
<code><a href="tabular-data.html#the-td-element">td</a></code>, <code><a href="the-button-element.html#the-textarea-element">textarea</a></code>, <code><a href="tabular-data.html#the-tfoot-element">tfoot</a></code>,
<code><a href="tabular-data.html#the-th-element">th</a></code>, <code><a href="tabular-data.html#the-thead-element">thead</a></code>, <code><a href="semantics.html#the-title-element">title</a></code>,
<code><a href="tabular-data.html#the-tr-element">tr</a></code>, <code><a href="grouping-content.html#the-ul-element">ul</a></code>, <code><a href="text-level-semantics.html#the-wbr-element">wbr</a></code>, and
<code><a href="obsolete.html#xmp">xmp</a></code>; MathML's <code title="">mi</code>, <code title="">mo</code>, <code title="">mn</code>, <code title="">ms</code>, <code title="">mtext</code>, and <code title="">annotation-xml</code>; and SVG's <code title="">foreignObject</code>, <code title="">desc</code>, and
<code title="">title</code>.</p></dd>
<dt><dfn id="formatting">Formatting</dfn></dt>
<dd><p>The following HTML elements are those that end up in the
<a href="#list-of-active-formatting-elements">list of active formatting elements</a>: <code><a href="text-level-semantics.html#the-a-element">a</a></code>,
<code><a href="text-level-semantics.html#the-b-element">b</a></code>, <code><a href="obsolete.html#big">big</a></code>, <code><a href="text-level-semantics.html#the-code-element">code</a></code>,
<code><a href="text-level-semantics.html#the-em-element">em</a></code>, <code><a href="obsolete.html#font">font</a></code>, <code><a href="text-level-semantics.html#the-i-element">i</a></code>,
<code><a href="obsolete.html#nobr">nobr</a></code>, <code><a href="text-level-semantics.html#the-s-element">s</a></code>, <code><a href="text-level-semantics.html#the-small-element">small</a></code>,
<code><a href="obsolete.html#strike">strike</a></code>, <code><a href="text-level-semantics.html#the-strong-element">strong</a></code>, <code><a href="obsolete.html#tt">tt</a></code>, and
<code><a href="text-level-semantics.html#the-u-element">u</a></code>.</p></dd>
<dt><dfn id="ordinary">Ordinary</dfn></dt>
<dd><p>All other elements found while parsing an HTML
document.</p></dd>
</dl><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-the-specific-scope" title="has an element in the specific scope">have an element in a
specific scope</dfn> consisting of a list of element types <var title="">list</var> when the following algorithm, run for the
element in question (the <i>target node</i>), terminates in a match state:</p>
<ol><li><p>Initialize <var title="">node</var> to be the <a href="#current-node">current
node</a> (the bottommost node of the stack).</p></li>
<li><p>If <var title="">node</var> is the target node, terminate in
a match state.</p></li>
<li><p>Otherwise, if <var title="">node</var> is one of the element
types in <var title="">list</var>, terminate in a failure
state.</p></li>
<li><p>Otherwise, set <var title="">node</var> to the previous
entry in the <a href="#stack-of-open-elements">stack of open elements</a> and return to step
2. (This will never fail, since the loop will always terminate in
the previous step if the top of the stack — an
<code><a href="semantics.html#the-html-element">html</a></code> element — is reached.)</p></li>
</ol><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-scope" title="has an element in scope">have an element in scope</dfn> when
it <a href="#has-an-element-in-the-specific-scope">has an element in the specific scope</a> consisting
of the following element types:</p>
<ul class="brief"><li><code><a href="obsolete.html#the-applet-element">applet</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="tabular-data.html#the-caption-element">caption</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="semantics.html#the-html-element">html</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="tabular-data.html#the-table-element">table</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="tabular-data.html#the-td-element">td</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="tabular-data.html#the-th-element">th</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="obsolete.html#the-marquee-element">marquee</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="the-iframe-element.html#the-object-element">object</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code title="">mi</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">mo</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">mn</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">ms</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">mtext</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">annotation-xml</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
<li><code title="">foreignObject</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
<li><code title="">desc</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
<li><code title="">title</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
</ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-list-item-scope" title="has an element in list item scope">have an element in list
item scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
scope</a> consisting of the following element types:</p>
<ul class="brief"><li>All the element types listed above for the <i><a href="#has-an-element-in-scope">has an element
in scope</a></i> algorithm.</li>
<li><code><a href="grouping-content.html#the-ol-element">ol</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="grouping-content.html#the-ul-element">ul</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
</ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-button-scope" title="has an element in button scope">have an element in button
scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
scope</a> consisting of the following element types:</p>
<ul class="brief"><li>All the element types listed above for the <i><a href="#has-an-element-in-scope">has an element
in scope</a></i> algorithm.</li>
<li><code><a href="the-button-element.html#the-button-element">button</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
</ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-table-scope" title="has an element in table scope">have an element in table
scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
scope</a> consisting of the following element types:</p>
<ul class="brief"><li><code><a href="semantics.html#the-html-element">html</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="tabular-data.html#the-table-element">table</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
</ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-select-scope" title="has an element in select scope">have an element in select
scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
scope</a> consisting of all element types <em>except</em> the
following:</p>
<ul class="brief"><li><code><a href="the-button-element.html#the-optgroup-element">optgroup</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
<li><code><a href="the-button-element.html#the-option-element">option</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
</ul><p>Nothing happens if at any time any of the elements in the
<a href="#stack-of-open-elements">stack of open elements</a> are moved to a new location in,
or removed from, the <code><a href="infrastructure.html#document">Document</a></code> tree. In particular, the
stack is not changed in this situation. This can cause, amongst
other strange effects, content to be appended to nodes that are no
longer in the DOM.</p>
<p class="note">In some cases (namely, when <a href="tree-construction.html#adoptionAgency">closing misnested formatting elements</a>),
the stack is manipulated in a random-access fashion.</p>
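<p class="note">The scope checks defined earlier in this section share one loop and differ only in which element types stop the search. A minimal sketch follows, assuming elements are represented as (namespace, local name) tuples, that the stack is a list whose last entry is the current node, and that the target is an element in the HTML namespace identified by its tag name. All function and constant names are hypothetical.</p>

```python
HTML, MATHML, SVG = "html", "mathml", "svg"

SCOPE = frozenset(
    [(HTML, n) for n in ("applet", "caption", "html", "table", "td", "th",
                         "marquee", "object")]
    + [(MATHML, n) for n in ("mi", "mo", "mn", "ms", "mtext", "annotation-xml")]
    + [(SVG, n) for n in ("foreignObject", "desc", "title")]
)
LIST_ITEM_SCOPE = SCOPE | {(HTML, "ol"), (HTML, "ul")}
BUTTON_SCOPE = SCOPE | {(HTML, "button")}
TABLE_SCOPE = frozenset({(HTML, "html"), (HTML, "table")})
SELECT_SCOPE_EXCEPTIONS = frozenset({(HTML, "optgroup"), (HTML, "option")})


def has_element_in_specific_scope(stack, target, stops_search):
    """Walk from the current node upwards; match state is True."""
    for node in reversed(stack):
        if node == target:
            return True   # match state
        if stops_search(node):
            return False  # failure state
    return False  # only reached if no entry in the stack stops the search


def has_element_in_scope(stack, name):
    return has_element_in_specific_scope(
        stack, (HTML, name), lambda n: n in SCOPE)


def has_element_in_list_item_scope(stack, name):
    return has_element_in_specific_scope(
        stack, (HTML, name), lambda n: n in LIST_ITEM_SCOPE)


def has_element_in_button_scope(stack, name):
    return has_element_in_specific_scope(
        stack, (HTML, name), lambda n: n in BUTTON_SCOPE)


def has_element_in_table_scope(stack, name):
    return has_element_in_specific_scope(
        stack, (HTML, name), lambda n: n in TABLE_SCOPE)


def has_element_in_select_scope(stack, name):
    # Every element type except optgroup and option stops the search.
    return has_element_in_specific_scope(
        stack, (HTML, name), lambda n: n not in SELECT_SCOPE_EXCEPTIONS)
```

<p class="note">Because the target is tested before the list, an element whose type is itself in the list (for example a <code>table</code> in table scope) can still be found.</p>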
<h5 id="the-list-of-active-formatting-elements"><span class="secno">8.2.3.3 </span>The list of active formatting elements</h5>
<p>Initially, the <dfn id="list-of-active-formatting-elements">list of active formatting elements</dfn> is
empty. It is used to handle misnested <a href="#formatting" title="formatting">formatting element tags</a>.</p>
<p>The list contains elements in the <a href="#formatting">formatting</a>
category, and scope markers. The scope markers are inserted when
entering <code><a href="obsolete.html#the-applet-element">applet</a></code> elements, buttons, <code><a href="the-iframe-element.html#the-object-element">object</a></code>
elements, marquees, table cells, and table captions, and are used to
prevent formatting from "leaking" <em>into</em> <code><a href="obsolete.html#the-applet-element">applet</a></code>
elements, buttons, <code><a href="the-iframe-element.html#the-object-element">object</a></code> elements, marquees, and
tables.</p>
<p class="note">The scope markers are unrelated to the concept of an
element being <a href="#has-an-element-in-scope" title="has an element in scope">in
scope</a>.</p>
<p>In addition, each element in the <a href="#list-of-active-formatting-elements">list of active formatting
elements</a> is associated with the token for which it was
created, so that further elements can be created for that token if
necessary.</p>
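<p class="note">The list described so far, together with the push steps that follow, can be sketched as a plain Python list holding entries and markers. The names <code>MARKER</code>, <code>FormattingEntry</code>, and <code>push_onto_active_formatting</code> are hypothetical. Attributes are modelled as a frozen set of (name, namespace, value) tuples captured at creation time, so that comparison ignores attribute order.</p>

```python
from dataclasses import dataclass

MARKER = object()  # stands for a scope marker in the list


@dataclass(eq=False)  # entries are distinct even when they look alike
class FormattingEntry:
    name: str
    namespace: str
    attributes: frozenset  # (name, namespace, value) tuples, as parsed
    token: object = None   # the token for which the element was created

    def same_family(self, other):
        """Same tag name, namespace, and parsed attributes."""
        return ((self.name, self.namespace, self.attributes)
                == (other.name, other.namespace, other.attributes))


def push_onto_active_formatting(entries, element):
    # Only entries after the last marker are considered.
    start = 0
    for i in range(len(entries) - 1, -1, -1):
        if entries[i] is MARKER:
            start = i + 1
            break
    matches = [i for i in range(start, len(entries))
               if entries[i].same_family(element)]
    # If three matching entries already exist, drop the earliest one.
    if len(matches) >= 3:
        del entries[matches[0]]
    entries.append(element)
```

<p class="note">Keeping the originating token on each entry is what would later allow a further element to be created for that token.</p>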
<p>When the steps below require the UA to <dfn id="push-onto-the-list-of-active-formatting-elements">push onto the list of
active formatting elements</dfn> an element <var title="">element</var>, the UA must perform the following steps:</p>
<ol><li><p>If there are already three elements in the <a href="#list-of-active-formatting-elements">list of
active formatting elements</a> after the last scope marker, if
any, or anywhere in the list if there are no scope markers, that
have the same tag name, namespace, and attributes as <var title="">element</var>, then remove the earliest such element from
the <a href="#list-of-active-formatting-elements">list of active formatting elements</a>. For these
purposes, the attributes must be compared as they were when the
elements were created by the parser; two elements have the same
attributes if all their parsed attributes can be paired such that
the two attributes in each pair have identical names, namespaces,
and values (the order of the attributes does not matter).</p>
<p class="note">This is the Noah's Ark clause, but with three per
family instead of two.</p></li>
<li><p>Add <var title="">element</var> to the <a href="#list-of-active-formatting-elements">list of active
formatting elements</a>.</p></li>
</ol><p>When the steps below require the UA to <dfn id="reconstruct-the-active-formatting-elements">reconstruct the
active formatting elements</dfn>, the UA must perform the following
steps:</p>
<ol><li>If there are no entries in the <a href="#list-of-active-formatting-elements">list of active formatting
elements</a>, then there is nothing to reconstruct; stop this
algorithm.</li>
<li>If the last (most recently added) entry in the <a href="#list-of-active-formatting-elements">list of
active formatting elements</a> is a marker, or if it is an
element that is in the <a href="#stack-of-open-elements">stack of open elements</a>, then
there is nothing to reconstruct; stop this algorithm.</li>
<li>Let <var title="">entry</var> be the last (most recently added)
element in the <a href="#list-of-active-formatting-elements">list of active formatting
elements</a>.</li>
<li>If there are no entries before <var title="">entry</var> in the
<a href="#list-of-active-formatting-elements">list of active formatting elements</a>, then jump to step
8.</li>
<li>Let <var title="">entry</var> be the entry one earlier than
<var title="">entry</var> in the <a href="#list-of-active-formatting-elements">list of active formatting
elements</a>.</li>
<li>If <var title="">entry</var> is neither a marker nor an element
that is also in the <a href="#stack-of-open-elements">stack of open elements</a>, go to step
4.</li>
<li>Let <var title="">entry</var> be the element one later than
<var title="">entry</var> in the <a href="#list-of-active-formatting-elements">list of active formatting
elements</a>.</li>
<li><a href="tree-construction.html#create-an-element-for-the-token">Create an element for the token</a> for which the
element <var title="">entry</var> was created, to obtain <var title="">new element</var>.</li>
<li>Append <var title="">new element</var> to the <a href="#current-node">current
node</a> and push it onto the <a href="#stack-of-open-elements">stack of open
elements</a> so that it is the new <a href="#current-node">current
node</a>.</li>
<li>Replace the entry for <var title="">entry</var> in the list
with an entry for <var title="">new element</var>.</li>
<li>If the entry for <var title="">new element</var> in the
<a href="#list-of-active-formatting-elements">list of active formatting elements</a> is not the last
entry in the list, return to step 7.</li>
</ol><p>This has the effect of reopening all the formatting elements that
were opened in the current body, cell, or caption (whichever is
youngest) that haven't been explicitly closed.</p>
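<p class="note">The two algorithms above (pushing onto the list and reconstructing it) can be sketched as follows. This is a non-normative illustration under stated assumptions: the names <code title="">Token</code>, <code title="">Element</code>, <code title="">MARKER</code>, <code title="">push_formatting</code>, and <code title="">reconstruct</code> are hypothetical, a marker is stored as <code title="">None</code>, each element carries its creating token directly, and the stack of open elements is assumed to be non-empty.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Hypothetical start-tag token; equal tokens mean equal parsed attributes."""
    name: str
    namespace: str = "html"
    attrs: frozenset = frozenset()  # {(name, namespace, value), ...}; unordered


@dataclass(eq=False)  # elements compare by identity, like DOM nodes
class Element:
    token: Token
    children: list = field(default_factory=list)


MARKER = None  # assumption: a scope marker is stored as None


def push_formatting(active, element):
    """Push onto the list, keeping at most three identical entries per scope."""
    matches = []
    for index in range(len(active) - 1, -1, -1):
        if active[index] is MARKER:
            break  # only entries after the last marker count
        if active[index].token == element.token:
            matches.append(index)
    if len(matches) >= 3:
        del active[matches[-1]]  # the scan ran backwards, so this is the earliest
    active.append(element)


def reconstruct(active, open_elements):
    """Reopen formatting elements that are no longer on the stack."""
    def settled(entry):
        # True for a marker or an element still on the stack of open elements.
        return entry is MARKER or entry in open_elements

    if not active or settled(active[-1]):
        return  # steps 1 and 2: nothing to reconstruct
    index = len(active) - 1  # step 3
    # Steps 4 to 7: rewind to just after the nearest settled entry, if any.
    while index > 0 and not settled(active[index - 1]):
        index -= 1
    # Steps 8 to 11: recreate each remaining entry from its token, oldest first.
    for position in range(index, len(active)):
        new_element = Element(active[position].token)
        open_elements[-1].children.append(new_element)  # append to current node
        open_elements.append(new_element)               # new current node
        active[position] = new_element                  # replace the entry


# Usage: <b> and <i> were opened in an earlier paragraph that has since closed.
body, p = Element(Token("body")), Element(Token("p"))
old_b, old_i = Element(Token("b")), Element(Token("i"))
active = []
push_formatting(active, old_b)
push_formatting(active, old_i)
open_elements = [body, p]
reconstruct(active, open_elements)
print([node.token.name for node in open_elements])  # ['body', 'p', 'b', 'i']
```

<p class="note">Comparing frozen tokens rather than live elements mirrors the requirement that attributes be compared as they were when the parser created the element, and storing them in a set makes the comparison independent of attribute order.</p>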
<p class="note">The way this specification is written, the
<a href="#list-of-active-formatting-elements">list of active formatting elements</a> always consists of
elements in chronological order, with the least recently added
element first and the most recently added element last (except
while steps 8 to 11 of the above algorithm are being executed).</p>
<p>When the steps below require the UA to <dfn id="clear-the-list-of-active-formatting-elements-up-to-the-last-marker">clear the list of
active formatting elements up to the last marker</dfn>, the UA must
perform the following steps:</p>
<ol><li>Let <var title="">entry</var> be the last (most recently added)
entry in the <a href="#list-of-active-formatting-elements">list of active formatting elements</a>.</li>
<li>Remove <var title="">entry</var> from the <a href="#list-of-active-formatting-elements">list of active
formatting elements</a>.</li>
<li>If <var title="">entry</var> was a marker, then stop the
algorithm at this point. The list has been cleared up to the last
marker.</li>
<li>Go to step 1.</li>
</ol><h5 id="the-element-pointers"><span class="secno">8.2.3.4 </span>The element pointers</h5>
<p>Initially, the <dfn id="head-element-pointer"><code title="">head</code> element
pointer</dfn> and the <dfn id="form-element-pointer"><code title="">form</code> element
pointer</dfn> are both null.</p>
<p>Once a <code><a href="semantics.html#the-head-element">head</a></code> element has been parsed (whether
implicitly or explicitly) the <a href="#head-element-pointer"><code title="">head</code>
element pointer</a> gets set to point to this node.</p>
<p>The <a href="#form-element-pointer"><code title="">form</code> element pointer</a>
points to the last <code><a href="forms.html#the-form-element">form</a></code> element that was opened and
whose end tag has not yet been seen. For historical reasons, it is
used to associate form controls with forms even when the markup is
badly malformed.</p>
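<p class="note">A minimal, non-normative sketch of the two pointers as parser state. The class and method names are hypothetical, and resetting the <code title="">form</code> pointer to null once the end tag is seen is an assumption drawn from the description above rather than a step stated in this section.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM element."""
    name: str


@dataclass
class ElementPointers:
    head: Optional[Node] = None  # head element pointer, initially null
    form: Optional[Node] = None  # form element pointer, initially null

    def head_parsed(self, node):
        # Called whether the head element was explicit or implied.
        self.head = node

    def form_opened(self, node):
        self.form = node

    def form_end_tag_seen(self):
        # Assumption: no open form remains, so the pointer returns to null.
        self.form = None


# Usage: a control seen while a form is open would associate with pointers.form.
pointers = ElementPointers()
pointers.head_parsed(Node("head"))
pointers.form_opened(Node("form"))
owner = pointers.form
pointers.form_end_tag_seen()
```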
<h5 id="other-parsing-state-flags"><span class="secno">8.2.3.5 </span>Other parsing state flags</h5>
<p>The <dfn id="scripting-flag">scripting flag</dfn> is set to "enabled" if <a href="webappapis.html#concept-n-script" title="concept-n-script">scripting was enabled</a> for the
<code><a href="infrastructure.html#document">Document</a></code> with which the parser is associated when the
parser was created, and "disabled" otherwise.</p>
<p class="note">The <a href="#scripting-flag">scripting flag</a> can be enabled even
when the parser was originally created for the <a href="the-end.html#html-fragment-parsing-algorithm">HTML fragment
parsing algorithm</a>, even though <code><a href="scripting-1.html#the-script-element">script</a></code> elements
don't execute in that case.</p>
<p>The <dfn id="frameset-ok-flag">frameset-ok flag</dfn> is set to "ok" when the parser is
created. It is set to "not ok" after certain tokens are seen.</p>
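<p class="note">As a non-normative illustration, the two flags could be held as follows. The names <code title="">ParserFlags</code>, <code title="">create_flags</code>, and <code title="">mark_frameset_not_ok</code> are hypothetical; which tokens trigger the change to "not ok" is decided elsewhere and is left to the caller in this sketch.</p>

```python
from dataclasses import dataclass


@dataclass
class ParserFlags:
    scripting: str            # "enabled" or "disabled", decided at creation
    frameset_ok: str = "ok"   # starts as "ok" when the parser is created

    def mark_frameset_not_ok(self):
        # Called by the tree builder after one of the relevant tokens.
        self.frameset_ok = "not ok"


def create_flags(document_scripting_enabled):
    """Snapshot the document's scripting state when the parser is created."""
    state = "enabled" if document_scripting_enabled else "disabled"
    return ParserFlags(scripting=state)


# Usage
flags = create_flags(document_scripting_enabled=True)
flags.mark_frameset_not_ok()
```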
</div></body></html>