<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en-US-x-Hixie" ><head><title>8.2 Parsing HTML documents &#8212; HTML5 </title><style type="text/css">
   pre { margin-left: 2em; white-space: pre-wrap; }
   h2 { margin: 3em 0 1em 0; }
   h3 { margin: 2.5em 0 1em 0; }
   h4 { margin: 2.5em 0 0.75em 0; }
   h5, h6 { margin: 2.5em 0 1em; }
   h1 + h2, h1 + h2 + h2 { margin: 0.75em 0 0.75em; }
   h2 + h3, h3 + h4, h4 + h5, h5 + h6 { margin-top: 0.5em; }
   p { margin: 1em 0; }
   hr:not(.top) { display: block; background: none; border: none; padding: 0; margin: 2em 0; height: auto; }
   dl, dd { margin-top: 0; margin-bottom: 0; }
   dt { margin-top: 0.75em; margin-bottom: 0.25em; clear: left; }
   dt + dt { margin-top: 0; }
   dd dt { margin-top: 0.25em; margin-bottom: 0; }
   dd p { margin-top: 0; }
   dd dl + p { margin-top: 1em; }
   dd table + p { margin-top: 1em; }
   p + * > li, dd li { margin: 1em 0; }
   dt, dfn { font-weight: bold; font-style: normal; }
   dt dfn { font-style: italic; }
   pre, code { font-size: inherit; font-family: monospace; font-variant: normal; }
   pre strong { color: black; font: inherit; font-weight: bold; background: yellow; }
   pre em { font-weight: bolder; font-style: normal; }
   @media screen { code { color: orangered; } code :link, code :visited { color: inherit; } }
   var sub { vertical-align: bottom; font-size: smaller; position: relative; top: 0.1em; }
   table { border-collapse: collapse; border-style: hidden hidden none hidden; }
   table thead, table tbody { border-bottom: solid; }
   table tbody th:first-child { border-left: solid; }
   table tbody th { text-align: left; }
   table td, table th { border-left: solid; border-right: solid; border-bottom: solid thin; vertical-align: top; padding: 0.2em; }
   blockquote { margin: 0 0 0 2em; border: 0; padding: 0; font-style: italic; }

   .bad, .bad *:not(.XXX) { color: gray; border-color: gray; background: transparent; }
   .matrix, .matrix td { border: none; text-align: right; }
   .matrix { margin-left: 2em; }
   .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
   .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
   .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }

   .toc dfn, h1 dfn, h2 dfn, h3 dfn, h4 dfn, h5 dfn, h6 dfn { font: inherit; }
   img.extra { float: right; }
   pre.idl { border: solid thin; background: #EEEEEE; color: black; padding: 0.5em 1em; }
   pre.idl :link, pre.idl :visited { color: inherit; background: transparent; }
   pre.css { border: solid thin; background: #FFFFEE; color: black; padding: 0.5em 1em; }
   pre.css:first-line { color: #AAAA50; }
   dl.domintro { color: green; margin: 2em 0 2em 2em; padding: 0.5em 1em; border: none; background: #DDFFDD; }
   hr + dl.domintro, div.impl + dl.domintro { margin-top: 2.5em; margin-bottom: 1.5em; }
   dl.domintro dt, dl.domintro dt * { color: black; text-decoration: none; }
   dl.domintro dd { margin: 0.5em 0 1em 2em; padding: 0; }
   dl.domintro dd p { margin: 0.5em 0; }
   dl.switch { padding-left: 2em; }
   dl.switch > dt { text-indent: -1.5em; }
   dl.switch > dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
   dl.triple { padding: 0 0 0 1em; }
   dl.triple dt, dl.triple dd { margin: 0; display: inline }
   dl.triple dt:after { content: ':'; }
   dl.triple dd:after { content: '\A'; white-space: pre; }
   .diff-old { text-decoration: line-through; color: silver; background: transparent; }
   .diff-chg, .diff-new { text-decoration: underline; color: green; background: transparent; }
   a .diff-new { border-bottom: 1px blue solid; }

   h2 { page-break-before: always; }
   h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
   h1 + h2, hr + h2.no-toc { page-break-before: auto; }

   p  > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]),
   li > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]), { border-bottom: solid #9999CC; }

   div.head { margin: 0 0 1em; padding: 1em 0 0 0; }
   div.head p { margin: 0; }
   div.head h1 { margin: 0; }
   div.head .logo { float: right; margin: 0 1em; }
   div.head .logo img { border: none } /* remove border from top image */
   div.head dl { margin: 1em 0; }
   div.head p.copyright, div.head p.alt { font-size: x-small; font-style: oblique; margin: 0; }

   body > .toc > li { margin-top: 1em; margin-bottom: 1em; }
   body > .toc.brief > li { margin-top: 0.35em; margin-bottom: 0.35em; }
   body > .toc > li > * { margin-bottom: 0.5em; }
   body > .toc > li > * > li > * { margin-bottom: 0.25em; }
   .toc, .toc li { list-style: none; }

   .brief { margin-top: 1em; margin-bottom: 1em; line-height: 1.1; }
   .brief li { margin: 0; padding: 0; }
   .brief li p { margin: 0; padding: 0; }

   .category-list { margin-top: -0.75em; margin-bottom: 1em; line-height: 1.5; }
   .category-list::before { content: '\21D2\A0'; font-size: 1.2em; font-weight: 900; }
   .category-list li { display: inline; }
   .category-list li:not(:last-child)::after { content: ', '; }
   .category-list li > span, .category-list li > a { text-transform: lowercase; }
   .category-list li * { text-transform: none; } /* don't affect <code> nested in <a> */

   .XXX { color: #E50000; background: white; border: solid red; padding: 0.5em; margin: 1em 0; }
   .XXX > :first-child { margin-top: 0; }
   p .XXX { line-height: 3em; }
   .annotation { border: solid thin black; background: #0C479D; color: white; position: relative; margin: 8px 0 20px 0; }
   .annotation:before { position: absolute; left: 0; top: 0; width: 100%; height: 100%; margin: 6px -6px -6px 6px; background: #333333; z-index: -1; content: ''; }
   .annotation :link, .annotation :visited { color: inherit; }
   .annotation :link:hover, .annotation :visited:hover { background: transparent; }
   .annotation span { border: none ! important; }
   .note { color: green; background: transparent; font-family: sans-serif; }
   .warning { color: red; background: transparent; }
   .note, .warning { font-weight: bolder; font-style: italic; }
   p.note, div.note { padding: 0.5em 2em; }
   span.note { padding: 0 2em; }
   .note p:first-child, .warning p:first-child { margin-top: 0; }
   .note p:last-child, .warning p:last-child { margin-bottom: 0; }
   .warning:before { font-style: normal; }
   p.note:before { content: 'Note: '; }
   p.warning:before { content: '\26A0 Warning! '; }

   .bookkeeping:before { display: block; content: 'Bookkeeping details'; font-weight: bolder; font-style: italic; }
   .bookkeeping { font-size: 0.8em; margin: 2em 0; }
   .bookkeeping p { margin: 0.5em 2em; display: list-item; list-style: square; }
   .bookkeeping dt { margin: 0.5em 2em 0; }
   .bookkeeping dd { margin: 0 3em 0.5em; }

   h4 { position: relative; z-index: 3; }
   h4 + .element, h4 + div + .element { margin-top: -2.5em; padding-top: 2em; }
   .element {
     background: #EEEEFF;
     color: black;
     margin: 0 0 1em 0.15em;
     padding: 0 1em 0.25em 0.75em;
     border-left: solid #9999FF 0.25em;
     position: relative;
     z-index: 1;
   }
   .element:before {
     position: absolute;
     z-index: 2;
     top: 0;
     left: -1.15em;
     height: 2em;
     width: 0.9em;
     background: #EEEEFF;
     content: ' ';
     border-style: none none solid solid;
     border-color: #9999FF;
     border-width: 0.25em;
   }

   .example { display: block; color: #222222; background: #FCFCFC; border-left: double; margin-left: 2em; padding-left: 1em; }
   td > .example:only-child { margin: 0 0 0 0.1em; }

   ul.domTree, ul.domTree ul { padding: 0 0 0 1em; margin: 0; }
   ul.domTree li { padding: 0; margin: 0; list-style: none; position: relative; }
   ul.domTree li li { list-style: none; }
   ul.domTree li:first-child::before { position: absolute; top: 0; height: 0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
   ul.domTree li:not(:last-child)::after { position: absolute; top: 0; bottom: -0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
   ul.domTree span { font-style: italic; font-family: serif; }
   ul.domTree .t1 code { color: purple; font-weight: bold; }
   ul.domTree .t2 { font-style: normal; font-family: monospace; }
   ul.domTree .t2 .name { color: black; font-weight: bold; }
   ul.domTree .t2 .value { color: blue; font-weight: normal; }
   ul.domTree .t3 code, .domTree .t4 code, .domTree .t5 code { color: gray; }
   ul.domTree .t7 code, .domTree .t8 code { color: green; }
   ul.domTree .t10 code { color: teal; }

   body.dfnEnabled dfn { cursor: pointer; }
   .dfnPanel {
     display: inline;
     position: absolute;
     z-index: 10;
     height: auto;
     width: auto;
     padding: 0.5em 0.75em;
     font: small sans-serif, Droid Sans Fallback;
     background: #DDDDDD;
     color: black;
     border: outset 0.2em;
   }
   .dfnPanel * { margin: 0; padding: 0; font: inherit; text-indent: 0; }
   .dfnPanel :link, .dfnPanel :visited { color: black; }
   .dfnPanel p { font-weight: bolder; }
   .dfnPanel * + p { margin-top: 0.25em; }
   .dfnPanel li { list-style-position: inside; }

   #configUI { position: absolute; z-index: 20; top: 10em; right: 1em; width: 11em; font-size: small; }
   #configUI p { margin: 0.5em 0; padding: 0.3em; background: #EEEEEE; color: black; border: inset thin; }
   #configUI p label { display: block; }
   #configUI #updateUI, #configUI .loginUI { text-align: center; }
   #configUI input[type=button] { display: block; margin: auto; }

   fieldset { margin: 1em; padding: 0.5em 1em; }
   fieldset > legend + * { margin-top: 0; }
   fieldset > :last-child { margin-bottom: 0; }
   fieldset p { margin: 0.5em 0; }

   .stability {
     position: fixed;
     bottom: 0;
     left: 0; right: 0;
     margin: 0 auto 0 auto !important;
     z-index: 1000;
     width: 50%;
     background: maroon; color: yellow;
     -webkit-border-radius: 1em 1em 0 0;
     -moz-border-radius: 1em 1em 0 0;
     border-radius: 1em 1em 0 0;
     -moz-box-shadow: 0 0 1em #500;
     -webkit-box-shadow: 0 0 1em #500;
     box-shadow: 0 0 1em red;
     padding: 0.5em 1em;
     text-align: center;
   }
   .stability strong {
     display: block;
   }
   .stability input {
     appearance: none; margin: 0; border: 0; padding: 0.25em 0.5em; background: transparent; color: black;
     position: absolute; top: -0.5em; right: 0; font: 1.25em sans-serif; text-align: center;
   }
   .stability input:hover {
     color: white;
     text-shadow: 0 0 2px black;
   }
   .stability input:active {
     padding: 0.3em 0.45em 0.2em 0.55em;
   }
   .stability :link, .stability :visited,
   .stability :link:hover, .stability :visited:hover {
     background: transparent;
     color: white;
   }

  </style><link href="data:text/css,.impl%20%7B%20display:%20none;%20%7D%0Ahtml%20%7B%20border:%20solid%20yellow;%20%7D%20.domintro:before%20%7B%20display:%20none;%20%7D" id="author" rel="alternate stylesheet" title="Author documentation only"><link href="data:text/css,.impl%20%7B%20background:%20%23FFEEEE;%20%7D%20.domintro:before%20%7B%20background:%20%23FFEEEE;%20%7D" id="highlight" rel="alternate stylesheet" title="Highlight implementation
requirements"><link href="http://www.w3.org/StyleSheets/TR/W3C-WD" rel="stylesheet" type="text/css"><style type="text/css">

   .applies thead th > * { display: block; }
   .applies thead code { display: block; }
   .applies tbody th { white-space: nowrap; }
   .applies td { text-align: center; }
   .applies .yes { background: yellow; }

   .matrix, .matrix td { border: hidden; text-align: right; }
   .matrix { margin-left: 2em; }

   .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
   .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
   .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }

   td.eg { border-width: thin; text-align: center; }

   #table-example-1 { border: solid thin; border-collapse: collapse; margin-left: 3em; }
   #table-example-1 * { font-family: "Essays1743", serif; line-height: 1.01em; }
   #table-example-1 caption { padding-bottom: 0.5em; }
   #table-example-1 thead, #table-example-1 tbody { border: none; }
   #table-example-1 th, #table-example-1 td { border: solid thin; }
   #table-example-1 th { font-weight: normal; }
   #table-example-1 td { border-style: none solid; vertical-align: top; }
   #table-example-1 th { padding: 0.5em; vertical-align: middle; text-align: center; }
   #table-example-1 tbody tr:first-child td { padding-top: 0.5em; }
   #table-example-1 tbody tr:last-child td { padding-bottom: 1.5em; }
   #table-example-1 tbody td:first-child { padding-left: 2.5em; padding-right: 0; width: 9em; }
   #table-example-1 tbody td:first-child::after { content: leader(". "); }
   #table-example-1 tbody td { padding-left: 2em; padding-right: 2em; }
   #table-example-1 tbody td:first-child + td { width: 10em; }
   #table-example-1 tbody td:first-child + td ~ td { width: 2.5em; }
   #table-example-1 tbody td:first-child + td + td + td ~ td { width: 1.25em; }

   .apple-table-examples { border: none; border-collapse: separate; border-spacing: 1.5em 0em; width: 40em; margin-left: 3em; }
   .apple-table-examples * { font-family: "Times", serif; }
   .apple-table-examples td, .apple-table-examples th { border: none; white-space: nowrap; padding-top: 0; padding-bottom: 0; }
   .apple-table-examples tbody th:first-child { border-left: none; width: 100%; }
   .apple-table-examples thead th:first-child ~ th { font-size: smaller; font-weight: bolder; border-bottom: solid 2px; text-align: center; }
   .apple-table-examples tbody th::after, .apple-table-examples tfoot th::after { content: leader(". ") }
   .apple-table-examples tbody th, .apple-table-examples tfoot th { font: inherit; text-align: left; }
   .apple-table-examples td { text-align: right; vertical-align: top; }
   .apple-table-examples.e1 tbody tr:last-child td { border-bottom: solid 1px; }
   .apple-table-examples.e1 tbody + tbody tr:last-child td { border-bottom: double 3px; }
   .apple-table-examples.e2 th[scope=row] { padding-left: 1em; }
   .apple-table-examples sup { line-height: 0; }

   .details-example img { vertical-align: top; }

   #base64-table {
     white-space: nowrap;
     font-size: 0.6em;
     column-width: 6em;
     column-count: 5;
     column-gap: 1em;
     -moz-column-width: 6em;
     -moz-column-count: 5;
     -moz-column-gap: 1em;
     -webkit-column-width: 6em;
     -webkit-column-count: 5;
     -webkit-column-gap: 1em;
   }
   #base64-table thead { display: none; }
   #base64-table * { border: none; }
   #base64-table tbody td:first-child:after { content: ':'; }
   #base64-table tbody td:last-child { text-align: right; }

   #named-character-references-table {
     white-space: nowrap;
     font-size: 0.6em;
     column-width: 30em;
     column-gap: 1em;
     -moz-column-width: 30em;
     -moz-column-gap: 1em;
     -webkit-column-width: 30em;
     -webkit-column-gap: 1em;
   }
   #named-character-references-table > table > tbody > tr > td:first-child + td,
   #named-character-references-table > table > tbody > tr > td:last-child { text-align: center; }
   #named-character-references-table > table > tbody > tr > td:last-child:hover > span { position: absolute; top: auto; left: auto; margin-left: 0.5em; line-height: 1.2; font-size: 5em; border: outset; padding: 0.25em 0.5em; background: white; width: 1.25em; height: auto; text-align: center; }
   #named-character-references-table > table > tbody > tr#entity-CounterClockwiseContourIntegral > td:first-child { font-size: 0.5em; }

   .glyph.control { color: red; }

   @font-face {
     font-family: 'Essays1743';
     src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743.ttf');
   }
   @font-face {
     font-family: 'Essays1743';
     font-weight: bold;
     src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Bold.ttf');
   }
   @font-face {
     font-family: 'Essays1743';
     font-style: italic;
     src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Italic.ttf');
   }
   @font-face {
     font-family: 'Essays1743';
     font-style: italic;
     font-weight: bold;
     src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-BoldItalic.ttf');
   }

  </style><style type="text/css">
   .domintro:before { display: table; margin: -1em -0.5em -0.5em auto; width: auto; content: 'This box is non-normative. Implementation requirements are given below this box.'; color: black; font-style: italic; border: solid 2px; background: white; padding: 0 0.25em; }
  </style><script type="text/javascript">
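   // Returns the value stored under "name", looking first in the query
   // string and then in document.cookie. Returns null if neither has it.
   function getCookie(name) {
     // Query string: a bare "?name" counts as "1"; "?name=value" yields value.
     var params = location.search.substring(1).split("&");
     for (var index = 0; index < params.length; index++) {
       if (params[index] == name)
         return "1";
       var data = params[index].split("=");
       if (data[0] == name)
         return unescape(data[1]);
     }
     // Cookies: entries are separated by "; " and have the form "key=value".
     var cookies = document.cookie.split("; ");
     for (var index = 0; index < cookies.length; index++) {
       var data = cookies[index].split("=");
       if (data[0] == name)
         return unescape(data[1]);
     }
     return null;
   }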
  </script>
  <script src="link-fixup.js" type="text/javascript"></script>
  <link href="style.css" rel="stylesheet"><link href="syntax.html" title="8 The HTML syntax" rel="prev">
  <link href="spec.html#contents" title="Table of contents" rel="index">
  <link href="tokenization.html" title="8.2.4 Tokenization" rel="next">
  </head><body><div class="head" id="head">
<div id="multipage-common">
  <p class="stability" id="wip"><strong>This is a work in
  progress!</strong> For the latest updates from the HTML WG, possibly
  including important bug fixes, please look at the <a href="http://dev.w3.org/html5/spec/Overview.html">editor's draft</a> instead.
  There may also be a more
  <a href="http://www.w3.org/TR/html5">up-to-date Working Draft</a>
   with changes based on resolution of Last Call issues.
  <input onclick="closeWarning(this.parentNode)" type="button" value="&#9587;&#8413;"></p>
  <script type="text/javascript">
   function closeWarning(element) {
     element.parentNode.removeChild(element);
     var date = new Date();
     date.setDate(date.getDate()+4);
     document.cookie = 'hide-obsolescence-warning=1; expires=' + date.toUTCString();
   }
   if (getCookie('hide-obsolescence-warning') == '1')
     setTimeout(function () { document.getElementById('wip').parentNode.removeChild(document.getElementById('wip')); }, 2000);
  </script></div>

   <p><a href="http://www.w3.org/"><img alt="W3C" height="48" src="http://www.w3.org/Icons/w3c_home" width="72"></a></p>

   <h1>HTML5</h1>
   </div><div>
   <a href="syntax.html" class="prev">8 The HTML syntax</a> &#8211;
   <a href="spec.html#contents">Table of contents</a> &#8211;
   <a href="tokenization.html" class="next">8.2.4 Tokenization</a>
  <ol class="toc"><li><ol><li><a href="parsing.html#parsing"><span class="secno">8.2 </span>Parsing HTML documents</a>
    <ol><li><a href="parsing.html#overview-of-the-parsing-model"><span class="secno">8.2.1 </span>Overview of the parsing model</a></li><li><a href="parsing.html#the-input-stream"><span class="secno">8.2.2 </span>The input stream</a>
      <ol><li><a href="parsing.html#determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</a></li><li><a href="parsing.html#character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</a></li><li><a href="parsing.html#preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</a></li><li><a href="parsing.html#changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</a></li></ol></li><li><a href="parsing.html#parse-state"><span class="secno">8.2.3 </span>Parse state</a>
      <ol><li><a href="parsing.html#the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</a></li><li><a href="parsing.html#the-stack-of-open-elements"><span class="secno">8.2.3.2 </span>The stack of open elements</a></li><li><a href="parsing.html#the-list-of-active-formatting-elements"><span class="secno">8.2.3.3 </span>The list of active formatting elements</a></li><li><a href="parsing.html#the-element-pointers"><span class="secno">8.2.3.4 </span>The element pointers</a></li><li><a href="parsing.html#other-parsing-state-flags"><span class="secno">8.2.3.5 </span>Other parsing state flags</a></li></ol></li></ol></li></ol></li></ol></div>

  <div class="impl">

  <h3 id="parsing"><span class="secno">8.2 </span>Parsing HTML documents</h3>

  <p><i>This section only applies to user agents, data mining tools,
  and conformance checkers.</i></p>

  <p class="note">The rules for parsing XML documents into DOM trees
  are covered by the next section, entitled "<a href="the-xhtml-syntax.html#the-xhtml-syntax">The XHTML
  syntax</a>".</p>

  <p>For <a href="dom.html#html-documents">HTML documents</a>, user agents must use the parsing
  rules described in this section to generate the DOM trees. Together,
  these rules define what is referred to as the <dfn id="html-parser">HTML
  parser</dfn>.</p>

  <div class="note">

   <p>While the HTML syntax described in this specification bears a
   close resemblance to SGML and XML, it is a separate language with
   its own parsing rules.</p>

   <p>Some earlier versions of HTML (in particular from HTML2 to
   HTML4) were based on SGML and used SGML parsing rules. However, few
   (if any) web browsers ever implemented true SGML parsing for HTML
   documents; the only user agents to strictly handle HTML as an SGML
   application have historically been validators. The resulting
   confusion &#8212; with validators claiming documents to have one
   representation while widely deployed Web browsers interoperably
   implemented a different representation &#8212; has wasted decades
   of productivity. This version of HTML thus returns to a non-SGML
   basis.</p>

   <p>Authors interested in using SGML tools in their authoring
   pipeline are encouraged to use XML tools and the XML serialization
   of HTML.</p>

  </div>

  <p>This specification defines the parsing rules for HTML documents,
  whether they are syntactically correct or not. Certain points in the
  parsing algorithm are said to be <dfn id="parse-error" title="parse error">parse
  errors</dfn>. The error handling for parse errors is well-defined:
  user agents must either act as described below when encountering
  such problems, or must abort processing at the first error that they
  encounter for which they do not wish to apply the rules described
  below.</p>
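
  <div class="example">

   <p>The two permitted behaviors can be pictured as a policy chosen
   by whatever receives parse errors. The following is a minimal,
   non-normative sketch: the helper <code>createErrorSink</code> and
   the mode names "recover" and "abort" are assumptions made for
   illustration and are not defined by this specification.</p>

```javascript
// Hypothetical sketch: createErrorSink, "recover" and "abort" are
// illustrative names, not part of this specification.
function createErrorSink(mode) {
  const errors = [];
  return {
    errors,
    // Called by a parser whenever the algorithm reaches a parse error.
    report(message) {
      errors.push(message);
      if (mode === "abort") {
        // Stop processing at the first error this agent will not handle.
        throw new Error("parse error: " + message);
      }
      // In "recover" mode the caller carries on with the error handling
      // that the parsing rules describe for this point.
    },
  };
}
```

   <p>A browser-like agent would presumably run in the recovering
   mode, while a strict tool might prefer to stop at the first
   reported error.</p>

  </div>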

  <p>Conformance checkers must report at least one parse error
  condition to the user if one or more parse error conditions exist in
  the document and must not report parse error conditions if none
  exist in the document. Conformance checkers may report more than one
  parse error condition if more than one parse error condition exists
  in the document. Conformance checkers are not required to recover
  from parse errors.</p>

  <p class="note">Parse errors are only errors with the
  <em>syntax</em> of HTML. In addition to checking for parse errors,
  conformance checkers will also verify that the document obeys all
  the other conformance requirements described in this
  specification.</p>

  <p>For the purposes of conformance checkers, if a resource is
  determined to be in <a href="syntax.html#syntax">the HTML syntax</a>, then it is an
  <a href="dom.html#html-documents" title="HTML documents">HTML document</a>.</p>

  </div><div class="impl">

  <h4 id="overview-of-the-parsing-model"><span class="secno">8.2.1 </span>Overview of the parsing model</h4>

  <p>The input to the HTML parsing process consists of a stream of
  Unicode characters, which is passed through a
  <a href="tokenization.html#tokenization">tokenization</a> stage followed by a <a href="tree-construction.html#tree-construction">tree
  construction</a> stage. The output is a <code><a href="infrastructure.html#document">Document</a></code>
  object.</p>

  <p class="note">Implementations that <a href="infrastructure.html#non-scripted">do not
  support scripting</a> do not have to actually create a DOM
  <code><a href="infrastructure.html#document">Document</a></code> object, but the DOM tree in such cases is
  still used as the model for the rest of the specification.</p>

  <p>In the common case, the data handled by the tokenization stage
  comes from the network, but <a href="apis-in-html-documents.html#dynamic-markup-insertion" title="dynamic markup
  insertion">it can also come from script</a> running in the user
  agent, e.g. using the <code title="dom-document-write"><a href="apis-in-html-documents.html#dom-document-write">document.write()</a></code> API.</p>

  <p><img alt="" height="554" src="parsing-model-overview.png" width="427"></p>

  <p id="nestedParsing">There is only one set of states for the
  tokenizer stage and the tree construction stage, but the tree
  construction stage is reentrant, meaning that while the tree
  construction stage is handling one token, the tokenizer might be
  resumed, causing further tokens to be emitted and processed before
  the first token's processing is complete.</p>

  <div class="example">

   <p>In the following example, the tree construction stage will be
   called upon to handle a "p" start tag token while handling the
   "script" end tag token:</p>

   <pre>...
&lt;script&gt;
 document.write('&lt;p&gt;');
&lt;/script&gt;
...</pre>

  </div>

  <p>To handle these cases, parsers have a <dfn id="script-nesting-level">script nesting
  level</dfn>, which must be initially set to zero, and a <dfn id="parser-pause-flag">parser
  pause flag</dfn>, which must be initially set to false.</p>
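
  <div class="example">

   <p>The reentrancy described above can be illustrated with a toy
   model. This is only a sketch: the class <code>ToyParser</code>, its
   <code>write()</code> method and the token strings are hypothetical
   names chosen for illustration, and incrementing the script nesting
   level around script execution is an assumption about how the level
   is used, not a rule stated in this section.</p>

```python
class ToyParser:
    """Toy model of a reentrant tree construction stage (illustrative only)."""

    def __init__(self):
        self.script_nesting_level = 0   # initially zero
        self.parser_pause_flag = False  # initially false (unused in this toy)
        self.log = []

    def write(self, token):
        # Stand-in for document.write(): script hands a new token to the parser.
        self.process(token)

    def process(self, token):
        self.log.append(f"begin {token}")
        if token == "</script>":
            # Assumption: the nesting level brackets script execution.
            self.script_nesting_level += 1
            self.write("<p>")  # the script runs and writes a start tag
            self.script_nesting_level -= 1
        self.log.append(f"end {token}")


parser = ToyParser()
for tok in ["<script>", "</script>"]:
    parser.process(tok)
```

   <p>In this toy run, <code>parser.log</code> records that the "p"
   start tag is begun and finished while the "script" end tag is still
   being handled.</p>

  </div>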

  </div><div class="impl">

  <h4 id="the-input-stream"><span class="secno">8.2.2 </span>The <dfn>input stream</dfn></h4>

  <p>The stream of Unicode characters that comprises the input to the
  tokenization stage will be initially seen by the user agent as a
  stream of bytes (typically coming over the network or from the local
  file system). The bytes encode the actual characters according to a
  particular <em>character encoding</em>, which the user agent must
  use to decode the bytes into characters.</p>
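
  <div class="example">

   <p>As a small illustration of why the choice of encoding matters,
   the following sketch decodes the same five bytes under two
   different encodings. The codec names are those of Python's standard
   library, used here only as a convenient stand-in for a user agent's
   decoders.</p>

```python
raw = b"caf\xc3\xa9"  # five bytes as they might arrive from the network

as_utf8 = raw.decode("utf-8")          # 0xC3 0xA9 is one character in UTF-8
as_1252 = raw.decode("windows-1252")   # the same bytes are two characters here

print(len(as_utf8), len(as_1252))
```

   <p>The byte stream is identical in both cases; only the encoding
   used to interpret it differs.</p>

  </div>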

  <p class="note">For XML documents, the algorithm user agents must
  use to determine the character encoding is given by the XML
  specification. This section does not apply to XML documents. <a href="references.html#refsXML">[XML]</a></p>


  <h5 id="determining-the-character-encoding"><span class="secno">8.2.2.1 </span>Determining the character encoding</h5>

  <p>In some cases, it might be impractical to unambiguously determine
  the encoding before parsing the document. Because of this, this
  specification provides for a two-pass mechanism with an optional
  pre-scan. Implementations are allowed, as described below, to apply
  a simplified parsing algorithm to whatever bytes they have available
  before beginning to parse the document. Then, the real parser is
  started, using a tentative encoding derived from this pre-parse and
  other out-of-band metadata. If, while the document is being loaded,
  the user agent discovers an encoding declaration that conflicts with
  this information, then the parser can get reinvoked to perform a
  parse of the document with the real encoding.</p>

  <p id="documentEncoding">User agents must use the following
  algorithm (the <dfn id="encoding-sniffing-algorithm">encoding sniffing algorithm</dfn>) to determine
  the character encoding to use when decoding a document in the first
  pass. This algorithm takes as input any out-of-band metadata
  available to the user agent (e.g. the <a href="fetching-resources.html#content-type" title="Content-Type">Content-Type metadata</a> of the document)
  and all the bytes available so far, and returns an encoding and a
  <dfn id="concept-encoding-confidence" title="concept-encoding-confidence">confidence</dfn>. The
  confidence is either <i>tentative</i>, <i>certain</i>, or
  <i>irrelevant</i>. The encoding used, and whether the confidence in
  that encoding is <i>tentative</i> or <i>certain</i>, is <a href="tree-construction.html#meta-charset-during-parse">used during the parsing</a> to
  determine whether to <a href="#change-the-encoding">change the encoding</a>. If no
  encoding is necessary, e.g. because the parser is operating on a
  stream of Unicode characters and doesn't have to use an encoding at
  all, then the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> is
  <i>irrelevant</i>.</p>

  <ol><li><p>If the user has explicitly instructed the user agent to
   override the document's character encoding with a specific
   encoding, optionally return that encoding with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
   <i>certain</i> and abort these steps.</p></li>

   <li><p>If the transport layer specifies an encoding, and it is
   supported, return that encoding with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
   <i>certain</i>, and abort these steps.</p></li>

   <li>

    <p>The user agent may wait for more bytes of the resource to be
    available, either in this step or at any later step in this
    algorithm. For instance, a user agent might wait 500ms or 1024
    bytes, whichever came first. In general, preparsing the source to
    find the encoding improves performance, as it reduces the need to
    throw away the data structures used when parsing upon finding the
    encoding information. However, if the user agent delays too long
    to obtain data to determine the encoding, then the cost of the
    delay could outweigh any performance improvements from the
    preparse.</p>

    <p class="note">The authoring conformance requirements for
    character encoding declarations limit them to only appearing <a href="semantics.html#charset1024">in the first 1024 bytes</a>. User agents are
    therefore encouraged to use the preparse algorithm below (part of
    these steps) on the first 1024 bytes, but not to stall beyond
    that.</p>

   </li>

   <li><p>For each of the rows in the following table, starting with
   the first one and going down, if there are as many or more bytes
   available than the number of bytes in the first column, and the
   first bytes of the file match the bytes given in the first column,
   then return the encoding given in the cell in the second column of
   that row, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
   <i>certain</i>, and abort these steps:</p>

    
    <table><thead><tr><th>Bytes in Hexadecimal
       </th><th>Encoding
     </th></tr></thead><tbody><tr><td>FE FF
       </td><td>Big-endian UTF-16
      </td></tr><tr><td>FF FE
       </td><td>Little-endian UTF-16
      </td></tr><tr><td>EF BB BF
       </td><td>UTF-8
    </td></tr></tbody></table><p class="note">This step looks for Unicode Byte Order Marks
   (BOMs).</p></li>

   <li><p>Otherwise, the user agent will have to search for explicit
   character encoding information in the file itself. This should
   proceed as follows:

    </p><p>Let <var title="">position</var> be a pointer to a byte in the
    input stream, initially pointing at the first byte. If at any
    point during these substeps the user agent either runs out of
    bytes or decides that scanning further bytes would not be
    efficient, then skip to the next step of the overall character
    encoding detection algorithm. User agents may decide that scanning
    <em>any</em> bytes is not efficient, in which case these substeps
    are entirely skipped.</p>

    <p>Now, repeat the following "two step" algorithm until it
    aborts (either because the user agent aborts, as described above, or
    because a character encoding is found):</p>

    <ol><li><p>If <var title="">position</var> points to:</p>

      <dl class="switch"><dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '&lt;!--')</dt>
       <dd>

        <p>Advance the <var title="">position</var> pointer so that it
        points at the first 0x3E byte which is preceded by two 0x2D
        bytes (i.e. at the end of an ASCII '--&gt;' sequence) and comes
        after the 0x3C byte that was found. (The two 0x2D bytes can be
        the same as those in the '&lt;!--' sequence.)</p>

       </dd>

       <dt>A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and finally one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '&lt;meta' followed by a space or slash)</dt>
       <dd>

        <ol><li><p>Advance the <var title="">position</var> pointer so
         that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
         0x2F byte (the one in the sequence of characters matched
         above).</p></li>

         <li><p>Let <var title="">attribute list</var> be an empty
         list of strings.</p></li> 
         <li><p>Let <var title="">got pragma</var> be false.</p></li>

         <li><p>Let <var title="">need pragma</var> be null.</p></li>

         <li><p>Let <var title="">charset</var> be the null value
         (which, for the purposes of this algorithm, is distinct from
         an unrecognised encoding or the empty string).</p></li>

         <li><p><i>Attributes</i>: <a href="#concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">Get an
         attribute</a> and its value. If no attribute was sniffed,
         then jump to the <i>processing</i> step below.</p></li>

         <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
         labeled <i>attributes</i>.</p>

         </li><li><p>Add the attribute's name to <var title="">attribute
         list</var>.</p>

         </li><li>

          <p>Run the appropriate step from the following list, if one
          applies:</p>

          <dl class="switch"><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>

           <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
           pragma</var> to true.</p></dd>

           <dt>If the attribute's name is "<code title="">content</code>"</dt>

           <dd><p>Apply the <a href="fetching-resources.html#algorithm-for-extracting-an-encoding-from-a-meta-element">algorithm for extracting an encoding
           from a <code>meta</code> element</a>, giving the
           attribute's value as the string to parse. If an encoding is
           returned, and if <var title="">charset</var> is still set
           to null, let <var title="">charset</var> be the encoding
           returned, and set <var title="">need pragma</var> to
           true.</p></dd>

           <dt>If the attribute's name is "<code title="">charset</code>"</dt>

           <dd><p>Let <var title="">charset</var> be the encoding
           corresponding to the attribute's value, and set <var title="">need pragma</var> to false.</p></dd>

          </dl></li>

         <li><p>Return to the step labeled <i>attributes</i>.</p></li>

         <li><p><i>Processing</i>: If <var title="">need pragma</var>
         is null, then jump to the second step of the overall "two
         step" algorithm.</p></li>

         <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is false, then jump to the second
         step of the overall "two step" algorithm.</p></li>

         <li><p>If <var title="">charset</var> is a UTF-16 encoding,
         change the value of <var title="">charset</var> to
         UTF-8.</p></li>

         <li><p>If <var title="">charset</var> is not a supported
         character encoding, then jump to the second step of the
         overall "two step" algorithm.</p></li>

         <li><p>Return the encoding given by <var title="">charset</var>, with <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
         <i>tentative</i>, and abort all these steps.</p></li>

        </ol></dd>

       <dt>A sequence of bytes starting with a 0x3C byte (ASCII &lt;), optionally a 0x2F byte (ASCII /), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)</dt>
       <dd>

        <ol><li><p>Advance the <var title="">position</var> pointer so
         that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
         0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
         (ASCII &gt;) byte.</p></li>

         <li><p>Repeatedly <a href="#concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">get an
         attribute</a> until no further attributes can be found,
         then jump to the second step in the overall "two step"
         algorithm.</p></li>

        </ol></dd>

       <dt>A sequence of bytes starting with: 0x3C 0x21 (ASCII '&lt;!')</dt>
       <dt>A sequence of bytes starting with: 0x3C 0x2F (ASCII '&lt;/')</dt>
       <dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '&lt;?')</dt>
       <dd>

        <p>Advance the <var title="">position</var> pointer so that it
        points at the first 0x3E byte (ASCII &gt;) that comes after the
        0x3C byte that was found.</p>

       </dd>

       <dt>Any other byte</dt>
       <dd>

        <p>Do nothing with that byte.</p>

       </dd>

      </dl></li>

     <li>Move <var title="">position</var> so it points at the next
     byte in the input stream, and return to the first step of this
     "two step" algorithm.</li>

    </ol><p>When the above "two step" algorithm says to <dfn id="concept-get-attributes-when-sniffing" title="concept-get-attributes-when-sniffing">get an
    attribute</dfn>, it means doing this:</p>

    <ol><li><p>If the byte at <var title="">position</var> is one of 0x09
     (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR),
     0x20 (ASCII space), or 0x2F (ASCII /) then advance <var title="">position</var> to the next byte and redo this
     substep.</p></li>

     <li><p>If the byte at <var title="">position</var> is 0x3E (ASCII
     &gt;), then abort the "get an attribute" algorithm. There isn't
     one.</p></li>

     <li><p>Otherwise, the byte at <var title="">position</var> is the
     start of the attribute name. Let <var title="">attribute
     name</var> and <var title="">attribute value</var> be the empty
     string.</p></li>

     <li><p><i>Attribute name</i>: Process the byte at <var title="">position</var> as follows:</p>

      <dl class="switch"><dt>If it is 0x3D (ASCII =), and the <var title="">attribute
       name</var> is longer than the empty string</dt>

       <dd>Advance <var title="">position</var> to the next byte and
       jump to the step below labeled <i>value</i>.</dd>

       <dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
       FF), 0x0D (ASCII CR), or 0x20 (ASCII space)</dt>

       <dd>Jump to the step below labeled <i>spaces</i>.</dd>

       <dt>If it is 0x2F (ASCII /) or 0x3E (ASCII &gt;)</dt>

       <dd>Abort the "get an attribute" algorithm. The attribute's
       name is the value of <var title="">attribute name</var>, its
       value is the empty string.</dd>

       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
       Z)</dt>

       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
       name</var> (where <var title="">b</var> is the value of the
       byte at <var title="">position</var>).</dd>

       <dt>Anything else</dt>

       <dd>Append the Unicode character with the same code point as the
       value of the byte at <var title="">position</var> to <var title="">attribute name</var>. (It doesn't actually matter how
       bytes outside the ASCII range are handled here, since only
       ASCII characters can contribute to the detection of a character
       encoding.)</dd>

      </dl></li>

     <li><p>Advance <var title="">position</var> to the next byte and
     return to the previous step.</p></li>

     <li><p><i>Spaces</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
     advance <var title="">position</var> to the next byte, then,
     repeat this step.</p></li>

     <li><p>If the byte at <var title="">position</var> is
     <em>not</em> 0x3D (ASCII =), abort the "get an attribute"
     algorithm. The attribute's name is the value of <var title="">attribute name</var>, its value is the empty
     string.</p></li>

     <li><p>Advance <var title="">position</var> past the 0x3D (ASCII
     =) byte.</p></li>

     <li><p><i>Value</i>: If the byte at <var title="">position</var> is one of 0x09 (ASCII TAB), 0x0A (ASCII
     LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then
     advance <var title="">position</var> to the next byte, then,
     repeat this step.</p></li>

     <li><p>Process the byte at <var title="">position</var> as
     follows:</p>

      <dl class="switch"><dt>If it is 0x22 (ASCII ") or 0x27 (ASCII ')</dt>

       <dd>

        <ol><li>Let <var title="">b</var> be the value of the byte at
         <var title="">position</var>.</li>

         <li>Advance <var title="">position</var> to the next
         byte.</li>

         <li>If the value of the byte at <var title="">position</var>
         is the value of <var title="">b</var>, then advance <var title="">position</var> to the next byte and abort the "get
         an attribute" algorithm. The attribute's name is the value of
         <var title="">attribute name</var>, and its value is the
         value of <var title="">attribute value</var>.</li>

         <li>Otherwise, if the value of the byte at <var title="">position</var> is in the range 0x41 (ASCII A) to
         0x5A (ASCII Z), then append a Unicode character to <var title="">attribute value</var> whose code point is 0x20 more
         than the value of the byte at <var title="">position</var>.</li>

         <li>Otherwise, append a Unicode character to <var title="">attribute value</var> whose code point is the same as
         the value of the byte at <var title="">position</var>.</li>

         <li>Return to the second step in these substeps.</li>

        </ol></dd>

       <dt>If it is 0x3E (ASCII &gt;)</dt>

       <dd>Abort the "get an attribute" algorithm. The attribute's
       name is the value of <var title="">attribute name</var>, its
       value is the empty string.</dd>


       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
       Z)</dt>

       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
       value</var> (where <var title="">b</var> is the value of the
       byte at <var title="">position</var>). Advance <var title="">position</var> to the next byte.</dd>

       <dt>Anything else</dt>

       <dd>Append the Unicode character with the same code point as the
       value of the byte at <var title="">position</var> to <var title="">attribute value</var>. Advance <var title="">position</var> to the next byte.</dd>

      </dl></li>

     <li><p>Process the byte at <var title="">position</var> as
     follows:</p>

      <dl class="switch"><dt>If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII
       FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII
       &gt;)</dt>

       <dd>Abort the "get an attribute" algorithm. The attribute's
       name is the value of <var title="">attribute name</var> and its
       value is the value of <var title="">attribute value</var>.</dd>

       <dt>If it is in the range 0x41 (ASCII A) to 0x5A (ASCII
       Z)</dt>

       <dd>Append the Unicode character with code point <span title=""><var title="">b</var>+0x20</span> to <var title="">attribute
       value</var> (where <var title="">b</var> is the value of the
       byte at <var title="">position</var>).</dd>

       <dt>Anything else</dt>

       <dd>Append the Unicode character with the same code point as the
       value of the byte at <var title="">position</var> to <var title="">attribute value</var>.</dd>

      </dl></li>

     <li><p>Advance <var title="">position</var> to the next byte and
     return to the previous step.</p></li>

    </ol><p>For the sake of interoperability, user agents should not use a
    pre-scan algorithm that returns different results than the one
    described above. (But, if you do, please at least let us know, so
    that we can improve this algorithm and benefit everyone...)</p>
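
    <div class="example">

     <p>The "get an attribute" steps, and the way the
     <code>meta</code> branch uses them, can be sketched as follows.
     This is a non-normative sketch under stated assumptions: the
     function names are hypothetical, running out of bytes surfaces as
     an <code>IndexError</code> that a caller would treat as "skip to
     the next step", the regular expression is a simplified stand-in
     for the algorithm for extracting an encoding from a
     <code>meta</code> element, and the check for supported encodings
     is omitted.</p>

```python
import re

WS = b"\t\n\x0c\r "  # 0x09, 0x0A, 0x0C, 0x0D, 0x20


def fold(b):
    """Bytes 0x41-0x5A (A-Z) are lowercased; other bytes keep their code point."""
    return chr(b + 0x20) if 0x41 <= b <= 0x5A else chr(b)


def get_attribute(data, pos):
    """Return (name, value, new_pos); name is None if no attribute was found."""
    while data[pos] in b"\t\n\x0c\r /":      # skip whitespace and slashes
        pos += 1
    if data[pos] == 0x3E:                    # '>': there isn't one
        return None, None, pos
    name, value = "", ""
    while True:                              # attribute name
        b = data[pos]
        if b == 0x3D and name:               # '=' after a non-empty name
            pos += 1
            break
        if b in WS:                          # spaces: skip, then require '='
            while data[pos] in WS:
                pos += 1
            if data[pos] != 0x3D:
                return name, "", pos
            pos += 1
            break
        if b in b"/>":
            return name, "", pos
        name += fold(b)
        pos += 1
    while data[pos] in WS:                   # whitespace before the value
        pos += 1
    b = data[pos]
    if b in b"\"'":                          # quoted value
        quote = b
        pos += 1
        while data[pos] != quote:
            value += fold(data[pos])
            pos += 1
        return name, value, pos + 1
    if b == 0x3E:
        return name, "", pos
    value += fold(b)                         # unquoted value
    pos += 1
    while data[pos] not in b"\t\n\x0c\r >":
        value += fold(data[pos])
        pos += 1
    return name, value, pos


def prescan_meta(data, pos):
    """pos points just after '<meta'. Return an encoding label or None."""
    seen, got_pragma, need_pragma, charset = set(), False, None, None
    while True:
        name, value, pos = get_attribute(data, pos)
        if name is None:
            break
        if name in seen:
            continue
        seen.add(name)
        if name == "http-equiv":
            if value == "content-type":
                got_pragma = True
        elif name == "content":
            # Assumption: simplified extraction, not the real algorithm.
            m = re.search(r"charset\s*=\s*[\"']?([^\s;\"']+)", value)
            if m and charset is None:
                charset, need_pragma = m.group(1), True
        elif name == "charset":
            charset, need_pragma = value, False
    if need_pragma is None or (need_pragma and not got_pragma):
        return None
    if charset in ("utf-16", "utf-16le", "utf-16be"):  # assumed label set
        charset = "utf-8"
    return charset
```

     <p>Because attribute values are lowercased byte by byte, the
     labels returned by this sketch are lowercase. A
     <code>content</code> attribute without an accompanying
     <code>http-equiv="content-type"</code> yields no encoding, which
     reflects the <var title="">need pragma</var> and
     <var title="">got pragma</var> checks above.</p>

    </div>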

   </li>

   <li><p>If the user agent has information on the likely encoding for
   this page, e.g. based on the encoding of the page when it was last
   visited, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
   <i>tentative</i>, and abort these steps.</p></li>

   <li>

    <p>The user agent may attempt to autodetect the character encoding
    from applying frequency analysis or other algorithms to the data
    stream. Such algorithms may use information about the resource
    other than the resource's contents, including the address of the
    resource. If autodetection succeeds in determining a character
    encoding, then return that encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
    <i>tentative</i>, and abort these steps. <a href="references.html#refsUNIVCHARDET">[UNIVCHARDET]</a></p>

    <p class="note">The UTF-8 encoding has a highly detectable bit
    pattern. Documents that contain bytes with values greater than
    0x7F which match the UTF-8 pattern are very likely to be UTF-8,
    while documents with byte sequences that do not match it are very
    likely not. User agents are therefore encouraged to search for
    this common encoding. <a href="references.html#refsPPUTF8">[PPUTF8]</a> <a href="references.html#refsUTF8DET">[UTF8DET]</a></p>
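
    <div class="example">

     <p>One way to test for the UTF-8 pattern mentioned in the note is
     sketched below. The function name <code>looks_like_utf8</code> is
     hypothetical, and the sketch leans on Python's strict UTF-8
     decoder rather than any detector referenced by this
     specification. It assumes the buffer does not end in the middle of
     a multi-byte sequence.</p>

```python
def looks_like_utf8(data):
    """Heuristic: non-ASCII bytes are present and the buffer is valid UTF-8."""
    if all(b < 0x80 for b in data):
        return False  # pure ASCII gives no evidence either way
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True
```

     <p>A result of true would lead to returning UTF-8 with the
     confidence <i>tentative</i>; a result of false says nothing about
     which other encoding is in use.</p>

    </div>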

   </li>

   <li>

    <p>Otherwise, return an implementation-defined or user-specified
    default character encoding, with the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a>
    <i>tentative</i>.</p>

    <p>In controlled environments or in environments where the
    encoding of documents can be prescribed (for example, for user
    agents intended for dedicated use in new networks), the
    comprehensive <code title="">UTF-8</code> encoding is
    suggested.</p>

    <p>In other environments, the default encoding is typically
    dependent on the user's locale (an approximation of the languages,
    and thus often encodings, of the pages that the user is likely to
    frequent). The following table gives suggested defaults based on
    the user's locale, for compatibility with legacy content. Locales
    are identified by BCP 47 language tags. <a href="references.html#refsBCP47">[BCP47]</a></p>

    
    <table><thead><tr><th>Locale language
       </th><th>Suggested default encoding
     </th></tr></thead><tbody><tr><td>ar
       </td><td>UTF-8

      </td></tr><tr><td>be
       </td><td>ISO-8859-5

      </td></tr><tr><td>bg
       </td><td>windows-1251

      </td></tr><tr><td>cs<!-- -CZ -->
       </td><td>ISO-8859-2

      </td></tr><tr><td>cy
       </td><td>UTF-8

      </td></tr><tr><td>fa<!-- -IR -->
       </td><td>UTF-8

      </td></tr><tr><td>he<!-- -IL -->
       </td><td>windows-1255

      </td></tr><tr><td>hr
       </td><td>UTF-8

      </td></tr><tr><td>hu<!-- -HU -->
       </td><td>ISO-8859-2

      </td></tr><tr><td>ja 
       </td><td>Windows-31J 

      </td></tr><tr><td>kk
       </td><td>UTF-8

      </td></tr><tr><td>ko<!-- -KR -->
       </td><td>windows-949 <!-- EUC-KR -->

      </td></tr><tr><td>ku
       </td><td>windows-1254 <!-- ISO-8859-9 -->

      </td></tr><tr><td>lt
       </td><td>windows-1257

      </td></tr><tr><td>lv<!-- -LV -->
       </td><td>ISO-8859-13

      </td></tr><tr><td>mk<!-- -MK -->
       </td><td>UTF-8

      </td></tr><tr><td>or
       </td><td>UTF-8

      </td></tr><tr><td>pl<!-- -PL -->
       </td><td>ISO-8859-2

      </td></tr><tr><td>ro
       </td><td>UTF-8

      </td></tr><tr><td>ru
       </td><td>windows-1251

      </td></tr><tr><td>sk
       </td><td>windows-1250

      </td></tr><tr><td>sl
       </td><td>ISO-8859-2

      </td></tr><tr><td>sr
       </td><td>UTF-8

      </td></tr><tr><td>th
       </td><td>windows-874 <!-- TIS-620 -->

      </td></tr><tr><td>tr<!-- -TR -->
       </td><td>windows-1254 <!-- ISO-8859-9 -->

      </td></tr><tr><td>uk
       </td><td>windows-1251

      </td></tr><tr><td>vi
       </td><td>UTF-8

      </td></tr><tr><td>zh-CN
       </td><td>GB18030

      </td></tr><tr><td>zh-TW
       </td><td>Big5

      </td></tr><tr><td>All other locales
       </td><td>windows-1252

    </td></tr></tbody></table></li>

  </ol><p>The <a href="dom.html#document-s-character-encoding">document's character encoding</a> must immediately
  be set to the value returned from this algorithm, at the same time
  as the user agent uses the returned value to select the decoder to
  use for the input stream.</p>
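
  <div class="example">

   <p>The overall shape of the encoding sniffing algorithm can be
   sketched as follows. This is a simplified, non-normative outline:
   the function and parameter names are hypothetical, the pre-scan is
   supplied by the caller as a function, the optional waiting and
   autodetection steps are left out, and the default is passed in
   rather than derived from the user's locale.</p>

```python
BOMS = [
    (b"\xfe\xff", "UTF-16BE"),      # FE FF: big-endian UTF-16
    (b"\xff\xfe", "UTF-16LE"),      # FF FE: little-endian UTF-16
    (b"\xef\xbb\xbf", "UTF-8"),     # EF BB BF: UTF-8
]


def sniff_encoding(head, user_override=None, transport=None,
                   prescan=None, remembered=None,
                   default="windows-1252",
                   supported=lambda label: True):
    """Return (encoding, confidence) for the bytes available so far."""
    if user_override:                          # explicit user instruction
        return user_override, "certain"
    if transport and supported(transport):     # transport layer
        return transport, "certain"
    for bom, encoding in BOMS:                 # byte order marks
        if head.startswith(bom):
            return encoding, "certain"
    if prescan is not None:                    # in-document declarations
        found = prescan(head)
        if found:
            return found, "tentative"
    if remembered:                             # e.g. from a previous visit
        return remembered, "tentative"
    return default, "tentative"                # implementation default
```

   <p>Only the first three outcomes carry the confidence
   <i>certain</i>; everything found later is <i>tentative</i> and can
   therefore still be overridden while parsing.</p>

  </div>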

  <p class="note">This algorithm is a <a href="introduction.html#willful-violation">willful violation</a>
  of the HTTP specification, which requires that the encoding be
  assumed to be ISO-8859-1 in the absence of a <a href="semantics.html#character-encoding-declaration">character
  encoding declaration</a> to the contrary, and of RFC 2046,
  which requires that the encoding be assumed to be US-ASCII in the
  absence of a <a href="semantics.html#character-encoding-declaration">character encoding declaration</a> to the
  contrary. This specification's third approach is motivated by a
  desire to be maximally compatible with legacy content. <a href="references.html#refsHTTP">[HTTP]</a> <a href="references.html#refsRFC2046">[RFC2046]</a></p>


  <h5 id="character-encodings-0"><span class="secno">8.2.2.2 </span>Character encodings</h5>

  <p>User agents must at a minimum support the UTF-8 and Windows-1252
  encodings, but may support more. <a href="references.html#refsRFC3629">[RFC3629]</a> <a href="references.html#refsWIN1252">[WIN1252]</a></p>

  <p class="note">It is not unusual for Web browsers to support dozens
  if not upwards of a hundred distinct character encodings.</p>

  <p>User agents must support the <a href="infrastructure.html#preferred-mime-name">preferred MIME name</a> of
  every character encoding they support, and should support all the
  IANA-registered names and aliases of every character encoding they
  support. <a href="references.html#refsIANACHARSET">[IANACHARSET]</a></p>

  <p>When comparing a string specifying a character encoding with the
  name or alias of a character encoding to determine if they are
  equal, user agents must remove any leading or trailing <a href="common-microsyntaxes.html#space-character" title="space character">space characters</a> in both names, and
  then perform the comparison in an <a href="infrastructure.html#ascii-case-insensitive">ASCII
  case-insensitive</a> manner.</p>
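
  <div class="example">

   <p>The comparison rule above can be sketched as follows. The
   function names are hypothetical; the set of space characters is
   assumed to be U+0020, U+0009, U+000A, U+000C and U+000D, and case
   folding is restricted to the ASCII letters rather than using a
   general Unicode lowercase mapping.</p>

```python
SPACE_CHARACTERS = " \t\n\x0c\r"  # assumed definition of "space characters"


def ascii_lower(s):
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def same_encoding_name(a, b):
    """Strip leading/trailing space characters, then compare ASCII case-insensitively."""
    a = a.strip(SPACE_CHARACTERS)
    b = b.strip(SPACE_CHARACTERS)
    return ascii_lower(a) == ascii_lower(b)
```

   <p>Restricting the fold to ASCII matters: a general lowercase
   mapping would, for example, treat U+212A KELVIN SIGN as equal to
   "k", which this rule does not.</p>

  </div>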

  <hr><p>When a user agent would otherwise use an encoding given in the
  first column of the following table to either convert content to
  Unicode characters or convert Unicode characters to bytes, it must
  instead use the encoding given in the cell in the second column of
  the same row. When a byte or sequence of bytes is treated
  differently due to this encoding aliasing, it is said to have been
  <dfn id="misinterpreted-for-compatibility">misinterpreted for compatibility</dfn>.</p>

  <table id="table-encoding-overrides"><caption>Character encoding overrides</caption>
   <thead><tr><th> Input encoding </th><th> Replacement encoding </th><th> References
   </th></tr></thead><tbody><tr><td> EUC-KR </td><td> windows-949 </td><td>
         <a href="references.html#refsEUCKR">[EUCKR]</a>
         <a href="references.html#refsWIN949">[WIN949]</a>
    </td></tr><tr><td> EUC-JP </td><td> CP51932 </td><td>
         <a href="references.html#refsEUCJP">[EUCJP]</a>
         <a href="references.html#refsCP51932">[CP51932]</a>
    </td></tr><tr><td> GB2312 </td><td> GBK </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsGBK">[GBK]</a>
    </td></tr><tr><td> GB_2312-80 </td><td> GBK </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsGBK">[GBK]</a>
    </td></tr><tr><td> ISO-8859-1 </td><td> windows-1252 </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsWIN1252">[WIN1252]</a>
    </td></tr><tr><td> ISO-8859-9 </td><td> windows-1254 </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsWIN1254">[WIN1254]</a>
    </td></tr><tr><td> ISO-8859-11 </td><td> windows-874 </td><td>
         <a href="references.html#refsISO885911">[ISO885911]</a>
         <a href="references.html#refsWIN874">[WIN874]</a>
    </td></tr><tr><td> KS_C_5601-1987 </td><td> windows-949 </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsWIN949">[WIN949]</a>
    </td></tr><tr><td> Shift_JIS </td><td> Windows-31J </td><td>
         <a href="references.html#refsSHIFTJIS">[SHIFTJIS]</a>
         <a href="references.html#refsWIN31J">[WIN31J]</a>
    </td></tr><tr><td> TIS-620 </td><td> windows-874 </td><td>
         <a href="references.html#refsTIS620">[TIS620]</a>
         <a href="references.html#refsWIN874">[WIN874]</a>
    </td></tr><tr><td> US-ASCII </td><td> windows-1252 </td><td>
         <a href="references.html#refsRFC1345">[RFC1345]</a>
         <a href="references.html#refsWIN1252">[WIN1252]</a>
   </td></tr></tbody></table><p class="note">The requirement to treat certain encodings as other
  encodings according to the table above is a <a href="introduction.html#willful-violation">willful
  violation</a> of the W3C Character Model specification, motivated
  by a desire for compatibility with legacy content. <a href="references.html#refsCHARMOD">[CHARMOD]</a></p>
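
  <div class="example">

   <p>The override table can be applied with a simple lookup, sketched
   below. The names <code>ENCODING_OVERRIDES</code> and
   <code>effective_encoding</code> are hypothetical, and the sketch
   assumes that labels are matched after trimming space characters and
   folding ASCII case. It handles only the names listed in the table,
   not their other aliases.</p>

```python
import string

ENCODING_OVERRIDES = {
    "euc-kr": "windows-949",
    "euc-jp": "CP51932",
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "Windows-31J",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}

# Fold only A-Z so that non-ASCII look-alikes never match a table key.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def effective_encoding(label):
    """Return the replacement encoding for label, or label itself if none applies."""
    key = label.strip(" \t\n\x0c\r").translate(ASCII_FOLD)
    return ENCODING_OVERRIDES.get(key, label)
```

   <p>A document labeled ISO-8859-1 is thus decoded as windows-1252,
   while a label that is not in the table is returned unchanged.</p>

  </div>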

  <p>When a user agent is to use the UTF-16 encoding but no BOM has
  been found, user agents must default to UTF-16LE.</p>

  <p class="note">The requirement to default UTF-16 to LE rather than
  BE is a <a href="introduction.html#willful-violation">willful violation</a> of RFC 2781, motivated by a
  desire for compatibility with legacy content. <a href="references.html#refsRFC2781">[RFC2781]</a></p>
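
  <div class="example">

   <p>A sketch of the UTF-16 default follows. The function name
   <code>utf16_codec</code> is hypothetical. The codec names are
   Python's; the explicit-endian codecs are used because they leave a
   leading U+FEFF in the decoded text instead of consuming it.</p>

```python
def utf16_codec(data):
    """Pick a byte order for UTF-16 data: big-endian only if FE FF leads."""
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"
    return "utf-16-le"  # FF FE BOM, or no BOM at all: default to little-endian


sample = b"h\x00i\x00"  # no BOM present
text = sample.decode(utf16_codec(sample))
```

   <p>Here <code>text</code> is decoded as little-endian because no
   BOM was found.</p>

  </div>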

  <hr><p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
  encodings. <a href="references.html#refsCESU8">[CESU8]</a> <a href="references.html#refsUTF7">[UTF7]</a> <a href="references.html#refsBOCU1">[BOCU1]</a> <a href="references.html#refsSCSU">[SCSU]</a></p>

  <p>Support for encodings based on EBCDIC is not recommended. These
  encodings are rarely used for publicly-facing Web content.</p>

  <p>Support for UTF-32 is not recommended. This encoding is rarely
  used, and frequently implemented incorrectly.</p>

  <p class="note">This specification does not make any attempt to
  support EBCDIC-based encodings and UTF-32 in its algorithms; support
  and use of these encodings can thus lead to unexpected behavior in
  implementations of this specification.</p>



  <h5 id="preprocessing-the-input-stream"><span class="secno">8.2.2.3 </span>Preprocessing the input stream</h5>

  <p>Given an encoding, the bytes in the input stream must be
  converted to Unicode characters for the tokenizer, as described by
  the rules for that encoding, except that the leading U+FEFF BYTE
  ORDER MARK character, if any, must not be stripped by the encoding
  layer (it is stripped by the rule below).</p> 
  <p>Bytes or sequences of bytes in the original byte stream that
  could not be converted to Unicode code points must be converted to
  U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is
  UTF-8, the bytes must be <a href="infrastructure.html#decoded-as-utf-8-with-error-handling" title="decoded as UTF-8, with error
  handling">decoded with the error handling</a> defined in this
  specification.</p>

  <p class="note">Bytes or sequences of bytes in the original byte
  stream that did not conform to the encoding specification
  (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are
  errors that conformance checkers are expected to report.</p>

  <p>Any byte or sequence of bytes in the original byte stream that is
  <a href="#misinterpreted-for-compatibility">misinterpreted for compatibility</a> is a <a href="#parse-error">parse
  error</a>.</p>

  <p>One leading U+FEFF BYTE ORDER MARK character must be ignored if
  any are present.</p>

  <p class="note">The requirement to strip a U+FEFF BYTE ORDER MARK
  character regardless of whether that character was used to determine
  the byte order is a <a href="introduction.html#willful-violation">willful violation</a> of Unicode,
  motivated by a desire to increase the resilience of user agents in
  the face of na&#239;ve transcoders.</p>
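
  <div class="example"><p>A minimal, non-normative Python sketch of the decoding and BOM rules above. The name <code title="">decode_for_tokenizer</code> is hypothetical. Python's <code title="">errors="replace"</code> handler stands in for the error handling defined in this specification and may emit a different number of U+FFFD characters for some malformed sequences. The sketch assumes a codec that does not strip the BOM itself, such as <code title="">utf-8</code> or <code title="">utf-16-le</code>.</p>

```python
BOM = "\ufeff"  # U+FEFF BYTE ORDER MARK


def decode_for_tokenizer(data: bytes, encoding: str) -> str:
    """Decode bytes, replacing undecodable input and ignoring one leading BOM.

    Assumption: `encoding` names a Python codec that keeps the BOM in its
    output (for example "utf-8" or "utf-16-le", not "utf-8-sig").
    """
    # Undecodable bytes become U+FFFD REPLACEMENT CHARACTER.
    text = data.decode(encoding, errors="replace")
    # Exactly one leading U+FEFF is ignored; any further ones are kept.
    if text.startswith(BOM):
        text = text[1:]
    return text
```

  </div>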

  <p>Any occurrences of any characters in the ranges U+0001 to U+0008,
  U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
  U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
  U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE,
  U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF,
  U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
  U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
  U+10FFFE, and U+10FFFF are <a href="#parse-error" title="parse error">parse
  errors</a>. These are all control characters or permanently
  undefined Unicode characters (noncharacters).</p>
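
  <div class="example"><p>The list above can be checked compactly, since the listed noncharacters outside U+FDD0 to U+FDEF are exactly the last two code points of each of the seventeen planes. The following Python predicate is a non-normative sketch, and its name is hypothetical.</p>

```python
def is_parse_error_character(cp: int) -> bool:
    """Return True if code point `cp` is in the parse-error list above."""
    if cp == 0x000B:
        return True
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F:
        return True
    if 0x007F <= cp <= 0x009F:
        return True
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF.
    return cp <= 0x10FFFF and (cp & 0xFFFF) >= 0xFFFE
```

  </div>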

  <p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
  characters are treated specially. Any CR characters that are
  followed by LF characters must be removed, and any CR characters not
  followed by LF characters must be converted to LF characters. Thus,
  newlines in HTML DOMs are represented by LF characters, and there
  are never any CR characters in the input to the
  <a href="tokenization.html#tokenization">tokenization</a> stage.</p>
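
  <div class="example"><p>The newline rule above amounts to two replacements applied in order. A non-normative Python sketch, with a hypothetical function name:</p>

```python
def normalize_newlines(text: str) -> str:
    """Apply the CR/LF rule: the result contains no CR characters."""
    # A CR followed by LF is removed, leaving the LF.
    text = text.replace("\r\n", "\n")
    # Every remaining CR is not followed by LF, so it becomes an LF.
    return text.replace("\r", "\n")
```

  <p>The order matters: handling lone CR characters first would turn each CR LF pair into two LF characters.</p></div>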

  <p>The <dfn id="next-input-character">next input character</dfn> is the first character in the
  input stream that has not yet been <dfn id="consumed">consumed</dfn>. Initially,
  the <i><a href="#next-input-character">next input character</a></i> is the first character in the
  input. The <dfn id="current-input-character">current input character</dfn> is the last character
  to have been <i><a href="#consumed">consumed</a></i>.</p>

  <p>The <dfn id="insertion-point">insertion point</dfn> is the position (just before a
  character or just before the end of the input stream) where content
  inserted using <code title="dom-document-write"><a href="apis-in-html-documents.html#dom-document-write">document.write()</a></code> is actually
  inserted. The insertion point is relative to the position of the
  character immediately after it; it is not an absolute offset into
  the input stream. Initially, the insertion point is
  undefined.</p>
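
  <div class="example"><p>One way to picture an insertion point that follows the character after it is shown below. This is a non-normative Python sketch: the class <code title="">WritableStream</code> and its members are hypothetical, and the index bookkeeping is just one possible representation.</p>

```python
class WritableStream:
    """Hypothetical input stream with an insertion point."""

    def __init__(self, text: str):
        self.chars = list(text)
        self.insertion_point = None  # initially undefined

    def write(self, text: str) -> None:
        """Insert `text` at the insertion point, as document.write() would."""
        if self.insertion_point is None:
            raise ValueError("insertion point is undefined")
        i = self.insertion_point
        self.chars[i:i] = list(text)
        # Keep the insertion point just before the same character as
        # before, so later writes land after earlier ones.
        self.insertion_point = i + len(text)
```

  <p>Because the insertion point is re-anchored after each write, two successive writes appear in the stream in the order they were made.</p></div>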

  <p>The "EOF" character in the tables below is a conceptual character
  representing the end of the <a href="#the-input-stream">input stream</a>. If the parser
  is a <a href="apis-in-html-documents.html#script-created-parser">script-created parser</a>, then the end of the
  <a href="#the-input-stream">input stream</a> is reached when an <dfn id="explicit-eof-character">explicit "EOF"
  character</dfn> (inserted by the <code title="dom-document-close"><a href="apis-in-html-documents.html#dom-document-close">document.close()</a></code> method) is
  consumed. Otherwise, the "EOF" character is not a real character in
  the stream, but rather the lack of any further characters.</p>
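
  <div class="example"><p>The terms <i>next input character</i>, <i>current input character</i>, <i>consumed</i>, and "EOF" can be modelled as below. This is a non-normative Python sketch: the class, the <code title="">EOF</code> sentinel object, and the use of <code title="">None</code> to mean "no character available yet" in a script-created parser are all assumptions made for illustration.</p>

```python
EOF = object()  # conceptual end-of-stream marker (hypothetical sentinel)


class InputStream:
    """Hypothetical model of the input stream's consumption state."""

    def __init__(self, text: str, script_created: bool = False):
        self.chars = list(text)
        self.pos = 0  # index of the next input character
        self.script_created = script_created
        self.current_input_character = None

    def close(self) -> None:
        """Model document.close(): append an explicit EOF character."""
        self.chars.append(EOF)

    def next_input_character(self):
        if self.pos < len(self.chars):
            return self.chars[self.pos]
        # Out of characters: a script-created parser has to wait for an
        # explicit EOF; otherwise the lack of characters is itself EOF.
        return None if self.script_created else EOF

    def consume(self):
        ch = self.next_input_character()
        if ch is None:
            return None  # nothing to consume yet
        self.current_input_character = ch
        if self.pos < len(self.chars):
            self.pos += 1
        return ch
```

  </div>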


  <h5 id="changing-the-encoding-while-parsing"><span class="secno">8.2.2.4 </span>Changing the encoding while parsing</h5>

  <p>When the parser requires the user agent to <dfn id="change-the-encoding">change the
  encoding</dfn>, it must run the following steps. This might happen
  if the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> described above
  failed to find an encoding, or if it found an encoding that was not
  the actual encoding of the file.</p>

  <ol><li>If the new encoding is identical or equivalent to the encoding
   that is already being used to interpret the input stream, then set
   the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
   <i>certain</i> and abort these steps. This happens when the
   encoding information found in the file matches what the
   <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> determined to be the
   encoding, and in the second pass through the parser if the first
   pass found that the encoding sniffing algorithm described in the
   earlier section failed to find the right encoding.</li>

   <li>If the encoding that is already being used to interpret the
   input stream is a UTF-16 encoding, then set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
   <i>certain</i> and abort these steps. The new encoding is ignored;
   if it was anything but the same encoding, then it would be clearly
   incorrect.</li>

   <li>If the new encoding is a UTF-16 encoding, change it to
   UTF-8.</li>

   <li>If all the bytes up to the last byte converted by the current
   decoder have the same Unicode interpretations in both the current
   encoding and the new encoding, and if the user agent supports
   changing the converter on the fly, then the user agent may change
   to the new converter for the encoding on the fly. Set the
   <a href="dom.html#document-s-character-encoding">document's character encoding</a> and the encoding used to
   convert the input stream to the new encoding, set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
   <i>certain</i>, and abort these steps.</li>

   <li>Otherwise, <a href="history.html#navigate">navigate</a> to the
   document again, with <a href="history.html#replacement-enabled">replacement enabled</a>, and using
   the same <a href="history.html#source-browsing-context">source browsing context</a>, but this time skip
   the <a href="#encoding-sniffing-algorithm">encoding sniffing algorithm</a> and instead just set
   the encoding to the new encoding and the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
   <i>certain</i>. Whenever possible, this should be done without
   actually contacting the network layer (the bytes should be
   re-parsed from memory), even if, e.g., the document is marked as
   not being cacheable. If this is not possible and contacting the
   network layer would involve repeating a request that uses a method
   other than HTTP GET (<a href="fetching-resources.html#concept-http-equivalent-get" title="concept-http-equivalent-get">or
   equivalent</a> for non-HTTP URLs), then instead set the <a href="#concept-encoding-confidence" title="concept-encoding-confidence">confidence</a> to
   <i>certain</i> and ignore the new encoding. The resource will be
   misinterpreted. User agents may notify the user of the situation,
   to aid in application development.</li>

  </ol></div><div class="impl">
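
  <div class="example"><p>The steps for <a href="#change-the-encoding">changing the encoding</a> can be sketched as a decision function. This is a non-normative Python illustration with hypothetical names and return values. It assumes that Python codec names stand in for encodings, that two labels are "identical or equivalent" when <code title="">codecs.lookup()</code> resolves them to the same codec, and that the optional on-the-fly switch is always taken when it is allowed. In every outcome the confidence becomes <i>certain</i>.</p>

```python
import codecs


def change_the_encoding(current, new, bytes_so_far, *,
                        on_the_fly_supported=True,
                        can_reparse_from_memory=True,
                        request_is_get=True):
    """Return (action, encoding); action is 'keep', 'switch' or 'renavigate'."""
    cur = codecs.lookup(current).name
    new = codecs.lookup(new).name

    # Step 1: same (or equivalent) encoding, nothing to change.
    if new == cur:
        return ("keep", cur)
    # Step 2: already decoding as UTF-16, so the new encoding is ignored.
    if cur.startswith("utf-16"):
        return ("keep", cur)
    # Step 3: a UTF-16 label found this way is treated as UTF-8.
    if new.startswith("utf-16"):
        new = "utf-8"
    # Step 4: switch converters on the fly if the bytes decoded so far
    # read the same in both encodings.
    same = (bytes_so_far.decode(cur, "replace")
            == bytes_so_far.decode(new, "replace"))
    if same and on_the_fly_supported:
        return ("switch", new)
    # Step 5: otherwise re-parse, unless that would mean repeating a
    # non-GET request over the network.
    if not can_reparse_from_memory and not request_is_get:
        return ("keep", cur)  # the resource will be misinterpreted
    return ("renavigate", new)
```

  </div>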

  <h4 id="parse-state"><span class="secno">8.2.3 </span>Parse state</h4>

  <h5 id="the-insertion-mode"><span class="secno">8.2.3.1 </span>The insertion mode</h5>

  <p>The <dfn id="insertion-mode">insertion mode</dfn> is a state variable that controls
  the primary operation of the tree construction stage.</p>

  <p>Initially, the <a href="#insertion-mode">insertion mode</a> is "<a href="tree-construction.html#the-initial-insertion-mode" title="insertion mode: initial">initial</a>". It can change to
  "<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before html">before html</a>",
  "<a href="tree-construction.html#the-before-head-insertion-mode" title="insertion mode: before head">before head</a>",
  "<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in head">in head</a>", "<a href="tree-construction.html#parsing-main-inheadnoscript" title="insertion mode: in head noscript">in head noscript</a>",
  "<a href="tree-construction.html#the-after-head-insertion-mode" title="insertion mode: after head">after head</a>", "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>", "<a href="tree-construction.html#parsing-main-incdata" title="insertion mode: text">text</a>", "<a href="tree-construction.html#parsing-main-intable" title="insertion
  mode: in table">in table</a>", "<a href="tree-construction.html#parsing-main-intabletext" title="insertion mode: in
  table text">in table text</a>", "<a href="tree-construction.html#parsing-main-incaption" title="insertion mode: in
  caption">in caption</a>", "<a href="tree-construction.html#parsing-main-incolgroup" title="insertion mode: in column
  group">in column group</a>", "<a href="tree-construction.html#parsing-main-intbody" title="insertion mode: in
  table body">in table body</a>", "<a href="tree-construction.html#parsing-main-intr" title="insertion mode: in
  row">in row</a>", "<a href="tree-construction.html#parsing-main-intd" title="insertion mode: in cell">in
  cell</a>", "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in
  select</a>", "<a href="tree-construction.html#parsing-main-inselectintable" title="insertion mode: in select in table">in
  select in table</a>", "<a href="tree-construction.html#parsing-main-afterbody" title="insertion mode: after
  body">after body</a>", "<a href="tree-construction.html#parsing-main-inframeset" title="insertion mode: in
  frameset">in frameset</a>", "<a href="tree-construction.html#parsing-main-afterframeset" title="insertion mode: after
  frameset">after frameset</a>", "<a href="tree-construction.html#the-after-after-body-insertion-mode" title="insertion mode:
  after after body">after after body</a>", and "<a href="tree-construction.html#the-after-after-frameset-insertion-mode" title="insertion mode: after after frameset">after after
  frameset</a>" during the course of the parsing, as described in
  the <a href="tree-construction.html#tree-construction">tree construction</a> stage. The insertion mode affects
  how tokens are processed and whether CDATA sections are
  supported.</p>

  <p>Several of these modes, namely "<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in
  head">in head</a>", "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in
  body</a>", "<a href="tree-construction.html#parsing-main-intable" title="insertion mode: in table">in
  table</a>", and "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in
  select</a>", are special, in that the other modes defer to them
  at various times. When the algorithm below says that the user agent
  is to do something "<dfn id="using-the-rules-for">using the rules for</dfn> the <var title="">m</var> insertion mode", where <var title="">m</var> is one
  of these modes, the user agent must use the rules described under
  the <var title="">m</var> <a href="#insertion-mode">insertion mode</a>'s section, but
  must leave the <a href="#insertion-mode">insertion mode</a> unchanged unless the
  rules in <var title="">m</var> themselves switch the <a href="#insertion-mode">insertion
  mode</a> to a new value.</p>
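
  <div class="example"><p>The distinction between <a href="#using-the-rules-for">using the rules for</a> a mode and switching to it can be sketched as follows. This is a non-normative Python illustration: the class, its handlers, and the token values are hypothetical, and the placeholder "in table" handler defers every token, which the real rules do not.</p>

```python
class Parser:
    """Hypothetical tree-construction dispatcher."""

    def __init__(self, mode="in table"):
        self.insertion_mode = mode
        self.log = []

    def process(self, token):
        # Normal dispatch: run the rules of the current insertion mode.
        self.rules[self.insertion_mode](self, token)

    def using_the_rules_for(self, mode, token):
        # Borrow another mode's rules WITHOUT assigning insertion_mode.
        # The mode changes only if the borrowed rules change it themselves.
        self.rules[mode](self, token)

    def in_body(self, token):
        self.log.append(("in body", token))

    def in_table(self, token):
        # Placeholder: defer everything to the "in body" rules.
        self.using_the_rules_for("in body", token)

    rules = {"in body": in_body, "in table": in_table}
```

  </div>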

  <p>When the insertion mode is switched to "<a href="tree-construction.html#parsing-main-incdata" title="insertion
  mode: text">text</a>" or "<a href="tree-construction.html#parsing-main-intabletext" title="insertion mode: in table
  text">in table text</a>", the <dfn id="original-insertion-mode">original insertion mode</dfn>
  is also set. This is the insertion mode to which the tree
  construction stage will return.</p>
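
  <div class="example"><p>A non-normative Python sketch of how the <a href="#original-insertion-mode">original insertion mode</a> is saved and later restored. The class and method names are hypothetical.</p>

```python
class InsertionModeState:
    """Hypothetical holder for the two insertion mode variables."""

    # Modes whose entry also records the mode to return to.
    SAVING_MODES = ("text", "in table text")

    def __init__(self, mode="initial"):
        self.insertion_mode = mode
        self.original_insertion_mode = None

    def switch_to(self, mode):
        if mode in self.SAVING_MODES:
            self.original_insertion_mode = self.insertion_mode
        self.insertion_mode = mode

    def return_to_original(self):
        self.insertion_mode = self.original_insertion_mode
```

  </div>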

  <hr><p>When the steps below require the UA to <dfn id="reset-the-insertion-mode-appropriately">reset the insertion
  mode appropriately</dfn>, it means the UA must follow these
  steps:</p>

  <ol><li>Let <var title="">last</var> be false.</li>

   <li>Let <var title="">node</var> be the last node in the
   <a href="#stack-of-open-elements">stack of open elements</a>.</li>

   <li><i>Loop</i>: If <var title="">node</var> is the first node in
   the stack of open elements, then set <var title="">last</var> to
   true and set <var title="">node</var> to the <var title="concept-frag-parse-context"><a href="the-end.html#concept-frag-parse-context">context</a></var> element.
   (<a href="the-end.html#fragment-case">fragment case</a>)</li>

   <li>If <var title="">node</var> is a <code><a href="the-button-element.html#the-select-element">select</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inselect" title="insertion mode: in select">in select</a>" and abort these
   steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-td-element">td</a></code> or
   <code><a href="tabular-data.html#the-th-element">th</a></code> element and <var title="">last</var> is false, then
   switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intd" title="insertion
   mode: in cell">in cell</a>" and abort these steps.</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-tr-element">tr</a></code> element, then
   switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intr" title="insertion
   mode: in row">in row</a>" and abort these steps.</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-tbody-element">tbody</a></code>,
   <code><a href="tabular-data.html#the-thead-element">thead</a></code>, or <code><a href="tabular-data.html#the-tfoot-element">tfoot</a></code> element, then switch the
   <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intbody" title="insertion mode: in
   table body">in table body</a>" and abort these steps.</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-caption-element">caption</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-incaption" title="insertion mode: in caption">in caption</a>" and abort
   these steps.</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-colgroup-element">colgroup</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-incolgroup" title="insertion mode: in column group">in column group</a>" and
   abort these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>

   <li>If <var title="">node</var> is a <code><a href="tabular-data.html#the-table-element">table</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-intable" title="insertion mode: in table">in table</a>" and abort these
   steps.</li>

   <li>If <var title="">node</var> is a <code><a href="semantics.html#the-head-element">head</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>" ("<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>"! <em> not "<a href="tree-construction.html#parsing-main-inhead" title="insertion mode: in head">in head</a>"</em>!) and abort
   these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li> 
   <li>If <var title="">node</var> is a <code><a href="sections.html#the-body-element">body</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in body">in body</a>" and abort these
   steps.</li>

   <li>If <var title="">node</var> is a <code><a href="obsolete.html#frameset">frameset</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inframeset" title="insertion mode: in frameset">in frameset</a>" and abort
   these steps. (<a href="the-end.html#fragment-case">fragment case</a>)</li>

   <li>If <var title="">node</var> is an <code><a href="semantics.html#the-html-element">html</a></code> element,
   then switch the <a href="#insertion-mode">insertion mode</a>
   to "<a href="tree-construction.html#the-before-head-insertion-mode" title="insertion mode: before head">before
   head</a>". Then, abort these steps. (<a href="the-end.html#fragment-case">fragment
   case</a>)</li>
   <li>If <var title="">last</var> is true, then switch the
   <a href="#insertion-mode">insertion mode</a> to "<a href="tree-construction.html#parsing-main-inbody" title="insertion mode: in
   body">in body</a>" and abort these steps. (<a href="the-end.html#fragment-case">fragment
   case</a>)</li>

   <li>Let <var title="">node</var> now be the node before <var title="">node</var> in the <a href="#stack-of-open-elements">stack of open
   elements</a>.</li>

   <li>Return to the step labeled <i>loop</i>.</li>

  </ol><h5 id="the-stack-of-open-elements"><span class="secno">8.2.3.2 </span>The stack of open elements</h5>
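
  <div class="example"><p>The steps to <a href="#reset-the-insertion-mode-appropriately">reset the insertion mode appropriately</a>, given in the preceding section, walk the stack described in this section from the most recently added node back to the first. A non-normative Python sketch follows. It assumes the stack is a list of tag-name strings with the first node at index 0, and that <code title="">context</code> is the tag name of the context element or <code title="">None</code> outside the fragment case. All names are hypothetical.</p>

```python
# Element name -> insertion mode, for the steps with no extra condition.
SIMPLE_MODES = {
    "select": "in select",
    "tr": "in row",
    "tbody": "in table body",
    "thead": "in table body",
    "tfoot": "in table body",
    "caption": "in caption",
    "colgroup": "in column group",
    "table": "in table",
    "head": "in body",  # "in body", not "in head"
    "body": "in body",
    "frameset": "in frameset",
    "html": "before head",
}


def reset_insertion_mode(stack, context=None):
    """Return the insertion mode chosen by the reset steps."""
    last = False
    for index in range(len(stack) - 1, -1, -1):
        node = stack[index]
        if index == 0:
            last = True
            if context is not None:  # fragment case
                node = context
        if node in ("td", "th"):
            if not last:
                return "in cell"
        elif node in SIMPLE_MODES:
            return SIMPLE_MODES[node]
        if last:
            return "in body"
    raise ValueError("stack of open elements is empty")
```

  </div>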

  <p>Initially, the <dfn id="stack-of-open-elements">stack of open elements</dfn> is empty. The
  stack grows downwards; the topmost node on the stack is the first
  one added to the stack, and the bottommost node of the stack is the
  most recently added node in the stack (except when the
  stack is manipulated in a random-access fashion as part of <a href="tree-construction.html#adoptionAgency">the handling for misnested tags</a>).</p>

  <p>The "<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before html">before
  html</a>" <a href="#insertion-mode">insertion mode</a> creates the
  <code><a href="semantics.html#the-html-element">html</a></code> root element node, which is then added to the
  stack.</p>

  <p>In the <a href="the-end.html#fragment-case">fragment case</a>, the <a href="#stack-of-open-elements">stack of open
  elements</a> is initialized to contain an <code><a href="semantics.html#the-html-element">html</a></code>
  element that is created as part of <a href="the-end.html#html-fragment-parsing-algorithm" title="html fragment
  parsing algorithm">that algorithm</a>. (The <a href="the-end.html#fragment-case">fragment
  case</a> skips the "<a href="tree-construction.html#the-before-html-insertion-mode" title="insertion mode: before
  html">before html</a>" <a href="#insertion-mode">insertion mode</a>.)</p>

  <p>The <code><a href="semantics.html#the-html-element">html</a></code> node, however it is created, is the topmost
  node of the stack. It only gets popped off the stack when the parser
  <a href="the-end.html#stop-parsing" title="stop parsing">finishes</a>.</p>

  <p>The <dfn id="current-node">current node</dfn> is the bottommost node in this
  stack.</p>

  <p>The <dfn id="current-table">current table</dfn> is the last <code><a href="tabular-data.html#the-table-element">table</a></code>
  element in the <a href="#stack-of-open-elements">stack of open elements</a>, if there is
  one. If there is no <code><a href="tabular-data.html#the-table-element">table</a></code> element in the <a href="#stack-of-open-elements">stack of
  open elements</a> (<a href="the-end.html#fragment-case">fragment case</a>), then the
  <a href="#current-table">current table</a> is the first element in the <a href="#stack-of-open-elements">stack
  of open elements</a> (the <code><a href="semantics.html#the-html-element">html</a></code> element).</p>
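
  <div class="example"><p>Because the stack grows downwards, a list whose index 0 holds the topmost node and whose last index holds the bottommost node is a convenient model. The following non-normative Python sketch uses tag-name strings as stand-ins for element nodes. The helper names are hypothetical.</p>

```python
def current_node(stack):
    """The bottommost node: the most recently added one."""
    return stack[-1]


def current_table_index(stack):
    """Index of the current table in the stack of open elements.

    This is the last `table` element if there is one; otherwise it is the
    first element in the stack (the html element, fragment case).
    """
    for index in range(len(stack) - 1, -1, -1):
        if stack[index] == "table":
            return index
    return 0
```

  </div>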

  <p>Elements in the stack fall into the following categories:</p>

  <dl><dt><dfn id="special">Special</dfn></dt>
   <dd><p>The following elements have varying levels of special
   parsing rules: HTML's <code><a href="sections.html#the-address-element">address</a></code>, <code><a href="obsolete.html#the-applet-element">applet</a></code>,
   <code><a href="the-map-element.html#the-area-element">area</a></code>, <code><a href="sections.html#the-article-element">article</a></code>, <code><a href="sections.html#the-aside-element">aside</a></code>,
   <code><a href="semantics.html#the-base-element">base</a></code>, <code><a href="obsolete.html#basefont">basefont</a></code>, <code><a href="obsolete.html#bgsound">bgsound</a></code>,
   <code><a href="grouping-content.html#the-blockquote-element">blockquote</a></code>, <code><a href="sections.html#the-body-element">body</a></code>, <code><a href="text-level-semantics.html#the-br-element">br</a></code>,
   <code><a href="the-button-element.html#the-button-element">button</a></code>, <code><a href="tabular-data.html#the-caption-element">caption</a></code>, <code><a href="obsolete.html#center">center</a></code>,
   <code><a href="tabular-data.html#the-col-element">col</a></code>, <code><a href="tabular-data.html#the-colgroup-element">colgroup</a></code>, <code><a href="interactive-elements.html#the-command-element">command</a></code>,
   <code><a href="grouping-content.html#the-dd-element">dd</a></code>, <code><a href="interactive-elements.html#the-details-element">details</a></code>, <code><a href="obsolete.html#dir">dir</a></code>,
   <code><a href="grouping-content.html#the-div-element">div</a></code>, <code><a href="grouping-content.html#the-dl-element">dl</a></code>, <code><a href="grouping-content.html#the-dt-element">dt</a></code>,
   <code><a href="the-iframe-element.html#the-embed-element">embed</a></code>, <code><a href="forms.html#the-fieldset-element">fieldset</a></code>, <code><a href="grouping-content.html#the-figcaption-element">figcaption</a></code>,
   <code><a href="grouping-content.html#the-figure-element">figure</a></code>, <code><a href="sections.html#the-footer-element">footer</a></code>, <code><a href="forms.html#the-form-element">form</a></code>,
   <code><a href="obsolete.html#frame">frame</a></code>, <code><a href="obsolete.html#frameset">frameset</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h1</a></code>,
   <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h2</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h3</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h4</a></code>, <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h5</a></code>,
   <code><a href="sections.html#the-h1-h2-h3-h4-h5-and-h6-elements">h6</a></code>, <code><a href="semantics.html#the-head-element">head</a></code>, <code><a href="sections.html#the-header-element">header</a></code>,
   <code><a href="sections.html#the-hgroup-element">hgroup</a></code>, <code><a href="grouping-content.html#the-hr-element">hr</a></code>, <code><a href="semantics.html#the-html-element">html</a></code>,
   <code><a href="the-iframe-element.html#the-iframe-element">iframe</a></code>,  <code><a href="embedded-content-1.html#the-img-element">img</a></code>, <code><a href="the-input-element.html#the-input-element">input</a></code>,
   <code><a href="obsolete.html#isindex-0">isindex</a></code>, <code><a href="grouping-content.html#the-li-element">li</a></code>, <code><a href="semantics.html#the-link-element">link</a></code>,
   <code><a href="obsolete.html#listing">listing</a></code>, <code><a href="obsolete.html#the-marquee-element">marquee</a></code>, <code><a href="interactive-elements.html#the-menu-element">menu</a></code>,
   <code><a href="semantics.html#the-meta-element">meta</a></code>, <code><a href="sections.html#the-nav-element">nav</a></code>, <code><a href="obsolete.html#noembed">noembed</a></code>,
   <code><a href="obsolete.html#noframes">noframes</a></code>, <code><a href="scripting-1.html#the-noscript-element">noscript</a></code>, <code><a href="the-iframe-element.html#the-object-element">object</a></code>,
   <code><a href="grouping-content.html#the-ol-element">ol</a></code>, <code><a href="grouping-content.html#the-p-element">p</a></code>, <code><a href="the-iframe-element.html#the-param-element">param</a></code>,
   <code><a href="obsolete.html#plaintext">plaintext</a></code>, <code><a href="grouping-content.html#the-pre-element">pre</a></code>, <code><a href="scripting-1.html#the-script-element">script</a></code>,
   <code><a href="sections.html#the-section-element">section</a></code>, <code><a href="the-button-element.html#the-select-element">select</a></code>, <code><a href="semantics.html#the-style-element">style</a></code>,
   <code><a href="interactive-elements.html#the-summary-element">summary</a></code>, <code><a href="tabular-data.html#the-table-element">table</a></code>, <code><a href="tabular-data.html#the-tbody-element">tbody</a></code>,
   <code><a href="tabular-data.html#the-td-element">td</a></code>, <code><a href="the-button-element.html#the-textarea-element">textarea</a></code>, <code><a href="tabular-data.html#the-tfoot-element">tfoot</a></code>,
   <code><a href="tabular-data.html#the-th-element">th</a></code>, <code><a href="tabular-data.html#the-thead-element">thead</a></code>, <code><a href="semantics.html#the-title-element">title</a></code>,
   <code><a href="tabular-data.html#the-tr-element">tr</a></code>, <code><a href="grouping-content.html#the-ul-element">ul</a></code>, <code><a href="text-level-semantics.html#the-wbr-element">wbr</a></code>, and
   <code><a href="obsolete.html#xmp">xmp</a></code>; MathML's <code title="">mi</code>, <code title="">mo</code>, <code title="">mn</code>, <code title="">ms</code>, <code title="">mtext</code>, and <code title="">annotation-xml</code>; and SVG's <code title="">foreignObject</code>, <code title="">desc</code>, and
   <code title="">title</code>.</p></dd> 
   <dt><dfn id="formatting">Formatting</dfn></dt>
   <dd><p>The following HTML elements are those that end up in the
   <a href="#list-of-active-formatting-elements">list of active formatting elements</a>: <code><a href="text-level-semantics.html#the-a-element">a</a></code>,
   <code><a href="text-level-semantics.html#the-b-element">b</a></code>, <code><a href="obsolete.html#big">big</a></code>, <code><a href="text-level-semantics.html#the-code-element">code</a></code>,
   <code><a href="text-level-semantics.html#the-em-element">em</a></code>, <code><a href="obsolete.html#font">font</a></code>, <code><a href="text-level-semantics.html#the-i-element">i</a></code>,
   <code><a href="obsolete.html#nobr">nobr</a></code>, <code><a href="text-level-semantics.html#the-s-element">s</a></code>, <code><a href="text-level-semantics.html#the-small-element">small</a></code>,
   <code><a href="obsolete.html#strike">strike</a></code>, <code><a href="text-level-semantics.html#the-strong-element">strong</a></code>, <code><a href="obsolete.html#tt">tt</a></code>, and
   <code><a href="text-level-semantics.html#the-u-element">u</a></code>.</p></dd>

   <dt><dfn id="ordinary">Ordinary</dfn></dt>
   <dd><p>All other elements found while parsing an HTML
   document.</p></dd>

  </dl><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-the-specific-scope" title="has an element in the specific scope">have an element in a
  specific scope</dfn> consisting of a list of element types <var title="">list</var> when the following algorithm terminates in a
  match state:</p>

  <ol><li><p>Initialize <var title="">node</var> to be the <a href="#current-node">current
   node</a> (the bottommost node of the stack).</p></li>

   <li><p>If <var title="">node</var> is the target node, terminate in
   a match state.</p></li>

   <li><p>Otherwise, if <var title="">node</var> is one of the element
   types in <var title="">list</var>, terminate in a failure
   state.</p></li>

   <li><p>Otherwise, set <var title="">node</var> to the previous
   entry in the <a href="#stack-of-open-elements">stack of open elements</a> and return to step
   2. (This will never fail, since the loop will always terminate in
   the previous step if the top of the stack &#8212; an
   <code><a href="semantics.html#the-html-element">html</a></code> element &#8212; is reached.)</p></li>

  </ol><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-scope" title="has an element in scope">have an element in scope</dfn> when
  it <a href="#has-an-element-in-the-specific-scope">has an element in the specific scope</a> consisting
  of the following element types:</p>

  <ul class="brief"><li><code><a href="obsolete.html#the-applet-element">applet</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="tabular-data.html#the-caption-element">caption</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="semantics.html#the-html-element">html</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li> 
   <li><code><a href="tabular-data.html#the-table-element">table</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="tabular-data.html#the-td-element">td</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="tabular-data.html#the-th-element">th</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="obsolete.html#the-marquee-element">marquee</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="the-iframe-element.html#the-object-element">object</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code title="">mi</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">mo</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">mn</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">ms</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">mtext</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">annotation-xml</code> in the <a href="namespaces.html#mathml-namespace">MathML namespace</a></li>
   <li><code title="">foreignObject</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
   <li><code title="">desc</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
   <li><code title="">title</code> in the <a href="namespaces.html#svg-namespace">SVG namespace</a></li>
  </ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-list-item-scope" title="has an element in list item scope">have an element in list
  item scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
  scope</a> consisting of the following element types:</p>

  <ul class="brief"><li>All the element types listed above for the <i><a href="#has-an-element-in-scope">has an element
   in scope</a></i> algorithm.</li>
   <li><code><a href="grouping-content.html#the-ol-element">ol</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="grouping-content.html#the-ul-element">ul</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
  </ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-button-scope" title="has an element in button scope">have an element in button
  scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
  scope</a> consisting of the following element types:</p>

  <ul class="brief"><li>All the element types listed above for the <i><a href="#has-an-element-in-scope">has an element
   in scope</a></i> algorithm.</li>
   <li><code><a href="the-button-element.html#the-button-element">button</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
  </ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-table-scope" title="has an element in table scope">have an element in table
  scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
  scope</a> consisting of the following element types:</p>

  <ul class="brief"><li><code><a href="semantics.html#the-html-element">html</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li> 
   <li><code><a href="tabular-data.html#the-table-element">table</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
  </ul><p>The <a href="#stack-of-open-elements">stack of open elements</a> is said to <dfn id="has-an-element-in-select-scope" title="has an element in select scope">have an element in select
  scope</dfn> when it <a href="#has-an-element-in-the-specific-scope">has an element in the specific
  scope</a> consisting of all element types <em>except</em> the
  following:</p>

  <ul class="brief"><li><code><a href="the-button-element.html#the-optgroup-element">optgroup</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
   <li><code><a href="the-button-element.html#the-option-element">option</a></code> in the <a href="namespaces.html#html-namespace-0">HTML namespace</a></li>
  </ul><p>Nothing happens if at any time any of the elements in the
  <a href="#stack-of-open-elements">stack of open elements</a> are moved to a new location in,
  or removed from, the <code><a href="infrastructure.html#document">Document</a></code> tree. In particular, the
  stack is not changed in this situation. This can cause, amongst
  other strange effects, content to be appended to nodes that are no
  longer in the DOM.</p>

  <p class="note">In some cases (namely, when <a href="tree-construction.html#adoptionAgency">closing misnested formatting elements</a>),
  the stack is manipulated in a random-access fashion.</p>
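
  <p class="note">The scope definitions above can be illustrated with a minimal, non-normative sketch. It assumes the stack is modelled as a list of (namespace, local name) pairs with the current node last; the names <code title="">in_scope</code>, <code title="">DEFAULT_SCOPE</code> and the helper functions are hypothetical, and the first four HTML names in <code title="">DEFAULT_SCOPE</code> come from the earlier part of the <i>has an element in scope</i> list.</p>

```python
HTML, MATHML, SVG = "html", "mathml", "svg"

# Element types that bound the general scope, as (namespace, local name).
DEFAULT_SCOPE = frozenset(
    [(HTML, n) for n in ("applet", "caption", "html", "table",
                         "td", "th", "marquee", "object")]
    + [(MATHML, n) for n in ("mi", "mo", "mn", "ms", "mtext", "annotation-xml")]
    + [(SVG, n) for n in ("foreignObject", "desc", "title")]
)
LIST_ITEM_SCOPE = DEFAULT_SCOPE | {(HTML, "ol"), (HTML, "ul")}
BUTTON_SCOPE = DEFAULT_SCOPE | {(HTML, "button")}
TABLE_SCOPE = frozenset({(HTML, "html"), (HTML, "table")})
# Select scope is defined by exclusion: everything EXCEPT these is a boundary.
SELECT_SCOPE_EXCEPTIONS = frozenset({(HTML, "optgroup"), (HTML, "option")})


def in_scope(stack, target, is_boundary):
    """Walk from the current node (last item) towards the root.

    Returns True if an HTML element named `target` is reached before
    any element type for which `is_boundary` is true.
    """
    for node in reversed(stack):
        if node == (HTML, target):
            return True
        if is_boundary(node):
            return False
    return False  # guard for a stack that holds no boundary element


def has_element_in_scope(stack, target):
    return in_scope(stack, target, lambda n: n in DEFAULT_SCOPE)


def has_element_in_list_item_scope(stack, target):
    return in_scope(stack, target, lambda n: n in LIST_ITEM_SCOPE)


def has_element_in_button_scope(stack, target):
    return in_scope(stack, target, lambda n: n in BUTTON_SCOPE)


def has_element_in_table_scope(stack, target):
    return in_scope(stack, target, lambda n: n in TABLE_SCOPE)


def has_element_in_select_scope(stack, target):
    return in_scope(stack, target, lambda n: n not in SELECT_SCOPE_EXCEPTIONS)
```

  <p class="note">For a stack of <code title="">html</code>, <code title="">body</code>, <code title="">p</code>, <code title="">button</code>, <code title="">span</code>, this sketch reports that a <code title="">p</code> element is in scope but not in button scope, because the <code title="">button</code> element is only a boundary for the latter.</p>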


  <h5 id="the-list-of-active-formatting-elements"><span class="secno">8.2.3.3 </span>The list of active formatting elements</h5>

  <p>Initially, the <dfn id="list-of-active-formatting-elements">list of active formatting elements</dfn> is
  empty. It is used to handle mis-nested <a href="#formatting" title="formatting">formatting element tags</a>.</p>

  <p>The list contains elements in the <a href="#formatting">formatting</a>
  category, and scope markers. The scope markers are inserted when
  entering <code><a href="obsolete.html#the-applet-element">applet</a></code> elements, buttons, <code><a href="the-iframe-element.html#the-object-element">object</a></code>
  elements, marquees, table cells, and table captions, and are used to
  prevent formatting from "leaking" <em>into</em> <code><a href="obsolete.html#the-applet-element">applet</a></code>
  elements, buttons, <code><a href="the-iframe-element.html#the-object-element">object</a></code> elements, marquees, and
  tables.</p>

  <p class="note">The scope markers are unrelated to the concept of an
  element being <a href="#has-an-element-in-scope" title="has an element in scope">in
  scope</a>.</p>

  <p>In addition, each element in the <a href="#list-of-active-formatting-elements">list of active formatting
  elements</a> is associated with the token for which it was
  created, so that further elements can be created for that token if
  necessary.</p>
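
  <p class="note">As a non-normative illustration, the list can be modelled as a plain sequence whose entries are either markers or pairs of an element and the token it was created for. The sketch below also covers the push steps that follow, including the limit of three matching entries after the last marker. The names <code title="">Token</code>, <code title="">Entry</code>, <code title="">Marker</code> and <code title="">push_formatting</code> are hypothetical, and attributes are assumed to be stored as (name, namespace, value) triples.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    name: str
    namespace: str = "html"
    attrs: tuple = ()  # ((name, namespace, value), ...) as parsed


class Marker:
    """A scope marker entry in the list of active formatting elements."""


@dataclass(eq=False)  # entries are compared by identity, not by value
class Entry:
    element: object  # the DOM node; opaque in this sketch
    token: Token     # the token for which the element was created


def same_family(a, b):
    """Same tag name, namespace and attributes; attribute order is ignored."""
    return (a.name == b.name
            and a.namespace == b.namespace
            and set(a.attrs) == set(b.attrs))


def push_formatting(active, entry):
    """Push `entry`, keeping at most three matching entries after the last marker."""
    start = 0
    for i, existing in enumerate(active):
        if isinstance(existing, Marker):
            start = i + 1  # only entries after the last marker are considered
    matches = [i for i in range(start, len(active))
               if same_family(active[i].token, entry.token)]
    if len(matches) >= 3:
        del active[matches[0]]  # remove the earliest matching entry
    active.append(entry)
```

  <p class="note">Pushing four identical <code title="">b</code> entries into an empty list with this sketch leaves the three most recent ones; entries before a marker are not counted towards the limit.</p>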

  <p>When the steps below require the UA to <dfn id="push-onto-the-list-of-active-formatting-elements">push onto the list of
  active formatting elements</dfn> an element <var title="">element</var>, the UA must perform the following steps:</p>

  <ol><li><p>If there are already three elements in the <a href="#list-of-active-formatting-elements">list of
   active formatting elements</a> after the last marker, if
   any, or anywhere in the list if there are no markers, that
   have the same tag name, namespace, and attributes as <var title="">element</var>, then remove the earliest such element from
   the <a href="#list-of-active-formatting-elements">list of active formatting elements</a>. For these
   purposes, the attributes must be compared as they were when the
   elements were created by the parser; two elements have the same
   attributes if all their parsed attributes can be paired such that
   the two attributes in each pair have identical names, namespaces,
   and values (the order of the attributes does not matter).</p>

   <p class="note">This is the Noah's Ark clause, but with three per
   family instead of two.</p></li>
   <li><p>Add <var title="">element</var> to the <a href="#list-of-active-formatting-elements">list of active
   formatting elements</a>.</p></li>

  </ol><p>When the steps below require the UA to <dfn id="reconstruct-the-active-formatting-elements">reconstruct the
  active formatting elements</dfn>, the UA must perform the following
  steps:</p>

  <ol><li>If there are no entries in the <a href="#list-of-active-formatting-elements">list of active formatting
   elements</a>, then there is nothing to reconstruct; stop this
   algorithm.</li>

   <li>If the last (most recently added) entry in the <a href="#list-of-active-formatting-elements">list of
   active formatting elements</a> is a marker, or if it is an
   element that is in the <a href="#stack-of-open-elements">stack of open elements</a>, then
   there is nothing to reconstruct; stop this algorithm.</li>

   <li>Let <var title="">entry</var> be the last (most recently added)
   element in the <a href="#list-of-active-formatting-elements">list of active formatting
   elements</a>.</li>

   <li>If there are no entries before <var title="">entry</var> in the
   <a href="#list-of-active-formatting-elements">list of active formatting elements</a>, then jump to step
   8.</li>

   <li>Let <var title="">entry</var> be the entry one earlier than
   <var title="">entry</var> in the <a href="#list-of-active-formatting-elements">list of active formatting
   elements</a>.</li>

   <li>If <var title="">entry</var> is neither a marker nor an element
   that is also in the <a href="#stack-of-open-elements">stack of open elements</a>, go to step
   4.</li>

   <li>Let <var title="">entry</var> be the element one later than
   <var title="">entry</var> in the <a href="#list-of-active-formatting-elements">list of active formatting
   elements</a>.</li>

   <li><a href="tree-construction.html#create-an-element-for-the-token">Create an element for the token</a> for which the
   element <var title="">entry</var> was created, to obtain <var title="">new element</var>.</li>

   <li>Append <var title="">new element</var> to the <a href="#current-node">current
   node</a> and push it onto the <a href="#stack-of-open-elements">stack of open
   elements</a> so that it is the new <a href="#current-node">current
   node</a>.</li>

   <li>Replace the entry for <var title="">entry</var> in the list
   with an entry for <var title="">new element</var>.</li>

   <li>If the entry for <var title="">new element</var> in the
   <a href="#list-of-active-formatting-elements">list of active formatting elements</a> is not the last
   entry in the list, return to step 7.</li>

  </ol><p>This has the effect of reopening all the formatting elements that
  were opened in the current body, cell, or caption (whichever is
  youngest) that haven't been explicitly closed.</p>
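
  <p class="note">The reconstruction steps above can be sketched as follows. This is a non-normative illustration under simplifying assumptions: <code title="">Node</code>, <code title="">Entry</code> and <code title="">Marker</code> are hypothetical stand-ins, a token is reduced to its tag name, and "create an element for the token" is reduced to constructing a new <code title="">Node</code>.</p>

```python
class Marker:
    """A scope marker entry."""


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


class Entry:
    def __init__(self, element, token):
        self.element = element
        self.token = token  # simplified: just the tag name


def reconstruct(active, open_elements):
    """active: oldest entry first. open_elements: current node last."""
    def settled(entry):
        # A marker, or an element that is still on the stack of open elements.
        return isinstance(entry, Marker) or any(
            entry.element is node for node in open_elements)

    # Steps 1 and 2: nothing to reconstruct.
    if not active or settled(active[-1]):
        return
    # Step 3: start at the most recently added entry.
    i = len(active) - 1
    # Steps 4 to 6: rewind until a marker or an open element is found.
    while i > 0:
        i -= 1
        if settled(active[i]):
            i += 1  # step 7: advance to the first entry needing reconstruction
            break
    # Steps 8 to 11: recreate each remaining entry in order.
    while i != len(active):
        new_element = Node(active[i].token)              # step 8
        open_elements[-1].children.append(new_element)   # step 9
        open_elements.append(new_element)
        active[i] = Entry(new_element, active[i].token)  # step 10
        i += 1                                           # step 11, then step 7
```

  <p class="note">With a list holding <code title="">b</code> and <code title="">i</code> entries whose elements are no longer open, and a stack of <code title="">html</code>, <code title="">body</code>, <code title="">p</code>, the sketch appends a new <code title="">b</code> to the <code title="">p</code> element and a new <code title="">i</code> to that <code title="">b</code>, leaving both on the stack.</p>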

  <p class="note">The way this specification is written, the
  <a href="#list-of-active-formatting-elements">list of active formatting elements</a> always consists of
  elements in chronological order with the least recently added
  element first and the most recently added element last (except
  while steps 8 to 11 of the above algorithm are being executed, of
  course).</p>
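
  <p class="note">The clearing procedure defined next amounts to popping entries from the end of the list until a marker has been popped. A minimal, non-normative sketch, assuming a hypothetical <code title="">Marker</code> class and a plain list; the guard against an empty list is an addition of this sketch and is not part of the steps themselves:</p>

```python
class Marker:
    """A scope marker entry."""


def clear_to_last_marker(active):
    """Remove entries from the end of `active` up to and including the last marker."""
    while active:  # assumption: stop quietly if the list holds no marker
        entry = active.pop()
        if isinstance(entry, Marker):
            break
```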

  <p>When the steps below require the UA to <dfn id="clear-the-list-of-active-formatting-elements-up-to-the-last-marker">clear the list of
  active formatting elements up to the last marker</dfn>, the UA must
  perform the following steps:</p>

  <ol><li>Let <var title="">entry</var> be the last (most recently added)
   entry in the <a href="#list-of-active-formatting-elements">list of active formatting elements</a>.</li>

   <li>Remove <var title="">entry</var> from the <a href="#list-of-active-formatting-elements">list of active
   formatting elements</a>.</li>

   <li>If <var title="">entry</var> was a marker, then stop the
   algorithm at this point. The list has been cleared up to the last
   marker.</li>

   <li>Go to step 1.</li>

  </ol><h5 id="the-element-pointers"><span class="secno">8.2.3.4 </span>The element pointers</h5>

  <p>Initially, the <dfn id="head-element-pointer"><code title="">head</code> element
  pointer</dfn> and the <dfn id="form-element-pointer"><code title="">form</code> element
  pointer</dfn> are both null.</p>

  <p>Once a <code><a href="semantics.html#the-head-element">head</a></code> element has been parsed (whether
  implicitly or explicitly), the <a href="#head-element-pointer"><code title="">head</code>
  element pointer</a> is set to point to this node.</p>

  <p>The <a href="#form-element-pointer"><code title="">form</code> element pointer</a>
  points to the last <code><a href="forms.html#the-form-element">form</a></code> element that was opened and
  whose end tag has not yet been seen. It is used to make form
  controls associate with forms in the face of dramatically bad
  markup, for historical reasons.</p>
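
  <p class="note">A non-normative sketch of the two pointers as parser state. The class and method names are hypothetical, and <code title="">form_owner_for_new_control</code> is a deliberate simplification of how a newly created form control would pick up its form from the pointer.</p>

```python
class ElementPointers:
    def __init__(self):
        # Both pointers are initially null.
        self.head = None
        self.form = None

    def head_parsed(self, node):
        # Called once a head element has been parsed, implicitly or explicitly.
        self.head = node

    def form_opened(self, node):
        # The most recently opened form whose end tag has not been seen.
        self.form = node

    def form_end_tag_seen(self):
        self.form = None

    def form_owner_for_new_control(self):
        # Simplified: a control is associated with whatever form is current,
        # regardless of where the control ends up in the tree.
        return self.form
```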


  <h5 id="other-parsing-state-flags"><span class="secno">8.2.3.5 </span>Other parsing state flags</h5>

  <p>The <dfn id="scripting-flag">scripting flag</dfn> is set to "enabled" if <a href="webappapis.html#concept-n-script" title="concept-n-script">scripting was enabled</a> for the
  <code><a href="infrastructure.html#document">Document</a></code> with which the parser is associated when the
  parser was created, and "disabled" otherwise.</p>

  <p class="note">The <a href="#scripting-flag">scripting flag</a> can be enabled even
  when the parser was originally created for the <a href="the-end.html#html-fragment-parsing-algorithm">HTML fragment
  parsing algorithm</a>, even though <code><a href="scripting-1.html#the-script-element">script</a></code> elements
  don't execute in that case.</p>

  <p>The <dfn id="frameset-ok-flag">frameset-ok flag</dfn> is set to "ok" when the parser is
  created. It is set to "not ok" after certain tokens are seen.</p>
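
  <p class="note">The two flags can be pictured as simple state fixed or initialised when the parser is created. In this non-normative sketch, <code title="">ParserFlags</code> and <code title="">create_flags</code> are hypothetical names, and the caller is assumed to know whether scripting was enabled for the associated <code title="">Document</code>.</p>

```python
from dataclasses import dataclass


@dataclass
class ParserFlags:
    scripting: str            # "enabled" or "disabled", fixed at creation
    frameset_ok: str = "ok"   # becomes "not ok" after certain tokens


def create_flags(document_scripting_enabled):
    """Build the flags for a new parser from the Document's scripting state."""
    scripting = "enabled" if document_scripting_enabled else "disabled"
    return ParserFlags(scripting=scripting)


def mark_frameset_not_ok(flags):
    flags.frameset_ok = "not ok"
```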

  </div></body></html>