The HTML end tag means end of document, or does it? : Algorithms for the masses

As anyone who’s ever written an HTML document would surely know, everything apart from the initial DOCTYPE declaration appears in between <html> and </html>. Putting it in XML terms, an HTML document consists of one element, the HTML element. And, as it happens, it has two elements within it: the head and the body. End of story? Well, no; otherwise I wouldn’t be writing this.

Paul Usher (DevExpress tech evangelist extraordinaire) and I were perusing some extremely (can I be blunt here?) crappy ASP.NET MVC code the other day, written by a development outsourcing company that obviously should be doing everything but development, when I came across a whole bunch of <script> tags below the final </html> tag. Wait, what? (And that wasn’t the biggest facepalm in this HTML document: for example, Bootstrap, which uses jQuery for its plugins, was being loaded before jQuery. That led to an error, which led to us reading this code in the first place.)

So I did some searching. I mean, it seems pretty obvious to me that the HTML end tag is, well, the end of the HTML document, but maybe I was wrong.

The first step was the HTML specification at the World Wide Web Consortium (W3C). There, under section 8 of the HTML5 spec, we are told:

“Documents must consist of the following parts, in the given order:

1. Optionally, a single "BOM" (U+FEFF) character.
2. Any number of comments and space characters.
3. A DOCTYPE.
4. Any number of comments and space characters.
5. The root element, in the form of an html element.
6. Any number of comments and space characters.”

In other words, after the HTML element itself (and elsewhere in the spec it says that element can only consist of a head element followed by a body element) all that can appear are HTML comments (defined elsewhere in the spec) and space characters (ditto). Certainly no script elements can be there. If I were writing a parser, I could pretty much assume that everything after that closing HTML tag could be ignored. The weird thing is, the browser (in my case Firefox) was reading, loading, and acting on those extra-curricular scripts.
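For what it’s worth, that rule is easy enough to check mechanically. Here is a minimal sketch in Python, assuming the standard library’s html.parser tokenizer is good enough for the job. The names TrailingContentChecker and trailing_content are made up for this illustration, and this is a lint-style check, not a spec-conformant parser. It flags start tags and non-space text that appear after </html>, and lets comments and space characters through.

```python
from html.parser import HTMLParser

# The characters the HTML spec counts as "space characters".
HTML_SPACE = " \t\n\f\r"


class TrailingContentChecker(HTMLParser):
    """Collects start tags and non-space text found after </html>."""

    def __init__(self):
        super().__init__()
        self.html_closed = False
        self.violations = []

    def handle_endtag(self, tag):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "html":
            self.html_closed = True

    def handle_starttag(self, tag, attrs):
        if self.html_closed:
            self.violations.append(f"<{tag}>")

    def handle_data(self, data):
        # Space characters are allowed after </html>; other text is not.
        if self.html_closed and data.strip(HTML_SPACE):
            self.violations.append(f"text: {data.strip(HTML_SPACE)!r}")

    # Comments are allowed after </html>, so handle_comment is left alone.


def trailing_content(markup):
    """Return a list describing anything disallowed after </html>."""
    checker = TrailingContentChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


sample = """<!DOCTYPE html>
<html>
<head><title>Demo</title></head>
<body><p>Hello</p></body>
</html>
<!-- a comment is fine here -->
<script src="late.js"></script>
"""

print(trailing_content(sample))  # prints ['<script>']
```

An empty list means nothing but comments and space characters followed the closing tag.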

A bit more research turned up a couple of Stack Overflow questions about the subject (and believe me, it’s hard to know what keywords to search for). The best one I found even came with a recommendation from Google about putting deferred CSS after that </html> tag. A couple of valid points were raised about this practice, the main one being that what happens is entirely user-agent specific. In other words, there’s no guarantee that the browser you are using will act in the same way as the one your user might be using (across the universe of all possible desktop and mobile browsers). Validation services (such as W3C’s) will certainly label the HTML as invalid, but a browser will usually bend the rules a bit and try to do the right thing (for some definition of “right”). I have no idea why Google devs of all people would recommend putting anything after </html>.

So there you have it. Don’t put anything after the closing tag for the HTML element. You’re basically going to be crossing your fingers that it’ll be found, parsed, and acted on correctly. Just move it up to immediately before the </body> tag (that’s a whole two lines up, if you’re counting) and you’ll be fine.
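If you have a pile of such files to fix, the move itself can be scripted. The following is a rough sketch only, assuming the closing tags can be found with a simple regular expression (so it would be fooled by a literal </html> inside a script string or a comment). The function name move_trailing_content_into_body is invented for this example. Note that it moves everything after </html>, including comments that would have been allowed to stay.

```python
import re


def move_trailing_content_into_body(markup):
    """Move whatever follows the last </html> to just before the last </body>.

    Returns the markup unchanged if there is nothing to move or if the
    closing tags cannot be found.
    """
    html_closes = list(re.finditer(r"</html\s*>", markup, re.IGNORECASE))
    if not html_closes:
        return markup
    html_close = html_closes[-1]

    trailing = markup[html_close.end():].strip()
    if not trailing:
        return markup  # nothing after </html> except whitespace

    before_html_close = markup[:html_close.start()]
    body_closes = list(re.finditer(r"</body\s*>", before_html_close, re.IGNORECASE))
    if not body_closes:
        return markup
    body_close = body_closes[-1]

    return (
        before_html_close[:body_close.start()]
        + trailing + "\n"
        + before_html_close[body_close.start():]
        + html_close.group(0) + "\n"
    )


broken = (
    "<html>\n<body>\n<p>Hi</p>\n</body>\n</html>\n"
    '<script src="app.js"></script>\n'
)
print(move_trailing_content_into_body(broken))
```

Run on the sample, the script element ends up as the last thing inside the body, with </body> and </html> closing the document after it. This only fixes where the tags sit; a load-order problem like Bootstrap before jQuery still has to be sorted out by hand.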

Fri 9-Oct-2015 6:30 PM Blog / tags: html5 endtag

by Julian M Bucknall