Matt Peperell | Musings on markup of deeply nested tags

This post started off as a conversation with a Twitter mutual (also called Matt), in which we lamented some of the pain points of HTML and came up with a couple of proto-ideas for solutions.

Markup, particularly HTML and its relatives, often contains deeply nested tags. When you are writing a closing tag, the opening counterpart that gives it its hierarchical context may be off-screen. This is especially likely if one of the enclosing elements contains a lot of content.

Take a look at the following example:

<html>
    <body>
        <div class="article">
            <h1>Introduction</h1>
            <p>This article demonstrates the use of placeholder text. It
            contains pseudo-Latin text which starts with the wording
                <span lang="la"> Lorem ipsum dolor sit amet</span>
            and </p>
        </div>
    </body>
</html>

This example is short, so the off-screen problem barely arises. But one of our ideas was to have implicit closing tags, borne out of the observation that although HTML tags can be nested, they may not overlap. Using this suggestion, the previous fragment could be represented as:

<html>
    <body>
        <div class="article">
            <h1>Introduction</h1>
            <p>This article demonstrates the use of placeholder text. It
            contains pseudo-Latin text which starts with the wording
                <span lang="la"> Lorem ipsum dolor sit amet</span>
            and
</html>

Notice the absence of the </p>, </div> and </body> tags. Since the </html> tag cannot legally appear until those three elements have themselves been closed, why not have the </html> tag close them implicitly? As I write this, I’ve come up with an extra idea: in this example, even the </html> tag could be omitted, because it appears at the end of the stream.

There are a few benefits to this:

less chance of making a typo in a closing tag’s name,
less screen real estate used,
less need to keep 100% of the context in your head.

It’s also somewhat future-proof: if further opening tags are added later, their closing tags do not necessarily need to be added as well.
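The implicit-closing rule is easy to prototype as a filter that rewrites the shorthand back into standard HTML. What follows is a minimal Python sketch, assuming Python 3.9+ and the standard library's html.parser; ImplicitCloser and expand_implicit_closes are hypothetical names. A closing tag closes every element still open inside it, and anything left open at the end of the input is closed too. It does not model HTML5's own optional-end-tag rules.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never put on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ImplicitCloser(HTMLParser):
    """Hypothetical filter: a closing tag also closes any elements still open
    inside it, and anything left open at end of input is closed."""

    def __init__(self) -> None:
        # Keep entity/character references as references instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.open_tags: list[str] = []  # currently open elements, innermost last
        self.out: list[str] = []        # pieces of the rewritten document

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original text, attributes intact
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: copy it and track nothing.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.out.append(f"</{tag}>")  # no matching opener: pass through
            return
        # Close every element opened after `tag`, innermost first, then `tag`.
        while True:
            name = self.open_tags.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def close(self) -> None:
        super().close()
        # End of stream: close whatever is still open, innermost first.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def expand_implicit_closes(markup: str) -> str:
    parser = ImplicitCloser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Fed the shortened fragment from above, it emits the missing </p>, </div> and </body> just before </html>; fed a fragment with no </html> at all, it closes everything at the end of the input.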

A second proposal that we discussed, again based on the impermissibility of overlapping tags, is to have closing tags of the following form:

<html>
    <body>
        <div class="article">
            <h1>Introduction</h1>
            <p>This article demonstrates the use of placeholder text. It
            contains pseudo-Latin text which starts with the wording
                <span lang="la"> Lorem ipsum dolor sit amet</span>
            and
<////>

See the 4 / symbols? The HTML standard requires a tag name after the slash, but strictly there is no need for one: because tags cannot overlap, a closing tag in well-formed markup can only ever close the innermost open element. So this <////> tag closes 4 tags. It doesn’t have all of the benefits of the previous idea, but it is still lightweight. A small extension to this idea is to have an integer count:

<html>
    <body>
        <div class="article">
            <h1>Introduction</h1>
            <p>This article demonstrates the use of placeholder text. It
            contains pseudo-Latin text which starts with the wording
                <span lang="la"> Lorem ipsum dolor sit amet</span>
            and
</4/>

Note that the number is the total number of tags to close, not the number of / symbols omitted: </4/> means close 4 tags, exactly like <////>, and not like <//////>. Labouring the point, perhaps, but it’s easy to miscount (or mistype) the number of closings performed by a tag like </////////////>. It’s much kinder on the eyes/fingers/brain to see </13/>.
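To make the two closer forms concrete, here is a minimal Python sketch (hypothetical names, Python 3.9+) that keeps a stack of open elements and rewrites each <////> or </N/> into ordinary named closing tags. Because it is regex-based rather than a real HTML parser, it is an assumption-laden toy: it does not understand comments, script contents or a > inside a quoted attribute value. It treats </4/> and <////> identically and refuses a closer that asks for more tags than are open.

```python
import re

# Hypothetical shorthand closers from the post:
#   "<////>" closes one open element per "/";
#   "</4/>"  closes the given number of open elements.
TOKEN = re.compile(
    r"</(?P<count>[1-9][0-9]*)/>"                             # count form
    r"|<(?P<slashes>/+)>"                                     # slash form
    r"|<(?P<close>/?)(?P<name>[A-Za-z][A-Za-z0-9-]*)[^>]*>"   # ordinary tag
)
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


def expand_shorthand_closers(markup: str) -> str:
    """Rewrite <////> and </N/> into ordinary named closing tags."""
    open_tags: list[str] = []  # currently open elements, innermost last
    out: list[str] = []
    pos = 0
    for m in TOKEN.finditer(markup):
        out.append(markup[pos:m.start()])  # text between tags is copied as-is
        pos = m.end()
        if m.group("count") or m.group("slashes"):
            # The number is the total number of tags to close,
            # so "</4/>" behaves exactly like "<////>".
            if m.group("count"):
                n = int(m.group("count"))
            else:
                n = len(m.group("slashes"))
            if n > len(open_tags):
                raise ValueError(
                    f"{m.group(0)} closes {n} tags but only {len(open_tags)} are open")
            for _ in range(n):
                out.append(f"</{open_tags.pop()}>")
            continue
        out.append(m.group(0))  # ordinary tags pass through unchanged
        name = m.group("name").lower()
        if m.group("close"):
            # Named closing tag: assumed to match the innermost open element.
            if open_tags and open_tags[-1] == name:
                open_tags.pop()
        elif name not in VOID_ELEMENTS and not m.group(0).endswith("/>"):
            open_tags.append(name)
    out.append(markup[pos:])
    return "".join(out)
```

Raising on an over-long closer is a design choice: silently clamping would hide exactly the kind of miscount described above.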

When markup is minified (as CDNs and other layer-7 technologies may do), the visual cue provided by indentation is lost, which makes the <////> and </4/> forms harder to debug than named closing tags.

Neither me-Matt nor other-Matt is a member of the W3C. I’m also nowhere near claiming that any of these proposals are ready for submission to, or adoption by, the W3C, but it’s interesting to muse on what might be appropriate.

Perhaps a lighter-weight version of this would be to use a filter or wrapper language (in the way SASS and LESS extend CSS, or templating engines such as Jinja2 and Cheetah generate HTML) and have the processing done on the server side. This would also ease adoption, because only servers need updating, rather than the plethora of browsers, screen readers, etc. that exist in the wild. I’ve not yet written anything that does this, but I’d be interested to hear your thoughts, especially if you go so far as to implement a filter.
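One possible shape for the server-side processing, sketched in Python under stated assumptions: rather than a template-language extension, it uses WSGI middleware (the interface defined in PEP 3333) that buffers each text/html response and passes it through whatever string-to-string filter you supply. html_filter_middleware is a hypothetical name; the sketch assumes UTF-8 bodies and does not support apps that use the legacy write() callable.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def html_filter_middleware(app: Callable, transform: Callable[[str], str]) -> Callable:
    """Hypothetical WSGI middleware: pass text/html bodies through transform."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        captured: dict = {}

        def capture_start_response(status: str, headers: Headers, exc_info=None):
            # Defer the real start_response until the body has been rewritten.
            captured.update(status=status, headers=headers, exc_info=exc_info)

            def write(data: bytes) -> None:
                # Simplification: the legacy write() callable is unsupported.
                raise NotImplementedError("write() is not supported by this sketch")

            return write

        result = app(environ, capture_start_response)
        try:
            body = b"".join(result)  # buffer the whole response (no streaming)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        headers = captured["headers"]
        content_type = next((v for k, v in headers if k.lower() == "content-type"), "")
        if content_type.startswith("text/html"):
            # Assumes UTF-8; a fuller version would honour the charset parameter.
            body = transform(body.decode("utf-8")).encode("utf-8")
            # The body length may have changed, so recompute Content-Length.
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers, captured["exc_info"])
        return [body]

    return middleware
```

Usage would look like application = html_filter_middleware(application, expander), where expander is any hypothetical str -> str function implementing one of the proposals. Buffering the whole body keeps the sketch simple but gives up streaming.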