Mark Polyakov

How to properly escape characters (especially spaces) in an HTML anchor name.

Problem

A well-loved HTML feature is the ability to link to a specific part of a page. Specifically, the “fragment” portion of a URL (the part after the #) can identify an HTML element in the page, and after loading the page the browser will automatically scroll to that element. For example,

No known cat breeds are considered arachnids, as I <a href="#cat-arachnids">explain later</a>.
...
...
<h2 id="cat-arachnids">Cat Arachnids</h2>
There is currently no overlap between the felines and the arachnids. Thankfully, our best and brightest scientists are working hard to rectify this.
...
...
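To see which part of a URL counts as the fragment, here is a small illustration using Python's standard urllib.parse.urlsplit. The URL https://example.com/cats.html is hypothetical. It stands in for the page that contains the example above.

```python
from urllib.parse import urlsplit

# Hypothetical URL of the page containing the "Cat Arachnids" example.
url = "https://example.com/cats.html#cat-arachnids"

# urlsplit breaks the URL into scheme, netloc, path, query and fragment.
# The fragment is everything after the first "#", without the "#" itself.
fragment = urlsplit(url).fragment
print(fragment)  # cat-arachnids
```

The browser looks for an element whose id equals this fragment, here the <h2 id="cat-arachnids">.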

For simple cases, everything is good. But what if you want to include special characters in the anchor name?

Things get a little interesting because the anchor name must be written both in the id attribute and in a URL. Most HTML attributes are allowed to be arbitrary strings, but unfortunately id attributes are an exception:

“When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element’s tree and must contain at least one character. The value must not contain any ASCII whitespace.”

(from the HTML5 spec).
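The two per-value rules in that quote can be written as a small validity check. This sketch assumes the spec's definition of "ASCII whitespace" (tab, line feed, form feed, carriage return and space). The function name is_valid_id is hypothetical. It does not check the uniqueness requirement, which depends on the whole document.

```python
# ASCII whitespace: tab, line feed, form feed, carriage return and space.
ASCII_WHITESPACE = frozenset("\t\n\f\r ")

def is_valid_id(value: str) -> bool:
    """Check the per-value id rules: non-empty and no ASCII whitespace.

    Uniqueness within the document is not checked here.
    """
    return len(value) > 0 and not any(ch in ASCII_WHITESPACE for ch in value)

print(is_valid_id("cat-arachnids"))  # True
print(is_valid_id("cat arachnids"))  # False: contains a space
print(is_valid_id(""))               # False: empty
```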

The spec also states that the id attribute should not be URL-encoded: the fragment is percent-decoded before the browser looks for a matching ID, so it is not correct to simply escape the spaces as %20. In fact, using only valid HTML, it is impossible to write an id-based anchor target that matches a URL fragment identifier containing a (percent-encoded) space. The decoded fragment would contain a literal space, and a valid id cannot.
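The dead end can be shown with a simplified model of the id comparison described above: percent-decode the fragment, then compare it exactly against the id. The helper id_matches is hypothetical. It also simplifies one detail, because Python's unquote replaces invalid UTF-8 rather than aborting the way the spec does.

```python
from urllib.parse import unquote

def id_matches(id_value: str, fragment: str) -> bool:
    """Simplified model: ids are compared against the percent-decoded fragment."""
    return id_value == unquote(fragment)

fragment = "cat%20arachnids"  # from href="#cat%20arachnids"
print(unquote(fragment))      # cat arachnids  (a literal space)

# Escaping the id does not help: "cat%20arachnids" != "cat arachnids".
print(id_matches("cat%20arachnids", fragment))  # False
# The only id that matches contains a space, which is invalid HTML.
print(id_matches("cat arachnids", fragment))    # True
```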

Solution

However, you may be aware that there’s another way to create a named anchor in HTML: an <a> tag with the name attribute. As of HTML5 this is deprecated, but still valid. Although name attributes may legally contain any character (including whitespace), we don’t actually want to include literal spaces. The spec says that when a named <a> tag is matched against a fragment identifier, the name is compared against the fragment as it appears in the URL, still URL-encoded. This is unlike the id attribute, which is compared against the decoded fragment. So we should write something like <a name='cat%20arachnids'></a>
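The sketch below is a simplified model of the matching rules as this post describes them. It is not a browser implementation. Ids are compared against the percent-decoded, UTF-8-decoded fragment, and the id step is skipped if decoding fails. <a name> values are compared against the raw fragment. The function find_indicated_element and its list-based inputs are hypothetical. The name value is produced with urllib.parse.quote so that the name and the href use identical encoding.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes

def find_indicated_element(fragment: str, ids: list[str], names: list[str]) -> Optional[str]:
    """Describe which element a raw URL fragment selects, or return None.

    ids and names are attribute values in tree order (a simplified document).
    """
    # Percent-decode the fragment, then decode the bytes as UTF-8.
    try:
        decoded: Optional[str] = unquote_to_bytes(fragment).decode("utf-8")
    except UnicodeDecodeError:
        decoded = None  # invalid UTF-8: skip the id comparison entirely
    # ids are compared against the *decoded* fragment.
    if decoded is not None and decoded in ids:
        return f"id={decoded!r}"
    # <a name> values are compared against the *raw* (still encoded) fragment.
    if fragment in names:
        return f"name={fragment!r}"
    return None

heading = "cat arachnids"
encoded = quote(heading, safe="")  # "cat%20arachnids"
target = f'<a name="{encoded}"></a>'
href = "#" + encoded

print(target)  # <a name="cat%20arachnids"></a>
print(find_indicated_element(encoded, ids=[], names=[encoded]))  # name='cat%20arachnids'
```

Generating both the href and the name attribute from the same encoded string keeps them byte-for-byte identical, which is what the raw comparison requires.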

Non-ASCII characters

There doesn’t seem to be anything in the spec prohibiting the id attribute from storing non-ASCII characters. The only prohibited characters are “ASCII whitespace”.
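Because the fragment is percent-decoded as UTF-8 before the id comparison, a non-ASCII id can be written directly in the attribute. The id café-arachnids below is a hypothetical example. urllib.parse.quote shows the UTF-8 percent-encoding that appears in the URL, and unquote shows that decoding recovers the original id.

```python
from urllib.parse import quote, unquote

anchor_id = "café-arachnids"  # hypothetical non-ASCII id; no ASCII whitespace

# In the URL, non-ASCII characters appear as percent-encoded UTF-8 bytes.
fragment = quote(anchor_id)
print(fragment)  # caf%C3%A9-arachnids

# Percent-decoding as UTF-8 gives back the id exactly, so the id comparison succeeds.
print(unquote(fragment) == anchor_id)  # True
```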

What do Browsers do?

Both Chrome and Firefox break the spec to make spaces work better, by treating %20 specially in id attributes. According to the spec, %20 in the id attribute should match only %2520 in the URL. Chrome and Firefox both work if you use that URL, but they also let %20 in the ID match %20 in the URL. This applies only to %20, though: if you put any other URL-encoded character into the id attribute, it is treated according to the spec (i.e., if your attribute is my%2Fanchor, it will only match the fragment my%252Fanchor, not my%2Fanchor).
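The spec-mandated fragment for a given id is the id percent-encoded once more, so every % in the id becomes %25 in the URL. The helper spec_fragment_for_id is hypothetical and uses urllib.parse.quote. It only computes what the spec requires. It does not model the Chrome/Firefox %20 exception described above.

```python
from urllib.parse import quote

def spec_fragment_for_id(id_value: str) -> str:
    """Fragment that, once percent-decoded, equals id_value exactly."""
    # quote() always escapes '%' as '%25'; safe="" also escapes '/'.
    return quote(id_value, safe="")

print(spec_fragment_for_id("cat%20arachnids"))  # cat%2520arachnids
print(spec_fragment_for_id("my%2Fanchor"))      # my%252Fanchor
```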

TL;DR

  • Do you need ASCII whitespace in your anchor names?
    • Yes: Use <a name="my%20url-encoded%20name">...</a>
    • No: Use <span id="my-non-url-encoded-name">...</span>

Both of these options are valid HTML5, though the former is technically deprecated.
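The decision in the TL;DR can be captured in a small helper that returns both the anchor markup and the matching href. This is a sketch under the rules described in this post. The function make_anchor is hypothetical, and it uses only Python's standard html.escape and urllib.parse.quote.

```python
import html
from urllib.parse import quote

# ASCII whitespace: tab, line feed, form feed, carriage return and space.
ASCII_WHITESPACE = frozenset("\t\n\f\r ")

def make_anchor(name: str) -> tuple[str, str]:
    """Return (target_html, href) for an anchor name, following the TL;DR."""
    if not name:
        raise ValueError("anchor names must not be empty")
    if any(ch in ASCII_WHITESPACE for ch in name):
        # Whitespace needed: use the deprecated <a name>. The name is compared
        # against the raw fragment, so encode it identically in both places.
        encoded = quote(name, safe="")
        return f'<a name="{encoded}"></a>', "#" + encoded
    # No whitespace: a plain, unencoded id. The href may be percent-encoded,
    # because the fragment is decoded before it is compared against the id.
    return f'<span id="{html.escape(name)}"></span>', "#" + quote(name, safe="")

print(make_anchor("cat arachnids"))  # ('<a name="cat%20arachnids"></a>', '#cat%20arachnids')
print(make_anchor("cat-arachnids"))  # ('<span id="cat-arachnids"></span>', '#cat-arachnids')
```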