Problem
A well-loved HTML feature is the ability to link to a specific part of a page. The “fragment” portion of a URL (everything after the #) can identify an HTML element in the page, and after loading the page the browser will automatically scroll to that element. For example:
No known cat breeds are considered arachnids, as I <a href="#cat-arachnids">explain later</a>.
...
...
<h2 id="cat-arachnids">Cat Arachnids</h2>
There is currently no overlap between the felines and the arachnids. Thankfully, our best and brightest scientists are working hard to rectify this.
...
...
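Here is a minimal Python sketch of that lookup. It takes the fragment from a URL with urllib.parse.urlsplit, then scans the page with html.parser for the first element whose id equals that fragment. The URL and the helper names (IdFinder, find_fragment_target) are hypothetical. The sketch also deliberately ignores percent-decoding and name attributes, so it only handles simple fragments like the one in this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class IdFinder(HTMLParser):
    """Remembers the tag name of the first element whose id equals `target`."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.found = None  # tag name of the first matching element, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only the first match.
        if self.found is None and dict(attrs).get("id") == self.target:
            self.found = tag


def find_fragment_target(url: str, html: str):
    """Return the tag name of the element the URL's fragment points at (simplified)."""
    fragment = urlsplit(url).fragment  # text after '#', without the '#'
    finder = IdFinder(fragment)
    finder.feed(html)
    finder.close()
    return finder.found


page = """
<p>No known cat breeds are considered arachnids, as I
<a href="#cat-arachnids">explain later</a>.</p>
<h2 id="cat-arachnids">Cat Arachnids</h2>
"""
print(find_fragment_target("https://example.com/cats.html#cat-arachnids", page))  # h2
```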
For simple cases like this, everything works. But what if you want to include special characters, such as spaces, in the anchor name?
This gets tricky because the anchor name must be written both in the id attribute and in a URL. Most HTML attributes are allowed to be arbitrary strings, but unfortunately id attributes are an exception. From the HTML5 spec:
“When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element’s tree and must contain at least one character. The value must not contain any ASCII whitespace.”
The spec also makes clear that the id attribute should not be URL-encoded: the URL’s fragment is percent-decoded before the browser looks for a matching ID, so simply escaping the spaces as %20 does not work. In fact, using only valid HTML, it is impossible to write an id-based anchor target that matches a URL fragment containing an encoded space (such as #cat%20arachnids).
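A short Python model of the rules quoted above may help. It is a sketch, not a browser implementation. is_valid_id follows the quoted id requirements, taking ASCII whitespace to be tab, line feed, form feed, carriage return and space. id_matches follows the HTML5 behavior described above: it percent-decodes the fragment with urllib.parse.unquote before comparing. Both function names are hypothetical.

```python
from urllib.parse import unquote

# ASCII whitespace as defined by the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\f\r "


def is_valid_id(value: str) -> bool:
    """At least one character and no ASCII whitespace."""
    return value != "" and not any(ch in ASCII_WHITESPACE for ch in value)


def id_matches(id_value: str, fragment: str) -> bool:
    """HTML5 rule as described above: compare the id with the *decoded* fragment."""
    return id_value == unquote(fragment)


# "cat%20arachnids" decodes to "cat arachnids", which is not a valid id...
print(is_valid_id(unquote("cat%20arachnids")))            # False
# ...and writing the escape literally in the id does not help,
print(id_matches("cat%20arachnids", "cat%20arachnids"))    # False
# because that id only matches a doubly-encoded fragment.
print(id_matches("cat%20arachnids", "cat%2520arachnids"))  # True
```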
Solution
However, there is another way to create a named anchor in HTML: an <a> tag with a name attribute. As of HTML5 this is deprecated, but still valid. Although name attributes may legally contain any character (including whitespace), we don’t actually want literal spaces in them. The spec says that when an <a> element’s name is matched against the fragment identifier, the fragment is not URL-decoded first (unlike with the id attribute), so the name must be written in its URL-encoded form. So we should write something like <a name='cat%20arachnids'></a>.
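The name has to equal the fragment exactly as it appears in the URL. One way to produce it is to percent-encode the human-readable name the same way a URL parser encodes a fragment. The sketch below assumes the WHATWG URL Standard’s fragment percent-encode set: C0 controls, space, ", <, >, ` and every code point above U+007E, with each encoded as its UTF-8 bytes. fragment_encode and name_matches are hypothetical helpers, not a browser API.

```python
def fragment_encode(text: str) -> str:
    """Percent-encode `text` using the URL Standard's fragment percent-encode set."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x20 or cp > 0x7E or ch in '"<>`':
            # Encode the character's UTF-8 bytes as %XX (uppercase hex).
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def name_matches(name_value: str, fragment: str) -> bool:
    """Rule as described above: a name is compared with the fragment *without* decoding."""
    return name_value == fragment


name = fragment_encode("cat arachnids")
print(name)                                    # cat%20arachnids
print(f'<a name="{name}"></a>')                # <a name="cat%20arachnids"></a>
print(name_matches(name, "cat%20arachnids"))   # True
```

Characters outside that set, such as /, are left as-is because the URL parser leaves them as-is too. Under the rules described above, encoding them anyway (for example with urllib.parse.quote(text, safe="")) would produce a name that no longer matches the URL.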
Non-ASCII characters
There doesn’t seem to be anything in the spec prohibiting the id attribute from containing non-ASCII characters. The only prohibited characters are ASCII whitespace.
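For example, assume the same decode-then-compare rule described in the Problem section. An id containing a literal é is then reached by a URL whose fragment carries that character’s UTF-8 percent-encoding. This sketch uses Python’s urllib.parse.quote and unquote, and the id value is made up for illustration.

```python
from urllib.parse import quote, unquote

element_id = "café-arachnids"  # hypothetical id containing a non-ASCII character

# In a URL, the non-ASCII character travels as UTF-8 percent-escapes.
fragment = quote(element_id)
print(fragment)                          # caf%C3%A9-arachnids

# Decoding the fragment recovers the id exactly, so the two match.
print(unquote(fragment) == element_id)   # True
```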
What do browsers do?
Both Chrome and Firefox break the spec to make spaces work better, by treating %20 specially in id attributes. According to the spec, %20 in the id attribute should match only %2520 in the URL. Chrome and Firefox both do match that URL, but they also let %20 in the ID match %20 in the URL. This applies only to %20, though: any other URL-encoded character in the id attribute is treated according to the spec (i.e., if your attribute is my%2Fanchor, it will only match the fragment my%252Fanchor, not my%2Fanchor).
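The observed behavior can be summarized as a small Python model. It models only what the paragraph above describes, namely the spec rule plus the %20 exception. It does not show how Chrome or Firefox actually implement fragment matching. Both function names are hypothetical.

```python
from urllib.parse import unquote


def spec_id_matches(id_value: str, fragment: str) -> bool:
    """Spec rule as described above: compare the id with the decoded fragment."""
    return id_value == unquote(fragment)


def browser_id_matches(id_value: str, fragment: str) -> bool:
    """Observed Chrome/Firefox behavior as described above (a model, not their code)."""
    if spec_id_matches(id_value, fragment):
        return True
    # The exception: %20 in the id is also treated as a space, and nothing else is.
    return id_value.replace("%20", " ") == unquote(fragment)


print(browser_id_matches("my%20anchor", "my%2520anchor"))  # True (per spec)
print(browser_id_matches("my%20anchor", "my%20anchor"))    # True (the %20 exception)
print(browser_id_matches("my%2Fanchor", "my%252Fanchor"))  # True (per spec)
print(browser_id_matches("my%2Fanchor", "my%2Fanchor"))    # False
```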
TL;DR
- Do you need ASCII whitespace in your anchor names?
  - Yes: Use <a name="my%20url-encoded%20name">...</a>
  - No: Use <span id="my-non-url-encoded-name">...</span>
Both of these options are valid HTML5, though the former is technically deprecated.
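The TL;DR decision can be written as a tiny Python helper. This is a hypothetical sketch: anchor_markup is not from any library. It assumes the URL Standard’s fragment percent-encode set for the name form and escapes each value with html.escape before placing it in an attribute.

```python
import html

ASCII_WHITESPACE = "\t\n\f\r "


def fragment_encode(text: str) -> str:
    """Percent-encode using the URL Standard's fragment percent-encode set (assumed)."""
    return "".join(
        "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        if ord(ch) <= 0x20 or ord(ch) > 0x7E or ch in '"<>`'
        else ch
        for ch in text
    )


def anchor_markup(name: str) -> str:
    """Return an anchor target for `name`, following the TL;DR above."""
    if not name:
        raise ValueError("anchor names must not be empty")
    if any(ch in ASCII_WHITESPACE for ch in name):
        # Whitespace: use a deprecated-but-valid <a name>, written URL-encoded.
        return f'<a name="{html.escape(fragment_encode(name))}"></a>'
    # No whitespace: a plain, non-encoded id works.
    return f'<span id="{html.escape(name)}"></span>'


print(anchor_markup("cat arachnids"))  # <a name="cat%20arachnids"></a>
print(anchor_markup("cat-arachnids"))  # <span id="cat-arachnids"></span>
```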