HTML5 charset declarations & validation
Fixed!
The W3C Validator used to give spurious warnings when using an HTML5-style charset. This bug has since been fixed. Thanks Ville Skyttä!
I’ll leave the article for posterity, but the original summary stands: use <meta charset="utf-8"> for your HTML5 charset declaration.
The W3C validator can be a bit fussy with HTML5, giving several incorrect character encoding-related warnings depending on the input method used. This is due to a bug in an underlying Perl library (W3C Validator bug, Perl HTML::Encoding bug). You can use Validator Nu instead, which doesn’t have this bug.
The validator is supposed to take character encoding (charset) from the first available source in this list:
- HTTP Content-Type (headers)
- then (if applicable) the XML declaration
- then a meta element
- then fall back to utf-8
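The precedence order above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not the validator’s actual code: the function name `detect_charset`, the regular expressions, and the choice to inspect only the first 1024 bytes are all assumptions made for this example.

```python
import re

def detect_charset(content_type_header, body_bytes):
    """Return (charset, source) using the precedence order listed above.

    Hypothetical helper for illustration; a real parser handles many more
    edge cases (byte-order marks, comments, malformed attributes).
    """
    # 1. HTTP Content-Type header wins if it carries a charset parameter.
    if content_type_header:
        m = re.search(r'charset\s*=\s*["\']?([\w.:-]+)', content_type_header, re.I)
        if m:
            return m.group(1).lower(), "http-header"

    # Assumption: only look at the start of the document.
    head = body_bytes[:1024].decode("ascii", errors="replace")

    # 2. XML declaration, e.g. <?xml version="1.0" encoding="utf-8"?>
    m = re.match(r'\s*<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']', head, re.I)
    if m:
        return m.group(1).lower(), "xml-declaration"

    # 3. Either meta form: <meta charset="..."> or the http-equiv variant.
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).lower(), "meta"

    # 4. Nothing declared: fall back to UTF-8.
    return "utf-8", "fallback"
```

For example, `detect_charset(None, b'<!doctype html><meta charset="utf-8">')` reports the meta element as the source, while passing a header with a charset parameter makes the in-document declaration irrelevant.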
For authors, any one of these is sufficient to set the character encoding in HTML5:
- HTTP-header:
Content-Type: text/html; charset=UTF-8
- HTML4-style meta:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- Short meta:
<meta charset="utf-8">
If passing the W3C validator is important (e.g. your client asks about it), here is what’s needed to prevent the warnings from showing:
| Input method | Current warning-free requirements |
|---|---|
| Validate by URI | The W3C validator requires both an Apache HTTP-header of Content-Type: text/html; charset=UTF-8 and a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> element |
| Validate by File Upload | The W3C validator requires a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> element |
Note that for “Validate by Direct Input” the W3C validator will always give at least one character encoding-related warning.
Testing results
| HTTP-header declaration | In-document declaration | W3C Validator result |
|---|---|---|
| Content-Type: text/html; charset=UTF-8 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |
| Content-Type: text/html; charset=UTF-8 | <meta charset="utf-8"> | |
| Content-Type: text/html; charset=UTF-8 | None | |
| None | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |
| None | <meta charset="utf-8"> | |
| None | None | |
Note you can simulate no HTTP-header by using “Validate by File Upload”.
Also note that if you copy & paste into the Direct Input field, character encoding meta elements are ignored and you’ll always get the warning “Using Direct Input mode: UTF-8 character encoding assumed”:
Unlike the ‘by URI’ and ‘by File Upload’ modes, the ‘Direct Input’ mode of the validator provides validated content in the form of characters pasted or typed in the validator’s form field. This will automatically make the data UTF-8, and therefore the validator does not need to determine the character encoding of your document, and will ignore any charset information specified.
Checking and fixing HTTP-headers
For more information about setting a character encoding, and why it’s a good thing, refer to the W3C’s Character Encoding in HTML and I18N FAQ: Setting charset information in .htaccess articles. You can check a page’s HTTP-headers with:
- Safari’s Web Inspector or Chrome’s Developer Tools: via Resources → select the HTML document → Response Headers
- Firefox: via Firebug’s Network panel (click on file), or via Chris Pederick’s Web Developer Toolbar (Information → View Response Headers)
- Opera’s Developer Tools (Dragonfly): via Tools → Advanced → Developer Tools, in the Network tab (click on file then choose “Headers”)
- via these web-based tools
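If you prefer a script to a browser panel, a rough Python equivalent is sketched below. It uses only the standard library; the function names `charset_from_content_type` and `fetch_declared_charset` are made up for this example, and the sketch assumes the server answers HEAD requests with the same headers it would send for a normal page load.

```python
from email.message import Message
from urllib.request import Request, urlopen

def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()

def fetch_declared_charset(url):
    """Ask the server for its headers and report the declared charset.

    Assumption: a HEAD request is enough; some servers may respond
    differently to HEAD than to GET.
    """
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get_content_charset()
```

Calling `charset_from_content_type("text/html; charset=UTF-8")` returns `"utf-8"`, and a value without a charset parameter returns `None`, which is the case where the validator has to look inside the document instead.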
You can add a Content-Type HTTP-header with this .htaccess declaration (W3C I18N FAQ again):
AddType 'text/html; charset=UTF-8' html
