Oli.jp

HTML5 charset declarations & validation

Fixed!

The W3C validator used to give spurious warnings for documents using an HTML5-style charset declaration. This bug has since been fixed. Thanks, Ville Skyttä!

I’ll leave the article for posterity, but the original summary stands: use <meta charset="utf-8"> for your HTML5 charset declaration.

The W3C validator can be a bit fussy with HTML5, giving several incorrect character encoding-related warnings depending on the input method used. This is due to a bug in an underlying Perl library (W3C Validator bug, Perl HTML::Encoding bug). You can use Validator Nu instead, which doesn’t have this bug.

The validator is supposed to take character encoding (charset) from the first available source in this list:

  1. the HTTP Content-Type header
  2. then (if applicable) the XML declaration
  3. then a meta element in the document
  4. then a fallback to UTF-8
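
As a rough illustration of that lookup order, here is a minimal Python sketch. It is not the validator’s actual code (the validator is written in Perl); the function name, the regular expressions, the `is_xml` flag, and the 1024-byte sniffing window are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only; not the W3C validator's real code.
_CHARSET_PARAM = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
_XML_DECL = re.compile(
    r'\s*<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']', re.IGNORECASE
)
_META = re.compile(r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def detect_charset(http_content_type, document_bytes, is_xml=False):
    """Return (charset, source), taking the first available source in the list."""
    # 1. HTTP Content-Type header
    if http_content_type:
        match = _CHARSET_PARAM.search(http_content_type)
        if match:
            return match.group(1).lower(), "http"

    # Only the start of the document is sniffed (assumed window: 1024 bytes).
    # Decoding as ASCII is enough to find the declarations themselves.
    head = document_bytes[:1024].decode("ascii", errors="replace")

    # 2. XML declaration, only when the document is treated as XML
    if is_xml:
        match = _XML_DECL.match(head)
        if match:
            return match.group(1).lower(), "xml"

    # 3. meta element (matches both the short and the http-equiv form)
    match = _META.search(head)
    if match:
        return match.group(1).lower(), "meta"

    # 4. Nothing declared: fall back to UTF-8
    return "utf-8", "fallback"
```

For example, `detect_charset(None, b'<meta charset="utf-8">')` returns `("utf-8", "meta")`, while passing an HTTP header that carries a charset short-circuits the search before the document is inspected at all.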

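For authors, any one of these declarations is sufficient to set the character encoding in HTML5:

  • the HTTP header Content-Type: text/html; charset=UTF-8
  • <meta charset="utf-8">
  • <meta http-equiv="Content-Type" content="text/html; charset=utf-8">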

If passing the W3C validator is important (e.g. your client asks about it), here is what’s needed to prevent the warnings from showing:

| Input method | Current warning-free requirements |
| --- | --- |
| Validate by URI | The W3C validator requires both an Apache HTTP-header of Content-Type: text/html; charset=UTF-8 and a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> element |
| Validate by File Upload | The W3C validator requires a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> element |

Note that for “Validate by Direct Input” the W3C validator will always give at least one character encoding-related warning.

Testing results

| HTTP-header declaration | In-document declaration | W3C validator result |
| --- | --- | --- |
| Content-Type: text/html; charset=UTF-8 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | Pass! |
| Content-Type: text/html; charset=UTF-8 | <meta charset="utf-8"> | Informative: No Character encoding declared at document level |
| Content-Type: text/html; charset=UTF-8 | None | Informative: No Character encoding declared at document level |
| None | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | Pass! |
| None | <meta charset="utf-8"> | Warning: No Character Encoding Found! Falling back to UTF-8; Informative: No Character encoding declared at document level |
| None | None | Warning: No Character Encoding Found! Falling back to UTF-8; Informative: No Character encoding declared at document level; Error: No explicit character encoding declaration has been seen yet (assumed utf-8) but the document contains non-ASCII |

Note that you can simulate a missing HTTP header by using “Validate by File Upload”.

Also note that if you copy & paste into the Direct Input field, character encoding meta elements are ignored and you’ll always get the warning Using Direct Input mode: UTF-8 character encoding assumed:

Unlike the ‘by URI’ and ‘by File Upload’ modes, the ‘Direct Input’ mode of the validator provides validated content in the form of characters pasted or typed in the validator’s form field. This will automatically make the data UTF-8, and therefore the validator does not need to determine the character encoding of your document, and will ignore any charset information specified.

Checking and fixing HTTP-headers

For more information about setting a character encoding, and why it’s a good thing, refer to the W3C’s Character Encoding in HTML and I18N FAQ: Setting charset information in .htaccess articles. You can check a page’s HTTP-headers with:

  • Safari’s Web Inspector or Chrome’s Developer Tools: via Resources → select the HTML document → Response Headers
  • Firefox: via Firebug’s Network panel (click on file), or via Chris Pederick’s Web Developer Toolbar (Information → View Response Headers)
  • Opera’s Developer Tools (Dragonfly): via Tools → Advanced → Developer Tools, in the Network tab (click on file then choose “Headers”)
  • via these web-based tools
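
If you would rather check from a script than from browser tools, something like the following Python sketch should work using only the standard library. The function names `charset_from_headers` and `fetch_charset` are made up for this example, and it assumes the server answers a HEAD request with the same Content-Type it would send for a normal GET, which is not guaranteed.

```python
from email.message import Message  # used to build header objects offline
from urllib.request import Request, urlopen


def charset_from_headers(headers):
    """Return (raw Content-Type value, charset or None) from a headers object.

    `headers` is an email.message.Message, which is also what urllib
    exposes as `response.headers`.
    """
    return headers.get("Content-Type"), headers.get_content_charset()


def fetch_charset(url):
    """Send a HEAD request and report the Content-Type header and its charset."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return charset_from_headers(response.headers)
```

Calling `fetch_charset` with a page’s URL returns a pair such as `("text/html; charset=UTF-8", "utf-8")` when the server declares a charset, and `None` as the second item when it does not.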

You can add a Content-Type HTTP-header with this .htaccess declaration (W3C I18N FAQ again):

AddType 'text/html; charset=UTF-8' html

Feedback

I hope that was of use. If you have any questions, feedback, or have found a mistake, please let me know via Twitter (@boblet). Tweet about it using this shortlink: http://01i.jp/h5charset

Changes

  1. Finally migrated from boblet.tumblr.com, allowing me to walk the walk with HTML5 template improvements, and CSS3 enrichment too.
  2. Added a note that this bug has been fixed. Yay!

Icons from FamFamFam