Web::HTML::Validator
DOM Conformance Checker
SYNOPSIS
use Web::HTML::Validator;
my $val = Web::HTML::Validator->new;
$val->onerror (sub {
  my %arg = @_;
  warn get_node_path ($arg{node}), ": ",
      ($arg{level} || "Error"), ": ",
      $arg{type}, "\n";
});
$val->check_node ($doc);
DESCRIPTION
The Perl module Web::HTML::Validator contains methods for conformance checking (or validation) of a DOM tree with regard to relevant Web standards such as HTML and CSS. Although the module name contains "HTML", it can also be used to check the conformance of non-HTML XML documents. See also "SPECIFICATIONS".
METHODS
This module has the following methods:
$val = Web::HTML::Validator->new
-
Create a new instance of the validator.
$val->check_node ($node)
-
Validate the specified node. If the node is not a document node, the node is validated as if it were an orphaned node, i.e. a node with no parent or owner. The node can be an attribute, but element- or attribute-specific validation is not performed in that case.
Errors and warnings are reported through the onerror handler.
$code = $val->onerror
$val->onerror ($code)
-
Get or set the error handler for the validator. Any conformance error, as well as any warning or additional processing information, is reported to the handler. See <https://github.com/manakai/data-errors/blob/master/doc/onerror.txt> for details of error handling.
The value should not be set during the validation. If the value is changed, the result is undefined.
$dids = $val->di_data_set
$val->di_data_set ($dids)
-
Get or set the "di" data set for the validator. It is used for reporting errors in nested subdocuments contained in the validated document (e.g. iframe srcdoc documents).
See also SuikaWiki:manakai index data structure <http://wiki.suikawiki.org/n/manakai%20index%20data%20structures>.
$boolean = $val->scripting
$val->scripting ($boolean)
-
Get or set whether scripting is enabled (true) or disabled (false) for the purpose of validation. By default, scripting is disabled. It affects validation of the HTML noscript element.
The value should not be set during the validation. If the value is changed, the result is undefined.
$boolean = $val->image_viewable
$val->image_viewable ($boolean)
-
Get or set whether the intended user is known to be able to view images or not. Its default is false (not known). This affects whether omitting the alt attribute of the img element is conforming or not.
The value should not be set during the validation. If the value is changed, the result is undefined.
Since the input to the validator is a DOM, not a string, syntax-level conformance errors can't be checked. To detect all conformance errors, you have to parse the string using an appropriate parser (Web::HTML::Parser for HTML, or Web::XML::Parser for XML), and then invoke the validator with the resulting DOM as the input.
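The parse-then-validate flow described above might look like the following sketch. It is not taken from this module's documentation: validate_html_string is a hypothetical helper name, and the use of Web::DOM::Document->new, the parser's onerror handler, and its parse_char_string method are assumptions about the companion modules. The "phase" key is added by the sketch itself to tell the two kinds of errors apart.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an HTML character string, validate the
# resulting DOM, and return all reported errors as hash references.
sub validate_html_string {
  my ($chars) = @_;

  # Loaded lazily so that this sketch compiles even where the
  # manakai modules are not installed.
  require Web::DOM::Document;
  require Web::HTML::Parser;
  require Web::HTML::Validator;

  my @errors;

  # Assumption: the parser fills an empty document from a character
  # string and reports syntax-level errors through its own onerror.
  my $doc = Web::DOM::Document->new;
  my $parser = Web::HTML::Parser->new;
  $parser->onerror (sub { push @errors, +{@_, phase => 'parse'} });
  $parser->parse_char_string ($chars => $doc);

  # DOM-level conformance checking, as documented above.  Options are
  # set before validation starts, never during it.
  my $val = Web::HTML::Validator->new;
  $val->scripting (0);
  $val->onerror (sub { push @errors, +{@_, phase => 'validate'} });
  $val->check_node ($doc);

  return \@errors;
}
```

A caller could then iterate over the returned array reference and print the type and level of each entry, as in the SYNOPSIS.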
DEPENDENCY
In addition to the dependency described in the README file <https://github.com/manakai/perl-web-markup/blob/master/README.pod#dependency>, the following modules (and modules required by them) are required by this module:
- perl-web-css <https://github.com/manakai/perl-web-css>
- perl-web-datetime <https://github.com/manakai/perl-web-datetime>
- perl-web-langtag <https://github.com/manakai/perl-web-langtag>
- perl-web-resource <https://github.com/manakai/perl-web-resource>
- perl-web-url <https://github.com/manakai/perl-web-url>
- perl-regexp-utils <https://github.com/wakaba/perl-regexp-utils>
- perl-web-js <https://github.com/manakai/perl-web-js>
-
In addition, JE is required.
SPECIFICATIONS
- XML
-
Extensible Markup Language (XML) 1.0 <https://www.w3.org/TR/xml/>.
XML 1.0 Fifth Edition Specification Errata <https://www.w3.org/XML/xml-V10-5e-errata>.
- XMLNS
-
Namespaces in XML 1.0 <https://www.w3.org/TR/xml-names/>.
Namespaces in XML 1.0 (Third Edition) Errata <https://www.w3.org/XML/2009/xml-names-errata>.
- HTML
-
HTML Standard <http://c.whatwg.org/>.
The html element in the HTML namespace MAY be used as the root element.
A DocumentFragment MAY contain any child element and text node.
The children of a template element in the HTML namespace (which is different from the template content of the element) MUST be empty.
Contents of the noscript element when scripting is enabled and the iframe element MUST be validated as follows:
1. Let /context/ be the element in question.
2. Let /container/ be a new HTML element whose node document is the same as the node document of /context/. The local name of the element is the return value of the following substeps:
   1. If /context/ is an HTML |iframe| element, return |span|.
   2. Otherwise, if /context/ is a descendant of a |head| element or a descendant of a template content whose content model is metadata content, return |head|.
   3. Otherwise, if /context/ has a parent element and the content model of the parent element would require the content model of /context/ be phrasing content given that /context/ were transparent, return |span|.
   4. Otherwise, return |div|.
3. Invoke the HTML fragment parsing algorithm with /container/ as the /context/ element and the |textContent| attribute value of /context/ as the /input/. Append the returned list of nodes to /container/ in order. If this step results in one or more parse errors, /context/ is not conforming.
4. Let /disallowed/ be an empty list. Add elements disallowed by content model of inclusive ancestors of /context/ to /disallowed/, if any. If /context/ is an HTML |iframe| element, add HTML |script| element to /disallowed/. If /context/ is an HTML |noscript| element, add HTML |noscript| and |script| elements to /disallowed/.
5. Check the conformance of /container/ and its descendants, with the following exceptions:
   - Elements in /disallowed/ MUST NOT be used.
   - If /container/ is an HTML |head| element, it MUST contain only HTML |link|, |style|, and |meta| elements. The |head| element does not require any |title| element.
Note that this is a willful violation of the HTML Standard, made to simplify the validation process: the spec's requirements are too complex to implement, and that complexity would not help authors much. The set of validation errors detected by these steps is not exactly the same as that of the HTML Standard.
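The choice of the /container/ local name and the construction of /disallowed/ in the steps above can be sketched as follows. This is only an illustration, not the module's implementation: the hash keys (local_name, in_head_or_metadata_template, parent_requires_phrasing, disallowed_by_ancestors) are hypothetical stand-ins for checks that a real validator would perform on DOM nodes and content models.

```perl
use strict;
use warnings;

# Return the local name of /container/ for a simplified description
# of /context/ (a hash reference with hypothetical keys).
sub container_local_name {
  my ($context) = @_;
  return 'span' if $context->{local_name} eq 'iframe';
  return 'head' if $context->{in_head_or_metadata_template};
  return 'span' if $context->{parent_requires_phrasing};
  return 'div';
}

# Return the list of element names that must not appear in the
# parsed contents of /context/.
sub disallowed_elements {
  my ($context) = @_;
  # Elements already disallowed by inclusive ancestors, if any.
  my @disallowed = @{$context->{disallowed_by_ancestors} || []};
  push @disallowed, 'script'
      if $context->{local_name} eq 'iframe';
  push @disallowed, 'noscript', 'script'
      if $context->{local_name} eq 'noscript';
  return @disallowed;
}
```

For example, a noscript element inside head would get a head container, while a noscript element whose parent is a p element would get a span container.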
Unless otherwise specified, for the purpose of validation of HTML documents or fragments (serialized in the HTML syntax and then) embedded within another DOM attribute or node, such as the srcdoc attribute of the HTML iframe element, and Atom or RSS elements, whether scripting is enabled or disabled for the document associated with the HTML parser used to parse the document or fragment, as well as whether scripting is enabled or disabled for the nodes returned by the HTML parser, is the same as whether scripting is enabled or disabled for the node document of the node.
If the |http-equiv| attribute of the |meta| element is in the Default style state, the |content| attribute value MUST NOT be the empty string.
- OBSVOCAB
-
manakai's Conformance Checking Guideline for Obsolete HTML Elements and Attributes <http://suika.suikawiki.org/www/markup/html/exts/manakai-obsvocab>.
- XSLT
-
XSL Transformations (XSLT) Version 1.0 <http://www.w3.org/TR/xslt>.
XSL Transformations (XSLT) Version 1.0 Specification Errata <http://www.w3.org/1999/11/REC-xslt-19991116-errata/>.
Key words "must" and "should" are to be interpreted as described in RFC 2119.
- ATOM
-
The Atom Syndication Format <http://tools.ietf.org/html/rfc4287>, <http://www.rfc-editor.org/errata_search.php?rfc=4287>.
The rel attribute value MUST be a link type or link relation whose semantics in Atom documents are specified. It MUST NOT be a non-conforming link type.
- ATOM03
-
The Atom Syndication Format 0.3 (PRE-DRAFT) <https://github.com/mnot/I-D/blob/master/Published/atom-format/draft-nottingham-atom-format-02.xml>.
The rel attribute value MUST be a link type or link relation whose semantics in Atom 0.3 documents are specified. It MUST NOT be a non-conforming link type.
- ATOMTHREADS
-
Atom Threading Extension <http://tools.ietf.org/html/rfc4685>.
- ATOMHISTORY
-
Feed Paging and Archiving <https://tools.ietf.org/html/rfc5005>.
- ATOMPUB
-
The Atom Publishing Protocol <https://tools.ietf.org/html/rfc5023>.
- ATOMDELETED
-
The Atom "deleted-entry" Element <http://tools.ietf.org/html/rfc6721>.
- RSS1
-
RDF Site Summary (RSS) 1.0 <http://web.resource.org/rss/1.0/spec>.
- RSSDC
-
RDF Site Summary 1.0 Modules: Dublin Core <http://web.resource.org/rss/1.0/modules/dc/>.
- DCES
-
DCMI: Dublin Core Metadata Element Set, Version 1.1: Reference Description <http://dublincore.org/documents/dces/>.
- RSSCONTENT
-
RDF Site Summary 1.0 Modules: Content <http://web.resource.org/rss/1.0/modules/content/>.
- HATENA
-
はてなXML名前空間 - Hatena Developer Center <http://developer.hatena.ne.jp/ja/documents/other/misc/xmlns>.
- RSS2
-
RSS 2.0 <http://www.rssboard.org/rss-specification>.
- RSSBP
-
RSS Best Practices Profile <http://www.rssboard.org/rss-profile>.
- MEDIARSS
-
Media RSS Specification <http://www.rssboard.org/media-rss>.
- ITUNES
-
RSS tags for Podcasts Connect - Podcasts Connect Help <https://help.apple.com/itc/podcasts_connect/#/itcb54353390>.
- CSSSTYLEATTR
-
CSS Style Attributes <http://dev.w3.org/csswg/css-style-attr/>.
CSS Syntax <http://dev.w3.org/csswg/css-syntax/#parse-a-list-of-declarations>.
- SCHEMAORG
-
Schema.org <http://schema.org/>.
An item value whose data type is <http://schema.org/Integer> MUST be a valid integer. An item value whose data type is <http://schema.org/URL> MUST be an absolute URL.
- DATAVOCAB
-
data-vocabulary.org <http://www.data-vocabulary.org/>.
Structured data <https://support.google.com/webmasters/topic/2643152?ref_topic=30163>.
If the value is defined as a URL, image, or link, it MUST be an absolute URL.
- ARIA
-
Accessible Rich Internet Applications (WAI-ARIA) 1.1 <http://w3c.github.io/aria/aria/aria.html>.
When an attribute value is defined as "token list", the value MUST be a valid unordered set of unique space-separated tokens.
When an attribute value is defined as "ID reference list", the value MUST be a valid ordered set of unique space-separated tokens.
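As a rough illustration of the two ARIA requirements above, the following sketch checks that a value is a set of unique space-separated tokens. The function name is hypothetical and this is not the module's own code. It assumes the HTML Standard's notion of ASCII whitespace as the separator and compares tokens case-sensitively. The same syntactic check applies to both the unordered and the ordered variant, since ordering only affects how the tokens are interpreted.

```perl
use strict;
use warnings;

# Return true if $value is a set of space-separated tokens in which
# no token occurs more than once (an empty value has no tokens and
# is therefore accepted).
sub is_unique_token_set {
  my ($value) = @_;
  my %seen;
  # Split on runs of ASCII whitespace: TAB, LF, FF, CR, SPACE.
  for my $token (grep { length } split /[\x09\x0A\x0C\x0D\x20]+/, $value) {
    return 0 if $seen{$token}++;
  }
  return 1;
}
```

With this sketch, a value such as "id1 id2" passes, while "id1 id2 id1" fails because id1 is duplicated.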
- OGP
-
The Open Graph protocol <http://ogp.me/>.
The RDF schema <http://ogp.me/ns/ogp.me.ttl>.
Open Graph Reference Documentation <https://developers.facebook.com/docs/reference/opengraph>.
Creating Custom Stories <https://developers.facebook.com/docs/opengraph/creating-custom-stories/>.
Achievements API <https://developers.facebook.com/docs/games/achievements>.
Open Graph protocol <http://web.archive.org/web/20111006152122/http://developers.facebook.com/docs/opengraph/>.
- OGPMIXI
-
技術仕様 << mixi Developer Center (ミクシィ デベロッパーセンター) <http://developer.mixi.co.jp/connect/mixi_plugin/mixi_check/spec_mixi_check/>.
- OGPGREE
-
Social Feedback - GREE Developer Center <https://docs.developer.gree.net/ja/platform/connect/socialfeedback>.
- HTMLPRE5924
-
HTML Standard Tracker <https://html5.org/r/5924>.
- HTMLPRE5925
-
HTML Standard Tracker <https://html5.org/r/5925>.
- WHATWGWIKI
-
WHATWG Wiki MetaExtensions <https://wiki.whatwg.org/wiki/MetaExtensions>.
WHATWG Wiki RelExtensions <https://wiki.whatwg.org/wiki/RelExtensions>.
Unless otherwise specified, link types marked as "accepted" in the RelExtensions table MUST be treated as if they were part of the Microformats Wiki's relevant table.
- UFWIKI
-
Microformats Wiki - existing rel values <http://microformats.org/wiki/existing-rel-values>.
- CSP
-
Content Security Policy <https://w3c.github.io/webappsec/specs/content-security-policy/>.
- MIMESNIFF
-
MIME Sniffing <https://mimesniff.spec.whatwg.org/>.
- URL
-
URL Standard <https://url.spec.whatwg.org/>.
- MANAKAI
-
manakai DOM Extensions <https://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%20DOM%20Extensions>.
Any node MAY be used as an orphan node.
- VALLANGS
-
DOM Tree Validation <https://manakai.github.io/spec-dom/validation-langs>.
The validator also supports many more Web standards (indirectly via required modules), including but not limited to CSS, IETF BCP 47 language tags, Encoding Standard, and XML 1.0 DTD.
Note that HTML2, HTML3, HTML4, HTML 5.0, HTML 5.1, HTML 5.2, HTML 5.3, XML 1.1, Namespaces in XML 1.1, XML Base, xml:id, XLink, XInclude, XHTML1, XHTML Modularization, Ruby Annotations, RSS 0.9, RDFa, XForms, XHTML2, HLink, XML Events, XFrames, and RDF/XML are not considered applicable specifications. The module does not support ARIA attributes in its own namespace. Also, the module does not support historical HTML features that are no longer part of the language, except for those explicitly listed in the OBSVOCAB specification. See the OBSVOCAB specification for details.
AUTHOR
Wakaba <wakaba@suikawiki.org>.
LICENSE
Copyright 2007-2018 Wakaba <wakaba@suikawiki.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.