Web::HTML::Dumper
Dump a DOM tree in the parser test format
SYNOPSIS
use Web::HTML::Dumper qw(dumptree);
warn dumptree $doc;
DESCRIPTION
The Web::HTML::Dumper module exports a function, dumptree, which serializes the given document into the format used in HTML parser tests.
FUNCTION
The module exports a function:
$dumped = dumptree $doc
Dump the DOM tree. The argument must be a DOM document object (i.e. an instance of the Web::DOM::Document class). The function returns the dump of the document and its subtree.
DUMP FORMAT
The function serializes the DOM tree into the format used in HTML parser tests, as described in <http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests> and <https://github.com/html5lib/html5lib-tests/tree/master/tree-construction>, with the following exceptions:
- Only the "#document" part of the tree construction test is returned.
- No "| " prefix is prepended to lines.
- XML-only node types are also supported.
Element type definition, entity, and notation nodes attached to a document type node are serialized as if they were children of the document type node. They are inserted before any children of the document type node, sorted by node type in the aforementioned order, then by code point order of their node names.
Element type definition nodes are represented as "<!ELEMENT", followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the contentModelText of the node, followed by ">".
Entity nodes are represented as "<!ENTITY", followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the list of textContent, publicId, and systemId of the node (the empty string is used when the value is undef), where each item is enclosed by U+0022 QUOTATION MARK (") characters, separated by a U+0020 SPACE character, followed by a U+0020 SPACE character, followed by the notationName of the node, if it is not undef, followed by ">".
Notation nodes are represented as "<!NOTATION", followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the list of publicId and systemId of the node (the empty string is used when the value is undef), where each item is enclosed by U+0022 QUOTATION MARK (") characters, separated by a U+0020 SPACE character, followed by ">".
Attribute definition nodes attached to an element type definition node are serialized as if they were children of the element type definition node, sorted by code point order of their node names.
Attribute definition nodes are represented as the node name, followed by a U+0020 SPACE character, followed by the keyword represented by declaredType of the node (or ENUMERATION if it represents the enumerated type), followed by a U+0020 SPACE character, followed by "(", followed by the list of allowedTokens of the node separated by "|", followed by ")", followed by a U+0020 SPACE character, followed by the keyword represented by defaultType of the node (or EXPLICIT if it represents the explicit default value), followed by a U+0020 SPACE character, followed by a U+0022 QUOTATION MARK (") character, followed by the textContent of the node, followed by a U+0022 QUOTATION MARK (") character.
- Namespace designators are extended.
The namespace designator for the HTML namespace (http://www.w3.org/1999/xhtml) is "html". While elements in the HTML namespace are serialized without the namespace designator, as in the original format, attributes in the HTML namespace are serialized with this namespace designator.
An application can define a custom namespace designator by setting a key-value pair in the %$Web::HTML::Dumper::NamespaceMapping hash:
$Web::HTML::Dumper::NamespaceMapping->{$url} = $prefix;
For example, if the application does:
$Web::HTML::Dumper::NamespaceMapping->{q<urn:x-suika-fam-cx:markup:suikawiki:0:9:>} = 'sw';
... then "document" in the SuikaWiki/0.9 namespace is serialized as "sw document".
When no namespace designator is explicitly defined for a namespace, the namespace designator for the namespace is "{" followed by the namespace URL followed by "}". If an element has no namespace, the namespace designator for the element is "{}".
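The entity, notation, and namespace designator rules above can be illustrated with a small standalone sketch. It does not use or reproduce the module's actual code: nodes are modeled as plain hashes, and the helper names (quoted_list, entity_line, notation_line, namespace_designator) and the %designator_for table are hypothetical, introduced only for illustration. The "svg" and "math" entries are the designators of the original test format; the "sw" entry mirrors the custom mapping example above. The entity rule is read literally, so a space precedes ">" even when notationName is undef.

```perl
use strict;
use warnings;

# Hypothetical helper: enclose each value in " characters (undef becomes
# the empty string) and separate the items by a single space.
sub quoted_list {
  return join ' ', map { '"' . (defined $_ ? $_ : '') . '"' } @_;
}

# Entity node, modeled as a plain hash rather than a real DOM object.
sub entity_line {
  my ($node) = @_;
  my $line = '<!ENTITY ' . $node->{name} . ' '
      . quoted_list(@$node{qw(textContent publicId systemId)}) . ' ';
  $line .= $node->{notationName} if defined $node->{notationName};
  return $line . '>';
}

# Notation node, modeled the same way.
sub notation_line {
  my ($node) = @_;
  return '<!NOTATION ' . $node->{name} . ' '
      . quoted_list(@$node{qw(publicId systemId)}) . '>';
}

# Stand-in for the designator table (an assumption, not the module's data).
my %designator_for = (
  'http://www.w3.org/1999/xhtml'              => 'html',
  'http://www.w3.org/2000/svg'                => 'svg',
  'http://www.w3.org/1998/Math/MathML'        => 'math',
  'urn:x-suika-fam-cx:markup:suikawiki:0:9:'  => 'sw',
);

# Explicit designator if defined, "{}" for no namespace, else "{URL}".
sub namespace_designator {
  my ($url) = @_;
  return '{}' unless defined $url;
  return $designator_for{$url} if defined $designator_for{$url};
  return '{' . $url . '}';
}

my $entity = entity_line({
  name => 'logo', textContent => undef, publicId => undef,
  systemId => 'logo.png', notationName => 'png',
});
my $notation = notation_line({
  name => 'png', publicId => undef, systemId => 'image/png',
});
print "$entity\n";    # <!ENTITY logo "" "" "logo.png" png>
print "$notation\n";  # <!NOTATION png "" "image/png">
print namespace_designator('urn:x-suika-fam-cx:markup:suikawiki:0:9:'), "\n";  # sw
print namespace_designator('urn:example:unmapped'), "\n";  # {urn:example:unmapped}
print namespace_designator(undef), "\n";                   # {}
```

In the real module the lookup table is the %$Web::HTML::Dumper::NamespaceMapping hash described above; the local hash here only keeps the sketch free of dependencies.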
SEE ALSO
Parser tests - WHATWG Wiki <http://wiki.whatwg.org/wiki/Parser_tests>.
html5lib-tests <https://github.com/html5lib/html5lib-tests/tree/master/tree-construction>.
AUTHOR
Wakaba <wakaba@suikawiki.org>.
LICENSE
Copyright 2007-2013 Wakaba <wakaba@suikawiki.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.