The manakai project

Web::DOM

A Perl DOM implementation

SYNOPSIS

  use Web::DOM::Document;
  
  my $doc = Web::DOM::Document->new;
  my $el = $doc->create_element ('a');
  $el->set_attribute (href => 'http://www.whatwg.org/');
  $doc->append_child ($el);

DESCRIPTION

The Web::DOM modules are a pure-Perl DOM implementation. They implement various Web standard specifications, including the DOM Living Standard and the HTML Living Standard.

USAGE

The Web::DOM::Document module provides the new method, which returns a new document object. It corresponds to the new Document () constructor in the JavaScript environment of Web browsers.

  my $doc = Web::DOM::Document->new; # XML document by default
  $doc->manakai_is_html (1); # Change to HTML document

Using the document object, the application can create various DOM objects with the standard DOM methods:

  my $el = $doc->create_element ('p'); # HTML element
  my $el_ns = $doc->create_element_ns ($nsurl, $qname); # Namespaced element
  $el->set_attribute (class => 'hoge fuga');
  my $text = $doc->create_text_node ('text');
  my $comment = $doc->create_comment ('data');

Please note that DOM attributes and methods are available under perlish_underscored_names rather than domSpecificationsCamelCaseNames.
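
The naming rule can be approximated mechanically. The following is a minimal sketch only, not the normative mapping defined by the DOM Perl Binding specification. The helper name dom_name_to_perl is hypothetical and is not part of Web::DOM, and names with unusual capitalization might be mapped differently by the real modules.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Web::DOM): approximate the
# camelCase -> underscored_name conversion described above.
sub dom_name_to_perl {
  my $name = shift;
  # Insert "_" between a lowercase letter or digit and an uppercase
  # letter: innerHTML -> inner_HTML
  $name =~ s/([a-z0-9])([A-Z])/$1_$2/g;
  # Insert "_" between an acronym and a following capitalized word:
  # HTMLCollection -> HTML_Collection
  $name =~ s/([A-Z]+)([A-Z][a-z])/$1_$2/g;
  return lc $name;
}

printf "%s -> %s\n", $_, dom_name_to_perl ($_)
    for qw(innerHTML getElementById createElementNS);
```

Under these assumptions the three sample names become inner_html, get_element_by_id, and create_element_ns, matching the method names used elsewhere in this document.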

Alternatively, you can instantiate the document object from an HTML or XML string, using the DOMParser interface:

  my $parser = Web::DOM::Parser->new;
  my $html_doc = $parser->parse_from_string ($string, 'text/html');
  my $xml_doc = $parser->parse_from_string ($string, 'application/xhtml+xml');

Your favorite query methods are also available:

  $el = $doc->get_element_by_id ('site-logo');
  $el = $doc->query_selector ('article > p:first-child');
  $el = $doc->evaluate ('//div[child::p]', $doc)->iterate_next;
  $col = $doc->get_elements_by_tag_name ('p');
  $col = $doc->get_elements_by_class_name ('blog-entry');
  $col = $doc->images;

For more information, see the documentation of the relevant modules. For example, the methods available on the document object are listed in the Web::DOM::Document documentation. Frequently used modules include:

Web::DOM::Document

The Document interface.

Web::DOM::Element

The Element interface.

Web::DOM::Exception

The DOMException interface.

Web::DOM::HTMLCollection

The HTMLCollection interface.

Web::DOM::Parser

The DOMParser interface.

DOM MAPPING

The modules implement manakai's DOM Perl Binding specification <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%27s%20DOM%20Perl%20Binding>, which defines the mapping between WebIDL/DOM and Perl.

As a general rule, an object implementing the DOM interface I is an instance of the class Web::DOM::I or of one of its subclasses. However, applications should not rely on this, as the class inheritance hierarchy may differ from the interface hierarchy and may change in future revisions of the modules. In particular, applications should not test whether an object is an instance of an interface that is defined with the [NoInterfaceObject] extended attribute. For example, the ParentNode interface is defined with that extended attribute. The Web::DOM::Document class inherits from the Web::DOM::ParentNode class, as the Document interface implements the ParentNode interface according to the DOM Standard, but applications should not test $node->isa ('Web::DOM::ParentNode').

The constructor of a DOM interface, if any, is implemented as the new class method. For example, the constructor of the Document interface can be invoked by Web::DOM::Document->new.

Attributes, methods, and constants of a DOM interface are accessible as methods of the object implementing the interface. For example, the innerHTML attribute of the Element interface is accessible as the inner_html method of element objects. If the method corresponding to an attribute is invoked with no argument, it acts as the getter of the attribute. If the method is invoked with an argument, it acts as the setter of the attribute.

  $string_returned_by_getter = $el->inner_html;
  $el->inner_html ($string_received_by_setter);
  
  $string_returned_by_method = $el->get_attribute ($string);
  
  $el->node_type == $el->ELEMENT_NODE;

Some objects accept array operations:

  @children = @{$el->child_nodes};
  $length = @{$el->child_nodes};
  
  $first_child = $el->child_nodes->[0];
  $second_child = $el->child_nodes->[1];
  $second_last_child = $el->child_nodes->[-2];
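
The array operations shown above rely on Perl's dereference overloading. The following is a minimal sketch of that general technique, assuming a toy list class; My::ToyList is hypothetical, is not part of Web::DOM, and does not reflect how the modules are actually implemented.

```perl
use strict;
use warnings;

# My::ToyList is a hypothetical class for illustration only.
package My::ToyList;

# Overload array dereferencing so that @$list and $list->[$i] work.
# A copy is returned, so writes through the array do not change the list
# (an assumption of this sketch; the real modules may behave differently).
use overload '@{}' => sub { [ @{ $_[0]->{items} } ] }, fallback => 1;

sub new {
  my ($class, @items) = @_;
  return bless { items => [@items] }, $class;
}

package main;

my $list = My::ToyList->new ('a', 'b', 'c');
my @all = @$list;               # all items, as a plain Perl list
my $count = @$list;             # number of items (scalar context)
my $second_last = $list->[-2];  # negative indexes count from the end

print "$count items; second last is $second_last\n";
```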

CONSTRUCTORS

The following classes have a constructor (i.e. the new method):

CONSTANTS

The following modules export constants when they are loaded with the use statement:

NOTE ON PRIVATE METHODS

Some classes contain private methods and variables. Applications must not invoke or use them. As a general rule, methods whose names start with _ are private, although there might be exceptions (e.g. the _manakai_border_spacing_x method, which reflects the CSS -manakai-border-spacing-x property, is not a private method). Anything other than the following is private and should not be used:

DOM APIs as documented in relevant pod documentation

For example, Web::DOM::Node::child_nodes, Web::DOM::Implementation::create_document, Web::DOM::Event::new, and Web::DOM::Node::ELEMENT_NODE are explicitly mentioned in their pod section.

Perl standard operations

For example, the can and isa methods of any object, the "" and 0+ operations of any object, the $Web::DOM::Document::VERSION variable, and the use Web::DOM::Node operation (which implicitly invokes the Web::DOM::Node::import method).

Applications can also rely on the isa method with a class name derived from the name of a DOM interface whose definition does not contain [NoInterfaceObject]. For example, $object->isa ('Web::DOM::Node') does (and will) work as intended, while $object->isa ('Web::DOM::CanvasPathMethod') (defined with [NoInterfaceObject]) or $object->isa ('Web::DOM::StringArray') (not derived from a DOM interface name) might not. However, it is not considered good practice to compare objects by their class names in sophisticated object-oriented programs.
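
Where a class-name test is discouraged, an application can check for the method it needs instead, using only the can method listed among the Perl standard operations above. The following is a minimal sketch of that approach; the provides_method helper and the My::ToyDocument class are hypothetical stand-ins and are not part of Web::DOM.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: true if $obj is an object providing $method.
sub provides_method {
  my ($obj, $method) = @_;
  return (blessed ($obj) && $obj->can ($method)) ? 1 : 0;
}

# My::ToyDocument is a stand-in for illustration, not a Web::DOM class.
package My::ToyDocument;
sub new { return bless {}, shift }
sub query_selector { return undef }

package main;

my $doc = My::ToyDocument->new;
my $doc_ok = provides_method ($doc, 'query_selector');
my $str_ok = provides_method ('plain string', 'query_selector');
print "document: $doc_ok, string: $str_ok\n";
```

This avoids depending on the class inheritance hierarchy, which the text above notes may change between revisions.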

Public APIs are not intended to be changed in backward-incompatible ways in later stages of the development of these modules unless it is really necessary for some significant reason (e.g. security concerns, or to resolve spec compatibility issues). Anything else could be changed, including the package/file mapping of classes which do not provide constructors or constants.

SPECIFICATIONS

Specifications defining features supported by the modules include:

DOM

DOM Standard <http://dom.spec.whatwg.org/>.

DOMPARSING

DOM Parsing and Serialization Standard <http://domparsing.spec.whatwg.org/>.

DOM3CORE

Document Object Model (DOM) Level 3 Core Specification <http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/DOM3-Core.html>.

DOMXPATH

Document Object Model XPath <http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html>.

HTML

HTML Standard <http://www.whatwg.org/specs/web-apps/current-work/>.

DOMDTDEF

DOM Document Type Definitions <http://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef>.

DOMPERL

manakai's DOM Perl Binding <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%27s%20DOM%20Perl%20Binding>.

MANAKAI

manakai DOM Extensions <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%20DOM%20Extensions>.

For the complete list of relevant specifications, see the documentation of the modules.

DEPENDENCY

The modules require Perl 5.10 or later.

The following features require the perl-web-markup package <https://github.com/manakai/perl-web-markup>: inner_html, outer_html, insert_adjacent_html, DOMParser, and XMLSerializer (Web::HTML::Parser and its family); XPathEvaluator and XPathExpression (Web::XPath::Parser and related modules); and manakai_get_properties (Web::HTML::Microdata).

The following features require the perl-web-css package <https://github.com/manakai/perl-web-css>: query_selector, query_selector_all, CSSStyleSheet, CSSRule and its subclasses, and CSSStyleDeclaration.

The following features require the perl-web-encodings package <https://github.com/manakai/perl-web-encodings>: the setter of the input_encoding method of Document and Entity.

Features performing URL-related operations require the perl-web-url package <https://github.com/manakai/perl-web-url>, which depends on the perl-web-encodings package <https://github.com/manakai/perl-web-encodings>. Such features include: base_uri, manakai_set_url, manakai_entity_uri, manakai_entity_base_uri, declaration_base_uri, manakai_declaration_base_uri, action, cite, codebase, data, formaction, href, longdesc, object, ping, poster, and src.

The following features require modules in the perl-web-datetime package <https://github.com/manakai/perl-web-datetime>: value of Web::DOM::AtomDateConstruct, create_atom_feed_document, create_atom_entry_element, updated_element, and published_element.

Using CSS, Selectors, and Media Queries

How CSS style sheets are parsed and what the resulting CSSOM tree looks like depend on which CSS features are supported by the user agent. Since the web-dom module set by itself is not a rendering engine, most CSS features are considered "not supported", so by default parsing discards most CSS declarations. If you'd like to build a CSS-based application on top of the web-dom module set, you should turn on the features you support through the Web::CSS::MediaResolver module in the web-css package. The Web::CSS::MediaResolver object for a document's CSS parser can be accessed like this:

  use Web::CSS::Parser;
  my $parser = Web::CSS::Parser->get_parser_for_document ($doc);
  my $resolver = $parser->media_resolver;

... where $doc is the document node with which the CSS style sheet in question will be associated. Then, you can set the "supported" flag of each feature you support, like this:

  $resolver->{prop}->{display} = 1;
  $resolver->{prop_value}->{display}->{block} = 1;

For more information on the usage of the resolver, see Web::CSS::MediaResolver in the web-css package.

DEVELOPMENT

The latest version of the modules is available from the GitHub repository: <https://github.com/manakai/perl-web-dom>.

Test results can be reviewed at: <https://travis-ci.org/manakai/perl-web-dom>.

HISTORY

The manakai project has developed several generations of DOM implementations. The current DOM3 implementation <https://github.com/wakaba/manakai/tree/master/lib/Message/DOM> had been under development since 2007.

The Web::DOM modules have been developed as a replacement for those modules, supporting the current DOM Standard. They do not reuse most of the code of the older implementation, and many useless DOM3 features are not implemented. However, they do implement some DOM3 features that are really necessary for backward compatibility, as well as non-standard manakai extensions. It should be possible for applications using the old implementation to migrate to the new implementation by just replacing class names and the like.

Obsolete features

The following features, fully or partially implemented in previous versions of the manakai DOM implementations, are considered obsolete and will not be implemented by these modules unless they are reintroduced by some DOM specification or found to be necessary for backward compatibility:

DOMImplementationRegistry, DOMImplementationSource, DOMImplementationList, DOM features, DOMStringList, StringExtended, read-only nodes, EntityReference, CDATASection, replaceWholeText, isElementContentWhitespace, specified setter, hasReplacementTree setter, DOM3 configuration parameters, configuration parameters for DOM3 spec compatible DTD-based node operations, DOM3 DOMError, DOM Standard DOMError, DOMErrorHandler, UserDataHandler, DOMLocator, isId and family, internalSubset, TypeInfo and schemaTypeInfo, DOM3 LS, namespaces for DOM3 events, EventException, MutationEvent, MutationNameEvent, TextEvent, DocumentEvent->canDispatch, DocumentType->implementation, Document->createXHTMLDocument, URIReference, InternetMediaType, MANAKAI_FILTER_OPAQUE, Document->manakaiCreateSerialWalker, SerialWalker.

HTMLElement->irrelevant, HTMLAnchorElement->media, HTMLAreaElement->media, HTMLCommandElement, HTMLDataGridElement, HTMLEventSourceElement, HTMLIsIndexElement, HTMLLegendElement->form, HTMLMenuElement->autosubmit, HTMLBlockquoteElement, HTMLStrictlyInlineContainerExtended, HTMLStructuredInlineContainerExtended, HTMLSectioningElementExtended, HTMLListElementExtended, HTMLDListElementExtended, CSSStyleDeclaration->styleFloat.

Overloaded operators ==, !=, and .=, write operations through overloaded @{} and %{} operators for NodeList, NamedNodeMap, and HTMLCollection.

Attr, Entity, and AttributeDefinition nodes can no longer contain Text nodes.

By default, the DocumentType node can no longer contain ProcessingInstruction nodes as children. The old behavior can be restored by setting the manakai-allow-doctype-children configuration parameter to a true value (see Web::DOM::Configuration).

The strict_error_checking attribute no longer disables random exceptions as defined in the DOM3 specification; its scope is formally defined in the manakai DOM Extensions specification [MANAKAI].

TODO

The initial milestone of the project is to reimplement the subset of DOM supported by manakai's original DOM implementation <https://github.com/wakaba/manakai/tree/master/lib/Message/DOM>, except for obsolete features. The following features will be (re)implemented in due course:

CSSOM Cascading API

getComputedStyle [CSSOM], Element.prototype.manakaiComputedStyle, Window.prototype.manakaiGetComputedStyle, Window.prototype.setDocument [MANAKAI]

WebVTT DOM [HTML] [WEBVTT]

More features not supported by previous versions of the manakai DOM implementation are expected to be implemented as well, including but not limited to:

HTMLFormControlsCollection, HTMLOptionsCollection [HTML]
Mutation observers [DOM]
Selectors API Level 2 features
DocumentStyle API [CSSOM]
<?xml-stylesheet?> API [CSSOM]
@font-face, @page [CSSOM]
SVGElement->style [CSSOM]
GetStyleUtils, PseudoElement [CSSOM]
New mutation methods [DOM]

prepend, append, before, after, replace, remove

DOM Ranges

DOM Ranges interfaces and methods [DOM]; Ranges support in DOM Core methods and attributes [DOM]; Range.prototype.createContextualFragment [DOMPARSING].

Shadow DOM [DOM]
Custom Elements [DOM, HTML]

In addition, the source code of the modules includes many "XXX" markers, indicating TODO items.

Middle priority: URL; Encoding; Promise.

Lower priority: Form API; HTMLMediaElement and related interfaces; Canvas; The ImageBitmap interface; The Screen interface; SVG; DnD; The RelatedEvent interface; The Window interface and related interfaces; The History interface and related interfaces; The Location interface; The Navigator interface and related interfaces; Scripting; Workers; Console; XHR; EventSource; WebSocket; postMessage and related interfaces; Storage; IndexedDB; Fullscreen; Notifications. JS-compatible Date, JSON objects.

Very low priority: Zip; XSLT 1.0.

At the time of writing, there is no plan to implement the properties attribute of the HTMLElement interface (the manakaiGetProperties method is implemented instead).

LIMITATIONS

Methods returning an index or position in some list or string, whose IDL type is a number type, do not convert the value as specified by the WebIDL specification and the DOM Perl Binding specification. This should not be a problem, as it is not realistic to have lists of items whose length is greater than, or nearly equal to, 2**31, either in Perl's runtime environment or in real-world use cases.

Although the modules implement APIs as used in the Web platform, they do not support the Web's security model, i.e. the same-origin policy, as it does not make sense for Perl applications.

AUTHOR

Wakaba <wakaba@suikawiki.org>.

LICENSE

Copyright 2007-2019 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.