DOM Tree Validation

The manakai project, 6 May 2018

Latest version
https://manakai.github.io/spec-dom/validation-langs

Status of This Document

This document is a technical specification produced as part of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time.

The scope of this specification is limited to the products within the manakai project. It is not intended to be implemented by multiple parties, although nothing prevents it from being implemented by other DOM implementations.

Table of contents

  1 Introduction
    1.1 Scope
    1.2 History
  2 Infrastructure
  3 General rules of validation
    3.1 Interpretation of old specifications
    3.2 Validation of obsolete features
    3.3 Unfamiliar features
    3.4 Limited use features
  4 Validation of superglobal attributes
    4.1 Validation of XML attributes
  5 Validation of feed namespaces
    5.1 RSS1 elements
    5.2 RSS2 elements
      5.2.1 Media RSS elements
      5.2.2 iTunes elements
    5.3 Atom elements
      5.3.1 GData elements
  6 Validation of OGP
  7 Validation of templates
    7.1 Validation of XSLT
  8 Data
  Author

1 Introduction

This section is non-normative.

This specification defines details of DOM tree validation not covered by other applicable specifications.

1.1 Scope

This specification's goals are to:

... for validators produced by the manakai project.

The obsvocab specification might be merged with this specification in due course.

1.2 History

This specification was originally published at <http://suika.suikawiki.org/www/markup/xml/validation-langs> as Handling of unknown namespaces in conformance checking since .

Some of the text in this specification was originally part of comments and documentation of the Perl modules developed by the manakai project.

Earlier versions of this specification had defined validation mode, which was an attempt to define interaction of RDF/XML, RSS1, RSS2, and XSLT vocabularies with the rest of the platform in validation. The concept was abandoned because it introduced much complexity into the specification and validators.

Earlier versions of this specification had contained non-normative descriptions of how to validate HTML script and style elements. As definitions of these elements were slightly simplified in the HTML Standard, these descriptions were removed.

Earlier versions of this specification had defined RDF/XML integration, which was dropped due to lack of interest.

2 Infrastructure

This specification depends on the Infra Standard. The terms list, code point, concatenate, ASCII case-insensitive, HTML namespace, XML namespace, and XMLNS namespace are defined by the Infra Standard.

The term label is defined by the Encoding Standard.

The terms valid URL string, absolute-URL-with-fragment string, and relative-URL-with-fragment string are defined by the URL Standard.

The terms MIME type and valid MIME type are defined by the MIME Sniffing Standard.

The term valid language tag is defined by BCP 47.

3 General rules of validation

For the purpose of this specification, user agents are conformance checkers, also known as validators.

User agents MUST implement the DOM Standard. Terms tree, child, parent, ancestor, descendant, node, node document, document, content type, document element, element, attribute list, attribute, value, namespace (of element or attribute), prefix (of element or attribute), local name (of element or attribute), comment (Comment), processing instruction (ProcessingInstruction), target, template content, child text content, shadow root, and shadow tree are defined by the DOM Standard.

User agents MUST implement the HTML Standard. Terms applicable specification, willful violation, document base URL, content model, inter-element whitespace, nothing, valid non-negative integer, valid floating-point number, valid global date and time string, valid e-mail address, parse a URL, HTML element, div, and parse error are defined by the HTML Standard.

An element or attribute is in no namespace if its namespace is null. A non-null namespace is a namespace that is not null.


Unless otherwise specified, an element MAY have child comments and processing instructions. In addition, unless otherwise specified, an element whose content model is not a text (with or without additional constraints) MAY have child inter-element whitespaces. They are ignored for the purpose of content model validation of the element.

The shadow root's content model is flow content.

3.1 Interpretation of old specifications

This subsection defines how to interpret old unmaintained specifications for the purpose of validation.

These are sometimes willful violations of the relevant specifications.

When a specification is written without clear statements of requirements, elements and attributes in the namespace and RSS2 elements defined by the specification MUST NOT conflict with the descriptions in that specification. Elements and attributes in the namespace and RSS2 elements not defined by any specification MUST NOT be used. Deprecated features SHOULD NOT be used.

Features that are "reserved" cannot be used unless they are defined by a later specification.

When a value has to match a production rule, it MUST also conform to other requirements for that construct.

When an element's content or an attribute is defined to be one of the following constructs, the element's child text content or the attribute value MUST conform to the corresponding construct instead:

A text that SHOULD be interpreted as plain text

A text.

A char

A code point

A number
A number value
A count
A value in pixels
A value in bytes
A non-negative integer
A non-zero integer
A kilobits per second rate
A height
A width

A valid non-negative integer.

An RFC 1766 language tag
An RFC 3066 language tag
A BCP 47 language identifier

A valid language tag.

A URL (without referencing the latest URL Standard)
A url
A URL with a scheme defined by the IANA Registry of URI Schemes
A URI
A uri-reference
An RFC 3986 URI reference
An IRI
An IRI reference

A valid URL string.

If IRI and IRI reference are distinguished and it is identified as an IRI, it MUST also be an absolute-URL-with-fragment string.

To interpret a string, the rules to parse a URL MUST be used with the node document of the node in which the string appears.

The XML Base specification and the xml:base attribute are obsolete.

A relative URI reference

A valid URL string that is a relative-URL-with-fragment string.

A media type
A MIME media type
A standard MIME type

A valid MIME type.

A media type (MIME content type); the charset parameter should not be specified explicitly

A valid MIME type with no charset parameter.

A charset registered with IANA or start with X-

A label

An addr-spec
An e-mail address
An email address

A valid e-mail address.

An email address (in the RSS2 specifications)

A string in the RECOMMENDED format for e-mail addresses.

An id

Same as the id attribute in no namespace of HTML elements

HTML
HTML markup
An HTML 4.01 markup that SHOULD be such that it could validly appear directly within an HTML div element
A text suitable for presentation as HTML

An HTML fragment content.

When an element has to be an XHTML Modularization div element that SHOULD be suitable for handling as XHTML, it MUST be a div element in the HTML namespace.

When an element has HTML fragment content, setting its child text content to innerHTML of a new div element in the HTML namespace MUST NOT generate a parse error or create a non-conforming tree.
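This example is non-normative. The mapping in this subsection could be held in a lookup table, as in the following Python sketch. The table name, the function names, and the choice of rows are illustrative assumptions; only a few of the constructs listed above are shown. The integer check follows the HTML Standard's definition of a valid non-negative integer (one or more ASCII digits).

```python
import re
from typing import Optional

# Hypothetical table: phrases used by old specifications, mapped to the
# construct that this subsection requires instead. Only some rows are shown.
MODERN_CONSTRUCT = {
    "a number": "valid non-negative integer",
    "a count": "valid non-negative integer",
    "a value in pixels": "valid non-negative integer",
    "an rfc 3066 language tag": "valid language tag",
    "a uri": "valid URL string",
    "an iri reference": "valid URL string",
    "a mime media type": "valid MIME type",
    "an e-mail address": "valid e-mail address",
}


def modern_construct(old_phrase: str) -> Optional[str]:
    """Return the construct to validate against, or None if not in the table."""
    return MODERN_CONSTRUCT.get(old_phrase.strip().lower())


def is_valid_non_negative_integer(value: str) -> bool:
    """One or more ASCII digits, with no sign and no surrounding whitespace."""
    return re.fullmatch(r"[0-9]+", value) is not None
```

With this table, a "count" or a "value in pixels" in an old specification is checked with is_valid_non_negative_integer, so a value such as -1 is rejected.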

3.2 Validation of obsolete features

Validators are expected to detect obsolete features and show errors and alternative features, if known, to improve authoring experiences.

When a specification is identified as obsolete, the features, including but not limited to the elements and attributes, defined by the specification MUST NOT be used unless otherwise specified by another specification. User agents MAY ignore requirements in the specification. The namespaces defined by an obsolete specification are obsolete unless otherwise specified by another specification.

When a namespace is identified as obsolete, the elements and attributes in the namespace, as well as attributes in no namespace, MUST NOT be used unless otherwise specified by another specification.

The following namespaces are obsolete:

Though obsolete, some attributes in the http://www.w3.org/1999/xlink namespace are allowed to be specified for SVG elements.

These were once implemented by some Web user agents or used on the Web but are no longer considered part of the Web platform. Use of them could be an authoring error.

3.3 Unfamiliar features

There are many protocol or language features that are extensible. A user agent might or might not know how to validate a feature's instance. Therefore, the answer to a question "Is this instance conforming?" can be yes, no, or unfamiliar.

When a user agent is unfamiliar with a feature, it MUST report the validation result as unfamiliar.
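This example is non-normative. The three possible answers can be modeled as an enumeration, as in the following Python sketch. The names Result and validate_attribute, and the idea of a registry of checkers keyed by namespace and local name, are assumptions made for illustration only.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Result(Enum):
    CONFORMING = "yes"
    NON_CONFORMING = "no"
    UNFAMILIAR = "unfamiliar"


# Hypothetical registry: (namespace or None, local name) -> value checker.
Checkers = Dict[Tuple[Optional[str], str], Callable[[str], bool]]


def validate_attribute(checkers: Checkers, namespace: Optional[str],
                       local_name: str, value: str) -> Result:
    """Report UNFAMILIAR when no checker is known for the attribute."""
    checker = checkers.get((namespace, local_name))
    if checker is None:
        return Result.UNFAMILIAR
    return Result.CONFORMING if checker(value) else Result.NON_CONFORMING
```

Returning a distinct value for the unfamiliar case keeps "not checked" separate from "checked and found conforming".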


A user agent either fully supports, partially supports, or does not support a non-null namespace.

A non-null namespace is fully supported if all of its elements and attributes are defined by applicable specifications and all of them are somewhat implemented.

The HTML Standard defines that "Authors MUST NOT use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications". Use of elements or attributes in a fully supported namespace is not allowed unless they are specified in applicable specifications.

A non-null namespace is partially supported if its elements and attributes are defined by applicable specifications and some of them are implemented.

A non-null namespace is not supported if its elements and attributes are not implemented. A user agent that implements RSS1 but does not implement RDF/XML other than as part of RSS1 support does support the RSS namespace fully or partially, but it does not support the RDF namespace.

Whether a namespace is fully supported, partially supported, or not supported is orthogonal to whether its elements and attributes are unknown. A user agent might not support the validation of an attribute value of a fully supported namespace.


Unknown elements are following elements:

Unknown attributes are following attributes:

Unknown elements, attributes in no namespace of unknown elements, other than superglobal attributes, and unknown attributes are unfamiliar features.


An unknown element or unknown attribute MUST NOT be used anywhere except where they are explicitly allowed.

An unknown element MAY be used as the document element or as an orphan node. It MAY also be used where any kind of element is allowed.

An unknown element MAY have any attribute in no namespace or any unknown attribute.

In addition, any attribute allowed by an applicable specification is allowed to be specified for an unknown element.

An unknown element MAY have any kind of child.

An unknown attribute MAY have any value. An attribute in no namespace of an unknown element MAY have any value.

Validators are expected to report errors or warnings on unknown elements and attributes useful for authors.

This specification is not intended to override any other specification's requirements.

For a public Web document, a non-standard element that is not defined by any applicable specification ought to be reported as an error.

In the following example, as the bookmark element in the http://mybookmark.example/ namespace is not defined by any applicable specification, this fragment is non-conforming:

<div xmlns="http://www.w3.org/1999/xhtml">
  <bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>

For XML data that is not expected to be directly shown to the user (e.g. XML data retrieved via XMLHttpRequest), use of null or non-standard namespaces ought not to be an error.

For example, the following document fragment should not be considered non-conforming, even though none of the data and item elements and the name attribute is defined by any public standard:

<data>
  <item name="x1"/>
  <item name="x2"><p xmlns="http://www.w3.org/1999/xhtml">Hi!</p></item>
</data>

If the p element contained an item element in no namespace, it ought to be reported as an error, as no standard defines the item element in no namespace as phrasing content.

If there is an element in the http://www.w3.org/2000/svg/ namespace, an error or warning ought to be reported. It is likely an authoring error.
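This example is non-normative. A validator could keep a small table of namespace URLs that are likely to be mistakes, as in the following Python sketch. The table and function names are hypothetical, and the suggested correction (the SVG namespace without the trailing slash) is an assumption about what the author intended.

```python
from typing import Optional

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical table: suspicious namespace URL -> probably intended namespace.
SUSPICIOUS_NAMESPACES = {SVG_NS + "/": SVG_NS}


def namespace_warning(namespace: Optional[str]) -> Optional[str]:
    """Return a warning message for a likely mistyped namespace, else None."""
    intended = SUSPICIOUS_NAMESPACES.get(namespace)
    if intended is None:
        return None
    return f"namespace {namespace!r} is probably a mistake for {intended!r}"
```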


Unknown processing instructions are processing instructions whose target does not begin with xml- (ASCII case-insensitive). Unknown processing instructions are unfamiliar features.

Processing instructions whose target begins with xml- are reserved.
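This example is non-normative. The test on the processing instruction target can be sketched in Python as follows. The function names are illustrative. The comparison lowercases only A to Z, so that it stays ASCII case-insensitive rather than relying on Unicode case mapping.

```python
# Translation table that lowercases only the ASCII letters A-Z.
_ASCII_LOWER = {code: code + 32 for code in range(ord("A"), ord("Z") + 1)}


def ascii_lowercase(text: str) -> str:
    return text.translate(_ASCII_LOWER)


def is_unknown_pi_target(target: str) -> bool:
    """True if the target does not begin with "xml-" (ASCII case-insensitive)."""
    return not ascii_lowercase(target).startswith("xml-")
```

For example, xml-stylesheet and XML-Stylesheet are both reserved targets, while php is the target of an unknown processing instruction.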

3.4 Limited use features

Several features are intended for limited use, i.e. they are not expected to be implemented universally. Such features still have legitimate uses, e.g. used internally during preparation of public documents, used as data formats of particular applications, served as data to be processed by public applications, and so on. Use of limited use features is not invalid. However, use of such features in a public context could be problematic. At user option, user agents MAY report an error when an instance of a limited use feature is detected.

An unprocessed internal vocabulary might be erroneously exposed within a public document when the authoring pipeline is partially broken. A validator with the option enabled can detect this problem.

Following features are intended for limited use:

4 Validation of superglobal attributes

Unless otherwise specified, attributes class, id, and slot in no namespace MUST conform to the requirements for the attribute with the same name for an HTML element.

This does not mean that use of them is conforming.

4.1 Validation of XML attributes

User agents MUST implement the XML specification and the XML Namespaces specification. The XML specification is Extensible Markup Language (XML) 1.0 and Errata in REC-xml-20081126. The xml:lang and xml:space attributes are defined by the XML specification. The XML Namespaces specification is Namespaces in XML 1.0 and Namespaces in XML 1.0 (Third Edition) Errata.

Since user agents interpret xml:lang and xml:space attributes defined by the XML specification according to the XML Namespaces specification, they are considered as lang and space attributes in the XML namespace, not xml:lang and xml:space attributes in no namespace.

The XML Base and xml:id specifications are obsolete.

Therefore use of xml:base and xml:id attributes is non-conforming.

For the purpose of validation of the base element in the HTML namespace, the attributes in the XMLNS namespace are not considered as taking URLs.

Base URLs are not applied to them.

Elements and attributes whose prefix or local name begins with xml (ASCII case-insensitive) MUST NOT be considered as reserved as required by the XML Namespaces specification.

This is a willful violation of the XML Namespaces specification for consistency with the XML specification.

5 Validation of feed namespaces

If an element is a unique property element and the element's parent is not null, the element's parent MUST NOT have another child element with the same namespace and local name.
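This example is non-normative. The uniqueness rule can be checked by counting children per namespace and local name, as in the following Python sketch. It assumes the tree is held in xml.etree.ElementTree, where an element's tag already combines namespace and local name as "{namespace}local". The function name and the is_unique_property callback are hypothetical; which elements are unique property elements is decided elsewhere in this specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable, List


def duplicated_unique_children(
        parent: ET.Element,
        is_unique_property: Callable[[ET.Element], bool]) -> List[str]:
    """Return the tags of unique property children that occur more than once."""
    counts = Counter(child.tag for child in parent if is_unique_property(child))
    return sorted(tag for tag, count in counts.items() if count > 1)


# Usage with a made-up namespace whose children are treated as unique.
channel = ET.fromstring(
    '<channel xmlns:x="urn:example:x">'
    "<x:author>a</x:author><x:author>b</x:author><title>t</title>"
    "</channel>")
duplicates = duplicated_unique_children(
    channel, lambda el: el.tag.startswith("{urn:example:x}"))
```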

5.1 RSS1 elements

User agents MUST implement the RSS1 specification. The RSS1 specification is RDF Site Summary (RSS) 1.0. The RDF namespace is http://www.w3.org/1999/02/22-rdf-syntax-ns#. The RSS namespace is http://purl.org/rss/1.0/. The Dublin Core namespace is http://purl.org/dc/elements/1.1/.

User agents MUST implement the RSS1 content specification. The RSS1 content specification is RDF Site Summary 1.0 Modules: Content. The RSS content namespace is http://purl.org/rss/1.0/modules/content/. The RSS1 content specification is obsolete except for the encoded element in the RSS content namespace.

The encoded element in the RSS content namespace, which is only shown as a "draft" in the specification, is de facto part of RSS1, while other "formal" elements in that specification have not been used at all.


An element or attribute in the RDF namespace is either RSS1 RDF element or attribute or RDF/XML RDF element or attribute. An RSS1 RDF element or attribute MUST be validated against the requirements for RSS1. An RDF/XML RDF element or attribute MUST be validated against the requirements for RDF/XML. If the user agent does not support RDF/XML, RDF/XML RDF elements and attributes are unknown RDF elements and attributes.

An element is an RSS 1.0 rdf:RDF element if:

At user option, user agents can validate a document as an RSS1 document (ignoring its original context) by overriding its content type to application/rss+xml.

If an RDF element in the RDF namespace is an RSS 1.0 rdf:RDF element, it is an RSS1 RDF element.

If a Seq element in the RDF namespace is a child of an items element in the RSS namespace, it is an RSS1 RDF element.

If an li element in the RDF namespace is a child of a Seq element in the RDF namespace that is an RSS1 RDF element, it is an RSS1 RDF element.

If an about or resource attribute in the RSS namespace is specified for an RSS1 RDF element, it is an RSS1 RDF attribute. A user agent that does not support RDF/XML MAY treat other about and resource attributes in the RSS namespace as RSS1 RDF attributes.

All other elements and attributes in the RDF namespace are RDF/XML RDF elements and attributes.
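This example is non-normative. The element classification rules above can be sketched in Python with xml.etree.ElementTree as follows. Whether the root is an RSS 1.0 rdf:RDF element is passed in as a flag rather than computed, and attributes are not handled. The function name is an illustrative assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS = "{http://purl.org/rss/1.0/}"


def rss1_rdf_elements(root: ET.Element, root_is_rss1_rdf: bool) -> List[ET.Element]:
    """Collect RDF-namespace elements that are RSS1 RDF elements."""
    found = []
    if root.tag == RDF + "RDF" and root_is_rss1_rdf:
        found.append(root)
    for items in root.iter(RSS + "items"):
        # Only rdf:Seq children of rss:items, and rdf:li children of those.
        for seq in items.findall(RDF + "Seq"):
            found.append(seq)
            found.extend(seq.findall(RDF + "li"))
    return found


doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns="http://purl.org/rss/1.0/">'
    "<channel><items><rdf:Seq><rdf:li/></rdf:Seq></items></channel>"
    "<rdf:Bag/></rdf:RDF>")
rss1_tags = [el.tag for el in rss1_rdf_elements(doc, True)]
```

In this sketch the rdf:Bag element is not collected, so it would be treated as an RDF/XML RDF element.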

Though the RSS1 specification references the RDF/XML specification, a user agent can implement the validation by just implementing the requirements in the RSS1 specification and this specification, ignoring the RDF/XML specification.


A channel or item element in the RSS namespace MAY have child unknown elements as long as all of the following conditions are met:

Children unknown elements of a channel or item element in the RSS namespace are unique property elements.


A description element in the RSS namespace whose parent is an item element in the RSS namespace has an HTML fragment content. An encoded element in the RSS content namespace has an HTML fragment content.

A channel element in the RSS namespace MAY have children link elements in the Atom namespace.

5.2 RSS2 elements

User agents MUST implement the RSS2 specifications. The RSS2 specifications are RSS 2.0 and Really Simple Syndication Best Practices Profile.

A document is an RSS2 document if its document element is an rss element in no namespace.

RSS 0.91, 0.92, 0.93, and 0.94 documents are RSS2 documents.

An element is an RSS2 element if it is in no namespace and its node document is an RSS2 document. An RSS2 channel element is an RSS2 element whose local name is channel. An RSS2 item element is an RSS2 element whose local name is item. RSS2 elements MUST be validated against the requirements for RSS2.
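This example is non-normative. With xml.etree.ElementTree, an element in no namespace has a tag without the "{namespace}" prefix, so the definitions above can be sketched in Python as follows. The function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List


def is_rss2_document(root: ET.Element) -> bool:
    """True if the document element is an rss element in no namespace."""
    return root.tag == "rss"


def rss2_elements(root: ET.Element) -> List[ET.Element]:
    """Elements in no namespace whose document is an RSS2 document."""
    if not is_rss2_document(root):
        return []
    return [el for el in root.iter()
            if isinstance(el.tag, str) and not el.tag.startswith("{")]


feed = ET.fromstring(
    '<rss version="2.0"><channel><title>t</title>'
    '<a:link xmlns:a="http://www.w3.org/2005/Atom"/></channel></rss>')
rss2_tags = [el.tag for el in rss2_elements(feed)]
```

The link element in the Atom namespace is skipped because it is not in no namespace.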


An RSS2 item element MAY have at most one child updated element in the Atom namespace.

These are unique property elements:

An RSS2 item element or an RSS2 channel element MAY have children unknown elements.

5.2.1 Media RSS elements

The Media RSS namespace is http://search.yahoo.com/mrss/ or http://search.yahoo.com/mrss. These are synonyms. The latter namespace is obsolete.

The namespace URL without the trailing slash has been used for historical reasons. It is non-conforming.
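This example is non-normative. A validator might resolve the obsolete synonym to the Media RSS namespace while remembering that an error has to be reported, as in the following Python sketch. The table and function names are hypothetical.

```python
from typing import Tuple

MEDIA_RSS_NS = "http://search.yahoo.com/mrss/"

# Obsolete spelling -> namespace it is a synonym of.
OBSOLETE_SYNONYMS = {"http://search.yahoo.com/mrss": MEDIA_RSS_NS}


def canonical_namespace(namespace: str) -> Tuple[str, bool]:
    """Return (namespace to validate against, whether the spelling is obsolete)."""
    if namespace in OBSOLETE_SYNONYMS:
        return OBSOLETE_SYNONYMS[namespace], True
    return namespace, False
```

Elements in either namespace can then be validated with the same rules, with an additional error for the obsolete spelling.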

These are unique property elements:

The average attribute value of a starRating element in the Media RSS namespace MUST be a valid floating-point number.

The max attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.

The min attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.
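This example is non-normative. The three starRating requirements can be sketched in Python with regular expressions that follow the HTML Standard's definitions of valid floating-point number and valid non-negative integer. The function name is an illustrative assumption, and the sketch only checks attributes that are present; it does not decide whether any of them is required.

```python
import re
from typing import Dict, List

# Optional "-", digits and/or "." digits, optional exponent.
_FLOAT = re.compile(r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?")
# One or more ASCII digits.
_UINT = re.compile(r"[0-9]+")


def star_rating_errors(attrs: Dict[str, str]) -> List[str]:
    """Return the names of starRating attributes whose values are invalid."""
    errors = []
    if "average" in attrs and not _FLOAT.fullmatch(attrs["average"]):
        errors.append("average")
    for name in ("max", "min"):
        if name in attrs and not _UINT.fullmatch(attrs[name]):
            errors.append(name)
    return errors
```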

5.2.2 iTunes elements

The iTunes namespace is http://www.itunes.com/dtds/podcast-1.0.dtd or http://www.itunes.com/DTDs/Podcast-1.0.dtd. These are synonyms. The latter namespace is obsolete.

Children elements in the iTunes namespace of an RSS2 channel element or an RSS2 item element are unique property elements.

The content model of the image element in the iTunes namespace is nothing.

5.3 Atom elements

User agents MUST implement the Atom specification. The Atom specification is RFC 4287.

Atom and its extension specifications allow extensions such that almost everything is allowed, which is not useful for validators. This specification defines stricter restrictions for the purpose of validation.

The Atom namespace is http://www.w3.org/2005/Atom. The Atom 0.3 namespace is http://purl.org/atom/ns#. The Atom Threading namespace is http://purl.org/syndication/thread/1.0. The Atom Feed Paging and Archiving namespace is http://purl.org/syndication/history/1.0. Atom family namespaces are the Atom 0.3 namespace, the Atom namespace, the Atom Threading namespace, the Atom Feed Paging and Archiving namespace, http://www.w3.org/2007/app, and http://purl.org/atompub/tombstones/1.0. An Atom family element is an element in one of Atom family namespaces.

The terms Date construct and Person construct are defined by Atom 1.0 specification.


Elements and attributes MUST conform to the constraints expressed in the RELAX NG schema fragments in the applicable specifications.

For an Atom family element, an attribute or child that is not explicitly allowed by an applicable specification MUST NOT be used.

Atom extensible elements are following elements:

An Atom extensible element MAY have children unknown elements.

These are unique property elements:

A content element in the Atom 0.3 namespace with a type attribute whose value is text/html (ASCII case-insensitive) has an HTML fragment content.

need to define <atom:content> content validation when type is a MIME type


An entry element in the Atom namespace MAY have children group and thumbnail elements in the Media RSS namespace.

Elements in the Atom 0.3 namespace MUST NOT be used.

Atom 0.3 is obsolete.

5.3.1 GData elements

The GData namespace is http://schemas.google.com/g/2005.

A Person construct MAY have at most one child image element in the GData namespace.

The image element in the GData namespace MUST have a rel attribute whose value is http://schemas.google.com/g/2005#thumbnail.

The image element in the GData namespace MAY have a width attribute whose value is a valid non-negative integer.

The image element in the GData namespace MAY have a height attribute whose value is a valid non-negative integer.

The image element in the GData namespace MUST have a src attribute whose value is a valid URL string.

The content model of the image element in the GData namespace is nothing.
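This example is non-normative. The requirements on the GData image element can be collected into a single check, as in the following Python sketch based on xml.etree.ElementTree. The function name is hypothetical. The sketch only tests that src is present; checking that it is a valid URL string is left out.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

THUMBNAIL_REL = "http://schemas.google.com/g/2005#thumbnail"
_UINT = re.compile(r"[0-9]+")


def gdata_image_errors(el: ET.Element) -> List[str]:
    """Return the names of the violated requirements for a GData image element."""
    errors = []
    if el.get("rel") != THUMBNAIL_REL:
        errors.append("rel")
    if el.get("src") is None:  # URL syntax is not checked in this sketch
        errors.append("src")
    for name in ("width", "height"):
        value = el.get(name)
        if value is not None and not _UINT.fullmatch(value):
            errors.append(name)
    # Content model "nothing": no child elements and no non-whitespace text.
    if len(el) > 0 or (el.text or "").strip(" \t\n\f\r"):
        errors.append("content")
    return errors
```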

6 Validation of OGP

A meta element in the HTML namespace MAY have a property attribute. If the attribute is specified, the element MUST NOT have a name attribute or an attribute that cannot be used when a name attribute is specified. The content attribute MUST be specified if a property attribute is specified. An HTML meta element with property attribute is metadata content (it is not a phrasing content).

The value of a property attribute MUST be an OGP property name. An OGP property name is a property value defined by an applicable specification or a prefixed property value. A prefixed property value is a property prefix followed by a U+003A COLON character (:) followed by one or more characters. A property prefix is a string of one or more characters that does not contain a U+003A COLON character (:) and is not used as a prefix by any property value defined by an applicable specification.

A property value MUST NOT be used unless it is defined in the context it is used by an applicable specification.

Many property values are only defined for specific og:type values.

If the property attribute value is og:type, the content attribute value MUST be a value allowed as an og:type value or a prefixed property value.

An example of applicable specifications is The Open Graph protocol.
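This example is non-normative. The definition of an OGP property name can be sketched in Python as follows. The function name is hypothetical, and the set of defined property values passed in by the caller is an assumption; a real validator would take it from the applicable specifications and would also check the context in which a value is used.

```python
from typing import Set


def is_ogp_property_name(value: str, defined: Set[str]) -> bool:
    """True for a defined property value or a prefixed property value."""
    if value in defined:
        return True
    # A prefixed property value is prefix ":" rest, both non-empty.
    prefix, colon, rest = value.partition(":")
    if not colon or not prefix or not rest:
        return False
    # The prefix must not be one used by any defined property value.
    used_prefixes = {name.partition(":")[0] for name in defined if ":" in name}
    return prefix not in used_prefixes
```

For example, with og:title and og:type as the defined values, og:title and myapp:thing are OGP property names, while og:nonexistent is not, because og is already used as a prefix by defined values.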

7 Validation of templates

Templates embedded in a document are not rendered and are often incomplete until they are actually used as part of the document. As such, they are sometimes exempted from the formal requirements of the specifications. However, whether templates are in error or not could be useful information for authors who want to ensure generated trees would not be broken because of poorly authored templates.

If a node is a template root, it MUST be validated in the template mode.

In the template mode, any violation of the requirements except for those of template specifications is marked as in template. User agents SHOULD render errors in template and the other errors in different manners. At user option, user agents MAY hide errors in template.

This means that errors in the node itself, its attributes, its descendants, its template contents, and its shadow roots are distinguished from errors not in the template mode.

In the template mode, no other template root is recognized.

Template specifications of a template root are specifications defining template root's language. If not specified, there is no template specification.

Template contents are template roots.

HTML elements with a hidden attribute in no namespace are template roots.
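This example is non-normative. The following Python sketch walks a tree and records, for each element, whether it would be validated in the template mode. It uses xml.etree.ElementTree, which does not model template contents or shadow roots, so only the hidden attribute case is covered. The function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

XHTML = "{http://www.w3.org/1999/xhtml}"


def is_template_root(el: ET.Element) -> bool:
    """HTML element with a hidden attribute in no namespace."""
    return el.tag.startswith(XHTML) and "hidden" in el.attrib


def walk(el: ET.Element, in_template: bool = False) -> Iterator[Tuple[ET.Element, bool]]:
    """Yield (element, in template mode) pairs in document order."""
    # Once in the template mode, nested template roots make no difference.
    in_template = in_template or is_template_root(el)
    yield el, in_template
    for child in el:
        yield from walk(child, in_template)


tree = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<p hidden=""><span/></p><p/></div>')
flags = [flag for _, flag in walk(tree)]
```

Errors found on an element whose flag is true would be the ones marked as in template.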

7.1 Validation of XSLT

User agents supporting XSLT1 MUST implement the XSLT1 specifications. The XSLT1 specifications are DOM XPath and documents directly or indirectly referenced from it defining XPath1 and XSLT1, including XSLT Transformations (XSLT) and non-normative descriptions in the HTML Standard. The terms literal result element, template (of XSLT), attribute value template, extension element (of XSLT), and extension namespace (of XSLT), are defined by the XSLT1 specifications.

The XSLT namespace is http://www.w3.org/1999/XSL/Transform. An element is an XSLT element if its namespace is the XSLT namespace.

A document is an XSLT stylesheet if at least one of the following conditions is true:

At user option, user agents can validate a document as an XSLT stylesheet (ignoring its original context) by overriding its content type to application/xslt+xml.

For the purpose of validation, any child of the template content of a template element in the HTML namespace MUST be treated as if it were a child of the element when its node document is an XSLT stylesheet.

Attributes in a non-null namespace MUST NOT be specified for XSLT elements unless they are allowed by an applicable specification.

For example, attributes in the XMLNS namespace and unknown attributes are allowed. Attributes in the Atom Threading namespace are not.

The value of the following attributes MUST be 1.0:

Elements other than XSLT elements MUST NOT be used as children of a stylesheet or transform element in the XSLT namespace unless they are allowed by applicable specifications.

The value of the method attribute in no namespace of an output element in the XSLT namespace or the data-type attribute in no namespace of a sort element in the XSLT namespace MUST NOT be a QName unless it represents a value allowed by an applicable specification.

The version attribute in no namespace SHOULD NOT be specified for an output element in the XSLT namespace. If specified, its value MUST be 1.0.

The only known meaningful combinations of attributes are: <xsl:output method="xml">, <xsl:output method="xml" version="1.0">, <xsl:output method="html">, and <xsl:output method="text">.
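This example is non-normative. The combinations listed above can be stored as a set of (method, version) pairs, as in the following Python sketch. The names are illustrative, and None stands for an attribute that is not specified.

```python
from typing import Optional

# (method, version) pairs for xsl:output listed in this section.
KNOWN_OUTPUT_COMBINATIONS = {
    ("xml", None),
    ("xml", "1.0"),
    ("html", None),
    ("text", None),
}


def is_known_output_combination(method: Optional[str],
                                version: Optional[str] = None) -> bool:
    return (method, version) in KNOWN_OUTPUT_COMBINATIONS
```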

The extension namespaces specified by the extension-element-prefixes attribute in no namespace or in the XSLT namespace MUST be one of XSLT extension namespace candidates. The XSLT extension namespace candidates are the namespaces that XSLT extension element candidates belong to.

An element is an XSLT extension element candidate if its semantics as an extension element is defined by its specification.

Unknown elements whose namespace is not null are XSLT extension element candidates.

Templates are template roots whose template specifications are the XSLT1 specifications and the specifications of the extension elements.

Various attributes in XSLT templates, including those of literal result elements, can contain attribute value templates, which makes validation complicated (or impossible). How to handle them is a quality-of-implementation issue.

8 Data

This section is non-normative.

The data-web-defs repository contains some machine-readable data for definitions in this specification, in the following file:

The tests-web repository contains validation test data in these directories:

Author

This document is written by Wakaba <wakaba@suikawiki.org> and is produced as part of the manakai project.

Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.