3 General rules of validation

For the purpose of this specification, user agents are conformance checkers, also known as validators.

User agents MUST implement the DOM Standard. Terms tree, child, parent, ancestor, descendant, node, node document, document, content type, document element, element, attribute list, attribute, value, namespace (of element or attribute), prefix (of element or attribute), local name (of element or attribute), comment (Comment), processing instruction (ProcessingInstruction), target, template content, child text content, shadow root, and shadow tree are defined by the DOM Standard.

User agents MUST implement the HTML Standard. Terms applicable specification, willful violation, document base URL, content model, inter-element whitespace, nothing, valid non-negative integer, valid floating-point number, valid global date and time string, valid e-mail address, parse a URL, HTML element, div, and parse error are defined by the HTML Standard.

An element or attribute is in no namespace if its namespace is null. A non-null namespace is a namespace that is not null.

Unless otherwise specified, an element MAY have child comments and processing instructions. In addition, unless otherwise specified, an element whose content model is not text (with or without additional constraints) MAY have child inter-element whitespace. They are ignored for the purpose of content model validation of the element.

The shadow root's content model is flow content.

3.1 Interpretation of old specifications

This subsection defines how to interpret old unmaintained specifications for the purpose of validation.

Some of these rules are willful violations of the relevant specifications.

When a specification is written without clear statements of requirements, the elements and attributes in its namespace, and the RSS2 elements, defined by that specification MUST NOT conflict with the descriptions in that specification. Elements and attributes in such a namespace, and RSS2 elements, that are not defined by any specification MUST NOT be used. Deprecated features SHOULD NOT be used.

Features that are "reserved" cannot be used unless they are defined by a later specification.

When a value has to match a production rule, it MUST also conform to other requirements for that construct.

When an element's content or an attribute is defined to be one of the following constructs, the element's child text content or the attribute value MUST instead conform to the corresponding construct:

- A text that SHOULD be interpreted as plain text: a text.
- A char, a code point, a number, a number value, a count, a value in pixels, a value in bytes, a non-negative integer, a non-zero integer, a kilobits per second rate, a height, or a width: a valid non-negative integer.
- An RFC 1766 language tag, an RFC 3066 language tag, or a BCP 47 language identifier: a valid language tag.
- A URL (without referencing the latest URL Standard), a url, a URL with a scheme defined by the IANA Registry of URI Schemes, a URI, a uri-reference, an RFC 3986 URI reference, an IRI, or an IRI reference: a valid URL string.
  If IRI and IRI reference are distinguished and the value is identified as an IRI, it MUST also be an absolute-URL-with-fragment string.
  To interpret the string, the rules to parse a URL MUST be used with the node document of the node in which the string appears.
  The XML Base specification and the xml:base attribute are obsolete.
- A relative URI reference: a valid URL string that is a relative-URL-with-fragment string.
- A media type, a MIME media type, or a standard MIME type: a valid MIME type.
- A media type (MIME content type) whose charset parameter should not be specified explicitly: a valid MIME type with no charset parameter.
- A charset registered with IANA or starting with X-: a label.
- An addr-spec, an e-mail address, or an email address: a valid e-mail address.
- An email address (in the RSS2 specifications): a string in the RECOMMENDED format for e-mail addresses.
- An id: same as the id attribute in no namespace of HTML elements.
- HTML, HTML markup, an HTML 4.01 markup that SHOULD be such that it could validly appear directly within an HTML div element, or a text suitable for presentation as HTML: HTML fragment content.

When an element has to be an XHTML Modularization div element that SHOULD be suitable for handling as XHTML, it MUST be a div element in the HTML namespace.

When an element has HTML fragment content, setting the innerHTML of a new div element in the HTML namespace to the element's child text content MUST NOT generate a parse error or create a non-conforming tree.

3.2 Validation of obsolete features

Validators are expected to detect obsolete features and show errors and alternative features, if known, to improve authoring experiences.

When a specification is identified as obsolete, the features, including but not limited to the elements and attributes, defined by the specification MUST NOT be used unless otherwise specified by another specification. User agents MAY ignore requirements in the specification. The namespaces defined by an obsolete specification are obsolete unless otherwise specified by another specification.

When a namespace is identified as obsolete, the elements and attributes in the namespace, as well as the attributes in no namespace of those elements, MUST NOT be used unless otherwise specified by another specification.

The following namespaces are obsolete:

http://www.w3.org/TR/WD-xsl
http://www.w3.org/1999/xlink
http://www.w3.org/2001/xml-events
http://www.w3.org/2002/06/xhtml2
http://www.w3.org/2002/06/xhtml2/
http://www.w3.org/2005/07/aaa

Though obsolete, some attributes in the http://www.w3.org/1999/xlink namespace are allowed to be specified for SVG elements.

These were once implemented by some Web user agents or used on the Web, but are no longer considered part of the Web platform. Their use could indicate authoring errors.
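The check can be sketched in Python with the standard xml.etree.ElementTree module, which writes expanded names as {namespace}local. This is a non-normative illustration: the function names are hypothetical, and, as a simplification, every attribute in the XLink namespace is accepted on elements in the SVG namespace (http://www.w3.org/2000/svg), whereas the rule above allows only some of them.

```python
import xml.etree.ElementTree as ET

# Obsolete namespaces listed above.
OBSOLETE_NAMESPACES = frozenset({
    "http://www.w3.org/TR/WD-xsl",
    "http://www.w3.org/1999/xlink",
    "http://www.w3.org/2001/xml-events",
    "http://www.w3.org/2002/06/xhtml2",
    "http://www.w3.org/2002/06/xhtml2/",
    "http://www.w3.org/2005/07/aaa",
})
XLINK_NS = "http://www.w3.org/1999/xlink"
SVG_NS = "http://www.w3.org/2000/svg"


def split_name(name):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if name.startswith("{"):
        namespace, _, local = name[1:].partition("}")
        return namespace, local
    return None, name


def find_obsolete(root):
    """Return (kind, namespace, local name) for each obsolete element or attribute."""
    errors = []
    for element in root.iter():
        el_ns, el_local = split_name(element.tag)
        if el_ns in OBSOLETE_NAMESPACES:
            # Reporting the element also covers its attributes in no namespace.
            errors.append(("element", el_ns, el_local))
        for name in element.attrib:
            attr_ns, attr_local = split_name(name)
            if attr_ns not in OBSOLETE_NAMESPACES:
                continue
            # Simplification: all XLink attributes are let through on SVG elements.
            if attr_ns == XLINK_NS and el_ns == SVG_NS:
                continue
            errors.append(("attribute", attr_ns, attr_local))
    return errors
```

For a tree containing xl:href on both an element in no namespace and an SVG use element, only the former is reported.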

3.3 Unfamiliar features

There are many protocol or language features that are extensible. A user agent might or might not know how to validate a feature's instance. Therefore, the answer to a question "Is this instance conforming?" can be yes, no, or unfamiliar.

When a user agent is unfamiliar with a feature, it MUST report the validation result as unfamiliar.

A user agent either fully supports, partially supports, or does not support a non-null namespace.

A non-null namespace is fully supported if all of its elements and attributes are defined by applicable specifications and all of them are implemented, at least to some extent.

The HTML Standard defines that "Authors MUST NOT use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications". Use of elements or attributes in a fully supported namespace is not allowed unless they are specified in applicable specifications.

A non-null namespace is partially supported if its elements and attributes are defined by applicable specifications and some of them are implemented.

A non-null namespace is not supported if its elements and attributes are not implemented. A user agent that implements RSS1 but does not implement RDF/XML other than as part of RSS1 support supports the RSS namespace fully or partially, but does not support the RDF namespace.

Whether a namespace is fully supported, partially supported, or not supported is orthogonal to whether its elements and attributes are unknown. A user agent might not support the validation of the value of an attribute in a fully supported namespace.

Unknown elements are the following elements:

Elements in unsupported namespaces
Elements in partially supported namespaces not implemented by the user agent
Unknown RDF elements
Elements in no namespace except for RSS2 elements

Unknown attributes are the following attributes:

Attributes in unsupported namespaces
Attributes in partially supported namespaces not implemented by the user agent
Unknown RDF attributes

Unknown elements, unknown attributes, and attributes in no namespace of unknown elements (other than superglobal attributes) are unfamiliar features.
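A validator might classify elements along the lines of the following Python sketch. The support table is hypothetical (it describes an imaginary validator, not any real implementation), the function name is illustrative, and the rules for unknown RDF elements are left out.

```python
from enum import Enum


class Support(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


# Hypothetical capability table of an imaginary validator.
NAMESPACE_SUPPORT = {
    "http://www.w3.org/1999/xhtml": Support.FULL,
    "http://purl.org/rss/1.0/": Support.FULL,
    "http://search.yahoo.com/mrss/": Support.PARTIAL,
}
# (namespace, local name) pairs implemented in partially supported namespaces.
IMPLEMENTED_ELEMENTS = {("http://search.yahoo.com/mrss/", "content")}


def is_unknown_element(namespace, local_name, in_rss2_document):
    """Apply the unknown element rules above, except the RDF ones."""
    if namespace is None:
        # Elements in no namespace are unknown unless they are RSS2 elements.
        return not in_rss2_document
    support = NAMESPACE_SUPPORT.get(namespace, Support.NONE)
    if support is Support.NONE:
        return True
    if support is Support.PARTIAL:
        return (namespace, local_name) not in IMPLEMENTED_ELEMENTS
    # In a fully supported namespace an element is never unknown, although it
    # can still be non-conforming if no applicable specification defines it.
    return False
```

An element for which this returns true would then yield the validation result "unfamiliar" rather than a plain yes or no.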

An unknown element or unknown attribute MUST NOT be used anywhere except where they are explicitly allowed.

An unknown element MAY be used as the document element or as an orphan node. It MAY also be used where any kind of element is allowed.

An unknown element MAY have any attribute in no namespace or any unknown attribute.

In addition, any attribute allowed by an applicable specification is allowed to be specified for an unknown element.

An unknown element MAY have any kind of child.

An unknown attribute MAY have any value. An attribute in no namespace of an unknown element MAY have any value.

Validators are expected to report errors or warnings on unknown elements and attributes when doing so is useful for authors.

This specification is not intended to override any other specification's requirements.

For a public Web document, a non-standard element that is not defined by any applicable specification ought to be reported as an error.

In the following example, as the bookmark element in the http://mybookmark.example/ namespace is not defined by any applicable specification, this fragment is non-conforming:

<div xmlns="http://www.w3.org/1999/xhtml">
  <bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>

For XML data that is not expected to be shown directly to users (e.g. XML data retrieved via XMLHttpRequest), use of the null namespace or of non-standard namespaces ought not to be reported as an error.

For example, the following document fragment should not be considered non-conforming, even though none of the data element, the item element, and the name attribute is defined by any public standard:

<data>
  <item name="x1"/>
  <item name="x2"><p xmlns="http://www.w3.org/1999/xhtml">Hi!</p></item>
</data>

If the p element contained an item element in no namespace, it ought to be reported as an error, as no standard defines the item element in no namespace as phrasing content.

If there is an element in the http://www.w3.org/2000/svg/ namespace (with a trailing slash, unlike the SVG namespace http://www.w3.org/2000/svg), an error or warning ought to be reported, as it is likely an authoring error.

Unknown processing instructions are processing instructions whose target does not begin with xml- (ASCII case-insensitive). Unknown processing instructions are unfamiliar features.

Processing instructions whose target begins with xml- are reserved.
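A minimal Python sketch of this classification follows. The function names are hypothetical; the prefix test deliberately folds only the ASCII letters A to Z, as an ASCII case-insensitive comparison requires, instead of using str.lower().

```python
import string

# Translation table that lowercases ASCII letters only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_reserved_pi_target(target):
    """True if the target begins with xml- (ASCII case-insensitive)."""
    return target[:4].translate(_ASCII_LOWER) == "xml-"


def is_unknown_pi(target):
    """Unknown processing instructions are those whose target is not reserved."""
    return not is_reserved_pi_target(target)
```

Under these rules xml-stylesheet is reserved rather than unknown, while a target such as xmlfoo (no hyphen) is unknown.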

3.4 Limited use features

Several features are intended for limited use, i.e. they are not expected to be implemented universally. Such features still have legitimate uses, e.g. internal use during the preparation of public documents, data formats of particular applications, data to be processed by public applications, and so on. Use of limited use features is not invalid. However, use of such features in a public context could be problematic. At user option, user agents MAY report an error when an instance of a limited use feature is detected.

An unprocessed internal vocabulary might be erroneously exposed within a public document when the authoring pipeline is partially broken. A validator with the option enabled can detect this problem.

The following features are intended for limited use:

Unknown elements
Unknown attributes
RDF/XML RDF elements
RDF/XML RDF attributes
Elements title, description, publisher, contributor, type, format, identifier, source, language, coverage, or rights in the Dublin Core namespace
RSS2 elements whose local name is rating, skipDays, day, skipHours, hour, textInput, or ttl
Elements and attributes in the namespaces http://www.w3.org/2000/09/xmldsig#, http://purl.org/rss/1.0/modules/slash/, Atom Threading namespace, http://purl.org/syndication/history/1.0, http://purl.org/atompub/tombstones/1.0, http://www.w3.org/2007/app, http://schemas.google.com/g/2005, and http://www.hatena.ne.jp/info/xmlns#
XSLT elements
MIME types registered with IANA whose registration states that its intended usage is limited use
Private-use subtags
Language tag i-default
Private use language codes, country codes, and script codes
Private use code points
Other features identified by the user agent

If a user agent supports WebDAV, the elements in the namespaces defined by the WebDAV specifications would be marked as limited use, as they are not supported by general-purpose Web user agents and are not suitable for public Web documents.

5 Validation of feed namespaces

If an element is a unique property element and the element's parent is not null, the element's parent MUST NOT have another child element with the same namespace and local name.
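A validator could check this rule one parent at a time, as in the following Python sketch using xml.etree.ElementTree. Which children are unique property elements is supplied by a predicate; the example predicate covers only the Dublin Core creator rule of section 5.2 and, as a simplification, does not check that the node document is an RSS2 document. All names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC_NS = "http://purl.org/dc/elements/1.1/"


def duplicate_unique_properties(parent, is_unique_property):
    """Return the expanded names ({ns}local) of unique property children of
    parent that share their namespace and local name with another child."""
    children = list(parent)
    unique_tags = {c.tag for c in children if is_unique_property(parent, c)}
    # ElementTree tags encode both namespace and local name, so equal tags
    # mean "same namespace and local name".
    counts = Counter(c.tag for c in children)
    return sorted(tag for tag in unique_tags if counts[tag] > 1)


def rss2_creator_rule(parent, child):
    # Hypothetical predicate: dc:creator children of item or channel in no namespace.
    return parent.tag in ("item", "channel") and child.tag == "{%s}creator" % DC_NS


item = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>A</dc:creator><dc:creator>B</dc:creator><title>T</title>"
    "</item>"
)
print(duplicate_unique_properties(item, rss2_creator_rule))
# ['{http://purl.org/dc/elements/1.1/}creator']
```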

5.1 RSS1 elements

User agents MUST implement the RSS1 specification. The RSS1 specification is RDF Site Summary (RSS) 1.0. The RDF namespace is http://www.w3.org/1999/02/22-rdf-syntax-ns#. The RSS namespace is http://purl.org/rss/1.0/. The Dublin Core namespace is http://purl.org/dc/elements/1.1/.

User agents MUST implement the RSS1 content specification. The RSS1 content specification is RDF Site Summary 1.0 Modules: Content. The RSS content namespace is http://purl.org/rss/1.0/modules/content/. The RSS1 content specification is obsolete except for the encoded element in the RSS content namespace.

The encoded element in the RSS content namespace, which is only shown as a "draft" in the specification, is de facto part of RSS1, while the other "formal" elements in that specification have not been used at all.

An element or attribute in the RDF namespace is either an RSS1 RDF element or attribute, or an RDF/XML RDF element or attribute. An RSS1 RDF element or attribute MUST be validated against the requirements for RSS1. An RDF/XML RDF element or attribute MUST be validated against the requirements for RDF/XML. If the user agent does not support RDF/XML, RDF/XML RDF elements and attributes are unknown RDF elements and attributes.

An element is an RSS 1.0 rdf:RDF element if:

Its namespace is the RDF namespace,
Its local name is RDF, and
Either:
- Its parent is a document whose content type is application/rss+xml, or
- Its prefix is rdf, it has an xmlns attribute in the XMLNS namespace whose value is the RSS namespace, and it has an rdf attribute in the XMLNS namespace whose value is the RDF namespace.

At user option, user agents can validate a document as an RSS1 document (ignoring its original context) by overriding its content type to application/rss+xml.
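The following Python sketch expresses the definition of an RSS 1.0 rdf:RDF element over a deliberately minimal, hypothetical node model; the Document and Element classes below are not the DOM Standard's interfaces. Following the DOM, a default namespace declaration is an attribute whose namespace is the XMLNS namespace (http://www.w3.org/2000/xmlns/) and whose local name is xmlns.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


@dataclass
class Document:
    content_type: str


@dataclass
class Element:
    namespace: Optional[str]
    prefix: Optional[str]
    local_name: str
    # Attributes keyed by (namespace, local name), as in the DOM.
    attributes: Dict[Tuple[Optional[str], str], str] = field(default_factory=dict)
    parent: object = None  # an Element, a Document, or None


def is_rss10_rdf_element(el):
    """Hypothetical check for the RSS 1.0 rdf:RDF element definition."""
    if el.namespace != RDF_NS or el.local_name != "RDF":
        return False
    if isinstance(el.parent, Document) and el.parent.content_type == "application/rss+xml":
        return True
    return (
        el.prefix == "rdf"
        and el.attributes.get((XMLNS_NS, "xmlns")) == RSS_NS
        and el.attributes.get((XMLNS_NS, "rdf")) == RDF_NS
    )
```

In this model, the content type override described above amounts to constructing the Document with content_type set to application/rss+xml.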

If an RDF element in the RDF namespace is an RSS 1.0 rdf:RDF element, it is an RSS1 RDF element.

If a Seq element in the RDF namespace is a child of an items element in the RSS namespace, it is an RSS1 RDF element.

If an li element in the RDF namespace is a child of a Seq element in the RDF namespace that is an RSS1 RDF element, it is an RSS1 RDF element.

If an about or resource attribute in the RSS namespace is specified for an RSS1 RDF element, it is an RSS1 RDF attribute. A user agent that does not support RDF/XML MAY treat other about and resource attributes in the RSS namespace as RSS1 RDF attributes.

All other elements and attributes in the RDF namespace are RDF/XML RDF elements and attributes.

Though the RSS1 specification references the RDF/XML specification, a user agent can implement the validation by just implementing the requirements in the RSS1 specification and this specification, ignoring the RDF/XML specification.

A channel or item element in the RSS namespace MAY have child unknown elements as long as all of the following conditions are met:

Its namespace is a non-null namespace.
Its namespace is not the RDF namespace.
If its prefix is null, its local name does not start with xml (ASCII case-insensitive).
Its prefix does not start with xml (ASCII case-insensitive).
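The four conditions above can be checked with a small Python function such as the following sketch (the name may_be_unknown_child is hypothetical). It only tests the conditions; whether the child is actually an unknown element is decided by the rules for unfamiliar features.

```python
import string

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _starts_with_xml(value):
    # ASCII case-insensitive prefix test: only A-Z are folded.
    return value[:3].translate(_ASCII_LOWER) == "xml"


def may_be_unknown_child(namespace, prefix, local_name):
    """Check the conditions on a child unknown element of an RSS1 channel or item."""
    if namespace is None or namespace == RDF_NS:
        return False
    if prefix is None:
        # Only a null prefix makes the local name subject to the xml check.
        return not _starts_with_xml(local_name)
    return not _starts_with_xml(prefix)
```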

Child unknown elements of a channel or item element in the RSS namespace are unique property elements.

A description element in the RSS namespace whose parent is an item element in the RSS namespace has HTML fragment content. An encoded element in the RSS content namespace has HTML fragment content.

A channel element in the RSS namespace MAY have child link elements in the Atom namespace.

5.2 RSS2 elements

User agents MUST implement the RSS2 specifications. The RSS2 specifications are RSS 2.0 and Really Simple Syndication Best Practices Profile.

A document is an RSS2 document if its document element is an rss element in no namespace.

RSS 0.91, 0.92, 0.93, and 0.94 documents are RSS2 documents.

An element is an RSS2 element if it is in no namespace and its node document is an RSS2 document. An RSS2 channel element is an RSS2 element whose local name is channel. An RSS2 item element is an RSS2 element whose local name is item. RSS2 elements MUST be validated against the requirements for RSS2.

An RSS2 item element MAY have at most one child updated element in the Atom namespace.

These are unique property elements:

Child creator elements in the Dublin Core namespace of an RSS2 channel element or an RSS2 item element.
Child encoded elements in the RSS content namespace of an RSS2 item element.
Child comments elements in the namespace http://purl.org/rss/1.0/modules/slash/ of an RSS2 item element.

An RSS2 item element or an RSS2 channel element MAY have child unknown elements.

5.2.1 Media RSS elements

This section applies to user agents implementing Media RSS.

The Media RSS namespace is http://search.yahoo.com/mrss/ or http://search.yahoo.com/mrss. These are synonyms. The latter namespace is obsolete.

The namespace URL without the trailing slash has been used for historical reasons. It is non-conforming.

These are unique property elements:

Child starRating elements in the Media RSS namespace of a community element in the Media RSS namespace.
Child statistics elements in the Media RSS namespace of a community element in the Media RSS namespace.
Child tags elements in the Media RSS namespace of a community element in the Media RSS namespace.

The average attribute value of a starRating element in the Media RSS namespace MUST be a valid floating-point number.

The max attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.

The min attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.
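The attribute checks, together with the namespace synonym rule, might look like the following Python sketch. The regular expressions are transcriptions of the HTML Standard's valid non-negative integer and valid floating-point number microsyntaxes; the function names are hypothetical, and absent attributes are not reported because this section does not require them.

```python
import re

MEDIA_RSS_NS = "http://search.yahoo.com/mrss/"
# Synonym without the trailing slash; obsolete and non-conforming.
OBSOLETE_MEDIA_RSS_NS = "http://search.yahoo.com/mrss"

# HTML Standard microsyntaxes; [0-9] is used so that only ASCII digits match.
VALID_NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")
VALID_FLOATING_POINT_NUMBER = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"
)


def normalize_media_rss_namespace(namespace):
    """Return (canonical namespace, whether the obsolete synonym was used)."""
    if namespace == OBSOLETE_MEDIA_RSS_NS:
        return MEDIA_RSS_NS, True
    return namespace, False


def check_star_rating(attributes):
    """Return names of starRating attributes (in no namespace) with invalid values."""
    errors = []
    average = attributes.get("average")
    if average is not None and not VALID_FLOATING_POINT_NUMBER.fullmatch(average):
        errors.append("average")
    for name in ("max", "min"):
        value = attributes.get(name)
        if value is not None and not VALID_NON_NEGATIVE_INTEGER.fullmatch(value):
            errors.append(name)
    return errors
```

For example, an average of "3." is rejected because the HTML Standard requires at least one digit after the full stop.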

5.2.2 iTunes elements

This section applies to user agents implementing Podcast.

The iTunes namespace is http://www.itunes.com/dtds/podcast-1.0.dtd or http://www.itunes.com/DTDs/Podcast-1.0.dtd. These are synonyms. The latter namespace is obsolete.

Child elements in the iTunes namespace of an RSS2 channel element or an RSS2 item element are unique property elements.

The content model of the image element in the iTunes namespace is nothing.

5.3 Atom elements

User agents MUST implement the Atom specification. The Atom specification is RFC 4287.

Atom and its extension specifications allow extensions so liberally that almost anything is permitted, which is not useful for validators. This specification therefore defines stricter restrictions for the purpose of validation.

The Atom namespace is http://www.w3.org/2005/Atom. The Atom 0.3 namespace is http://purl.org/atom/ns#. The Atom Threading namespace is http://purl.org/syndication/thread/1.0. The Atom Feed Paging and Archiving namespace is http://purl.org/syndication/history/1.0. Atom family namespaces are the Atom 0.3 namespace, the Atom namespace, the Atom Threading namespace, the Atom Feed Paging and Archiving namespace, http://www.w3.org/2007/app, and http://purl.org/atompub/tombstones/1.0. An Atom family element is an element in one of Atom family namespaces.

The terms Date construct and Person construct are defined by the Atom 1.0 specification.

Elements and attributes MUST conform to the constraints expressed in the RELAX NG schema fragments in the applicable specifications.

For an Atom family element, an attribute or child that is not explicitly allowed by an applicable specification MUST NOT be used.

Atom extensible elements are the following elements:

Atom family elements which allow extension elements, extensionElement, or extensionSansTitleElement in content
deleted-entry elements in namespace http://purl.org/atompub/tombstones/1.0
feed elements in the Atom 0.3 namespace
entry elements in the Atom 0.3 namespace

An Atom extensible element MAY have child unknown elements.

These are unique property elements:

Child complete elements in the Atom Feed Paging and Archiving namespace of a feed element in the Atom namespace.
Child archive elements in the Atom Feed Paging and Archiving namespace of a feed element in the Atom namespace.

A content element in the Atom 0.3 namespace with a type attribute whose value is text/html (ASCII case-insensitive) has HTML fragment content.

need to define <atom:content> content validation when type is a MIME type

An entry element in the Atom namespace MAY have children group and thumbnail elements in the Media RSS namespace.

Elements in the Atom 0.3 namespace MUST NOT be used.

Atom 0.3 is obsolete.

5.3.1 GData elements

This section applies to user agents implementing the image element in the GData namespace, unless another specification defines that element.

The GData namespace is http://schemas.google.com/g/2005.

A Person construct MAY have at most one child image element in the GData namespace.

The image element in the GData namespace MUST have a rel attribute whose value is http://schemas.google.com/g/2005#thumbnail.

The image element in the GData namespace MAY have a width attribute whose value is a valid non-negative integer.

The image element in the GData namespace MAY have a height attribute whose value is a valid non-negative integer.

The image element in the GData namespace MUST have a src attribute whose value is a valid URL string.

The content model of the image element in the GData namespace is nothing.
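A hypothetical Python check for the image element in the GData namespace follows. It assumes the caller has already discarded comments, processing instructions, and inter-element whitespace before deciding whether the element has content, and, as a simplification, it only tests that the src attribute is present rather than that its value is a valid URL string.

```python
import re

GDATA_NS = "http://schemas.google.com/g/2005"
THUMBNAIL_REL = "http://schemas.google.com/g/2005#thumbnail"
VALID_NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")


def check_gdata_image(attributes, has_element_or_text_children):
    """Return error messages for a GData image element.

    attributes maps no-namespace attribute names to values."""
    errors = []
    if attributes.get("rel") != THUMBNAIL_REL:
        errors.append("rel must be " + THUMBNAIL_REL)
    for name in ("width", "height"):
        value = attributes.get(name)
        if value is not None and not VALID_NON_NEGATIVE_INTEGER.fullmatch(value):
            errors.append(name + " must be a valid non-negative integer")
    if "src" not in attributes:
        errors.append("src is required")
    if has_element_or_text_children:
        # The content model is nothing.
        errors.append("content model is nothing")
    return errors
```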

7 Validation of templates

Templates embedded in a document are not rendered and are often incomplete until they are actually used as part of the document. As such, they are sometimes exempted from the formal requirements of the specifications. However, whether templates are in error or not can be useful information for authors who want to ensure that generated trees are not broken by poorly authored templates.

If a node is a template root, it MUST be validated in the template mode.

In the template mode, any violation of the requirements, except for those of the template specifications, is marked as in template. User agents SHOULD render errors in template and other errors differently. At user option, user agents MAY hide errors in template.

This means that errors in the node itself, its attributes, its descendants, its template contents, and its shadow roots are distinguished from errors not in the template mode.

In the template mode, no other template root is recognized.

Template specifications of a template root are the specifications defining the template root's language. If none are specified, there is no template specification.

Template contents are template roots.

HTML elements with a hidden attribute in no namespace are template roots.
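The error marking in the template mode can be sketched as a recursive walk in Python. The Node class is a hypothetical, simplified tree in which template contents and shadow roots are modelled as ordinary children, and each error carries a flag saying whether it comes from one of the template specifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    is_template_root: bool = False
    # Errors on this node, each a (message, from_template_spec) pair.
    errors: list = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def collect_errors(node, in_template=False):
    """Yield (message, marked_in_template) for node and its descendants."""
    if not in_template and node.is_template_root:
        # Template mode starts here; nested template roots are not re-recognized.
        in_template = True
    for message, from_template_spec in node.errors:
        # Violations of the template specifications are never marked.
        yield message, in_template and not from_template_spec
    for child in node.children:
        yield from collect_errors(child, in_template)
```

A renderer could then style the pairs whose second item is true differently, or hide them at user option.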

7.1 Validation of XSLT

This section applies to user agents implementing XSLT1.

User agents supporting XSLT1 MUST implement the XSLT1 specifications. The XSLT1 specifications are DOM XPath and the documents directly or indirectly referenced from it that define XPath1 and XSLT1, including XSL Transformations (XSLT) and non-normative descriptions in the HTML Standard. The terms literal result element, template (of XSLT), attribute value template, extension element (of XSLT), and extension namespace (of XSLT) are defined by the XSLT1 specifications.

The XSLT namespace is http://www.w3.org/1999/XSL/Transform. An element is an XSLT element if its namespace is the XSLT namespace.

A document is an XSLT stylesheet if at least one of the following conditions is true:

The document has a document element which is a stylesheet element in the XSLT namespace,
The document has a document element which is a transform element in the XSLT namespace,
The document has a document element with the version attribute in the XSLT namespace, or
The document's content type is application/xslt+xml or text/xsl.

At user option, user agents can validate a document as an XSLT stylesheet (ignoring its original context) by overriding its content type to application/xslt+xml.
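A sketch of the XSLT stylesheet test in Python, using xml.etree.ElementTree, which writes expanded names as {namespace}local. The function name is hypothetical, and the content type is passed in separately because ElementTree does not record it; overriding the content type as described above corresponds to passing application/xslt+xml.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
XSLT_CONTENT_TYPES = {"application/xslt+xml", "text/xsl"}


def is_xslt_stylesheet(content_type, document_element):
    """document_element is an ElementTree element, or None for an empty document."""
    if content_type in XSLT_CONTENT_TYPES:
        return True
    if document_element is None:
        return False
    if document_element.tag in ("{%s}stylesheet" % XSLT_NS, "{%s}transform" % XSLT_NS):
        return True
    # A literal result element used as a stylesheet carries xsl:version.
    return ("{%s}version" % XSLT_NS) in document_element.attrib
```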

For the purpose of validation, when the node document is an XSLT stylesheet, any child of the template content of a template element in the HTML namespace MUST be treated as if it were a child of that template element.

Attributes in a non-null namespace MUST NOT be specified for XSLT elements unless they are allowed by an applicable specification.

For example, attributes in the XMLNS namespace and unknown attributes are allowed. Attributes in the Atom Threading namespace are not.

The value of the following attributes MUST be 1.0:

version attribute in no namespace of stylesheet elements in the XSLT namespace,
version attribute in no namespace of transform elements in the XSLT namespace, and
version attribute in the XSLT namespace.

Elements other than XSLT elements MUST NOT be used as children of a stylesheet or transform element in the XSLT namespace unless they are allowed by applicable specifications.

The value of the method attribute in no namespace of an output element in the XSLT namespace, or of the data-type attribute in no namespace of a sort element in the XSLT namespace, MUST NOT be a QName unless it represents a value allowed by an applicable specification.

The version attribute in no namespace SHOULD NOT be specified for an output element in the XSLT namespace. If specified, its value MUST be 1.0.

The only known meaningful combinations of attributes are <xsl:output method="xml">, <xsl:output method="xml" version="1.0">, <xsl:output method="html">, and <xsl:output method="text">.
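A validator could, for example, warn about output elements in the XSLT namespace that fall outside these combinations. The following Python sketch is hypothetical: it only looks at the method and version attributes in no namespace, and whether other combinations are reported, and how, is left to the implementation.

```python
# The four known meaningful combinations, as sorted (name, value) tuples.
KNOWN_OUTPUT_COMBINATIONS = {
    (("method", "xml"),),
    (("method", "xml"), ("version", "1.0")),
    (("method", "html"),),
    (("method", "text"),),
}


def is_known_output_combination(attributes):
    """attributes maps no-namespace attribute names of xsl:output to values."""
    key = tuple(sorted(
        (name, value) for name, value in attributes.items()
        if name in ("method", "version")
    ))
    return key in KNOWN_OUTPUT_COMBINATIONS
```

Other attributes, such as indent, are ignored by this sketch and do not affect the result.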

The extension namespaces specified by the extension-element-prefixes attribute in no namespace or in the XSLT namespace MUST be XSLT extension namespace candidates. The XSLT extension namespace candidates are the namespaces to which XSLT extension element candidates belong.

An element is an XSLT extension element candidate if its semantics as an extension element are defined by its specification.

Unknown elements whose namespace is not null are XSLT extension element candidates.

The null namespace, the XML namespace, the XMLNS namespace, and the XSLT namespace are not in the XSLT extension namespace candidates.

Templates are template roots whose template specifications are the XSLT1 specifications and the specifications of the extension elements.

Various attributes in XSLT templates, including those of literal result elements, can contain attribute value templates, which makes validation complicated (or impossible). How to handle them is a quality-of-implementation issue.

DOM Tree Validation

The manakai project, 6 May 2018

Status of This Document

Table of contents

1 Introduction

1.1 Scope

1.2 History

2 Infrastructure

3 General rules of validation

3.1 Interpretation of old specifications

3.2 Validation of obsolete features

3.3 Unfamiliar features

3.4 Limited use features

4 Validation of superglobal attributes

4.1 Validation of XML attributes

5 Validation of feed namespaces

5.1 RSS1 elements

5.2 RSS2 elements

5.2.1 Media RSS elements

5.2.2 iTunes elements

5.3 Atom elements

5.3.1 GData elements

6 Validation of OGP

7 Validation of templates

7.1 Validation of XSLT

8 Data

Author