The manakai project

Web::HTML::Tokenizer

An HTML and XML tokenizer

DESCRIPTION

THIS MODULE IS DEPRECATED. DON'T USE THIS MODULE FOR NEW APPLICATIONS.

The Web::HTML::Tokenizer module provides an implementation of an HTML and XML tokenizer. Despite its name, this module can be used for XML documents as well as HTML. It is not intended to be used directly from general-purpose applications; instead it is used as part of an HTML or XML parser, such as Web::HTML::Parser and Web::XML::Parser.

The module is intended to be a conforming HTML tokenizer according to the Web Applications 1.0 specification (though it is meaningless to discuss the conformance of a tokenizer on its own). By setting the XML flag, it can also tokenize XML documents in a way consistent with the HTML tokenization specification. You might consider it an implementation of the XML5 tokenization algorithm as "patched" by later HTML5 development.

SEE ALSO

Web::HTML::Parser, Web::XML::Parser.

Web::HTML::InputStream.

SPECIFICATIONS

[HTML]

HTML Living Standard <http://www.whatwg.org/specs/web-apps/current-work/complete.html#tokenization>.

[XML]

XML 1.0 <http://www.w3.org/TR/xml/>.

XML 1.1 <http://www.w3.org/TR/xml11/>.

XML5. See <http://suika.suikawiki.org/~wakaba/wiki/sw/n/XML5> for references.

AUTHOR

Wakaba <wakaba@suikawiki.org>.

LICENSE

Copyright 2007-2014 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.