The manakai project

Web::CSS::Selectors::Parser

A Selectors parser

SYNOPSIS

  use Web::CSS::Selectors::Parser;
  my $parser = Web::CSS::Selectors::Parser->new;
  my $parsed_selectors = $parser->parse_char_string_as_selectors ($selectors);

DESCRIPTION

Web::CSS::Selectors::Parser is a parser for Selectors, the element pattern language used in CSS. It parses a Selectors string into a parsed data structure if the input is valid, or reports a parse error otherwise. In addition, it provides a method to compute the specificity of a parsed selector.

METHODS

$parser = Web::CSS::Selectors::Parser->new

Creates a new instance of the Selectors parser.

$parser->context ($context)
$context = $parser->context

Returns or sets the Web::CSS::Context object used to resolve namespaces in Selectors.

$code = $parser->onerror
$parser->onerror ($code)

Returns or sets the code reference to which any errors and warnings raised during parsing are reported. The code receives the following name-value pairs:

type (string, always specified)

A short string describing the kind of the error. Descriptions of error types are available at <http://suika.suikawiki.org/gate/2007/html/error-description#{type}>, where {type} is an error type string.

For the list of error types, see <http://suika.suikawiki.org/gate/2007/html/error-description>.

level (string, always specified)

A character representing the level or severity of the error, which is one of the following characters: m (violation of a MUST-level requirement), s (violation of a SHOULD-level requirement), w (a warning), and i (an informational notification).

token (always specified)

A Web::CSS::Tokenizer token where the error is detected.

uri (a reference to string)

The URL in which the input selectors string is found.

value (string, possibly missing)

A part of the input, in which an error is detected.
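As a rough illustration of the name-value pairs listed above, the following sketch defines a handler that turns each report into a one-line message. It is not taken from the module: the handler is invoked directly with made-up sample values (the type strings and the value are hypothetical), and the token and uri arguments are ignored. With a real parser, the code reference would instead be registered with $parser->onerror ($onerror).

```perl
use strict;
use warnings;

# Human-readable labels for the documented level characters.
my %LEVEL_LABEL = (
  m => 'MUST violation',
  s => 'SHOULD violation',
  w => 'warning',
  i => 'information',
);

my @messages;

# The handler receives name-value pairs, so collect them into a hash.
my $onerror = sub {
  my %args = @_;
  my $label = $LEVEL_LABEL{$args{level}} // "unknown level ($args{level})";
  my $message = "$label: $args{type}";
  # "value" is possibly missing, so it is appended only when present.
  $message .= " ($args{value})" if defined $args{value};
  push @messages, $message;
};

# Hypothetical sample reports; a real parser would supply these.
$onerror->(type => 'sample:error-type', level => 'm', value => 'a >');
$onerror->(type => 'sample:warning-type', level => 'w');

print "$_\n" for @messages;
```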

$parsed = $parser->parse_char_string_as_selectors ($selectors)

Parses a character string. If it is a valid list of selectors, the method returns the parsed list of selectors data structure. Otherwise, it returns undef.

$specificity = $parser->get_selector_specificity ($parsed_selector)

Returns the specificity of a parsed selector data structure. Note that the input has to be a selector, not a group of selectors.

The return value is an array reference with four values: the style attribute flag (always 0), a, b, and c.
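Because the specificity is a four-item array reference, two specificities can be compared item by item, from left to right. The sketch below shows one way to do that; compare_specificity is a hypothetical helper, not part of the module, and the sample values are written by hand following the usual Selectors rules (a counts ID selectors, b counts class, attribute and pseudo-class selectors, c counts type selectors and pseudo-elements) rather than obtained from get_selector_specificity.

```perl
use strict;
use warnings;

# Compare two specificity array references item by item.
# Returns -1, 0 or 1, like the <=> operator.
sub compare_specificity {
  my ($x, $y) = @_;
  for my $i (0..3) {
    my $cmp = $x->[$i] <=> $y->[$i];
    return $cmp if $cmp;
  }
  return 0;
}

# Hand-written sample values in the form [style flag, a, b, c].
my @rules = (
  {selector => 'ul li.item', specificity => [0, 0, 1, 2]},
  {selector => 'a',          specificity => [0, 0, 0, 1]},
  {selector => '#nav a',     specificity => [0, 1, 0, 1]},
);

# Order the rules from the most specific to the least specific.
my @sorted = sort {
  compare_specificity ($b->{specificity}, $a->{specificity})
} @rules;

print "$_->{selector}\n" for @sorted;
```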

DATA STRUCTURES

This section describes the "list of selectors" data structure.

A list of selectors

An array reference, which contains one or more selector data structures. They correspond to the selectors in the original group of selectors string, in order.

A selector

A selector is represented as an array reference, which contains pairs of a combinator constant and a sequence of simple selectors data structure. They correspond to the sequences of simple selectors and the combinators that appear in the original selector string, in order. Note that the first (index 0) item is always the descendant combinator constant.

The constants below represent the types of combinators.

DESCENDANT_COMBINATOR

A descendant combinator.

CHILD_COMBINATOR

A child combinator.

ADJACENT_SIBLING_COMBINATOR

An adjacent sibling combinator.

GENERAL_SIBLING_COMBINATOR

A general sibling combinator.

A sequence of simple selectors

A sequence of simple selectors is represented as an array reference, which contains simple selector data structures. They correspond to the simple selectors in the original sequence of simple selectors string, in order.
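The following sketch walks a selector data structure as described above. It makes several assumptions: the pairs are stored flat, as [combinator, sequence, combinator, sequence, ...]; the constants are local stand-ins with placeholder values (real code would import them from Web::CSS::Selectors::Parser); and the sample structure is written by hand to resemble "ul > li", not produced by the parser. selector_steps is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in constants with placeholder values; the real constants are
# exported by Web::CSS::Selectors::Parser.
use constant {
  DESCENDANT_COMBINATOR       => 1,
  CHILD_COMBINATOR            => 2,
  ADJACENT_SIBLING_COMBINATOR => 3,
  GENERAL_SIBLING_COMBINATOR  => 4,
  ELEMENT_SELECTOR            => 10,
};

my %COMBINATOR_NAME = (
  DESCENDANT_COMBINATOR()       => 'descendant',
  CHILD_COMBINATOR()            => 'child',
  ADJACENT_SIBLING_COMBINATOR() => 'adjacent sibling',
  GENERAL_SIBLING_COMBINATOR()  => 'general sibling',
);

# Split a selector into [combinator, sequence] pairs, assuming the
# flat layout [combinator, sequence, combinator, sequence, ...].
sub selector_steps {
  my ($selector) = @_;
  my @steps;
  for (my $i = 0; $i < @$selector; $i += 2) {
    push @steps, [$selector->[$i], $selector->[$i + 1]];
  }
  return @steps;
}

# Hand-written structure shaped like the result for "ul > li".
my $selector = [
  DESCENDANT_COMBINATOR, [[ELEMENT_SELECTOR, undef, 'ul', '', 0, 0]],
  CHILD_COMBINATOR,      [[ELEMENT_SELECTOR, undef, 'li', '', 0, 0]],
];

my @lines;
for my $step (selector_steps ($selector)) {
  my ($combinator, $sequence) = @$step;
  # The first simple selector of a sequence is the element selector;
  # its item with index 2 is the local name (undef means any).
  my $local_name = $sequence->[0]->[2] // '*';
  push @lines, "$COMBINATOR_NAME{$combinator}: $local_name";
}
print "$_\n" for @lines;
```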

A simple selector

A simple selector is represented as an array reference whose first (index 0) item is the type of simple selector and the following items are arguments to the simple selector.

The constants below represent the types of simple selectors (or parts of simple selectors).

ELEMENT_SELECTOR

The "element selector" simple selector data structure takes the following form:

  [ELEMENT_SELECTOR, $nsurl, $local_name, $prefix, $wc_prefix, $wc_type]

The item with index 1 is the namespace URL of the selector. If it is undef, any namespace (including the null namespace) matches. If it is the empty string, only the null namespace matches. Otherwise, the specified namespace URL is compared literally with the target element's (non-null) namespace URL.

The item with index 2 is the local name of the selector. If it is undef, any local name matches. Otherwise, the specified local name is compared literally with the target element's local name (in HTML, however, it might be compared ASCII case-insensitively).

The item with index 3 is the namespace prefix of the selector. If it is the empty string, the selector has no prefix (and no separator '|'). Otherwise, if it is a defined value, it is the namespace prefix. Otherwise, the value is undef and the item has no effect (either the null namespace notation ('|' without prefix) or the wildcard prefix ('*|') is used).

The item with index 4 is the namespace wildcard flag of the selector. It is a boolean value representing whether the wildcard prefix ('*|') is explicitly used or not.

The item with index 5 is the local name wildcard flag of the selector. It is a boolean value representing whether the universal selector ('*') is explicitly used or not.
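The matching rules for the items with index 1 and 2 can be sketched as a small function. This is an illustration only, not the module's matching code: element_selector_matches is a hypothetical helper, the ELEMENT_SELECTOR constant is a local stand-in with a placeholder value, the element's null namespace is assumed to be passed as undef, and the ASCII case-insensitive comparison for HTML is not handled.

```perl
use strict;
use warnings;

# Stand-in constant with a placeholder value; the real constant is
# exported by Web::CSS::Selectors::Parser.
use constant ELEMENT_SELECTOR => 1;

# Test an "element selector" data structure against an element's
# namespace URL (undef for the null namespace) and local name.
sub element_selector_matches {
  my ($simple_selector, $element_ns, $element_local_name) = @_;
  my (undef, $nsurl, $local_name) = @$simple_selector;

  if (defined $nsurl) {
    if ($nsurl eq '') {
      # The empty string matches only the null namespace.
      return 0 if defined $element_ns;
    } else {
      # Any other value is compared literally.
      return 0 unless defined $element_ns and $element_ns eq $nsurl;
    }
  }
  # An undef local name matches any local name.
  return 0 if defined $local_name and $local_name ne $element_local_name;
  return 1;
}

my $svg_ns = 'http://www.w3.org/2000/svg';

my $any_ns_a  = [ELEMENT_SELECTOR, undef, 'a', undef, 1, 0]; # *|a
my $null_ns_a = [ELEMENT_SELECTOR, '', 'a', undef, 0, 0];    # |a

print element_selector_matches ($any_ns_a, $svg_ns, 'a'), "\n";
print element_selector_matches ($null_ns_a, $svg_ns, 'a'), "\n";
print element_selector_matches ($null_ns_a, undef, 'a'), "\n";
```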

A sequence of simple selectors always contains a simple selector whose type is ELEMENT_SELECTOR as its first component. The "element selector" simple selector takes the following patterns:

  [ELEMENT_SELECTOR, undef, 'a'  , ''   , 0, 0]   a
  [ELEMENT_SELECTOR, undef, undef, ''   , 0, 1]   *
  [ELEMENT_SELECTOR, ''   , 'a'  , undef, 0, 0]  |a
  [ELEMENT_SELECTOR, ''   , undef, undef, 0, 1]  |*
  [ELEMENT_SELECTOR, undef, 'a'  , undef, 1, 0] *|a
  [ELEMENT_SELECTOR, undef, undef, undef, 1, 1] *|*
  [ELEMENT_SELECTOR, undef, undef, ''   , 0, 0] .b (= *.b)

  @namespace '';
  [ELEMENT_SELECTOR, ''   , 'a'  , ''   , 0, 0]   a
  [ELEMENT_SELECTOR, ''   , undef, ''   , 0, 1]   *
  [ELEMENT_SELECTOR, ''   , 'a'  , undef, 0, 0]  |a
  [ELEMENT_SELECTOR, ''   , undef, undef, 0, 1]  |*
  [ELEMENT_SELECTOR, undef, 'a'  , undef, 1, 0] *|a
  [ELEMENT_SELECTOR, undef, undef, undef, 1, 1] *|*
  [ELEMENT_SELECTOR, ''   , undef, ''   , 0, 0] .b (= *.b)

  @namespace 'ns';
  [ELEMENT_SELECTOR, 'ns' , 'a'  , ''   , 0, 0]   a
  [ELEMENT_SELECTOR, 'ns' , undef, ''   , 0, 1]   *
  [ELEMENT_SELECTOR, ''   , 'a'  , undef, 0, 0]  |a
  [ELEMENT_SELECTOR, ''   , undef, undef, 0, 1]  |*
  [ELEMENT_SELECTOR, undef, 'a'  , undef, 1, 0] *|a
  [ELEMENT_SELECTOR, undef, undef, undef, 1, 1] *|*
  [ELEMENT_SELECTOR, 'ns' , undef, ''   , 0, 0] .b (= *.b)

  @namespace p '';
  [ELEMENT_SELECTOR, ''   , 'a'  , 'p'  , 0, 0] p|a
  [ELEMENT_SELECTOR, ''   , undef, 'p'  , 0, 1] p|*

  @namespace p 'ns';
  [ELEMENT_SELECTOR, 'ns' , 'a'  , 'p'  , 0, 0] p|a
  [ELEMENT_SELECTOR, 'ns' , undef, 'p'  , 0, 1] p|*

  In :not() or :match()
  [ELEMENT_SELECTOR, undef, undef, ''   , 0, 0] .b (= *|*.b)

ID_SELECTOR

An ID selector. The first argument (item of index 1) is the ID.

CLASS_SELECTOR

A class selector. The first argument (item of index 1) is the class.

PSEUDO_CLASS_SELECTOR

A pseudo-class selector. The first argument (item of index 1) is the pseudo-class name in lowercase. If the pseudo-class takes a string or identifier argument (e.g. :lang() or :contains()), the second argument (item of index 2) is the argument (with no case folding). Otherwise, if the pseudo-class takes an an+b argument (e.g. :nth-child()), the second argument (item of index 2) represents the a value and the third argument (item of index 3) represents the b value (even an incomplete argument is normalized to this form). If the pseudo-class takes a list of selectors (e.g. :not()), the item with index 2 is the list of selectors data structure, representing the selectors within the functional notation.
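To show how the normalized a and b values might be consumed, the sketch below tests whether a 1-based index satisfies an+b for some non-negative integer n. It assumes that both values are stored as plain numbers; matches_an_plus_b is a hypothetical helper, the constant is a local stand-in with a placeholder value, and the sample structure is written by hand to resemble :nth-child(2n+1).

```perl
use strict;
use warnings;

# Stand-in constant with a placeholder value; the real constant is
# exported by Web::CSS::Selectors::Parser.
use constant PSEUDO_CLASS_SELECTOR => 1;

# Return 1 if $index equals $step * n + $offset for some integer
# n >= 0, and 0 otherwise.
sub matches_an_plus_b {
  my ($step, $offset, $index) = @_;
  return $index == $offset ? 1 : 0 if $step == 0;
  my $diff = $index - $offset;
  # n = $diff / $step has to be a non-negative integer.
  return 0 if $diff % $step != 0;
  return $diff / $step >= 0 ? 1 : 0;
}

# Hand-written structure shaped like the result for :nth-child(2n+1).
my $nth_child = [PSEUDO_CLASS_SELECTOR, 'nth-child', 2, 1];
my (undef, undef, $step, $offset) = @$nth_child;

my @matched = grep { matches_an_plus_b ($step, $offset, $_) } 1..6;
print "@matched\n";
```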

PSEUDO_ELEMENT_SELECTOR

A pseudo-element specification. The first argument (item of index 1) is the pseudo-element name in lowercase. If the pseudo-element takes a list of selectors (e.g. ::cue()), the item with index 2 is the list of selectors data structure, representing the selectors within the functional notation.

ATTRIBUTE_SELECTOR

An attribute selector. The first argument (item of index 1) is the attribute name. The second argument (item of index 2) is the type of matching. The third argument (item of index 3) depends on the type of matching. The fourth argument (item of index 4) is the namespace prefix, if it exists and is not the empty string, or undef otherwise.

The constants below represent the types of matches used in attribute selectors.

EXISTS_MATCH

Match by the existence of an attribute. The third argument (item of index 3) is undef.

EQUALS_MATCH

Exact match. The third argument (item of index 3) is the expected value.

INCLUDES_MATCH

Includes match (typically used for class attributes). The third argument (item of index 3) is the expected value.

DASH_MATCH

Dash match (typically used for language tag attributes). The third argument (item of index 3) is the expected value.

PREFIX_MATCH

Prefix match. The third argument (item of index 3) is the expected value.

SUFFIX_MATCH

Suffix match. The third argument (item of index 3) is the expected value.

SUBSTRING_MATCH

Substring match. The third argument (item of index 3) is the expected value.
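The match types above could be evaluated against an attribute value roughly as follows. This sketch follows the attribute selector semantics of the Selectors specification and is not the module's own code: attribute_value_matches is a hypothetical helper, the constants are local stand-ins with placeholder values, and case-insensitive matching is not handled.

```perl
use strict;
use warnings;

# Stand-in constants with placeholder values; the real constants are
# exported by Web::CSS::Selectors::Parser.
use constant {
  EXISTS_MATCH    => 0,
  EQUALS_MATCH    => 1,
  INCLUDES_MATCH  => 2,
  DASH_MATCH      => 3,
  PREFIX_MATCH    => 4,
  SUFFIX_MATCH    => 5,
  SUBSTRING_MATCH => 6,
};

# $actual is the attribute value, or undef if the attribute is absent.
sub attribute_value_matches {
  my ($match_type, $expected, $actual) = @_;
  return 0 unless defined $actual;
  return 1 if $match_type == EXISTS_MATCH;
  if ($match_type == EQUALS_MATCH) {
    return $actual eq $expected ? 1 : 0;
  }
  if ($match_type == INCLUDES_MATCH) {
    # An empty or whitespace-containing expected value never matches.
    return 0 if $expected eq '' or $expected =~ /[\x09\x0A\x0C\x0D\x20]/;
    my @words = split /[\x09\x0A\x0C\x0D\x20]+/, $actual;
    return (grep { $_ eq $expected } @words) ? 1 : 0;
  }
  if ($match_type == DASH_MATCH) {
    return 1 if $actual eq $expected;
    return index ($actual, "$expected-") == 0 ? 1 : 0;
  }
  # For the remaining types an empty expected value never matches.
  return 0 if $expected eq '';
  if ($match_type == PREFIX_MATCH) {
    return index ($actual, $expected) == 0 ? 1 : 0;
  }
  if ($match_type == SUFFIX_MATCH) {
    return 0 if length ($actual) < length ($expected);
    return substr ($actual, -length ($expected)) eq $expected ? 1 : 0;
  }
  if ($match_type == SUBSTRING_MATCH) {
    return index ($actual, $expected) >= 0 ? 1 : 0;
  }
  die "Unknown match type: $match_type";
}

print attribute_value_matches (INCLUDES_MATCH, 'b', 'a b c'), "\n";
print attribute_value_matches (DASH_MATCH, 'en', 'en-US'), "\n";
print attribute_value_matches (SUFFIX_MATCH, '.png', 'logo.png'), "\n";
```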

The constants mentioned in this section are exported by loading the module with use:

  use Web::CSS::Selectors::Parser;

SPECIFICATIONS

Selectors Level 4 <http://dev.w3.org/csswg/selectors4/>.

CSSOM <http://dev.w3.org/csswg/cssom/>.

The CSS syntax <http://www.w3.org/TR/CSS21/syndata.html>.

The style attribute specificity <http://www.w3.org/TR/CSS21/cascade.html#specificity>.

manakai Selectors Extensions <http://suika.suikawiki.org/gate/2005/sw/manakai/Selectors%20Extensions>.

Supported standards - Selectors <http://suika.suikawiki.org/gate/2007/html/standards#selectors>.

SEE ALSO

Web::CSS::Selectors::API, Web::CSS::Selectors::Serializer.

AUTHOR

Wakaba <wakaba@suikawiki.org>.

LICENSE

Copyright 2007-2013 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.