The manakai project

Web::RDF::XML::Parser

An RDF/XML parser

SYNOPSIS

  use Web::RDF::XML::Parser;
  $rdf = Web::RDF::XML::Parser->new;
  $rdf->ontriple (sub {
    push @result, {@_};
  });
  $rdf->convert_document ($doc);

DESCRIPTION

The Web::RDF::XML::Parser module is an implementation of RDF/XML. Using this module, RDF triples embedded in an RDF/XML document or document fragment can be extracted.

The RDF/XML format is no longer widely used. Though this module is still maintained as part of the manakai project, its use is not recommended.

This module is unsuitable for processing RSS 1.0 documents. Use Web::Feed::Parser instead.

METHODS

The following methods are available:

$rdf = Web::RDF::XML::Parser->new

Create an RDF/XML parser.

$rdf->convert_document ($doc)

Extract the triples from a document. The argument must be a DOM Document (e.g. a Web::DOM::Document object). Extracted triples are reported through the ontriple callback.

$rdf->convert_rdf_element ($element)

Extract the triples from an element. The argument must be a DOM Element containing the triples, e.g. an rdf:RDF element. Extracted triples are reported through the ontriple callback.

$rdf->ontriple ($code)
$code = $rdf->ontriple

Get or set the callback function which is invoked for each triple extracted from the document.

The callback is invoked with the following name/value pairs as arguments: subject, predicate, object, and node. The subject, predicate, and object values are parsed term data structures (see Web::RDF::Checker). The node value is the node from which the triple was extracted. The callback is not expected to throw any exception.
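As a minimal sketch of the name/value calling convention, the handler below reads its arguments into a hash and stores one record per triple. The variable names (@triples, $ontriple) are illustrative only, and the final call simulates the parser with placeholder strings; real values are term data structures and a DOM node.

```perl
use strict;
use warnings;

my @triples;

# The handler receives a flat list of name/value pairs, so the
# argument list can be read straight into a hash.
my $ontriple = sub {
  my %args = @_;
  push @triples, {
    subject   => $args{subject},
    predicate => $args{predicate},
    object    => $args{object},
    node      => $args{node},
  };
};

# With a parser object at hand, the handler would be registered as:
#   $rdf->ontriple ($ontriple);

# Simulated invocation with placeholder values (not real term
# data structures), only to show the shape of the arguments.
$ontriple->(subject => 'S', predicate => 'P', object => 'O', node => undef);
```

Keeping the handler in a named variable makes it possible to exercise it without a parser, as done in the last line.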

$rdf->onbnodeid ($code)
$code = $rdf->onbnodeid

Get or set the code reference that is invoked whenever a blank node identifier is to be constructed.

The code is invoked with one argument, the value the module uses internally to identify a blank node. The code can return the argument as is, or a modified copy of it; in either case, the returned value is used as the blank node identifier. The code must return the same value for the same argument and different values for different arguments. The code is not expected to throw any exception.

This hook is useful when a document contains multiple RDF fragments whose blank nodes have to be distinguished from each other.

The value should not be set while the parser is running. If the value is changed, the result is undefined.
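One possible way to keep blank nodes of separate fragments apart is to prefix each identifier with a per-fragment label. The sketch below assumes the argument can be used as a string; make_bnodeid_handler and the labels are hypothetical names, not part of the module. A fixed prefix returns the same value for the same argument and different values for different arguments, as required.

```perl
use strict;
use warnings;

# Hypothetical helper: build an onbnodeid handler bound to one
# fragment label. The label must not contain ':' so that
# identifiers from different fragments cannot collide.
sub make_bnodeid_handler ($) {
  my ($label) = @_;
  return sub {
    my ($id) = @_;
    return $label . ':' . $id;
  };
}

my $for_first  = make_bnodeid_handler ('frag1');
my $for_second = make_bnodeid_handler ('frag2');

# With a parser object at hand, set the handler before parsing
# each fragment, not while the parser is running:
#   $rdf->onbnodeid ($for_first);
```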

$code = $rdf->onerror
$rdf->onerror ($code)

Get or set the error handler for the parser. Any parse error, as well as any warning or additional processing information, is reported to the handler. See <https://github.com/manakai/data-errors/blob/master/doc/onerror.txt> for details of error handling.

The value should not be set while the parser is running. If the value is changed, the result is undefined.
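The following sketch collects reports instead of printing them. It assumes that the handler is invoked with name/value pairs and that keys named type and level are among them; both are assumptions to be checked against the onerror document linked above. The values passed in the simulated calls are made up.

```perl
use strict;
use warnings;

my @errors;
my %count_by_level;

# Assumption: reports arrive as name/value pairs. Unknown keys are
# kept as is, and a missing level is counted as 'unknown'.
my $onerror = sub {
  my %error = @_;
  push @errors, \%error;
  my $level = defined $error{level} ? $error{level} : 'unknown';
  $count_by_level{$level}++;
};

# With a parser object at hand, the handler would be registered as:
#   $rdf->onerror ($onerror);

# Simulated reports with placeholder values.
$onerror->(type => 'example error', level => 'm');
$onerror->(type => 'example note');
```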

$code = $rdf->onnonrdfnode
$rdf->onnonrdfnode ($code)

Get or set the code reference that is invoked whenever a non-RDF node is detected. Note that use of such a node in an RDF/XML fragment is non-conforming. This hook is intended for injecting validation code (e.g. by Web::HTML::Validator). Note that the node can be a misplaced rdf:RDF element, for example.

The code is invoked with one argument, the node in question. The code is not expected to throw any exception. The value should not be set while the parser is running. If the value is changed, the result is undefined.

$code = $rdf->onattr
$rdf->onattr ($code)

Get or set the code reference that is invoked whenever an attribute is encountered by the parser. This hook is intended for injecting validation code (e.g. by Web::HTML::Validator).

The code is invoked with two arguments: the node in question and the type of the attribute, which is one of the following:

  common   Normal attributes (e.g. xml:lang="" and xmlns="")
  url      RDF/XML attributes whose value is a URL
  rdf-id   RDF/XML attributes whose value is an rdf-id (NCName)
  string   RDF/XML attributes whose value is a string
  misc     Other RDF/XML attributes

The code is expected not to throw any exception. The value should not be set while the parser is running. If the value is changed, the result is undefined.
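As an illustration of dispatching on the attribute type, the sketch below groups the reported nodes by type. The names %known_type, %attrs_by_type, and $onattr are illustrative, the 'unexpected' bucket is an addition of this example, and the simulated calls pass placeholder strings where the parser would pass nodes.

```perl
use strict;
use warnings;

# The five types listed above.
my %known_type = map { $_ => 1 } qw(common url rdf-id string misc);
my %attrs_by_type;

my $onattr = sub {
  my ($node, $type) = @_;
  # Anything outside the documented list goes into a separate bucket.
  $type = 'unexpected' unless defined $type and $known_type{$type};
  push @{$attrs_by_type{$type} ||= []}, $node;
};

# With a parser object at hand, the handler would be registered as:
#   $rdf->onattr ($onattr);

# Simulated invocations with placeholder values instead of nodes.
$onattr->('attr-1', 'url');
$onattr->('attr-2', 'rdf-id');
$onattr->('attr-3', 'url');
```

A validator could then check, for example, every entry under url as a URL and every entry under rdf-id as an NCName.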

ERROR HANDLING

This module extracts RDF triples from an RDF/XML fragment using the algorithm described in the RDF/XML specification. When the input does not conform to the grammar, it tries to recover from the error in the most "natural" way; it might or might not report additional triples, depending on how the input is non-conforming.

In most cases where the input is non-conforming, the module reports one or more errors through the onerror handler. To detect all conformance errors, you have to use a conformance checker (e.g. Web::HTML::Validator) that invokes this module with appropriate hooks and postprocessors.

DEPENDENCY

Perl 5.8 or later is required.

This module requires the Web::URL::Canonicalize module in the perl-web-url repository <https://github.com/manakai/perl-web-url>.

In addition, it expects DOM objects (e.g. Web::DOM::Document and Web::DOM::Element from <https://github.com/manakai/perl-web-dom>) as input, although there is no direct dependency.

SPECIFICATIONS

RDFXML

RDF 1.1 XML Syntax <https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-xml/index.html>.

XMLBASE

XML Base <https://www.w3.org/TR/xmlbase/>.

XML Base Specification Errata <https://www.w3.org/2009/01/xmlbase-errata>.

VALLANGS

DOM Tree Validation <https://rawgit.com/manakai/spec-dom/409d6f6c0685e96c5b0d2c7aeb894ed567f0d651/validation-langs.html#rdf/xml-integration>.

AUTHOR

Wakaba <wakaba@suikawiki.org>.

LICENSE

Copyright 2013-2018 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.