The manakai project


URL parser




The following methods are available:

$parser = Web::URL::Parser->new

Create a new parser.

$url = $parser->parse_proxy_env ($string)

Parse a string using the proxy environment variable parser. If parsing fails, undef is returned. Otherwise, a URL record object (Web::URL) is returned.

This method is appropriate for parsing the value of the http_proxy, https_proxy, or ftp_proxy environment variable, after it has been decoded using the platform's locale-dependent character encoding.
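The decoding step mentioned above can be sketched as follows. This is a minimal sketch, not part of the module: the helper names decode_env_value and proxy_url_from_env are hypothetical, and the fallback to UTF-8 when the locale's encoding is unknown to Encode is an assumption. Only new and parse_proxy_env come from this document.

```perl
use strict;
use warnings;
use Encode qw(decode);
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: decode the bytes of an environment variable using
# the character encoding of the current locale.
# Assumption: fall back to UTF-8 if Encode does not know the locale's codeset.
sub decode_env_value {
  my ($bytes) = @_;
  return undef unless defined $bytes;
  setlocale(LC_CTYPE, '');  # adopt the locale configured in the environment
  my $codeset = eval { langinfo(CODESET) } || 'UTF-8';
  my $chars = eval { decode($codeset, $bytes) };
  return defined $chars ? $chars : decode('UTF-8', $bytes);
}

# Hypothetical helper: returns a Web::URL object, or undef when the
# variable is unset, empty, or rejected by the parser.
sub proxy_url_from_env {
  my ($name) = @_;
  my $value = decode_env_value($ENV{$name});
  return undef unless defined $value and length $value;
  require Web::URL::Parser;  # loaded lazily, only when there is a value
  return Web::URL::Parser->new->parse_proxy_env($value);
}

# Usage (requires Web::URL::Parser to be installed):
#   my $url = proxy_url_from_env('http_proxy');
#   warn "no usable http_proxy\n" unless defined $url;
```

Because parse_proxy_env returns undef on failure, callers should check the result before using it as a URL record.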

$result = $parser->split_by_urls ($string, NAME => VALUE, ...)

Extract URLs from free-form text for autolinking.

The first argument must be the text string to be parsed.

The second and later arguments are interpreted as name/value pairs of options. If the lax option is set to a true value, parsing is performed in lax mode. A public Web application (e.g. a forum service that interprets user-posted entries) should not use lax mode. A client application (e.g. an e-mail client that displays a plain-text mail message) should use lax mode.

This method only extracts http: and https: URLs. In lax mode, the ttp: and ttps: URL schemes are also detected and are interpreted as http: and https:, respectively.
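The scheme mapping described above can be illustrated with a small stand-alone function. This is only a sketch of the idea, not the module's implementation: the name normalize_lax_scheme is hypothetical, and matching only lowercase schemes at the start of the string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: map a leading "ttp:" or "ttps:" scheme to
# "http:" or "https:". Other strings are returned unchanged.
# Assumption: only lowercase schemes at the very start are considered.
sub normalize_lax_scheme {
  my ($string) = @_;
  $string =~ s/\A(ttps?:)/h$1/;
  return $string;
}
```

A string that already starts with http: or https: does not match the pattern, so it passes through untouched.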

This method returns an array reference. Its items are array references, each representing a substring of the input text, in order. An inner array is either a text array or a link array. A text array's item at index 0 is a text string, representing a substring that is not a URL. A link array's item at index 0 is a text string, representing a substring that is interpreted as a URL, and its item at index 1 is a text string that can be used as input to the URL parser.
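A caller might walk this structure as sketched below. The helper names link_targets and joined_text are hypothetical, the sample data is hand-written in the documented shape (it is not actual parser output), and telling the two kinds of inner array apart by whether index 1 is defined is an assumption based on the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: collect index 1 (the URL parser input) of every
# link array. Assumption: text arrays have no defined item at index 1.
sub link_targets {
  my ($result) = @_;
  return [map { $_->[1] } grep { defined $_->[1] } @$result];
}

# Hypothetical helper: concatenate index 0 of every inner array, which
# yields the substrings in their original order.
sub joined_text {
  my ($result) = @_;
  return join '', map { $_->[0] } @$result;
}

# Hand-written sample in the documented shape (illustrative URL).
my $sample = [
  ['See '],
  ['http://example.com/', 'http://example.com/'],
  [' later!'],
];

print join("\n", @{ link_targets($sample) }), "\n";
print joined_text($sample), "\n";
```

Each string returned by link_targets could then be passed to a URL parser if a URL record is needed.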


    $result = $parser->split_by_urls ("See later!");
    # $result = [
    #   ["See "],
    #   ["", ""],
    #   [" later!"],
    # ];

    $result = $parser->split_by_urls ("ttps://", lax => 1);
    # $result = [
    #   ["", "ttps://"],
    # ];

$html = $parser->text_to_autolinked_html ($text, NAME => VALUE)

Autolink URLs in a text.

The arguments and their handling are the same as for split_by_urls: the first argument is the text string to be parsed, and the remaining arguments are named parameters.

Unlike split_by_urls, this method returns a text string, which is an HTML fragment (suitable as the content of an HTML span element). Each URL is replaced by an HTML a element with class=url-link. Any HTML special character in the input text is escaped as appropriate.
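To make the shape of the output concrete, here is a sketch that builds a comparable HTML fragment from a result in the split_by_urls format. It is not the module's implementation: escape_html and render_autolinked are hypothetical names, and using the item at index 1 as the href value, as well as the exact attribute order, are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the HTML special characters & < > ".
sub escape_html {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;   # must run first so later entities are not re-escaped
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/"/&quot;/g;
  return $s;
}

# Hypothetical helper: render text arrays as escaped text and link
# arrays as <a class="url-link"> elements.
# Assumption: index 1 of a link array is used as the href value.
sub render_autolinked {
  my ($parts) = @_;
  return join '', map {
    my ($text, $target) = @$_;
    defined $target
      ? sprintf('<a href="%s" class="url-link">%s</a>',
                escape_html($target), escape_html($text))
      : escape_html($text);
  } @$parts;
}
```

In practice, calling text_to_autolinked_html directly is preferable; the sketch only shows why the returned string can be placed inside a span element without further escaping.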


Web Transport Processing <>.


Wakaba <>.


Copyright 2016-2019 Wakaba <>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.