Char::Normalize::FullwidthHalfwidth
Fullwidth/halfwidth character normalization
SYNOPSIS
use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;
my $s = <>;             # input must already be a character string
normalize_width (\$s);  # normalizes $s in place
print $s;
DESCRIPTION
The Char::Normalize::FullwidthHalfwidth module provides functions that normalize fullwidth/halfwidth compatibility characters into their canonical representations.
FUNCTIONS
This module provides the functions normalize_width and combine_voiced_sound_marks. They can be imported into a package by specifying their names as arguments to the use statement:
use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;
Note that the use statement does not export anything unless the function names are explicitly specified.
Alternatively, you can invoke the functions in their fully-qualified forms:
require Char::Normalize::FullwidthHalfwidth;
Char::Normalize::FullwidthHalfwidth::normalize_width (\$s);
normalize_width ($scalarref)
Normalizes the fullwidth/halfwidth characters in the scalar referenced by the argument into their preferred forms. The argument must be a scalar reference. The scalar is treated as a character string (possibly with the utf8 flag set), not a byte string. The function modifies the scalar in place and returns the scalar reference.
The function performs the following conversions:
- U+3000 IDEOGRAPHIC SPACE (so-called fullwidth space): replaced by U+0020 SPACE (so-called halfwidth space).
- Characters in the range U+FF01..U+FF5E (so-called fullwidth ASCII characters): replaced by the corresponding characters in the range U+0021..U+007E (so-called halfwidth ASCII characters).
- Characters in the range U+FF61..U+FF9F (halfwidth Katakana): replaced by the corresponding so-called fullwidth Katakana (or ideographic punctuation). Note that U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK and U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK are replaced by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK and U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK respectively, not by their spacing variants.
- Characters in the range U+FFE0..U+FFE6 (fullwidth symbols): replaced by the corresponding canonical characters.
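The first two rules in the list above are plain range mappings, so they can be sketched with tr///. The following is only an illustration and not the module's actual implementation: the helper name sketch_normalize_ascii_width is hypothetical, and the Katakana and symbol rules are left out because they need a lookup table rather than a constant offset.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Char::Normalize::FullwidthHalfwidth.
# It covers only the U+3000 and U+FF01..U+FF5E rules.
sub sketch_normalize_ascii_width {
  my ($ref) = @_;
  # U+3000 IDEOGRAPHIC SPACE -> U+0020 SPACE
  $$ref =~ tr/\x{3000}/\x{0020}/;
  # U+FF01..U+FF5E -> U+0021..U+007E (both ranges have the same length)
  $$ref =~ tr/\x{FF01}-\x{FF5E}/\x{0021}-\x{007E}/;
  return $ref;
}

# Fullwidth "ABC", ideographic space, fullwidth "123"
my $s = "\x{FF21}\x{FF22}\x{FF23}\x{3000}\x{FF11}\x{FF12}\x{FF13}";
sketch_normalize_ascii_width (\$s);
# $s is now "ABC 123"
```

Like normalize_width, the sketch takes a scalar reference, modifies the string in place, and returns the reference.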
combine_voiced_sound_marks ($scalarref)
Replaces any sequence of a (fullwidth) hiragana or katakana character followed by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK or U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK with its precomposed form, if possible.
In many cases you will want to apply this function immediately after normalize_width.
$t = get_fwhw_normalized $s
Returns a normalized copy of the argument string (not a reference). It performs the normalization performed by normalize_width and combine_voiced_sound_marks, as well as some additional conversions.
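To show what composing a kana with a combining sound mark means, here is a minimal sketch that uses NFC from the core Unicode::Normalize module as a stand-in. Whether the module itself works this way is an assumption; the helper name sketch_combine_voiced_sound_marks is hypothetical, and the character classes are simply the hiragana and katakana letter ranges.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFC/;

# Hypothetical stand-in, NOT the module's implementation.
# NFC is applied only to a kana letter immediately followed by
# U+3099 or U+309A, so the rest of the string is left untouched.
sub sketch_combine_voiced_sound_marks {
  my ($ref) = @_;
  $$ref =~ s/([\x{3041}-\x{3096}\x{30A1}-\x{30FA}][\x{3099}\x{309A}])/NFC ($1)/ge;
  return $ref;
}

# KATAKANA KA + voiced mark, KATAKANA HA + semi-voiced mark
my $s = "\x{30AB}\x{3099}\x{30CF}\x{309A}";
sketch_combine_voiced_sound_marks (\$s);
# $s is now "\x{30AC}\x{30D1}" (KATAKANA GA, KATAKANA PA)
```

Pairs with no precomposed form are returned unchanged by NFC, which matches the "if possible" wording above.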
BUGS
Not all compatibility characters in the Halfwidth and Fullwidth Forms block of the Unicode Standard are currently supported. In particular, halfwidth Hangul letters are not converted to their fullwidth equivalents. A future version of this module is expected to address this issue by extending the conversion table.
AUTHOR
Wakaba <wakaba@suikawiki.org>.
HISTORY
This module was originally developed as part of SuikaWiki <https://suika.suikawiki.org/~wakaba/wiki/sw/n/SuikaWiki>.
LICENSE
Copyright 2008-2016 Wakaba <wakaba@suikawiki.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.