Char::Normalize::FullwidthHalfwidth
Fullwidth/halfwidth character normalization
SYNOPSIS
use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;
$s = <>;
normalize_width (\$s);
print $s;
DESCRIPTION
The Char::Normalize::FullwidthHalfwidth module provides functions that normalize fullwidth/halfwidth compatibility characters into their canonical representations.
FUNCTIONS
This module provides the functions normalize_width and combine_voiced_sound_marks. They can be imported into a package by specifying their names as arguments to the use statement:
use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;
Note that the use statement does not export anything unless function names are explicitly specified.
Alternatively, you can invoke the functions in their fully-qualified forms:
require Char::Normalize::FullwidthHalfwidth;
Char::Normalize::FullwidthHalfwidth::normalize_width (\$s);
normalize_width ($scalarref)
Normalize the fullwidth/halfwidth characters in the scalar referenced by the argument into their preferred forms. The argument must be a scalar reference. The scalar is treated as a character string (possibly with the utf8 flag set), not a byte string. The function returns the scalar reference.
The function performs the following conversions:
- U+3000 IDEOGRAPHIC SPACE (so-called fullwidth space) is replaced by U+0020 SPACE (so-called halfwidth space).
- Characters in the range U+FF01..U+FF5E (so-called fullwidth ASCII characters) are replaced by the corresponding characters in the range U+0021..U+007E (so-called halfwidth ASCII characters).
- Characters in the range U+FF61..U+FF9F (halfwidth Katakana) are replaced by the corresponding so-called fullwidth Katakana (or ideographic punctuation). Note that U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK and U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK are replaced by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK and U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK respectively, not by their spacing variants.
- Characters in the range U+FFE0..U+FFE6 (fullwidth symbols) are replaced by the corresponding canonical characters.
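Part of the conversion list above can be illustrated with a short, self-contained sketch. This is not the module's implementation: sketch_normalize_width is a hypothetical name, and it covers only the fullwidth space, the fullwidth ASCII range, and the two halfwidth sound marks. It also shows the input being decoded from UTF-8 bytes first, since the conversions apply to character strings rather than byte strings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper for illustration only; the real normalize_width
# handles more ranges (halfwidth Katakana, fullwidth symbols).
sub sketch_normalize_width {
    my ($ref) = @_;
    die "argument must be a scalar reference\n" unless ref($ref) eq 'SCALAR';
    # U+3000 IDEOGRAPHIC SPACE -> U+0020 SPACE
    $$ref =~ tr/\x{3000}/\x{0020}/;
    # U+FF01..U+FF5E -> U+0021..U+007E (both ranges have the same length)
    $$ref =~ tr/\x{FF01}-\x{FF5E}/\x{0021}-\x{007E}/;
    # Halfwidth sound marks -> combining sound marks
    $$ref =~ tr/\x{FF9E}\x{FF9F}/\x{3099}\x{309A}/;
    return $ref;    # same reference that was passed in
}

# UTF-8 bytes for U+FF21 U+FF22 U+3000 U+FF11, decoded to characters first.
my $bytes = "\xEF\xBC\xA1\xEF\xBC\xA2\xE3\x80\x80\xEF\xBC\x91";
my $s = decode('UTF-8', $bytes);
sketch_normalize_width(\$s);
print "$s\n";   # prints "AB 1"
```

Because the fullwidth ASCII range sits at a constant offset from the ASCII range, a single tr/// with two equal-length ranges is enough for that part of the table.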
combine_voiced_sound_marks ($scalarref)
Replace any sequence of a (fullwidth) hiragana or katakana character followed by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK or U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK with its precomposed form, if possible.
In many cases you will want to apply this function immediately after normalize_width.
$t = get_fwhw_normalized $s
Return a normalized copy of the argument string (a string, not a reference).
It performs the normalization done by normalize_width and combine_voiced_sound_marks, as well as some additional conversions.
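The composition step described for combine_voiced_sound_marks can be sketched with the core Unicode::Normalize module. This is an assumption about one way to get the same effect, not the module's own code: sketch_combine_voiced_sound_marks is a hypothetical name, and the kana ranges in the pattern are chosen for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper for illustration only.
sub sketch_combine_voiced_sound_marks {
    my ($ref) = @_;
    # NFC is applied to each kana + sound-mark pair only, so the rest of
    # the string is untouched. A pair with no precomposed form is returned
    # unchanged by NFC, which matches the "if possible" wording.
    $$ref =~ s/([\x{3041}-\x{3096}\x{30A1}-\x{30FA}][\x{3099}\x{309A}])/NFC($1)/ge;
    return $ref;
}

# KATAKANA KA + voiced mark, KATAKANA HA + semi-voiced mark
my $kana = "\x{30AB}\x{3099}\x{30CF}\x{309A}";
sketch_combine_voiced_sound_marks(\$kana);
printf "U+%04X ", ord($_) for split //, $kana;   # prints "U+30AC U+30D1 "
print "\n";
```

This is why the step is useful right after width normalization: a halfwidth katakana followed by a halfwidth sound mark first becomes a fullwidth katakana followed by a combining mark, and only then can the pair be composed.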
BUGS
Not all compatibility characters in the Halfwidth and Fullwidth Forms block of the Unicode Standard are currently supported. In particular, halfwidth Hangul characters are not converted to their fullwidth equivalents. A future version of this module is expected to address this issue by extending the conversion table.
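Until then, a caller may want to detect characters that fall outside the documented conversions. The following is a minimal sketch under that assumption: unconverted_width_chars is a hypothetical helper, not part of the module, and it simply reports characters in U+FF00..U+FFEF that lie outside the three ranges of that block listed under normalize_width.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only.
sub unconverted_width_chars {
    my ($s) = @_;
    # Every character of the Halfwidth and Fullwidth Forms block in $s ...
    my @in_block = $s =~ /([\x{FF00}-\x{FFEF}])/g;
    # ... minus the ranges of that block covered by the documented list.
    return grep { !/[\x{FF01}-\x{FF5E}\x{FF61}-\x{FF9F}\x{FFE0}-\x{FFE6}]/ } @in_block;
}

# Fullwidth A, halfwidth Hangul letter U+FFA1, halfwidth katakana U+FF76
my @left = unconverted_width_chars("\x{FF21}\x{FFA1}\x{FF76}");
printf "U+%04X\n", ord($_) for @left;   # prints "U+FFA1"
```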
AUTHOR
Wakaba <wakaba@suikawiki.org>.
HISTORY
This module was originally developed as part of SuikaWiki <https://suika.suikawiki.org/~wakaba/wiki/sw/n/SuikaWiki>.
LICENSE
Copyright 2008-2016 Wakaba <wakaba@suikawiki.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.