This document defines CJK number (漢数字) parsing and serializing algorithms.
This specification depends on the Infra Standard.
The terms string, length, and concatenate are defined by the Infra Standard.
The operators ×, /, and % are defined by the Encoding Standard.
The term serialize an integer is defined by the Fetch Standard.
The empty string is a string whose length is zero (0).
To append a string s to a string variable v, set v to the concatenation of « v, s ».
This section defines an algorithm to parse CJK numbers (漢数字).
Digits and other characters used in CJK numbers are defined by the following table. Each character in the table has its value shown in the value column of the same row and belongs to the categories shown in the categories column of the same row. If the value column's content is "-", the character has no value.
The value of a character is a non-negative integer.
There are the following categories: CJK digit, CJK zero, CJK non-zero digit, CJK ten, CJK multiple tens, CJK hundred, CJK multiple hundreds, CJK thousand, CJK ten thousand, CJK hundred million, CJK trillion, CJK ten quadrillion, CJK hundred quintillion, CJK septillion, CJK ten octillion, CJK and separator, CJK digit group separator, and CJK decimal separator.
Code point | Name | Character | Value | Categories
---|---|---|---|---
U+0020 | SPACE | | - | CJK digit group separator
U+002C | COMMA | , | - | CJK digit group separator
U+002E | FULL STOP | . | - | CJK decimal separator
U+0030 | DIGIT ZERO | 0 | 0 | CJK digit, CJK zero
U+0031 | DIGIT ONE | 1 | 1 | CJK digit, CJK non-zero digit
U+0032 | DIGIT TWO | 2 | 2 | CJK digit, CJK non-zero digit
U+0033 | DIGIT THREE | 3 | 3 | CJK digit, CJK non-zero digit
U+0034 | DIGIT FOUR | 4 | 4 | CJK digit, CJK non-zero digit
U+0035 | DIGIT FIVE | 5 | 5 | CJK digit, CJK non-zero digit
U+0036 | DIGIT SIX | 6 | 6 | CJK digit, CJK non-zero digit
U+0037 | DIGIT SEVEN | 7 | 7 | CJK digit, CJK non-zero digit
U+0038 | DIGIT EIGHT | 8 | 8 | CJK digit, CJK non-zero digit
U+0039 | DIGIT NINE | 9 | 9 | CJK digit, CJK non-zero digit
U+00A0 | NO-BREAK SPACE | | - | CJK digit group separator
U+00B7 | MIDDLE DOT | · | - | CJK digit group separator
U+2009 | THIN SPACE | | - | CJK digit group separator
U+202F | NARROW NO-BREAK SPACE | | - | CJK digit group separator
U+3007 | IDEOGRAPHIC NUMBER ZERO | 〇 | 0 | CJK digit, CJK zero
U+30FB | KATAKANA MIDDLE DOT | ・ | - | CJK decimal separator
U+4E00 | CJK UNIFIED IDEOGRAPH-4E00 | 一 | 1 | CJK digit, CJK non-zero digit
U+4E03 | CJK UNIFIED IDEOGRAPH-4E03 | 七 | 7 | CJK digit, CJK non-zero digit
U+4E07 | CJK UNIFIED IDEOGRAPH-4E07 | 万 | 10000 | CJK ten thousand
U+4E09 | CJK UNIFIED IDEOGRAPH-4E09 | 三 | 3 | CJK digit, CJK non-zero digit
U+4E17 | CJK UNIFIED IDEOGRAPH-4E17 | 丗 | 30 | CJK multiple tens
U+4E5D | CJK UNIFIED IDEOGRAPH-4E5D | 九 | 9 | CJK digit, CJK non-zero digit
U+4E8C | CJK UNIFIED IDEOGRAPH-4E8C | 二 | 2 | CJK digit, CJK non-zero digit
U+4E94 | CJK UNIFIED IDEOGRAPH-4E94 | 五 | 5 | CJK digit, CJK non-zero digit
U+4E96 | CJK UNIFIED IDEOGRAPH-4E96 | 亖 | 4 | CJK digit, CJK non-zero digit
U+4EAC | CJK UNIFIED IDEOGRAPH-4EAC | 京 | 10000000000000000 | CJK ten quadrillion
U+4EBF | CJK UNIFIED IDEOGRAPH-4EBF | 亿 | 100000000 | CJK hundred million
U+4EDF | CJK UNIFIED IDEOGRAPH-4EDF | 仟 | 1000 | CJK thousand
U+4F0D | CJK UNIFIED IDEOGRAPH-4F0D | 伍 | 5 | CJK digit, CJK non-zero digit
U+4F70 | CJK UNIFIED IDEOGRAPH-4F70 | 佰 | 100 | CJK hundred
U+5104 | CJK UNIFIED IDEOGRAPH-5104 | 億 | 100000000 | CJK hundred million
U+5146 | CJK UNIFIED IDEOGRAPH-5146 | 兆 | 1000000000000 | CJK trillion
U+516B | CJK UNIFIED IDEOGRAPH-516B | 八 | 8 | CJK digit, CJK non-zero digit
U+516D | CJK UNIFIED IDEOGRAPH-516D | 六 | 6 | CJK digit, CJK non-zero digit
U+5341 | CJK UNIFIED IDEOGRAPH-5341 | 十 | 10 | CJK ten
U+5343 | CJK UNIFIED IDEOGRAPH-5343 | 千 | 1000 | CJK thousand
U+5344 | CJK UNIFIED IDEOGRAPH-5344 | 卄 | 20 | CJK multiple tens
U+5345 | CJK UNIFIED IDEOGRAPH-5345 | 卅 | 30 | CJK multiple tens
U+534C | CJK UNIFIED IDEOGRAPH-534C | 卌 | 40 | CJK multiple tens
U+53C1 | CJK UNIFIED IDEOGRAPH-53C1 | 叁 | 3 | CJK digit, CJK non-zero digit
U+53C2 | CJK UNIFIED IDEOGRAPH-53C2 | 参 | 3 | CJK digit, CJK non-zero digit
U+53C3 | CJK UNIFIED IDEOGRAPH-53C3 | 參 | 3 | CJK digit, CJK non-zero digit
U+53C4 | CJK UNIFIED IDEOGRAPH-53C4 | 叄 | 3 | CJK digit, CJK non-zero digit
U+53C8 | CJK UNIFIED IDEOGRAPH-53C8 | 又 | - | CJK and separator
U+56DB | CJK UNIFIED IDEOGRAPH-56DB | 四 | 4 | CJK digit, CJK non-zero digit
U+5793 | CJK UNIFIED IDEOGRAPH-5793 | 垓 | 100000000000000000000 | CJK hundred quintillion
U+58F1 | CJK UNIFIED IDEOGRAPH-58F1 | 壱 | 1 | CJK digit, CJK non-zero digit
U+58F9 | CJK UNIFIED IDEOGRAPH-58F9 | 壹 | 1 | CJK digit, CJK non-zero digit
U+5EFE | CJK UNIFIED IDEOGRAPH-5EFE | 廾 | 20 | CJK multiple tens
U+5EFF | CJK UNIFIED IDEOGRAPH-5EFF | 廿 | 20 | CJK multiple tens
U+5F0C | CJK UNIFIED IDEOGRAPH-5F0C | 弌 | 1 | CJK digit, CJK non-zero digit
U+5F0D | CJK UNIFIED IDEOGRAPH-5F0D | 弍 | 2 | CJK digit, CJK non-zero digit
U+5F0E | CJK UNIFIED IDEOGRAPH-5F0E | 弎 | 3 | CJK digit, CJK non-zero digit
U+5F10 | CJK UNIFIED IDEOGRAPH-5F10 | 弐 | 2 | CJK digit, CJK non-zero digit
U+62FE | CJK UNIFIED IDEOGRAPH-62FE | 拾 | 10 | CJK ten
U+634C | CJK UNIFIED IDEOGRAPH-634C | 捌 | 8 | CJK digit, CJK non-zero digit
U+6709 | CJK UNIFIED IDEOGRAPH-6709 | 有 | - | CJK and separator
U+67D2 | CJK UNIFIED IDEOGRAPH-67D2 | 柒 | 7 | CJK digit, CJK non-zero digit
U+6F06 | CJK UNIFIED IDEOGRAPH-6F06 | 漆 | 7 | CJK digit, CJK non-zero digit
U+7396 | CJK UNIFIED IDEOGRAPH-7396 | 玖 | 9 | CJK digit, CJK non-zero digit
U+767E | CJK UNIFIED IDEOGRAPH-767E | 百 | 100 | CJK hundred
U+7695 | CJK UNIFIED IDEOGRAPH-7695 | 皕 | 200 | CJK multiple hundreds
U+79ED | CJK UNIFIED IDEOGRAPH-79ED | 秭 | 1000000000000000000000000 | CJK septillion
U+7A63 | CJK UNIFIED IDEOGRAPH-7A63 | 穣 | 10000000000000000000000000000 | CJK ten octillion
U+7A70 | CJK UNIFIED IDEOGRAPH-7A70 | 穰 | 10000000000000000000000000000 | CJK ten octillion
U+8086 | CJK UNIFIED IDEOGRAPH-8086 | 肆 | 4 | CJK digit, CJK non-zero digit
U+842C | CJK UNIFIED IDEOGRAPH-842C | 萬 | 10000 | CJK ten thousand
U+8CAE | CJK UNIFIED IDEOGRAPH-8CAE | 貮 | 2 | CJK digit, CJK non-zero digit
U+8CB3 | CJK UNIFIED IDEOGRAPH-8CB3 | 貳 | 2 | CJK digit, CJK non-zero digit
U+8CEA | CJK UNIFIED IDEOGRAPH-8CEA | 質 | 7 | CJK digit, CJK non-zero digit
U+8D30 | CJK UNIFIED IDEOGRAPH-8D30 | 贰 | 2 | CJK digit, CJK non-zero digit
U+9621 | CJK UNIFIED IDEOGRAPH-9621 | 阡 | 1000 | CJK thousand
U+9646 | CJK UNIFIED IDEOGRAPH-9646 | 陆 | 6 | CJK digit, CJK non-zero digit
U+964C | CJK UNIFIED IDEOGRAPH-964C | 陌 | 100 | CJK hundred
U+9678 | CJK UNIFIED IDEOGRAPH-9678 | 陸 | 6 | CJK digit, CJK non-zero digit
U+96F6 | CJK UNIFIED IDEOGRAPH-96F6 | 零 | 0 | CJK digit, CJK zero
U+FF0C | FULLWIDTH COMMA | , | - | CJK digit group separator
U+FF0E | FULLWIDTH FULL STOP | . | - | CJK decimal separator
U+FF10 | FULLWIDTH DIGIT ZERO | 0 | 0 | CJK digit, CJK zero
U+FF11 | FULLWIDTH DIGIT ONE | 1 | 1 | CJK digit, CJK non-zero digit
U+FF12 | FULLWIDTH DIGIT TWO | 2 | 2 | CJK digit, CJK non-zero digit
U+FF13 | FULLWIDTH DIGIT THREE | 3 | 3 | CJK digit, CJK non-zero digit
U+FF14 | FULLWIDTH DIGIT FOUR | 4 | 4 | CJK digit, CJK non-zero digit
U+FF15 | FULLWIDTH DIGIT FIVE | 5 | 5 | CJK digit, CJK non-zero digit
U+FF16 | FULLWIDTH DIGIT SIX | 6 | 6 | CJK digit, CJK non-zero digit
U+FF17 | FULLWIDTH DIGIT SEVEN | 7 | 7 | CJK digit, CJK non-zero digit
U+FF18 | FULLWIDTH DIGIT EIGHT | 8 | 8 | CJK digit, CJK non-zero digit
U+FF19 | FULLWIDTH DIGIT NINE | 9 | 9 | CJK digit, CJK non-zero digit
U+2099C | CJK UNIFIED IDEOGRAPH-2099C | 𠦜 | 40 | CJK multiple tens
U+25771 | CJK UNIFIED IDEOGRAPH-25771 | 𥝱 | 1000000000000000000000000 | CJK septillion
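As an illustration of how an implementation might hold the table above, the following Python sketch stores a small subset of its rows in a dictionary. The names `CJK_NUMBER_CHARS`, `value_of`, and `has_category` are hypothetical and are not defined by this document; a value of `None` stands for "-" in the value column.

```python
# Illustrative subset of the character table: character -> (value, categories).
# None as the value means the character has no value ("-" in the table).
CJK_NUMBER_CHARS = {
    ",": (None, {"CJK digit group separator"}),
    ".": (None, {"CJK decimal separator"}),
    "0": (0, {"CJK digit", "CJK zero"}),
    "〇": (0, {"CJK digit", "CJK zero"}),
    "三": (3, {"CJK digit", "CJK non-zero digit"}),
    "参": (3, {"CJK digit", "CJK non-zero digit"}),
    "十": (10, {"CJK ten"}),
    "廿": (20, {"CJK multiple tens"}),
    "百": (100, {"CJK hundred"}),
    "皕": (200, {"CJK multiple hundreds"}),
    "千": (1000, {"CJK thousand"}),
    "万": (10**4, {"CJK ten thousand"}),
    "億": (10**8, {"CJK hundred million"}),
    "𥝱": (10**24, {"CJK septillion"}),
    "又": (None, {"CJK and separator"}),
}


def value_of(ch):
    """Return the value of ch, or None if it has no value or is not listed."""
    entry = CJK_NUMBER_CHARS.get(ch)
    return None if entry is None else entry[0]


def has_category(ch, category):
    """Return True if ch is listed and belongs to the given category."""
    entry = CJK_NUMBER_CHARS.get(ch)
    return entry is not None and category in entry[1]
```

Python integers are arbitrary precision, so values as large as 10^28 need no special handling in this representation.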
To parse a CJK number string, the implementation MUST run these steps:
These steps return either a number or null. The null value represents an error.
Running the steps to parse a CJK number with 三十五 returns 35, while running them with 四万五万 returns null.
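The two results in the example above can be reproduced with the following Python sketch. It is not the normative algorithm: it handles only the ideographic digits 一 to 九, the units 十, 百, 千, and the large units 万, 億, 兆, and its error conditions (units must appear in strictly descending order, a large unit needs a preceding group) are assumptions chosen to match the example. The names `parse_small` and `parse_cjk_number` are hypothetical.

```python
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def parse_small(text):
    """Parse a value below 10000 such as 三千二百十五; return None on error."""
    total = 0
    pending = None     # digit waiting for its unit, if any
    last_unit = 10**4  # assumption: units must strictly decrease
    for ch in text:
        if ch in DIGITS:
            if pending is not None:
                return None  # two digits in a row are not handled here
            pending = DIGITS[ch]
        elif ch in SMALL_UNITS:
            unit = SMALL_UNITS[ch]
            if unit >= last_unit:
                return None
            # A unit without a preceding digit counts once (十 is 10).
            total += (pending if pending is not None else 1) * unit
            pending = None
            last_unit = unit
        else:
            return None  # character outside this sketch's subset
    if pending is not None:
        total += pending
    return total


def parse_cjk_number(text):
    """Return an integer, or None (standing in for null) on error."""
    if text == "":
        return None
    total = 0
    group = ""        # characters collected since the last large unit
    last_unit = None  # assumption: large units must strictly decrease
    for ch in text:
        if ch in LARGE_UNITS:
            unit = LARGE_UNITS[ch]
            small = parse_small(group)
            if group == "" or small is None:
                return None
            if last_unit is not None and unit >= last_unit:
                return None
            total += small * unit
            last_unit = unit
            group = ""
        else:
            group += ch
    small = parse_small(group)
    if small is None:
        return None
    return total + small
```

With these definitions, `parse_cjk_number("三十五")` gives 35, and `parse_cjk_number("四万五万")` gives `None` because 万 is repeated.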
The rules for parsing a large CJK number are these steps, which share the same input with the steps that invoke these steps:
The rules for parsing a small CJK number are these steps, which share the same input with the steps that invoke these steps:
To serialize a number in CJK-10000-grouped number string, with an integer number, the implementation MUST run these steps:
"穣" to string.
"𥝱" to string.
"垓" to string.
"京" to string.
"兆" to string.
"億" to string.
"万" to string.
MINUS SIGN character (−):
Running the steps to serialize a number in CJK-10000-grouped number string with 1230567 returns 123万567.
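The example above can be reproduced with the following Python sketch. It is an approximation under stated assumptions, not the normative steps: each group is written with ASCII digits and no zero padding (as in 123万567), zero groups are skipped, zero itself is written as "0", a negative number is prefixed with U+2212 MINUS SIGN, and numbers of 10^32 or more simply get a leading group of more than four digits. The function name `serialize_cjk_10000_grouped` is hypothetical.

```python
# Group characters from largest to smallest, with the values given in the
# character table of this document.
UNITS = [
    (10**28, "穣"),
    (10**24, "𥝱"),
    (10**20, "垓"),
    (10**16, "京"),
    (10**12, "兆"),
    (10**8, "億"),
    (10**4, "万"),
]


def serialize_cjk_10000_grouped(number):
    """Serialize an integer as a CJK-10000-grouped number string (sketch)."""
    if number < 0:
        # Assumption: negative numbers get a leading U+2212 MINUS SIGN.
        return "\u2212" + serialize_cjk_10000_grouped(-number)
    string = ""
    rest = number
    for unit, ch in UNITS:
        if rest >= unit:
            # Append the group's digits followed by its unit character.
            string += str(rest // unit) + ch
            rest %= unit
    # Append the part below 10000, or "0" if nothing was written at all.
    if rest > 0 or string == "":
        string += str(rest)
    return string
```

With this sketch, `serialize_cjk_10000_grouped(1230567)` gives "123万567", matching the example.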
This section is non-normative.
A JSON data file containing the values of the characters defined in this document is available.
Test data are available.
There is an implementation: perl-number-cjk.
This document is written by Wakaba <wakaba@suikawiki.org>.
This document is developed as part of the manakai project.
Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.