From 540 to 214: How the Kangxi Radicals Became the Backbone of CJK Lexicography

hbaristr · 5 min read

An indexing algorithm that has survived 310 years

Pick a character — say 漢. Open a paper dictionary. How do you find it? There is no alphabet. There are tens of thousands of glyphs. You need a key.

The key, since 1716, has been 214 radicals. Same 214 in your Japanese dictionary today. Same 214 in Unicode. Same 214 the Qing court signed off on under the Kangxi Emperor. Three centuries, one schema, no migration.

That kind of longevity in a data structure is rare. Worth understanding why.

The reduction path

The first serious attempt was Xu Shen's Shuowen Jiezi around 100 CE — 9,353 characters slotted under 540 recurring elements (bu 部). The idea worked. The schema didn't. 540 keys is too many; most of them held a handful of entries. The index was overfit to the corpus.

| Year | Dictionary | Radicals | Characters | Innovation |
|---|---|---|---|---|
| 100 CE | Shuowen Jiezi (Xu Shen) | 540 | 9,353 | First radical-based index |
| 1615 | Zihui (Mei Yingzuo) | 214 | 33,179 | Radical-and-stroke sorting |
| 1716 | Kangxi Zidian (Zhang Yushu et al.) | 214 | 47,035 | Imperial standardization |
| 1999 | Unicode 3.0 | 214 | 27,484+ | Digital encoding (U+2F00–U+2FD5) |
| 2009 | PRC Standard GF 0011 | 201 | – | Simplified-character variant |

The compression took 1,500 years. Mei Yingzuo's Zihui (1615) did three things. It dropped radicals that earned their keep on too few characters. It collapsed positional variants of the same root into one entry. And it sorted the characters under each radical by residual stroke count, meaning the strokes left after the radical is removed. Two-level lookup. 214 keys. It scaled.
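The Zihui scheme is a composite sort key: radical number first, then residual strokes. The sketch below shows that two-level ordering, plus the variant-collapsing step as a lookup table.

Several things here are assumptions for illustration only:

- The `VARIANT_TO_RADICAL` table and the seven sample characters are hand-entered.
- The residual stroke counts follow Kangxi-style traditional forms.
- The function name `build_index` is hypothetical.

```python
from collections import defaultdict

# Hypothetical variant table: each positional form files under its root radical
# (the "collapse positional variants" step). Values are Kangxi radical numbers.
VARIANT_TO_RADICAL = {
    "水": 85, "氵": 85,
    "心": 61, "忄": 61,
    "手": 64, "扌": 64,
    "艸": 140, "艹": 140,
}

# Hand-entered sample: (character, component it is filed under, residual strokes).
SAMPLE = [
    ("海", "氵", 7),
    ("江", "氵", 3),
    ("漢", "氵", 11),
    ("泉", "水", 5),
    ("草", "艹", 6),
    ("情", "忄", 8),
    ("打", "扌", 2),
]

def build_index(entries):
    """Group characters by (radical number, residual strokes), keys in ascending order."""
    buckets = defaultdict(list)
    for char, component, residual in entries:
        radical = VARIANT_TO_RADICAL[component]  # variant form -> root radical
        buckets[(radical, residual)].append(char)
    # Level 1: radical number. Level 2: residual stroke count.
    return dict(sorted(buckets.items()))

for (radical, residual), chars in build_index(SAMPLE).items():
    print(f"radical {radical:>3} + {residual:>2} strokes: {''.join(chars)}")
```

Note that 泉 (water at the bottom) and 海 (water on the left) land under the same key. Within a single bucket, this sketch simply keeps input order. A printed dictionary applies its own ordering there, which is not modeled.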

A century later, Zhang Yushu and Chen Tingjing shipped the Kangxi Zidian on Mei's 214-key schema — 47,035 characters across twelve volumes, sanctioned by the emperor. Imperial imprimatur turned a good index into the canonical one. Lock-in by decree.

Frontispiece of an 1827 reprint of the *Kangxi Zidian* (康熙字典). The 1716 original ran to twelve volumes and indexed 47,035 characters under Mei Yingzuo's 214-radical scheme. Source: Wikimedia Commons.

A wildly unbalanced tree

The 214 keys are not uniformly loaded. The top ten alone hold 12,034 entries, roughly a quarter (about 26%) of the dictionary:

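| Rank | Kangxi # | Radical | Meaning | Entries |
|---|---|---|---|---|
| 1 | 140 | 艸 (艹) | grass | 1,902 |
| 2 | 85 | 水 (氵) | water | 1,595 |
| 3 | 75 | 木 | tree | 1,369 |
| 4 | 64 | 手 (扌) | hand | 1,203 |
| 5 | 30 | 口 | mouth | 1,146 |
| 6 | 61 | 心 (忄) | heart | 1,115 |
| 7 | 142 | 虫 | insect | 1,067 |
| 8 | 118 | 竹 | bamboo | 953 |
| 9 | 149 | 言 | speech | 861 |
| 10 | 120 | 糸 | silk | 823 |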

At the other end of the full 214-radical ranking, radical 138 (艮, stopping) carries just 5 entries. The ratio between the heaviest and lightest key is roughly 380:1. A database engineer would call this a hot-key problem and reshard.
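The skew figures are plain arithmetic on the entry counts. The sketch below sums the ten heaviest radicals and divides by the dictionary's 47,035 characters. It then divides the grass count by the 5 entries under 艮.

It assumes the counts as listed and uses 47,035 as the denominator. Variable names are illustrative.

```python
# Entry counts for the ten heaviest Kangxi radicals, as listed in the ranking
TOP_TEN = {
    "艸": 1902, "水": 1595, "木": 1369, "手": 1203, "口": 1146,
    "心": 1115, "虫": 1067, "竹": 953, "言": 861, "糸": 823,
}
TOTAL_ENTRIES = 47_035  # characters in the Kangxi Zidian
LIGHTEST = 5            # entries under radical 138 (艮)

top_sum = sum(TOP_TEN.values())
share = top_sum / TOTAL_ENTRIES
ratio = max(TOP_TEN.values()) / LIGHTEST

print(f"top ten: {top_sum} entries, {share:.1%} of {TOTAL_ENTRIES}")
print(f"heaviest/lightest: {ratio:.0f}:1")
```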

The Qing did not reshard. They were right not to. Lookup stays cheap even on the hot keys. Finding the radical is effectively constant time for a trained eye. Once you have it, the residual stroke count takes you straight to one short run of entries rather than all 1,902 grass characters. The asymmetry is semantic: humanity has talked about water, plants, hands, and hearts far more than about whatever 艮 indexes. The histogram is a snapshot of what people have written about. That's a feature, not skew.
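To make the cost argument concrete, model the dictionary as one list sorted by (radical, residual strokes). Lookup then becomes a binary search to the right bucket, followed by a scan of that bucket only. The total cost is O(log n + k), where k is the bucket size.

A minimal sketch, assuming hand-entered sample data and a hypothetical `lookup` helper:

```python
from bisect import bisect_left, bisect_right

# Hypothetical flattened dictionary: (radical number, residual strokes, character),
# kept in radical-and-stroke order. Sample data only.
ENTRIES = sorted([
    (85, 3, "江"), (85, 3, "池"), (85, 5, "河"), (85, 7, "海"),
    (85, 11, "漢"), (140, 6, "草"), (75, 4, "林"),
])
KEYS = [(radical, strokes) for radical, strokes, _ in ENTRIES]

def lookup(radical, residual, char):
    """Binary-search to the (radical, residual) bucket, then scan only that bucket."""
    lo = bisect_left(KEYS, (radical, residual))
    hi = bisect_right(KEYS, (radical, residual))
    for i in range(lo, hi):  # linear scan of a single, usually short, bucket
        if ENTRIES[i][2] == char:
            return i
    return None

print(lookup(85, 7, "海"))
```

The heaviest radical only hurts when a single stroke-count bucket grows large. A 1,902-entry radical spread across many stroke counts still leaves each scan short.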

The full set of 214 Kangxi radicals rendered in old-style fonts that imitate the original Kangxi Zidian shapes, ordered by stroke count (1 stroke at left, 17 at right). Source: Wikimedia Commons.

Competing indexing systems

Since then, and especially in the twentieth century, a string of systems has tried to replace it.

| System | Year | Keys | Prerequisite | Best for |
|---|---|---|---|---|
| Kangxi radicals | 1716 | 214 radicals + residual strokes | Memorize 214 radicals and their variants | Semantic browsing, print dictionaries |
| Four-corner (Wang Yunwu) | 1926 | 4-digit code (optional 5th digit) | 10 stroke-shape rules | Telegraphers, numeric indexing |
| Cangjie input | 1976 | 24 base forms, up to 5 keystrokes | Decomposition rules | Fast digital input |
| SKIP (Jack Halpern) | 1990 | 4 patterns + stroke counts | None (geometric division) | Foreign learners |
| Spahn-Hadamitzky | 1996 | 79 radicals | Memorize 79 radicals | Learner dictionaries |

SKIP is the cleanest piece of UX in the bunch. Pick one of four geometric patterns (left-right, up-down, enclosure, solid), then count the strokes in each part. Zero memorization. Halpern designed it for the foreign learner who can see shapes but cannot yet read meaning.
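A SKIP code is written as pattern-number, first-count, second-count.

- **Patterns 1–3:** the two numbers are the stroke counts of the two parts.
- **Pattern 4 (solid):** the numbers are the total stroke count and a subpattern from 1 to 4.

The sketch below only composes and validates the code. It assumes the caller has already divided the character and counted its strokes, which is the visual judgment SKIP actually asks of the learner. The function name `skip_code` is hypothetical.

```python
# SKIP top-level patterns
PATTERNS = {1: "left-right", 2: "up-down", 3: "enclosure", 4: "solid"}

def skip_code(pattern: int, first: int, second: int) -> str:
    """Compose a SKIP code of the form P-M-N.

    Patterns 1-3: first/second are stroke counts of the two parts
    (left/right, top/bottom, enclosing/enclosed).
    Pattern 4 (solid): first is the total stroke count, second is the
    solid subpattern number (1-4).
    """
    if pattern not in PATTERNS:
        raise ValueError(f"unknown SKIP pattern: {pattern}")
    if first < 1 or second < 1:
        raise ValueError("counts must be positive")
    if pattern == 4 and second > 4:
        raise ValueError("solid subpattern must be 1-4")
    return f"{pattern}-{first}-{second}"

# 海: left-right; 氵 has 3 strokes, 毎 has 6 in its Japanese form
print(skip_code(1, 3, 6))
```

The output carries shape only: nothing in "1-3-6" says "water".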

The trade is brutal: SKIP has no semantic content. Looking up 海 by Kangxi tells you it sits with 1,594 other water-related characters. Looking up 海 by SKIP tells you it is left-right and the left half is three strokes. One answer is a thread you can pull on. The other is a hash.

Kangxi survives because it is both an index and an ontology. Most systems are one or the other. Doing both is the moat.

The Unicode compromise

The Unicode Consortium had to make a call in the 1990s. One shape does two jobs in the writing system: it is the character meaning "one" (一), and it is radical 1, the index key "one" (⼀). Same shape, different role. Merge them and you lose the metadata. Split them and you have to teach every renderer about both.

They split. The 214 radicals live at U+2F00–U+2FD5, separate from CJK Unified Ideographs. U+2F00 (⼀) is Kangxi Radical One: metadata. U+4E00 (一) is the character "one": content. A companion block, CJK Radicals Supplement (U+2E80–U+2EFF), carries positional and variant forms such as ⺡ (the three-dot water form of 水) and ⺖ (the side form of 心).
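Because the Kangxi Radicals block runs in Kangxi order with no gaps, radical n sits at U+2F00 + n − 1. Each of these characters also carries a compatibility decomposition to its ideograph. As a result, NFKC normalization folds the metadata character back into the content character.

A minimal sketch using Python's standard `unicodedata` module. The helper name `kangxi_radical_char` is hypothetical.

```python
import unicodedata

KANGXI_BASE = 0x2F00  # KANGXI RADICAL ONE; the block is in Kangxi order, no gaps

def kangxi_radical_char(number: int) -> str:
    """Return the Kangxi Radicals block character for radical 1..214."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radical numbers run from 1 to 214")
    return chr(KANGXI_BASE + number - 1)

for n in (1, 85, 214):
    radical = kangxi_radical_char(n)
    folded = unicodedata.normalize("NFKC", radical)  # compatibility fold to the ideograph
    print(f"{n:>3}  U+{ord(radical):04X} {unicodedata.name(radical)}  ->  U+{ord(folded):04X} {folded}")
```

This should show:

- U+2F00 folding to U+4E00 (一)
- U+2F54 (KANGXI RADICAL WATER) folding to U+6C34 (水)
- U+2FD5 folding to U+9FA0 (龠)

The practical consequence: any pipeline that NFKC-normalizes text treats ⼀ and 一 as the same character. Keeping the radical-versus-content distinction means skipping NFKC on that data.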

The architecture is honest about what the radicals are: a 300-year-old indexing layer preserved alongside the data it indexes. Not deleted, not collapsed, not refactored. Same 214 keys.

The right call. Schemas that survive long enough become infrastructure, and infrastructure outlives the problem it was designed for. The Kangxi index was built for woodblock-printed dictionaries in Qing China. Today it drives the radical-lookup panels in many IMEs, the classical-radical field in KANJIDIC2, and the URL slugs in our radical index.
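One concrete place the 214 keys still live is KANJIDIC2, the kanji-level companion file to JMdict. As we understand its format, each `<character>` record holds a `<literal>`. It also has `<rad_value>` elements, one of which is tagged `rad_type="classical"` with the Kangxi radical number.

The sketch below groups characters by that number using the standard library's ElementTree. Caveats:

- The XML is a hand-written excerpt shaped like KANJIDIC2, not copied from the real file.
- The helper `group_by_classical_radical` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written excerpt shaped like KANJIDIC2 records (not taken from the real file)
SAMPLE_XML = """
<kanjidic2>
  <character>
    <literal>漢</literal>
    <radical><rad_value rad_type="classical">85</rad_value></radical>
  </character>
  <character>
    <literal>海</literal>
    <radical><rad_value rad_type="classical">85</rad_value></radical>
  </character>
  <character>
    <literal>草</literal>
    <radical><rad_value rad_type="classical">140</rad_value></radical>
  </character>
</kanjidic2>
"""

def group_by_classical_radical(xml_text):
    """Map Kangxi (classical) radical number -> list of character literals."""
    root = ET.fromstring(xml_text.strip())
    groups = defaultdict(list)
    for ch in root.iter("character"):
        literal = ch.findtext("literal")
        for rv in ch.iter("rad_value"):
            if rv.get("rad_type") == "classical":
                groups[int(rv.text)].append(literal)
    return dict(groups)

print(group_by_classical_radical(SAMPLE_XML))
```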

Three centuries, one schema, no migration. Pick your indexes carefully.

References

  • Xu Shen (100 CE). Shuowen Jiezi.
  • Mei Yingzuo (1615). Zihui.
  • Kharlamova, E. (2021). "Unification of the Chinese Radicals." SSRN.
  • Halpern, J. (1990). SKIP. kanji.org.
  • Unicode 17.0.0, Chapter 18: East Asian Scripts.
