From 540 to 214: How the Kangxi Radicals Became the Backbone of CJK Lexicography

An indexing algorithm that has survived 310 years

Pick a character — say 漢. Open a paper dictionary. How do you find it? There is no alphabet. There are tens of thousands of glyphs. You need a key.

The key, since 1716, has been 214 radicals. Same 214 in your Japanese dictionary today. Same 214 in Unicode. Same 214 the Qing court signed off on under the Kangxi Emperor. Three centuries, one schema, no migration.

That kind of longevity in a data structure is rare. Worth understanding why.

The reduction path

The first serious attempt was Xu Shen's Shuowen Jiezi around 100 CE — 9,353 characters slotted under 540 recurring elements (bu 部). The idea worked. The schema didn't. 540 keys is too many; most of them held a handful of entries. The index was overfit to the corpus.

Year	Dictionary	Radicals	Characters	Innovation
100 CE	Shuowen Jiezi (Xu Shen)	540	9,353	First radical-based index
1615	Zihui (Mei Yingzuo)	214	33,179	Radical-and-stroke sorting
1716	Kangxi Zidian (Zhang Yushu et al.)	214	47,035	Imperial standardization
1999	Unicode 3.0	214	27,484+	Digital encoding (U+2F00–U+2FD5)
2009	PRC Standard GF 0011	201	--	Simplified-character variant

The compression took 1,500 years. Mei Yingzuo's Zihui (1615) made two cuts: it dropped radicals that indexed too few characters to justify their own section, and it collapsed positional variants of the same root into one entry. Then it sorted each section by residual stroke count, the strokes left after the radical is removed. Two-level lookup. 214 keys. It scaled.
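The two-level lookup is easy to sketch. Below is a minimal Python version, assuming a toy corpus. `ENTRIES`, `build_index`, and `lookup` are hypothetical names for illustration, and the radical and residual-stroke pairs are hand-entered, not read from a real dictionary file.

```python
from collections import defaultdict

# Toy corpus (illustrative only): (character, Kangxi radical number, residual strokes).
# 85 = water, 75 = tree. Residual = strokes left after the radical is removed.
ENTRIES = [
    ("漢", 85, 11),
    ("海", 85, 7),
    ("河", 85, 5),
    ("汁", 85, 2),
    ("林", 75, 4),
    ("村", 75, 3),
]


def build_index(entries):
    """Build a two-level index: radical number -> residual strokes -> characters."""
    index = defaultdict(lambda: defaultdict(list))
    for char, radical, residual in entries:
        index[radical][residual].append(char)
    return index


def lookup(index, radical, residual):
    """Return the candidates filed under one (radical, residual strokes) bucket."""
    return index.get(radical, {}).get(residual, [])


INDEX = build_index(ENTRIES)
print(lookup(INDEX, 85, 11))  # candidates under water + 11 strokes
```

Radical first, residual strokes second. In a real dictionary each bucket holds many candidates, and the reader scans the short list by eye.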

A century later, Zhang Yushu and Chen Tingjing shipped the Kangxi Zidian on Mei's 214-key schema — 47,035 characters across twelve volumes, sanctioned by the emperor. Imperial imprimatur turned a good index into the canonical one. Lock-in by decree.

Frontispiece of an 1827 reprint of the Kangxi Zidian (康熙字典). The 1716 original ran to twelve volumes and indexed 47,035 characters under Mei Yingzuo's 214-radical scheme. Source: Wikimedia Commons.

A wildly unbalanced tree

The 214 keys are not uniformly loaded. The top ten alone account for roughly 26% of the dictionary (12,034 of 47,035 entries):

Rank	#	Radical	Meaning	Kangxi Entries
1	140	艸 (艹)	grass	1,902
2	85	水 (氵)	water	1,595
3	75	木	tree	1,369
4	64	手 (扌)	hand	1,203
5	30	口	mouth	1,146
6	61	心 (忄)	heart	1,115
7	142	虫	insect	1,067
8	118	竹	bamboo	953
9	149	言	speech	861
10	120	糸	silk	823

Bottom of the table: radical 138 (艮, stopping) carries 5 entries. The ratio is 380:1. A database engineer would call this a hot-key problem and reshard.
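The skew figures can be recomputed from the table. Below is a quick check in Python, assuming the entry counts listed above and the 47,035 total quoted for the Kangxi Zidian. The variable names are illustrative.

```python
# Entry counts for the ten most loaded radicals, keyed by radical number
# (copied from the table above).
TOP_TEN = {
    140: 1902,  # grass
    85: 1595,   # water
    75: 1369,   # tree
    64: 1203,   # hand
    30: 1146,   # mouth
    61: 1115,   # heart
    142: 1067,  # insect
    118: 953,   # bamboo
    149: 861,   # speech
    120: 823,   # silk
}
TOTAL_ENTRIES = 47035   # characters in the Kangxi Zidian, as quoted in the text
SMALLEST_BUCKET = 5     # radical 138, per the text

top_ten_total = sum(TOP_TEN.values())                     # 12034
top_ten_share = top_ten_total / TOTAL_ENTRIES             # about 0.256
hot_to_cold = max(TOP_TEN.values()) / SMALLEST_BUCKET     # 380.4

print(f"top ten: {top_ten_total} entries, {top_ten_share:.1%} of the dictionary")
print(f"largest to smallest bucket: {hot_to_cold:.0f}:1")
```

Ten keys out of 214, about 5% of the schema, hold about a quarter of the data.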

The Qing did not reshard. They were right not to. Even the largest bucket stays navigable once you have the radical, because residual stroke count splits it into much shorter runs, and finding the radical is near-instant for a trained eye. The asymmetry is semantic: humanity has talked about water, plants, hands, and hearts more than about whatever 艮 indexes. The histogram is a snapshot of what people have written about. That's a feature, not skew.

The full set of 214 Kangxi radicals rendered in old-style fonts that imitate the original Kangxi Zidian shapes, ordered by stroke count (1 stroke at left, 17 at right). Source: Wikimedia Commons.

Competing indexing systems

The twentieth century alone produced several attempts to replace it.

System	Year	Keys	Prerequisite	Best For
Kangxi radicals	1716	214 + strokes	Memorize 214 + variants	Semantic browsing, print dictionaries
Four-corner (Wang Yunwu)	1926	4–5 digit code	10 stroke-shape rules	Telegraphers, numeric indexing
SKIP (Jack Halpern)	1990	4 patterns + strokes	None (geometric division)	Foreign learners
Spahn-Hadamitzky	1996	79 radicals	Memorize 79	Learner dictionaries
Cangjie input	1976	24 forms, 5 keys	Decomposition rules	Fast digital input

SKIP is the cleanest piece of UX in the bunch. Four geometric patterns — left-right, up-down, enclosure, solid — then count strokes. Zero memorization. Halpern designed it for the foreign learner who can see shapes but cannot yet read meaning.

The trade is brutal: SKIP has no semantic content. Looking up 海 by Kangxi tells you it sits with 1,594 other water-related characters. Looking up 海 by SKIP tells you it is left-right and the left half is three strokes. One answer is a thread you can pull on. The other is a hash.

Kangxi survives because it is both an index and an ontology. Most systems are one or the other. Doing both is the moat.

The Unicode compromise

The Unicode Consortium had to make a call in the 1990s. The same glyph appears twice in the writing system — once as a character meaning "one" (一), once as the radical for "one" (⼀). Same shape, different role. Merge them and you lose the metadata. Split them and you have to teach every renderer about both.

They split. The 214 radicals live at U+2F00–U+2FD5, separate from CJK Unified Ideographs. U+2F00 (⼀) is Kangxi Radical One: metadata. U+4E00 (一) is the character "one": content. A supplementary block at U+2E80–U+2EFF carries the positional variants (氵 for 水 on the left, 忄 for 心, and so on).
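The split is visible from any language with Unicode tables. Below is a minimal sketch using Python's standard `unicodedata` module. The constant names are illustrative, and the normalization behavior comes from the Unicode Character Database, not from anything stated in this article: the Kangxi radical code points carry compatibility mappings to their unified ideographs, so NFKC folds the radical into the character while NFC leaves it alone.

```python
import unicodedata

RADICAL_ONE = "\u2f00"    # Kangxi Radical One (the index key)
IDEOGRAPH_ONE = "\u4e00"  # CJK unified ideograph "one" (the content)

# The Kangxi Radicals block is exactly as wide as the schema.
BLOCK_START, BLOCK_END = 0x2F00, 0x2FD5
block_size = BLOCK_END - BLOCK_START + 1


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(block_size)
print(describe(RADICAL_ONE))
print(describe(IDEOGRAPH_ONE))

# Same shape, different code points: plain string comparison keeps them apart.
print(RADICAL_ONE == IDEOGRAPH_ONE)

# Compatibility normalization (NFKC) discards the radical/character distinction;
# canonical normalization (NFC) preserves it.
folded = unicodedata.normalize("NFKC", RADICAL_ONE)
kept = unicodedata.normalize("NFC", RADICAL_ONE)
print(folded == IDEOGRAPH_ONE, kept == RADICAL_ONE)
```

The practical consequence: a search pipeline that applies NFKC will treat the radical and the character as the same string, so a radical index that wants to keep the metadata role should store keys before that step.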

The architecture is honest about what the radicals are: a 300-year-old indexing layer preserved alongside the data it indexes. Not deleted, not collapsed, not refactored. Same 214 keys.

The right call. Schemas that survive long enough become infrastructure, and infrastructure outlives the problem it was designed for. The Kangxi index was built for woodblock-printed dictionaries in Qing China. It is now the routing table for kanji in your phone's IME, the radical column in JMdict, and the URL slugs in our radical index.

Three centuries, one schema, no migration. Pick your indexes carefully.

References

Xu Shen, Shuowen Jiezi (100 CE)
Mei Yingzuo, Zihui (1615)
Kharlamova, E. (2021). "Unification of the Chinese Radicals." SSRN.
Halpern, J. (1990). SKIP. kanji.org.
Unicode 17.0.0, Chapter 18: East Asian Scripts.