# From 540 to 214: How the Kangxi Radicals Became the Backbone of CJK Lexicography

## An indexing algorithm that has survived 310 years

Pick a character — say 漢. Open a paper dictionary. How do you find it? There is no alphabet. There are tens of thousands of glyphs. You need a key.

The key, since 1716, has been 214 radicals. Same 214 in your Japanese dictionary today. Same 214 in Unicode. Same 214 the Qing court signed off on under the [Kangxi Emperor](https://en.wikipedia.org/wiki/Kangxi_Emperor). Three centuries, one schema, no migration.

That kind of longevity in a data structure is rare. Worth understanding why.

### The reduction path

The first serious attempt was [Xu Shen](https://en.wikipedia.org/wiki/Xu_Shen)'s [*Shuowen Jiezi*](https://en.wikipedia.org/wiki/Shuowen_Jiezi) around 100 CE — 9,353 characters slotted under 540 recurring elements (*bu* 部). The idea worked. The schema didn't. 540 keys is too many; most of them held a handful of entries. The index was overfit to the corpus.

| Year | Dictionary | Radicals | Characters | Innovation |
|------|-----------|:--------:|:----------:|------------|
| 100 CE | *Shuowen Jiezi* (Xu Shen) | 540 | 9,353 | First radical-based index |
| 1615 | *Zihui* (Mei Yingzuo) | 214 | 33,179 | Radical-and-stroke sorting |
| 1716 | *Kangxi Zidian* (Zhang Yushu et al.) | 214 | 47,035 | Imperial standardization |
| 1999 | Unicode 3.0 | 214 | 27,484+ | Digital encoding (U+2F00--U+2FD5) |
| 2009 | PRC Standard GF 0011 | 201 | -- | Simplified-character variant |

The compression took 1,500 years. [Mei Yingzuo](https://en.wikipedia.org/wiki/Mei_Yingzuo)'s [*Zihui*](https://en.wikipedia.org/wiki/Zihui) (1615) did two things: drop radicals that earned their keep on too few characters, and collapse positional variants of the same root into one entry. Then sort by *residual* stroke count — strokes left after the radical is removed. Two-level lookup. 214 keys. It scaled.
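A minimal sketch of that two-level lookup, in Python. The entries and the `build_index` and `lookup` helpers are illustrative, not any dictionary's actual data format. Radical numbers follow the Kangxi numbering, and the residual counts are hand-entered assuming traditional forms.

```python
from collections import defaultdict

# Toy entries: (character, Kangxi radical number, residual strokes).
# 85 = water, 75 = tree. Hand-entered for illustration only.
ENTRIES = [
    ("河", 85, 5),
    ("海", 85, 7),
    ("湖", 85, 9),
    ("漢", 85, 11),
    ("林", 75, 4),
    ("森", 75, 8),
]


def build_index(entries):
    """Group characters by radical, then by residual stroke count."""
    index = defaultdict(lambda: defaultdict(list))
    for char, radical, residual in entries:
        index[radical][residual].append(char)
    return index


def lookup(index, radical, residual):
    """Return the candidates a reader would scan after both lookup steps."""
    return index.get(radical, {}).get(residual, [])


index = build_index(ENTRIES)
print(lookup(index, 85, 11))  # ['漢']
print(lookup(index, 75, 8))   # ['森']
```

The outer key is coarse and semantic. The inner key is something you can compute by counting.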

A century later, [Zhang Yushu](https://en.wikipedia.org/wiki/Zhang_Yushu) and [Chen Tingjing](https://en.wikipedia.org/wiki/Chen_Tingjing) shipped the [*Kangxi Zidian*](https://en.wikipedia.org/wiki/Kangxi_Dictionary) on Mei's 214-key schema — 47,035 characters across twelve volumes, sanctioned by the emperor. Imperial imprimatur turned a good index into the canonical one. Lock-in by decree.

![Frontispiece of an 1827 reprint of the Kangxi Dictionary (康熙字典)](https://upload.wikimedia.org/wikipedia/commons/e/e8/Kangxi_Dictionary_1827.JPG)
*Frontispiece of an 1827 reprint of the Kangxi Zidian (康熙字典). The 1716 original ran to twelve volumes and indexed 47,035 characters under Mei Yingzuo's 214-radical scheme. Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Kangxi_Dictionary_1827.JPG).*

### A wildly unbalanced tree

The 214 keys are not uniformly loaded. The top ten alone hold 12,034 entries, roughly a quarter of the dictionary's 47,035:

| Rank | # | Radical | Meaning | Kangxi Entries |
|:----:|:-:|:-------:|---------|:--------------:|
| 1 | [140](/radicals/140) | 艸 (艹) | grass | 1,902 |
| 2 | [85](/radicals/85) | 水 (氵) | water | 1,595 |
| 3 | [75](/radicals/75) | 木 | tree | 1,369 |
| 4 | [64](/radicals/64) | 手 (扌) | hand | 1,203 |
| 5 | [30](/radicals/30) | 口 | mouth | 1,146 |
| 6 | [61](/radicals/61) | 心 (忄) | heart | 1,115 |
| 7 | [142](/radicals/142) | 虫 | insect | 1,067 |
| 8 | [118](/radicals/118) | 竹 | bamboo | 953 |
| 9 | [149](/radicals/149) | 言 | speech | 861 |
| 10 | [120](/radicals/120) | 糸 | silk | 823 |

Bottom of the table: [radical 138](/radicals/138) (艮, stopping) carries 5 entries. The ratio is 380:1. A database engineer would call this a hot-key problem and reshard.
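The skew is easy to check. A quick sketch in Python, using only the table's counts, the 5 entries under radical 138, and the 47,035 total. The variable names are mine, and the per-radical counts and the headline total may come from slightly different tallies, so treat the share as approximate.

```python
TOTAL_ENTRIES = 47_035  # Kangxi Zidian character count cited above

# Entry counts for the ten most heavily loaded radicals, from the table.
TOP_TEN = {
    "艸": 1902, "水": 1595, "木": 1369, "手": 1203, "口": 1146,
    "心": 1115, "虫": 1067, "竹": 953, "言": 861, "糸": 823,
}
SMALLEST_BUCKET = 5  # radical 138 (艮)

top_ten_total = sum(TOP_TEN.values())
top_ten_share = top_ten_total / TOTAL_ENTRIES
skew_ratio = max(TOP_TEN.values()) / SMALLEST_BUCKET

print(top_ten_total)           # 12034
print(f"{top_ten_share:.1%}")  # 25.6%
print(f"{skew_ratio:.0f}:1")   # 380:1
```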

The Qing did not reshard. They were right not to. Once you have the radical, the residual stroke count splits even the largest bucket into much smaller groups, and finding the radical is close to constant time for a trained eye. The asymmetry is *semantic*: humanity has talked about water, plants, hands, and hearts more than about whatever 艮 indexes. The histogram is a snapshot of what people have written about. That's a feature, not skew.

![Chart showing all 214 Kangxi radicals in old-style fonts, ordered by stroke count](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/List_of_the_214_Kangxi_Radicals_-_old_style.svg/960px-List_of_the_214_Kangxi_Radicals_-_old_style.svg.png)
*The full set of 214 Kangxi radicals rendered in old-style fonts that imitate the original Kangxi Zidian shapes, ordered by stroke count (1 stroke at left, 17 at right). Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:List_of_the_214_Kangxi_Radicals_-_old_style.svg).*

### Competing indexing systems

Throughout the twentieth century, someone kept trying to replace it.

| System | Year | Keys | Prerequisite | Best For |
|--------|:----:|:----:|-------------|----------|
| Kangxi radicals | 1716 | 214 + strokes | Memorize 214 + variants | Semantic browsing, print dictionaries |
| [Four-corner](https://en.wikipedia.org/wiki/Four-corner_method) (Wang Yunwu) | 1926 | 4--5 digit code | 10 stroke-shape rules | Telegraphers, numeric indexing |
| [SKIP](https://en.wikipedia.org/wiki/SKIP_(kanji_indexing_system)) ([Jack Halpern](https://en.wikipedia.org/wiki/Jack_Halpern_(linguist))) | 1990 | 4 patterns + strokes | None (geometric division) | Foreign learners |
| [Cangjie input](https://en.wikipedia.org/wiki/Cangjie_input_method) | 1976 | 24 forms, 5 keys | Decomposition rules | Fast digital input |
| Spahn-Hadamitzky | 1996 | 79 radicals | Memorize 79 | Learner dictionaries |

SKIP is the cleanest piece of UX in the bunch. Four geometric patterns — left-right, up-down, enclosure, solid — then count strokes. Zero memorization. Halpern designed it for the foreign learner who can see shapes but cannot yet read meaning.

The trade is brutal: SKIP has no semantic content. Looking up 海 by Kangxi tells you it sits with 1,594 other water-related characters. Looking up 海 by SKIP tells you it is left-right and the left half is three strokes. One answer is a thread you can pull on. The other is a hash.

Kangxi survives because it is both an index *and* an ontology. Most systems are one or the other. Doing both is the moat.

### The Unicode compromise

The Unicode Consortium had to make a call in the 1990s. The same glyph appears twice in the writing system — once as a character meaning "one" (一), once as the radical for "one" (⼀). Same shape, different role. Merge them and you lose the metadata. Split them and you have to teach every renderer about both.

They split. The 214 radicals live at [U+2F00--U+2FD5](https://en.wikipedia.org/wiki/Kangxi_Radicals_(Unicode_block)), separate from CJK Unified Ideographs. U+2F00 (⼀) is *Kangxi Radical One* — metadata. U+4E00 (一) is the character "one" — content. A supplementary block at U+2E80--U+2EFF carries the positional variants (氵 for 水 on the left, 忄 for 心, and so on).
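Python's standard `unicodedata` module makes the split visible. A small sketch: the `kangxi_radical` helper is hypothetical and relies on the block running in radical-number order from U+2F00.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2F00, 0x2FD5  # Kangxi Radicals block


def kangxi_radical(number: int) -> str:
    """Return the Kangxi Radicals block character for radical 1..214."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radical numbers run from 1 to 214")
    return chr(BLOCK_START + number - 1)


radical_one = kangxi_radical(1)  # U+2F00, the index key
character_one = "\u4e00"         # U+4E00, the character "one"

print(unicodedata.name(radical_one))    # KANGXI RADICAL ONE
print(unicodedata.name(character_one))  # CJK UNIFIED IDEOGRAPH-4E00
print(radical_one == character_one)     # False

# Compatibility normalization folds the radical onto the ordinary character.
print(unicodedata.normalize("NFKC", radical_one) == character_one)  # True

# The block holds exactly one code point per radical.
print(BLOCK_END - BLOCK_START + 1)  # 214
```

If a text pipeline applies NFKC, a search for 一 will also match ⼀, at the cost of erasing the distinction the block exists to record.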

The architecture is honest about what the radicals are: a 300-year-old indexing layer preserved alongside the data it indexes. Not deleted, not collapsed, not refactored. Same 214 keys.

The right call. Schemas that survive long enough become infrastructure, and infrastructure outlives the problem it was designed for. The Kangxi index was built for woodblock-printed dictionaries in Qing China. It is now the routing table for kanji in your phone's IME, the radical field in KANJIDIC, and the URL slugs in [our radical index](/radicals).

Three centuries, one schema, no migration. Pick your indexes carefully.

### References

- Xu Shen, *Shuowen Jiezi* (100 CE)
- Mei Yingzuo, *Zihui* (1615)
- Kharlamova, E. (2021). "Unification of the Chinese Radicals." SSRN.
- Halpern, J. (1990). SKIP. kanji.org.
- Unicode 17.0.0, Chapter 18: East Asia.

