# The Power of Radicals: How 214 Building Blocks Unlock Thousands of Kanji

## A hash function from 1716

The [214 Kangxi radicals](https://en.wikipedia.org/wiki/Kangxi_radical) are not a learning aid. They are an indexing system — a piece of pre-modern infrastructure for the problem of *how do you alphabetize a script with no alphabet?*

The answer, codified in the [Kangxi Dictionary](https://en.wikipedia.org/wiki/Kangxi_Dictionary) of 1716 under the [Kangxi Emperor](https://en.wikipedia.org/wiki/Kangxi_Emperor), was to assign every one of 47,035 characters to exactly one of 214 buckets based on a shared graphical component, then sort by residual stroke count inside the bucket. That is a hash function. The keys are characters. The hash is visual. The collisions are resolved by counting strokes.

It survived three centuries because the function is good.
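The bucket-then-sort scheme described above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of how the dictionary was compiled: the `ENTRIES` table is hand-entered sample data, and `build_index` and `lookup` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hand-entered illustrative entries: char -> (Kangxi radical number, residual strokes).
# Residual strokes = strokes left over once the radical is removed.
ENTRIES = {
    "休": (9, 4),
    "体": (9, 5),
    "作": (9, 5),
    "打": (64, 2),
    "投": (64, 4),
    "持": (64, 6),
    "河": (85, 5),
    "泳": (85, 5),
    "花": (140, 4),
    "茶": (140, 6),
}


def build_index(entries):
    """Group characters into radical buckets, each sorted by residual strokes."""
    buckets = defaultdict(list)
    for char, (radical, residual) in entries.items():
        buckets[radical].append((residual, char))
    # Ties inside a bucket fall back to code point order, which is an
    # arbitrary choice made for this sketch.
    return {radical: sorted(members) for radical, members in sorted(buckets.items())}


def lookup(index, radical, residual):
    """Return every character filed under `radical` with `residual` strokes."""
    return [char for strokes, char in index.get(radical, []) if strokes == residual]


index = build_index(ENTRIES)
print(lookup(index, 64, 4))  # ['投']
print(lookup(index, 9, 5))   # ['体', '作']
```

The second lookup returns two characters: radical plus residual stroke count narrows the search to a short list rather than a single entry, and the reader scans the rest by eye.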

![A volume of the original Kangxi Dictionary on display at the Chinese Dictionary Museum](https://upload.wikimedia.org/wikipedia/commons/4/49/Kangxi_Dictionary_-_Chinese_Dictionary_Museum.JPG)
*A volume of the Kangxi Dictionary (康熙字典) on display at the Chinese Dictionary Museum at Huangcheng Xiangfu. Compiled 1710–1716 under the Kangxi Emperor, it fixed the 214-radical scheme that still governs character lookup three centuries later. Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Kangxi_Dictionary_-_Chinese_Dictionary_Museum.JPG) (CC BY-SA 4.0).*

## The distribution follows a power law

Of course it does. Anything assembled by humans across centuries tends to.

The 47,035 characters spread across 214 radicals at a mean of 220 per radical and a median of 64; the gap between mean and median is the first hint that the tail is fat. The top 10 radicals in the table below swallow 12,034 characters. That is about 26% of the dictionary, held by under 5% of the buckets. The bottom quartile of radicals together cover less than [radical 140](/radicals/140) (grass) alone.

| Rank | # | Radical | Meaning | Kangxi Count | % of Total |
|:---:|:---:|:---:|----------|:---:|:---:|
| 1 | [140](/radicals/140) | 艸 | grass | 1,902 | 4.04% |
| 2 | [85](/radicals/85) | 水 | water | 1,595 | 3.39% |
| 3 | [75](/radicals/75) | 木 | tree | 1,369 | 2.91% |
| 4 | [64](/radicals/64) | 手 | hand | 1,203 | 2.56% |
| 5 | [30](/radicals/30) | 口 | mouth | 1,146 | 2.44% |
| 6 | [61](/radicals/61) | 心 | heart | 1,115 | 2.37% |
| 7 | [142](/radicals/142) | 虫 | insect | 1,067 | 2.27% |
| 8 | [118](/radicals/118) | 竹 | bamboo | 953 | 2.03% |
| 9 | [149](/radicals/149) | 言 | speech | 861 | 1.83% |
| 10 | [120](/radicals/120) | 糸 | silk | 823 | 1.75% |
| 11 | [167](/radicals/167) | 金 | metal | 806 | 1.71% |
| 12 | [38](/radicals/38) | 女 | woman | 681 | 1.45% |
| 13 | [130](/radicals/130) | 肉 | meat | 674 | 1.43% |
| 14 | [109](/radicals/109) | 目 | eye | 647 | 1.38% |
| 15 | [86](/radicals/86) | 火 | fire | 639 | 1.36% |

*Source: Kangxi Dictionary radical counts via Wikipedia; percentages computed against 47,035 total entries.*

![Chart showing all 214 Kangxi radicals laid out in old-style font, grouped by stroke count](https://upload.wikimedia.org/wikipedia/commons/3/3b/List_of_the_214_Kangxi_Radicals_-_old_style.svg)
*The full set of 214 Kangxi radicals rendered in old-style fonts that match the original 1716 glyph shapes, ordered by stroke count from 一 (1 stroke) up to 龠 (17 strokes). Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:List_of_the_214_Kangxi_Radicals_-_old_style.svg) (CC BY-SA 3.0).*

The floor is [radical 138](/radicals/138) (艮) at 5 characters. That is a 380:1 ratio between the most and least productive bucket. Grass, water, and tree together account for over a tenth of the dictionary, a fossil of the agricultural world that wrote the language.
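The headline ratios can be rechecked from the table's own counts. The sketch below assumes only the figures already quoted here (the 15 table rows, the 47,035 total, and the 5-character floor); `share` is a helper name introduced for this example.

```python
# Counts copied from the top-15 table above, in rank order.
TOP_COUNTS = [1902, 1595, 1369, 1203, 1146, 1115, 1067, 953,
              861, 823, 806, 681, 674, 647, 639]
TOTAL = 47_035   # dictionary size quoted in the text
RADICALS = 214
FLOOR = 5        # radical 138, the smallest bucket


def share(counts, total=TOTAL):
    """Fraction of the dictionary covered by the given buckets."""
    return sum(counts) / total


top3_chars, top10_chars = sum(TOP_COUNTS[:3]), sum(TOP_COUNTS[:10])

print(f"mean per radical: {TOTAL / RADICALS:.1f}")
print(f"top 3:  {top3_chars:,} chars, {share(TOP_COUNTS[:3]):.1%}")
print(f"top 10: {top10_chars:,} chars, {share(TOP_COUNTS[:10]):.1%}")
print(f"largest / smallest bucket: {TOP_COUNTS[0] / FLOOR:.1f}")
```

Run as written, this reports 4,866 characters (10.3%) for the top three radicals, 12,034 characters (25.6%) for the top ten, and a ratio of 380.4 between the largest and smallest bucket.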

## What knowing the radical actually buys you

Weighted by how often each character turns up in running text, Chinese carries roughly 9.56 bits of [entropy](https://www.johndcook.com/blog/2019/10/18/chinese-character-entropy/) per character (Cook, 2019). The interesting question: how much does identifying the radical narrow things down?

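Set frequency aside and treat all 47,035 dictionary entries as equally likely: the starting uncertainty is log2(47,035) ≈ 15.5 bits. For a character in [radical 140](/radicals/140), knowing the radical leaves you log2(1,902) ≈ 10.9 bits of search inside the bucket, a saving of about 4.6 bits, because you have already discarded 45,133 other candidates outright. For [radical 138](/radicals/138), the radical nearly *is* the answer: log2(5) ≈ 2.3 bits left.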
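The bit counts above are plain logarithms, and it helps to see what the radical buys alongside what it leaves. A small sketch, assuming every dictionary entry is equally likely (a simplification, since real text is heavily skewed toward common characters); `bits_remaining` and `bits_gained` are names made up for this example.

```python
import math

TOTAL = 47_035  # dictionary size quoted in the text


def bits_remaining(bucket_size):
    """Uncertainty left inside a bucket once the radical is known."""
    return math.log2(bucket_size)


def bits_gained(bucket_size, total=TOTAL):
    """Uncertainty removed by learning which bucket the character is in."""
    return math.log2(total / bucket_size)


for name, size in [("radical 140 (grass)", 1902), ("radical 138", 5)]:
    print(f"{name}: {bits_gained(size):.1f} bits gained, "
          f"{bits_remaining(size):.1f} bits remaining")
```

Under that uniform assumption the two quantities always sum to log2(47,035) ≈ 15.5 bits. Grass gives up about 4.6 bits and leaves 10.9; radical 138 gives up about 13.2 and leaves 2.3.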

Rare radicals are near-unique identifiers. Common ones are weak hints. This is the same intuition Li et al. (2023) formalized as *self-information of radicals* (SIR), a measure of each radical's discriminative power in neural character recognition. Weight radicals by their information content and zero-shot recognition accuracy improves by 3–5%. The 18th-century lexicographers picked a feature that 21st-century neural nets still want to attend to.

## Positional variants: shape under pressure

A radical does not always look like itself. When it shifts position inside a character, it deforms — losing strokes to fit the slot. Japanese pedagogy names seven canonical slots: *hen* (left), *tsukuri* (right), *kanmuri* (top), *ashi* (bottom), *tare* (top-left drape), *nyou* (left-bottom wrap), *kamae* (enclosure).

| Base Form | Variant | Position | Name | Example Characters |
|:---:|:---:|----------|----------|----------|
| [人](/kanjis/4eba) | 亻 | hen (left) | にんべん | [休](/kanjis/4f11), [体](/kanjis/4f53), [作](/kanjis/4f5c) |
| [水](/kanjis/6c34) | 氵 | hen (left) | さんずい | [海](/kanjis/6d77), [河](/kanjis/6cb3), [泳](/kanjis/6cf3) |
| [心](/kanjis/5fc3) | 忄 | hen (left) | りっしんべん | 快, 情, 悟 |
| 心 | 㣺 | ashi (bottom) | したごころ | [恭](/kanjis/606d), [慕](/kanjis/6155) |
| [手](/kanjis/624b) | 扌 | hen (left) | てへん | [持](/kanjis/6301), [打](/kanjis/6253), [投](/kanjis/6295) |
| [火](/kanjis/706b) | 灬 | ashi (bottom) | れんが | 然, 煮, 熱 |
| 犬 | 犭 | hen (left) | けものへん | [猫](/kanjis/732b), [独](/kanjis/72ec), [狂](/kanjis/72c2) |
| 示 | 礻 | hen (left) | しめすへん | [社](/kanjis/793e), [神](/kanjis/795e), [礼](/kanjis/793c) |
| 衣 | 衤 | hen (left) | ころもへん | [被](/kanjis/88ab), 補, [裸](/kanjis/88f8) |
| 刀 | 刂 | tsukuri (right) | りっとう | [判](/kanjis/5224), [別](/kanjis/5225), [刻](/kanjis/523b) |
| 食 | 飠 | hen (left) | しょくへん | [飲](/kanjis/98f2), [飯](/kanjis/98ef), [館](/kanjis/9928) |
| 艸 | 艹 | kanmuri (top) | くさかんむり | [花](/kanjis/82b1), [茶](/kanjis/8336), [薬](/kanjis/85ac) |
| 竹 | ⺮ | kanmuri (top) | たけかんむり | [笑](/kanjis/7b11), [筆](/kanjis/7b46), [箱](/kanjis/7bb1) |
| 老 | 耂 | kanmuri (top) | おいかんむり | [考](/kanjis/8003), [者](/kanjis/8005) |
| 肉 | 月 | hen (left) | にくづき | [腕](/kanjis/8155), [脳](/kanjis/8133), [肺](/kanjis/80ba) |

The meat radical ([肉](/radicals/130)) is the system's one honest failure: its variant is graphically identical to the moon radical ([月](/radicals/74)). [腕](/kanjis/8155) (arm) carries meat-月. [朝](/kanjis/671d) (morning) carries moon-月. Same pixels, different etymology. You resolve it by context or you don't resolve it. Everywhere else the visual logic holds — here it gives up.
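One way to picture the collision is as a lookup table from variant shape to base radical, where every key has one answer except 月. The sketch below uses a subset of the rows above; the `VARIANTS` structure and the function names are hypothetical, and the radical numbers follow the standard Kangxi numbering.

```python
# Variant shape -> candidate (base form, Kangxi radical number) pairs.
# A subset of the table above, entered by hand for illustration.
VARIANTS = {
    "亻": [("人", 9)],
    "氵": [("水", 85)],
    "忄": [("心", 61)],
    "扌": [("手", 64)],
    "灬": [("火", 86)],
    "艹": [("艸", 140)],
    "月": [("肉", 130), ("月", 74)],  # the meat/moon collision
}


def candidates(component):
    """Return the possible base radicals for a component shape."""
    return VARIANTS.get(component, [])


def is_ambiguous(component):
    """True when shape alone cannot decide which radical is meant."""
    return len(candidates(component)) > 1


print(candidates("氵"))     # one answer
print(candidates("月"))     # two answers: shape alone cannot decide
print(is_ambiguous("月"))
```

A dictionary settles the second case by recording the radical per character, which is exactly the context a reader has to supply by hand.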

## Competing systems: 214 vs 201 vs 79

The Kangxi set is not the only proposal on the table. Three serious alternatives exist, each optimizing for a different reader.

| System | Radicals | Year | Scope | Design philosophy |
|----------|:---:|:---:|----------|----------|
| Kangxi (康熙) | 214 | 1716 | 47,035 chars | Comprehensive historical standard |
| PRC Standard (GF 0011) | 201 | 2009 | Simplified Chinese | Merged rare Kangxi radicals for modern use |
| [Spahn-Hadamitzky](https://en.wikipedia.org/wiki/Mark_Spahn) | 79 | 1996 | Japanese learners | Maximally reduced for practical lookup |

The PRC's 201-radical [GF 0011-2009](https://en.wikipedia.org/wiki/Chinese_character_radicals#GF_0011-2009) drops or merges 13 rarely encountered Kangxi radicals and adjusts forms for simplified glyphs. That is a 6% cut: conservative, and backward-compatible with most dictionary conventions.

Mark Spahn and Wolfgang Hadamitzky go further. *The Kanji Dictionary* collapses 214 categories down to 79 by merging anything learners reliably confuse, and cross-lists every compound under each of its component characters. The trade is honest: faster lookup, less semantic structure. Etymology loses, beginners win.

Jack Halpern's [SKIP](https://en.wikipedia.org/wiki/Jack_Halpern_(linguist)), the System of Kanji Indexing by Patterns, abandons radicals entirely. It indexes by geometric division and stroke counts, which solves the one problem radicals can't: looking up a character whose radical you cannot pick out. The cost is the semantic signal you used to get for free.

Three systems, three answers to a question of taste. None of them retires the Kangxi set.

## Why neural nets rediscovered the same primitive

Modern NLP has wandered into the same insight 18th-century lexicographers had: characters compose. A radical-aware model can recognize a character it has never seen by assembling it from parts it has seen, whether those parts are encoded as stroke trees, Ideographic Description Sequence (IDS) decompositions, or attention over radical sequences. Zero-shot character recognition is the regime where the difference shows up most cleanly, and radical-aware architectures consistently win there.
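The compositional idea can be made concrete with a toy IDS parser. This is a sketch of the representation only, not of any particular model: the four decompositions are hand-written for illustration (a real system would load a full IDS database), and `parse`, `leaves`, and `seen` are names invented here. It handles the twelve original Ideographic Description Characters, U+2FF0 to U+2FFB.

```python
# Ideographic Description Characters and how many parts each one takes.
# U+2FF2 and U+2FF3 are three-way splits; the other ten are binary.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = ARITY["\u2FF3"] = 3

# Hand-written decompositions, for illustration only.
IDS = {
    "休": "⿰亻木",
    "河": "⿰氵可",
    "花": "⿱艹⿰亻匕",
    "沐": "⿰氵木",
}


def parse(ids):
    """Parse a prefix-notation IDS string into a nested tuple tree."""
    def walk(pos):
        ch = ids[pos]
        if ch not in ARITY:          # a plain component is a leaf
            return ch, pos + 1
        children = []
        pos += 1
        for _ in range(ARITY[ch]):
            child, pos = walk(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = walk(0)
    if end != len(ids):
        raise ValueError(f"trailing characters in {ids!r}")
    return tree


def leaves(tree):
    """Flatten a parsed tree into its component list, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


# Components observed in the three "training" characters.
seen = {part for char in "休河花" for part in leaves(parse(IDS[char]))}

# 沐 was never in that set, yet every one of its parts was.
unseen_parts = leaves(parse(IDS["沐"]))
print(unseen_parts, set(unseen_parts) <= seen)
```

The final line reports that both components of the unseen character were already encountered inside other characters, which is the property a radical-aware recognizer exploits.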

The 214 Kangxi radicals are not perfect. The grass overloading and the meat-moon collision are receipts of that. They survived because each radical does three jobs at once: it is a compressed semantic label, a visual hash key, and a mnemonic anchor. Nothing built since has matched the triple.

Browse the [complete radical index](/radicals) to walk all 214 with their variant forms and classified kanji.

### References

- Kangxi Dictionary (康熙字典), 1716. Radical counts and distribution statistics via [Wikipedia](https://en.wikipedia.org/wiki/Kangxi_radical).
- Cook, J.D. (2019). [Chinese character frequency and entropy](https://www.johndcook.com/blog/2019/10/18/chinese-character-entropy/).
- Li, X., et al. (2023). Self-information of radicals: A new clue for zero-shot Chinese character recognition. *Pattern Recognition*, 140, 109598.
- Spahn, M. & Hadamitzky, W. (1996). *The Kanji Dictionary*. Tuttle Publishing.
- GF 0011-2009, Table of Indexing Chinese Character Components. PRC Ministry of Education.
- Unicode Consortium. [Kangxi Radicals block](https://en.wikipedia.org/wiki/Kangxi_Radicals_(Unicode_block)) (U+2F00–U+2FD5), CJK Radicals Supplement (U+2E80–U+2EFF).

