The Power of Radicals: How 214 Building Blocks Unlock Thousands of Kanji

hbaristr · about a 7-minute read

A hash function from 1716

The 214 Kangxi radicals are not a learning aid. They are an indexing system: a piece of pre-modern infrastructure built to answer one question, namely how to alphabetize a script with no alphabet.

The answer, codified in the Kangxi Dictionary of 1716 under the Kangxi Emperor, was to assign every one of 47,035 characters to exactly one of 214 buckets based on a shared graphical component, then sort by residual stroke count inside the bucket. That is a hash function. The keys are characters. The hash is visual. The collisions are resolved by counting strokes.
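The bucket-then-sort scheme can be sketched in a few lines of Python. The seven characters and their (radical number, residual stroke count) pairs are hand-entered toy data, and breaking ties by code point is an assumption made here for determinism, not a claim about how the dictionary orders entries with equal stroke counts.

```python
from collections import defaultdict

# Toy data (hand-entered): character -> (Kangxi radical number, residual strokes).
TOY_INDEX = {
    "森": (75, 8),
    "花": (140, 4),
    "江": (85, 3),
    "林": (75, 4),
    "草": (140, 6),
    "河": (85, 5),
    "村": (75, 3),
}


def radical_key(char):
    """Sort key: bucket (radical number), then residual strokes, then code point."""
    radical, residual = TOY_INDEX[char]
    return (radical, residual, ord(char))  # ord() tie-break is an assumption


def build_buckets(chars):
    """Group characters by radical number, each bucket sorted by residual strokes."""
    buckets = defaultdict(list)
    for ch in chars:
        buckets[TOY_INDEX[ch][0]].append(ch)
    for members in buckets.values():
        members.sort(key=radical_key)
    return dict(buckets)


ordered = sorted(TOY_INDEX, key=radical_key)
buckets = build_buckets(TOY_INDEX)
print(ordered)
print(buckets)
```

A lookup then needs two observations from the reader: which radical the character sits under, and how many strokes remain once the radical is set aside.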

It survived three centuries because the function is good.

A volume of the Kangxi Dictionary (康熙字典) on display at the Chinese Dictionary Museum at Huangcheng Xiangfu. Compiled 1710–1716 under the Kangxi Emperor, it fixed the 214-radical scheme that still governs character lookup three centuries later. Source: Wikimedia Commons (CC BY-SA 4.0).

The distribution follows a power law

Of course it does. Anything assembled by humans across centuries tends to.

The 47,035 characters spread across 214 radicals at a mean of 220 per radical and a median of 64; the gap between mean and median is the first hint that the tail is fat. The top 10 radicals hold 12,034 characters (the sum of the first ten rows below). That is about 25.6% of the dictionary, held by under 5% of the buckets. The bottom quartile of radicals together covers fewer characters than radical 140 (grass) alone.

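| Rank | # | Radical | Meaning | Kangxi Count | % of Total |
|---|---|---|---|---|---|
| 1 | 140 | 艸 | grass | 1,902 | 4.04% |
| 2 | 85 | 水 | water | 1,595 | 3.39% |
| 3 | 75 | 木 | tree | 1,369 | 2.91% |
| 4 | 64 | 手 | hand | 1,203 | 2.56% |
| 5 | 30 | 口 | mouth | 1,146 | 2.44% |
| 6 | 61 | 心 | heart | 1,115 | 2.37% |
| 7 | 142 | 虫 | insect | 1,067 | 2.27% |
| 8 | 118 | 竹 | bamboo | 953 | 2.03% |
| 9 | 149 | 言 | speech | 861 | 1.83% |
| 10 | 120 | 糸 | silk | 823 | 1.75% |
| 11 | 167 | 金 | metal | 806 | 1.71% |
| 12 | 38 | 女 | woman | 681 | 1.45% |
| 13 | 130 | 肉 | meat | 674 | 1.43% |
| 14 | 109 | 目 | eye | 647 | 1.38% |
| 15 | 86 | 火 | fire | 639 | 1.36% |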

Source: Kangxi Dictionary radical counts via Wikipedia; percentages computed against 47,035 total entries.
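The headline figures can be rechecked directly from the table. The sketch below uses only the fifteen counts listed above plus the 47,035 total; the log-log slope is a rough least-squares fit over those fifteen head rows, so it says nothing reliable about the tail and should not be read as a test of the power-law claim for all 214 radicals.

```python
import math

TOTAL_CHARS = 47_035
NUM_RADICALS = 214
# Kangxi counts for ranks 1..15, copied from the table above.
TOP15 = [1902, 1595, 1369, 1203, 1146, 1115, 1067, 953, 861, 823,
         806, 681, 674, 647, 639]

mean_per_radical = TOTAL_CHARS / NUM_RADICALS
top10_count = sum(TOP15[:10])
top10_share = top10_count / TOTAL_CHARS
bucket_share = 10 / NUM_RADICALS


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), rank starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


print(f"mean per radical: {mean_per_radical:.1f}")
print(f"top-10 characters: {top10_count} ({top10_share:.1%} of entries)")
print(f"top-10 share of buckets: {bucket_share:.1%}")
print(f"log-log slope over the head rows: {loglog_slope(TOP15):.2f}")
```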

The full set of 214 Kangxi radicals rendered in old-style fonts that match the original 1716 glyph shapes, ordered by stroke count from 一 (1 stroke) up to 龠 (17 strokes). Source: Wikimedia Commons (CC BY-SA 3.0).

The floor is radical 138 (艮) at 5 characters. That is a 380:1 ratio between the most and least productive buckets. Grass, water, and tree together account for over a tenth (about 10.3%) of the dictionary's entries, a fossil of the agricultural world that wrote the language.

What knowing the radical actually buys you

A common Chinese character carries roughly 9.56 bits of entropy (Cook, 2019). The interesting question is how much identifying the radical narrows that down.

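Treat the dictionary as a flat list of 47,035 equally likely entries and a blind lookup costs log2(47,035) ≈ 15.5 bits. (The 9.56-bit figure is weighted by usage frequency, so it sits on a different scale from these uniform counts.) For a character in radical 140, knowing the radical leaves you log2(1,902) ≈ 10.9 bits of search inside the bucket, but you have already discarded 45,133 other candidates outright, a saving of about 4.6 bits. For radical 138, the radical nearly is the answer: log2(5) ≈ 2.3 bits left.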
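The arithmetic is short enough to check by hand or in Python. This sketch assumes every dictionary entry is equally likely, which is a simplification: it ignores usage frequency entirely. The function names are illustrative.

```python
import math

TOTAL_CHARS = 47_035


def bits_remaining(bucket_size):
    """Bits still needed to pick one character inside a bucket (uniform assumption)."""
    return math.log2(bucket_size)


def bits_gained(bucket_size, total=TOTAL_CHARS):
    """Bits supplied by knowing the radical: -log2(bucket_size / total)."""
    return math.log2(total / bucket_size)


for name, size in [("radical 140 (grass)", 1902), ("radical 138", 5)]:
    print(f"{name}: {bits_gained(size):.1f} bits gained, "
          f"{bits_remaining(size):.1f} bits remaining, "
          f"{TOTAL_CHARS - size} candidates discarded")
```

The two quantities always add up to log2(47,035), so a small bucket is informative exactly to the extent that a large one is not.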

Rare radicals are near-unique identifiers. Common ones are weak hints. This is the same intuition Li et al. (2023) formalized as self-information of radicals (SIR), a measure of each radical's discriminative power in neural character recognition. Weight radicals by their information content and zero-shot recognition accuracy improves by 3–5%. The 18th-century lexicographers picked a feature that 21st-century neural nets still want to attend to.

Positional variants: shape under pressure

A radical does not always look like itself. When it shifts position inside a character, it deforms — losing strokes to fit the slot. Japanese pedagogy names seven canonical slots: hen (left), tsukuri (right), kanmuri (top), ashi (bottom), tare (top-left drape), nyou (left-bottom wrap), kamae (enclosure).

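| Base Form | Variant | Position | Name | Example Characters |
|---|---|---|---|---|
| 人 | 亻 | hen (left) | にんべん | |
| 水 | 氵 | hen (left) | さんずい | |
| 心 | 忄 | hen (left) | りっしんべん | 快, 情, 悟 |
| 心 | ⺗ | ashi (bottom) | したごころ | |
| 手 | 扌 | hen (left) | てへん | |
| 火 | 灬 | ashi (bottom) | れんが | 然, 煮, 熱 |
| 犬 | 犭 | hen (left) | けものへん | |
| 示 | 礻 | hen (left) | しめすへん | |
| 衣 | 衤 | hen (left) | ころもへん | |
| 刀 | 刂 | tsukuri (right) | りっとう | |
| 食 | 飠 | hen (left) | しょくへん | |
| 艸 | 艹 | kanmuri (top) | くさかんむり | |
| 竹 | ⺮ | kanmuri (top) | たけかんむり | |
| 老 | 耂 | kanmuri (top) | おいかんむり | |
| 肉 | 月 | hen (left) | にくづき | |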

The meat radical (肉) is the system's one honest failure: its variant is graphically identical to the moon radical (月). 腕 (arm) carries meat-月. 朝 (morning) carries moon-月. Same pixels, different etymology. You resolve it by context or you don't resolve it. Everywhere else the visual logic holds; here it gives up.
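In software the collision is usually handled with a lookup table rather than by shape. The sketch below is a toy under stated assumptions: the variant map and the two per-character radical numbers (腕 under radical 130, 朝 under radical 74) are hand-entered, and the function names are illustrative. It also relies on the Kangxi Radicals block laying out radical n at U+2F00 + (n - 1), with each code point carrying a compatibility mapping that NFKC normalization folds to the ordinary ideograph.

```python
import unicodedata

# Written form -> base radicals it may stand for (hand-entered toy map).
VARIANT_TO_BASE = {
    "亻": ["人"], "氵": ["水"], "忄": ["心"], "⺗": ["心"], "扌": ["手"],
    "灬": ["火"], "犭": ["犬"], "礻": ["示"], "衤": ["衣"], "刂": ["刀"],
    "飠": ["食"], "艹": ["艸"], "⺮": ["竹"], "耂": ["老"],
    "月": ["肉", "月"],  # the one ambiguous shape: meat or moon
}

# Per-character radical numbers (hand-entered for two characters only).
CHAR_RADICAL_NUMBER = {"腕": 130, "朝": 74}


def base_radicals(form):
    """Candidate base radicals for a written form; unknown forms map to themselves."""
    return VARIANT_TO_BASE.get(form, [form])


def kangxi_radical(number):
    """Ordinary ideograph for Kangxi radical `number` (1..214)."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radical numbers run from 1 to 214")
    block_char = chr(0x2F00 + number - 1)  # Kangxi Radicals block, U+2F00..U+2FD5
    return unicodedata.normalize("NFKC", block_char)


def resolve(char):
    """Resolve a character's radical by table lookup instead of by shape."""
    return kangxi_radical(CHAR_RADICAL_NUMBER[char])


print(base_radicals("月"))          # shape alone leaves two candidates
print(resolve("腕"), resolve("朝"))  # the table settles it
```

The shape-based map can only narrow 月 to two candidates; the decision has to come from data attached to the whole character.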

Competing systems: 214 vs 201 vs 79

The Kangxi set is not the only proposal on the table. Three serious alternatives exist, each optimizing for a different reader.

| System | Radicals | Year | Scope | Design philosophy |
|---|---|---|---|---|
| Kangxi (康熙) | 214 | 1716 | 47,035 chars | Comprehensive historical standard |
| PRC Standard (GF 0011) | 201 | 2009 | Simplified Chinese | Merged rare Kangxi radicals for modern use |
| Spahn-Hadamitzky | 79 | 1996 | Japanese learners | Maximally reduced for practical lookup |

The PRC's 201-radical GF 0011-2009 drops 13 rarely-encountered Kangxi radicals and adjusts forms for simplified glyphs. A 6% cut — conservative, backward-compatible with most dictionary conventions.

Mark Spahn and Wolfgang Hadamitzky go further. The Kanji Dictionary collapses 214 categories down to 79 by merging anything learners reliably confuse, and cross-lists every compound under each of its component characters. The trade is honest: faster lookup, less semantic structure. Etymology loses, beginners win.
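Cross-listing is, in data-structure terms, an inverted index from component character to compound. The sketch below illustrates only that idea with three hand-picked compounds; it does not reproduce the dictionary's actual layout or data, and `cross_list` is an illustrative name.

```python
from collections import defaultdict


def cross_list(compounds):
    """Index each compound under every distinct character it contains."""
    index = defaultdict(list)
    for word in compounds:
        for ch in dict.fromkeys(word):  # distinct characters, original order
            index[ch].append(word)
    return dict(index)


index = cross_list(["日本", "本日", "日曜日"])
for ch, words in index.items():
    print(ch, words)
```

The payoff is that a reader who recognizes any one character of a compound can find the entry, at the cost of a larger index.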

Jack Halpern's SKIP — System of Kanji Indexing by Patterns — abandons radicals entirely. It indexes by geometric division and stroke counts, which solves the one problem radicals can't: looking up a character you cannot decompose. The cost is the semantic signal you used to get for free.
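A SKIP-style key can be modeled as a small tuple. The sketch below is a toy under assumptions: the pattern numbering (1 left-right, 2 top-bottom, 3 enclosure, 4 solid) and the five codes are hand-entered for illustration and should be checked against a SKIP-indexed dictionary before being relied on.

```python
from collections import defaultdict
from typing import NamedTuple


class SkipCode(NamedTuple):
    pattern: int  # assumed: 1 left-right, 2 top-bottom, 3 enclosure, 4 solid
    first: int    # stroke count of the first part (patterns 1-3)
    second: int   # stroke count of the second part (patterns 1-3)


def parse_skip(code):
    """Parse a 'pattern-first-second' string such as '1-4-4'."""
    pattern, first, second = (int(part) for part in code.split("-"))
    if pattern not in (1, 2, 3, 4):
        raise ValueError(f"unknown pattern in {code!r}")
    return SkipCode(pattern, first, second)


# Hand-entered toy codes (assumptions for illustration).
TOY_SKIP = {"明": "1-4-4", "林": "1-4-4", "休": "1-2-4", "字": "2-3-3", "国": "3-3-5"}


def build_index(table):
    """Group characters by SKIP code; several characters may share one code."""
    index = defaultdict(list)
    for ch, code in table.items():
        index[parse_skip(code)].append(ch)
    return dict(index)


skip_index = build_index(TOY_SKIP)
print(skip_index[SkipCode(1, 4, 4)])
```

Nothing in the key refers to meaning: 明 and 林 land in the same bucket purely because both split left-right into two four-stroke halves.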

Three systems, three answers to a question of taste. None of them retires the Kangxi set.

Why neural nets rediscovered the same primitive

Modern NLP has wandered into the same insight 18th-century lexicographers had: characters compose. A radical-aware model can recognize a character it has never seen by assembling it from parts it has, using stroke trees, Ideographic Description Sequence (IDS) decomposition, or attention over radical sequences. Zero-shot character recognition is the regime where the difference shows up most cleanly, and radical-aware architectures consistently win there.
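The compositional step can be illustrated without any neural network. The sketch below stands in for the final lookup only: it assumes some upstream recognizer has already predicted an IDS string from the image, and it matches that string against a six-entry hand-entered lexicon. The names are illustrative and nothing here is taken from a published architecture.

```python
# Toy lexicon (hand-entered): character -> Ideographic Description Sequence.
TOY_IDS = {
    "明": "⿰日月",
    "林": "⿰木木",
    "休": "⿰亻木",
    "好": "⿰女子",
    "字": "⿱宀子",
    "花": "⿱艹化",
}

# Inverted lexicon: IDS string -> character.
IDS_TO_CHAR = {ids: ch for ch, ids in TOY_IDS.items()}


def recognize(predicted_ids, seen_classes):
    """Map a predicted IDS to a character.

    Returns (character, is_zero_shot), where is_zero_shot is True when the
    character was not among the classes seen in training, or None when the
    predicted sequence is not in the lexicon.
    """
    ch = IDS_TO_CHAR.get(predicted_ids)
    if ch is None:
        return None
    return ch, ch not in seen_classes


seen = {"明", "林", "休"}  # pretend training set
print(recognize("⿰女子", seen))  # parts 女 and 子 are known, the whole is new
print(recognize("⿰日月", seen))
```

A whole-character classifier has no output slot for 好 if it never saw it; a component-level predictor only needs to have seen 女 and 子 somewhere.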

The 214 Kangxi radicals are not perfect. The grass overloading and the meat-moon collision are evidence of that. They survived because each radical does three jobs at once: it is a compressed semantic label, a visual hash key, and a mnemonic anchor. Nothing built since has matched the triple.

Browse the complete radical index to walk all 214 with their variant forms and classified kanji.

References

  • Kangxi Dictionary (康熙字典), 1716. Radical counts and distribution statistics via Wikipedia.
  • Cook, J.D. (2019). Chinese character frequency and entropy.
  • Li, X., et al. (2023). Self-information of radicals: A new clue for zero-shot Chinese character recognition. Pattern Recognition, 140, 109598.
  • Spahn, M. & Hadamitzky, W. (1996). The Kanji Dictionary. Tuttle Publishing.
  • GF 0011-2009, Table of Indexing Chinese Character Components. PRC Ministry of Education.
  • Unicode Consortium. Kangxi Radicals block (U+2F00–U+2FD5), CJK Radicals Supplement (U+2E80–U+2EFF).
