# The Architecture of Kanji: Components, Positions, and Composition Rules

## Kanji are a compression algorithm

Unicode's CJK Unified Ideographs block encodes 97,680 characters. The [KanjiJump](https://www.kanjijump.com/browse/atomic) decomposition reduces the 3,500 most-used Japanese kanji to 281 atomic components — 200 if you fold positional variants together. Roughly 12:1 compression, from an alphabet smaller than a Scrabble set. The *Cihai* dictionary catalogs 675 primitives across 16,339 characters; a 2009 Chinese national standard trims it to 514 for common use.

This is not metaphor. Kanji form a combinatorial writing system with a formal grammar, positional constraints, and a phonetic encoding layer — properties Unicode has literally codified into [twelve composition operators](https://en.wikipedia.org/wiki/Ideographic_Description_Characters).

### The twelve composition operators

Unicode block [U+2FF0–U+2FFB](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-18/) defines Ideographic Description Characters — prefix operators that describe how components combine. A context-free grammar for glyph structure:

| Symbol | Code Point | Name | Example | Decomposition |
|:------:|------------|------|---------|---------------|
| ⿰ | U+2FF0 | Left to right | [相](/kanjis/76f8) | ⿰木目 |
| ⿱ | U+2FF1 | Above to below | 杏 | ⿱木口 |
| ⿲ | U+2FF2 | Left-middle-right | 衍 | ⿲彳氵亍 |
| ⿳ | U+2FF3 | Above-middle-below | [京](/kanjis/4eac) | ⿳亠口小 |
| ⿴ | U+2FF4 | Full surround | [回](/kanjis/56de) | ⿴囗口 |
| ⿵ | U+2FF5 | Surround from above | 凰 | ⿵几皇 |
| ⿶ | U+2FF6 | Surround from below | [凶](/kanjis/51f6) | ⿶凵㐅 |
| ⿷ | U+2FF7 | Surround from left | 匠 | ⿷匚斤 |
| ⿸ | U+2FF8 | Above-left surround | [病](/kanjis/75c5) | ⿸疒丙 |
| ⿹ | U+2FF9 | Above-right surround | [戒](/kanjis/6212) | ⿹戈廾 |
| ⿺ | U+2FFA | Below-left surround | [超](/kanjis/8d85) | ⿺走召 |
| ⿻ | U+2FFB | Overlaid | 巫 | ⿻工从 |

![Examples of Ideographic Description Sequences: 字 decomposes as ⿱宀子, 匠 as ⿷匚斤, 京 as ⿳亠口小, 米 as ⿻八木](https://upload.wikimedia.org/wikipedia/commons/b/bf/Ideographic_description_sequences.png)
*Worked decomposition: each character on the left is rewritten as an IDC operator (the dashed box) followed by its operand components. Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Ideographic_description_sequences.png).*

Ten of the twelve are binary; ⿲ and ⿳ take three operands. Unicode 15.1 added four more (U+2FFC–U+2FFF) for left-open surround, bottom-right surround, horizontal reflection, and rotation — sixteen in total. The original twelve carry most of the load.

![The U+2FF0 through U+2FFF Unicode block: sixteen Ideographic Description Characters as code-point cells](https://upload.wikimedia.org/wikipedia/commons/c/c5/UCB_Ideographic_Description_Characters.png)
*The full Ideographic Description Characters block (U+2FF0–U+2FFF), including the four operators added in Unicode 15.1. Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:UCB_Ideographic_Description_Characters.png).*

The [CHISE project](https://www.chise.org/) and the [cjkvi-ids](https://github.com/cjkvi/cjkvi-ids) database have applied IDS decomposition to over 75,000 CJK ideographs — a machine-readable structural atlas of the entire character space. The distribution is heavily skewed. Gao and Kao (2002) found that over 60% of high-frequency characters use ⿰ (left-right), roughly 20% use ⿱ (top-bottom), and the rest split across enclosure and overlay. Left-right dominance is a fingerprint of the phono-semantic architecture: semantic radical on the left, phonetic on the right.

### The seven positional slots

Japanese pedagogy names seven [component positions](https://en.wikipedia.org/wiki/Radical_(Chinese_characters)#Positional_variants) (部首の位置). They aren't labels for a chart — they're structural constraints that decide which shape variant a component takes and which slot it gets to live in.

| Position | Japanese | Reading | Location | Examples |
|----------|----------|---------|----------|---------|
| [Hen](https://en.wiktionary.org/wiki/%E5%81%8F#Japanese) | 偏 | へん | Left side | 氵 in [海](/kanjis/6d77), 亻 in [休](/kanjis/4f11), 扌 in [持](/kanjis/6301) |
| [Tsukuri](https://en.wiktionary.org/wiki/%E6%97%81#Japanese) | 旁 | つくり | Right side | 刂 in [判](/kanjis/5224), 攵 in 教, 頁 in 頭 |
| [Kanmuri](https://en.wiktionary.org/wiki/%E5%86%A0#Japanese) | 冠 | かんむり | Top (crown) | 艹 in [花](/kanjis/82b1), 宀 in [安](/kanjis/5b89), 雨 in [雲](/kanjis/96f2) |
| [Ashi](https://en.wiktionary.org/wiki/%E8%84%9A#Japanese) | 脚 | あし | Bottom (legs) | 灬 in [然](/kanjis/7136), 心 in [思](/kanjis/601d), 皿 in [盤](/kanjis/76e4) |
| [Tare](https://en.wiktionary.org/wiki/%E5%9E%82#Japanese) | 垂 | たれ | Top-left drape | 广 in [店](/kanjis/5e97), 疒 in [病](/kanjis/75c5), 尸 in [届](/kanjis/5c4a) |
| [Nyou](https://en.wiktionary.org/wiki/%E7%B9%9E#Japanese) | 繞 | にょう | Bottom-left wrap | 辶 in [道](/kanjis/9053), 廴 in [建](/kanjis/5efa), 之 in [芝](/kanjis/829d) |
| [Kamae](https://en.wiktionary.org/wiki/%E6%A7%8B#Japanese) | 構 | かまえ | Full/partial surround | 門 in [間](/kanjis/9593), 囗 in [国](/kanjis/56fd), 行 in [術](/kanjis/8853) |

Hen and tsukuri dominate — over 60% of all placements, a direct consequence of ⿰'s prevalence. Across the 2,136 joyo kanji, 6 radicals cover 25% of all characters and 50 radicals cover 75%. Almost all of them sit in hen or kanmuri. Many radicals are pinned to a single slot: 氵 is always hen, 刂 is always tsukuri, 艹 is always kanmuri. Move a component and its shape mutates. [水](/kanjis/6c34) becomes 氵 on the left. [心](/kanjis/5fc3) becomes 忄 on the left but stays 心 on the bottom. [火](/kanjis/706b) collapses into 灬 underneath.

### Phonetic components — the sound layer

Somewhere between 67% and 82% of kanji are phono-semantic compounds ([形声文字](https://en.wikipedia.org/wiki/Chinese_character_classification#Phono-semantic_compound_characters)), depending on whose count you trust. The phonetic component (声符 *seifu*) carries the on'yomi; the semantic radical signals the meaning domain. [EDRDG](https://www.edrdg.org/~jwb/kanjiphonetics/) catalogs 150 phonetic components. KanjiJump documents 808 — and notes that 74% of the 3,500 most-used kanji either include a sound component or serve as one.

Reliability is uneven. Some series hold at 100%. Others rot through centuries of sound change between Old Chinese and modern on'yomi. Ten of the most productive:

| Component | On'yomi | Derivatives | Reliability | Example Series |
|:---------:|---------|:-----------:|:-----------:|----------------|
| 匕 | ヒ | ~30 | Medium | [比](/kanjis/6bd4), 匙, [旨](/kanjis/65e8), [尼](/kanjis/5c3c), [北](/kanjis/5317) |
| [者](/kanjis/8005) | シャ | ~23 | Medium | [暑](/kanjis/6691), [署](/kanjis/7f72), [諸](/kanjis/8af8), [緒](/kanjis/7dd2), [都](/kanjis/90fd) |
| [生](/kanjis/751f) | セイ | ~21 | Medium | [性](/kanjis/6027), [星](/kanjis/661f), [姓](/kanjis/59d3), [牲](/kanjis/7272), [産](/kanjis/7523) |
| 勹 | ホウ | ~20 | Medium | [包](/kanjis/5305), [抱](/kanjis/62b1), [泡](/kanjis/6ce1), [砲](/kanjis/7832), [飽](/kanjis/98fd) |
| 隹 | サイ | ~19 | Medium | [推](/kanjis/63a8), [維](/kanjis/7dad), [雄](/kanjis/96c4), [集](/kanjis/96c6), [準](/kanjis/6e96) |
| [可](/kanjis/53ef) | カ | ~17 | High | [何](/kanjis/4f55), [河](/kanjis/6cb3), [荷](/kanjis/8377), [歌](/kanjis/6b4c), [苛](/kanjis/82db) |
| 圭 | ケイ | ~17 | High | [掛](/kanjis/639b), 畦, 桂, 蛙, [街](/kanjis/8857) |
| [方](/kanjis/65b9) | ホウ | ~16 | High | [放](/kanjis/653e), [防](/kanjis/9632), [紡](/kanjis/7d21), [坊](/kanjis/574a), [芳](/kanjis/82b3) |
| [白](/kanjis/767d) | ハク | ~15 | High | [伯](/kanjis/4f2f), [拍](/kanjis/62cd), [泊](/kanjis/6cca), [迫](/kanjis/8feb), [舶](/kanjis/8236) |
| [各](/kanjis/5404) | カク | ~15 | High | [格](/kanjis/683c), [閣](/kanjis/95a3), [額](/kanjis/984d), [客](/kanjis/5ba2), [略](/kanjis/7565) |

*High: >80% of derivatives share the predicted on'yomi. Medium: 50–80%. Sources: EDRDG, The Kanji Code, KanjiJump.*

The perfect series are the highest-leverage components in the whole writing system. [票](/kanjis/7968) (ヒョウ) generates 12 derivatives — [標](/kanjis/6a19), [漂](/kanjis/6f02), 瓢, 剽, and more — every one ヒョウ, zero exceptions. 冓 (コウ) yields 10 ([構](/kanjis/69cb), [溝](/kanjis/6e9d), [講](/kanjis/8b1b), [購](/kanjis/8cfc)), all コウ. [包](/kanjis/5305) (ホウ) gives 6 ([抱](/kanjis/62b1), [泡](/kanjis/6ce1), [砲](/kanjis/7832), 胞, [飽](/kanjis/98fd)), all ホウ. Learn one component, predict the on'yomi of every character in the family on sight. That is a compounding asset, not flashcard busywork.

### Decomposition as compilation

The [CHISE project](https://www.chise.org/) (Character Processing Based on a Huge Structured Environment), out of Kyoto University, serializes its IDS decomposition database in RDF and exposes it via SPARQL. Each character is a tree — composition operators at the nodes, atomic components at the leaves. An abstract syntax tree for a glyph. The [cjk-decomp](https://github.com/amake/cjk-decomp) project covers 75,000 ideographs and surfaces roughly 10,000 intermediate composite components sitting between the atomic primitives and the final characters.

The shape mirrors how a compiler represents an expression. Terminals (atomic strokes and components). Non-terminals (composite sub-components). Production rules (the IDS operators). The implication is the interesting part: kanji are not 50,000 independent symbols. They are 50,000 *strings* generated by a grammar with 300–700 terminals and 12 production rules. The writing system is closer to a codebase than a dictionary — and component analysis is the decompiler.

We wrote one. The [Kanji Atlas](/components) renders the full component graph for the 2,136 joyo characters. [Atlas Grade 1](/components/grade/1) is the easiest place to see the principle in action — every Grade 1 character broken down into its kanji, [radicals](/radicals), and [graphemes](/graphemes).

### References

- Unicode Consortium. "Ideographic Description Characters." *The Unicode Standard*, [Chapter 18](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-18/).
- CHISE Project. [chise.org](https://www.chise.org/)
- cjkvi-ids. IDS Data for CJK Unified Ideographs. [github.com/cjkvi/cjkvi-ids](https://github.com/cjkvi/cjkvi-ids)
- amake/cjk-decomp. Decomposition data for 75,000 CJK ideographs. [github.com/amake/cjk-decomp](https://github.com/amake/cjk-decomp)
- KanjiJump. "The 281 Atomic Kanji Components." [kanjijump.com](https://www.kanjijump.com/browse/atomic)
- Gao, D.G. & Kao, H.S.R. (2002). Chinese character structure analysis. *Acta Psychologica Sinica*.
- EDRDG. Kanji Phonetic Components. [edrdg.org](https://www.edrdg.org/~jwb/kanjiphonetics/)
- Millen, A. *The Kanji Code*. [thekanjicode.com](https://thekanjicode.com/)
- Wikipedia. "[Ideographic Description Characters](https://en.wikipedia.org/wiki/Ideographic_Description_Characters)," "[Chinese Character Components](https://en.wikipedia.org/wiki/Chinese_character_components)," "[List of Kanji Radicals by Frequency](https://en.wikipedia.org/wiki/List_of_kanji_radicals_by_frequency)."

