Phono-Semantic Compounds: The 80% Rule Kanji Guides Skip

Most kanji are not pictures. They are spellings.

The single most useful fact about kanji: roughly four in five are phono-semantic compounds — 形声 (keisei moji), one component classifying meaning, another component recording (or once recording) the sound. The Shuowen Jiezi, Xu Shen's Han-dynasty catalog from around 100 CE, comes in at ~82%. Modern Mandarin counts land at 80–90% (DeFrancis 1989, Boltz 1994). For the joyo set, 61% are explicitly classified keisei, and that figure understates it — most of the so-called 会意 (compound-ideograph) entries are reanalyzed as phonetic compounds under modern paleography.

So why does almost every English-language kanji curriculum teach the character as an opaque atom — a shape to be glued to a keyword via an invented story? The phonetic gets silently dropped. Heisig acknowledges it in a footnote and then walks away. The apps clone Heisig. Two thousand mnemonics later, the student still cannot read 請 on first sight.

The phonetic is not a vestigial historical artifact. It is a live retrieval cue that collapses thousands of on'yomi into roughly a thousand phonetic series. This post lays out the structure of those series, the recoverability rate, why the readings drifted in the first place, and how to actually use them.

The corpus claim, with numbers

Phono-semantic compounding is not "common." It is the dominant generative mechanism for the entire CJK character space. Four independent counts agree on the magnitude.

Corpus	Sample size	% phono-semantic	Source
Shuowen Jiezi (Han, 100 CE)	9,353 chars	~82%	Boltz (1994), Tab. 4
Modern Mandarin general	~7,000 chars	~81%	DeFrancis (1989), p. 84
Modern Mandarin, all CJK Unicode	~50,000+ chars	~90%	Williams (1965); Boltz (1994)
Joyo kanji (explicit class)	2,136 chars	~61%	Kanji Portraits (2021)
Joyo kanji (incl. reanalyzed 会意)	2,136 chars	~80%	Boltz (1994) reclassification applied to joyo

The joyo gap is curricular, not structural. Japanese elementary school front-loads pictographs — 山, 川, 日, 月 — and compound ideographs because six-year-olds can draw them. By the time you reach grade 6 and 中学校以降, phono-semantic compounds dominate as heavily as in the wider character space. Yamashita & Maru (2000) and Joyce et al. (2014) both report that orthographic-neighborhood density in Japanese — the count of kanji sharing a phonetic — peaks for late-elementary and secondary kanji. Which is exactly the slice adult learners spend years grinding through.

The first-order conclusion is simple. If you are studying joyo from grade 3 onward without exploiting the phonetic component, you are throwing away the single largest source of structural redundancy in the writing system.

Worked example: the 青 series

The phonetic 青 (sei / shō, "blue-green") is the canonical introductory case. The right-hand or upper slot is fixed at 青; the left-hand semantic radical varies. The phonetic predicts セイ in kan-on (the Tang stratum) and ショウ in go-on (the older Wu stratum). Read the table.

Kanji	Semantic radical	Radical sense	On'yomi	Gloss
青	(whole char.)	—	セイ / ショウ	blue-green
清	氵 (water)	water-clean	セイ / ショウ	clean, pure
晴	日 (sun)	sky condition	セイ	clear weather
精	米 (rice)	refined grain	セイ / ショウ	refined, essence
請	言 (speech)	verbal request	セイ / シン	request
情	忄 (heart)	felt state	ジョウ / セイ	emotion
静	争 (struggle, contracted)	non-struggle	セイ / ジョウ	quiet, still
靖	立 (stand)	standing-firm	セイ / ジョウ	peaceful

Eight characters. One phonetic. One on'yomi family — セイ with a regular ジョウ go-on doublet on three of them. The semantic radical does the disambiguating: water + 青 = clean; sun + 青 = clear weather; heart + 青 = emotion. This is the architecture DeFrancis (1989, ch. 5) called the misnamed ideograph: the character is not an idea-picture, it is a spelling with a determinative bolted on. The same point Peter Boodberg had been making in Berkeley since the 1930s.

The 請 → シン and 情 → ジョウ outliers are not noise. They are the residue of borrowing the same Old Chinese phonetic at two different centuries (kan-on vs. go-on), which is the next section.

At least three more series

工 (kō) — high reliability with one famous defector

Kanji	Semantic radical	On'yomi	Gloss
工	(whole)	コウ / ク	craft
江	氵 (water)	コウ	(large) river
紅	糸 (silk)	コウ / ク	crimson
虹	虫 (insect/serpent)	コウ	rainbow
功	力 (strength)	コウ / ク	merit, achievement
攻	攵 (strike)	コウ	attack
空	穴 (hole)	クウ	sky, empty

Six of seven read コウ in kan-on. 空 is the famous defector — クウ — because 空 underwent an extra vowel change in transit (Middle Chinese kʰuwŋ → クウ rather than コウ). And the defector is more memorable than the regulars, precisely because it breaks the pattern. File 空 as "the 工-series exception" and you have effectively learned eight characters from one phonetic.

寺 (ji) — series drift across layers

Kanji	Semantic radical	On'yomi	Gloss
寺	(whole)	ジ	temple
時	日 (sun)	ジ	time
詩	言 (speech)	シ	poem
侍	亻 (person)	ジ / シ	attendant, samurai
持	扌 (hand)	ジ	hold
特	牛 (ox)	トク	special
等	竹 (bamboo)	トウ	rank, equal

The more interesting kind of series — the kind that punishes naive rule-following and rewards historical understanding. Three of seven (詩, 特, 等) deviate from ジ. The reason: 寺 in Old Chinese is reconstructed by Baxter & Sagart (2014) as something like *s-ɢəʔ, with an *s- prefix that conditioned different reflexes in different daughter dialects. By the time the Tang court was lecturing Japanese envoys on kan-on, that prefix had bleached out unevenly — 寺 itself stayed voiced (ジ), 詩 went to シ, 特 and 等 took an entirely different vocalic path (トク, トウ). The series did not fail. It fissioned. A learner who knows this still holds all seven characters together as a family of recognizable etymological cousins — a far stronger memory than seven unrelated stories about samurai and bamboo.

主 (shu) — clean kan-on series with one wago surprise

Kanji	Semantic radical	On'yomi	Gloss
主	(whole)	シュ / ス	master
注	氵 (water)	チュウ	pour, note
住	亻 (person)	ジュウ	dwell
柱	木 (tree)	チュウ	pillar
駐	馬 (horse)	チュウ	halt, station
往	彳 (step)	オウ	go toward

Five of six read チュウ / ジュウ. 往 is the outlier — and the history is that the modern shape inherited a different OC root, with 主 functioning more as a graphical stabilizer than a true phonetic (see Schuessler 2007 for the *waŋʔ reconstruction). For a learner the heuristic is dead simple: see 主 on the right, try チュウ first, fall back to ジュウ. Right roughly 80% of the time across joyo. Above 90% within the kan-on layer.

Empirical recoverability

What rate of return are we actually getting? The most-cited modern Mandarin numbers come from Zhou Youguang (1978) and Hsiao & Shillcock (2006): perfect prediction in 26–39% of phono-semantic characters, "approximate" prediction (same initial or same rhyme) in 58–66%, no useful signal in roughly the remaining third. These figures exclude the ~18% of kanji that are pictographs or compound ideographs.

Sino-Japanese on'yomi are more complicated. Recoverability is higher within a single borrowing layer (go-on, kan-on, or tō-on) but lower across layers, because Japan borrowed each phonetic at a different century and each borrowing froze a different snapshot of Chinese phonology. The four-layer structure is treated in detail in On'yomi vs Kun'yomi: A Practical Guide; for now, the relevant point is that asking "does the phonetic predict the on'yomi" is the wrong question. The right question is which on'yomi.

A worked comparison for the 青 series:

Kanji	Kan-on	Go-on	Both layers consistent?
青	セイ	ショウ	yes
清	セイ	ショウ	yes
晴	セイ	(rare)	yes
精	セイ	ショウ	yes
請	セイ	シン	no (go-on shifted)
情	セイ	ジョウ	yes (regular doublet)
静	セイ	ジョウ	yes (regular doublet)

Within kan-on alone, the 青 series is 100% consistent across the seven characters above — every one reads セイ. Within go-on it is 6/7 consistent (ショウ or its regular ジョウ voicing partner), with 請 → シン the lone defector. Dramatically higher than the cross-layer ~67% Hamilton (2013) reports for joyo as a whole. That gap is the whole point: layer-aware analysis recovers signal that cross-layer analysis loses.

The pedagogically actionable form: when you meet a new kanji and you recognize its phonetic, ask first what is the kan-on of the phonetic? That single question — the kan-on, not the on'yomi in general — has the highest hit rate of any heuristic available to a non-native learner.

Why the readings drifted

Old Chinese (c. 1000 BCE) → Late Han Chinese (~200 CE, the Wu-derived go-on stratum) → Middle Chinese (~600 CE, the Tang-derived kan-on stratum) → Early Modern Mandarin (the tō-on stratum) → modern Mandarin and modern on'yomi. Five centuries of sound change separate go-on from kan-on alone. The single largest source of cross-layer divergence in phonetic series is consonantal: Old Chinese had voicing distinctions and consonant clusters that Middle Chinese collapsed, and Japanese phonotactics flattened them further. (The Swedish sinologist Bernhard Karlgren is the one who first reconstructed Middle Chinese systematically — every modern reconstruction is a descendant of his 1915–1926 work.)

The 各 (kaku) series is the cleanest worked example, because Baxter & Sagart (2014) reconstruct a stable Old Chinese root.

Kanji	Old Chinese (B&S)	Middle Chinese	Kan-on	Modern gloss
各	*kˤak	kak	カク	each
客	*kʰˤrak	khaek	カク / キャク	guest
格	*kˤrak	kaek	カク	rank, framework
落	*kə.rˤak	lak	ラク	fall
絡	*kə.rˤak	lak	ラク	bind, network
略	*[r]ak	ljak	リャク	abbreviate

Notice the OC initial k- in 各, 客, 格 — preserved as カ- in kan-on. Then 落 and 絡, which had the cluster *kə.rˤak: the velar prefix dropped, the *r became Middle Chinese *l, and Japanese borrowed *l as ラ. Then 略, which had no velar at all (just *[r]ak), gives リャク. The 各 series is entirely coherent at the Old Chinese level — every member contains *-rak or *-ak — but it looks fragmented at the Japanese surface (カク vs. ラク vs. リャク) because Japanese borrowed the *output of a thousand years of Chinese consonant simplification, not the input.

For the historically curious: William Baxter and Laurent Sagart's Old Chinese: A New Reconstruction (2014) is the current reference. The Wiktionary Appendix on Old Chinese phonology is a serviceable open summary, and individual character entries on Wiktionary now routinely list B&S reconstructions in the etymology section. Schuessler's ABC Etymological Dictionary of Old Chinese (2007) is a usable second reference. For the phonological-squeeze rules governing the OC-to-on'yomi transition specifically, see the table in On'yomi vs Kun'yomi.

A page from a Song dynasty edition of the Shuowen Jiezi (c. 100 CE), showing characters grouped under 言 (speech) with their small-seal forms and Xu Shen's glosses. The Shuowen is the first systematic attempt to classify the entire character space, and its tally — roughly 82% phono-semantic — is also the earliest surviving documentation of the 80% rule. Source: Wikimedia Commons.

Why Heisig dropped phonetics

James Heisig is explicit about the design choice. The introduction to Remembering the Kanji, Vol. 1 (1977; current 6th ed. 2011) defers all reading instruction to Volume II so the learner can first build "a permanent mnemonic foothold" on character form and English keyword. He grants the existence of phonetic series and reading regularity, then deliberately excludes them from Volume I on the theory that learning meaning, form, and reading simultaneously is too expensive.

The trade is real, and it is asymmetric. RTK Volume I produces faster initial production — the student can write 2,200 keyword-cued kanji within a few months — but pays for it in slower long-term reading. The Volume-I graduate has 2,200 mnemonics for shapes and zero mental model of how on'yomi cluster. That mental model is precisely what lets a Chinese-trained learner sight-read unfamiliar Japanese newspaper text at 80%+ accuracy. Encounter 請 cold and the Heisig graduate looks it up; the phonetics-aware learner sees 青 on the right, hypothesizes セイ, and is correct. Multiplied across the long tail of low-frequency joyo kanji adult learners grind through for years, the two curricula produce strikingly different reading speeds at the five-year mark.

A more thorough critique appears in the review of the reference and method literature. The short version: RTK Volume I is a defensible design considered alone but a strange shape for a complete curriculum, and the honest move for an adult learner is to treat RTK as a writing primer and run a phonetics-first reading curriculum in parallel — not after.

How to actually use this in study

Three rules, in order of importance.

When you meet a new kanji, find the phonetic first. The phonetic is most often the right-hand element (tsukuri) in a left-right composition, less often the bottom (ashi), least often the upper enclosure (tare). The semantic radical is most often the left-hand element (hen). This positional regularity is itself a structural feature — see The Architecture of Kanji for the seven-position taxonomy and the IDC operators.
If you have seen another kanji with the same phonetic, try its kan-on first. Within a series the kan-on is the most consistent layer: 6/7 to 7/7 hit rates are typical, vs. 4/7 to 6/7 for cross-layer guesses. If kan-on fails, fall back to the regular voicing partner (セイ → ジョウ, ホウ → ボウ) before giving up.
Note exceptions explicitly — they are memorable because they break the rule. 空 clinches the 工 series precisely because it does not read コウ. 等 does the same for the 寺 series. The deviation is not a failure of the heuristic; it is a free anchor.

The way to operationalize this: keep a per-phonetic study list, not a per-grade study list. Tomoko Toyoda's Kanji Networks (1995, Journal of the Association of Teachers of Japanese) recommends exactly this, and it is the spine of the Kanji Atlas — every kanji shown alongside its component graph, phonetic included when one is identified. Browsing the grade 4 atlas surfaces under-exploited phonetic families faster than working linearly through a textbook, because the atlas shows the families instead of burying them in alphabetical order.

The deeper psychological claim — empirically testable — is that the phonetic series is not just a study shortcut but the actual cognitive unit proficient L2 readers store. Joyce, Masuda & Ogawa (2014) found orthographic-neighborhood effects (faster recognition for kanji with many phonetic siblings) as strong in advanced L2 readers as in native readers, which means the phonetic-family representation gets built by exposure even when the explicit curriculum ignores it. Your choice as a learner is whether to build it deliberately or accidentally. Deliberate is faster.

References

Baxter, W.H. & Sagart, L. (2014). Old Chinese: A New Reconstruction. Oxford University Press. WorldCat.
Boltz, W.G. (1994). The Origin and Early Development of the Chinese Writing System. American Oriental Society. WorldCat.
DeFrancis, J. (1989). Visible Speech: The Diverse Oneness of Writing Systems. University of Hawai'i Press. WorldCat.
Hamilton, N. (2013). "Identifying Useful Phonetic Components of Kanji." academia.edu.
Heisig, J.W. (2011). Remembering the Kanji, Vol. 1: A Complete Course on How Not to Forget the Meaning and Writing of Japanese Characters, 6th ed. University of Hawai'i Press.
Hsiao, J.H. & Shillcock, R. (2006). "Analyses of a Chinese phonetic-component database: Implications for orthographic processing." Journal of Psycholinguistic Research, 35(5), 405–426.
Joyce, T., Masuda, H. & Ogawa, T. (2014). "Jōyō kanji as core building blocks of the Japanese writing system." Written Language and Literacy, 17(2), 173–194.
Schuessler, A. (2007). ABC Etymological Dictionary of Old Chinese. University of Hawai'i Press.
Toyoda, T. (1995). "Kanji networks: An exploration of compounding errors." Journal of the Association of Teachers of Japanese, 29(1), 71–87. JSTOR.
Williams, C.A.S. (1965). Outlines of Chinese Symbolism and Art Motives. Tuttle.
Yamashita, H. & Maru, Y. (2000). "Compounds and word boundaries in Japanese." Working paper, Tohoku University.
Zhou, Y.G. (1978). "现代汉字中声旁的表音功能问题" [On the phonetic function of phonetic components in modern Chinese characters]. Zhongguo Yuwen, 1978(3), 172–177.